504 - Gateway Time-out when formatting

Working on connecting our internal S3 StorageGRID to a Kubernetes-installed Dremio 4.0.2.

After choosing a directory containing Parquet files everything looks like it is working fine, but on save it gives a 504 - Gateway Time-out.

Tried all kinds of settings:

fs.s3a.connection.maximum to 10000
fs.s3a.max.threads to 5000

Both the compatibility mode and asynchronous access options are checked.

Is there a log file that contains debugging info?

server.log usually contains errors for REST API calls. If there are no errors logged, one thing you can do is run SELECT * FROM source.folder, which will auto-promote the folder; the query profile (if it fails) will contain the error.
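If you'd rather script that check, a rough sketch along these lines (assuming Dremio's REST SQL/Job APIs; the host, credentials and the source.folder path are placeholders) submits the query and polls the job so the failure message is captured:

```python
# Rough sketch: trigger auto-promotion by running the query through
# Dremio's REST API and poll the job for the underlying error.
# Host, credentials and the source/folder path are placeholders.
import time
import requests

BASE = "http://dremio-coordinator:9047"

# Log in and build the Authorization header Dremio expects.
token = requests.post(f"{BASE}/apiv2/login",
                      json={"userName": "admin", "password": "changeme"}).json()["token"]
headers = {"Authorization": f"_dremio{token}"}

# Submitting a query against the folder triggers the auto-promote.
job_id = requests.post(f"{BASE}/api/v3/sql",
                       json={"sql": "SELECT * FROM source.folder LIMIT 10"},
                       headers=headers).json()["id"]

# Poll until the job reaches a terminal state; FAILED jobs carry the error text.
while True:
    job = requests.get(f"{BASE}/api/v3/job/{job_id}", headers=headers).json()
    if job["jobState"] in ("COMPLETED", "FAILED", "CANCELED"):
        print(job["jobState"], job.get("errorMessage", ""))
        break
    time.sleep(2)
```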

Note that you need to enable "Automatically format files into physical datasets when users issue queries" for the source in the Metadata section (the default behavior was changed, so it may already be turned on if the source was created a while back).

OK. Got some error info back running SELECT * FROM source.folder…

Waited for 15000ms, but tasks for ‘Fetch parquet metadata’ are not complete. Total runnable size 11, parallelism 11.

Are you reading many Parquet files (e.g. the output of Spark streaming) from a single file path, or just a few of them?

I’ve got different directories with different Parquet files.
Some directories contain 10 or fewer files. Others have subdirectories with hundreds of Parquet files.
They were all created using pyarrow and sized to be 128 MB or less.
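For reference, the files were produced with something roughly like this (a simplified sketch; the schema, paths and rows-per-file figure are placeholders, tuned so each output file stays at or under ~128 MB):

```python
# Simplified sketch: write a table out as multiple Parquet files,
# capping each file at a row count chosen to keep it under ~128 MB.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "id": list(range(1_000_000)),
    "value": [float(i) for i in range(1_000_000)],
})

rows_per_file = 250_000  # placeholder; chosen so files stay under ~128 MB
for i, start in enumerate(range(0, table.num_rows, rows_per_file)):
    chunk = table.slice(start, rows_per_file)
    pq.write_table(chunk, f"part-{i:05d}.parquet")
```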

I came in this morning and the number of 504 errors has almost disappeared. When they do show up, a page refresh shows the directory in purple, so the format was still successful.

I’m basically migrating 3 terabytes of files from HDFS to S3.

Something is probably wrong with our S3 storage…

The root problem was just stability with our S3 system, but now I’m seeing the same problem with JSON source files.

select * limit 10000 works fine, but select count(*) or any other non-limiting query drives the JVM into garbage collection, and the server becomes unstable and needs to be recycled.

2020-02-10T22:56:51.583+0000: [GC (Allocation Failure) [PSYoungGen: 1395008K->384K(1396224K)] 2561866K->1167290K(2983936K), 0.0078312 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2020-02-10T22:56:52.123+0000: [GC (Allocation Failure) [PSYoungGen: 1395072K->320K(1396224K)] 2561978K->1167298K(2983936K), 0.0088563 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2020-02-10T22:56:52.669+0000: [GC (Allocation Failure) [PSYoungGen: 1395008K->460K(1395200K)] 2561986K->1167479K(2982912K), 0.0091217 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

@david.lee

GC Allocation Failures are fine. Can you check if you see "Full GC" entries during that time?
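A quick way to check is to scan the GC log for those entries, e.g. (a rough sketch; the gc.log path is a placeholder and depends on how the JVM's GC logging was configured):

```python
# Rough sketch: count "Full GC" events in the JVM's GC log.
# The path is a placeholder and depends on the JVM's GC logging flags.
with open("gc.log") as f:
    full_gcs = [line for line in f if "Full GC" in line]

print(f"{len(full_gcs)} Full GC events found")
for line in full_gcs[-10:]:  # show the most recent ones
    print(line.rstrip())
```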

I did find the root problem of the 504 gateway timeouts… I had to change the consistency setting for my S3 bucket…

http://docs.netapp.com/sgws-111/index.jsp?topic=%2Fcom.netapp.doc.sg-s3%2FGUID-B48E07AA-B1F5-41E6-964C-81B599517A45.html

read-after-new-write (default): Provides read-after-write consistency for new objects and eventual consistency for object updates. Offers high availability and data protection guarantees. Matches AWS S3 consistency guarantees.

Note: If your application uses HEAD requests on objects that do not exist, you might receive a high number of 500 Internal Server errors if one or more Storage Nodes are unavailable. To prevent these errors, set the consistency control to “available” unless you require consistency guarantees similar to AWS S3.

available (eventual consistency for HEAD operations): Behaves the same as the “read-after-new-write” consistency level, but only provides eventual consistency for HEAD operations. Offers higher availability for HEAD operations than “read-after-new-write” if Storage Nodes are unavailable. Differs from AWS S3 consistency guarantees for HEAD operations only.

Thanks for the update and the useful information @david.lee, glad it is working now