Server.log usually contains the errors for REST API calls. If no errors are logged there, one thing you can do is run SELECT * FROM source.folder, which will autopromote the folder. If the query fails, its query profile will contain the error.
Note that you need to enable "Automatically format files into physical datasets when users issue queries" for the source, in the Metadata section of the source settings (the default behavior was changed, so it may already be turned on if the source was created a while back).
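If you'd rather script that probe query than paste it into the UI, here is a rough Python sketch. It only builds the SQL and the HTTP request; it does not send anything. The /api/v3/sql endpoint, the JSON body shape, and the way the token is passed are my assumptions about the Dremio REST API, so check them against your version. The helper names (quote_path, probe_sql, build_request) and the optional limit are made up for illustration.

```python
import json
import urllib.request


def quote_path(*parts):
    """Quote each path component as a SQL identifier, doubling embedded quotes."""
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def probe_sql(source, *folders, limit=None):
    """Build the SELECT that should trigger autopromotion of the folder.

    The optional LIMIT is an addition for illustration, to keep the result small.
    """
    sql = f"SELECT * FROM {quote_path(source, *folders)}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def build_request(base_url, auth_header, sql):
    """Build (but do not send) a POST to the assumed /api/v3/sql endpoint.

    auth_header is passed through verbatim; its exact format depends on
    how your Dremio version issues tokens.
    """
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v3/sql",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical source and folder names.
    sql = probe_sql("s3source", "landing", "my folder", limit=1)
    req = build_request("http://localhost:9047", "<your token>", sql)
    print(req.get_method(), req.full_url)
    print(sql)
```

If the job fails, the error should then show up in that job's query profile, as described above.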
I’ve got different directories with different parquet files.
Some directories contain 10 or fewer files. Others have subdirectories with hundreds of parquet files.
They were all created using pyarrow and sized to be 128 MB or less.
I came in this morning and the number of 504 errors has almost disappeared. When they do show up, the directory shows in purple after I refresh the web page, so the format was still successful.
I’m basically migrating 3 terabytes of files from HDFS to S3.
The root problem was just stability with our S3 system, but now I’m seeing the same problem with JSON source files.
select * limit 10000 works fine, but select count(*) or any other query without a limit triggers a GC, and the server becomes unstable and needs to be recycled.
read-after-new-write (default): Provides read-after-write consistency for new objects and eventual consistency for object updates. Offers high availability and data protection guarantees. Matches AWS S3 consistency guarantees.
Note: If your application uses HEAD requests on objects that do not exist, you might receive a high number of 500 Internal Server errors if one or more Storage Nodes are unavailable. To prevent these errors, set the consistency control to “available” unless you require consistency guarantees similar to AWS S3.
available (eventual consistency for HEAD operations): Behaves the same as the “read-after-new-write” consistency level, but only provides eventual consistency for HEAD operations. Offers higher availability for HEAD operations than “read-after-new-write” if Storage Nodes are unavailable. Differs from AWS S3 consistency guarantees for HEAD operations only.
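For anyone who wants to try the "available" level on individual requests, here is a rough Python sketch of a HEAD request carrying a per-request consistency setting. The Consistency-Control header name is my assumption about how the storage system accepts a per-request override, so confirm it in your vendor's S3 API docs. The endpoint, bucket, and key are hypothetical, and the request is unsigned (a real one needs the usual S3 signature), so this only shows the shape and is never sent.

```python
import urllib.parse
import urllib.request

# Only the two consistency levels quoted above; the system may support more.
LEVELS = {"read-after-new-write", "available"}


def head_request(endpoint, bucket, key, consistency="available"):
    """Build (but do not send) an unsigned HEAD request for one object.

    The Consistency-Control header name is an assumption, not a confirmed API.
    """
    if consistency not in LEVELS:
        raise ValueError(f"unknown consistency level: {consistency}")
    url = f"{endpoint.rstrip('/')}/{bucket}/{urllib.parse.quote(key)}"
    req = urllib.request.Request(url=url, method="HEAD")
    req.add_header("Consistency-Control", consistency)
    return req


if __name__ == "__main__":
    # Hypothetical endpoint, bucket, and object key.
    req = head_request("https://s3.example.internal", "datalake", "events/part-0001.parquet")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

The trade-off is the one the quoted note describes: "available" gives up read-after-write consistency for HEAD operations in exchange for fewer 500 errors when Storage Nodes are down.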