I have a parquet file generated by pyarrow.parquet saved to an S3 bucket. When I try to bring that file into Dremio as a dataset, I get the following error: Failed to read parquet footer for file //Dremio_poc/stm.parquet.gzip.
There is no entry in the jobs viewer for this activity, so there is no way to export a profile or debug the issue.
Update: When I upload the same file to Dremio and specify its format as Parquet, it opens without issues. Is there a problem with opening Parquet files from an S3 source as a dataset?
What version of Dremio?
Can you share the file?
Have you had a chance to look at my parquet file?
Apologies for the delay.
I uploaded this file to an S3 bucket and did not get any error, but I noticed that it is not, in fact, compressed, even though the file extension suggests it is. When you try to access/format a Parquet file that has been compressed externally with gzip, zip, or something to that effect, Dremio will not be able to read the footer.
Perhaps some of your files have the gzip extension but are not actually compressed, whereas others are?
What is the best way to compress these files and why would it open via upload but not when retrieved from S3?
I have verified that a Parquet file has been compressed correctly with the gzip format but am still getting the same error from Dremio when trying to open it from an S3 bucket. Is there a specific compression format I should be using?
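One quick way to tell which case a given file falls into is to inspect its magic bytes: a valid Parquet file begins and ends with `PAR1`, while a gzip-wrapped file begins with `\x1f\x8b`. A stdlib-only sketch (the helper name and stand-in files are made up for illustration):

```python
import gzip

def diagnose_parquet(path):
    """Classify a file as valid Parquet, gzip-wrapped, or unknown."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)          # the Parquet footer magic is the last 4 bytes
        tail = f.read(4)
    if head[:2] == b"\x1f\x8b":
        return "gzip-wrapped: not readable as Parquet"
    if head == b"PAR1" and tail == b"PAR1":
        return "valid Parquet container"
    return "unknown format"

# Demo with stand-in files (real Parquet data not required for the check):
with open("ok.parquet", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")
with gzip.open("bad.parquet.gzip", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")

print(diagnose_parquet("ok.parquet"))        # valid Parquet container
print(diagnose_parquet("bad.parquet.gzip"))  # gzip-wrapped: not readable as Parquet
```

If the file on S3 comes back as gzip-wrapped, rewriting it with pyarrow's internal `compression="gzip"` option instead should let Dremio read the footer.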