I have a parquet file generated by pyarrow.parquet saved to an S3 bucket. When I try to bring that file into Dremio as a dataset, I get the following error: Failed to read parquet footer for file //Dremio_poc/stm.parquet.gzip.
There is no entry in the jobs viewer for this activity, so there is no way to export a profile or debug the issue.
Update: When I upload the same file to Dremio and specify its format as Parquet, it opens without issues. Is there a problem with opening Parquet files from an S3 source as a dataset?
What version of Dremio?
Can you share the file?
Have you had a chance to look at my parquet file?
Apologies for the delay.
I uploaded this file to an S3 bucket and did not get any error, but I noticed that it is not, in fact, compressed, even though the file extension suggests it is. When you try to access/format a Parquet file that has been compressed externally with gzip, zip, or something to that effect, Dremio will not be able to read the footer.
Perhaps some of your files have the gzip extension but are not actually compressed, whereas others are?
What is the best way to compress these files and why would it open via upload but not when retrieved from S3?
I have verified that a Parquet file has been compressed correctly with the gzip format but am still getting the same error from Dremio when trying to open it from an S3 bucket. Is there a specific compression format I should be using?
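One quick way to tell which case a given file falls into is to inspect its magic bytes: a valid Parquet file begins and ends with `PAR1`, while a gzip-wrapped file begins with `\x1f\x8b`. A stdlib-only sketch (the helper name and stand-in files are made up for illustration):

```python
import gzip

def diagnose_parquet(path):
    """Classify a file as valid Parquet, gzip-wrapped, or unknown."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)          # the Parquet footer magic is the last 4 bytes
        tail = f.read(4)
    if head[:2] == b"\x1f\x8b":
        return "gzip-wrapped: not readable as Parquet"
    if head == b"PAR1" and tail == b"PAR1":
        return "valid Parquet container"
    return "unknown format"

# Demo with stand-in files (real Parquet data not required for the check):
with open("ok.parquet", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")
with gzip.open("bad.parquet.gzip", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")

print(diagnose_parquet("ok.parquet"))        # valid Parquet container
print(diagnose_parquet("bad.parquet.gzip"))  # gzip-wrapped: not readable as Parquet
```

If the file on S3 comes back as gzip-wrapped, rewriting it with pyarrow's internal `compression="gzip"` option instead should let Dremio read the footer.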