I am facing an error while reading a Spark output directory (containing a _SUCCESS marker and Parquet files) in Dremio 3.1, although it was working fine in 2.1.6. In 3.1 it fails with ‘_SUCCESS is not a Parquet file (too small)’.
Please confirm whether this is an expected change in the current release.
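For reference, one possible workaround on the Spark side is to suppress the _SUCCESS marker so the output directory contains only Parquet files. A minimal sketch, assuming the writing job can be modified; the app name and paths are placeholders:

```python
# A minimal sketch, assuming the Spark job producing the directory can be
# changed; app name and paths are placeholders. The "spark.hadoop." prefix
# forwards the setting to the Hadoop configuration, so the
# FileOutputCommitter skips writing the _SUCCESS marker on job completion.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("write-without-success-marker")
    .config("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
    .getOrCreate()
)

df = spark.read.parquet("hdfs:///data/input")              # placeholder path
df.write.mode("overwrite").parquet("hdfs:///data/output")  # no _SUCCESS emitted
```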
Please also confirm how Dremio behaves when a Parquet file contains no records, i.e. the file has metadata only but no actual rows. In that scenario, my findings from server.log are:

Dremio 3.1: results in a NullPointerException
Dremio 2.1.6: able to read the metadata from the file, but fails with ‘IndexOutOfBoundsException: Index: 0, Size: 0’
My question is: shouldn’t Dremio be able to read the remaining Parquet files from the mapped location in the dataset and ignore the file with no records? Please confirm.
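In case it helps, here is a minimal sketch for spotting the zero-record part-file with pyarrow, assuming a local copy of the directory; the directory path is a placeholder:

```python
# A minimal sketch, assuming a local copy of the output directory; the
# directory path is a placeholder. Reads only the Parquet footer of each
# part-file and reports which ones contain zero rows.
import pathlib
import pyarrow.parquet as pq

data_dir = pathlib.Path("/data/output")  # placeholder

for path in sorted(data_dir.glob("*.parquet")):
    num_rows = pq.ParquetFile(path).metadata.num_rows
    if num_rows == 0:
        print(f"metadata-only (zero rows): {path}")  # candidate to remove
    else:
        print(f"{path}: {num_rows} rows")
```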
‘_SUCCESS is not a Parquet file (too small)’ is a known error that will be fixed in our next patch. In the meantime, you should be able to query the data directly without any issues; it’s only the preview mode that hits this error.
Correct. If you are trying to add a new dataset, you can ignore the error by just clicking the ‘Save’ button. Alternatively, instead of browsing to the file/folder in HDFS, you can query it directly with SQL. Both approaches have been confirmed to bypass the issue.
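To illustrate the SQL route, here is a sketch that submits the query over Dremio’s REST API instead of the UI; the host, credentials, source name (hdfs), and path segments are all placeholders for your environment:

```python
# A sketch only: host, credentials, source name ("hdfs"), and path segments
# are placeholders. Logs in, then submits the query against the directory
# directly, sidestepping the dataset preview.
import requests

base = "http://localhost:9047"

# /apiv2/login returns a token used in the "_dremio{token}" auth header.
token = requests.post(
    f"{base}/apiv2/login",
    json={"userName": "admin", "password": "secret"},
).json()["token"]

resp = requests.post(
    f"{base}/api/v3/sql",
    headers={"Authorization": f"_dremio{token}"},
    json={"sql": 'SELECT * FROM hdfs."data"."output" LIMIT 10'},
)
print(resp.json())  # contains a job id; poll /api/v3/job/{id} for results
```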
Thanks @anthony. It worked.
Can you please provide some insight into the scenario below?
Dremio is unable to create an HDFS source when the directory contains a Parquet file with no records (metadata only, no actual rows). In that scenario, server.log shows:

Dremio 3.1: results in a NullPointerException
Dremio 2.1.6: able to read the metadata from the file, but fails with ‘IndexOutOfBoundsException: Index: 0, Size: 0’
My question is: shouldn’t Dremio be able to read the remaining Parquet files from the mapped location in the dataset and ignore the file with no records? Please confirm.
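For reference, such a metadata-only file can be reproduced with a sketch like the one below, assuming PySpark; the schema and output path are placeholders. Writing an empty DataFrame yields a Parquet file that carries a schema footer but zero rows:

```python
# A minimal reproduction sketch; schema and output path are placeholders.
# The resulting part-file has Parquet metadata (schema footer) but zero rows.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("empty-parquet-repro").getOrCreate()

schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])

spark.createDataFrame([], schema).write.mode("overwrite").parquet(
    "hdfs:///data/empty_out"
)
```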