I am facing an error while reading a Spark output directory (containing a _SUCCESS marker and Parquet files) in Dremio 3.1, although it was working fine in 2.1.6. In 3.1 it fails with ‘_SUCCESS is not a Parquet file (too small)’.
Please confirm whether this is an expected change in the current release.
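For reference, one possible workaround on the Spark side is to suppress the _SUCCESS marker so the output directory contains only Parquet files. A minimal sketch, assuming the writing job can be modified; the app name and paths are placeholders:

```python
# A minimal sketch, assuming the Spark job producing the directory can be
# changed; app name and paths are placeholders. The "spark.hadoop." prefix
# forwards the setting to the Hadoop configuration, so the
# FileOutputCommitter skips writing the _SUCCESS marker on job completion.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("write-without-success-marker")
    .config("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
    .getOrCreate()
)

df = spark.read.parquet("hdfs:///data/input")              # placeholder path
df.write.mode("overwrite").parquet("hdfs:///data/output")  # no _SUCCESS emitted
```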
Please also confirm how Dremio behaves when a Parquet file contains no records, i.e. the file has metadata only but no actual rows. In that scenario, my findings from server.log are:

Dremio 3.1: results in a NullPointerException
Dremio 2.1.6: able to read the metadata from the file, but fails with ‘IndexOutOfBoundsException: Index: 0, Size: 0’
My question is: shouldn’t Dremio be able to read the remaining Parquet files from the mapped location in the dataset and ignore the file with no records? Please confirm.
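In case it helps, here is a minimal sketch for spotting the zero-record part-file with pyarrow, assuming a local copy of the directory; the directory path is a placeholder:

```python
# A minimal sketch, assuming a local copy of the output directory; the
# directory path is a placeholder. Reads only the Parquet footer of each
# part-file and reports which ones contain zero rows.
import pathlib
import pyarrow.parquet as pq

data_dir = pathlib.Path("/data/output")  # placeholder

for path in sorted(data_dir.glob("*.parquet")):
    num_rows = pq.ParquetFile(path).metadata.num_rows
    if num_rows == 0:
        print(f"metadata-only (zero rows): {path}")  # candidate to remove
    else:
        print(f"{path}: {num_rows} rows")
```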
‘_SUCCESS is not a Parquet file (too small)’ is a known error that will be fixed in our next patch. In the meantime, you should be able to query the data directly without any issues; it’s only the preview mode that hits this error.
Correct. If you are trying to add a new dataset, you can ignore the error by just clicking the ‘Save’ button. Alternatively, instead of browsing to the file/folder in HDFS, you can query it directly with SQL. Both approaches have been confirmed to bypass the issue.
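To illustrate the SQL route, here is a sketch that submits the query over Dremio’s REST API instead of the UI; the host, credentials, source name (hdfs), and path segments are all placeholders for your environment:

```python
# A sketch only: host, credentials, source name ("hdfs"), and path segments
# are placeholders. Logs in, then submits the query against the directory
# directly, sidestepping the dataset preview.
import requests

base = "http://localhost:9047"

# /apiv2/login returns a token used in the "_dremio{token}" auth header.
token = requests.post(
    f"{base}/apiv2/login",
    json={"userName": "admin", "password": "secret"},
).json()["token"]

resp = requests.post(
    f"{base}/api/v3/sql",
    headers={"Authorization": f"_dremio{token}"},
    json={"sql": 'SELECT * FROM hdfs."data"."output" LIMIT 10'},
)
print(resp.json())  # contains a job id; poll /api/v3/job/{id} for results
```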
Thanks @anthony. It worked.
Can you please provide some insight into the scenario below?
Dremio is unable to create an HDFS source when the directory contains a Parquet file with no records (metadata only, no actual rows). In that scenario, server.log shows:

Dremio 3.1: results in a NullPointerException
Dremio 2.1.6: able to read the metadata from the file, but fails with ‘IndexOutOfBoundsException: Index: 0, Size: 0’
My question is: shouldn’t Dremio be able to read the remaining Parquet files from the mapped location in the dataset and ignore the file with no records? Please confirm.
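For reference, such a metadata-only file can be reproduced with a sketch like the one below, assuming PySpark; the schema and output path are placeholders. Writing an empty DataFrame yields a Parquet file that carries a schema footer but zero rows:

```python
# A minimal reproduction sketch; schema and output path are placeholders.
# The resulting part-file has Parquet metadata (schema footer) but zero rows.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("empty-parquet-repro").getOrCreate()

schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])

spark.createDataFrame([], schema).write.mode("overwrite").parquet(
    "hdfs:///data/empty_out"
)
```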