We are moving to Dremio for reading Parquet files as it is a wonderful tool. That said, when running the latest version, 4.18, it errors when reading Parquet files that have zero rows.
I saw this was fixed in the 3.2 release notes. I tried running version 3.2 and the problem went away.
Are all of your Parquet files zero rows? The fix in 3.2 covers reading a folder of Parquet files that contains a mix of zero-row and non-zero-row files; we will still error out if all of the Parquet files have zero rows. Can you please confirm? Also, kindly send us the profile of the failed job.
This happens when reading a folder that contains a mix of zero-row and non-zero-row Parquet files.
It only takes one zero-row Parquet file to cause the problem.
Attached below is the profile you requested. Thanks for the help!
Wayne
The attached zip file contains two Parquet files: one empty and one with data.
I created the Parquet files using Apache Drill with these CTAS statements:
create table dfs.parquet.`sample/20200426` as select 'SampleUserName' as name, 100 as memberid
create table dfs.parquet.`sample/20200427` as select name, memberid from oracle.PRODUCTION.MEMBER where name = 'does not exist'
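As a quick sanity check, one could verify the two outputs from Drill before pointing Dremio at the folder; this sketch assumes the same dfs.parquet workspace and folder names used in the CTAS statements above:
-- Hypothetical verification queries in Apache Drill.
-- sample/20200426 should return 1 row; sample/20200427 should return 0 rows.
SELECT COUNT(*) AS row_count FROM dfs.parquet.`sample/20200426`;
SELECT COUNT(*) AS row_count FROM dfs.parquet.`sample/20200427`;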
In a clustered environment, this produces the error if I restart the coordinator. If I also restart the executors, the problem goes away.
Procedure
Create the 1st file (the one containing data)
Import the dataset
Create the 2nd file
Restart the coordinator (clustered environment)
Run a query on the sample folder (you will see the problem; a sketch of such a query follows below)
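The original report does not include the exact query, but the last step might look something like the following; the source name "parquet" and the path "sample" are placeholders for however the folder was imported into Dremio:
-- Hypothetical Dremio query over the folder that mixes an empty and a non-empty Parquet file.
-- "parquet"."sample" is a placeholder dataset path; substitute the source/folder as actually imported.
SELECT name, memberid FROM "parquet"."sample"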
Hello @balaji.ramaswamy, I have a similar problem, but with reflections. I have a raw reflection applied to a VDS, but when this VDS has 0 rows the reflection shows as unavailable and queries push down to the data source.
If you are seeing that the reflection data has zero rows, we then mark the reflection as invalid. But if you know the source has data now, you should be able to refresh the reflection and then run the query.