Hi,
I’ve got a parquet dataset stored on S3 that is organised into folders with a year/month/day structure (such as year=2018/month=10/day=15/).
Until now I had only 3 root folders (year=2016, year=2017 and year=2018).
Today I generated parquet files for a new root folder, year=2015.
The folder has contained parquet files for many hours now, but it seems to be completely ignored by Dremio:
Query
select * from mydataset where dir0='year=2015' limit 50
returns nothing.
I’ve tried refreshing the dataset, restarting Dremio, deactivating the accelerations, and refreshing the accelerations…
Any advice? Thanks.
Which directory did you refresh? The directory that contains the partitions? Assuming “mydataset” is in “mysource” and contains the partition directories “year=2016”, “year=2017”, that command would be
Hi @ben, I did not refresh any individual directory. I refreshed the dataset, which is the root directory (not the partition directories).
I did it again a few minutes ago and received the summary “Metadata for table ‘S3a.swn-sframe.raw_wtb_actions’ refreshed.” But my query still returns no rows.
However, I now suspect that some parquet files with no rows are causing the issue.
I get this error in the logs:
Caused by: java.lang.RuntimeException: Error in parquet reader (complex).
Message: Failure in setting up reader…
…
at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[na:1.8.0_181]
at java.util.ArrayList.get(ArrayList.java:433) ~[na:1.8.0_181]
at com.dremio.exec.store.parquet2.ParquetRowiseReader.setup(ParquetRowiseReader.java:260) ~[dremio-sabot-kernel-2.1.6-201809161906440178-edb5b4d.jar:2.1.6-201809161906440178-edb5b4d]
… 72 common frames omitted
It seems that Dremio’s parquet reader is not able to read parquet files with no rows (my parquet files are generated by pandas 0.23.4 with pyarrow 0.11). I will remove those “empty” parquet files and see if that solves the issue.
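For anyone wanting to check for the same thing, here is a minimal sketch of how zero-row files could be located before removing them. It assumes a local copy of the dataset (the real files live on S3) and files ending in ".parquet"; the function names `parquet_row_count` and `find_empty_parquet_files` are made up for illustration. The row count is read from the parquet footer with `pyarrow.parquet.read_metadata`, so no data pages are loaded.

```python
import os
from typing import Callable, List


def parquet_row_count(path: str) -> int:
    """Return the row count stored in a Parquet file's footer (needs pyarrow)."""
    # Imported lazily so the directory scan below can be used without pyarrow.
    import pyarrow.parquet as pq
    return pq.read_metadata(path).num_rows


def find_empty_parquet_files(
    root: str,
    row_count: Callable[[str], int] = parquet_row_count,
) -> List[str]:
    """Walk `root` and return the sorted paths of .parquet files with zero rows.

    Assumption: `root` is a local directory mirroring the partition layout,
    e.g. root/year=2015/month=01/day=01/part.parquet.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".parquet"):
                continue  # skip markers, logs, and other non-parquet files
            path = os.path.join(dirpath, name)
            if row_count(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Hypothetical local path; replace with wherever the dataset was copied.
    for path in find_empty_parquet_files("./raw_wtb_actions"):
        print(path)
```

The script only lists candidates; deleting them (or skipping empty DataFrames when the files are written) is left as a separate, deliberate step.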
@rajupillai, what are you trying to do and what is the specific error that you are getting? Can you post the output from Dremio’s server.log or the Dremio UI?