Folder not taken into account in a parquet dataset

Hi,
I’ve got a parquet dataset stored on S3 that is organised into folders with a year/month/day structure (such as year=2018/month=10/day=15/).

Until now I had only three root folders (year=2016, year=2017 and year=2018).

Today I generated parquet files for a new root folder, year=2015.

The folder has contained parquet files for many hours now, but it seems to be completely ignored by Dremio. The query
select * from mydataset where dir0='year=2015' limit 50
returns nothing.
:face_with_raised_eyebrow:

I’ve tried refreshing the dataset, restarting Dremio, deactivating the accelerations, refreshing the accelerations…
Any advice? Thanks.

Hi @dfleckinger,

Which directory did you refresh? The directory that contains the partitions? Assuming “mydataset” is in “mysource” and contains the partition directories “year=2016”, “year=2017”, the command would be:

ALTER PDS mysource.mydataset REFRESH METADATA

Hi @ben, I didn’t refresh a directory as such. I refreshed the dataset, which is the root directory (not the partition directories).
I just did it again a few minutes ago and received the summary “Metadata for table ‘S3a.swn-sframe.raw_wtb_actions’ refreshed.” But my query still returns no rows.

However, I now suspect that some parquet files with no rows are causing the issue.
I get this error in the logs:

Caused by: java.lang.RuntimeException: Error in parquet reader (complex).

Message: Failure in setting up reader…

at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[na:1.8.0_181]
at java.util.ArrayList.get(ArrayList.java:433) ~[na:1.8.0_181]
at com.dremio.exec.store.parquet2.ParquetRowiseReader.setup(ParquetRowiseReader.java:260) ~[dremio-sabot-kernel-2.1.6-201809161906440178-edb5b4d.jar:2.1.6-201809161906440178-edb5b4d]
… 72 common frames omitted

It seems that Dremio’s parquet reader is not able to read parquet files with no rows (my parquet files are generated by pandas 0.23.4 with pyarrow 0.11). I will remove those “empty” parquet files and see if that solves the issue.
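In case it helps anyone else track these files down before deleting them, here is a minimal sketch. It assumes the dataset has been synced to a local directory and that the files end in .parquet; the function names are made up for illustration and are not part of Dremio or pyarrow. It reads the row count from each file's footer metadata with pyarrow and lists the files that report zero rows.

```python
from pathlib import Path


def parquet_row_count(path):
    """Return the row count recorded in a parquet file's footer metadata."""
    # Imported lazily so the directory walk below can be exercised without pyarrow.
    import pyarrow.parquet as pq
    return pq.ParquetFile(str(path)).metadata.num_rows


def find_empty_parquet_files(root, count_rows=parquet_row_count):
    """Walk `root` recursively and return the parquet files that hold zero rows.

    `count_rows` is a hypothetical hook: any callable that maps a path to a
    row count. By default it reads the parquet footer via pyarrow.
    """
    empty = []
    for path in sorted(Path(root).rglob("*.parquet")):
        if count_rows(path) == 0:
            empty.append(path)
    return empty
```

Calling find_empty_parquet_files on the local copy of the dataset root returns the paths to review. Only the footer is read, so the check should stay cheap even for large files.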

Try promoting (formatting) the individual partition directory for the year or the parquet file(s) themselves. Are you able to do that without error?

Yes, we also found out the hard way that empty files are a problem, and we have had to remove them periodically.

Hi @dfleckinger, did you figure it out? I am facing the same issue and I don’t have an empty file.

@rajupillai, what are you trying to do and what is the specific error that you are getting? Can you post the output from Dremio’s server.log or the Dremio UI?