We’re using Dremio CE 24.0.x to query Parquet files stored on S3, and have enabled automatic PDS formatting on query with a 6-hour metadata refresh schedule.
We’ve noticed that PDS metadata refresh only works for PDSs that don’t contain metadata.
For example, I have a bucket path some/path/to/dataset/YYYY/MM/DD/ containing multiple Parquet files that are added throughout the day.
We query both some/path/to/dataset and some/path/to/dataset/YYYY, but notice that automatic refresh only takes place for some/path/to/dataset. That means the data under some/path/to/dataset/YYYY is not refreshed until we refresh it manually using the ALTER PDS statement.
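For reference, the manual refresh is roughly of the following form. This is only a sketch: the source name s3source is a placeholder rather than our real source name, and YYYY stands for the actual year folder.

```sql
-- Sketch of the manual metadata refresh on the year-level dataset.
-- "s3source" is a hypothetical source name; YYYY is a placeholder for the year folder.
ALTER PDS s3source."some"."path"."to"."dataset"."YYYY" REFRESH METADATA
```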
The metadata refresh logs indicate that metadata refreshes are taking place on time, and again, it looks like paths queried above the date buckets are refreshed properly.
Do you have any idea why that could be happening?