S3 Source Metadata Refresh for Time-Bucketed PDS

We’re using Dremio CE 24.0.x to query Parquet files stored in S3. We have enabled automatic PDS formatting on query and set a 6-hour metadata refresh schedule.

We’ve noticed that the scheduled PDS metadata refresh only seems to work for PDSs whose path doesn’t include a date component.

For example, the bucket path some/path/to/dataset/YYYY/MM/DD/ contains multiple Parquet files that are added throughout the day.

We query both some/path/to/dataset and some/path/to/dataset/YYYY, but automatic refresh only takes place for some/path/to/dataset. As a result, data under some/path/to/dataset/YYYY is not refreshed until we refresh it manually with the ALTER PDS statement.
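Until the root cause is clear, the manual refresh can at least be generated rather than typed by hand. Below is a minimal sketch, assuming a hypothetical S3 source named `s3source` and an illustrative list of year folders; `build_refresh_statements` and `quote_path` are hypothetical helpers, and the `ALTER PDS <path> REFRESH METADATA` form should be checked against the SQL reference for your Dremio version. The statements can then be run from the SQL editor or whatever client you already use.

```python
from typing import Iterable, List


def quote_path(parts: Iterable[str]) -> str:
    """Join path components into a dotted, double-quoted dataset path.

    Embedded double quotes are doubled (standard SQL identifier escaping,
    assumed to apply here).
    """
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def build_refresh_statements(source: str, base_path: str, years: Iterable[str]) -> List[str]:
    """Build one ALTER PDS ... REFRESH METADATA statement per year-level folder.

    `source` is a hypothetical Dremio source name; `base_path` is the
    slash-separated folder path inside the bucket.
    """
    base_parts = [source] + [p for p in base_path.split("/") if p]
    return [
        f"ALTER PDS {quote_path(base_parts + [year])} REFRESH METADATA"
        for year in years
    ]


if __name__ == "__main__":
    for stmt in build_refresh_statements("s3source", "some/path/to/dataset", ["2023", "2024"]):
        print(stmt)
```

Keeping the statement builder as a pure function makes it easy to plug into an existing scheduler without tying the sketch to any particular client library.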

The metadata refresh logs show refreshes running on schedule, and again, the paths queried above the date buckets are refreshed properly.

Do you have any idea why this could be happening?

@balaji.ramaswamy

@sheinbergon

Automatic PDS formatting has nothing to do with periodic metadata refreshes or with Parquet files being added under the same folder. That option only converts a folder into a PDS automatically when the folder is queried.

I assume you have promoted some/path/to/dataset and not some/path/to/dataset/YYYY.

Refreshes happen via the background refresh, once an hour by default (this can be changed), and will refresh everything under some/path/to/dataset.

Is the purple grid (PDS) icon shown on some/path/to/dataset or on some/path/to/dataset/YYYY?