Metadata Refresh not working as expected

I’m using the latest community version of Dremio (20.1.0).
Our datasource is S3.

Metadata expiration/refresh/query are all set to 6 hours.
I can see in metadata_refresh.log that the source and all of its recognized datasets are being refreshed periodically.

Our datasets are partitioned by “topic”/“YYYY”/“MM”/“dd”.

Queries on “topic” seem to retrieve fresh data, but queries on “topic”/“YYYY”/“MM” do not unless I manually issue ALTER PDS <dataset> REFRESH METADATA.
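
For reference, here is a minimal sketch of how the manual refresh statement could be built for both the top-level dataset and the nested one. The source name, bucket, and path components are hypothetical placeholders, and the helper `refresh_statement` is only for illustration. The statement follows the `ALTER PDS <path> REFRESH METADATA` form as I understand Dremio's SQL syntax, so please check it against the docs for your version.

```python
def quote_identifier(part: str) -> str:
    """Wrap one path component in double quotes, doubling any embedded quotes."""
    return '"' + part.replace('"', '""') + '"'


def refresh_statement(path_parts: list[str]) -> str:
    """Build a manual metadata refresh statement for one dataset path.

    path_parts is the full dataset path, starting with the source name.
    """
    dataset = ".".join(quote_identifier(p) for p in path_parts)
    return f"ALTER PDS {dataset} REFRESH METADATA"


# Hypothetical paths: "s3source" and "my-bucket" are placeholders, not real names.
top_level = refresh_statement(["s3source", "my-bucket", "topic"])
nested = refresh_statement(["s3source", "my-bucket", "topic", "2021", "12"])

print(top_level)
print(nested)
```

Each statement would then be run from the SQL editor or whichever client you already use to query Dremio.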

Data is stored as Parquet and does include schema changes.

The server restarts daily, so it’s not a dead-thread issue.

Any help would be much appreciated.

@sheinbergon Expiry should never be equal to refresh/query. It looks like the datasets are expiring too soon. Can you change expiry to 18 hours and see?

If I have refresh set to 6 hours and expiration set to 9 hours, would that be OK?

@sheinbergon If for some reason one of your metadata refreshes fails, then the metadata will expire, so it is better to give at least slightly more than 2x the refresh interval.
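
To make the rule of thumb concrete, here is a rough sketch that checks a refresh/expiry pair against it. It assumes a simplified model in which refreshes are attempted on a fixed schedule and complete instantly. The function names are made up for illustration and are not part of Dremio.

```python
import math


def expiry_is_safe(refresh_hours: float, expiry_hours: float, factor: float = 2.0) -> bool:
    """True when expiry is strictly more than `factor` times the refresh interval."""
    return expiry_hours > factor * refresh_hours


def failed_refreshes_tolerated(refresh_hours: float, expiry_hours: float) -> int:
    """Consecutive failed refreshes that can occur before metadata expires.

    Simplified model (an assumption): after a successful refresh at t=0,
    attempts happen at refresh_hours, 2*refresh_hours, and so on, and at
    least one attempt strictly before expiry_hours must succeed.
    """
    attempts_before_expiry = math.ceil(expiry_hours / refresh_hours) - 1
    return max(attempts_before_expiry - 1, 0)


# The settings discussed in this thread, as (refresh, expiry) in hours.
for refresh, expiry in [(6, 6), (6, 9), (6, 18), (3, 9)]:
    print(refresh, expiry, expiry_is_safe(refresh, expiry),
          failed_refreshes_tolerated(refresh, expiry))
```

Under this model, 6/6 and 6/9 leave no room for a failed refresh, while 6/18 and 3/9 each tolerate one.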

@balaji.ramaswamy Thank you for following up.
It’s still happening. I’ve lowered fetch to 3 hours and expiry is now 9 hours.

I still have a feeling it relates to the fact that one of the datasets is nested deep within the other.

I will see how things behave and will let you know.

@sheinbergon Can you also attach the metadata_refresh.log from the coordinator log folder? Also a few from previous days, should be under log/archive