I’m using the latest community version of Dremio (20.1.0).
Our datasource is S3.
Metadata expiration/refresh/query are all set to 6 hours.
I can observe metadata_refresh.log and see that the source’s recognized datasets are all being refreshed periodically.
Our datasets are partitioned by “topic”/“YYYY”/“MM”/“dd”.
So while queries on “topic” seem to retrieve fresh data, queries on “topic”/“YYYY”/“MM” do not, unless I manually run ALTER PDS ... REFRESH METADATA on that folder.
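For reference, this is the kind of manual refresh I end up running against the nested folders (the source, topic, and date names below are just placeholders for ours):

```sql
-- Placeholder names; the real folders follow the “topic”/“YYYY”/“MM”/“dd” layout above.
ALTER PDS "s3-source"."some-topic"."2022"."05" REFRESH METADATA;
```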
The data is stored in Parquet files and does include schema changes.
The server restarts daily, so it’s not a dead-thread issue.
Any help would be much appreciated.
@sheinbergon Expiry should never be equal to refresh/query; it looks like the datasets are expiring too soon. Can you change expiry to 18 hours and see?
If I have refresh set to 6 hours and expiration set to 9 hours, would that be OK?
@sheinbergon If for some reason one of your metadata refreshes fails, then it will expire, so it’s better to give expiry at least slightly more than 2x the refresh interval.
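For example, with refresh at 6 hours, 2x is 12 hours, so an expiry of 13-18 hours means a single failed refresh cycle won’t immediately expire the metadata, whereas an expiry equal to the refresh interval leaves no headroom at all.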
@balaji.ramaswamy Thank you for following up.
It’s still happening. I’ve lowered fetch to 3 hours and expiry is now 9 hours.
I still have a feeling it relates to the fact that one of the datasets is deeply nested within the other.
I will see how things behave and will let you know.
@sheinbergon Can you also attach the metadata_refresh.log from the coordinator log folder? Also attach a few from previous days; they should be under log/archive.