We are working with a NAS datasource (2 CSV files) which is periodically refreshed by an external process.
It works fine on our testing environment, where Dremio runs as a standalone node, but not on our production environment, where Dremio runs as a cluster: even when these 2 files are modified, Dremio only detects the changes in the NAS datasource a few hours after the updates…
So what refresh policy governs updates to an existing NAS datasource?
Have a look at this page in the docs:
What are your current settings for your NAS source?
Yep, I have tested the 3 different fetch modes…
But maybe the minimum refresh interval is 1 hour after changes in the physical datasource.
Currently, I use the default settings:
Dataset Handling:
Remove dataset definitions if underlying data is unavailable.
Fetch mode: Only Queried Datasets
Fetch every: 1 Hour(s)
Expire after: 3 Hour(s)
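For reference, here is how the settings above could be written down as a metadata-policy payload with the intervals converted to milliseconds. This is only a sketch: the field names (`datasetUpdateMode`, `datasetRefreshAfterMs`, `datasetExpireAfterMs`, `deleteUnavailableDatasets`) are my assumption of what the source `metadataPolicy` object looks like in Dremio's REST API and should be checked against the docs for your version, and `build_metadata_policy` is a hypothetical helper, not part of any Dremio client.

```python
from datetime import timedelta


def to_ms(delta: timedelta) -> int:
    """Convert a timedelta to whole milliseconds."""
    return int(delta.total_seconds() * 1000)


def build_metadata_policy(fetch_every: timedelta, expire_after: timedelta) -> dict:
    """Hypothetical helper: describe the dataset-details settings as a dict.

    The key names are assumptions modelled on a REST-style metadata policy;
    verify them against the Dremio documentation before sending anything.
    """
    if expire_after < fetch_every:
        raise ValueError("expire_after should not be shorter than fetch_every")
    return {
        "datasetUpdateMode": "PREFETCH_QUERIED",  # assumed name for "Only Queried Datasets"
        "datasetRefreshAfterMs": to_ms(fetch_every),  # "Fetch every"
        "datasetExpireAfterMs": to_ms(expire_after),  # "Expire after"
        "deleteUnavailableDatasets": True,  # "Remove dataset definitions if ... unavailable"
    }


# The defaults quoted above: fetch every 1 hour, expire after 3 hours.
default_policy = build_metadata_policy(timedelta(hours=1), timedelta(hours=3))
print(default_policy)
```

Writing the values out this way makes it easier to compare the standalone and cluster configurations side by side.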
My concern is about the difference between standalone mode and cluster mode.
It seems that updates to the physical datasource are detected sooner than expected in standalone mode…
Are you saying that in standalone mode the new data is seen after one hour, but not in the case of a cluster?
In standalone mode, new data is seen in less than 1 hour, but that is not the case in cluster mode.
So I reduced the "Fetch every" property to 5 minutes on production for the NAS source.
It seems to work.
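A rough way to reason about why the shorter interval helps: if metadata is only re-read on each scheduled fetch, a file modified just after a fetch stays invisible until the next one, so the worst-case delay is about one fetch interval. Below is a small sketch under that simplifying assumption. It ignores any cluster-specific behaviour and the time the refresh itself takes, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def next_fetch_after(change_time: datetime, first_fetch: datetime,
                     fetch_every: timedelta) -> datetime:
    """Return the first scheduled fetch at or after change_time."""
    if change_time <= first_fetch:
        return first_fetch
    elapsed = change_time - first_fetch
    # Ceiling division on timedeltas: number of whole intervals needed.
    intervals = -(-elapsed // fetch_every)
    return first_fetch + intervals * fetch_every


# Illustrative values: a fetch at midnight, a file modified one minute later.
first_fetch = datetime(2024, 1, 1, 0, 0)
change_time = datetime(2024, 1, 1, 0, 1)

delay_1h = next_fetch_after(change_time, first_fetch, timedelta(hours=1)) - change_time
delay_5m = next_fetch_after(change_time, first_fetch, timedelta(minutes=5)) - change_time

print("Fetch every 1 hour    ->", delay_1h)
print("Fetch every 5 minutes ->", delay_5m)
```

This model does not explain delays of several hours on the cluster, so it only illustrates why lowering "Fetch every" shortens the wait, not the root cause of the difference between the two environments.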
Thanks for the help