I am using the REST API to launch reflections but cannot find information on 2 features that would be practical for my use case:
- Can we trigger an incremental refresh on a physical dataset using the API?
- Can this triggered refresh skip the automatic refresh of the dependent virtual datasets?
I already launch virtual dataset reflection updates myself when I want to, so this automatic refresh is not necessary.
Incremental is a property at the PDS level, so you can set it there.
To disable automatic refresh, check "Never refresh" and "Never expire" in the UI on the PDS that the VDS is built on.
- The triggering of a PDS refresh once you have set it up is
See Dremio REST API Docs
- No, using this incremental refresh method, you cannot prevent the virtual datasets from refreshing as soon as the incremental refresh completes. You would need to cancel them before they proceed/complete.
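For illustration, triggering the PDS refresh from a script could look roughly like the sketch below. It assumes the catalog refresh endpoint `POST /api/v3/catalog/{id}/refresh` and a bearer-style token header; the base URL, dataset id, token, and the helper name `build_refresh_request` are all placeholders, so please check the REST API docs for your Dremio version before relying on it.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url: str, dataset_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Dremio to refresh a PDS.

    Assumptions, not confirmed in this thread:
    - the endpoint path is /api/v3/catalog/{id}/refresh
    - the server accepts an "Authorization: Bearer <token>" header
    """
    # Catalog ids can contain characters that need escaping in a URL path.
    safe_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{safe_id}/refresh"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the dataset is identified by the URL
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values, for illustration only.
request = build_refresh_request("http://localhost:9047", "abc-123", "my-token")
print(request.get_method(), request.full_url)
```

To actually send it you would call `urllib.request.urlopen(request)`. Whether the refresh runs incrementally depends on the PDS setting mentioned above, not on anything in the request.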
I would certainly like #2 as a feature; it would help. I think they force it because dependency management, timers, and reflection expiration settings are all variables.
I have seen some risk, depending on which reflection expires when (or gets marked invalid by the scheduler), that the newer data from the incremental refresh will not be used: if a VDS reflection that is still marked ‘valid’ exists, it may be chosen to accelerate the creation of a dependent VDS reflection along the same query path. As a result, when building the VDS layer, I wait for all refreshes triggered by the incremental refresh to conclude, then recreate any downstream ‘final VDS’ reflections. I want to be sure they did not consider any stale VDSs when the child VDSs have different expirations.
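The "wait, then recreate" step can be sketched as a small polling loop. This is a rough outline, not my actual script: `wait_for_refreshes` and `get_refresh_state` are hypothetical names, and the `"RUNNING"` state string is an assumption, so map it to whatever your Dremio version reports for a reflection's refresh status.

```python
import time
from typing import Callable, Iterable


def wait_for_refreshes(
    reflection_ids: Iterable[str],
    get_refresh_state: Callable[[str], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    busy_states: frozenset = frozenset({"RUNNING"}),  # assumed state name
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until none of the given reflections is in a busy state.

    get_refresh_state is a caller-supplied function (hypothetical) that
    returns the current refresh state string for one reflection id, for
    example by querying the reflection API.
    Returns True when everything settled, False if the timeout was reached.
    """
    pending = set(reflection_ids)
    deadline = clock() + timeout_seconds
    while pending:
        # Keep only the reflections that are still refreshing.
        pending = {rid for rid in pending if get_refresh_state(rid) in busy_states}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True
```

Only once this returns `True` would the script go on to recreate the downstream ‘final VDS’ reflections. The state lookup is passed in as a function so the loop stays independent of any particular endpoint or client library.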
Hi @balaji.ramaswamy, @datocrats-org,
thank you both for your feedback and experience sharing.
I have been developing some scripts using the API, similar to what you describe, making sure to decouple the “core” PDS/VDS from the runs of the specific downstream “projects” VDSs… let's see how it goes!