Do I understand correctly that, with an incremental refresh policy on a physical dataset, Dremio will also attempt to refresh reflections for derived datasets incrementally?
If this is the case, then does Dremio support incremental reflections for derived datasets with unions?
The reason I’m asking is that, according to Dremio documentation, incremental reflections don’t work for derived datasets with joins.
I plan to have a derived dataset with a query like this:
SELECT f1, f2 FROM my_datasource
UNION ALL
SELECT f2, f1 FROM my_datasource
where my_datasource is a physical dataset that has a raw reflection with an incremental refresh policy.
No. I raised it separately as a support ticket and was told by Dremio support that this would not currently work. We were already on a version higher than 2.0, and I have not received any updates on this since then.
I found that it bypasses the incremental reflection of the physical source and simply does a full refresh with a direct query to the source.
It would definitely be a huge improvement if this were fixed in the next release. There are numerous reasons why this is important.
For me, the top priority is getting very large datasets into Dremio when a full update either fails or takes too long. As far as I can tell, there is really no other way to split up the extract into Dremio.
@muv @bjbhjsj @dealercrm We currently support incremental update only for datasets that don’t include joins or unions; for anything else, a full update is the expected behavior. An incremental update enhancement for such patterns is on our roadmap, but it is not currently slated for any immediate release.
If you are an enterprise edition customer, we should work with your account executive so that we can better understand the impact of this on your use cases and discuss priorities.
Any ideas on this? The only workaround I have found is to dump the very last union into a scratch table and create a reflection on it (but this is far from elegant).
@akikax As mentioned earlier in this thread, the approach you described is probably the best option (anything else would require a full update). The ability to do incremental updates at the VDS level is being tracked for the future.
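For readers who want to try the scratch-table workaround discussed in this thread, here is a minimal sketch. It reuses the example query from the original question. The target name union_snapshot is hypothetical, and writing to the $scratch location with CREATE TABLE ... AS is an assumption about your Dremio setup, so check that it is available in your version before relying on it.

```sql
-- Sketch only, not an official recipe.
-- Assumptions: $scratch is writable in your Dremio installation, and
-- union_snapshot is a made-up table name used for illustration.
CREATE TABLE $scratch.union_snapshot AS
SELECT f1, f2 FROM my_datasource
UNION ALL
SELECT f2, f1 FROM my_datasource;

-- A reflection would then be defined on $scratch.union_snapshot
-- rather than on the derived dataset that contains the union.
```

Note that the scratch table is a snapshot: it has to be rebuilt whenever my_datasource changes, so this shifts the full recomputation to a step you control rather than eliminating it.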