Advice on downstream reflections from INCREMENTAL physical

Saltxwater · March 11, 2021, 5:33pm

I’m testing out dremio’s applicability for our use case and would like some advice on how to setup our various datasets and reflections.

The source data are parquet files, one file per day per dataset, 2 or 3 datasets. (Currently we’re rebuilding these as we work out what we need / what format, etc. but long term this would be immutable append only).

I’ve set these as INCREMENTAL refresh but is this necessary or would dremio automatically detect this?
I’ve added raw reflections on these with partition field “date”.
~30,000,000 rows, 100Gb for 10 years data in one (PD1).
~60,000,000 rows, 20Gb for 1 year data in another (PD2)

We also have an MSSQL source. This data can be updated, but the tables themselves are only ~10,000 rows so not so concerned about these. (PD3)

Downstream from these physical datasets I’ve constructed some virtual datasets doing various joins between them, performing aggregations on rows.
On my initial testing of this, a raw reflection of this VDS is taking >1 hour joining PD1 with PD2 (and PD2 is going to get much larger as we get more years data in it!).
If this is just a one off when we make changes to the VDS then no problem… but is this reflection going to need to refresh every day (every time we add new daily files to PD1 and PD2)?
I assume this is the case because of the docs: “Reflections on virtual datasets that include joins cannot be incrementally updated. Dremio falls back to using full refresh for these datasets”
We would ultimately want more VDS’s downstream of this, joining with PD3…

What do you recommend for this case? How could I better structure our setup?

Slight aside, we’d also like PIVOT support but from what I’ve read that isn’t supported yet!

Thanks for your help!

Topic		Replies	Views
How to get “Identify new records using the field” appear? Dremio University	9	2952	October 17, 2019
Incremental refresh using API	3	1680	May 31, 2021
How to orchestrate reflection refresh incrementally in dremio-oss incase of overwriting partition in a datapipeline	2	590	October 25, 2023
Update partition of a reflection	19	2975	December 9, 2021
Incremental reflections with union	9	2708	December 4, 2019

Advice on downstream reflections from INCREMENTAL physical

Related topics