Raw Reflection Deltas

Hi there,

Is it possible to use SQL to compare reflections on the same datasource?

For example, to compare the old and new versions after a refresh.

Jonathan

@jonrmayer, we want to make sure we fully understand your ask. Could you share a basic example of what you are envisioning for a hypothetical dataset and a set of reflections?

I need to implement some form of Change Data Capture.

With a standard incremental refresh, Dremio looks for a timestamp or an auto-incrementing key to identify new records.

Unfortunately, the data I am working with does not contain either of these, and there is no way to alter the underlying database schema.

Identifying changes through reflection deltas seems to be the best way forward.

Thanks,
Jonathan

Thanks for the details. A few thoughts:

  • You can experiment with querying different materializations of a given reflection. These are stored under <DREMIO_DISTRIBUTED_STORAGE_ROOT>/accelerator/ (specified using paths.dist in dremio.conf). Note, however, that by default Dremio checks for and cleans up previous materializations of reflections every 24 hours.
  • You can manually materialize the dataset using CREATE TABLE AS into Dremio’s $scratch space, and then run queries against it. The $scratch space is stored under <DREMIO_DISTRIBUTED_STORAGE_ROOT>/scratch/. This way, you manage any required cleanup yourself. Here is a sample:
-- Materialize the dataset into the $scratch space
CREATE TABLE $scratch.my_table
AS SELECT * FROM TPCH.lineitem
--
-- Query the materialized copy
SELECT count(*) FROM $scratch.my_table
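
Building on the second option, here is a rough sketch of how two manual snapshots could be compared to find newly added rows. The snapshot names lineitem_old and lineitem_new are hypothetical, and the join assumes that (l_orderkey, l_linenumber) uniquely identifies a row, as it does in the standard TPC-H lineitem table. Your own data would need some column combination that plays the same role. Statements are separated by "--" as in the sample above and are meant to be run one at a time.

```sql
-- Snapshot taken before the source changes (hypothetical table name)
CREATE TABLE $scratch.lineitem_old
AS SELECT * FROM TPCH.lineitem
--
-- Snapshot taken later, after the source has changed (hypothetical table name)
CREATE TABLE $scratch.lineitem_new
AS SELECT * FROM TPCH.lineitem
--
-- Rows present in the new snapshot but missing from the old one.
-- Assumes (l_orderkey, l_linenumber) identifies a row uniquely.
SELECT n.*
FROM $scratch.lineitem_new n
LEFT JOIN $scratch.lineitem_old o
  ON n.l_orderkey = o.l_orderkey
 AND n.l_linenumber = o.l_linenumber
WHERE o.l_orderkey IS NULL
```

Swapping the two tables in the final query would list rows that were deleted instead. Detecting updated rows would additionally need a comparison of the non-key columns between matched rows. Since the snapshots live in $scratch, dropping old ones is up to you.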