Reflection and datasource scan

Hi @balaji.ramaswamy ,

In Dremio, how are reflections stored? Are all reflections kept in memory, in a database, or on disk?

And how does Dremio refresh reflections and data sources? Does it perform a full scan or a delta scan? If it is a delta scan, does it scan over some time range?

@Ayush.goyal I would suggest going over the documentation links below and then coming back with specific questions.

Where are reflections stored
Data Reflections
Refresh reflection API

Thanks
Bali

Hi @balaji.ramaswamy ,

How does Dremio manage a reflection internally when we create it? Does it use RocksDB to store the reflection, use the underlying filesystem, or keep it in memory?

@Ayush.goyal Metadata about reflections is stored in RocksDB, but the actual Parquet files are in the distributed storage mentioned in the link above.

Hi @balaji.ramaswamy ,

Can we change it from distributed storage to an in-memory cache? Does Dremio allow that?

@Ayush.goyal Currently this is not possible.

@balaji.ramaswamy How do reflection and data refreshes happen? Does Dremio do a full scan after the configured interval, or a delta scan? And if it does a delta scan, how does it decide which data is the delta?

@Ayush.goyal Did you read the documentation link I sent about refreshing reflections? It covers both full and incremental refreshes.

@balaji.ramaswamy ,

The incremental/full reflection refresh setting is per dataset. Can't we set it per source?
And how do changes to a table in MySQL/S3 (like deleting a column) get synced with Dremio (full scan or incremental)? Please point me to the docs for that too.

@Ayush.goyal The incremental option is only available at the dataset level, and there are a few restrictions:

  • deletes/updates are not yet supported
  • joins are not yet supported

@balaji.ramaswamy I checked, and if we change data directly in the source, I can see the changes in Dremio as well. To do that, does Dremio perform a full scan or an incremental scan, and where can I see those properties?

And what do you mean by "joins are not yet supported"?

@Ayush.goyal In the reflection definition, if you choose incremental, then only new records will be refreshed. Please read the restrictions on incremental reflections in the documentation link above.
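To make the "only new records" point concrete, here is a minimal Python sketch of how an append-only incremental refresh behaves compared with a full refresh. It is a simplified model, not Dremio's implementation: it assumes (hypothetically) that the dataset has a monotonically increasing `id` column used as the refresh watermark, and the names `Reflection`, `full_refresh`, and `incremental_refresh` are illustrative only, not Dremio APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Reflection:
    rows: list = field(default_factory=list)  # materialized copy of the dataset
    watermark: int = 0                         # highest "id" value seen so far

def full_refresh(source):
    """Rebuild the whole materialization from the current source rows."""
    rows = [dict(r) for r in source]
    return Reflection(rows, max((r["id"] for r in rows), default=0))

def incremental_refresh(refl, source):
    """Append only rows whose hypothetical "id" column is above the watermark."""
    new_rows = [dict(r) for r in source if r["id"] > refl.watermark]
    refl.rows.extend(new_rows)
    if new_rows:
        refl.watermark = max(r["id"] for r in new_rows)
    return refl

source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
refl = full_refresh(source)

source.append({"id": 3, "v": "c"})  # insert: picked up incrementally
source[0]["v"] = "A"                # update: not picked up
del source[1]                       # delete: not picked up

incremental_refresh(refl, source)
print(refl.rows)
```

In this model the new row with `id` 3 is appended, but the reflection still holds the old value for `id` 1 and the deleted row with `id` 2; only a full refresh would reconcile them, which matches the "deletes/updates are not yet supported" restriction.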

If you have a Virtual Data Set (VDS) that is a join of 2 physical datasets, then we cannot do an incremental refresh on the VDS.
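One way to see why a join is a problem for append-only refreshes is the following minimal Python sketch. It assumes a hypothetical incremental refresh that tracks a watermark on one side of the join (`orders.order_id`); the tables, columns, and `join` function are illustrative only and do not describe how Dremio implements reflections.

```python
def join(orders, customers):
    """Inner join of orders to customers on the "cust" key."""
    names = {c["cust"]: c["name"] for c in customers}
    return [
        {"order_id": o["order_id"], "name": names[o["cust"]]}
        for o in orders
        if o["cust"] in names
    ]

orders = [{"order_id": 1, "cust": 10}, {"order_id": 2, "cust": 20}]
customers = [{"cust": 10, "name": "A"}]

# Initial full build; the watermark is taken from the orders table (hypothetical choice).
materialized = join(orders, customers)
watermark = max(o["order_id"] for o in orders)

# Customer 20 arrives later; no new orders are added.
customers.append({"cust": 20, "name": "B"})

# A watermark-based refresh only looks at orders past the watermark, so it finds nothing.
new_orders = [o for o in orders if o["order_id"] > watermark]
materialized += join(new_orders, customers)

full = join(orders, customers)
print(len(materialized), len(full))
```

Here order 2 now has a matching customer, so a full rebuild of the join returns 2 rows, while the watermark-based refresh still holds 1. A change on either side of a join can alter the result, so new rows in a single table are not enough to refresh it correctly.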