Reflection and datasource scan

In Dremio, how are reflections stored? Are all reflections kept in memory, or does Dremio use a database or disk for that?

And how does Dremio refresh reflections and datasources? Does it perform a complete scan or a delta scan? If it is a delta scan, does it run that scan over some time range?

@Ayush.goyal I would suggest going over the documentation links below and then coming back with specific questions:

Where are reflections stored
Data Reflections
Refresh reflection API
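
For readers who want to try the refresh API, here is a minimal sketch of how such a call could be built in Python. It assumes the v3 catalog refresh endpoint (`POST /api/v3/catalog/{id}/refresh`) described in the linked API documentation; the host, dataset id, and token below are hypothetical placeholders, and the exact Authorization header format depends on your Dremio version and login method, so please verify against the docs before relying on it. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url: str, dataset_id: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Dremio to refresh the
    reflections of one physical dataset.

    Assumption: the endpoint path follows the v3 catalog API from the
    linked docs; check it against your Dremio version.
    """
    quoted_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{quoted_id}/refresh"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": auth_header,  # format depends on how you authenticate
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
req = build_refresh_request(
    "http://localhost:9047/",
    "hypothetical-dataset-id",
    "Bearer <token>",
)
print(req.get_method(), req.full_url)
```

To actually send it you would pass `req` to `urllib.request.urlopen`, which needs a reachable Dremio coordinator and a valid token.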


How does Dremio manage a reflection internally when we create it? Does it use RocksDB to store the reflection, use the underlying filesystem, or keep it in memory?

@Ayush.goyal Metadata about reflections is stored in RocksDB, but the actual Parquet files are in the distributed storage mentioned in the above link.

Can we change it from distributed storage to an in-memory cache? Does Dremio allow that?

@Ayush.goyal Currently this is not possible.

@balaji.ramaswamy How do reflection and data refreshes happen? Does Dremio do a full scan after the specified time, or a delta scan? And if it does a delta scan, how does it decide the delta?

@Ayush.goyal Did you read the documentation link I sent about refreshing reflections? It talks about full and incremental refreshes.

Those incremental/full reflection refresh settings are per dataset. Can't we set them per source?
And how does a change to a table in MySQL/S3 (such as a deleted column) get synced to Dremio (full scan or incremental)? Please point me to the docs for that as well.

@Ayush.goyal The incremental option is only available at the dataset level, and there are a few restrictions:

  • deletes/updates are not yet supported
  • joins are not yet supported

@balaji.ramaswamy I checked: if we change data directly in the source, I can see the changes in Dremio as well. To do that, does Dremio perform a full scan or an incremental scan, and where can I see those properties?

And what do you mean by "joins are not yet supported"?

@Ayush.goyal If you choose incremental in the reflection definition, then only new records will be refreshed. Please read the restrictions on incremental reflections in the documentation link above.

If you have a virtual dataset (VDS) that is a join of two physical datasets, then we cannot do an incremental refresh on the VDS.
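
To make the difference between full and incremental refresh concrete, here is a toy model in Python. It is only a sketch of the general append-only idea and not Dremio's actual implementation: `ToyReflection`, its methods, and the use of an integer id as the "new record" marker are all assumptions made for illustration. It shows why updates and deletes in the source are not picked up by an incremental refresh, while a full refresh rebuilds everything.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReflection:
    """Toy stand-in for a materialized reflection (hypothetical, not Dremio code)."""
    rows: dict = field(default_factory=dict)  # id -> value
    high_water_mark: int = 0                  # largest id seen so far

    def full_refresh(self, source: dict) -> None:
        # Rescan the whole source and replace the materialized copy.
        self.rows = dict(source)
        self.high_water_mark = max(source, default=0)

    def incremental_refresh(self, source: dict) -> None:
        # Only append records whose id is beyond the last one seen.
        new_rows = {k: v for k, v in source.items() if k > self.high_water_mark}
        self.rows.update(new_rows)
        if new_rows:
            self.high_water_mark = max(new_rows)


source = {1: "a", 2: "b"}
reflection = ToyReflection()
reflection.full_refresh(source)

# The source changes: one insert, one update, one delete.
source[3] = "c"
source[1] = "A"
del source[2]

reflection.incremental_refresh(source)
after_incremental = dict(reflection.rows)  # new row added, update/delete missed

reflection.full_refresh(source)
after_full = dict(reflection.rows)         # matches the source again

print(after_incremental)
print(after_full)
```

After the incremental refresh the toy reflection contains the new row 3 but still holds the old value for row 1 and the deleted row 2; only the full refresh brings it back in line with the source.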