Hi @balaji.ramaswamy ,
In Dremio, how are reflections stored? Are all reflections kept in memory, in a database, or on disk?
And how does Dremio refresh reflections and data sources? Does it perform a complete scan or a delta scan? If it is a delta scan, does it run that scan over some time range?
@Ayush.goyal I would suggest going over the documentation links below and then coming back with specific questions
Where are reflections stored
Data Reflections
Refresh reflection API
Thanks
Bali
Hi @balaji.ramaswamy ,
How does Dremio manage a reflection internally when we create it? Does it use RocksDB to store the reflection, does it use the underlying filesystem, or does it keep it in memory?
@Ayush.goyal Metadata about reflections is stored in RocksDB, but the actual Parquet files are in the distributed storage mentioned in the above link
Hi @balaji.ramaswamy ,
Can we change it from distributed storage to an in-memory cache? Does Dremio allow that?
@Ayush.goyal Currently this is not possible
@balaji.ramaswamy How do reflection and data refreshes happen? Does Dremio do a full scan after the configured time, or a delta scan? And if it does a delta scan, how does it decide the delta value?
@Ayush.goyal Did you read the documentation link I sent about refreshing reflections? It talks about full and incremental refreshes
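For reference, a refresh can also be requested on demand through the REST API rather than waiting for the schedule. The snippet below is only a minimal sketch: it assumes the `POST /api/v3/catalog/{id}/refresh` endpoint described on the "Refresh reflection API" page, a coordinator at `http://localhost:9047`, and bearer-token authentication. The dataset id and token are hypothetical placeholders, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url, dataset_id, token):
    """Build (but do not send) a request asking Dremio to refresh the
    reflections that depend on one physical dataset.

    Assumptions: the endpoint path and the Bearer auth scheme; check both
    against the documentation for your Dremio version.
    """
    safe_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{safe_id}/refresh"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the dataset is identified by the URL
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical values, for illustration only.
req = build_refresh_request(
    "http://localhost:9047", "hypothetical-dataset-id", "hypothetical-token"
)
# To actually send it: urllib.request.urlopen(req)
```

Note that the id in the URL is the id of the physical dataset, not of the reflection itself.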
@balaji.ramaswamy ,
The incremental/full reflection refresh setting is per dataset. Can't we set it per source?
And how does a change to a table in MySQL/S3 (like a deleted column) sync with Dremio (full scan or incremental)? Please point me to the doc for that as well.
@Ayush.goyal The incremental option is only available at the dataset level, and there are a few restrictions:
- deletes/updates are not yet supported
- joins are not yet supported
@balaji.ramaswamy I checked, and if we change data directly in the source, I can see the changes in Dremio as well. To do that, does Dremio perform a full scan or an incremental scan, and where can I see that property?
And what do you mean by "joins are not yet supported"?
@Ayush.goyal In the reflection definition, if you choose incremental then only new records will be refreshed. Please read the restrictions on incremental reflections in the above documentation link.
If you have a Virtual Dataset (VDS) that is a join of 2 physical datasets, then we cannot do an incremental refresh on the VDS
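To make the difference between the two refresh modes concrete, here is a toy model in Python. It is not Dremio's implementation: it assumes the dataset has a monotonically increasing `id` field and that an incremental refresh only appends rows above the highest `id` already materialized. The function and variable names are made up for illustration.

```python
def full_refresh(source_rows):
    """Rebuild the materialization from scratch from the current source."""
    return list(source_rows)


def incremental_refresh(materialized, source_rows, key="id"):
    """Append only rows whose key is above the highest key already stored.

    Rows that were deleted or updated in the source are never revisited.
    """
    high_water = max((row[key] for row in materialized), default=None)
    new_rows = [
        row for row in source_rows
        if high_water is None or row[key] > high_water
    ]
    return materialized + new_rows


# Initial load: the reflection matches the source.
source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
reflection = incremental_refresh([], source)

# The source changes: row 1 is deleted, row 2 is updated, row 3 is added.
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

stale = incremental_refresh(reflection, source)  # only row 3 is picked up
fresh = full_refresh(source)                     # reflects every change
```

In this model `stale` still contains the deleted row 1 and the old value of row 2, which is why an incremental reflection is only appropriate for append-only data, while `fresh` matches the source exactly.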