I’m using Hive external tables created on Parquet files on S3 as my physical data source. Is there any benefit to creating raw reflections on S3 in Dremio?
Also, how is a reflection refreshed? Is it fully rebuilt at every interval, or does Dremio identify the changed records and refresh only those?
You will see more performance benefits with aggregation reflections - https://docs.dremio.com/acceleration/creating-reflections.html#aggregation-reflections
Reflections are refreshed either on a time interval or on demand through our REST API. A refresh can be a complete refresh or, if the data source permits, an incremental refresh - https://docs.dremio.com/acceleration/updating-reflections.html
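To make the REST option a bit more concrete, here is a minimal Python sketch that only builds the refresh request without sending it. It assumes the v3 catalog endpoint `POST /api/v3/catalog/{id}/refresh` and the `_dremio<token>` authorization header format; both may differ in your Dremio version, so check the API docs first. The host, dataset id, and token are hypothetical placeholders.

```python
import urllib.request


def build_refresh_request(base_url, dataset_id, token):
    """Build (but do not send) a POST request asking Dremio to refresh
    the reflections that depend on a physical dataset.

    Assumptions: the endpoint path and the "_dremio<token>" header format.
    """
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{dataset_id}/refresh"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"_dremio{token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_refresh_request(
    "http://localhost:9047", "hypothetical-dataset-id", "hypothetical-token"
)
print(request.get_method(), request.full_url)
```

To actually trigger the refresh you would pass the request to `urllib.request.urlopen(request)` against a running Dremio instance.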
Is there a way to create a partial raw reflection? For example, suppose I have a table with data from 2000 to 2018 and I only want a reflection on the 2018 data. Can I do that?
Yes. For example, you would create a VDS with a query that applies that date filter (select * from table where year = 2018) and then create a raw reflection on that VDS.
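The idea behind the filtered VDS can be illustrated outside Dremio. The sketch below uses Python's built-in `sqlite3` module as a stand-in (it is not Dremio, and a SQLite view is only an analogy for a VDS) to show that a view defined with the year filter exposes only the 2018 rows, which is the data a raw reflection on such a VDS would cover. The table name `events` and its columns are hypothetical.

```python
import sqlite3

# In-memory database standing in for the physical dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 2000), (2, 2017), (3, 2018), (4, 2018)],
)

# The view plays the role of the VDS: same columns, filtered to one year.
conn.execute("CREATE VIEW events_2018 AS SELECT * FROM events WHERE year = 2018")

rows_2018 = conn.execute("SELECT id, year FROM events_2018 ORDER BY id").fetchall()
print(rows_2018)  # [(3, 2018), (4, 2018)]
```

In Dremio you would save the equivalent query as a VDS and enable a raw reflection on it from the dataset's reflection settings.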