How does reflection work

nkz · September 8, 2024, 1:11pm

Hey,

I’m wondering how reflections work internally.

For example, if I have some data stored in Amazon S3 or GCS, does Dremio copy all the data in order to create the raw reflections? Or does it just read the data once, collect some metadata, and use that?

balaji.ramaswamy · September 14, 2024, 12:48am

@nkz A few things:

A raw reflection on the entire table (no filters, all columns selected) creates a copy of the data in the dist location you have defined for reflections.
Every time you refresh, Dremio tries to do an incremental refresh whenever possible, so you will not see multiple copies.
With that said, a few recommendations:
Always create a VDS on top of your Iceberg table that covers only the columns and rows you query often. For example, the table may hold historical data while the dashboard only queries the last 2 years; in that case, create the VDS on the last 2 years only.
The PDS may contain many columns, but the dashboard may only query a specific set of them; create the VDS on only those columns.
Create the raw reflection on the VDS.
Now, coming to raw reflections: if your queries use aggregates (which is usually the case for dashboard queries), prefer agg reflections to raw reflections. An agg reflection is focused and has a smaller footprint, so it is faster to create and the dashboard queries that use it benefit a lot.
Depending on the size of the dataset, it can sometimes help to create a raw reflection first and then an agg reflection, since creation of the agg reflection is accelerated by the raw one.
Lastly, refresh only when needed, as every refresh consumes CPU and memory.
Remember to create a separate engine and route reflection creation to that engine.
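
As a rough sketch of the recommendations above (trimmed VDS, raw reflection on the VDS, then an agg reflection for dashboard aggregates), the Python snippet below only builds the SQL text and does not connect to Dremio. Every dataset, column, and reflection name is hypothetical, and the statement shapes follow the ALTER DATASET ... CREATE RAW/AGGREGATE REFLECTION form as I recall it from the Dremio SQL reference; check them against the reference for your version before running anything.

```python
from datetime import date


def view_sql(view, table, columns, date_col, since):
    """CREATE VIEW keeping only the queried columns and the recent rows."""
    cols = ", ".join(columns)
    return (
        f"CREATE VIEW {view} AS "
        f"SELECT {cols} FROM {table} "
        f"WHERE {date_col} >= DATE '{since.isoformat()}'"
    )


def raw_reflection_sql(dataset, name, columns):
    """Raw reflection on the view (assumed syntax, verify for your version)."""
    cols = ", ".join(columns)
    return (
        f"ALTER DATASET {dataset} "
        f"CREATE RAW REFLECTION {name} USING DISPLAY ({cols})"
    )


def agg_reflection_sql(dataset, name, dimensions, measures):
    """Agg reflection; measures maps a column to its aggregate functions."""
    dims = ", ".join(dimensions)
    meas = ", ".join(
        f"{col} ({', '.join(funcs)})" for col, funcs in measures.items()
    )
    return (
        f"ALTER DATASET {dataset} "
        f"CREATE AGGREGATE REFLECTION {name} "
        f"USING DIMENSIONS ({dims}) MEASURES ({meas})"
    )


# Hypothetical names throughout: lake.orders is the PDS, sales.recent_orders the VDS.
columns = ["order_date", "region", "amount"]
statements = [
    view_sql("sales.recent_orders", "lake.orders", columns,
             "order_date", date(2022, 9, 1)),
    raw_reflection_sql("sales.recent_orders", "recent_orders_raw", columns),
    agg_reflection_sql("sales.recent_orders", "recent_orders_agg",
                       ["order_date", "region"], {"amount": ["SUM", "COUNT"]}),
]

for statement in statements:
    print(statement)
```

The cutoff date and column list are where the footprint savings come from: the reflection is built from the view, so it only materializes the rows and columns the dashboard actually reads.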

Thanks
Bali
