Does Dremio Reflection help merge/insert-update/filter where query (just like Index)?

Hi, just moved from SQL Server to Dremio. And I have a question about Dremio performance with Reflection.

For example, in SQL Server, in an Insert/Update job, we create an Index on the lookup columns to make the process of searching for matching rows faster. Also, query with where like where index_col = xxx is also faster.

When I read the doc about Reflection of Dremio, it said that Reflection = Index. I wonder if it functions the same?

For example, if I create a Reflection/Index on the Id column of TableA/B, will the following queries be faster?

select * from TableA where Id <= 10;

or

merge into TableA using TableB on TableA.Id = TableB.Id

when matched then update set *

when not matched then insert *

When I read the doc, it seems like the Reflection feature only works with the Aggregate query?

Still don’t know if Reflection works the same with Index or not. I mean, some mechanic like stored a sorted HashMap of all the values in an indexed column somewhere.

For example, in the image above, can I just create “Aggregation“ Refections on 1 Dimension Column Id and hope that it helps my merge query?

Also, does Raw Reflections mean create Indexes on all Columns?

@quangbilly79 Reflection = Index i just a comparision to enhance performance, just like database indexes, too many reflections can also cause overhead during planning. With that said you would have to create reflections by selecting the most commonly used columns in your VDS’s. Create a VDS on the most commonly queries columns and columns that have FILTERS and create a raw reflection on the VDS (if there are not aggregations in the final query) If the final query is an aggregation query, create an aggregation reflection with measures being the aggregated columns and choose the measures like SUM, MAX, COUNT, MIN etc. The dimensions are the columns in the group by. The partition columns should be carefully chosen as they should not be on high cardinality columns like an ID field and something like a date filter that is ost commonly used in dash boards. SORT can be used on high cardinality columns example on non-partition columns that are used as FILTERS

Here is the best practices page

1 Like