Create Reflection on Id column, filter where Id = x. Reflection won't work. WTF?

I can’t understand how Reflection works. At first, I thought it was just like Index, but no. I already read the Dremio doc about Reflection!

Below is an example.

Create a Raw Reflection on the ShipmentId column on hive_source.test_iceberg.factorderrawscspos_reflection table

I run a simple query like below

select

\*

from hive_source.test_iceberg.factorderrawscspos_reflection

where

1 = 1

and ShipmentId = 3851793

;

But the plan said that Reflection is not used because it does not cover the query! WTF??? It’s literaly in the Where clause!

As I understand, Dremio will save a list of ShipmentId with the pointer to the corresponding row in the original table. ShipmentId will be saved as a Binary Tree data structure for fast searching. When I query for ShipmentId x, Dremio will search in the Tree for the value x, and the row that it points to, then return that row (like searching the Table of Contents in a book). So why does my query not use Reflection?

Please note that my table is very large (~ 1B of rows), the time is fast since it’s my 3rd query, my 1st query took 10s or something.

When you do a SELECT * you are selecting all the columns. However, your reflection has only one column ShipmentId.

To test if the reflection is picked up, change your query to:

SELECT ShipmentId
FROM   hive_source.test_iceberg.factorderrawscspos_reflection
WHERE  ShipmentId = 3851793;

Or, add the rest of the columns in the reflection (ideally, whichever columns you are actually interested in querying instead of SELECT *)

Yep, it works, but do I have to create a raw reflection on every column? I mean, I just want to optimize where the query is on 1 or 2 columns. Creating Raw Reflection on every columns is just like create an index on every column, which is not optimized.

If it’s the same as the Index, then why, in the case of “select * where” query, the index works in other SQL databases (mysql, sql server, postgresql,…), but reflection won’t work in Dremio case with the same query?

@quangbilly79 We have to look at the dashboards and see commonly queries columns. We also need to see of the queries are more aggregates then an agg reflection would be more effective. If your dashboards and reports say query 50 columns all the time and we have outliers then create a VDS on the 50 columns and create a reflection on the VDS. The outliers will go back to the base table but most of your queries (aim for 95%) will be fast, if there are specific columns we are always grouping by, create an agg reflection on those dimensions and select the measures the dashboards requires like SUM, MAX, MIN etc. Also another thing to look at is if the dashboard need all the data in the base table. Sometimes the table can have say 10 years of data but say 95% queries are hitting the last 2 years then your VDS can be on 50 columns with 2 years of data and queries using the reflection will be even faster as the reflection size will be much less, So all this depends on the nature of dashboards/reports.

Thanks, Bali

@balaji.ramaswamy Thanks, but could you tell me more about how the Reflection in Dremio works behind the scenes (how data is stored, the data structure,…)?

I mean, every open source DB engine like SQL Server, MySQL, Postgres,… there are many detailed documents on how the Index works, so that people know to optimize the index, which kind of query will take advantage of the index,… themself just by doing some researching.

I googled lots about Dremio Reflection, but still can’t understand how it truly works (like the data structure,…)

A simple problem like why

select * from tableX where colY = 1

When creating an Index on colY column on SQL Server, it helps the query run faster

But when creating a Reflection on colY column on Dremio, it won’t work

There are many other complex queries like merge, join,… I need to analyze. So I need to deeply understand how Reflection works behind the scenes.

Since Dremio has an open source version, I don’t think it’s much of a unshareable secret or something. If I have time, I can check the source code or sth. But please could you tell me more about Dremio Reflection? Please!