select_from_grouping_view_not_accelerated.zip (51.5 KB)
select_from_grouping_view_accelerated_by_a_all_cols_in_reflections.zip (61.8 KB)
select_from_combined_and_group_accelerated.zip (68.0 KB)
vds_definitions.sql.zip (600 Bytes)
I am puzzled about the way Dremio is or is not accelerating queries on the same dataset for the same results, depending on how I am querying, or how the reflections are defined.
The source data is a parquet dataset stored in minio containing the following columns:
- numberA: Integer
- idA: Integer
- numberB: Integer
- idB: Integer
- start_ts: DateAndTime
- end_ts: DateAndTime
- unused_int: Text
- unused_string: Text
- record_type: Text
I have added a VDS on top of this with an added column for partitioning (DATE_TRUNC of start_ts). On top of this VDS, I have two RAW reflections with all but the unused_* columns selected, partitioned on day_date. One sorted on numberA, and one sorted on
I have attached the view definitions and 3 profiles.
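The attached vds_definitions.sql contains the real definitions. As a rough illustration of the setup described above, here is a minimal sketch. The view name `my_space.v_with_day`, the source path `minio.mybucket.dataset`, the reflection name, and the `'DAY'` granularity are all placeholders and assumptions, not copied from the attachment. The reflection statement is written with `ALTER DATASET`; depending on the Dremio version, the keyword may need to be `ALTER TABLE` or `ALTER VIEW` instead.

```sql
-- Hypothetical VDS: adds a partitioning column derived from start_ts.
-- View name, source path and 'DAY' granularity are assumptions.
CREATE VIEW my_space.v_with_day AS
SELECT
    numberA, idA, numberB, idB,
    start_ts, end_ts,
    unused_int, unused_string,
    record_type,
    DATE_TRUNC('DAY', start_ts) AS day_date
FROM minio.mybucket.dataset;

-- Hypothetical RAW reflection: all but the unused_* columns,
-- partitioned on day_date and sorted on numberA.
ALTER DATASET my_space.v_with_day
CREATE RAW REFLECTION raw_sorted_numberA
USING DISPLAY (numberA, idA, numberB, idB, start_ts, end_ts, record_type, day_date)
PARTITION BY (day_date)
LOCALSORT BY (numberA);
```

For profile 3, the only change to the reflections was adding unused_int and unused_string to the display list.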
All 3 profiles give the same result set on the same underlying data.
Profile 1: select_from_grouping_view_not_accelerated: Querying through v_unidirectional. Query is not accelerated.
Profile 2: select_from_combined_and_group_accelerated: Same query as above, but querying directly on v_combined, using the query defined in v_unidirectional with an added WHERE clause. Query is accelerated by both RAW reflections.
Profile 3: select_from_grouping_view_accelerated_by_a_all_cols_in_reflections: Querying through v_unidirectional. Query is now accelerated by one of the RAW reflections (after including the two unused_* columns in the RAW reflections).
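To make the difference between the query shapes concrete, here is a simplified sketch. The exact queries are in the attached profiles. The filter value, the aggregate expressions, and the GROUP BY columns below are made up for illustration; only the view names v_unidirectional and v_combined are real.

```sql
-- Shape of profiles 1 and 3: the query goes through the grouping view.
SELECT *
FROM v_unidirectional
WHERE numberA = 42;                      -- hypothetical filter

-- Shape of profile 2: the body of v_unidirectional written out
-- directly against v_combined, with the WHERE clause added.
SELECT
    numberA,
    numberB,
    MIN(start_ts) AS first_start_ts,     -- hypothetical aggregates
    MAX(end_ts)   AS last_end_ts
FROM v_combined
WHERE numberA = 42                       -- same hypothetical filter
GROUP BY numberA, numberB;
```

In the first form the filter is applied on top of the view, while in the second it sits below the grouping, directly on v_combined.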
These queries are all fairly fast because the mock dataset is small (only 20 million rows). In our actual setup, the difference between the queries from profile 1 and 2 is 25 minutes vs. 8 seconds.
Unfortunately, I can’t share profiles from this setup, so I have spent some time reproducing the issue on a mock dataset.
My questions are:
- Why is the query in profile 1 not accelerated when the query in profile 2 is?
- Why is the query in profile 3 accelerated when the query in profile 1 is not? The only difference is the addition of the unused columns to the RAW reflections. Our real dataset has more columns, and there too all columns in the source dataset must be included in the reflection for the query to be accelerated.
- Profiles 1 and 2 both show the same number of input rows. How is this possible if the profile 1 query is not accelerated? The source data is not sorted by numberA.
- How do I tell what makes Dremio choose not to accelerate in Profile 1?