Job doesnt get accelerated by aggregation reflection

vincent_mayer · June 20, 2023, 8:18am

I am performing this sql query

select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date ‘1998-12-01’ - interval ‘90’ day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;

This is the according job profile:
b1621d6f-4bc2-4532-b963-7a91f170bb9d.zip (23,2 KB)

I created one raw reflection and one aggregation reflection on the table lineitem. But only the raw reflection gets picked to accelerate the query.

This is the job profile for the raw reflection:
4abc01b1-b814-498c-87bf-7f579607517a.zip (17,2 KB)

This is the job profile for the aggregation reflection:
0d2a9c0c-0719-4de2-b4dc-710db7501bc7.zip (27,8 KB)

Can anyone help me so the sql query also gets accelerated by the aggregation reflection?
Thanks in advance!

balaji.ramaswamy · June 26, 2023, 10:29pm

@vincent_mayer I see in your query you have used the below dimensions

l_returnflag, l_linestatus,

But I see your Agg reflection has below dimensions

l_returnflag, l_shipdate

l_linestatus is missing,

Similarly, the Agg reflection has the below measures defined

l_quantity and l_extendedprice (SUM)

While the query is using the below easures

l_quantity, l_extendedprice, l_discount, l_extendedprice, l_tax

Also, 3 of the measures use Avg and this is not part of the reflection definition

Kindly, let me know if you have any questions

vincent_mayer · July 2, 2023, 1:45pm

@balaji.ramaswamy I did everything as you have described. But Dremio is still not using my agg reflections. Any idea why?

balaji.ramaswamy · July 2, 2023, 5:34pm

@vincent_mayer Let me review, can you please send me the latest profile after you have done the changes?

vincent_mayer · July 3, 2023, 7:10am

@balaji.ramaswamy This the job profile for the query I performed:
284c9173-f202-4787-898d-b404eeb0896d.zip (23,3 KB)
I have created one agg reflection for the dimensions l_returnflag and l_linestatus and one agg reflection for the measures SUM(l_extendedprice) and SUM(l_quantity).
37715891-11c4-4b2b-bc7d-192ff833c9b8.zip (19,1 KB)
b284b304-4819-4027-bbf2-d38f76a07832.zip (4,8 KB)

Thanks for your help!

balaji.ramaswamy · July 6, 2023, 5:38am

@vincent_mayer The dimensions and measures used in the query needs to be selected in the same Aggregation reflection. I still see some missing ones. will send over the SQL for you to create the right Agg reflection

Delete all the reflections on postgres.public.lineitem
create a VDS say in a space called “orders” and call the VDS “lineitem_vds” on lineitem that has the below columns in addition to the columns in lineitem

l_extendedprice * (1 - l_discount) as disc_price
l_extendedprice * (1 - l_discount) * (1 + l_tax) as charge

run the below SQL or choose the below dimension and measures on lineitem_vds and create an agg reflection

alter dataset orders.lineitem_vds CREATE AGGREGATE REFLECTION lineitem_vds_agg1
  using dimensions (l_returnflag,l_linestatus,l_shipdate) measures (l_quantity(sum, avg),
  l_extendedprice(sum, avg), l_discount(avg), disc_price(sum), charge(sum), l_orderkey(count))

Once the Agg reflection is created, run the below SQL

select
	l_returnflag,
	l_linestatus,
	sum(l_quantity) 										as sum_qty,
	sum(l_extendedprice) 									as sum_base_price,
	sum(l_extendedprice * (1 - l_discount)) 				as sum_disc_price,
	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) 	as sum_charge,
	avg(l_quantity) 										as avg_qty,
	avg(l_extendedprice) 									as avg_price,
	avg(l_discount) 										as avg_disc,
	count(*) 												as count_order
from
	lineitem
where
	l_shipdate <= date '1998-12-01' - interval '90' day
group by
	l_returnflag,
	l_linestatus
order by
	l_returnflag,
	l_linestatus;

vincent_mayer · July 6, 2023, 7:29am

balaji.ramaswamy:

alter dataset orders.lineitem_vds CREATE AGGREGATE REFLECTION lineitem_vds_agg1
  using dimensions (l_returnflag,l_linestatus,l_shipdate) measures (l_quantity(sum, avg),
  l_extendedprice(sum, avg), l_discount(avg), disc_price(sum), charge(sum), l_orderkey(count))

Thank you for the detailed explanation. Some of it I have already tried, e.g. dimensions and measures in on agg reflection.

I created everything as described but my query still doesnt get accelerated. Here is the job profile.
08fc3397-7b8d-4915-a94a-ed10c60e21e4.zip (14,2 KB)
Any idea why?

Just for my understanding. I thought reflections get used by Dremios optimizer even if they only partly satisfy the query. Is that right?

Benny_Chow · July 6, 2023, 5:53pm

@vincent_mayer Can you provide a verbose profile for the last one you sent above? This will include a lot more logging in the acceleration tab which will explain why your reflection isn’t matching.

vincent_mayer · July 7, 2023, 8:56am

@Benny_Chow how can I create a verbose profile?

Benny_Chow · July 7, 2023, 2:20pm

You need to enable this support option and re-run the query.

https://docs.dremio.com/software/jobs/raw-profile/analyzing-profiles/#considerations

vincent_mayer · July 8, 2023, 8:39am

@Benny_Chow Thanks for the link. This is the job profile with the verbose key activated.
8a518acd-ed94-435d-9f2b-cef120aa1ac1.zip (20,5 KB)

Benny_Chow · July 9, 2023, 6:32am

That’s right. Reflections can match into parts of the user query.

I looked at your verbose profile. The VDS orders.lineitem_vds and the agg reflection lineitem_vds_agg1 look fine to me. The reflection should match into your query.

I see you are on version 20 which is from Dec 2021. Many reflection matching improvements have been made since. Is it possible to upgrade to version 24?

If you don’t want to upgrade, you can just take your original query, save it as a VDS and enable a raw reflection on it. Running the original query should match with the reflection even if you don’t explicitly reference the VDS in the query.

vincent_mayer · July 9, 2023, 3:41pm

@Benny_Chow Thanks for your help.

Atm I can not upgrade to a new community version because I dont have access to a NAS System, HDFS or Data lake storage to store the data reflections. As far as I know you can not use the pdf storage for your data reflections as of community version 21.0.0. Please correct me if I am wrong.

I ended up doing what you have described. I created a raw reflection on my query to accelerate it.

Benny_Chow · July 9, 2023, 6:34pm

Got to hear the workaround works for you.

As for PDFS, reflections in Iceberg and unlimited splits do not support this. There’s so many performance benefits with these features which is why we no longer support PDFS.

Topic		Replies	Views
Query was NOT accelerated	14	1782	January 12, 2020
Acceleration is not used when WHERE [DATETIME] is in the query	6	1594	January 10, 2020
Reflection refresh is not accelerated if it involves a JOIN	3	1249	May 15, 2019
How to effectively accelerate specific query Apache Iceberg	7	59	September 26, 2024
Aggregate Reflection is not used with query having join between 2 tables	2	23	May 28, 2025

Job doesnt get accelerated by aggregation reflection

Related topics