Job doesnt get accelerated by aggregation reflection

I am performing this sql query

select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date ‘1998-12-01’ - interval ‘90’ day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;

This is the according job profile:
b1621d6f-4bc2-4532-b963-7a91f170bb9d.zip (23,2 KB)

I created one raw reflection and one aggregation reflection on the table lineitem. But only the raw reflection gets picked to accelerate the query.

This is the job profile for the raw reflection:
4abc01b1-b814-498c-87bf-7f579607517a.zip (17,2 KB)

This is the job profile for the aggregation reflection:
0d2a9c0c-0719-4de2-b4dc-710db7501bc7.zip (27,8 KB)

Can anyone help me so the sql query also gets accelerated by the aggregation reflection?
Thanks in advance!

@vincent_mayer I see in your query you have used the below dimensions

l_returnflag, l_linestatus, 

But I see your Agg reflection has below dimensions

l_returnflag, l_shipdate

l_linestatus is missing,

Similarly, the Agg reflection has the below measures defined

l_quantity and l_extendedprice (SUM)

While the query is using the below easures

l_quantity, l_extendedprice, l_discount, l_extendedprice, l_tax

Also, 3 of the measures use Avg and this is not part of the reflection definition

Kindly, let me know if you have any questions

@balaji.ramaswamy I did everything as you have described. But Dremio is still not using my agg reflections. Any idea why?

@vincent_mayer Let me review, can you please send me the latest profile after you have done the changes?

@balaji.ramaswamy This the job profile for the query I performed:
284c9173-f202-4787-898d-b404eeb0896d.zip (23,3 KB)
I have created one agg reflection for the dimensions l_returnflag and l_linestatus and one agg reflection for the measures SUM(l_extendedprice) and SUM(l_quantity).
37715891-11c4-4b2b-bc7d-192ff833c9b8.zip (19,1 KB)
b284b304-4819-4027-bbf2-d38f76a07832.zip (4,8 KB)

Thanks for your help!

@vincent_mayer The dimensions and measures used in the query needs to be selected in the same Aggregation reflection. I still see some missing ones. will send over the SQL for you to create the right Agg reflection

  • Delete all the reflections on postgres.public.lineitem
  • create a VDS say in a space called “orders” and call the VDS “lineitem_vds” on lineitem that has the below columns in addition to the columns in lineitem
l_extendedprice * (1 - l_discount) as disc_price
l_extendedprice * (1 - l_discount) * (1 + l_tax) as charge
  • run the below SQL or choose the below dimension and measures on lineitem_vds and create an agg reflection
alter dataset orders.lineitem_vds CREATE AGGREGATE REFLECTION lineitem_vds_agg1
  using dimensions (l_returnflag,l_linestatus,l_shipdate) measures (l_quantity(sum, avg),
  l_extendedprice(sum, avg), l_discount(avg), disc_price(sum), charge(sum), l_orderkey(count))

Once the Agg reflection is created, run the below SQL

select
	l_returnflag,
	l_linestatus,
	sum(l_quantity) 										as sum_qty,
	sum(l_extendedprice) 									as sum_base_price,
	sum(l_extendedprice * (1 - l_discount)) 				as sum_disc_price,
	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) 	as sum_charge,
	avg(l_quantity) 										as avg_qty,
	avg(l_extendedprice) 									as avg_price,
	avg(l_discount) 										as avg_disc,
	count(*) 												as count_order
from
	lineitem
where
	l_shipdate <= date '1998-12-01' - interval '90' day
group by
	l_returnflag,
	l_linestatus
order by
	l_returnflag,
	l_linestatus;

Thank you for the detailed explanation. Some of it I have already tried, e.g. dimensions and measures in on agg reflection.

I created everything as described but my query still doesnt get accelerated. Here is the job profile.
08fc3397-7b8d-4915-a94a-ed10c60e21e4.zip (14,2 KB)
Any idea why?

Just for my understanding. I thought reflections get used by Dremios optimizer even if they only partly satisfy the query. Is that right?

@vincent_mayer Can you provide a verbose profile for the last one you sent above? This will include a lot more logging in the acceleration tab which will explain why your reflection isn’t matching.

@Benny_Chow how can I create a verbose profile?

You need to enable this support option and re-run the query.

https://docs.dremio.com/software/jobs/raw-profile/analyzing-profiles/#considerations

@Benny_Chow Thanks for the link. This is the job profile with the verbose key activated.
8a518acd-ed94-435d-9f2b-cef120aa1ac1.zip (20,5 KB)

That’s right. Reflections can match into parts of the user query.

I looked at your verbose profile. The VDS orders.lineitem_vds and the agg reflection lineitem_vds_agg1 look fine to me. The reflection should match into your query.

I see you are on version 20 which is from Dec 2021. Many reflection matching improvements have been made since. Is it possible to upgrade to version 24?

If you don’t want to upgrade, you can just take your original query, save it as a VDS and enable a raw reflection on it. Running the original query should match with the reflection even if you don’t explicitly reference the VDS in the query.

@Benny_Chow Thanks for your help.

Atm I can not upgrade to a new community version because I dont have access to a NAS System, HDFS or Data lake storage to store the data reflections. As far as I know you can not use the pdf storage for your data reflections as of community version 21.0.0. Please correct me if I am wrong.

I ended up doing what you have described. I created a raw reflection on my query to accelerate it.

Got to hear the workaround works for you.

As for PDFS, reflections in Iceberg and unlimited splits do not support this. There’s so many performance benefits with these features which is why we no longer support PDFS.