select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date ‘1998-12-01’ - interval ‘90’ day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
@vincent_mayer The dimensions and measures used in the query needs to be selected in the same Aggregation reflection. I still see some missing ones. will send over the SQL for you to create the right Agg reflection
Delete all the reflections on postgres.public.lineitem
create a VDS say in a space called “orders” and call the VDS “lineitem_vds” on lineitem that has the below columns in addition to the columns in lineitem
l_extendedprice * (1 - l_discount) as disc_price
l_extendedprice * (1 - l_discount) * (1 + l_tax) as charge
run the below SQL or choose the below dimension and measures on lineitem_vds and create an agg reflection
alter dataset orders.lineitem_vds CREATE AGGREGATE REFLECTION lineitem_vds_agg1
using dimensions (l_returnflag,l_linestatus,l_shipdate) measures (l_quantity(sum, avg),
l_extendedprice(sum, avg), l_discount(avg), disc_price(sum), charge(sum), l_orderkey(count))
Once the Agg reflection is created, run the below SQL
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Thank you for the detailed explanation. Some of it I have already tried, e.g. dimensions and measures in on agg reflection.
I created everything as described but my query still doesnt get accelerated. Here is the job profile. 08fc3397-7b8d-4915-a94a-ed10c60e21e4.zip (14,2 KB)
Any idea why?
Just for my understanding. I thought reflections get used by Dremios optimizer even if they only partly satisfy the query. Is that right?
@vincent_mayer Can you provide a verbose profile for the last one you sent above? This will include a lot more logging in the acceleration tab which will explain why your reflection isn’t matching.
That’s right. Reflections can match into parts of the user query.
I looked at your verbose profile. The VDS orders.lineitem_vds and the agg reflection lineitem_vds_agg1 look fine to me. The reflection should match into your query.
I see you are on version 20 which is from Dec 2021. Many reflection matching improvements have been made since. Is it possible to upgrade to version 24?
If you don’t want to upgrade, you can just take your original query, save it as a VDS and enable a raw reflection on it. Running the original query should match with the reflection even if you don’t explicitly reference the VDS in the query.
Atm I can not upgrade to a new community version because I dont have access to a NAS System, HDFS or Data lake storage to store the data reflections. As far as I know you can not use the pdf storage for your data reflections as of community version 21.0.0. Please correct me if I am wrong.
I ended up doing what you have described. I created a raw reflection on my query to accelerate it.
As for PDFS, reflections in Iceberg and unlimited splits do not support this. There’s so many performance benefits with these features which is why we no longer support PDFS.