I’m using latest 3.0.0 version and want to accelerate a join query by existing reflections.
I first prepared two reflections for two single tables.
select * from A where date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
select * from B where date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
Then I tried 3 kinds of join queries but none can be accelerated by two reflections.
select *
from A a
join B b
on a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
and a.date=b.date
select *
from A a
join B b
on a.date=b.date
and a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
Could you advice any join query sample can be accelerated?
Hey @pisces312 have you tried working with a virtual dataset for this? Here is a suggestion:
Create a virtual dataset with the join ( A join B ) and then create a raw reflection on the VDS. You could either include the date filter in the VDS definition or not include it and use it as partition/sort field to accommodate all date options. Remember that you can most likely still keep your existing end-user queries (the 3 that you shared) as is – no need to query the VDS to take advantage of the reflection. Dremio should, in most cases, leverage the reflection on the VDS when is sees A join B type queries that is covered by that reflection.
Actually I have tried several ways to create reflection.
Way 1:
select *
from A a
join B b
on a.date=b.date
and a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
But I noticed the physical plan of reflection creation shows two SQLs without where clause.
select * from A
select * from B
I expect it will have a where date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
The full table scan will really take quite a long time.
Way 2:
select *
from A a
join B b
on a.date=b.date
The reflection physical plan will be the join query itself plus all common fields as join filter.
Way 3:
select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
It got similar physical plan as way 2.
Yes, we are on the same page. I create VDS and then reflection by REST API.
The problem is that reflection creation will take much time and cannot take advantage of single table’s reflection.
I tried the SQL you provided.
select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
The physical plan of #1 is like below.
Jdbc(sql=[SELECT
FROM A
INNER JOIN B ON A.DATE=B.DATE
WHERE A.DATE = CAST(DATE ‘2018-11-09’ AS DATE)])
See below, it looks like the reflections did match but did not get picked. It could be either a costing issue or an exception while substitution. Can you please send us the server.log when the REFRESH REFLECTION ran?
answerlog (core)
Raw Reflection (raw): considered, matched, not chosen.
answers (core)
Raw Reflection (1) (raw): considered, not matched.
Aggregation Reflection (agg): considered, not matched.
signups (core)
Raw Reflection (raw): considered, not matched.
answer_raw (core)
Raw Reflection (raw): considered, matched, not chosen.
users_userprofile (HaygotHourly.haygot)
Raw Reflection (raw): considered, matched, not chosen.