JOIN query cannot be accelerated by existing reflections

#1

I’m using latest 3.0.0 version and want to accelerate a join query by existing reflections.

I first prepared two reflections for two single tables.
select * from A where date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
select * from B where date=TO_DATE(‘20181109’, ‘YYYYMMDD’)

Then I tried 3 kinds of join queries but none can be accelerated by two reflections.
select *
from A a
join B b
on a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
and a.date=b.date

select *
from A a
join B b
on a.date=b.date
and a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)

select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)

Could you advice any join query sample can be accelerated?

1 Like
#2

Hey @pisces312 have you tried working with a virtual dataset for this? Here is a suggestion:

Create a virtual dataset with the join ( A join B ) and then create a raw reflection on the VDS. You could either include the date filter in the VDS definition or not include it and use it as partition/sort field to accommodate all date options. Remember that you can most likely still keep your existing end-user queries (the 3 that you shared) as is – no need to query the VDS to take advantage of the reflection. Dremio should, in most cases, leverage the reflection on the VDS when is sees A join B type queries that is covered by that reflection.

#3

Thanks @can.

Actually I have tried several ways to create reflection.

Way 1:
select *
from A a
join B b
on a.date=b.date
and a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)

But I noticed the physical plan of reflection creation shows two SQLs without where clause.
select * from A
select * from B
I expect it will have a where date=TO_DATE(‘20181109’, ‘YYYYMMDD’)

The full table scan will really take quite a long time.

Way 2:
select *
from A a
join B b
on a.date=b.date
The reflection physical plan will be the join query itself plus all common fields as join filter.

Way 3:
select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
It got similar physical plan as way 2.

#4

Just to make sure we’re talking about the same exact steps:

  1. Create VDS with date filter:
select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)
  1. Create raw reflection with ALL columns included on top of this VDS.
  2. Once reflection is built and available, test any of your initial 3 queries (with date filter)

Alternatively:

  1. Create VDS without date filter:
select *
from A a
join B b
on a.date=b.date
  1. Create raw reflection with all columns AND date column as partition.
  2. Once reflection is built and available, test any of your initial 3 queries (with any date filter)

Could you confirm? It would be great if you could share a query profile for a few queries where this is problematic.

#5

Yes, we are on the same page. I create VDS and then reflection by REST API.
The problem is that reflection creation will take much time and cannot take advantage of single table’s reflection.

I tried the SQL you provided.

select *
from A a
join B b
on a.date=b.date
where a.date=TO_DATE(‘20181109’, ‘YYYYMMDD’)

The physical plan of #1 is like below.
Jdbc(sql=[SELECT
FROM A
INNER JOIN B ON A.DATE=B.DATE
WHERE A.DATE = CAST(DATE ‘2018-11-09’ AS DATE)])

#6

@pisces312 was this resolved for you? If yes, what was the solution?

My queries are not getting accelerated when there is a join involved.
Attaching a query profile incase someone still is looking at this thread.

84a9d3a3-47db-4942-be9c-c2ea745b693b.zip (53.8 KB)

#7

@joejk

See below, it looks like the reflections did match but did not get picked. It could be either a costing issue or an exception while substitution. Can you please send us the server.log when the REFRESH REFLECTION ran?

  • answerlog (core)
    • Raw Reflection (raw): considered, matched, not chosen.
  • answers (core)
    • Raw Reflection (1) (raw): considered, not matched.
    • Aggregation Reflection (agg): considered, not matched.
  • signups (core)
    • Raw Reflection (raw): considered, not matched.
  • answer_raw (core)
    • Raw Reflection (raw): considered, matched, not chosen.
  • users_userprofile (HaygotHourly.haygot)
    • Raw Reflection (raw): considered, matched, not chosen.

Thanks
@balaji.ramaswamy

#8

Thanks for the response @balaji.ramaswamy.

I have moved on to trying some other workarounds. Here are the serverlogs if someone wants to dig deeper. server.log.zip (820.8 KB)