Dremio not using refelections when dataset based on query, uses reflection when query is stored as view in DB

Using latest version of Dremio community edition 1.3.1-201712020438070881-a7af5c8

Created a dataset based on following query (MYSQL source)

select
salesorderheader.SalesOrderID
, SalesOrderDetailID
,RevisionNumber
,OrderDate
,DueDate
,ShipDate
,Status
,OnlineOrderFlag
,SalesOrderNumber
,PurchaseOrderNumber
,AccountNumber
,CustomerID
,ContactID
,SalesPersonID
,salesorderheader.TerritoryID
,billaddress.addressline1 bill_addressline1
,billaddress.addressline2 bill_addressline2
,billaddress.city bill_city
,billstate.stateprovincecode bill_state
,billstate.countryregioncode bill_countrycode
,ShipToAddressID
,shipaddress.addressline1 ship_addressline1
,shipaddress.addressline2 ship_addressline2
,shipaddress.city ship_city
,shipstate.stateprovincecode ship_state
,shipstate.countryregioncode ship_countrycode
,shipmethod.name as shipmethod
,CreditCardID
,CreditCardApprovalCode
, Comment
, CarrierTrackingNumber
, OrderQty
, ProductID
, SpecialOfferID
, UnitPrice
, UnitPriceDiscount
, LineTotal
from mysql_ps2.adventureworks.salesorderheader
inner join mysql_ps2.adventureworks.salesorderdetail on salesorderheader.salesorderid = salesorderdetail.salesorderid
left join mysql_ps2.adventureworks.address as billaddress on salesorderheader.billtoaddressid = billaddress.addressid
left join mysql_ps2.adventureworks.address as shipaddress on salesorderheader.shiptoaddressid = shipaddress.addressid
left join mysql_ps2.adventureworks.stateprovince as billstate on billaddress.stateprovinceid = billstate.stateprovinceid
left join mysql_ps2.adventureworks.stateprovince as shipstate on shipaddress.stateprovinceid = shipstate.stateprovinceid
left join mysql_ps2.adventureworks.shipmethod on salesorderheader.ShipMethodID = shipmethod.ShipMethodID

Defined raw and aggregate reflections on the data set

Ran following query on dataset (reflection not used)

select territoryid, sum(“OrderQty”),
sum(“UnitPrice”), sum(“UnitPriceDiscount”), sum(“LineTotal”)
from “dw_adworks”.“f_sales” group by territoryid

image

Now converted the query for dataset as view in MYSQL, created exact same reflections on the new data set based on view in MYSQL, ran the same query again and Dremio used reflection

Any explanation for this behavior?

The behavior is consistent with other queries as well
— when dataset is based on query

– when dataset is based on VW based on same query

Profiles for both cases
Profile ref not used DS based on Query 64fecde4-fa6a-4772-9b79-1a59325c1fd1.zip (5.0 KB)
Profile ref used DS based on MYSQL View 9cc66af0-2fe4-4118-804e-e2786427e055.zip (6.1 KB)

Hi @pradeep,

When you create a new reflection you need to wait approx ~1minute (the exact time however ultimately depends on your data size, always verify that the reflection is ready for use before you query against it) before you can start using the new reflection:

  1. Reflection is being created:
    01

  2. Reflection is rdy for use:
    02

In both cases have you waited until the reflection is ready for use, or did you quickly run the 1st query and always ran the 2nd query on the respective view later?

Thanks,
Danny

Can’t tell from your screen shots whether you have territoryid defined as a dimension for your aggregation reflection? Anything you want to group by should be defined as a dimension.

Profile ref not use DS based on query simple select 1238551c-c1e3-4a76-ab92-46cbac158ff6.zip (10.9 KB)
Tried again just now. More than 24 hours after creating the reflection, still not using the reflection in case dataset is based on query

Regarding note from Kelly, I do have territoryid set as dimension in aggregate reflection

I also have raw reflection

Even select * from dataset based on query does not use raw reflection

image

Profile ref not use DS based on query simple select 1238551c-c1e3-4a76-ab92-46cbac158ff6.zip (10.9 KB)

Hi @pradeep,

In this case we would need to see a query profile to determine why the defined reflection was not used, it is possible at the planning phase that it had been determined it “costs” more to use the defined reflection and the query executes faster without using it.

Could you attach such a query profile:

So that we could determine why your reflection was not used.

Thanks,
Danny

On your aggregation reflection you have territoryid as a dimension but it looks like you don’t have any measures, but maybe they simply don’t fit in the same screen. For this to work a given reflection should have the necessary dimensions AND measures in place (in other words, they can’t span reflections).

I’m guessing you have this right and the underlying issue is something else. However, the query profile image you shared shows that “Unnamed Reflection” did not cover the query, which is what would happen if you had territoryid as a dimension and did not have OrderQty, UnitPrice, UnitPriceDiscount, and LineTotal configured as measures in the same reflection.

Measures are there. When I created the dataset based on the view (based on same query) the reflections are used

I also created raw reflections, which should be used on select * from dataset query, but its not used when dataset is based on query and used when dataset is based on view

Upgraded to 1.4.4-201801230630490666-6d69d32, but the issue is still there

To reproduce you can download adventureworks for MySQL from here
https://sourceforge.net/projects/awmysql/files/?source=navbar

— here is the query I used to create dataset in dremio
select
salesorderheader.SalesOrderID
, SalesOrderDetailID
,RevisionNumber
,OrderDate
,DueDate
,ShipDate
,Status
,OnlineOrderFlag
,SalesOrderNumber
,PurchaseOrderNumber
,AccountNumber
,CustomerID
,ContactID
,SalesPersonID
,salesorderheader.TerritoryID
,billaddress.addressline1 bill_addressline1
,billaddress.addressline2 bill_addressline2
,billaddress.city bill_city
,billstate.stateprovincecode bill_state
,billstate.countryregioncode bill_countrycode
,ShipToAddressID
,shipaddress.addressline1 ship_addressline1
,shipaddress.addressline2 ship_addressline2
,shipaddress.city ship_city
,shipstate.stateprovincecode ship_state
,shipstate.countryregioncode ship_countrycode
,shipmethod.name as shipmethod
,CreditCardID
,CreditCardApprovalCode
, Comment
, CarrierTrackingNumber
, OrderQty
, ProductID
, SpecialOfferID
, UnitPrice
, UnitPriceDiscount
, LineTotal
from mysql.adventureworks.salesorderheader
inner join mysql.adventureworks.salesorderdetail on salesorderheader.salesorderid = salesorderdetail.salesorderid
left join mysql.adventureworks.address as billaddress on salesorderheader.billtoaddressid = billaddress.addressid
left join mysql.adventureworks.address as shipaddress on salesorderheader.shiptoaddressid = shipaddress.addressid
left join mysql.adventureworks.stateprovince as billstate on billaddress.stateprovinceid = billstate.stateprovinceid
left join mysql.adventureworks.stateprovince as shipstate on shipaddress.stateprovinceid = shipstate.stateprovinceid
left join mysql.adventureworks.shipmethod on salesorderheader.ShipMethodID = shipmethod.ShipMethodID

– here is query used to create view in MYSQL (reflections are working when dataset is based on view)

create or replace view vw_f_sales as
select
salesorderheader.SalesOrderID
, SalesOrderDetailID
,RevisionNumber
,OrderDate
,DueDate
,ShipDate
,Status
,OnlineOrderFlag
,SalesOrderNumber
,PurchaseOrderNumber
,AccountNumber
,CustomerID
,ContactID
,SalesPersonID
,salesorderheader.TerritoryID
,billaddress.addressline1 bill_addressline1
,billaddress.addressline2 bill_addressline2
,billaddress.city bill_city
,billstate.stateprovincecode bill_state
,billstate.countryregioncode bill_countrycode
,ShipToAddressID
,shipaddress.addressline1 ship_addressline1
,shipaddress.addressline2 ship_addressline2
,shipaddress.city ship_city
,shipstate.stateprovincecode ship_state
,shipstate.countryregioncode ship_countrycode
,shipmethod.name as shipmethod
,CreditCardID
,CreditCardApprovalCode
, Comment
, CarrierTrackingNumber
, OrderQty
, ProductID
, SpecialOfferID
, UnitPrice
, UnitPriceDiscount
, LineTotal
from salesorderheader
inner join salesorderdetail on salesorderheader.salesorderid = salesorderdetail.salesorderid
left join address as billaddress on salesorderheader.billtoaddressid = billaddress.addressid
left join address as shipaddress on salesorderheader.shiptoaddressid = shipaddress.addressid
left join stateprovince as billstate on billaddress.stateprovinceid = billstate.stateprovinceid
left join stateprovince as shipstate on shipaddress.stateprovinceid = shipstate.stateprovinceid
left join shipmethod on salesorderheader.ShipMethodID = shipmethod.ShipMethodID

@pradeep do you have any update on this, did you have any luck investigating and resolving this issue? I am having the same problem from my end…

Hi, I am facing the same issue.
I built a raw reflection on all columns of VDS “a”.
Afterwards I do a “select * from a” but this does not get accelerated by the reflection.

@dorianb Can you share a query job profile please?

I’m having the same issue @dorianb is having

@kprifogle

Kindly send us the profile of the query that should have used reflections but did not

Thanks
@balaji.ramaswamy