Available raw reflection not being used

Dremio is on a different server to MySQL so I don’t think that is the issue. I’ll send you the query profile after the acceleration completes but I’ll do a find-replace on the database and table names.

Had the same issue once the reflection had been regenerated…I sent you the query profile.

@mbillson,

Did you do a preview or a run when you did SELECT * on the dataset ?
Please note that preview will not be accelerated using a reflection.

I did a preview first and then a run.

I’m getting similar problems with many other datasets that were previously accelerated. I don’t think any of my queries have been accelerated since upgrading to 1.1.0.

Edit: Actually some of them are being accelerated now after I manually made the reflections refresh by removing a parameter and then putting it back. Are reflections automatically carried over (correctly) when a version update is done?

In any case this still doesn’t explain the problem I was having above.

@mbillson,

I can reproduce the problem that you have encountered with the UNION ALL query. I will dig more into this.

In relation to the reflections not getting refreshed after upgrade, we are looking into that as well.

Here is another query on a single virtual dataset:

select sum( m) as s from [space].[dataset] where mm = ‘Coin_Purchases’ and L>=‘2016-01-01’ and L<‘2016-01-31’ and lower( country)=‘germany’

where I have a reflection on the table which Displays m, mm and country and Displays and Sorts L. What are the rules determining when a query is accelerated or not? Because I have tailor made that reflection for this query but Dremio refuses to use it…

@mbillson,

If there is a way to satisfy query (fully or partially) with the one of the existing reflections, then the query will be accelerated. Our rules are written towards this goal.

The query that you have given should have been accelerated. I created a mysql table based on the example that you have given, created a Virtual dataset on it , created a raw reflection on it as you specified. Then I ran the exact same query and it got accelerated.
Could you please share the profile for this query so that I can analyze it further ? If you prefer, you can send it to benoy at dremio dot com

Regarding your previous query with UNION, it seems queries with UNION is not properly accelerated and this will be fixed soon in the upcoming releases. Thank you so much for pointing out this issue.

OK I’ve sent you the query profile. It’s probably worth mentioning that in order to overcome the problem with UNION ALL above I made this table from a MySQL table joined with a CSV file (of the big UNION) stored on HDFS, so what is going on here is a bit more complicated than just a MySQL table.

Thanks @mbillson. I will go through the profile and request any further information if required to reproduce this issue.

@mbillson,

As you had noted earlier, the queries with UNION is not properly accelerated. We have fixed this issue in the upcoming release 1.2. This will be available for download within a week or so.

thank you,
Benoy

Hi,
i’m encountering issue when there is 2 UNION ALL in the same query, for doing a Union of 3 existing Virtual Datasets, all of them having a raw acceleration.

SELECT *
FROM A
UNION ALL
SELECT *
FROM B
UNION ALL
SELECT *
FROM C
, it seems that A and B are accelerated but C is not.

When i switch the order, it seems that the 2 first are accelerated, the 3rd one is not.

@dfleckinger our QA team just gave the scenario above a go with simple select * VDSs, but haven’t been able to reproduce the issue you ran into.

If possible, could you please share:

  • The query profile for both queries where you switch 2nd and 3rd VDS locations? Please enable planner.verbose_profile option from Admin > Advanced Settings > Dremio Support before running and sharing the query profiles.
  • The definitions for all 3 VDSs – interested in the general structure, exact filters, field names, etc, not important.
  • Version of Dremio.

@can, Thanks for your reply.

Dremio version is 1.3.1-201712020438070881-a7af5c8 community edition

First query (with one UNION ALL) is well accelerated. acceleration_ok_680dad2c-9f84-4e1e-af5e-f88b0d86b359.zip (42.6 KB)

2nd query (with 2 UNION ALL) uses 2 reflections but not the third:acceleration_ko_de4cc1bf-70b1-41db-80ed-1984d70228ac.zip (73.6 KB)

The 3 VDS have the same structure, only the filter creteria are changing between the 3 VDS
SELECT field1, coalesce(bool1,0) as bool1, coalesce(bool1,0) as bool2,
to_char(ts,‘YYYY-MM’) as “month”, ndv(vid) as “visitors”
FROM parquet_dataset pqds inner join MongoDB_collection col on pqds.id= col.id
WHERE pqds.field2 in (‘value1’,‘value2’)
GROUP BY col.field1, coalesce(bool1,0), coalesce(bool2,0),to_char(ts,‘YYYY-MM’)

@dfleckinger looks like the 2nd query profile uses 3 reflections. I’m trying to understand which one is unexpected:

  • 8904b5ed-7fd5-441e-be44-c37c94165b8d (raw): considered, matched, chosen.
  • all_fields (raw): considered, matched, chosen.
  • eb797b8b-4c70-451f-aa9b-3b05e7e2d524 (raw): considered, matched, chosen.

Could you confirm the ID of the other reflection that you expected to be matched and chosen?

@can : the all_fields (raw) is unexpected, it corresponds to the reference table used to build the reflection that was not matched.
I think the ID of the expected reflection to be matched is 2e5ff5fb-d622-48b1-91fe-fe07ebf10828

By the way what’s the best way to get the ID of a reflection when I know it’s name and the dataset ?
Thanks

Got it. Looking at the profile, not sure what’s causing the other reflection not to get matched, we’ll need to investigate further.

We’re working on a few things to make navigating between datasets, reflections and materializations easy. For now, you can use this query (which uses old acceleration vocabulary: layout = reflection):

select * from sys.layouts a join sys.accelerations l on a."acceleration_id" = l."acceleration_id"
1 Like

Thanks a lot for this query that enables me to have metadata information on reflections !

I also have a straight forward UNION ALL query between two datasets which both have Raw reflections and the Raw reflection building for the union query does not use the reflections of the two datasets that were being unioned.

This seems challenging because we have to union many different data sets together and if the reflections are not used then our ability to query with dremio is sufficiently hampered.

@kprifogle

Would you be able to send us a profile?

Thanks
@balaji.ramaswamy