Available raw reflection not being used

mbillson · August 16, 2017, 1:30pm

I have a virtual dataset with the following four columns which is the result of a convoluted MySQL query. The columns are:

user (varchar 65536), dt (date), type (varchar 65536), amount (bigint).

I have created a raw reflection on the dataset that just displays the four columns (and I waited the prescribed 30 seconds after reflection creation) but when I run a simple SELECT * on the dataset the reflection is not used and instead the very complicated MySQL query is executed again. Probably worth mentioning that I updated the whole Dremio install to 1.1.0 earlier on today but I had similar problems with 1.0.8 also.

kelly · August 16, 2017, 2:22pm

Can you share a query profile for this? That will help us better understand what is happening.

Instructions can be found here.

Following that guide, you can also check to see if your acceleration job completed successfully. That would be good to know as well. You can filter on the jobs for those of type “Accelerator”:

Alternately, you can search the “Contains text” and use “CREATE TABLE”

Also, how many rows are in the MySQL table, and how many are returned by your query?

mbillson · August 16, 2017, 2:42pm

Thank you for your quick reply:

The accelerator did terminate successfully after about 55 minutes (unusually long compared to some other reflections of similar size I have created). The MySQL query is a UNION ALL of several GROUP BYs from about 6 different tables: Together they have about 50 million rows but the number my query returns is about 5 million rows. The query that Dremio sends to MySQL is pretty clunky though, a 7 level deep SELECT * FROM (SELECT * FROM (x Union Y) UNION Z) etc but it returns pretty fast (~4 minutes) when I just ask for the query result: The slowdown comes when I try to make the accelerator.

I can’t really share a query profile as I don’t want to distribute to the public internet the database and table names from the underlying MySQL server it falls back to. One theory I have had is that I might have renamed the virtual dataset at some point in the process of the Accelerator creation. To guard against this I have started the accelerator regenerating itself which is about 30 minutes in so I’ll try again and let you know what happens when that is done.

kelly · August 16, 2017, 2:55pm

After that acceleration completes, you can also send us the profile via email if you’d like.

kelly at dremio dot com.

Are dremio and MySQL on the same single server? They may be competing for the same resources in creating the reflection.

mbillson · August 16, 2017, 2:59pm

Dremio is on a different server to MySQL so I don’t think that is the issue. I’ll send you the query profile after the acceleration completes but I’ll do a find-replace on the database and table names.

mbillson · August 16, 2017, 3:22pm

Had the same issue once the reflection had been regenerated…I sent you the query profile.

benoy · August 16, 2017, 4:39pm

@mbillson,

Did you do a preview or a run when you did SELECT * on the dataset ?
Please note that preview will not be accelerated using a reflection.

mbillson · August 16, 2017, 5:30pm

I did a preview first and then a run.

mbillson · August 17, 2017, 3:10pm

I’m getting similar problems with many other datasets that were previously accelerated. I don’t think any of my queries have been accelerated since upgrading to 1.1.0.

Edit: Actually some of them are being accelerated now after I manually made the reflections refresh by removing a parameter and then putting it back. Are reflections automatically carried over (correctly) when a version update is done?

In any case this still doesn’t explain the problem I was having above.

benoy · August 17, 2017, 3:44pm

@mbillson,

I can reproduce the problem that you have encountered with the UNION ALL query. I will dig more into this.

In relation to the reflections not getting refreshed after upgrade, we are looking into that as well.

mbillson · August 21, 2017, 1:54pm

Here is another query on a single virtual dataset:

select sum( m) as s from [space].[dataset] where mm = ‘Coin_Purchases’ and L>=‘2016-01-01’ and L<‘2016-01-31’ and lower( country)=‘germany’

where I have a reflection on the table which Displays m, mm and country and Displays and Sorts L. What are the rules determining when a query is accelerated or not? Because I have tailor made that reflection for this query but Dremio refuses to use it…

benoy · August 21, 2017, 5:46pm

@mbillson,

If there is a way to satisfy query (fully or partially) with the one of the existing reflections, then the query will be accelerated. Our rules are written towards this goal.

The query that you have given should have been accelerated. I created a mysql table based on the example that you have given, created a Virtual dataset on it , created a raw reflection on it as you specified. Then I ran the exact same query and it got accelerated.
Could you please share the profile for this query so that I can analyze it further ? If you prefer, you can send it to benoy at dremio dot com

Regarding your previous query with UNION, it seems queries with UNION is not properly accelerated and this will be fixed soon in the upcoming releases. Thank you so much for pointing out this issue.

mbillson · August 22, 2017, 9:35am

OK I’ve sent you the query profile. It’s probably worth mentioning that in order to overcome the problem with UNION ALL above I made this table from a MySQL table joined with a CSV file (of the big UNION) stored on HDFS, so what is going on here is a bit more complicated than just a MySQL table.

benoy · August 23, 2017, 4:17pm

Thanks @mbillson. I will go through the profile and request any further information if required to reproduce this issue.

benoy · August 29, 2017, 8:39pm

@mbillson,

As you had noted earlier, the queries with UNION is not properly accelerated. We have fixed this issue in the upcoming release 1.2. This will be available for download within a week or so.

thank you,
Benoy

dfleckinger · January 16, 2018, 6:57pm

Hi,
i’m encountering issue when there is 2 UNION ALL in the same query, for doing a Union of 3 existing Virtual Datasets, all of them having a raw acceleration.

SELECT *
FROM A
UNION ALL
SELECT *
FROM B
UNION ALL
SELECT *
FROM C
, it seems that A and B are accelerated but C is not.

When i switch the order, it seems that the 2 first are accelerated, the 3rd one is not.

can · January 16, 2018, 9:16pm

@dfleckinger our QA team just gave the scenario above a go with simple select * VDSs, but haven’t been able to reproduce the issue you ran into.

If possible, could you please share:

The query profile for both queries where you switch 2nd and 3rd VDS locations? Please enable planner.verbose_profile option from Admin > Advanced Settings > Dremio Support before running and sharing the query profiles.
The definitions for all 3 VDSs – interested in the general structure, exact filters, field names, etc, not important.
Version of Dremio.

dfleckinger · January 16, 2018, 9:54pm

@can, Thanks for your reply.

Dremio version is 1.3.1-201712020438070881-a7af5c8 community edition

First query (with one UNION ALL) is well accelerated. acceleration_ok_680dad2c-9f84-4e1e-af5e-f88b0d86b359.zip (42.6 KB)

2nd query (with 2 UNION ALL) uses 2 reflections but not the third:acceleration_ko_de4cc1bf-70b1-41db-80ed-1984d70228ac.zip (73.6 KB)

The 3 VDS have the same structure, only the filter creteria are changing between the 3 VDS
SELECT field1, coalesce(bool1,0) as bool1, coalesce(bool1,0) as bool2,
to_char(ts,‘YYYY-MM’) as “month”, ndv(vid) as “visitors”
FROM parquet_dataset pqds inner join MongoDB_collection col on pqds.id= col.id
WHERE pqds.field2 in (‘value1’,‘value2’)
GROUP BY col.field1, coalesce(bool1,0), coalesce(bool2,0),to_char(ts,‘YYYY-MM’)

can · January 16, 2018, 10:04pm

@dfleckinger looks like the 2nd query profile uses 3 reflections. I’m trying to understand which one is unexpected:

8904b5ed-7fd5-441e-be44-c37c94165b8d (raw): considered, matched, chosen.
all_fields (raw): considered, matched, chosen.
eb797b8b-4c70-451f-aa9b-3b05e7e2d524 (raw): considered, matched, chosen.

Could you confirm the ID of the other reflection that you expected to be matched and chosen?

dfleckinger · January 16, 2018, 10:25pm

@can : the all_fields (raw) is unexpected, it corresponds to the reference table used to build the reflection that was not matched.
I think the ID of the expected reflection to be matched is 2e5ff5fb-d622-48b1-91fe-fe07ebf10828

By the way what’s the best way to get the ID of a reflection when I know it’s name and the dataset ?
Thanks

Topic		Replies	Views
Reflections not used	7	755	June 6, 2023
Wrong content when using a raw reflection	4	1294	March 6, 2019
Can't setup a raw reflection on a virtual dataset out of memory	1	1149	March 25, 2019
Puzzling behavior with RAW Reflections and VDS	6	1291	August 31, 2021
Virtual datasets	1	961	February 14, 2020

Available raw reflection not being used

Related topics