I have a virtual dataset with the following four columns which is the result of a convoluted MySQL query. The columns are:
user (varchar 65536), dt (date), type (varchar 65536), amount (bigint).
I have created a raw reflection on the dataset that just displays the four columns (and I waited the prescribed 30 seconds after reflection creation) but when I run a simple SELECT * on the dataset the reflection is not used and instead the very complicated MySQL query is executed again. Probably worth mentioning that I updated the whole Dremio install to 1.1.0 earlier on today but I had similar problems with 1.0.8 also.
Can you share a query profile for this? That will help us better understand what is happening.
Instructions can be found here.
Following that guide, you can also check to see if your acceleration job completed successfully. That would be good to know as well. You can filter on the jobs for those of type “Accelerator”:
Alternately, you can search the “Contains text” and use “CREATE TABLE”
Also, how many rows are in the MySQL table, and how many are returned by your query?
Thank you for your quick reply:
The accelerator did terminate successfully after about 55 minutes (unusually long compared to some other reflections of similar size I have created). The MySQL query is a UNION ALL of several GROUP BYs from about 6 different tables: Together they have about 50 million rows but the number my query returns is about 5 million rows. The query that Dremio sends to MySQL is pretty clunky though, a 7 level deep SELECT * FROM (SELECT * FROM (x Union Y) UNION Z) etc but it returns pretty fast (~4 minutes) when I just ask for the query result: The slowdown comes when I try to make the accelerator.
I can’t really share a query profile as I don’t want to distribute to the public internet the database and table names from the underlying MySQL server it falls back to. One theory I have had is that I might have renamed the virtual dataset at some point in the process of the Accelerator creation. To guard against this I have started the accelerator regenerating itself which is about 30 minutes in so I’ll try again and let you know what happens when that is done.
After that acceleration completes, you can also send us the profile via email if you’d like.
kelly at dremio dot com.
Are dremio and MySQL on the same single server? They may be competing for the same resources in creating the reflection.
Dremio is on a different server to MySQL so I don’t think that is the issue. I’ll send you the query profile after the acceleration completes but I’ll do a find-replace on the database and table names.
Had the same issue once the reflection had been regenerated…I sent you the query profile.
Did you do a preview or a run when you did SELECT * on the dataset ?
Please note that preview will not be accelerated using a reflection.
I did a preview first and then a run.
I’m getting similar problems with many other datasets that were previously accelerated. I don’t think any of my queries have been accelerated since upgrading to 1.1.0.
Edit: Actually some of them are being accelerated now after I manually made the reflections refresh by removing a parameter and then putting it back. Are reflections automatically carried over (correctly) when a version update is done?
In any case this still doesn’t explain the problem I was having above.
I can reproduce the problem that you have encountered with the UNION ALL query. I will dig more into this.
In relation to the reflections not getting refreshed after upgrade, we are looking into that as well.
Here is another query on a single virtual dataset:
select sum( m) as s from [space].[dataset] where mm = ‘Coin_Purchases’ and L>=‘2016-01-01’ and L<‘2016-01-31’ and lower( country)=‘germany’
where I have a reflection on the table which Displays m, mm and country and Displays and Sorts L. What are the rules determining when a query is accelerated or not? Because I have tailor made that reflection for this query but Dremio refuses to use it…
If there is a way to satisfy query (fully or partially) with the one of the existing reflections, then the query will be accelerated. Our rules are written towards this goal.
The query that you have given should have been accelerated. I created a mysql table based on the example that you have given, created a Virtual dataset on it , created a raw reflection on it as you specified. Then I ran the exact same query and it got accelerated.
Could you please share the profile for this query so that I can analyze it further ? If you prefer, you can send it to benoy at dremio dot com
Regarding your previous query with UNION, it seems queries with UNION is not properly accelerated and this will be fixed soon in the upcoming releases. Thank you so much for pointing out this issue.
OK I’ve sent you the query profile. It’s probably worth mentioning that in order to overcome the problem with UNION ALL above I made this table from a MySQL table joined with a CSV file (of the big UNION) stored on HDFS, so what is going on here is a bit more complicated than just a MySQL table.
Thanks @mbillson. I will go through the profile and request any further information if required to reproduce this issue.
As you had noted earlier, the queries with UNION is not properly accelerated. We have fixed this issue in the upcoming release 1.2. This will be available for download within a week or so.
i’m encountering issue when there is 2 UNION ALL in the same query, for doing a Union of 3 existing Virtual Datasets, all of them having a raw acceleration.
, it seems that A and B are accelerated but C is not.
When i switch the order, it seems that the 2 first are accelerated, the 3rd one is not.
@dfleckinger our QA team just gave the scenario above a go with simple
select * VDSs, but haven’t been able to reproduce the issue you ran into.
If possible, could you please share:
- The query profile for both queries where you switch 2nd and 3rd VDS locations? Please enable
planner.verbose_profile option from Admin > Advanced Settings > Dremio Support before running and sharing the query profiles.
- The definitions for all 3 VDSs – interested in the general structure, exact filters, field names, etc, not important.
- Version of Dremio.
@can, Thanks for your reply.
Dremio version is 1.3.1-201712020438070881-a7af5c8 community edition
First query (with one UNION ALL) is well accelerated. acceleration_ok_680dad2c-9f84-4e1e-af5e-f88b0d86b359.zip (42.6 KB)
2nd query (with 2 UNION ALL) uses 2 reflections but not the third:acceleration_ko_de4cc1bf-70b1-41db-80ed-1984d70228ac.zip (73.6 KB)
The 3 VDS have the same structure, only the filter creteria are changing between the 3 VDS
SELECT field1, coalesce(bool1,0) as bool1, coalesce(bool1,0) as bool2,
to_char(ts,‘YYYY-MM’) as “month”, ndv(vid) as “visitors”
FROM parquet_dataset pqds inner join MongoDB_collection col on pqds.id= col.id
WHERE pqds.field2 in (‘value1’,‘value2’)
GROUP BY col.field1, coalesce(bool1,0), coalesce(bool2,0),to_char(ts,‘YYYY-MM’)
@dfleckinger looks like the 2nd query profile uses 3 reflections. I’m trying to understand which one is unexpected:
- 8904b5ed-7fd5-441e-be44-c37c94165b8d (raw): considered, matched, chosen.
- all_fields (raw): considered, matched, chosen.
- eb797b8b-4c70-451f-aa9b-3b05e7e2d524 (raw): considered, matched, chosen.
Could you confirm the ID of the other reflection that you expected to be matched and chosen?
@can : the all_fields (raw) is unexpected, it corresponds to the reference table used to build the reflection that was not matched.
I think the ID of the expected reflection to be matched is 2e5ff5fb-d622-48b1-91fe-fe07ebf10828
By the way what’s the best way to get the ID of a reflection when I know it’s name and the dataset ?