I have a virtual dataset with the following four columns which is the result of a convoluted MySQL query. The columns are:
user (varchar 65536), dt (date), type (varchar 65536), amount (bigint).
I have created a raw reflection on the dataset that just displays the four columns (and I waited the prescribed 30 seconds after reflection creation) but when I run a simple SELECT * on the dataset the reflection is not used and instead the very complicated MySQL query is executed again. Probably worth mentioning that I updated the whole Dremio install to 1.1.0 earlier on today but I had similar problems with 1.0.8 also.
Following that guide, you can also check to see if your acceleration job completed successfully. That would be good to know as well. You can filter on the jobs for those of type âAcceleratorâ:
The accelerator did terminate successfully after about 55 minutes (unusually long compared to some other reflections of similar size I have created). The MySQL query is a UNION ALL of several GROUP BYs from about 6 different tables: Together they have about 50 million rows but the number my query returns is about 5 million rows. The query that Dremio sends to MySQL is pretty clunky though, a 7 level deep SELECT * FROM (SELECT * FROM (x Union Y) UNION Z) etc but it returns pretty fast (~4 minutes) when I just ask for the query result: The slowdown comes when I try to make the accelerator.
I canât really share a query profile as I donât want to distribute to the public internet the database and table names from the underlying MySQL server it falls back to. One theory I have had is that I might have renamed the virtual dataset at some point in the process of the Accelerator creation. To guard against this I have started the accelerator regenerating itself which is about 30 minutes in so Iâll try again and let you know what happens when that is done.
Dremio is on a different server to MySQL so I donât think that is the issue. Iâll send you the query profile after the acceleration completes but Iâll do a find-replace on the database and table names.
Iâm getting similar problems with many other datasets that were previously accelerated. I donât think any of my queries have been accelerated since upgrading to 1.1.0.
Edit: Actually some of them are being accelerated now after I manually made the reflections refresh by removing a parameter and then putting it back. Are reflections automatically carried over (correctly) when a version update is done?
In any case this still doesnât explain the problem I was having above.
Here is another query on a single virtual dataset:
select sum( m) as s from [space].[dataset] where mm = âCoin_Purchasesâ and L>=â2016-01-01â and L<â2016-01-31â and lower( country)=âgermanyâ
where I have a reflection on the table which Displays m, mm and country and Displays and Sorts L. What are the rules determining when a query is accelerated or not? Because I have tailor made that reflection for this query but Dremio refuses to use itâŚ
If there is a way to satisfy query (fully or partially) with the one of the existing reflections, then the query will be accelerated. Our rules are written towards this goal.
The query that you have given should have been accelerated. I created a mysql table based on the example that you have given, created a Virtual dataset on it , created a raw reflection on it as you specified. Then I ran the exact same query and it got accelerated.
Could you please share the profile for this query so that I can analyze it further ? If you prefer, you can send it to benoy at dremio dot com
Regarding your previous query with UNION, it seems queries with UNION is not properly accelerated and this will be fixed soon in the upcoming releases. Thank you so much for pointing out this issue.
OK Iâve sent you the query profile. Itâs probably worth mentioning that in order to overcome the problem with UNION ALL above I made this table from a MySQL table joined with a CSV file (of the big UNION) stored on HDFS, so what is going on here is a bit more complicated than just a MySQL table.
As you had noted earlier, the queries with UNION is not properly accelerated. We have fixed this issue in the upcoming release 1.2. This will be available for download within a week or so.
Hi,
iâm encountering issue when there is 2 UNION ALL in the same query, for doing a Union of 3 existing Virtual Datasets, all of them having a raw acceleration.
SELECT *
FROM A
UNION ALL
SELECT *
FROM B
UNION ALL
SELECT *
FROM C
, it seems that A and B are accelerated but C is not.
When i switch the order, it seems that the 2 first are accelerated, the 3rd one is not.
@dfleckinger our QA team just gave the scenario above a go with simple select * VDSs, but havenât been able to reproduce the issue you ran into.
If possible, could you please share:
The query profile for both queries where you switch 2nd and 3rd VDS locations? Please enable planner.verbose_profile option from Admin > Advanced Settings > Dremio Support before running and sharing the query profiles.
The definitions for all 3 VDSs â interested in the general structure, exact filters, field names, etc, not important.
The 3 VDS have the same structure, only the filter creteria are changing between the 3 VDS
SELECT field1, coalesce(bool1,0) as bool1, coalesce(bool1,0) as bool2,
to_char(ts,âYYYY-MMâ) as âmonthâ, ndv(vid) as âvisitorsâ
FROM parquet_dataset pqds inner join MongoDB_collection col on pqds.id= col.id
WHERE pqds.field2 in (âvalue1â,âvalue2â)
GROUP BY col.field1, coalesce(bool1,0), coalesce(bool2,0),to_char(ts,âYYYY-MMâ)
@can : the all_fields (raw) is unexpected, it corresponds to the reference table used to build the reflection that was not matched.
I think the ID of the expected reflection to be matched is 2e5ff5fb-d622-48b1-91fe-fe07ebf10828
By the way whatâs the best way to get the ID of a reflection when I know itâs name and the dataset ?
Thanks