Accelerator/Reflection giving incorrect result

When I run the following SQL query on a table I get a column “measure” with integer values and a column “ratio” with float results showing percentage of guids who appear on a row with measure value k as a percentage of those with measure value 0.

SELECT measure, count(distinct guid)/(first)*100.0 as ratio
FROM u_retention AS l
LEFT JOIN
(select count(distinct guid) as first from guid_table where measure = 0) r
ON TRUE
GROUP BY measure, first ORDER BY measure

So far so good, but when I enable a Raw Reflection which merely has display ticked for every column in the table (though the same problem resulted with more complex Reflection) I get 100 in the first ratio row as expected but then 0 in every subsequent row. When I select count(distinct guid) and “first” as separate columns the results are correct for both these columns but the division seems to not be performed properly.

I tried putting the count distinct in a subquery but this didn’t help.

Hi,

Could you share the Virtual Dataset definition (i.e. the SQL) for the virtual dataset that you added a raw reflection on, please.

thanks
Ron

It was just a “SELECT * FROM MySQL.guid_table” where the table was stored (as you can guess) on a MySQL server.

Thanks, knowing which table you accelerated helps try to dig into it.

Are you running Dremio on a single node? If not, how many coordinators and executors? Also, are you using PDFS or HDFS for your distributed storage setting (dist path)?

Another sanity check: Are you clicking “Run” or “Preview” execute the query? The results of Preview are based on a sample, so they may be incorrect.

I am running it on a single Ubuntu desktop node and everything is set to
default settings (i.e. I literally just downloaded the .gz, unzipped it and
went ./bin/dremio start). I am using the web UI on a Windows 10 machine on
the same network.

I am pressing run rather than preview. Interestingly when I press preview I get something which looks like it might be correct on the partial data (maybe it doesn’t accelerate the preview query).

Hi

We are trying to reproduce this on our side.
Could you please get the profile and upload it here ?

You can do it by going to the Jobs tab, select the job which was accelerated using reflection , click download profile .

Screenshot attached.

I’m afraid it will have to wait until next week to do that and I will see
if it is possible then. The table in MySQL had the following columns
(variable names changed) and types, as far as I can remember (I’ll correct
it if any of it is wrong on Monday.)

guid (VARCHAR(40)) dtime (DATETIME) mtime (DATETIME) measure (INT(11)) o
(VARCHAR(40)) m (VARCHAR(40)) b (INT(11)) c (VARCHAR(40)) g (FLOAT) b_gp
(VARCHAR(40)) g_g (VARCHAR(40))

The actual query had a WHERE clause on the m column which when applied
meant that the o column was always null. It is actually a fairly
denormalised table with a lot of stuff in it not relevant to this query,
hence the use of the m column to get the data I actually care about. I’ll
test the query again as I entered it above on Monday.

Many thanks.

I would assume this is related to BigInt/BigInt getting truncated during division. Can you try doing 100.0*count(distinct guid)/(first) instead?

I tried two similar queries (attaching the screenshots here).

SELECT city, 100.0* count(distinct review_count)/(first) as ratio --> returned me non-zero fractional values.
SELECT city, count(distinct review_count)/(first) * 100.0 as ratio --> returned me zeros most of the time.

Yeah doing 100.0*count(distinct guid)/(first) gave me the correct result. Thanks for the help!