When I run the following SQL query on a table I get a column “measure” with integer values and a column “ratio” with float results showing percentage of guids who appear on a row with measure value k as a percentage of those with measure value 0.
SELECT measure, count(distinct guid)/(first)*100.0 as ratio
FROM u_retention AS l
LEFT JOIN
(select count(distinct guid) as first from guid_table where measure = 0) r
ON TRUE
GROUP BY measure, first ORDER BY measure
So far so good, but when I enable a Raw Reflection which merely has display ticked for every column in the table (though the same problem resulted with more complex Reflection) I get 100 in the first ratio row as expected but then 0 in every subsequent row. When I select count(distinct guid) and “first” as separate columns the results are correct for both these columns but the division seems to not be performed properly.
I tried putting the count distinct in a subquery but this didn’t help.
Thanks, knowing which table you accelerated helps try to dig into it.
Are you running Dremio on a single node? If not, how many coordinators and executors? Also, are you using PDFS or HDFS for your distributed storage setting (dist path)?
I am running it on a single Ubuntu desktop node and everything is set to
default settings (i.e. I literally just downloaded the .gz, unzipped it and
went ./bin/dremio start). I am using the web UI on a Windows 10 machine on
the same network.
I am pressing run rather than preview. Interestingly when I press preview I get something which looks like it might be correct on the partial data (maybe it doesn’t accelerate the preview query).
I’m afraid it will have to wait until next week to do that and I will see
if it is possible then. The table in MySQL had the following columns
(variable names changed) and types, as far as I can remember (I’ll correct
it if any of it is wrong on Monday.)
guid (VARCHAR(40)) dtime (DATETIME) mtime (DATETIME) measure (INT(11)) o
(VARCHAR(40)) m (VARCHAR(40)) b (INT(11)) c (VARCHAR(40)) g (FLOAT) b_gp
(VARCHAR(40)) g_g (VARCHAR(40))
The actual query had a WHERE clause on the m column which when applied
meant that the o column was always null. It is actually a fairly
denormalised table with a lot of stuff in it not relevant to this query,
hence the use of the m column to get the data I actually care about. I’ll
test the query again as I entered it above on Monday.
I would assume this is related to BigInt/BigInt getting truncated during division. Can you try doing 100.0*count(distinct guid)/(first) instead?
I tried two similar queries (attaching the screenshots here).
SELECT city, 100.0* count(distinct review_count)/(first) as ratio --> returned me non-zero fractional values.
SELECT city, count(distinct review_count)/(first) * 100.0 as ratio --> returned me zeros most of the time.