How to accelerate count distinct queries with aggregation reflection

Switch · May 24, 2022, 2:58am

Hi,

I’m having trouble getting count distinct queries accelerated by reflections.
My SQL is simple:
SELECT count(distinct id) FROM Common.xx WHERE event_time = '2022-5-20'
But this query is only accelerated by a raw reflection, not by the aggregation reflection.

As far as I know, there are two ways to work around this:

1. Use SELECT ndv(id) instead. But ndv only returns an approximation of the distinct count.
2. Create a VDS using SELECT count(distinct xx) and create a raw reflection on it. But this may result in plenty of redundant VDSs.
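Approximate distinct counting like ndv is commonly implemented with a HyperLogLog sketch, which is why its result is an estimate rather than an exact count. The following is a minimal Python sketch of that technique, assuming ndv works along these lines; the class name, the precision p=12, and SHA-1 hashing are illustrative choices, not Dremio's actual implementation.

```python
import hashlib
import math


def _hash64(value) -> int:
    """Map any value to a well-mixed 64-bit integer."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HyperLogLog:
    """Hypothetical minimal HyperLogLog estimator (illustration only)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value) -> None:
        x = _hash64(value)
        idx = x >> (64 - self.p)             # top p bits pick a register
        w = x & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # 1-based position of the leftmost 1 bit within those 64 - p bits
        rank = (64 - self.p) - w.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            # small-range correction: fall back to linear counting
            return self.m * math.log(self.m / zeros)
        return e


ids = [i % 50_000 for i in range(200_000)]   # sample data: 50,000 distinct ids
hll = HyperLogLog(p=12)
for value in ids:
    hll.add(value)

exact = len(set(ids))
approx = hll.estimate()
print(exact, round(approx))
```

With p=12 the sketch keeps 4,096 small registers no matter how many distinct ids it sees, and its typical relative error is about 1.04/√4096 ≈ 1.6%. That fixed memory is what makes an approximate count cheap, and the small error is the trade-off the first option accepts.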

So, is there any better solution for count distinct?

Thank you

bennychow · May 24, 2022, 4:03pm

You can create an agg reflection with dimensions defined on the id and event_time fields. Hopefully, the number of unique combinations of id and event_time will be significantly smaller than the number of rows in the underlying table. If you always filter on event_time, you could also partition the reflection on this field.
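The idea behind this answer can be sketched with Python's built-in sqlite3 module. This is a simulation under stated assumptions, not Dremio itself: the table names `events` and `agg_reflection` and the sample data are hypothetical, a GROUP BY table stands in for the materialized agg reflection, and a dict keyed by event_time stands in for partitioning. Dremio would rewrite the query against the reflection automatically; here the rewrite is done by hand to show why the answer stays exact.

```python
import sqlite3
from collections import defaultdict


def load_events(conn: sqlite3.Connection) -> None:
    """Hypothetical base table: each id fires many events per day."""
    conn.execute("CREATE TABLE events (id INTEGER, event_time TEXT)")
    rows = [(i % 100, day) for day in ("2022-05-19", "2022-05-20") for i in range(1000)]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)


def build_agg_reflection(conn: sqlite3.Connection) -> None:
    """Stand-in for an agg reflection with dimensions (id, event_time)."""
    conn.execute(
        """
        CREATE TABLE agg_reflection AS
        SELECT id, event_time, COUNT(*) AS cnt
        FROM events
        GROUP BY id, event_time
        """
    )


def partition_by_event_time(conn: sqlite3.Connection) -> dict:
    """Stand-in for partitioning the reflection on event_time."""
    partitions = defaultdict(list)
    for id_, event_time in conn.execute("SELECT id, event_time FROM agg_reflection"):
        partitions[event_time].append(id_)
    return partitions


conn = sqlite3.connect(":memory:")
load_events(conn)
build_agg_reflection(conn)

day = "2022-05-20"
query = "SELECT COUNT(DISTINCT id) FROM {} WHERE event_time = ?"
exact = conn.execute(query.format("events"), (day,)).fetchone()[0]
from_reflection = conn.execute(query.format("agg_reflection"), (day,)).fetchone()[0]

# Rows are unique per (id, event_time), so one partition's row count equals
# the distinct id count, and only that partition has to be read.
partitions = partition_by_event_time(conn)
from_partition = len(partitions[day])

base_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
reflection_rows = conn.execute("SELECT COUNT(*) FROM agg_reflection").fetchone()[0]
print(exact, from_reflection, from_partition, base_rows, reflection_rows)
# prints: 100 100 100 2000 200
```

Because id is a dimension rather than a measure, the reflection keeps every distinct (id, event_time) pair, so COUNT(DISTINCT id) can be answered exactly from it. The saving depends entirely on how much smaller the reflection is than the base table (200 rows versus 2,000 in this made-up data).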

Switch · May 26, 2022, 1:15am

It works, thank you!

Related topics

Topic	Replies	Views	Last activity
Reflection with count distinct	5	2916	January 15, 2018
Multiple Count Dinstinct in one Reflection	3	1239	December 10, 2018
Setting up reflections on distinct count	1	1295	February 25, 2019
Optimization/Acceleration of select distinct query on attribute	4	2952	December 7, 2017
Accelaration not effective on select distinct * from table Apache Iceberg	3	69	September 19, 2024
