How to accelerate count distinct queries with aggregation reflection


I’m having trouble with count distinct reflections
My SQL is simple:
“SELECT count(distinct id) FROM Common.xx where event_time = ‘2022-5-20’”
But this query only accelerated by raw reflection, not the aggregation reflection.

As far as I know, there are two ways to solve this:

  1. use select ndv(id) instead. But ndv is an approximation of count.
  2. create a VDS using select count(distinct xx) and create raw reflection on it. But this may result in plenty of redundant VDSs

So, is there any other better solutions for count distinct?

Thank you

You can create an agg reflection with dimensions defined on the id and event_time fields. Hopefully, the unique combinations of id and event_time will be significantly less than the underlying table. If you are always filtering on event_time, then you could also partition on this field too.

1 Like

It works, thank you!

1 Like