Looking at the query profile you’ve provided, we can see that Dremio pushes the filters on cltp down into ES, but not the aggregations on the analyzed field (which is expected). That is probably why this is taking longer than you expect: Dremio has to read all the data that passes the filters out of ES before it can perform the aggregation itself. As @anthony mentioned, the rate at which ES can return data is the bottleneck.
In general, Dremio doesn’t push down aggregations on analyzed or normalized fields, for correctness reasons. Imagine this field contains the value “Los Angeles”. When analyzed, it may be split into the tokens “los” and “angeles”. If we pushed the aggregation on this field down to ES, then instead of grouping on “Los Angeles”, you’d be grouping on “los” and “angeles” separately.
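To illustrate the difference, here is a minimal Python sketch. It assumes a simplified analyzer that only lowercases and splits on whitespace (real ES analyzers do more than this), and the city values are made up for the example. It compares grouping on the original values with grouping on the analyzed tokens:

```python
from collections import Counter


def simple_analyzer(text):
    # Simplified stand-in for an ES text analyzer (assumption):
    # lowercase the value, then split it on whitespace.
    return text.lower().split()


# Hypothetical sample values for the analyzed field.
cities = ["Los Angeles", "Los Angeles", "Las Vegas", "San Jose"]

# Correct result: group on the original, unanalyzed values.
by_value = Counter(cities)

# Wrong result: group on the analyzed tokens, which is what an
# aggregation over the analyzed field's indexed terms would see.
by_token = Counter(tok for city in cities for tok in simple_analyzer(city))

for name, count in sorted(by_value.items()):
    print("value:", name, count)
for name, count in sorted(by_token.items()):
    print("token:", name, count)
```

The first grouping has one bucket for “Los Angeles” with a count of 2; the second has separate “los” and “angeles” buckets, and “Los Angeles” never appears as a group at all.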
If this is a common use-case where you need performance, I’d follow @kelly’s suggestion above of using reflections.