We are trying to run a query like following and it’s pushing down different ES query to different versions of ES clusters
Select t.a, count(t.a) as a_count
From tbl t
Group by t.a
Order by t.a ASC
Limit 5
Now for ES version 5.6.2 the pushdown query involves the group by & aggregation and rest of the order by and limit clauses are executed in Dremio i imagine. But for newer ES 6.0.0, we don’t get a pushdown at all - it’s just a match_all query. Because of this there is a huge discrepancy in performance, ranging from about 3-5 seconds in the first case to 3.2 minutes in the second.
How do we figure out why it’s behaving like this and decides to not pushdown in the newer cluster?
After investigating the query profile, it seems group by behaviour is determined by elastic type for the group by field - group by text fields are not pushed down but keyword, date and long are. Which makes sense because of the uninverting index performance issues and fielddata being disabled by default.
Another related question is why are order by not pushed down to elastic. A simple query with ordering and limit is expected to be fast by our users, but since pushdown doesn’t work the queries take awfully longer :
SELECT t.a
FROM tbl t
ORDER BY t.ingestDate desc
LIMIT 10
In this case, we want to look at the latest 10 rows ordered by ingestDate field. But since pushdown doesn’t work in this case, it gets the whole dataset to dremio before returning the final result set. What’s the rationale behind this?