Hi expert,
I would like to check with you, currently our dremio connected directly to ElasticSearch Cluster as datasource.
Currently im testing the performance by executing 1 of reporting query as per below :-
SELECT _
_ to_char(logins.date_time_iso8601,‘yyyy-MM-dd HH:MI:SS’) AS transformed_date, _
_ logins.data.school_code AS school_code, _
_ logins.data.auth.role AS user_profile, _
_ COUNT(DISTINCT(logins.data.session)) AS total_logins, _
_ COUNT(DISTINCT(logins.data.auth.name)) AS unique_logins _
_ FROM
_ “ES Cluster Dev”.“auth_login-2018_06_06”.index_typ logins_
_ WHERE 1=1_
– AND logins.data.auth.role = ‘Student’
_ GROUP BY_
_ to_char(logins.date_time_iso8601,‘yyyy-MM-dd HH:MI:SS’) ,_
_ logins.data.school_code ,_
_ logins.data.auth.role_
I’ve notice that the query runs in acceptable speed (less than 5 second) when I disable the WHERE clause filter. Yet when I enable it to make the filtration for only Student the query runs in unacceptable speed (longer than 30 second) .
Is there any way that I could improve on this issue?
If the datasource are connected to RDBMS then it’s easy for me to tackle the performance from the source directly but as the data are NoSQL under ElasticSearch so im quite clueless where could I improve on it.
Below are the profiler if you guys want to have a deep look on it.
873b63f5-3257-402e-944c-bd1740604849.zip (17.1 KB)
Thanks and appreciate the effort and thought on this.