How to improve query response times?

ravi · July 15, 2020, 6:44pm

Hi Everyone,

We’re evaluating dremio on a medium sized dataset (40gb), below are some more details on this setup.

Machine config: 1 coordinator (16gb 4cpu), 2 worker nodes (64gb 16cpu each), 600GB ssd each.
Data Source: 40gb csv file residing in a S3 bucket
Reflections: Raw reflection + aggregation reflection (on pdfs)
Worker node memory config: heap=10GB, direct=48gb
Memory config in admin panel: 15GB for large queries and 5gb for small queries (approx)

For this performance check we’ve devised a range of aggregation queries, query combinations include grouping on 1,2,3 dimension + a date, operation on a single measure, with and without any dimension/date filters

captured execution times:
ranging from 2sec - 1min, when queries are run sequencially
ranging from 2sec - 4mins when run parallelly

We have a few questions:

Are we missing any crucial setting which is degrading the overall query performance?
When we run the same query multiple times, the subsequent query executions take exactly same amount of time, shouldn’t the Cloud Cache kickin after a few runs? (i checked and it is enabled in the source settings though)
What could be a reasonable number in terms of query execution time on this setup.

Attaching a few profiles of slow running queries for your reference.

3m3s.zip (34.7 KB) 9m24s.zip (51.9 KB) 51s.zip (30.5 KB) 1m-47s.zip (33.3 KB)

Thanks,
Teja

Topic		Replies	Views
How to Optimize Query Performance for Large Datasets in Dremio Dremio Cloud	1	446	June 21, 2024
How to speed up dremio	8	3420	August 1, 2018
Query was cancelled planning time exceeded 60 seconds	3	1996	June 7, 2018
Disk Space Utilisation and Query Execution Time	1	1641	May 25, 2018
Evaluating Dremio	3	2101	May 17, 2018

How to improve query response times?

Related topics