Hello friends, we are investigating the best disk performance for Dremio, and we have read that some metrics are important:
sequential read/write
random read/write
IOPS
throughput
Based on your experience, if we have about 300 GB of data, which metric is more important: sequential or random?
What would be some minimal values for the other metrics, IOPS and throughput?
I know that more factors come into play in real life, but I need a rough estimate.
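(For context, here is the kind of quick check we can run on our disk: a minimal Python sketch, my own illustration and nothing Dremio-specific, that compares sequential streaming reads with small random reads on a test file. The path is hypothetical; a dedicated tool such as fio gives far more reliable numbers, and the test file should be larger than RAM or the page cache will inflate the results.)

```python
# Rough sequential vs. random read check on the Dremio data disk.
# Minimal sketch only; use fio for serious benchmarking, and make the
# test file larger than RAM so the page cache does not skew results.
import os, random, time

PATH = "/tmp/dremio_disk_test.bin"   # hypothetical test file on the data disk
BLOCK = 4096                         # 4 KiB blocks, typical of random I/O
SAMPLES = 10_000

size = os.path.getsize(PATH)

# Sequential read: stream the file front to back in 1 MiB chunks.
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    while f.read(1024 * 1024):
        pass
seq_secs = time.perf_counter() - start
print(f"sequential read: {size / seq_secs / 1e6:.1f} MB/s")

# Random read: seek to random offsets and read small blocks.
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    for _ in range(SAMPLES):
        f.seek(random.randrange(0, max(1, size - BLOCK)))
        f.read(BLOCK)
rand_secs = time.perf_counter() - start
print(f"random read: {SAMPLES / rand_secs:.0f} IOPS, "
      f"{SAMPLES * BLOCK / rand_secs / 1e6:.1f} MB/s")
```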
Thank you @balaji.ramaswamy. And what about processing the data? And for Iceberg and reflections, what would a more optimal disk look like, with how many IOPS or how much throughput?
How can I check inside a job whether a query is being slowed down by too few IOPS or too little throughput?
I also have a common use case and some ideas for building a cache system to accelerate “dashboard ad hoc queries”.
@dacopan If you see the TABLE_FUNCTION operator spending time in column wait time, and you are still seeing high wait times even with C3, then we need to look into disk read performance.
Similarly, if you see high wait times for EXTERNAL_SORT or PARQUET_WRITER, then we should look into disk write performance.
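For reference, the per-operator wait times that show this are in the query profile you can download for the job. Below is a minimal Python sketch that ranks operators by total wait time from the attempt JSON inside the downloaded profile; it assumes the JSON keeps a fragmentProfile → minorFragmentProfile → operatorProfile layout with waitNanos fields, which may differ between Dremio versions, so treat it as a starting point and cross-check against the Profile tab in the UI.

```python
# Minimal sketch: rank operators by accumulated wait time in a downloaded
# Dremio query profile JSON. Field names are an assumption based on the
# usual profile layout; verify against your own profile file.
import json
from collections import defaultdict

with open("profile_attempt_0.json") as f:
    profile = json.load(f)

wait_ms = defaultdict(float)
for major in profile.get("fragmentProfile", []):
    for minor in major.get("minorFragmentProfile", []):
        for op in minor.get("operatorProfile", []):
            key = (major.get("majorFragmentId"),
                   op.get("operatorId"),
                   op.get("operatorType"))
            wait_ms[key] += op.get("waitNanos", 0) / 1e6

# Highest-wait operators first; operatorType is a numeric code that the
# profile UI maps to names such as TABLE_FUNCTION or EXTERNAL_SORT.
for key, ms in sorted(wait_ms.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(key, f"{ms:.0f} ms wait")
```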
@dacopan The scan is single threaded and I have to check why; your row count estimate is high enough to spin up parallel threads. I assume you have several cores on that node. I see you are using only one node for both C+E (coordinator and executor); that should not prevent it from using multiple threads. Are all of your scans, across all your jobs, always single threaded?
Can you try the below two steps?
1. alter session set planner.slice_target=1;
2. Run the query again
@dacopan The problem seems to have gone away even with slice_target at the default. Let us know if you have any other profiles that show a high row count estimate but are still single threaded. Every 100K of row count estimate (planning tab, final physical transformation) should spin up a thread, up to 75% of the number of cores.
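To make that rule of thumb concrete, here is a toy Python sketch of the arithmetic implied by the statement above (an illustration only, not Dremio's actual planner logic):

```python
# Roughly one scan thread per 100K of estimated rows, capped at 75% of
# the executor's cores, per the rule of thumb stated above.
def expected_scan_threads(row_estimate: int, cores: int) -> int:
    by_rows = max(1, row_estimate // 100_000)
    cap = max(1, int(cores * 0.75))
    return min(by_rows, cap)

# Example: a 2M-row estimate on a 16-core executor -> min(20, 12) = 12 threads,
# while a 150K-row estimate stays single threaded.
print(expected_scan_threads(2_000_000, 16))  # 12
print(expected_scan_threads(150_000, 16))    # 1
```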