Disk performance for Dremio

Hello friends, we are investigating the best disk performance for Dremio. We read that the following metrics are important:

  • sequential read/write
  • random read/write
  • IOPS
  • throughput

Based on your experience, if we have about 300 GB of data, which metric is more important: sequential or random?
Are there minimum values for the other metrics, IOPS and throughput?
I know that more factors come into play in real life, but I need a rough estimate.
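For anyone comparing the metrics listed above: IOPS and throughput are tied together by the I/O size (throughput is roughly IOPS times block size), so a disk can look fast on one and slow on the other. Here is a minimal sketch of that arithmetic. The function name and the sample figures are made up for illustration; they are not measured values or Dremio recommendations.

```python
def throughput_mib_s(iops: float, block_size_kib: float) -> float:
    """Approximate throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * block_size_kib / 1024.0


# Hypothetical figures, for illustration only.
small_random = throughput_mib_s(iops=10_000, block_size_kib=4)       # many small random I/Os
large_sequential = throughput_mib_s(iops=500, block_size_kib=1024)   # few large sequential I/Os

print(f"small random I/O:     {small_random:.1f} MiB/s")
print(f"large sequential I/O: {large_sequential:.1f} MiB/s")
```

In this example the random workload needs 20 times more IOPS yet moves far less data per second, which is why both numbers matter when sizing a disk.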

@dacopan Is this for your Dremio KV store? That needs to be on an SSD. SSDs are usually used for random access.

https://www.baeldung.com/cs/sequential-vs-random-write

Thank you @balaji.ramaswamy. And what about processing the data? For Iceberg and reflections, which disk is more optimal, and with how many IOPS or how much throughput?
How can I check inside a job whether a query is being slowed down by too few IOPS or too little throughput?

I also have a common use case and some ideas for building a cache system to accelerate “dashboard ad hoc queries”.

@dacopan If you see the TABLE_FUNCTION operator spending time on column wait time, and you are still seeing high wait times even with C3, then we need to look into the disk read performance.

Similarly, if you see high wait times for external_sort or parquet_writer, then we should look into disk write performance.


And if I have high values in “Avg Process Time”, what does that mean? A slow CPU? But the job only shows “01.19s CPU used”.

1a7d8ea9-a3be-46fb-af47-b68589c3632d.zip (276,0 KB)

@dacopan The scan is single threaded and I have to check why; your row count estimate is high enough to spin up parallel threads. I assume you have several cores on that node. I see you are using only one node for both C+E. That should not prevent using multiple threads. Are all the scans across all your jobs always single threaded?

Can you try the 2 steps below?

alter session set planner.slice_target=1;
Run the query again

Send the profile

Yes, the server has 48 cores.

How can I check this in the profile?

With slice_target = 1:
45ef5e89-c1c0-468c-b5de-54c90f9059e5.zip (198,6 KB)

Without modifying slice_target:
ef23b92f-9f0c-4cfa-b32d-eb35ba289246.zip (198,4 KB)

@dacopan The problem seems to have gone away even with slice_target at its default. Let us know if you have any other profiles that have a high row count but are still single threaded. Every 100K of row count estimate (Planning tab, Final Physical Transformation) should spin up one thread, up to 75% of the number of cores.
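As a rough way to sanity check a profile against that rule of thumb, the expected thread count can be sketched as below. This is only an approximation of the rule as stated in this thread, not Dremio's actual planner logic. The function name and the rounding choices (round up per 100K rows, round down the 75% cap, at least one thread) are assumptions.

```python
import math


def expected_scan_threads(row_estimate: int, cores: int,
                          rows_per_thread: int = 100_000,
                          max_core_fraction: float = 0.75) -> int:
    """Estimate scan parallelism: one thread per 100K estimated rows,
    capped at 75% of the cores (rounding is an assumption)."""
    cap = max(1, math.floor(cores * max_core_fraction))
    wanted = max(1, math.ceil(row_estimate / rows_per_thread))
    return min(wanted, cap)


# 48 cores as mentioned in this thread; the row estimates are hypothetical.
for rows in (50_000, 250_000, 10_000_000):
    print(rows, expected_scan_threads(rows, cores=48))
```

If the row count estimate in the profile would give a number well above 1 here but the scan still runs on a single thread, that is the kind of profile worth sharing.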