Hello friends, we are investigating the best disk performance for Dremio, and we have read that some metrics are important:
sequential read/write
random read/write
IOPS
throughput
Based on your experience, if we have about 300 GB of data, which metric is more important: sequential or random?
What would be some minimal values for the other metrics, IOPS and throughput?
I know that more factors come into play in real life, but I need a rough estimate.
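(For context, here is the kind of quick check we can run on our disk: a minimal Python sketch, my own illustration and nothing Dremio-specific, that compares sequential streaming reads with small random reads on a test file. The path is hypothetical; a dedicated tool such as fio gives far more reliable numbers, and the test file should be larger than RAM or the page cache will inflate the results.)

```python
# Rough sequential vs. random read check on the Dremio data disk.
# Minimal sketch only; use fio for serious benchmarking, and make the
# test file larger than RAM so the page cache does not skew results.
import os, random, time

PATH = "/tmp/dremio_disk_test.bin"   # hypothetical test file on the data disk
BLOCK = 4096                         # 4 KiB blocks, typical of random I/O
SAMPLES = 10_000

size = os.path.getsize(PATH)

# Sequential read: stream the file front to back in 1 MiB chunks.
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    while f.read(1024 * 1024):
        pass
seq_secs = time.perf_counter() - start
print(f"sequential read: {size / seq_secs / 1e6:.1f} MB/s")

# Random read: seek to random offsets and read small blocks.
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    for _ in range(SAMPLES):
        f.seek(random.randrange(0, max(1, size - BLOCK)))
        f.read(BLOCK)
rand_secs = time.perf_counter() - start
print(f"random read: {SAMPLES / rand_secs:.0f} IOPS, "
      f"{SAMPLES * BLOCK / rand_secs / 1e6:.1f} MB/s")
```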
Thank you @balaji.ramaswamy. And what about processing the data? And for Iceberg and reflections, what would a more optimal disk look like, with how many IOPS or how much throughput?
How can I check inside a job whether a query is being slowed down by too few IOPS or too little throughput?
I also have a common use case and some ideas for building a cache system to accelerate “dashboard ad hoc queries”.
@dacopan If you see the TABLE_FUNCTION operator spending time in column wait time, and you are still seeing high wait times even with C3, then we need to look into disk read performance.
Similarly, if you see high wait times for EXTERNAL_SORT or PARQUET_WRITER, then we should look into disk write performance.
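For reference, the per-operator wait times that show this are in the query profile you can download for the job. Below is a minimal Python sketch that ranks operators by total wait time from the attempt JSON inside the downloaded profile; it assumes the JSON keeps a fragmentProfile → minorFragmentProfile → operatorProfile layout with waitNanos fields, which may differ between Dremio versions, so treat it as a starting point and cross-check against the Profile tab in the UI.

```python
# Minimal sketch: rank operators by accumulated wait time in a downloaded
# Dremio query profile JSON. Field names are an assumption based on the
# usual profile layout; verify against your own profile file.
import json
from collections import defaultdict

with open("profile_attempt_0.json") as f:
    profile = json.load(f)

wait_ms = defaultdict(float)
for major in profile.get("fragmentProfile", []):
    for minor in major.get("minorFragmentProfile", []):
        for op in minor.get("operatorProfile", []):
            key = (major.get("majorFragmentId"),
                   op.get("operatorId"),
                   op.get("operatorType"))
            wait_ms[key] += op.get("waitNanos", 0) / 1e6

# Highest-wait operators first; operatorType is a numeric code that the
# profile UI maps to names such as TABLE_FUNCTION or EXTERNAL_SORT.
for key, ms in sorted(wait_ms.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(key, f"{ms:.0f} ms wait")
```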
@dacopan The scan is single threaded and I have to check why; your row count estimate is high enough to spin up parallel threads. I assume you have several cores on that node. I see you are using only one node for both C+E (coordinator and executor); that should not prevent it from using multiple threads. Are all of your scans, across all your jobs, always single threaded?
Can you try the below two steps?
1. alter session set planner.slice_target=1;
2. Run the query again
@dacopan The problem seems to have gone away even with slice_target at the default. Let us know if you have any other profiles that show a high row count estimate but are still single threaded. Every 100K of row count estimate (planning tab, final physical transformation) should spin up a thread, up to 75% of the number of cores.
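To make that rule of thumb concrete, here is a toy Python sketch of the arithmetic implied by the statement above (an illustration only, not Dremio's actual planner logic):

```python
# Roughly one scan thread per 100K of estimated rows, capped at 75% of
# the executor's cores, per the rule of thumb stated above.
def expected_scan_threads(row_estimate: int, cores: int) -> int:
    by_rows = max(1, row_estimate // 100_000)
    cap = max(1, int(cores * 0.75))
    return min(by_rows, cap)

# Example: a 2M-row estimate on a 16-core executor -> min(20, 12) = 12 threads,
# while a 150K-row estimate stays single threaded.
print(expected_scan_threads(2_000_000, 16))  # 12
print(expected_scan_threads(150_000, 16))    # 1
```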