Number of Threads

tylercmp · March 24, 2021, 6:17pm

Hi,

I have Dremio deployed to an EKS cluster and have a couple of questions:

Is there a way to configure the number of threads used?
Any best practices on this configuration?
What is the default number of threads used?

Thanks

balaji.ramaswamy · March 25, 2021, 7:48am

@tylercmp

When you say threads, do you mean for the scans? The number of threads depends on three factors:

Number of input splits
Number of cores per executor (all executors should have same number of cores)
Estimated number of rows on the scan

We should leave it to Dremio to decide. Are you facing a performance issue? If yes, kindly share the query profile.
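To make the three factors above a bit more concrete, here is a toy model in Python. It is only an illustration of how such bounds could combine, not Dremio's actual planner logic; the function name, the `rows_per_thread` parameter and its default value are all hypothetical.

```python
import math


def estimate_scan_threads(input_splits, cores_per_executor, executors,
                          estimated_rows, rows_per_thread=100_000):
    """Toy upper bound on scan parallelism (NOT Dremio's real algorithm).

    Assumption: each of the three factors caps the thread count, so the
    smallest cap wins. rows_per_thread is a made-up illustrative value.
    """
    by_splits = max(1, input_splits)                    # one thread per split at most
    by_cores = max(1, cores_per_executor * executors)   # total cores in the cluster
    by_rows = max(1, math.ceil(estimated_rows / rows_per_thread))
    return min(by_splits, by_cores, by_rows)


# A small scan: few rows, so the row estimate is the limiting factor.
small = estimate_scan_threads(input_splits=4, cores_per_executor=8,
                              executors=3, estimated_rows=50_000)

# A large scan: plenty of splits and rows, so the cores are the limiting factor.
large = estimate_scan_threads(input_splits=200, cores_per_executor=8,
                              executors=3, estimated_rows=50_000_000)

print(small, large)
```

In this sketch a small, low-row scan ends up with very little parallelism no matter how many cores are available, which is one reason adding threads by hand may not help short queries.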

tylercmp · March 25, 2021, 7:34pm

My cluster is set up with 2 coordinators and 3 executors. My use case is small, low latency queries at a high frequency. At a moderate load I was seeing good performance and low resource utilization (CPU and memory). After increasing to a higher load, I’m seeing degraded performance, but resource utilization is still low.

Moderate Load Query Profiles:
a5476b14-1dff-46be-ad3a-b1c1f56ba912.zip (14.0 KB)

High Load Query Profiles:
07bd3b77-cb6f-4168-a7f2-0d398426cbf5.zip (28.1 KB)

High Load CPU/Memory:

tylercmp · March 25, 2021, 7:35pm

Additional Query Profiles:

Moderate Load Query Profiles:
544ace0c-e972-47ae-b691-a9e32d11106d.zip (12.1 KB)

High Load Query Profiles:
084ce920-1649-4333-9fd2-0bd943b2c7fe.zip (13.9 KB)

balaji.ramaswamy · March 26, 2021, 5:55am

@tylercmp

The queries are completing in 135ms. What is your expectation?

alex.shi · December 29, 2021, 9:00am

2min.zip (2.5 MB)
@balaji.ramaswamy Can you help me optimize this query? I don’t understand the SQL profile. Thanks!

balaji.ramaswamy · January 3, 2022, 5:38am

@alex.shi Here is where the time is spent

01-xx-01 ARROW_WRITER has 8s of wait time on IO. Are you writing results to local disk or to distributed storage like S3/HDFS? When writing to distributed storage, this wait time is sometimes expected.

01-xx-03 PROJECT has a SETUP time of 27s. This is due to the Gandiva expression getting evaluated; if you rerun the query, it should go away. If it is still there, then we can look at increasing the cache size, but for that we need to make sure we have enough memory available from the OS.

02-xx-00 HASH_PARTITION_SENDER and 02-xx-01 PROJECT spent ~34s on SETUP; the PROJECT time is again due to the above Gandiva expression build time.

There are a few FILTER operators (like 06-xx-03 FILTER) that have taken ~5s each on SETUP for the same Gandiva setup time.

Phases 1, 2 and 14 are high on parallelism and this causes some CPU contention, as shown under sleep time. Are there other queries running at the same time?

alex.shi · January 4, 2022, 3:07am

Thank you for your feedback. Here are the answers to your questions:

I write results to HDFS.
I have one coordination node and nineteen execution nodes.
The physical memory of the coordination node server is 125GB, and I set DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=110000, DREMIO_MAX_HEAP_MEMORY_SIZE_MB=80000.
The physical memory of the execution node server is 256GB, and I set DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=120000, DREMIO_MAX_HEAP_MEMORY_SIZE_MB=60000.
I disabled swap, but when I run the top command to check the memory, I found that the physical memory usage was small and the virtual memory usage was large.
There is only one query running at a time. I don’t know why the concurrency is high.

I look forward to your answers! @balaji.ramaswamy

alex.shi · January 12, 2022, 7:35am

I have to say that the feedback from Dremio is lacking and the responses are very slow.
I found the SQL profile really hard to understand and not well documented.

balaji.ramaswamy · January 13, 2022, 7:57am

@alex.shi

Apologies for the delay. Was there a reason to have such a high heap memory? Dremio execution all happens on direct memory, and hence on the executor you can set the below values.

Comment out the below 2:
#DREMIO_MAX_MEMORY_SIZE_MB=
#DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=

Enable the below 2:
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=112640 # we can increase this later if required

The coordinator is heap intensive, so use the below values.
Comment out the below 2:
#DREMIO_MAX_MEMORY_SIZE_MB=
#DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=

Enable the below 2:
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=16384
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384

If the coordinator or executors run out of heap or have long GC pauses, we can enable histograms to find out why and then take next steps.
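As a quick sanity check on the values above, here is a small Python sketch that adds up heap and direct memory and compares the total with the physical memory reported earlier in the thread (256GB per executor, 125GB on the coordinator). The helper `memory_budget` is hypothetical, and it assumes 1GB = 1024MB and ignores OS and other JVM overhead, so treat the headroom as a rough figure only.

```python
def memory_budget(physical_gb, heap_mb, direct_mb):
    """Rough check that heap + direct memory fits in physical RAM.

    Assumptions: 1 GB = 1024 MB, and OS/JVM overhead is ignored.
    """
    physical_mb = physical_gb * 1024
    jvm_mb = heap_mb + direct_mb
    return {
        "jvm_mb": jvm_mb,
        "headroom_mb": physical_mb - jvm_mb,
        "fits": jvm_mb <= physical_mb,
    }


# Executor: DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192, DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=112640
executor = memory_budget(physical_gb=256, heap_mb=8192, direct_mb=112640)

# Coordinator: DREMIO_MAX_HEAP_MEMORY_SIZE_MB=16384, DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384
coordinator = memory_budget(physical_gb=125, heap_mb=16384, direct_mb=16384)

print("executor:", executor)
print("coordinator:", coordinator)
```

Both configurations leave a lot of physical memory unallocated in this rough estimate, which is consistent with the note that the direct memory value can be increased later if required.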

alex.shi · January 13, 2022, 9:16am

Thank you for your feedback!!
I reconfigured according to these parameters, and the performance is no better.
I’d like to ask you one more question: since HDFS input is slow, I have disabled the “enable local caching for hdfs” option to see if there is a way to specify caching to local disk instead of HDFS. Maybe that’ll give me a little bit of a performance boost. @balaji.ramaswamy

balaji.ramaswamy · January 18, 2022, 4:16pm

Caching should help with the wait times incurred.
