We have a problem with CPU usage staying high even though no jobs are being processed in Dremio. The only way to clear it is to delete the executor so it restarts fresh. We could not identify a specific job as the cause; it can start even when all jobs are successful. It degrades the performance of new jobs, cancelling jobs takes longer…
E.g. CPU usage on a K8s node with 1 executor showing the problem (only 1 pod on this K8s worker, pod/dremio-executor-0):
Thread dumps taken while CPU usage is abnormal (obtained by running for i in $(seq -w 1 1 300); do jstack -l $i > ThreadDump$i.txt; sleep 1; done): TreadDump1.zip (6.9 KB)
Server logs generated by pod/dremio-executor-0 are definitely odd, with huge amounts of DEBUG logs generated per minute. Sample taken while it has the CPU problem: executor0_logs.zip (3.1 MB)
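To put a number on "huge amounts of DEBUG logs per minute", something like the following Python sketch could be run against the executor log. It assumes each line starts with a timestamp, a bracketed thread name, and then the level, e.g. `2021-03-04 10:15:30,123 [thread] DEBUG logger - message`; that layout is an assumption and the regex needs adjusting if the pattern in logback.xml differs. The function name `count_by_minute` is made up for this example.

```python
import re
from collections import Counter

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm [thread-name] LEVEL logger - message".
# Group 1 is the timestamp truncated to the minute, group 2 is the level.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\S*\s+\[.*?\]\s+(\w+)"
)


def count_by_minute(lines, level="DEBUG"):
    """Count log lines at `level`, grouped by minute.

    Lines that do not match the assumed layout (stack traces,
    continuation lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == level:
            counts[match.group(1)] += 1
    return counts
```

Usage would be along the lines of `with open("server.log", errors="replace") as f: print(sorted(count_by_minute(f).items()))`, where `server.log` stands for whichever file was extracted from the pod. Comparing the per-minute counts of a healthy executor with those of an affected one would show whether the DEBUG volume really tracks the CPU problem.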
@allCag Very recently I found a similar behavior where queries took more time to execute, and it turned out that the root debug logger was on. Can you send us your logback.xml from the executor so we can validate?
@allCag Can we go back to the default logback.xml, restart the executors, and then see if the behavior stays the same? That said, I do not see any debug logging enabled in the logback.xml you sent.
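For anyone wanting to check a logback.xml for debug loggers without reading it by eye, here is a minimal Python sketch. It lists every `<logger>` or `<root>` whose level is DEBUG, TRACE or ALL, whether the level is given as a `level="..."` attribute or as a nested `<level value="..."/>` element. The function name `debug_loggers` is made up for this example, and levels written as `${...}` variable substitutions are not resolved, so those would still need a manual look.

```python
import xml.etree.ElementTree as ET


def debug_loggers(logback_xml_text, levels=("DEBUG", "TRACE", "ALL")):
    """Return the names of loggers configured at one of `levels`.

    The root logger is reported as "ROOT". Levels expressed as
    ${...} substitutions are compared literally and so never match.
    """
    config = ET.fromstring(logback_xml_text)
    found = []
    for elem in config.iter():
        if elem.tag not in ("logger", "root"):
            continue
        # The level may be an attribute or a nested <level value="..."/>.
        level = elem.get("level")
        if level is None:
            child = elem.find("level")
            if child is not None:
                level = child.get("value")
        if level and level.strip().upper() in levels:
            if elem.tag == "root":
                found.append("ROOT")
            else:
                found.append(elem.get("name", "(unnamed)"))
    return found
```

Calling `debug_loggers(open("logback.xml").read())` on the executor's file and getting an empty list back would be consistent with "no debug enabled"; any name in the list is a logger worth lowering before re-testing.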
We observed the problem recurring less often when reducing to 1 concurrent reflection (it still happens, but not on all executors). For this test with the factory default logback.xml, we stressed Dremio by allowing 3 concurrent reflections.
@allCag Those messages can be ignored. I am not able to follow why you have debug logging on. How is that related to the concurrency settings? If you are saying that CPU is always high at concurrency 3, it could be related. Can we try this