Dremio executors not being provisioned

mirelagrigoras · April 11, 2022, 9:35pm

Hello everybody!
We have a Dremio cluster with one master node and 5 executors that are provisioned through Yarn.
Because 2 of the executors were not running, we stopped them via the UI (the Provisioning page in the Admin/Cluster section). Unfortunately, unlike previous times, the status remained "Stopping" and the executors were never actually stopped (see the image below).

So far, restarting the Dremio server, Yarn, or Zookeeper has not helped. We currently do not see the Dremio application being started in Yarn, so it has not actually started provisioning resources for a new one, although we do see the old one as stopped. Also, we do not see any errors related to executors in server.log or server.out, other than:
ERROR c.d.exec.work.foreman.AttemptManager - IllegalStateException: Error: No executors are available. (which we understand, because we don’t see them running in the UI).
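To check whether server.log contains any other executor-related errors besides this one, a small filter like the following can help. It is a minimal sketch, assuming the log level appears as the word "ERROR" on each line, as in the excerpt above; the script name and function name are hypothetical, and the log path must be supplied because its location depends on the installation.

```python
import re
import sys
from typing import Iterable, List

# Matches lines at ERROR level that mention an executor (e.g. "No executors are available").
# Assumption: the level is written as the standalone word "ERROR", as in the quoted log line.
EXECUTOR_ERROR = re.compile(r"\bERROR\b.*[Ee]xecutor")

def find_executor_errors(lines: Iterable[str]) -> List[str]:
    """Return the ERROR lines that mention executors, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if EXECUTOR_ERROR.search(line)]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_executor_errors.py /path/to/server.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in find_executor_errors(log):
            print(hit)
```

If the only hits are "No executors are available", the log is consistent with the executors simply never having been provisioned, rather than failing at startup.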

Do you have any idea what could cause Dremio to be unable to provision the executors and to remain stuck in a "Stopping" state without a clear reason or error?

Thank you for your time!

balaji.ramaswamy · April 12, 2022, 6:49am

@mirelagrigoras What version of Dremio is this? Can you run the command ps -ef | grep dremio on the Hadoop data nodes to see whether the Dremio process is still running?
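When checking several data nodes, the same check can be scripted. This is a minimal sketch, assuming the Linux `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD) with no spaces inside the STIME field; the function name is hypothetical. It also drops the `grep` line that `ps -ef | grep dremio` would otherwise report as a false match.

```python
import shutil
import subprocess
from typing import List, Tuple

def dremio_processes(ps_output: str) -> List[Tuple[str, str]]:
    """Parse `ps -ef` text and return (pid, command) for processes whose command mentions dremio."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the UID/PID/... header row
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        if len(fields) < 8:
            continue
        pid, cmd = fields[1], fields[7]
        program = cmd.split()[0].rsplit("/", 1)[-1]
        if program == "grep":  # the grep in "ps -ef | grep dremio" matches itself
            continue
        if "dremio" in cmd.lower():
            hits.append((pid, cmd))
    return hits

if __name__ == "__main__" and shutil.which("ps"):
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    for pid, cmd in dremio_processes(out):
        print(pid, cmd)
```

An empty result on every data node means no leftover Dremio executor process is running there.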

mirelagrigoras · April 12, 2022, 7:00am

@balaji.ramaswamy Thank you for the reply!
The version we use is 4.1.8 Community Edition. Yes, on the edge (master) node the Dremio process is still running. On the other nodes it is not, and there is currently no Dremio application running in Yarn, whereas there was one before the last restart.
Please let me know if I could provide any other useful information.

Thank you!

balaji.ramaswamy · April 13, 2022, 7:08am

@mirelagrigoras I have seen some corner-case scenarios where a previous shutdown of the application left orphaned containers still running, which can cause a new provisioning request to get stuck. That is why I wanted to make sure that no Dremio processes are running on any of the data nodes, even though the application itself is not running.

Also, when you say you have tried to stop the Dremio cluster, I assume you have restarted the coordinator too? And after the coordinator comes back up and you log in to the UI, does it still show the engine as "Stopping"?
