Blacklisted nodes in yarn deployment

alvinIce · March 24, 2022, 1:24pm

Hello,
I have Dremio 19.3.0 (community edition) installed on premises using a Yarn deployment and connected to an HDP cluster. A few days ago we noticed that, for Dremio, Yarn has marked some nodes of the cluster (used as executor nodes) as blacklisted. This seems different from “Blacklisted nodes in Dremio”.
Those nodes are blacklisted only for the Dremio application on the cluster, and each is listed with port 45454 of the host. As a consequence, all executors currently have only one virtual core instead of the 4 requested.
Can someone here explain what causes this situation and, if possible, how to solve it?
Thanks

balaji.ramaswamy · March 24, 2022, 3:44pm

@alvinIce If Yarn has blacklisted the node, do you see that in Ambari? Is there a hardware or any other issue on that node? Do other Yarn applications like Spark, Hive, etc. use that node for query execution?

alvinIce · March 24, 2022, 4:09pm

In fact, multiple nodes are blacklisted, but in Ambari they are all healthy.
And yes, other applications such as Spark and Hive also run on those nodes.
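
For anyone who wants to cross-check node health outside Ambari, here is a minimal Python sketch that reads the ResourceManager REST endpoint `/ws/v1/cluster/nodes` and lists nodes whose state is not `RUNNING`. The `rm_url` value and the helper names are assumptions made for illustration, not something from this thread. This only shows cluster-wide node state; a blacklist kept for a single application may not appear here.

```python
import json
from urllib.request import urlopen


def fetch_nodes(rm_url):
    """Fetch the node list from the ResourceManager REST API.

    rm_url is an assumed base address such as "http://rm-host:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/nodes", timeout=10) as resp:
        return json.load(resp)


def non_running_nodes(payload):
    """Return id, state and health report for every node not in RUNNING state."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        {
            "id": n.get("id"),
            "state": n.get("state"),
            "healthReport": n.get("healthReport", ""),
        }
        for n in nodes
        if n.get("state") != "RUNNING"
    ]
```

A typical use would be `non_running_nodes(fetch_nodes("http://rm-host:8088"))`, where the host and port are placeholders for your own ResourceManager address. An empty result would be consistent with Ambari showing every node as healthy.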

balaji.ramaswamy · March 24, 2022, 5:31pm

@alvinIce Do you have many job failures?

alvinIce · March 25, 2022, 3:25pm

Yes, we do have a lot of job failures with “BlockMissingException: Could not obtain block”.

balaji.ramaswamy · March 25, 2022, 4:03pm

@alvinIce That is the reason Yarn is blacklisting the nodes. Do we know why we are getting the BlockMissingException?

alvinIce · March 25, 2022, 4:28pm

In Ambari there are no missing blocks and no unavailable hosts, yet we still get the blacklisted nodes and the job failures in Dremio.
That’s why I don’t understand why Dremio is behaving like that and, more precisely, on what basis Yarn decides, for Dremio only, that some hosts are unhealthy.
Do you think that killing Dremio containers directly from the Yarn web UI could be the root cause? We did that to check whether, in this kind of situation, Yarn would automatically provide other resources to Dremio. And indeed it did. But we noticed the blacklisted nodes after trying that operation 3 times…

balaji.ramaswamy · April 1, 2022, 10:48pm

@alvinIce Dremio only requests containers from the RM. If the RM is unable to provide the resources, we need to see why on the RM side. In the Dremio server.log on the coordinator, do you see messages that Dremio requested a certain number of vcores but only got one? The ResourceManager log should tell us why you are getting fewer than requested.
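
As a rough cross-check from the RM side, the sketch below computes the average number of vcores per running container for one application, using the `allocatedVCores` and `runningContainers` fields that the ResourceManager REST endpoint `/ws/v1/cluster/apps/{appid}` returns. The helper names, the `rm_url` value and the application id are assumptions for illustration. The average also counts the application master container, so it is only an approximation of what each executor received.

```python
import json
from urllib.request import urlopen


def fetch_app(rm_url, app_id):
    """Fetch one application's report from the ResourceManager REST API.

    rm_url (e.g. "http://rm-host:8088") and app_id are placeholders.
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}", timeout=10) as resp:
        return json.load(resp)


def vcores_per_container(payload):
    """Return allocated vcores divided by running containers, or None.

    None is returned when the application reports no running containers
    (the API reports -1 for these fields when the app is not running).
    """
    app = payload.get("app") or {}
    containers = app.get("runningContainers", 0)
    vcores = app.get("allocatedVCores", 0)
    if containers <= 0:
        return None
    return vcores / containers
```

A result close to 1.0 when 4 vcores per executor were requested would match the symptom described in this thread, and the ResourceManager log would then be the place to look for the reason.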
