Hey, everyone!
We have a Dremio cluster provisioned through YARN, and we are seeing very high load on some of our Hadoop nodes because many services run on them. We were wondering whether there is a way to prevent those nodes from becoming Dremio workers.
Thank you!
@mirelagrigoras You can blacklist nodes through the UI: in the Node Activity page, toggle "Use for Execution" off for the nodes you want to exclude.
@balaji.ramaswamy Thank you for your reply!
Actually, we use the Dremio Community edition on a standalone Hadoop cluster, and our Node Activity page looks quite different. In addition, the nodes on which YARN allocates resources for Dremio are chosen randomly from the cluster: the number of Dremio workers is around 25% of the number of Hadoop nodes, and when the workers are started for the YARN application they currently land on arbitrary nodes, with no way to avoid particular ones.
Is there a way to add a node label in the Dremio configuration that would force the YARN queue's resources to come only from the group of nodes carrying that label? I am looking for something similar to the way the YARN queue name is specified. Is there a way to configure the following through Dremio?
spark.yarn.executor.nodeLabelExpression=dremio
spark.yarn.queue=dremio
@mirelagrigoras The concept of labels does not exist on the Dremio side, but if your Hadoop flavor is HDP or CDP (a later version), you can tie the Dremio application to a set of nodes by configuring node labels on the YARN side and attaching them to the queue that Dremio uses.
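For example, here is a minimal sketch of the YARN-side setup, assuming the Capacity Scheduler and a queue named dremio; the label name "dremio" and the worker hostnames are placeholders you would replace with your own:

# yarn-site.xml: enable node labels (restart the ResourceManager afterwards)
yarn.node-labels.enabled=true
yarn.node-labels.fs-store.root-dir=hdfs:///yarn/node-labels

# Create the label and attach it to the nodes reserved for Dremio
yarn rmadmin -addToClusterNodeLabels "dremio(exclusive=true)"
yarn rmadmin -replaceLabelsOnNode "worker01.example.com=dremio worker02.example.com=dremio"

# capacity-scheduler.xml: make the dremio queue use the label by default
yarn.scheduler.capacity.root.dremio.accessible-node-labels=dremio
yarn.scheduler.capacity.root.dremio.accessible-node-labels.dremio.capacity=100
yarn.scheduler.capacity.root.dremio.default-node-label-expression=dremio

With the default node label expression set on the queue, containers requested through that queue (including the Dremio executors provisioned via YARN) should be placed only on the labeled nodes, so the busy Hadoop nodes are avoided without any change on the Dremio side.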