Connect Dremio with YARN (ResourceManager) and HDFS (NameNode) in High Availability mode

Hello,

We deployed Dremio on a Hadoop cluster where the NameNode and the ResourceManager run in High Availability mode (via ZooKeeper).

So, two questions:

  • How should we declare the “Dist Path” and the NameNode URL in the Provisioning window?
  • How should we declare the ResourceManager in the Provisioning window?

The documentation (here: Deploy Dremio Executors on YARN) does not explain how to handle a NameNode and a ResourceManager deployed in High Availability mode.

Cheers

I suggest putting the currently active RM and NN in the YARN provisioning settings. Dremio interacts with the RM and NN while launching containers. After that, your RM/NN HA should take over and keep the YARN application running smoothly.

Hi !

Thanks for your answer.

In fact, that is what we did when configuring Dremio: we set the active NameNode and the active ResourceManager.

The question is: is there already a way to use the ZooKeeper key in the Dremio configuration file?

If not, we will create a script that queries ZooKeeper for the active NameNode and the active ResourceManager and updates the Dremio configuration file before the coordinator node(s) start. The script must also update the “YARN Executor” settings (Provisioning screen) before the workers start.

Finally, we will have to run this script each time the active NameNode or the active ResourceManager changes.
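
The lookup step of such a script could be sketched as follows, assuming the Hadoop admin CLIs are available on the coordinator host. Rather than reading ZooKeeper directly, this sketch asks `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState`, which print `active` or `standby` for a given service ID. The function name `find_active` and the IDs `nn1`, `nn2`, `rm1`, `rm2` are hypothetical; the real IDs come from `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml and `yarn.resourcemanager.ha.rm-ids` in yarn-site.xml.

```shell
#!/usr/bin/env bash
# Sketch: print the first service ID whose HA state is "active".
# $1  = command that prints the HA state for one ID (assumption: it prints
#       "active" or "standby", as `hdfs haadmin -getServiceState` does)
# $2+ = service IDs to probe (nn1, nn2, rm1, rm2 are hypothetical examples)
find_active() {
  local state_cmd="$1"; shift
  local id state
  for id in "$@"; do
    # A node that is down makes the query fail; treat that as "not active".
    state="$($state_cmd "$id" 2>/dev/null || true)"
    if [ "$state" = "active" ]; then
      printf '%s\n' "$id"
      return 0
    fi
  done
  # No probed ID reported "active".
  return 1
}

# Usage (needs the Hadoop CLIs and the HA configuration on this host):
#   find_active "hdfs haadmin -getServiceState" nn1 nn2
#   find_active "yarn rmadmin -getServiceState" rm1 rm2
```

This only finds the active service ID. Mapping that ID to a host and writing it into the Dremio configuration or the Provisioning screen is not covered here.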

Cheers

You can reference your Hadoop config files on the Dremio master's classpath.
In dremio-env, add or change the following:

export DREMIO_CLASSPATH_USER_FIRST=<path to your …/etc/hadoop> or whatever conf directory you use

You don’t need anything special on executor nodes, as Dremio runs within YARN containers provisioned by your Hadoop distro and should use your RM/NN HA like any other YARN application running in your Hadoop cluster.
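
Before pointing DREMIO_CLASSPATH_USER_FIRST at a conf directory, it may help to check that the directory actually carries the HA settings. The sketch below is only a rough pre-flight check: the function name `check_ha_conf` is hypothetical, and it only tests that the standard Hadoop property names `dfs.nameservices` and `yarn.resourcemanager.ha.enabled` appear in the files, not that their values are correct.

```shell
#!/usr/bin/env bash
# Sketch: check that a Hadoop conf directory mentions the HA properties.
# $1 = conf directory (the one you would export in dremio-env)
check_ha_conf() {
  local conf_dir="$1" status=0
  # NameNode HA: hdfs-site.xml should define a logical nameservice.
  grep -qF 'dfs.nameservices' "$conf_dir/hdfs-site.xml" 2>/dev/null \
    || { echo "dfs.nameservices not found in hdfs-site.xml" >&2; status=1; }
  # ResourceManager HA: yarn-site.xml should carry the HA switch.
  grep -qF 'yarn.resourcemanager.ha.enabled' "$conf_dir/yarn-site.xml" 2>/dev/null \
    || { echo "yarn.resourcemanager.ha.enabled not found in yarn-site.xml" >&2; status=1; }
  return "$status"
}

# Usage: check_ha_conf /path/to/your/hadoop/conf && echo "conf looks HA-aware"
```

A missing file is reported the same way as a missing property, so a mistyped directory shows up as two warnings.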