Restart dremio service and Fail new election

caiounderscore · November 21, 2019, 5:03pm

Hello there,

I have two Dremio master nodes for high availability. When I restart the Dremio service on the primary master node, however, the standby master node fails while trying to take over as the new coordinator.

dremio.conf:

paths: {
  # the local path for dremio to store data.
  local: "/mnt/dremio-metadata"

  # the distributed path for Dremio data, including job results, downloads, uploads, etc.
  #dist: "pdfs://"${paths.local}"/pdfs"
  dist: "s3a://vlr-dremio4-prd/dremio-storage/"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false,
  coordinator.master.embedded-zookeeper.enabled: true
}

Server log: server.log.zip (4.9 KB)

balaji.ramaswamy · November 24, 2019, 6:00am

@caiounderscore

I see that your secondary coordinator is unable to talk to ZooKeeper (see below). Are you able to ping ZooKeeper, and telnet to it on the configured port, from the secondary coordinator?

2019-11-21 05:12:48,406 [zk-curator-2] INFO c.d.s.coordinator.zk.ZKClusterClient - Not able to get election status in 60000ms. Cancelling election…
2019-11-21 05:12:48,407 [zk-curator-2] ERROR ROOT - Dremio is exiting. Node lost its master status.
2019-11-21 05:12:55,713 [main] INFO com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 73ms.
Base Configuration:

caiounderscore · November 27, 2019, 3:32am

@balaji.ramaswamy

I don't use an external ZooKeeper; I use the embedded ZooKeeper. In other words, my secondary coordinator will act as the ZooKeeper.

After the secondary coordinator fails to take over, the first coordinator becomes master again. However, if I stop the Dremio service on the first coordinator, or reboot that machine, the secondary master takes over as the new coordinator without problems.

The main problem is that when I restart the Dremio service on the primary master, the secondary master fails while trying to take over as the new coordinator.

balaji.ramaswamy · November 27, 2019, 5:28am

@caiounderscore

An external ZooKeeper is a requirement for HA.

https://docs.dremio.com/advanced-administration/high-availability.html
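For illustration, a dremio.conf for each of the two master coordinators might look roughly like the sketch below once the embedded ZooKeeper is replaced by an external one. This is a minimal sketch, not the configuration from the linked page: the ZooKeeper host names are hypothetical placeholders, and the `paths` block from the original post is left out because it is unchanged. The idea is that the election state then lives outside both Dremio processes, so restarting either master does not take ZooKeeper down with it.

```hocon
# Sketch: identical services/zookeeper settings on BOTH master coordinators.
services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false,
  # Do not run ZooKeeper inside the Dremio process on either master.
  coordinator.master.embedded-zookeeper.enabled: false
}

# Both masters point at the same external ZooKeeper ensemble.
# zk1/zk2/zk3.example.internal are hypothetical hosts; 2181 is ZooKeeper's usual client port.
zookeeper: "zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181"
```

Check the linked high-availability page for any further requirements (for example, shared storage) before relying on this layout.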

Thanks

Related topics:
- Zookeeper is not electing the slave node as master node (Dremio University): 4 replies, 2295 views, March 18, 2021
- Issue with Standby Coordinator Not Taking Over as Master in Dremio Cluster: 2 replies, 26 views, March 18, 2025
- Failed to start services, daemon exiting: 14 replies, 3092 views, April 2, 2018
- Replacing old master node and dremio UI will not start: 3 replies, 1303 views, December 17, 2018
- Failure while starting dremio on executor: 1 reply, 1175 views, October 10, 2018
