Hello community,
I am trying to deploy dremio using helm in EKS Cluster but the zookeeper fails to start most of the time. Below are my configs.
Zookeeper replicas = 3
Zookeeper logs:
zk-0.log:
2020-12-12 09:20:32,948 [myid:] - WARN [main:QuorumPeer$QuorumServer@173] - Failed to resolve address: zk-2.zk-hs.dremio.svc.cluster.local
java.net.UnknownHostException: zk-2.zk-hs.dremio.svc.cluster.local: Temporary failure in name resolution
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:166)
at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.(QuorumPeer.java:151)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:238)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:150)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
zk-1.log:
2020-12-12 09:21:04,659 [myid:2] - WARN [WorkerSender[myid=2]:QuorumPeer$QuorumServer@173] - Failed to resolve address: zk-2.zk-hs.dremio.svc.cluster.local
java.net.UnknownHostException: zk-2.zk-hs.dremio.svc.cluster.local: Temporary failure in name resolution
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:166)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:595)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
at java.lang.Thread.run(Thread.java:748)
zk-2.log:
2020-12-12 09:22:12,837 [myid:3] - WARN [WorkerSender[myid=3]:QuorumPeer$QuorumServer@173] - Failed to resolve address: zk-1.zk-hs.dremio.svc.cluster.local
java.net.UnknownHostException: zk-1.zk-hs.dremio.svc.cluster.local: Temporary failure in name resolution
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:166)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:595)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
at java.lang.Thread.run(Thread.java:748)
And the Interesting part is sometimes zookeeper starts without any issues.