ZooKeeper fails to start on k8s

Hi,
I’m trying to deploy Dremio on a Kubernetes cluster, but ZooKeeper fails to start and I haven’t been able to solve it.

Here is the error message:

ZOO_MY_ID=3
ZOO_SERVERS=server.1=zk-0.zk-hs.dremio.svc.cluster.local:2888:3888;2181 server.2=zk-1.zk-hs.dremio.svc.cluster.local:2888:3888;2181 server.3=zk-2.zk-hs.dremio.svc.cluster.local:2888:3888;2181 
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2023-04-03 14:07:32,615 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@177] - Reading configuration from: /conf/zoo.cfg
2023-04-03 14:07:32,623 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@431] - clientPort is not set
2023-04-03 14:07:32,623 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@444] - secureClientPort is not set
2023-04-03 14:07:32,623 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@460] - observerMasterPort is not set
2023-04-03 14:07:32,624 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@477] - metricsProvider.className is org.apache.zookeeper.metrics.impl.DefaultMetricsProvider
2023-04-03 14:07:32,672 [myid:3] - INFO  [main:o.a.z.s.DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2023-04-03 14:07:32,674 [myid:3] - INFO  [main:o.a.z.s.DatadirCleanupManager@79] - autopurge.purgeInterval set to 12
2023-04-03 14:07:32,680 [myid:] - INFO  [PurgeTask:o.a.z.s.DatadirCleanupManager$PurgeTask@139] - Purge task started.
2023-04-03 14:07:32,684 [myid:3] - INFO  [main:o.a.z.j.ManagedUtil@46] - Log4j 1.2 jmx support not found; jmx disabled.
2023-04-03 14:07:32,684 [myid:3] - INFO  [main:o.a.z.s.q.QuorumPeerMain@152] - Starting quorum peer, myid=3
2023-04-03 14:07:32,701 [myid:3] - INFO  [main:o.a.z.s.ServerMetrics@64] - ServerMetrics initialized with provider org.apache.zookeeper.metrics.impl.DefaultMetricsProvider@2a65fe7c
2023-04-03 14:07:32,703 [myid:] - INFO  [PurgeTask:o.a.z.s.p.FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2023-04-03 14:07:32,707 [myid:3] - INFO  [main:o.a.z.s.a.DigestAuthenticationProvider@47] - ACL digest algorithm is: SHA1
2023-04-03 14:07:32,707 [myid:3] - INFO  [main:o.a.z.s.a.DigestAuthenticationProvider@61] - zookeeper.DigestAuthenticationProvider.enabled = true
2023-04-03 14:07:32,715 [myid:] - ERROR [PurgeTask:o.a.z.s.DatadirCleanupManager$PurgeTask@143] - Error occurred while purging.
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Unable to create data directory /datalog/version-2
	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:136)
	at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:80)
	at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:141)
	at java.base/java.util.TimerThread.mainLoop(Unknown Source)
	at java.base/java.util.TimerThread.run(Unknown Source)
2023-04-03 14:07:32,716 [myid:] - INFO  [PurgeTask:o.a.z.s.DatadirCleanupManager$PurgeTask@145] - Purge task completed.
2023-04-03 14:07:32,721 [myid:3] - INFO  [main:o.a.z.s.ServerCnxnFactory@169] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2023-04-03 14:07:32,723 [myid:3] - WARN  [main:o.a.z.s.ServerCnxnFactory@309] - maxCnxns is not configured, using default value 0.
2023-04-03 14:07:32,726 [myid:3] - INFO  [main:o.a.z.s.NIOServerCnxnFactory@652] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 8 worker threads, and 64 kB direct buffers.
2023-04-03 14:07:32,736 [myid:3] - INFO  [main:o.a.z.s.NIOServerCnxnFactory@660] - binding to port /0.0.0.0:2181
2023-04-03 14:07:32,750 [myid:3] - INFO  [main:o.a.z.s.q.QuorumPeer@798] - zookeeper.quorumCnxnTimeoutMs=-1
2023-04-03 14:07:32,756 [myid:3] - INFO  [main:o.a.z.c.X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2023-04-03 14:07:32,758 [myid:3] - INFO  [main:o.a.z.s.p.FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2023-04-03 14:07:32,758 [myid:3] - ERROR [main:o.a.z.s.q.QuorumPeerMain@104] - Unable to access datadir, exiting abnormally
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Unable to create data directory /datalog/version-2
	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:136)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:178)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:137)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
Unable to access datadir, exiting abnormally
2023-04-03 14:07:32,760 [myid:3] - INFO  [main:o.a.z.a.ZKAuditProvider@42] - ZooKeeper audit is disabled.
2023-04-03 14:07:32,762 [myid:3] - ERROR [main:o.a.z.u.ServiceUtils@48] - Exiting JVM with code 3

Here is the Dockerfile I’m changing to try to solve the problem:

FROM zookeeper:latest

USER root

RUN chmod -R a+rwx /conf /data /logs /datalog 

@emilaineborato

As the dremio service user, are you able to manually create the folder /datalog/version-2?
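One way to check this, as a minimal sketch: run something like the function below inside the ZooKeeper container, as the same user the server runs as. The name `check_datadir` is hypothetical and not part of ZooKeeper or Dremio. Note that it creates `version-2` as a side effect if it can.

```shell
# Hypothetical helper: report whether the current user can create
# <dir>/version-2, the directory ZooKeeper failed to create in the log.
check_datadir() {
    dir="$1"
    # Show who we are and who owns the directory, to compare UID/GID.
    echo "running as: $(id -u):$(id -g)"
    ls -ld "$dir"
    if mkdir -p "$dir/version-2" 2>/dev/null && [ -w "$dir/version-2" ]; then
        echo "OK: $dir/version-2 is writable"
    else
        echo "FAIL: cannot create or write $dir/version-2"
        return 1
    fi
}
```

Usage would be `check_datadir /datalog`. If it prints FAIL, compare the `running as` UID/GID with the owner shown by `ls -ld`.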

I tried to create it manually, but it still didn’t work.

Here are my attempts:

FROM zookeeper:latest

ENV ZOO_CONF_DIR=/conf \
    ZOO_DATA_DIR=/data \
    ZOO_DATA_LOG_DIR=/datalog \
    ZOO_LOG_DIR=/logs 

USER zookeeper

RUN mkdir -p /datalog/version-2 && \
    chmod -R 777 /datalog

FROM zookeeper:latest

ENV ZOO_CONF_DIR=/conf \
    ZOO_DATA_DIR=/data \
    ZOO_DATA_LOG_DIR=/datalog \
    ZOO_LOG_DIR=/logs 

USER zookeeper

RUN mkdir -p /datalog/version-2

USER root
RUN chmod -R a+rwx /conf /data /logs /datalog

@emilaineborato That looks like the issue. You first have to fix the folder permissions outside of Dremio.
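As a rough sketch of what fixing the permissions outside the image could look like: if `/datalog` is backed by a volume mounted at runtime (an assumption, since the pod spec is not shown in this thread), the mounted volume's own ownership and mode apply, and a `chmod` done in the Dockerfile does not reach it. The fix would then have to run against the mounted volume itself, for example as root in an init step before ZooKeeper starts. The function name `fix_datadir` is hypothetical, and the default owner `1000:1000` is an assumption about the UID/GID of the image's `zookeeper` user, so verify it with `id zookeeper` in your image.

```shell
# Hypothetical init step, meant to run as root against the mounted volume.
# owner is "uid:gid"; the 1000:1000 default is an assumption, not a given.
fix_datadir() {
    dir="$1"
    owner="${2:-1000:1000}"
    # Create the directory ZooKeeper could not create on its own.
    mkdir -p "$dir/version-2" || return 1
    # Hand the whole tree to the user ZooKeeper runs as.
    chown -R "$owner" "$dir" || return 1
    # Give the owner read/write, plus traverse on directories.
    chmod -R u+rwX "$dir"
}
```

Called as `fix_datadir /datalog`, this hands the directory to its owner only, which is narrower than `chmod -R 777`.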