External ZooKeeper Question

I'm having some trouble setting up an external ZooKeeper quorum for HA with Dremio.

I downloaded the vanilla ZooKeeper tarball from the Apache site. The only change I made was to the config file.
zoo.cfg:

# the maximum number of client connections.

server1=<zk1.server>:2888:3888
server2=<zk2.server>:2888:3888
server3=<zk3.server>:2888:3888

dremio.conf on each of the 2 coordinators:
paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true
  coordinator.master.embedded-zookeeper.enabled: false,
  coordinator.master.embedded-zookeeper.port: 2181,
  coordinator.web.enabled: true,
  coordinator.web.port: 9047,
  coordinator.web.auth.type: "ldap",
  coordinator.web.auth.ldap_config: "ad.json",
  coordinator.client-endpoint.port: 31010,
  executor.enabled: false,
  fabric.port: 45678
}

zookeeper: "<zk1.server>:2181,<zk2.server>:2181,<zk3.server>:2181"

I can access the Dremio Coordinator just fine but no executors connect.

dremio.conf on each executor:
zookeeper: "<zk1.server>:2181,<zk2.server>:2181,<zk3.server>:2181"

paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}

Executor logs just say:
2018-05-06 14:23:14,990 [main] INFO c.d.d.s.exec.MasterStatusListener - Waiting for master
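"Waiting for master" suggests the executor has not yet found an active master coordinator through ZooKeeper, so a cheap first check is that every coordinator and executor lists the same set of ZooKeeper servers in its `zookeeper` setting. The sketch below is a minimal illustration under stated assumptions: the setting sits on a single line in straight double quotes, and `zk_quorum` / `same_quorum` are hypothetical helper names. It is not a Dremio tool and not a real HOCON parser.

```python
import re

# Naive extraction (assumption): the setting is on one line, e.g.
#   zookeeper: "host1:2181,host2:2181"
# No HOCON includes, substitutions, or nested objects are handled.
ZK_LINE = re.compile(r'^\s*zookeeper\s*[:=]\s*"([^"]*)"', re.MULTILINE)


def zk_quorum(conf_text: str) -> frozenset:
    """Return the set of host:port entries from the zookeeper setting."""
    match = ZK_LINE.search(conf_text)
    if match is None:
        raise ValueError("no zookeeper setting found")
    entries = (e.strip() for e in match.group(1).split(","))
    return frozenset(e for e in entries if e)


def same_quorum(coordinator_conf: str, executor_conf: str) -> bool:
    """True if both files name the same ZooKeeper servers (order ignored)."""
    return zk_quorum(coordinator_conf) == zk_quorum(executor_conf)


coord = 'services: { executor.enabled: false }\nzookeeper: "zk1:2181,zk2:2181,zk3:2181"\n'
execu = 'zookeeper: "zk3:2181, zk1:2181,zk2:2181"\n'
print(same_quorum(coord, execu))  # True
```

Comparing sets rather than raw strings tolerates reordering and stray spaces, which do not change which ensemble a node talks to.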

Could you try the following:

  1. Post the full stack trace of the exception from the executors.
  2. Can you reach the ZooKeeper server(s) from the executor nodes?
    Try "telnet zk1.server 2181" (and the other ZK servers) from each executor node.
  3. Do you see the coordinator connecting to the external ZooKeeper?
    Try "telnet zk1.server 2181" (and the other ZK servers) from the coordinator node.
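The telnet checks in items 2 and 3 can also be scripted so that every server in the connection string is probed in one pass. This is a minimal sketch, assuming the `zookeeper` value is a plain comma-separated `host:port` list with an explicit port on every entry and no chroot suffix; the function names are hypothetical. Like telnet, it only shows that a TCP connection opens, not that the ensemble has a healthy leader.

```python
import socket


def parse_connect_string(connect: str) -> list:
    """Split "host1:2181,host2:2181" into [("host1", 2181), ("host2", 2181)].

    Assumes every entry carries an explicit port.
    """
    servers = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Rough equivalent of `telnet host port`: True if a TCP connection opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_quorum(connect: str) -> dict:
    """Map each "host:port" in the connection string to its reachability."""
    return {f"{h}:{p}": can_connect(h, p) for h, p in parse_connect_string(connect)}
```

Run it on each executor and on the coordinator, e.g. `print(check_quorum("zk1.server:2181,zk2.server:2181,zk3.server:2181"))`, and compare the results across nodes.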

FYI, the lines following "the maximum number of client connections" are not the number of client connections; they list the ZK servers in the ensemble.
Also, I believe you should use the notation server.N=, not serverN=.
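With that notation the ensemble lines read `server.1=<zk1.server>:2888:3888`, `server.2=...`, and so on; per the ZooKeeper administrator's guide, each server also needs a `myid` file in its `dataDir` containing its own N. The check below is a rough sketch for catching the `serverN=` mistake, malformed entries, and duplicate ids before starting the ensemble. `check_zoo_cfg` is a hypothetical helper; its pattern ignores IPv6 hosts and only loosely accepts optional suffixes such as `:observer` or `;<client port>`.

```python
import re

# Expected ensemble entry: server.<id>=<host>:<quorum port>:<election port>
# Optional trailing ":..." or ";..." suffixes are accepted loosely.
VALID = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)(?:[:;].*)?$")
# The mistake from the post: "server1=" instead of "server.1="
MISSING_DOT = re.compile(r"^server(\d+)=")


def check_zoo_cfg(text: str) -> list:
    """Return a list of problems found in the server lines of a zoo.cfg."""
    problems = []
    ids = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if MISSING_DOT.match(line):
            problems.append(f"line {lineno}: use 'server.N=' not 'serverN=': {line}")
            continue
        if line.startswith("server."):
            m = VALID.match(line)
            if m is None:
                problems.append(f"line {lineno}: malformed server entry: {line}")
            elif m.group(1) in ids:
                problems.append(f"line {lineno}: duplicate server id {m.group(1)}")
            else:
                ids.add(m.group(1))
    return problems
```

Running it over the original config flags every `serverN=` line; after the fix it should return an empty list.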

@yufeldman Yep, thanks for the help with the ZK config. I now see connections being established to the coordinators, but the executors start and then fail, so their connections are closed. It may not be a ZK issue anymore.
Communication is all OK; that was the first thing I checked, and I can see the initial connection to ZK.

Executor dremio.log:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257570
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 257570
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.lang.RuntimeException: Failure while attempting to create com.dremio.service.namespace.NamespaceServiceImplWithAuth$Factory.
at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:427)
at com.dremio.service.BinderImpl.lookup(BinderImpl.java:109)
at com.dremio.service.BinderImpl$DeferredProvider.get(BinderImpl.java:83)
at com.dremio.exec.server.ContextService.newSabotContext(ContextService.java:214)
at com.dremio.exec.server.EnterpriseContextService.newSabotContext(EnterpriseContextService.java:111)
at com.dremio.exec.server.EnterpriseContextService.newSabotContext(EnterpriseContextService.java:35)
at com.dremio.exec.server.ContextService.start(ContextService.java:159)
at com.dremio.exec.server.EnterpriseContextService.start(EnterpriseContextService.java:103)
at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
at com.dremio.dac.daemon.NonMasterSingletonRegistry.start(NonMasterSingletonRegistry.java:54)
at com.dremio.dac.daemon.DACDaemon.startServices(DACDaemon.java:171)
at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:177)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:131)
Caused by: java.lang.RuntimeException: Failure while attempting to create com.dremio.service.usergroup.UserGroupServiceImpl.
at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:427)
at com.dremio.service.BinderImpl.lookup(BinderImpl.java:109)
at com.dremio.service.BinderImpl$FinalResolver.getImplementation(BinderImpl.java:388)
at com.dremio.service.BinderImpl$InjectableReference$3.apply(BinderImpl.java:424)
at com.dremio.service.BinderImpl$InjectableReference$3.apply(BinderImpl.java:421)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at com.google.common.collect.Iterators.addAll(Iterators.java:362)
at com.google.common.collect.Lists.newArrayList(Lists.java:160)
at com.google.common.collect.Iterables.toCollection(Iterables.java:337)
at com.google.common.collect.Iterables.toArray(Iterables.java:315)
at com.google.common.collect.FluentIterable.toArray(FluentIterable.java:474)
at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:425)
… 12 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:421)
… 24 more
Caused by: java.lang.NullPointerException: Unknown store creator com.dremio.service.users.SimpleUserService$UserGroupStoreBuilder
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:250)
at com.dremio.datastore.RemoteKVStoreProvider.getStore(RemoteKVStoreProvider.java:55)
at com.dremio.service.users.SimpleUserService.(SimpleUserService.java:96)
at com.dremio.service.usergroup.UserGroupServiceImpl.(UserGroupServiceImpl.java:31)
… 29 more
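In a chained Java trace the last `Caused by:` line is usually the one to read first; here it is the `NullPointerException` from `RemoteKVStoreProvider.getStore` reporting an unknown store creator for `SimpleUserService$UserGroupStoreBuilder`. For long logs, a small helper like the hypothetical `root_cause` below can pull that line out. It is a sketch that assumes standard Java trace formatting and ignores `Suppressed:` blocks.

```python
def root_cause(trace: str) -> str:
    """Return the innermost exception line of a Java stack trace.

    Java prints the outermost exception first and each nested cause on a
    "Caused by:" line, so the last such line is the root cause.
    """
    causes = [
        line.strip()[len("Caused by:"):].strip()
        for line in trace.splitlines()
        if line.strip().startswith("Caused by:")
    ]
    if causes:
        return causes[-1]
    # No nested cause: fall back to the first non-blank line.
    for line in trace.splitlines():
        if line.strip():
            return line.strip()
    return ""
```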

The last time I saw a similar error, someone was using our MapR build but not connecting to MapR components. Can you confirm the exact Dremio file you downloaded and give some more context about your environment?

I am using dremio-enterprise-2.0.1-201804132205050000-10b1de0.tar.gz.

We are running an external ZK quorum, with the coordinators and executors connecting to it. Everything runs on Azure VMs.

Hi @kalmira - Since you are on the enterprise edition and are a Dremio customer, I would like to redirect you to our enterprise support and portal. This way, we can make sure you get the attention and resources you need to resolve any issues. Someone will be following up with you shortly!