Failure while attempting to create SimpleUserService

Hi Dremio community, I'm trying to set up a K8s cluster with Dremio + ZooKeeper, but I'm having issues on both the coordinator and executor nodes. Here is the information:

I have distributed storage using an S3 bucket. It works on the master node; I even added the bucket as a source in Dremio. The master node is fully functional: I can connect to the dashboard and watch the service run.

The problem on the other nodes happens when I start the service. Logs:

Mon Apr  9 09:38:02 UTC 2018 Starting dremio on dremio-app-executor-rhv9j
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15738
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.lang.RuntimeException: Failure while attempting to create com.dremio.service.users.SimpleUserService.
        at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:427)
        at com.dremio.service.BinderImpl.lookup(BinderImpl.java:109)
        at com.dremio.service.BinderImpl$DeferredProvider.get(BinderImpl.java:83)
        at com.dremio.exec.server.ContextService.newSabotContext(ContextService.java:188)
        at com.dremio.exec.server.ContextService.start(ContextService.java:145)
        at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
        at com.dremio.dac.daemon.NonMasterSingletonRegistry.start(NonMasterSingletonRegistry.java:54)
        at com.dremio.dac.daemon.DACDaemon.startServices(DACDaemon.java:174)
        at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:180)
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:164)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:421)
        ... 9 more
Caused by: java.lang.NullPointerException: Unknown store creator com.dremio.service.users.SimpleUserService$UserGroupStoreBuilder
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:250)
        at com.dremio.datastore.RemoteKVStoreProvider.getStore(RemoteKVStoreProvider.java:55)
        at com.dremio.service.users.SimpleUserService.<init>(SimpleUserService.java:96)
        ... 14 more

My executor conf file (autogenerated by K8s):

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads,etc
  #dist: "pdfs://"${paths.local}"/pdfs"
  dist: "s3a://$MY_BUCKET_S3"
}

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}

zookeeper: "zookeeper:2181"

My coordinator conf file (also autogenerated by K8s):

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads,etc
  #dist: "pdfs://"${paths.local}"/pdfs"
  dist: "s3a://$MY_BUCKET_S3"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: false,
  executor.enabled: false
}

zookeeper: "zookeeper:2181" 

All my nodes share the same /etc/dremio/core-site.xml.

Dremio version: 1.4.9-201802191836310213-7195059
Java version: jre1.8.0_131

uname -a: 4.4.115-k8s #1 SMP Thu Feb 8 15:37:40 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
My cluster runs on AWS using kops, so my pods are created on EC2 instances.

I hope to solve this problem so I can share my Docker images and my pod template, letting the community automatically deploy a working Dremio cluster :smile:

Hi Luis,

When starting a cluster, it is important that the master node starts before the executor nodes (and the reverse order when shutting the cluster down). Is your K8s setup doing this? If the executor nodes start and cannot find a master node, it will cause issues.

Christy

You could change it from just the bucket path to a path under the bucket, e.g.: dist: "s3a://${MY_BUCKET_S3}/pdfs"
Also use {} around any variable you use.

Hello Christy, and thanks for the answer. For the moment I don't control the order, but I get this message even if I start the executor manually after the master node is up.

Hi yufeldman, and thanks for the idea. I already do this: I called it dremio (s3a://${MY_BUCKET_S3}/dremio) and added it to the path. I can tell this works because the master node creates the directories when the cluster is created.

Do you still get the same issue if you use the internal ZooKeeper?

What do you mean by internal ZooKeeper?

Dremio comes with an internal ZooKeeper that it uses by default. I'm guessing you're using an external ZooKeeper.

Yup, external. But there is no documentation about this error, something like: "if your nodes don't connect, you'll get this error".

Hey Luis,

I'm not 100% sure this is the issue. My usual trick for debugging is to work things back to a known good state and then move forward. I would have expected your simple configuration to work, and at first glance the ZooKeeper configuration is a potential area to look at.

But I do agree that sometimes our errors aren’t entirely helpful.

Thanks again Christy for your answer :smiley:. This is why it should be explained better: it's my first time configuring ZooKeeper to work with Dremio, and I can't find any guide or step-by-step instructions for configuring ZooKeeper with Dremio correctly.

Just to test, I deleted the "zookeeper" line from the Dremio conf, and I can now start the service on all 3 nodes. I think it can't connect to the master, and that's why it doesn't work. I don't know if I have to configure ZooKeeper manually on the master to start it internally, but for now the master node does not detect the executor node (and I guess not the coordinator either).

If you are using an external ZooKeeper (which it seems like you are), you need to add the following to dremio.conf:

services.coordinator.master.embedded-zookeeper.enabled: false

Maybe this link will help you: https://docs.dremio.com/advanced-administration/zookeeper.html
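Putting Anthony's setting together with the executor conf from earlier in the thread, a minimal sketch of a non-master dremio.conf pointing at an external ZooKeeper might look like this (host and port are the ones already used above):

```
services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  # don't start the embedded ZooKeeper; use only the external quorum
  coordinator.master.embedded-zookeeper.enabled: false,
  executor.enabled: true
}

# external ZooKeeper quorum (host:port, comma-separated if several)
zookeeper: "zookeeper:2181"
```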

Hi, and thanks Anthony. I'm going to try this; should it go on the master node or on all nodes?

This does not work. I tried both ways; I even tried the internal IPs of the 3 ZooKeeper pods to make sure it is connecting …

I found a bit more information in the server.log file:

2018-04-11 09:26:17,529 [main] INFO  c.d.datastore.RemoteKVStoreProvider - Starting RemoteKVStoreProvider
2018-04-11 09:26:17,590 [main] INFO  c.d.s.fabric.FabricConnectionManager - [FABRIC]: No connection active, opening new connection to dremio-app-master-vjjsn:45678.
2018-04-11 09:26:17,687 [FABRIC-2] ERROR com.dremio.exec.rpc.BasicClient - Failed to establish connection
java.util.concurrent.ExecutionException: java.nio.channels.UnresolvedAddressException
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:54) ~[netty-common-4.0.49.Final.jar:4.0.49.Final]
	at com.dremio.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:202) [dremio-services-base-rpc-1.4.9-201802191836310213-7195059.jar:1.4.9-201802191836310213-7195059]
	at com.dremio.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:189) [dremio-services-base-rpc-1.4.9-201802191836310213-7195059.jar:1.4.9-201802191836310213-7195059]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) [netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) [netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) [netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) [netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:241) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1226) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:539) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:524) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:539) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:524) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:539) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:524) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:506) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:970) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:214) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) [netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [netty-common-4.0.49.Final.jar:4.0.49.Final]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_131]
Caused by: java.nio.channels.UnresolvedAddressException: null
	at sun.nio.ch.Net.checkAddress(Unknown Source) ~[na:1.8.0_131]
	at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[na:1.8.0_131]
	at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:83) ~[netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:80) ~[netty-common-4.0.49.Final.jar:4.0.49.Final]
	at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_131]
	at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:80) ~[netty-common-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:243) ~[netty-transport-4.0.49.Final.jar:4.0.49.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:205) [netty-transport-4.0.49.Final.jar:4.0.49.Final]
	... 17 common frames omitted

ERROR com.dremio.exec.rpc.BasicClient - Failed to establish connection
java.util.concurrent.ExecutionException: java.nio.channels.UnresolvedAddressException

This is likely a networking issue. Is this deployed on AWS? Can the boxes ping and resolve each other successfully? Here is a list of ports that need to be open: https://docs.dremio.com/deployment/system-requirements.html#network

Hello, and thanks Anthony. I think ZooKeeper is taking the name of the Kubernetes pod as the hostname, and the coordinator is trying to use it. I might have to modify the ZooKeeper conf. I will continue my research and get back to you when I have an answer (or a working template xD).

Hello again, community. The problem we are getting is the following :slight_smile::

1- The Dremio master is in a pod; it knows the pod's hostname and IP.
2- When the Dremio master connects to ZooKeeper, what it writes to ZooKeeper is the pod's information (master.pod).
3- When the coordinator/executor try to connect to the master, they connect to ZooKeeper and get that host (master.pod).
4- The master pod hostname is not valid information for them, because neither the coordinator nor the executor can resolve master.pod (they have no host entry mapping master.pod to an IP).

To test a solution, we added the master node's IP to the hosts file on the coordinator/executor, and this works fine. The problem is that we can't find where to set the information Dremio uses to register its host in ZooKeeper. Our idea is a configuration parameter in Dremio that lets us set the host (instead of using the master pod, use a master service, so all the other pods can resolve it).
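For anyone trying to reproduce the workaround above, it amounts to appending one line per node to /etc/hosts. A minimal sketch (the IP is an illustrative placeholder; the pod name is the one from the log earlier):

```shell
#!/bin/sh
# Build the hosts entry mapping the master pod's hostname to its IP.
# MASTER_IP is a placeholder example, not a value from this cluster.
MASTER_IP="10.0.0.5"
MASTER_POD_NAME="dremio-app-master-vjjsn"
HOSTS_ENTRY="$MASTER_IP $MASTER_POD_NAME"

# On each coordinator/executor node you would append it as root:
#   echo "$HOSTS_ENTRY" >> /etc/hosts
echo "$HOSTS_ENTRY"
```

This is brittle (pod IPs change on reschedule), which is why the thread moves toward a configurable publish host instead.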

If you know what file Dremio uses to tell ZooKeeper who it is, that would be great :smiley:. In any case, thanks for your help.

Try setting the registration.publish-host property in dremio.conf.
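In the master node's dremio.conf this could look like the following sketch (the service name is a hypothetical Kubernetes Service DNS name, not one from this thread):

```
# advertise a stable, resolvable name instead of the pod hostname
registration.publish-host: "dremio-master.default.svc.cluster.local"
```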

Many thanks yufeldman, I tested this and it worked. We are going to create an image that takes a DNS name as a parameter so we can add it to the Dremio conf. We are really close to having something working :smiley:
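One possible sketch of such a parameterized entrypoint, assuming a DREMIO_PUBLISH_HOST environment variable passed to the container (the variable name and default value are assumptions, not part of any official image; a real image would write to /etc/dremio/dremio.conf, shown here as /tmp for illustration):

```shell
#!/bin/sh
# Append the publish host to dremio.conf before starting Dremio, so the
# address registered in ZooKeeper is a resolvable service name.
DREMIO_CONF="${DREMIO_CONF:-/tmp/dremio.conf.demo}"
DREMIO_PUBLISH_HOST="${DREMIO_PUBLISH_HOST:-dremio-master-service}"

echo "registration.publish-host: \"$DREMIO_PUBLISH_HOST\"" >> "$DREMIO_CONF"
cat "$DREMIO_CONF"
```

In Kubernetes the variable would be set via the pod spec's `env:` section, pointing at the master Service's DNS name.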
