Dremio not starting - No Cluster Identity found

Hi guys.
I have Dremio installed on a cluster (1 coordinator, 1 executor). I’ve made the proper configurations, but I’m facing the following problem when I try to start the service:

1 - As the dremio user (a sudoer), I ran sudo service dremio start
2 - After a few seconds, the following error appears in the server.out log:

Tue Nov  6 17:07:00 UTC 2018 Starting dremio on MY_INSTANCE.internal
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31370
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
/opt/dremio/bin/dremio: line 108: /var/run/dremio/dremio.pid: Permission denied
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.lang.NullPointerException: No Cluster Identity found
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:787)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:178)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:171)
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:103)

It’s strange that a permission denied message appears, even when using a sudoer user like dremio. The same occurs when I try to execute the command as root.

The second error, No Cluster Identity found, is the one I don’t understand. According to this thread and this one, the error happens when an empty db directory is created. I’ve checked this directory and it’s not empty.

Can someone help me with this case?

Hi @Paulo_Vasconcellos

If this is a fresh install, can you please remove the db folder configured in dremio.conf and try to start Dremio again?

Thanks
@balaji.ramaswamy

Hi, @balaji.ramaswamy!

I ran sudo service dremio start. The first time, I got the message below.

Tue Nov  6 19:51:14 UTC 2018 Starting dremio on MY_IP.internal
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31370
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
/opt/dremio/bin/dremio: line 108: /var/run/dremio/dremio.pid: Permission denied
No database found. Skipping upgrade

So, I tried to execute it again and got the same original error:

Tue Nov  6 19:52:30 UTC 2018 Starting dremio on MY_IP.internal
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31370
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
/opt/dremio/bin/dremio: line 108: /var/run/dremio/dremio.pid: Permission denied
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.lang.NullPointerException: No Cluster Identity found
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:787)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:178)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:171)
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:103)

Here’s my dremio.conf file:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}
zookeeper: "COORDINATOR_IP:2181,EXECUTOR_IP:2181"

Try this

cd /var/lib/dremio
ls -ltrh
mv db db.old
sudo service dremio start

Done, but I got this message now:


Tue Nov  6 20:04:55 UTC 2018 Starting dremio on my_ip.internal
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31370
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
No database found. Skipping upgrade
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.lang.RuntimeException: Failure while attempting to create com.dremio.service.users.SimpleUserService.
        at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:427)
        at com.dremio.service.BinderImpl.lookup(BinderImpl.java:109)
        at com.dremio.service.BinderImpl$DeferredProvider.get(BinderImpl.java:83)
        at com.dremio.exec.server.ContextService.newSabotContext(ContextService.java:215)
        at com.dremio.exec.server.ContextService.start(ContextService.java:158)
        at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
        at com.dremio.service.ServiceRegistry.start(ServiceRegistry.java:74)
        at com.dremio.service.SingletonRegistry.start(SingletonRegistry.java:33)
        at com.dremio.dac.daemon.DACDaemon.startServices(DACDaemon.java:177)
        at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:183)
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:112)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.dremio.service.BinderImpl$InjectableReference.get(BinderImpl.java:421)
        ... 10 more
Caused by: java.lang.NullPointerException: Unknown store creator com.dremio.service.users.SimpleUserService$UserGroupStoreBuilder
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:864)
        at com.dremio.datastore.LocalKVStoreProvider.getStore(LocalKVStoreProvider.java:96)
        at com.dremio.service.users.SimpleUserService.<init>(SimpleUserService.java:96)
        ... 15 more

Also, I gave the dremio group read and write permission on /var/run/dremio to check whether the permission denied message disappears. Does that sound like the right thing to do?

Thank you in advance, @balaji.ramaswamy

Check whether both servers can reach each other with telnet on ports 2181 and 31010.
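
If telnet is not installed on the nodes, the same check can be sketched with bash's built-in /dev/tcp redirection. This is only a rough sketch: check_port is a made-up helper name, the 3-second timeout is an arbitrary choice, and COORDINATOR_IP is the same placeholder used in the conf above.

```shell
#!/usr/bin/env bash
# check_port HOST PORT (hypothetical helper, not part of Dremio):
# succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
check_port() {
  local host="$1" port="$2"
  # bash treats /dev/tcp/HOST/PORT as a TCP socket; timeout guards against hangs.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable"
    return 1
  fi
}

# Example, run from the executor against the coordinator:
# check_port COORDINATOR_IP 2181
# check_port COORDINATOR_IP 31010
```

Run it in both directions, since a firewall rule can block one side only.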

After every failed first startup attempt, you will need to wipe /var/lib/dremio/db.
Also, I noticed that your conf has zookeeper: "COORDINATOR_IP:2181,EXECUTOR_IP:2181"
You only need to put the master coordinator IP there, not the individual executors. So please make the zookeeper change on all your nodes, wipe /var/lib/dremio/db on every node, then start them up one by one, coordinators first, then executors.
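
As a rough sketch of that sequence on a single node, assuming the default local path /var/lib/dremio from the conf above and that the service script accepts stop as well as start: the helper below (reset_dremio_node is a made-up name) only prints the commands so they can be reviewed first. It moves the db directory aside with a timestamp suffix instead of deleting it, which is my own precaution and not something Dremio requires.

```shell
#!/usr/bin/env bash
# reset_dremio_node [DB_DIR] (hypothetical helper, not part of Dremio):
# prints the commands to reset one node after a failed first startup.
# DB_DIR defaults to /var/lib/dremio/db, matching paths.local in the conf above.
reset_dremio_node() {
  local db_dir="${1:-/var/lib/dremio/db}"
  # Assumption: the init script supports "stop" as well as "start".
  echo "sudo service dremio stop"
  # \$(date +%s) is printed literally and only expands when the line is executed.
  echo "sudo mv ${db_dir} ${db_dir}.old.\$(date +%s)"
  echo "sudo service dremio start"
}

# Review the output, then execute it on this node:
# reset_dremio_node
# reset_dremio_node | sh
```

Do this on the master coordinator first and on each executor afterwards, once zookeeper: "COORDINATOR_IP:2181" is set in dremio.conf on every node.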

Hey guys.
Thank you very much for your help. I’ve solved the problem by reinstalling Dremio. :)