Cloned AWS RHEL EC2 Dremio Cluster (Master and 1 Executor) will not start

After cloning a Dremio cluster (master and 1 executor) on AWS EC2, I am unable to start Dremio.

I get this error:

Thu Nov 8 16:09:39 UTC 2018 Starting dremio on ip-10-204-122-40.ec2.internal
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29930
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
/opt/dremio/bin/dremio: line 106: /var/run/dremio/dremio.pid: Permission denied

I verified that Dremio did not start:
[root@ip-10-204-122-40 dremio]# sudo service dremio status
dremio not running.
[root@ip-10-204-122-40 dremio]#

Before starting, I updated /etc/dremio/dremio.conf on both nodes with the new host/IP.

Here are the steps I used to clone the cluster:

  1. Created an AMI of the Dremio master node EC2 RHEL instance
  2. Created an AMI of the Dremio executor node EC2 RHEL instance
  3. Launched a new EC2 instance from the master node AMI
  4. Launched another new EC2 instance from the executor node AMI
  5. Updated /etc/dremio/dremio.conf with the new host/IP on the cloned master node
  6. Updated /etc/dremio/dremio.conf with the new host/IP on the cloned executor node
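Steps 5 and 6 can be scripted so the edit is repeatable on every clone. This is a minimal sketch: `set_zookeeper` is a hypothetical helper (not part of Dremio) that rewrites the top-level `zookeeper:` line of a dremio.conf, or appends one if it is missing, and keeps a `.bak` copy. It assumes the setting sits on a single line, as in the configs posted in this thread; nested or multi-line HOCON is not handled.

```shell
# Hypothetical helper (not part of Dremio): point a dremio.conf at a new
# ZooKeeper address after cloning. Assumes a single top-level line such as
#   zookeeper: "10.204.122.42:2181"
set_zookeeper() {
  local conf="$1" zk="$2"
  if grep -q '^[[:space:]]*zookeeper:' "$conf"; then
    # Replace the existing line in place; sed keeps the original as $conf.bak.
    sed -i.bak "s|^[[:space:]]*zookeeper:.*|zookeeper: \"${zk}\"|" "$conf"
  else
    # No zookeeper line yet: back up the file, then append one.
    cp "$conf" "$conf.bak"
    printf 'zookeeper: "%s"\n' "$zk" >> "$conf"
  fi
}

# Example, as root on the cloned executor (NEW_MASTER_IP is a placeholder):
#   set_zookeeper /etc/dremio/dremio.conf "NEW_MASTER_IP:2181"
```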

Which node are you getting the error above on? Can you share your dremio.conf for both nodes?
FYI, you can fix the permission error with chmod or chown on /var/run/dremio.
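A minimal sketch of that chown/chmod fix. "Permission denied" on the PID write means /var/run/dremio (or an existing dremio.pid inside it) is not writable by the account the start script runs as. The sketch assumes that account is a `dremio` user and group, which is an assumption; compare with the owner shown by `ls -ld /var/lib/dremio`. `ensure_pid_dir` is a hypothetical helper, not part of Dremio.

```shell
# Hypothetical helper (not part of Dremio): make sure the PID directory used
# by the start script (/var/run/dremio in the log above) exists and belongs
# to the service account. "dremio:dremio" is an ASSUMPTION; check it first.
ensure_pid_dir() {
  local dir="${1:-/var/run/dremio}"
  local owner="${2:-dremio:dremio}"
  mkdir -p "$dir" || return 1
  chown "$owner" "$dir" || return 1
  # Owner can write the PID file; everyone else can read it.
  chmod 755 "$dir"
}

# Example, as root on each node:
#   ensure_pid_dir /var/run/dremio dremio:dremio
```

On RHEL 7, /var/run is a symlink to /run, which is a tmpfs, so its contents are rebuilt at every boot. If the fix does not survive a reboot, a systemd-tmpfiles entry such as `d /var/run/dremio 0755 dremio dremio -` in a file under /etc/tmpfiles.d/ is one way to recreate the directory with the right owner at boot (again assuming the `dremio` account).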

I am getting this error on the Executor node.

Master node dremio.conf:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}

zookeeper: "10.204.122.42:2181"

Executor node dremio.conf:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}

zookeeper: "10.204.122.42:2181"

On the master node, I get a more detailed error saying "Address already in use":

Thu Nov 8 16:09:31 UTC 2018 Starting dremio on ip-10-204-122-42.ec2.internal
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29930
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
/opt/dremio/bin/dremio: line 106: /var/run/dremio/dremio.pid: Permission denied
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.lang.RuntimeException: java.net.BindException: Address already in use
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at com.dremio.dac.daemon.ZkServer.init(ZkServer.java:111)
at com.dremio.dac.daemon.ZkServer.start(ZkServer.java:74)
at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
at com.dremio.service.ServiceRegistry.start(ServiceRegistry.java:74)
at com.dremio.service.SingletonRegistry.start(SingletonRegistry.java:33)
at com.dremio.dac.daemon.DACDaemon.startPreServices(DACDaemon.java:170)
at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:180)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:131)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:117)
at com.dremio.dac.daemon.ZkServer$ZkEmbeddedServer.run(ZkServer.java:142)
at java.lang.Thread.run(Thread.java:748)
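The trace shows Dremio's embedded ZooKeeper (ZkServer) failing to bind, presumably on port 2181, the port in the zookeeper setting above. Since the PID file could not be written on this node either, one plausible explanation is that an earlier Dremio JVM is still running and holding the port while `service dremio status` reports "not running". A minimal sketch for seeing which process owns the port; `listeners_on_port` is a hypothetical name, and it assumes the column layout of iproute2's `ss -ltnp`, where field 4 is "Local Address:Port".

```shell
# Hypothetical helper: read `ss -ltnp` output on stdin and print only the
# listening sockets whose local port equals $1. Field 4 looks like
# 0.0.0.0:2181, [::]:2181 or :::2181; the leading ":" in the pattern keeps
# port 2181 from also matching ports such as 12181.
listeners_on_port() {
  local port="$1"
  awk -v p="$port" 'NR > 1 && $4 ~ (":" p "$")'
}

# Example, as root so the owning process is shown:
#   ss -ltnp | listeners_on_port 2181
```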

Can you try the following?

  1. Make sure Dremio isn't running on any of the nodes. Verify that the process has stopped with ps -ef | grep dremio; if it is still running, kill it manually.
  2. Set zookeeper: "10.204.122.42:2181" in dremio.conf only on the executor nodes, not on the master. I assume that IP is the master node's IP, yes?
  3. Start the master coordinator first, then the remaining nodes (here, the executor) one by one.
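Steps 1 and 2 can be sketched as two small shell functions. Both names are hypothetical. `dremio_pids` matches on the main class com.dremio.dac.daemon.DremioDaemon taken from the stack trace above, which assumes your Dremio version launches that class; `drop_zookeeper` assumes the setting is a single top-level line, as in the configs posted here.

```shell
# Hypothetical helpers for steps 1 and 2; names are illustrative only.

# Step 1: read `ps -ef` output on stdin and print the PID (field 2) of each
# process whose command line contains Dremio's main class. Unlike a plain
# `grep dremio`, the grep line is not listed, and the escaped dots keep awk
# from matching its own command line (which shows the backslashes).
dremio_pids() {
  awk '/com\.dremio\.dac\.daemon\.DremioDaemon/ {print $2}'
}

# Step 2: delete the zookeeper line from the master's dremio.conf,
# keeping the original as dremio.conf.bak.
drop_zookeeper() {
  sed -i.bak '/^[[:space:]]*zookeeper:/d' "$1"
}

# Example, as root:
#   ps -ef | dremio_pids                     # any PID printed is a live Dremio JVM
#   kill <pid>                               # re-check afterwards
#   drop_zookeeper /etc/dremio/dremio.conf   # on the master only
```

Where procps is installed, `pgrep -f com.dremio.dac.daemon.DremioDaemon` does the same job as step 1 without the self-match concern.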