Set Up a Cluster on EC2

Hi there. I am starting out with Dremio and have set up a couple of EC2 instances running it. One is configured as coordinator (master) and executor, and the other as executor only.

I can get the coordinator to run without any issues, but I’m baffled about how to add the other executor to the cluster. I have tried setting the Zookeeper address. The coordinator’s IP is 10.0.0.127. This is my config on the executor only:

paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"
}

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}
zookeeper: "10.0.0.127:2181"

When I look at the Web UI on the coordinator, I only see the coordinator itself running. It runs fine.

Cheers.

Can you share the logs from the executor side? Also, please confirm that ports 2181 and 45678 are open between all nodes. You can test the port connection via telnet. I am guessing it may be a networking issue.
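If telnet is not installed on the instances, the same check can be sketched in Python. This is a minimal example rather than an official Dremio tool: the function name `port_is_open` is made up for illustration, and it only tests whether a TCP connection can be opened, not whether Dremio or Zookeeper is actually answering on that port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from the executor, `port_is_open("10.0.0.127", 2181)` and `port_is_open("10.0.0.127", 45678)` should both return True if the security group allows the traffic. A False result on either port points at the network rather than at dremio.conf.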

You need to set up the security group (in this example, sg-0f6f8070d363bcaec) to allow network traffic like this:
|Type|Protocol|Port|Source|Description|
|---|---|---|---|---|
|Custom TCP Rule|TCP|9047|[your corporate network IP or range]|dremio UI|
|SSH|TCP|22|[your corporate network IP or range]|SSH|
|Custom TCP Rule|TCP|31010|[your corporate network IP or range]|dremio client|
|Custom TCP Rule|TCP|45678|sg-0f6f8070d363bcaec (dremio)|dremio internode communication|
|Custom TCP Rule|TCP|2181|0.0.0.0/0|dremio zookeeper|
|Custom TCP Rule|TCP|2181|::/0|dremio zookeeper|

The last two entries are odd: I would expect it to work with source = sg-0f6f8070d363bcaec, but with that setting only the master is visible and the executors are not.
To make it work, I need to open the Zookeeper port to the public, which is a big no-no.
Any ideas on this?

I suspect it is a networking issue. I was previously unable to SSH from coordinator to executor, but now I can. Every TCP port should be open between them. In server.out on the executor, I see:

Wed Oct 24 23:16:19 UTC 2018 Starting dremio on ip-10-0-0-134.us-west-2.compute.internal
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7837
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7837
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
KVStore version is 2.1.6-201809161906440178-edb5b4d

I’m not sure where to look for Zookeeper / coordinator discovery logging. I think we are nearly there…

Also, what is the correct format for specifying the Zookeeper host? The documentation says to use <host1>, but is this a placeholder that should be replaced with an actual host such as 10.0.0.1, or with a host surrounded by angle brackets: <10.0.0.1>?

Looks like I got it!

To answer my own question, the correct format for the Zookeeper host is:

zookeeper: "10.0.0.127:2181"

Also ensure you have turned off the coordinator services (set them to false) in the executor’s dremio.conf:

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}

zookeeper: "10.0.0.127:2181"
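To make the format concrete, here is a small Python sketch that renders the `zookeeper` line from a list of hosts. The helper name `zookeeper_setting` is hypothetical. The single-host output matches the line above; the multi-host, comma-separated form is an assumption based on the usual Zookeeper connection-string convention, so check it against the Dremio documentation before relying on it.

```python
def zookeeper_setting(hosts, port=2181):
    """Render the dremio.conf zookeeper line for one or more hosts.

    Hosts are written as plain addresses with no angle brackets; the
    <host1> in the docs is only a placeholder.
    """
    if not hosts:
        raise ValueError("at least one Zookeeper host is required")
    # Assumption: several hosts are joined with commas, as in standard
    # Zookeeper connection strings.
    quorum = ",".join(f"{host}:{port}" for host in hosts)
    return f'zookeeper: "{quorum}"'
```

For example, `zookeeper_setting(["10.0.0.127"])` produces the exact line used in the executor config in this thread.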

For the AWS Security groups on the cluster, I ended up using a self-referencing security group.

  1. Create a security group called DremioCluster. You’ll need an outbound rule, so add anything safe. You’ll change it in a second.
  2. Save the rule.
  3. Open the rule again for editing.
  4. You should narrow this later, but start by allowing “All Traffic”, and for the source, select the current security group’s ID. (You are creating a self-referencing rule.)
  5. Add all Dremio nodes to the DremioCluster group.
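Once the "All Traffic" rule works, the narrowing could be scripted instead of clicked through. The sketch below builds self-referencing inbound rules in the shape that boto3's `authorize_security_group_ingress` accepts. The function names and the port list are illustrative: 2181 and 45678 are simply the ports mentioned earlier in this thread, and whether they are sufficient for every Dremio setup is an assumption. boto3 is a third-party package and is imported only inside `apply_rules`.

```python
# Ports named in this thread: Zookeeper (2181) and Dremio internode traffic (45678).
DREMIO_CLUSTER_PORTS = (2181, 45678)


def self_referencing_rules(group_id, ports=DREMIO_CLUSTER_PORTS):
    """Build inbound TCP rules whose source is the security group itself."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Referencing the group's own ID is what makes the rule
            # self-referencing: only members of the group may connect.
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
        for port in ports
    ]


def apply_rules(group_id):
    """Add the rules to the group. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_rules(group_id),
    )
```

Calling `apply_rules` with the DremioCluster group's ID adds one inbound rule per port. The broad "All Traffic" rule would still need to be removed separately once the narrower rules are confirmed to work.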

It may be useful to use K8S/Helm for what you’re doing: https://github.com/dremio/containers/tree/master/charts/dremio