Cannot connect executor node to master coordinator

Hi,

I’m having issues setting up a simple cluster on EC2. I have a coordinator, which also serves as an executor, and a separate executor.

Symptoms: the web UI shows only the coordinator and not the executor node.

The server log on the executor keeps repeating: [main] INFO c.d.d.s.exec.MasterStatusListener - Waiting for master [[master fqdn]]:45678

The FQDN is a record in AWS Route 53 that points to the master. Running dig +short [fqdn] returns the master’s actual private IP. I’ve also tried specifying the master’s private IP and the master’s hostname (declared in the executor’s /etc/hosts), with the same results.

Setup: the coordinator and the executor are c5.large instances running Amazon Linux. Dremio is installed from the latest rpm.
Both use the same security group, which is pretty lax and should not be an issue.

dremio-env is left untouched.
Coordinator dremio.conf:

master: {
  # the name of the master server. If this node matches the name, it starts the master service
  name: master.dremio.internal.swaven.com,
  port: 45678
}

paths: {
  # the local path for dremio to store data.
  local: "/data/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: true,
  executor.enabled: true
}

Executor dremio.conf:

master: {
  # the name of the master server. If this node matches the name, it starts the master service
  name: master.dremio.internal.swaven.com,
  port: 45678
}

paths: {
  # the local path for dremio to store data.
  local: "/data/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
  coordinator.enabled: false,
  executor.enabled: true
}

Instances can be pinged from each other.
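
Since ping only exercises ICMP, a TCP connection attempt to the master port is a closer test of what the executor actually needs. A minimal sketch, assuming Python 3 is available on the executor; the helper name can_connect is made up for illustration, while the hostname and port are the ones from dremio.conf:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures
        return False


# Example call, run from the executor node:
#   can_connect("master.dremio.internal.swaven.com", 45678)
```

A False result would point at the network path, or at the address the master is listening on, rather than at the executor’s Dremio configuration.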
I have seen this thread and this thread, and tried what’s suggested in them, to no avail.

I’ve rebuilt both nodes from scratch a few times, with the same result each time. Is there anything I might have missed?

Thank you.

Does master.name in dremio.conf match the name in your log message, or is the name in the message different?

Yes, that’s the same name. I forgot to redact it consistently.

Could you change the following in <dremio_install>/conf/logback.xml? If you installed via rpm, the file is probably /etc/dremio/logback.xml.

<logger name="com.dremio">
  <level value="${dremio.log.level:-info}"/>
</logger>

to this (changing info to debug):

<logger name="com.dremio">
   <level value="${dremio.log.level:-debug}"/>
</logger>

Make this change on the executor node and restart the executor. After that, could you check server.log for a statement like the following:
"Master node ‘masternode’ resolves to ‘address’."
My guess is that ‘address’ is not the address you expect for ‘masternode’.
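
To pull that statement out of a busy debug log, something like the following could help. It is only a sketch: the function name is hypothetical, it matches on the two phrases quoted above without assuming anything else about the log line layout, and the log path in the comment is an assumption that may differ on your install.

```python
def master_resolution_lines(log_lines):
    """Yield log lines that report how the master hostname was resolved."""
    for line in log_lines:
        # Match only on the two phrases from the expected debug statement
        if "Master node" in line and "resolves to" in line:
            yield line.rstrip("\n")


# Example (path is an assumption, adjust to where server.log actually lives):
#   with open("/var/log/dremio/server.log") as f:
#       for hit in master_resolution_lines(f):
#           print(hit)
```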

OK, that’s resolved now. You were on the right track: it was a network configuration error. On the master coordinator, I had associated its hostname with 127.0.0.1 in /etc/hosts, so the master service was starting on the loopback interface and could not be reached by the executor.
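
For anyone hitting the same thing, here is a rough way to spot this kind of entry. It is a sketch rather than anything Dremio-specific: the function name loopback_names is made up, and it simply lists every name in hosts-file text that is mapped to a loopback address.

```python
import ipaddress


def loopback_names(hosts_text: str) -> dict:
    """Map each hostname in hosts-file text to the loopback address it points to."""
    found = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not a valid IP address, skip the line
        if addr.is_loopback:
            for name in fields[1:]:
                found[name] = fields[0]
    return found


# Example, run on the master coordinator:
#   with open("/etc/hosts") as f:
#       print(loopback_names(f.read()))
```

If the name used for master.name in dremio.conf shows up in the result, the master is probably resolving itself to loopback, as happened here.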

Thanks for your help!