Dremio on AWS EMR

Hi

I am referring to https://docs.dremio.com/deployment/standalone-tarball.html for installation/setup.

Setting up Dremio as a service on EMR is not possible using systemd/systemctl, as those are not available on the AMI version (2017.09). Are there any documents specific to EMR cluster mode?

Thanks

We don’t have any document specific to EMR at this time, but the tarball has a dremio.rc script under the share directory that you can use as a sysvinit (init.d) script.
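
Roughly, something like this should work on each node (this is just a sketch; it assumes the tarball is unpacked under /opt/dremio, and you may need to add chkconfig/LSB headers to the script if it doesn’t already have them):

sudo cp /opt/dremio/share/dremio.rc /etc/init.d/dremio
sudo chmod +x /etc/init.d/dremio
# register with sysvinit on the Amazon Linux AMI
sudo chkconfig --add dremio
sudo chkconfig dremio on
# then manage it like any other service
sudo service dremio start
sudo service dremio status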

Hi @laurent
Thanks! Should I just use it as a drop-in replacement? /opt/dremio/share/dremio.rc ==> /etc/init.d/dremio? I checked the System V script; I will see if I can tinker one up. Do we have init scripts for System V? (I know you mentioned there are no documents; I’m just checking on script availability.)

Also, I assume I have to install the Dremio tarball/users/dirs on each slave node. And, as per this config page, https://docs.dremio.com/deployment/dremio-config.html, I see master/coordinator/executor settings. For the master it’s clear; my EMR (testing) cluster has 1 master and 2 slaves (core nodes). Now, can I make each slave both a coordinator and an executor, or should I not do that? Please let me know.

My EMR: 1 master, 2 core nodes, 0 task nodes
dremio.conf for the slave/core nodes:

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: false,
  executor.enabled: true
}
zookeeper: "emr-master:2181,emr-slave1:2181, emr-slave2:2181"

Thanks
@HLNA

Yes, you should be able to use dremio.rc as a drop-in init.d script. You also need to install a copy on each slave. As for roles, you can have one node playing both coordinator and master, and have all other nodes as executors. (Theoretically, you can also make all nodes act as both coordinator and executor, but there’s little value in that for a 3-node setup.)
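
As a rough sketch (only the relevant services flags are shown; hostnames and the rest of dremio.conf are up to you), the master/coordinator node would have:

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}

and the two core nodes would have:

services: {
  coordinator.enabled: false,
  coordinator.master.enabled: false,
  executor.enabled: true
}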

@laurent

I see two links for dremio.conf. The first one is concise, and I believe it will use defaults for the other settings? The second link has elaborate settings for fabric, web, etc. Should I set those, or will I be able to access Dremio as long as port 9047 is open?

http://docs.dremio.com/deployment/dremio-config.html
https://docs.dremio.com/advanced-administration/configuration-files.html

Thanks
@HLNA

This one (https://docs.dremio.com/advanced-administration/configuration-files.html) gives you the full picture of all the config properties available (for advanced usage).

This one (http://docs.dremio.com/deployment/dremio-config.html) is what you may need to configure to set up Dremio.

Thanks @yufeldman .

But I’m using a zookeeper cluster, and I see two different entries for zookeeper; I’m not sure if I have to integrate YARN as well?
# the zookeeper quorum for the cluster
zookeeper: "localhost:"${services.coordinator.master.embedded-zookeeper.port}
zk.client.session.timeout: 90000

In the simple config, it’s shown with only two zookeeper nodes; I assume it’s 3 nodes including the master? Or is the master handled separately in the master tag?
zookeeper: "<host1>:2181,<host2>:2181"

Thanks
@HLNA

You don’t need to use YARN when using an external zookeeper.

It is just an example; your quorum size may vary (though I agree two zookeepers is not a great example, as a ZK quorum should be an odd number).
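
For example, with the three EMR nodes you listed, a 3-node quorum would look something like this (plain double quotes, and typically no spaces between the entries):

zookeeper: "emr-master:2181,emr-slave1:2181,emr-slave2:2181"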

@yufeldman Thanks.

Attached is the complete screenshot of the dremio.conf (EMR master node). So, I have the master tag with the IP address of the master node (EMR), and in the zookeeper tag I have again given all three nodes. Is "master" here meant for the ZooKeeper master or the Dremio master? (I assume the Dremio master, because for zookeeper it should not matter, as all calls are routed to the leader node, right?)

For the rest of the nodes, I’m going to copy the same config but set executor to true. Please advise.

For now, I am not planning to use YARN. Assuming all my cache/results etc. should go to HDFS, I should be using it, right?

Thanks,
@HLNA

@yufeldman I updated the previous comment with a better screenshot.

So,

  • for the Master Node (EMR), I have master/coordinator/web enabled and executor disabled
  • for the core Nodes (EMR), I have master/coordinator/web disabled and executor enabled

Also, from this page, https://docs.dremio.com/advanced-administration/zookeeper.html, I am not sure what this means: “When no ZooKeeper path is specified, Dremio defaults to /dremio”.

Please follow this link in general:

Other than that:

  1. In the "paths" section, leave only "local" and "dist".
  2. If you want to use HDFS for results, etc., you need to change "dist" to something like "hdfs://namenode:8020/pdfs" (see the sketch below).
  3. For the "services" section, please follow the link above to set it up for master, executor, etc.
  4. Uncomment services.coordinator.master.embedded-zookeeper.enabled: false (if you plan to follow the advice in #3), as your "master" section is outdated.
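
For point 2, a minimal sketch of that paths block (the local directory and the namenode address are just placeholders for whatever you actually use):

paths: {
  # local metadata/spill directory on each node (example path)
  local: "/var/lib/dremio",
  # distributed storage for results, accelerations, uploads, etc. on HDFS
  dist: "hdfs://namenode:8020/pdfs"
}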

Thanks @yufeldman. I will now start the server and see whether I get any errors or success. Responses to your points below:

  1. Will do.
  2. Sure, but for now I’ll use a local path.
  3. It seems I have all the props set as required (I have extras for web and client-endpoint)? Perhaps I will remove them.
  4. This is how I have the master tag under coordinator (and the commented section also seems to be the same)?
master: {
  enabled: true,
  name: 10.224.21.139,
  port: 45678,

  embedded-zookeeper: {
    enabled: false
  }
},

Thanks
@HLNA

This is the default (starting from 1.4). There is no “name”, as you can see.

master: {
  enabled: true,
  # configure an embedded ZooKeeper server on the same node as master
  embedded-zookeeper: {
    enabled: true,
    port: 2181,
    path: ${paths.local}/zk
  }
},

@yufeldman Sorry, I’m confused about embedded-zookeeper.enabled=true. I am using an external zookeeper (though it’s on the same cluster), and as per the documentation, embedded-zookeeper.enabled should be set to “false” for a non-embedded zookeeper, right?
Or are you saying that from 1.4 we are supposed to use only the embedded zookeeper?
W.r.t. the name attribute, sure, I will remove it.

Will update you. Thanks for helping out.

Yes, that is correct; embedded zookeeper should be set to false in your case.

Thanks a lot… I’ve just started the servers on all 3 test nodes. Will update. Thanks for your help.

Hi @yufeldman, with various modifications and trimming of the config, I still face the error.

So, the dremio service is startable (and its status shows as running). zookeeper (2181) is accessible from the master/slaves, but the other ports (9047, 31010, 45678) are not showing up in netstat. As a test, I checked whether these ports are accessible by listening with “nc -l” and connecting from other servers using telnet, and that worked fine.
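
For reference, this is roughly the kind of check I ran (the ports are the ones above, and the hostname is a placeholder):

# on the dremio node
sudo netstat -tlnp | grep -E '9047|31010|45678'
nc -l 9047    # or "nc -l -p 9047", depending on the nc variant
# from another node
telnet <dremio-master-internal-hostname> 9047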

Do I have to edit /etc/hosts, as I am using IP addresses in dremio.conf for the zookeeper attribute?

The current error:

2018-02-26 07:24:31,328 [main] INFO c.d.s.fabric.FabricServiceImpl - fabric service has 104857600 bytes reserved
2018-02-26 07:24:31,409 [main] INFO c.dremio.dac.daemon.DACDaemonModule - Internal user/group service is configured.
2018-02-26 07:24:32,149 [main] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.net.UnknownHostException: : Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_161]
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_161]
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_161]
at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[na:1.8.0_161]
at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[na:1.8.0_161]
at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_161]
at org.apache.zookeeper.client.StaticHostProvider.(St

You are probably better off using hostnames, and most likely the internal hostnames.
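
A quick sanity check that each node can actually resolve them (the hostname below is a placeholder) would be something like:

# run on every node in the cluster
hostname -f
getent hosts ip-10-x-x-x.us-west-1.compute.internal
# or: nslookup ip-10-x-x-x.us-west-1.compute.internal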

Sure, @yufeldman… I will try replacing the zookeeper attribute with the internal hostnames (AWS EC2) and see.

After replacing the zookeeper attribute across all the dremio.conf files, I am still facing the error, but now for the internal hostname.

2018-02-26 08:42:58,339 [main] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.net.UnknownHostException: ip-<ip_of_master>.us-west-1.compute.internal: Name or service not known

Earlier, I had even tried adding entries to /etc/hosts; as that did not work out, I removed the entries and changed dremio.conf across all servers.

More info: the EMR cluster has only a private DNS name and private IP address. The security group has been added to all the nodes.