Dremio Yarn Kerberos Errors

Hello,

I am setting up Dremio Community Edition (2.0.1rc) to connect to a Kerberos-secured Hadoop environment. My keytab and principal both work, but when I try to set up the connection in Yarn as Hortonworks with a secured cluster, I receive the following error:

java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name

My dremio.conf is set up like this:

paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data/metadata"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
}

services: {
coordinator.enabled: true,
coordinator.master.enabled: true,
executor.enabled: false

}

services.kerberos: {
principal: "dremio@MYREALM_EXAMPLE.com",
keytab.file.path: "/dremio/dremio/conf/key_dremio"
}

Any help is appreciated.

Thanks

Try linking your core-site.xml (and yarn-site.xml) into the directory that contains dremio.conf, which effectively puts them on the Dremio classpath.
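
As a minimal sketch of that linking step, the helper below symlinks the usual Hadoop site files into Dremio's conf directory. The function name link_site_files is hypothetical, and the example paths in the usage note are assumptions about where the cluster configs and Dremio live; adjust both to your installation.

```python
import os

# Site files mentioned in this thread; extend the tuple if you need others.
SITE_FILES = ("core-site.xml", "yarn-site.xml", "hdfs-site.xml")


def link_site_files(hadoop_conf_dir, dremio_conf_dir, names=SITE_FILES):
    """Symlink Hadoop *-site.xml files into the Dremio conf directory.

    Returns the list of link paths that were created. Files missing from
    hadoop_conf_dir, and targets that already exist, are skipped.
    """
    created = []
    for name in names:
        source = os.path.join(hadoop_conf_dir, name)
        link = os.path.join(dremio_conf_dir, name)
        if not os.path.isfile(source):
            continue  # the cluster does not ship this file; nothing to link
        if os.path.lexists(link):
            continue  # never overwrite an existing file or link
        os.symlink(source, link)
        created.append(link)
    return created
```

For example, link_site_files("/etc/hadoop/conf", "/dremio/dremio/conf") would link whichever of the three files exist in the first directory (the /etc/hadoop/conf location is an assumption). Dremio needs a restart afterwards to pick the files up.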

Thank you for the response. I have linked core-site.xml, yarn-site.xml, & hdfs-site.xml. This has given me a new error:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found

I found another post about copying the JAR from /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-common.jar to jars/3rdparty. I then removed the other version of this JAR (2.8 as of this post) from the 3rdparty directory, and Dremio picked up the new JAR after a restart.

I have one last Kerberos error, but I believe it is a permissions issue that I need to work out within my environment.

Thank you for your help; you put me on the right path.

I moved beyond my Kerberos error (I needed to add dremio to /user in HDFS). I can create a Yarn request for virtual workers; however, the workers are stuck in pending, and Yarn shows the following errors in the log:

14:12:05.270 [main-SendThread(localhost:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_141]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[na:1.8.0_141]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
14:12:05.484 [CompositeService STARTING-SendThread(localhost:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_141]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[na:1.8.0_141]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]

Any insight is appreciated.

It seems you are running into a ZooKeeper (ZK) connectivity issue. May I recommend using your HDP cluster's ZK instead of our embedded one? You will need to add the two lines below to your dremio.conf and restart:

zookeeper: "<ZOOKEEPER_HOST_1>:2181,<ZOOKEEPER_HOST_2>:2181"
services.coordinator.master.embedded-zookeeper.enabled: false
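
Before restarting, it can help to confirm that the hosts in the zookeeper string actually accept connections, since the log above shows the client being refused on localhost:2181. The following is a rough sketch using only plain TCP; the helper names parse_zk_quorum and is_reachable are hypothetical, and a successful connect only shows that something is listening on the port, not that ZooKeeper is healthy.

```python
import socket


def parse_zk_quorum(quorum, default_port=2181):
    """Split a 'host1:2181,host2:2181' string into (host, port) tuples."""
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No port given; fall back to the conventional ZooKeeper port.
            host, port = entry, default_port
        servers.append((host, int(port)))
    return servers


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Looping over parse_zk_quorum(...) and calling is_reachable(host, port) for each pair, both on the coordinator and on a Yarn node, shows which side cannot see the quorum.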

You the man Anthony.

My worker is active but not available under the ‘Node Activity’ screen. I see an error on the Node Activity screen stating the following:

“Failed to create directory for spilling. Please check that the spill location is accessible and confirm read, write & execute permissions.”

I am somewhat confused about the location of this spill directory. When using Yarn, does the spill directory live on the Yarn servers, on an HDFS volume, or on the Dremio coordinator? If it lives on an HDFS volume, can anyone give me an example of the syntax?

Thanks

Spilling uses local disk on the Yarn servers, not HDFS, mainly for performance reasons (also, spilling doesn’t require strong durability guarantees).

The default spilling directory is the one configured in dremio.conf (on the coordinator node) under paths.spilling, but when configuring your Yarn cluster you can specify another one. The documentation at https://docs.dremio.com/deployment/yarn-hadoop.html contains details about spilling, in particular which path to choose, so hopefully you should be able to get it working.
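
To check a candidate spill location by hand, a small script like the one below can be run on each Yarn node as the user that runs the containers. It is only a sketch: check_spill_dir is a hypothetical helper, and it tests local filesystem permissions in the same spirit as the error message, not whatever Dremio does internally.

```python
import os
import tempfile


def check_spill_dir(path):
    """Return None if `path` can be created and written to, else an error string."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating and removing a scratch file exercises write and execute
        # permission on the directory for the current user.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return f"{path}: {exc}"
    return None
```

For a spill setting of file:///tmp/dremio, the local path to test would be /tmp/dremio; a return value of None means the directory exists (or was created) and is writable.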

Awesome Laurent. That fixed that issue.

Now… I appear to be getting an ‘Unexpected error occurred’ message on the Node Activity screen. Going through the server.log file on the master, I see the following:

startTime: 1524594931386
provision_id: "container_1524065750213_0014_01_000002"
max_direct_memory: 12884901888
available_cores: 4
.
2018-04-24 18:38:44,309 [FABRIC-rpc-event-queue] INFO c.d.e.w.protector.ForemenWorkManager - A fragment status message arrived post query termination, dropping. Fragment [1:0] reported a state of CANCELLED.
2018-04-24 18:39:45,233 [2520860d-bd8e-9320-124f-fc3f2a3f3200:foreman] WARN c.d.exec.planner.logical.RexToExpr - Converting exact decimal into approximate decimal. Should be fixed once decimal is implemented.
2018-04-24 18:39:45,305 [FABRIC-rpc-event-queue] INFO c.d.exec.work.foreman.QueryManager - Fragment 2520860d-bd8e-9320-124f-fc3f2a3f3200:0:0 failed, cancelling remaining fragments.
2018-04-24 18:39:45,307 [FABRIC-rpc-event-queue] INFO query.logger - {"queryId":"2520860d-bd8e-9320-124f-fc3f2a3f3200","schema":"sys","queryText":"select\n 'green' as status,\n nodes.hostname name,\n nodes.ip_address ip,\n nodes.fabric_port port,\n cpu cpu,\n memory memory \nfrom\n sys.nodes,\n (select\n hostname,\n fabric_port,\n sum(cast(cpuTime as float) / cores) cpu \n from\n sys.threads \n group by\n hostname,\n fabric_port) cpu,\n (select\n hostname,\n fabric_port,\n direct_current * 100.0 / direct_max as memory \n from\n sys.memory) memory \nwhere\n nodes.hostname = cpu.hostname \n and nodes.fabric_port = cpu.fabric_port \n and nodes.hostname = memory.hostname \n and nodes.fabric_port = memory.fabric_port \norder by\n name,\n port","start":1524595185125,"finish":1524595185306,"outcome":"FAILED","username":"admin","commandDescription":"execute; query"}
2018-04-24 18:39:45,315 [FABRIC-rpc-event-queue] INFO c.d.e.w.protector.ForemenWorkManager - A node query status message arrived post query termination, dropping. Query [2520860d-bd8e-9320-124f-fc3f2a3f3200] from node address: "<DATA_NODE_HOST_WAS_HERE>"
user_port: -1
fabric_port: 46011
roles {
sql_query: false
java_executor: true
master: false
}

This error means nothing to me so any thoughts including other places to look for errors are highly appreciated.

You might get more information from the query profile (you should see the corresponding query in the Jobs page, if you choose to display Internal queries too).

Thanks for the reply, Laurent. Internal queries did give me more information. The following error was highlighted, but the user ID running the application owns all directories on this drive. Any ideas?

Error:
IOException: Mkdirs failed to create /dremio/dremio/data/pdfs/results/.25205143-9c9c-305d-3c1e-1833fbdbf800-1524608699718

Regarding path:
1st dremio = mount
2nd dremio = symbolic link to dremio2.0.1

The only folder in /data is the ‘db’ folder.

Thanks

I didn’t realize this when you posted your configuration, but the default distributed filesystem used internally by Dremio is named PDFS, and it requires all nodes to be able to write to the same local path. Since you are using Yarn and you also have an HDFS cluster, I would recommend switching to HDFS by changing paths.dist in dremio.conf to a directory that all Dremio nodes can write to.

Summary

  • I had overlooked adding the classpath to dremio-env. I did a fresh install, confirmed that it came up with my dremio.conf setup, then altered dremio-env and received the error below.
  • Additionally, I am not sure how I can turn impersonation on for dremio in HDP. I thought using Kerberos turned off that feature.
  • core-site.xml copied to /conf directory

Software Stack

  • Dremio - v2.0.1 Community - Yarn install with Kerberos authentication to Hadoop
  • HortonWorks - HDP-2.6.2.0-205 - Kerberos implemented

Error after adding hadoop class path to dremio-env:

Dremio Daemon Started as master
Wed Apr 25 16:46:15 UTC 2018 Terminating dremio pid 3177982
Wed Apr 25 16:46:32 UTC 2018 Starting dremio on <MY_HOSTNAME>
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257602
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
18/04/25 16:46:33 WARN util.GuavaPatcher: Unable to patch Guava classes.
javassist.CannotCompileException: [source error] elapsed(java.util.concurrent.TimeUnit) not found in com.google.common.base.Stopwatch
at javassist.CtNewMethod.make(CtNewMethod.java:79)
at javassist.CtNewMethod.make(CtNewMethod.java:45)
at com.dremio.exec.util.GuavaPatcher.patchStopwatch(GuavaPatcher.java:65)
at com.dremio.exec.util.GuavaPatcher.patch(GuavaPatcher.java:35)
at com.dremio.dac.daemon.DremioDaemon.(DremioDaemon.java:61)
Caused by: compile error: elapsed(java.util.concurrent.TimeUnit) not found in com.google.common.base.Stopwatch
at javassist.compiler.TypeChecker.atMethodCallCore(TypeChecker.java:723)
at javassist.compiler.TypeChecker.atCallExpr(TypeChecker.java:688)
at javassist.compiler.JvstTypeChecker.atCallExpr(JvstTypeChecker.java:157)
at javassist.compiler.ast.CallExpr.accept(CallExpr.java:46)
at javassist.compiler.CodeGen.doTypeCheck(CodeGen.java:242)
at javassist.compiler.CodeGen.compileExpr(CodeGen.java:229)
at javassist.compiler.CodeGen.atReturnStmnt2(CodeGen.java:598)
at javassist.compiler.JvstCodeGen.atReturnStmnt(JvstCodeGen.java:425)
at javassist.compiler.CodeGen.atStmnt(CodeGen.java:363)
at javassist.compiler.ast.Stmnt.accept(Stmnt.java:50)
at javassist.compiler.CodeGen.atStmnt(CodeGen.java:351)
at javassist.compiler.ast.Stmnt.accept(Stmnt.java:50)
at javassist.compiler.CodeGen.atMethodBody(CodeGen.java:292)
at javassist.compiler.CodeGen.atMethodDecl(CodeGen.java:274)
at javassist.compiler.ast.MethodDecl.accept(MethodDecl.java:44)
at javassist.compiler.Javac.compileMethod(Javac.java:169)
at javassist.compiler.Javac.compile(Javac.java:95)
at javassist.CtNewMethod.make(CtNewMethod.java:74)
… 4 more
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at com.dremio.common.config.SabotConfig.create(SabotConfig.java:227)
at com.dremio.common.config.SabotConfig.create(SabotConfig.java:210)
at com.dremio.common.config.SabotConfig.create(SabotConfig.java:151)
at com.dremio.config.DremioConfig.create(DremioConfig.java:221)
at com.dremio.config.DremioConfig.create(DremioConfig.java:216)
at com.dremio.dac.server.DACConfig.newConfig(DACConfig.java:195)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:121)

dremio.conf

paths: {
local: ${DREMIO_HOME}"/data"
dist: "hdfs:///user/dremio/cache"
}

services: {
coordinator.enabled: true,
coordinator.master.enabled: true,
executor.enabled: false

}

services.kerberos: {
principal: "dremio@MY_REALM",
keytab.file.path: "/dremio/dremio/conf/key_dremio"
}

zookeeper: "MY_ZOOKEEPER_INSTANCES:2181",
services.coordinator.master.embedded-zookeeper.enabled: false

dremio-env

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

DREMIO_CLASSPATH_USER_FIRST=/usr/hdp/2.6.2.0-205/hadoop/conf:/usr/hdp/2.6.2.0-205/hadoop/lib/:/usr/hdp/2.6.2.0-205/hadoop/.//:/usr/hdp/2.6.2.0-205/hadoop-hdfs/./:/usr/hdp/2.6.2.0-205/hadoop-hdfs/lib/:/usr/hdp/2.6.2.0-205/hadoop-hdfs/.//:/usr/hdp/2.6.2.0-205/hadoop-yarn/lib/:/usr/hdp/2.6.2.0-205/hadoop-yarn/.//:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/lib/:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/.//::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java-5.1.40-bin.jar:/usr/share/java/mysql-connector-java.jar:/usr/hdp/current/hadoop-mapreduce-client/:/usr/hdp/2.6.2.0-205/tez/:/usr/hdp/2.6.2.0-205/tez/lib/*:/usr/hdp/2.6.2.0-205/tez/conf

Again, any insight is appreciated. If this can’t be figured out I’ll go back to a non-yarn deployment of Dremio.

Thanks

The documentation states to add the Hadoop config to the classpath but not the whole hadoop classpath. Dremio comes with its own Hadoop version, and adding your cluster version might create classpath issues (like the one you experienced).
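
The NoSuchMethodError on Stopwatch.createStarted() in the output above is the kind of failure an older Guava jar on the classpath tends to produce. As a hedged sketch for spotting such jars, the helper below lists every file matching guava-*.jar in the directories named on a classpath string. find_jars is a hypothetical name, and the script only inspects file names; it does not know which jar the JVM actually loads first.

```python
import fnmatch
import glob
import os


def find_jars(classpath, pattern="guava-*.jar"):
    """List jars matching `pattern` found via the entries of a classpath string.

    Directory entries and 'dir/*' wildcard entries are searched for matching
    files; entries that are files themselves are matched by base name.
    """
    matches = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        if entry.endswith("*"):
            entry = entry[:-1]  # 'lib/*' means every jar inside lib/
        if os.path.isdir(entry):
            matches.extend(sorted(glob.glob(os.path.join(entry, pattern))))
        elif fnmatch.fnmatch(os.path.basename(entry), pattern):
            matches.append(entry)
    return matches
```

Running find_jars on the value assigned to DREMIO_CLASSPATH_USER_FIRST and getting a non-empty list suggests that a cluster-provided Guava is being put ahead of the one Dremio ships with.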

Laurent,

Thank you for your feedback. I am trying a few different classpaths in an attempt to get it working. Perhaps I am missing something obvious. Do you have a classpath you would recommend, based on the entire Hadoop classpath I added in my previous reply? I am just iterating through them and receiving different errors in different places each time.

Thanks

Please try adding only the Hadoop config directory to your classpath, and only if you did not link your *-site.xml files into the Dremio conf directory.

Okay, two different implementations, two different errors.

FIRST SETUP

  • No core-site.xml in conf

dremio.conf

No changes from previous post.

dremio-env

DREMIO_CLASSPATH_USER_FIRST=/usr/hdp/2.6.2.0-205/hadoop/:/usr/hdp/2.6.2.0-205/hadoop-hdfs/:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/:/usr/hdp/2.6.2.0-205/hadoop-yarn/
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

ERROR

Wed Apr 25 20:25:26 UTC 2018 Starting dremio on MY_HOST
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257602
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at com.dremio.common.config.SabotConfig.create(SabotConfig.java:227)
at com.dremio.common.config.SabotConfig.create(SabotConfig.java:210)
at com.dremio.common.config.SabotConfig.create(SabotConfig.java:151)
at com.dremio.config.DremioConfig.create(DremioConfig.java:221)
at com.dremio.config.DremioConfig.create(DremioConfig.java:216)
at com.dremio.dac.server.DACConfig.newConfig(DACConfig.java:195)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:121)

SECOND SETUP

  • core-site.xml in conf

dremio.conf

No changes from previous post.

dremio-env

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

ERROR

Wed Apr 25 20:27:43 UTC 2018 Starting dremio on MY_HOST
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257602
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Exception in thread "main" java.lang.NullPointerException: No Cluster Identity found
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:226)
at com.dremio.dac.daemon.DremioDaemon.checkVersion(DremioDaemon.java:89)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:125)

Second scenario sounds better.

Could you try to clean up content of:

and restart coordinator

Thanks for the quick turnaround.

Commands

rm -rf data && mkdir data
bin/dremio start

ERROR

Wed Apr 25 20:43:10 UTC 2018 Starting dremio on MY_HOST
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257602
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
com.dremio.common.exceptions.UserException: Tried to access non-existent source [__jobResultsStore].
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:746)
at com.dremio.exec.catalog.CatalogServiceImpl.synchronize(CatalogServiceImpl.java:407)
at com.dremio.exec.catalog.CatalogServiceImpl.getPlugin(CatalogServiceImpl.java:813)
at com.dremio.exec.catalog.CatalogServiceImpl.getSource(CatalogServiceImpl.java:841)
at com.dremio.dac.daemon.DACDaemonModule$4.get(DACDaemonModule.java:381)
at com.dremio.dac.daemon.DACDaemonModule$4.get(DACDaemonModule.java:376)
at com.dremio.service.jobs.LocalJobsService.start(LocalJobsService.java:258)
at com.dremio.service.SingletonRegistry$AbstractServiceReference.start(SingletonRegistry.java:137)
at com.dremio.service.ServiceRegistry.start(ServiceRegistry.java:74)
at com.dremio.service.SingletonRegistry.start(SingletonRegistry.java:33)
at com.dremio.dac.daemon.DACDaemon.startServices(DACDaemon.java:171)
at com.dremio.dac.daemon.DACDaemon.init(DACDaemon.java:177)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:131)

2nd & 3rd Run - no changes to configurations
ERROR

Wed Apr 25 20:45:01 UTC 2018 Starting dremio on MY_HOST
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257602
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Exception in thread "main" java.lang.NullPointerException: No Cluster Identity found
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:226)
at com.dremio.dac.daemon.DremioDaemon.checkVersion(DremioDaemon.java:89)
at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:125)

Could you try to delete

and restart again, but this time look not just at server.out but also at server.log.
It feels like some directories on HDFS failed to be created, and that is causing those failures.

Which actually brings me to this point:

hdfs:///user/dremio/cache

is not exactly a valid NameNode URI.
Should it be something like:
hdfs://namenode_hostname:port/user/dremio/cache
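
A quick way to tell the two forms apart is to parse the URI and see whether it names a host. This is only an illustrative check using Python's standard urllib; has_explicit_namenode is a hypothetical helper and it does not contact the cluster.

```python
from urllib.parse import urlparse


def has_explicit_namenode(uri):
    """Return True if an hdfs:// URI names a NameNode host (port optional)."""
    parsed = urlparse(uri)
    return parsed.scheme == "hdfs" and bool(parsed.hostname)
```

With this check, hdfs:///user/dremio/cache is reported as having no NameNode, while a URI of the hdfs://namenode_hostname:8020/user/dremio/cache form passes (8020 here is just the port used later in this thread).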

yufeldman,

This worked. Dremio starts and stops without problems. I see the files created in the HDFS location (I cleared them before starting Dremio), and /data only has the db folder, which seems correct.

Provisioning Yarn seems to hang and not move forward. The following error is given.

ERROR

java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "DREMIO-COORD-HOST/HOST-IP"; destination host is: "ACTIVE_YARN_NODE":8032;

How can I specify my Kerberos information to Yarn? I have already listed it in my dremio.conf file.

dremio.conf

paths: {
local: ${DREMIO_HOME}"/data"
dist: "hdfs://HDFS_NAME_NODE:8020/user/dremio/cache"
}

services: {
coordinator.enabled: true,
coordinator.master.enabled: true,
executor.enabled: false

}

services.kerberos: {
principal: "dremio@MY_REALM",
keytab.file.path: "/dremio/dremio/conf/key_dremio"
}

zookeeper: "ZOOKEEPER_INSTANCE_1:2181,ZOOKEEPER_INSTANCE_2:2181, ect:2181",
services.coordinator.master.embedded-zookeeper.enabled: false

Yarn Provision via GUI

Resource Manager: Hostname
Namenode: hdfs://hostname:8020
Spill: file:///tmp/dremio
queue: dremio