I followed the steps described here (https://docs.dremio.com/deployment/yarn-hadoop.html) to run Dremio with YARN on AWS EMR.
While attempting to view data through the Hive source, I get the following exception in /opt/dremio/log/server.log.
Caused by: java.lang.ClassNotFoundException: com.uber.hoodie.hadoop.HoodieInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_162]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_162]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338) ~[na:1.8.0_162]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_162]
at java.lang.Class.forName0(Native Method) ~[na:1.8.0_162]
at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_162]
at com.dremio.exec.store.hive.DatasetBuilder.getInputFormatClass(DatasetBuilder.java:773) ~[dremio-hive-plugin-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
HoodieInputFormat is in our custom jar, which is also included in the Hive aux jars.
I can query the data using Hive directly, but not through Dremio.
I am not sure where this jar should be copied so Dremio can find it. Can anyone help me with this?
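For context, the Hive table is declared with that class as its input format, which is why the lookup happens when Dremio reads the table metadata. A rough sketch of the definition (table, column, bucket, and path names are simplified placeholders; the SerDe and output format shown are the standard Parquet ones):

-- illustrative only; the INPUTFORMAT line is what triggers the class lookup in Dremio
CREATE EXTERNAL TABLE hoodie_table_example (
  `_hoodie_commit_time` string,
  id bigint,
  value string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'com.uber.hoodie.hadoop.HoodieInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://my-bucket/path/to/hoodie_table';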
You can copy the custom jar into <DREMIO_INSTALLATION_DIRECTORY>/jars/3rdparty/
One thing to note: Dremio currently uses Hive client jars based on Hive 1.2.1. If the custom jar depends on a higher or lower Hive version than 1.2.1, you may see class-not-found, no-such-method, and similar errors.
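For example, on each node where Dremio runs, something like the following should do it (assuming /opt/dremio is your installation directory; adjust the jar name and path to your environment, and restart Dremio however it is managed in your deployment):

# copy the custom jar into Dremio's 3rd-party jar directory (paths are assumptions)
sudo cp /path/to/your-hoodie-custom.jar /opt/dremio/jars/3rdparty/
# restart the Dremio service so the jar is picked up on the classpath
sudo service dremio restart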
Thanks, that worked. I had to change my Hive version too for it to work.
Since the Hive data is in S3, I am now getting the following error.
I have included the needed S3 properties in core-site.xml and also when adding the source to Dremio. Where else should these be set?
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74) ~[hadoop-aws-2.8.0.jar:na]
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:94) ~[hadoop-aws-2.8.0.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_162]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_162]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_162]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_162]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) ~[hadoop-common-2.8.0.jar:na]
at com.sun.proxy.$Proxy126.initialize(Unknown Source) ~[na:na]
at org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:111) ~[hadoop-aws-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) ~[hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) ~[hadoop-common-2.8.0.jar:na]
at com.dremio.exec.store.dfs.FileSystemWrapper.get(FileSystemWrapper.java:129) ~[dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.store.hive.DatasetBuilder.addInputPath(DatasetBuilder.java:626) ~[dremio-hive-plugin-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
Assuming you are trying to access S3-backed Hive tables on Amazon EMR, did you add the extra setting per our docs here? If yes and it still doesn't work, you can also try adding fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey to the connection string options as well.
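For reference, the two properties named in the error can be set in the core-site.xml that Dremio picks up (or added as name/value pairs on the source), roughly like this, with placeholder values:

<!-- S3 credentials for the s3:// filesystem; property names are taken from the error above, values are placeholders -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>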