Dremio Connection With Remote Hive

Hi All,

I'm new to the Dremio community. I need to connect Dremio on my laptop to my Hive server, but it's giving an error, whereas if Dremio and Hive are both on the same server, it works fine.

Hive Server IP : 192.168.0.72
Dremio installed on : 192.168.0.192

We definitely allow Hive to be on a different server. In fact, most (if not all) of our deployments working with Hive have it on another box. What exactly does your error say? I can’t see the full error from the screenshot (I need to scroll further to the right). I am guessing you may be running into a network connectivity issue where you have to open/allow ports, as sketched below.
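
A quick way to rule out basic networking is to test the relevant ports from the Dremio host; a hedged sketch, assuming the usual default ports (your cluster's may differ):

# Run from the Dremio host (192.168.0.192); -z only tests the connection.
nc -zv 192.168.0.72 9083   # Hive metastore (default port)
nc -zv 192.168.0.72 8020   # HDFS NameNode RPC (8020 or 9000 are common)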

Hi Anthony,

Tried everything… please refer to the error below:

PLAN ERROR: Failure while retrieving metadata for table hive.dremio.sample.

Sql Query SELECT *
FROM hive.dremio.sample

(java.net.ConnectException) Call From vikas/192.168.0.192 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
sun.reflect.NativeConstructorAccessorImpl.newInstance():62
sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
java.lang.reflect.Constructor.newInstance():423
org.apache.hadoop.net.NetUtils.wrapWithMessage():801
org.apache.hadoop.net.NetUtils.wrapException():732
org.apache.hadoop.ipc.Client.getRpcResponse():1485
org.apache.hadoop.ipc.Client.call():1427
org.apache.hadoop.ipc.Client.call():1337
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():227
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():116
com.sun.proxy.$Proxy116.getFileInfo():-1
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo():787
sun.reflect.NativeMethodAccessorImpl.invoke0():-2
sun.reflect.NativeMethodAccessorImpl.invoke():62
sun.reflect.DelegatingMethodAccessorImpl.invoke():43
java.lang.reflect.Method.invoke():498
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod():398
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod():163
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke():155
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce():95
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke():335
com.sun.proxy.$Proxy117.getFileInfo():-1
org.apache.hadoop.hdfs.DFSClient.getFileInfo():1700
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall():1436
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall():1433
org.apache.hadoop.fs.FileSystemLinkResolver.resolve():81
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus():1433
org.apache.hadoop.fs.FileSystem.exists():1436
com.dremio.exec.store.dfs.FileSystemWrapper.exists():942
com.dremio.exec.store.hive.DatasetBuilder.addInputPath():626
com.dremio.exec.store.hive.DatasetBuilder.buildSplits():442
com.dremio.exec.store.hive.DatasetBuilder.buildIfNecessary():285
com.dremio.exec.store.hive.DatasetBuilder.getDataset():204
com.dremio.exec.store.SimpleSchema.getTableFromSource():366
com.dremio.exec.store.SimpleSchema.getTableWithRegistry():284
com.dremio.exec.store.SimpleSchema.getTable():406
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():67
org.apache.calcite.jdbc.CalciteSchema.getTable():219
org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom():117
org.apache.calcite.prepare.CalciteCatalogReader.getTable():106
org.apache.calcite.prepare.CalciteCatalogReader.getTable():73
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.SqlSelect.validate():208
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
com.dremio.exec.planner.sql.SqlConverter.validate():188
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():165
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():153
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
com.dremio.exec.work.foreman.AttemptManager.run():293
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
Caused By (java.net.ConnectException) Connection refused
sun.nio.ch.SocketChannelImpl.checkConnect():-2
sun.nio.ch.SocketChannelImpl.finishConnect():717
org.apache.hadoop.net.SocketIOWithTimeout.connect():206
org.apache.hadoop.net.NetUtils.connect():531
org.apache.hadoop.net.NetUtils.connect():495
org.apache.hadoop.ipc.Client$Connection.setupConnection():681
org.apache.hadoop.ipc.Client$Connection.setupIOstreams():777
org.apache.hadoop.ipc.Client$Connection.access$3500():409
org.apache.hadoop.ipc.Client.getConnection():1542
org.apache.hadoop.ipc.Client.call():1373
org.apache.hadoop.ipc.Client.call():1337
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():227
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():116
com.sun.proxy.$Proxy116.getFileInfo():-1
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo():787
sun.reflect.NativeMethodAccessorImpl.invoke0():-2
sun.reflect.NativeMethodAccessorImpl.invoke():62
sun.reflect.DelegatingMethodAccessorImpl.invoke():43
java.lang.reflect.Method.invoke():498
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod():398
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod():163
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke():155
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce():95
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke():335
com.sun.proxy.$Proxy117.getFileInfo():-1
org.apache.hadoop.hdfs.DFSClient.getFileInfo():1700
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall():1436
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall():1433
org.apache.hadoop.fs.FileSystemLinkResolver.resolve():81
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus():1433
org.apache.hadoop.fs.FileSystem.exists():1436
com.dremio.exec.store.dfs.FileSystemWrapper.exists():942
com.dremio.exec.store.hive.DatasetBuilder.addInputPath():626
com.dremio.exec.store.hive.DatasetBuilder.buildSplits():442
com.dremio.exec.store.hive.DatasetBuilder.buildIfNecessary():285
com.dremio.exec.store.hive.DatasetBuilder.getDataset():204
com.dremio.exec.store.SimpleSchema.getTableFromSource():366
com.dremio.exec.store.SimpleSchema.getTableWithRegistry():284
com.dremio.exec.store.SimpleSchema.getTable():406
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():67
org.apache.calcite.jdbc.CalciteSchema.getTable():219
org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom():117
org.apache.calcite.prepare.CalciteCatalogReader.getTable():106
org.apache.calcite.prepare.CalciteCatalogReader.getTable():73
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.SqlSelect.validate():208
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
com.dremio.exec.planner.sql.SqlConverter.validate():188
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():165
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():153
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
com.dremio.exec.work.foreman.AttemptManager.run():293
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748

Could you try to include core-site.xml on the Dremio classpath by linking it into the Dremio conf directory?
Or maybe try to include the NameNode as an advanced property while setting up the Hive source.
It feels like Dremio can’t connect to HDFS because it does not know the NameNode host/port.

Looks like there’s a reference to localhost somewhere in Hive or HDFS that gets passed to Dremio, and Dremio tries to use that hostname to connect to the NameNode.

Could you post a screenshot of the source configuration within Dremio and copies of core-site.xml and hdfs-site.xml?

Here is a screenshot from setting up the Hive source:

By clicking on “Add Property” you should be able to add a property defining the NameNode location.

As far as having core-site.xml on the Dremio classpath goes, it depends on your installation: you either copy core-site.xml into the “conf” directory of your Dremio installation or link core-site.xml to that location (see the sketch below).
Not sure a screenshot is going to help here, since this is done on the command line.
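
For reference, a minimal sketch of what could go into that file, assuming the NameNode from this thread listens on 192.168.0.72:9000 and a hypothetical install path of /opt/dremio; the same fs.defaultFS name/value pair is what you would enter via “Add Property”:

# Hypothetical paths; adjust to your installation.
cat > /opt/dremio/conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.72:9000</value>
  </property>
</configuration>
EOF
# Or symlink the cluster's copy instead of creating a new file:
# ln -s /etc/hadoop/conf/core-site.xml /opt/dremio/conf/core-site.xml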

I have copied core-site.xml into the Dremio conf folder, but it's still not working.

Hi Vikas,

Could you upload a screenshot of the source configuration within Dremio and copies of core-site.xml and hdfs-site.xml?

Hi Jason,
Please check the attached files.

files.zip (3.1 KB)

It looks like the issue is that fs.defaultFS is set to hdfs://localhost:9000, so localhost is the host that Dremio gets from the configuration to access HDFS. Change this property in Ambari/Cloudera Manager to hdfs://192.168.0.72:9000, restart Hadoop (or the necessary services), then remove the source from Dremio and re-add it.
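
After the change, a quick sanity check (a hedged sketch; run wherever the Hadoop client configuration is installed) confirms what the configuration now advertises:

hdfs getconf -confKey fs.defaultFS   # should now print hdfs://192.168.0.72:9000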

Hi Jason,

I tried all the steps and removed everything hardcoded to localhost. I have configured core-site.xml as described above.

Now I get a different kind of error:
DATA_READ ERROR: Failed to initialize Hive record reader Dataset split key hdfs://sandbox.kylo.io/user/hive/warehouse/test.db/t_customer__0
HIVE_SUB_SCAN Location 0:0:7 SqlOperatorImpl HIVE_SUB_SCAN Location 0:0:7 Fragment 0:0
[Error Id: 2df0ce62-cba8-4339-ae9f-d43da058ccfc on localhost:31010]
(org.apache.hadoop.hdfs.BlockMissingException) Could not obtain block: BP-82876869-127.0.0.1-1527287107078:blk_1073742969_2157 file=/user/hive/warehouse/test.db/t_customer/20180925_162010_00014_utnbv_090cff90-1bf9-47df-bad7-65a2c9c52a1c
org.apache.hadoop.hdfs.DFSInputStream.refetchLocations():1052
org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode():1036
org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode():1015
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo():647
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy():926
org.apache.hadoop.hdfs.DFSInputStream.read():982
java.io.DataInputStream.readFully():195
java.io.DataInputStream.readFully():169
org.apache.hadoop.hive.ql.io.RCFile$Reader.init():1462
org.apache.hadoop.hive.ql.io.RCFile$Reader.():1363
org.apache.hadoop.hive.ql.io.RCFile$Reader.():1343
org.apache.hadoop.hive.ql.io.RCFileRecordReader.():100
org.apache.hadoop.hive.ql.io.RCFileInputFormat.getRecordReader():58
com.dremio.exec.store.hive.exec.HiveRCFileReader.internalInit():52
com.dremio.exec.store.hive.exec.HiveAbstractReader.setup():198
com.dremio.sabot.op.scan.ScanOperator$1.run():189
com.dremio.sabot.op.scan.ScanOperator$1.run():185
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1836
com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():185
com.dremio.sabot.op.scan.ScanOperator.setupReader():177
com.dremio.sabot.op.scan.ScanOperator.setup():163
com.dremio.sabot.driver.SmartOp$SmartProducer.setup():560
com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():79
com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():63
com.dremio.sabot.driver.SmartOp$SmartProducer.accept():530
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.Pipeline.setup():58
com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():344
com.dremio.sabot.exec.fragment.FragmentExecutor.run():234
com.dremio.sabot.exec.fragment.FragmentExecutor.access$800():86
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():591
com.dremio.sabot.task.AsyncTaskWrapper.run():107
com.dremio.sabot.task.slicing.SlicingThread.run():102

Note: the above works if I have the worker disabled and only the master executing. The master is running on the VM with Hadoop, and the worker is on another VM with just vanilla Linux.
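
One hedged diagnostic sketch: the block pool ID above embeds 127.0.0.1, which suggests HDFS registered itself on loopback, so a remote worker is told to fetch blocks from an address only the Hadoop VM can reach. Listing the block locations the NameNode reports can confirm this (run on a host with a working Hadoop client configuration):

hdfs fsck /user/hive/warehouse/test.db/t_customer -files -blocks -locations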

Hi @cbhatnagar7101,

Could you provide core-site.xml, hdfs-site.xml, and the server.log from the executor that is on the other VM (not the Hadoop VM)? And also the profile from the failed query.

Thanks,
Jason

A “Failed to initialize Hive record reader” error with a dataset split key can be caused by security issues. Can you talk a little bit about your cluster’s security setup? Also, hive-site.xml would help too.

Hi Jason and Anthony,

My setup is simple. It is a single-node VM from Kylo. I have disabled the firewalls to ensure no security issues are causing this. I have attached:

  • core-site.xml and hive-site.xml
  • Query profiles
  • Server log from the executor

Thanks for the help. server.zip (53.5 KB)

Thanks, guys. I figured it out. It was the way MySQL was behaving when Hive tried to log in to it. I used a new hosts entry to tackle this and brought the Hive services up with another one. It worked.
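
For anyone landing here later, a hypothetical illustration of the hosts-entry technique (the actual entry used was not shared in the thread); it maps a hostname the services resolve, here the sandbox hostname from the split key above, to an address reachable from the other VM:

echo '192.168.0.72  sandbox.kylo.io' | sudo tee -a /etc/hosts   # hypothetical mapping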