Exception while accessing Parquet format tables from Hive

While querying Hive tables stored in Parquet format on HDFS, I am getting an exception where Dremio tries to reach what I assume is the datanode on port 8020, even though I have not configured a different port for the whole Hive, Hadoop, and Spark cluster. I have also added the required hive-site.xml, core-site.xml, yarn-site.xml, and mapred-site.xml files under /opt/dremio/plugins/connectors/hive2.d/conf.

I think it is falling back to the default port. I tried changing the value of fs.defaultFS in core-site.xml to include the port, and also tried removing the port from that property, but it still uses the default 8020 port.
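To double-check which value the Dremio Hive plugin actually picks up, this is roughly what I have been inspecting (paths are the ones from my setup above; the grep is just a quick sketch, an XML-aware tool would be more robust):

```bash
# List the config files the Hive 2.x plugin picks up (conf path from this thread).
ls -l /opt/dremio/plugins/connectors/hive2.d/conf

# Show the fs.defaultFS value in the copy Dremio reads.
grep -A1 'fs.defaultFS' /opt/dremio/plugins/connectors/hive2.d/conf/core-site.xml
```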

Exception trail:

2022-06-03 12:33:03,985 [out-of-band-observer] INFO query.logger - {“queryId”:“1d660081-7978-40a4-03d4-cfc427f70200”,“queryText”:“SELECT * FROM “XXXXXXXXXX DATALAKE POC.datalake_parquet_stg”.supplier_p”,“start”:1654259582786,“finish”:1654259583632,“outcome”:“FAILED”,“outcomeReason”:“RESOURCE ERROR: Call From HOSTNAME/192.168.15.121 to HOSTNAME_1.XXXXXXXXXX.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused\n\nSqlOperatorImpl HIVE_SUB_SCAN\nLocation 0:0:3\nSqlOperatorImpl HIVE_SUB_SCAN\nLocation 0:0:3\nFragment 0:0\n\n[Error Id: baf753d0-c760-459a-8389-96697ceb1a38 on HOSTNAME.XXXXXXXXXX.net:0]\n\n (java.net.ConnectException) Call From HOSTNAME/192.168.15.121 to HOSTNAME_1.XXXXXXXXXX.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused\n sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2\n sun.reflect.NativeConstructorAccessorImpl.newInstance():62\n sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45\n java.lang.reflect.Constructor.newInstance():423\n org.apache.hadoop.net.NetUtils.wrapWithMessage():833\n org.apache.hadoop.net.NetUtils.wrapException():757\n org.apache.hadoop.ipc.Client.getRpcResponse():1549\n org.apache.hadoop.ipc.Client.call():1491\n org.apache.hadoop.ipc.Client.call():1388\n org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():233\n org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():118\n com.sun.proxy.$Proxy162.getFileInfo():-1\n org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo():907\n sun.reflect.NativeMethodAccessorImpl.invoke0():-2\n sun.reflect.NativeMethodAccessorImpl.invoke():62\n sun.reflect.DelegatingMethodAccessorImpl.invoke():43\n java.lang.reflect.Method.invoke():498\n org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod():422\n org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod():165\n org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke():157\n org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce():95\n org.apache.hadoop.io.retry.RetryInvocationHandler.invoke():359\n com.sun.proxy.$Proxy163.getFileInfo():-1\n org.apache.hadoop.hdfs.DFSClient.getFileInfo():1666\n org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall():1576\n org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall():1573\n org.apache.hadoop.fs.FileSystemLinkResolver.resolve():81\n org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus():1588\n com.dremio.exec.hadoop.HadoopFileSystem.getFileAttributes():229\n com.dremio.exec.store.hive.exec.DremioFileSystem.getFileStatus():347\n com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():232\n com.dremio.exec.store.hive.exec.FileSplitParquetRecordReader.createInputStreamProvider():232\n com.dremio.exec.store.hive.exec.FileSplitParquetRecordReader.setup():293\n com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():85\n com.dremio.exec.store.HiveParquetCoercionReader.setup():128\n com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():281\n com.dremio.sabot.op.scan.ScanOperator.setupReader():251\n com.dremio.sabot.op.scan.ScanOperator.setup():237\n com.dremio.sabot.driver.SmartOp$SmartProducer.setup():563\n com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():79\n com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():63\n 
com.dremio.sabot.driver.SmartOp$SmartProducer.accept():533\n com.dremio.sabot.driver.StraightPipe.setup():102\n com.dremio.sabot.driver.StraightPipe.setup():102\n com.dremio.sabot.driver.StraightPipe.setup():102\n com.dremio.sabot.driver.Pipeline.setup():68\n com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():391\n com.dremio.sabot.exec.fragment.FragmentExecutor.run():273\n com.dremio.sabot.exec.fragment.FragmentExecutor.access$1400():94\n com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():709\n com.dremio.sabot.task.AsyncTaskWrapper.run():112\n com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():225\n com.dremio.sabot.task.slicing.SlicingThread.run():156\n Caused By (java.net.ConnectException) Connection refused\n sun.nio.ch.SocketChannelImpl.checkConnect():-2\n sun.nio.ch.SocketChannelImpl.finishConnect():717\n org.apache.hadoop.net.SocketIOWithTimeout.connect():206\n org.apache.hadoop.net.NetUtils.connect():533\n org.apache.hadoop.ipc.Client$Connection.setupConnection():700\n org.apache.hadoop.ipc.Client$Connection.setupIOstreams():804\n org.apache.hadoop.ipc.Client$Connection.access$3800():421\n org.apache.hadoop.ipc.Client.getConnection():1606\n org.apache.hadoop.ipc.Client.call():1435\n org.apache.hadoop.ipc.Client.call():1388\n org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():233\n org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke():118\n com.sun.proxy.$Proxy162.getFileInfo():-1\n org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo():907\n sun.reflect.NativeMethodAccessorImpl.invoke0():-2\n sun.reflect.NativeMethodAccessorImpl.invoke():62\n sun.reflect.DelegatingMethodAccessorImpl.invoke():43\n java.lang.reflect.Method.invoke():498\n org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod():422\n org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod():165\n org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke():157\n org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce():95\n org.apache.hadoop.io.retry.RetryInvocationHandler.invoke():359\n com.sun.proxy.$Proxy163.getFileInfo():-1\n org.apache.hadoop.hdfs.DFSClient.getFileInfo():1666\n org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall():1576\n org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall():1573\n org.apache.hadoop.fs.FileSystemLinkResolver.resolve():81\n org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus():1588\n com.dremio.exec.hadoop.HadoopFileSystem.getFileAttributes():229\n com.dremio.exec.store.hive.exec.DremioFileSystem.getFileStatus():347\n com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():232\n com.dremio.exec.store.hive.exec.FileSplitParquetRecordReader.createInputStreamProvider():232\n com.dremio.exec.store.hive.exec.FileSplitParquetRecordReader.setup():293\n com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():85\n com.dremio.exec.store.HiveParquetCoercionReader.setup():128\n com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():281\n com.dremio.sabot.op.scan.ScanOperator.setupReader():251\n com.dremio.sabot.op.scan.ScanOperator.setup():237\n com.dremio.sabot.driver.SmartOp$SmartProducer.setup():563\n com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():79\n com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():63\n com.dremio.sabot.driver.SmartOp$SmartProducer.accept():533\n com.dremio.sabot.driver.StraightPipe.setup():102\n com.dremio.sabot.driver.StraightPipe.setup():102\n 
com.dremio.sabot.driver.StraightPipe.setup():102\n com.dremio.sabot.driver.Pipeline.setup():68\n com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():391\n com.dremio.sabot.exec.fragment.FragmentExecutor.run():273\n com.dremio.sabot.exec.fragment.FragmentExecutor.access$1400():94\n com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():709\n com.dremio.sabot.task.AsyncTaskWrapper.run():112\n com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():225\n com.dremio.sabot.task.slicing.SlicingThread.run():156\n”,“username”:“abc”}

@lyashninan First, the HDFS client talks to the Namenode (not the datanode) on the configured port (usually 8020). Can you check whether access to the NN on port 8020 is open, e.g. that no firewall rule is blocking it?
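For example, something like this from the Dremio executor host should tell you whether the NameNode RPC port is reachable (hostname and port are taken from the error above; adjust if your NameNode listens elsewhere, and note that nc may not be installed on every host):

```bash
# From the Dremio executor host, check that the NameNode RPC port is reachable.
nc -vz HOSTNAME_1.XXXXXXXXXX.net 8020

# On a cluster node with the HDFS client installed, print the address the
# cluster itself uses for fs.defaultFS.
hdfs getconf -confKey fs.defaultFS
```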

@balaji.ramaswamy - The Dremio conf path (/opt/dremio/plugins/connectors/hive2.d/conf) has a copy of the same Hadoop and Hive configs. Is there any other rule that needs to be provided?

hadoop.proxyuser.dremio.groups and hadoop.proxyuser.dremio.users are also present in core-site.xml with the value *.
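For reference, this is how I checked that those entries are in the copy the plugin reads (same conf path as above; just a quick grep sketch):

```bash
# Confirm the proxyuser entries exist in the core-site.xml that Dremio's Hive plugin loads.
grep -A2 'hadoop.proxyuser.dremio' /opt/dremio/plugins/connectors/hive2.d/conf/core-site.xml
```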

@lyashninan Do you have anything in the Hive source advanced properties? Does that match what is in core-site.xml? Also, do you have any XML files copied under the Dremio conf folder?

@balaji.ramaswamy - No, I have not added anything in the advanced options from the UI. Yes, I have copied the hive-site.xml, core-site.xml, yarn-site.xml, and mapred-site.xml files to the /opt/dremio/plugins/connectors/hive2.d/conf path.

@lyashninan Can you please compare your server-side core-site.xml with the copy under the Dremio conf path and see if the Namenode values are the same?
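Something along these lines should be enough (assuming your server-side configs live under /etc/hadoop/conf, which varies by distribution):

```bash
# Compare the cluster's core-site.xml with the copy Dremio's Hive plugin reads.
diff /etc/hadoop/conf/core-site.xml /opt/dremio/plugins/connectors/hive2.d/conf/core-site.xml

# Check the Namenode address on the server side specifically.
grep -A1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml
```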