Unable to retrieve metadata info from HDFS using Dremio

Hi,

I have connected an HDFS source and am trying to use CSV data from a particular location. I am getting the error below.

Sorry to hear you have issues.
Could you please look at the profile under the “Jobs” menu for this particular job, and check under “Error”/“Verbose Error” on the profile page?

I came across the same issue: I can browse to the file in Dremio, but I cannot open it or create it as a source. The file is there in HDFS and is not corrupted; I am able to open it from the command line.
Below are the screenshots for the HDFS connection and the full exception.

Does anyone know how to solve it?

     PLAN ERROR: Failure while retrieving metadata for table HDFSLocal.dremiotest."sf-salaries-2014.csv".

SQL Query:
SELECT *
FROM HDFSLocal.dremiotest."sf-salaries-2014.csv"

(java.lang.RuntimeException) java.lang.RuntimeException: com.dremio.common.exceptions.ExecutionSetupException: Failure while setting up text reader for file hdfs://localhost:8020/dremiotest/sf-salaries-2014.csv
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDatasetInternal():177
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.buildDataset():108
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDataset():119
com.dremio.exec.store.SimpleSchema.getTableFromSource():366
com.dremio.exec.store.SimpleSchema.getTableWithRegistry():284
com.dremio.exec.store.SimpleSchema.getTable():406
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():67
org.apache.calcite.jdbc.CalciteSchema.getTable():219
org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom():117
org.apache.calcite.prepare.CalciteCatalogReader.getTable():106
org.apache.calcite.prepare.CalciteCatalogReader.getTable():73
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.SqlSelect.validate():208
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
com.dremio.exec.planner.sql.SqlConverter.validate():188
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():165
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():153
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
com.dremio.exec.work.foreman.AttemptManager.run():293
java.util.concurrent.ThreadPoolExecutor.runWorker():-1
java.util.concurrent.ThreadPoolExecutor$Worker.run():-1
java.lang.Thread.run():-1
Caused By (java.lang.RuntimeException) com.dremio.common.exceptions.ExecutionSetupException: Failure while setting up text reader for file hdfs://localhost:8020/dremiotest/sf-salaries-2014.csv
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.getBatchSchema():148
com.dremio.exec.store.dfs.FileSystemDatasetAccessor$1.run():151
com.dremio.exec.store.dfs.FileSystemDatasetAccessor$1.run():146
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():-1
org.apache.hadoop.security.UserGroupInformation.doAs():1807
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDatasetInternal():145
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.buildDataset():108
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDataset():119
com.dremio.exec.store.SimpleSchema.getTableFromSource():366
com.dremio.exec.store.SimpleSchema.getTableWithRegistry():284
com.dremio.exec.store.SimpleSchema.getTable():406
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():67
org.apache.calcite.jdbc.CalciteSchema.getTable():219
org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom():117
org.apache.calcite.prepare.CalciteCatalogReader.getTable():106
org.apache.calcite.prepare.CalciteCatalogReader.getTable():73
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.SqlSelect.validate():208
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
com.dremio.exec.planner.sql.SqlConverter.validate():188
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():165
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():153
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
com.dremio.exec.work.foreman.AttemptManager.run():293
java.util.concurrent.ThreadPoolExecutor.runWorker():-1
java.util.concurrent.ThreadPoolExecutor$Worker.run():-1
java.lang.Thread.run():-1
Caused By (com.dremio.common.exceptions.ExecutionSetupException) Failure while setting up text reader for file hdfs://localhost:8020/dremiotest/sf-salaries-2014.csv
com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():142
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():62
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.getBatchSchema():137
com.dremio.exec.store.dfs.FileSystemDatasetAccessor$1.run():151
com.dremio.exec.store.dfs.FileSystemDatasetAccessor$1.run():146
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():-1
org.apache.hadoop.security.UserGroupInformation.doAs():1807
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDatasetInternal():145
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.buildDataset():108
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDataset():119
com.dremio.exec.store.SimpleSchema.getTableFromSource():366
com.dremio.exec.store.SimpleSchema.getTableWithRegistry():284
com.dremio.exec.store.SimpleSchema.getTable():406
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():67
org.apache.calcite.jdbc.CalciteSchema.getTable():219
org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom():117
org.apache.calcite.prepare.CalciteCatalogReader.getTable():106
org.apache.calcite.prepare.CalciteCatalogReader.getTable():73
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.SqlSelect.validate():208
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
com.dremio.exec.planner.sql.SqlConverter.validate():188
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():165
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():153
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
com.dremio.exec.work.foreman.AttemptManager.run():293
java.util.concurrent.ThreadPoolExecutor.runWorker():-1
java.util.concurrent.ThreadPoolExecutor$Worker.run():-1
java.lang.Thread.run():-1
Caused By (org.apache.hadoop.hdfs.BlockMissingException) Could not obtain block: BP-32082187-172.17.0.2-1517480669419:blk_1073742662_1838 file=/dremiotest/sf-salaries-2014.csv
org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode():1019
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo():641
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy():918
org.apache.hadoop.hdfs.DFSInputStream.read():983
org.apache.hadoop.fs.FSDataInputStream.read():147
com.dremio.exec.store.dfs.FSDataInputStreamWrapper.read():112
com.dremio.exec.store.easy.text.compliant.TextInput.read():211
com.dremio.exec.store.easy.text.compliant.TextInput.updateBuffer():242
com.dremio.exec.store.easy.text.compliant.TextInput.start():160
com.dremio.exec.store.easy.text.compliant.TextReader.start():346
com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():137
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():62
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.getBatchSchema():137
com.dremio.exec.store.dfs.FileSystemDatasetAccessor$1.run():151
com.dremio.exec.store.dfs.FileSystemDatasetAccessor$1.run():146
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():-1
org.apache.hadoop.security.UserGroupInformation.doAs():1807
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDatasetInternal():145
com.dremio.exec.store.easy.EasyFormatDatasetAccessor.buildDataset():108
com.dremio.exec.store.dfs.FileSystemDatasetAccessor.getDataset():119
com.dremio.exec.store.SimpleSchema.getTableFromSource():366
com.dremio.exec.store.SimpleSchema.getTableWithRegistry():284
com.dremio.exec.store.SimpleSchema.getTable():406
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():67
org.apache.calcite.jdbc.CalciteSchema.getTable():219
org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom():117
org.apache.calcite.prepare.CalciteCatalogReader.getTable():106
org.apache.calcite.prepare.CalciteCatalogReader.getTable():73
org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
org.apache.calcite.sql.SqlSelect.validate():208
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
com.dremio.exec.planner.sql.SqlConverter.validate():188
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():165
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():153
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
com.dremio.exec.work.foreman.AttemptManager.run():293
java.util.concurrent.ThreadPoolExecutor.runWorker():-1
java.util.concurrent.ThreadPoolExecutor$Worker.run():-1
java.lang.Thread.run():-1

Hi @igreg

Can you check that the user configured to run Dremio also has access to the file (can read its contents)?

Did you try to define the formatting for the file? For sources that are not self-describing (e.g., CSV), you need to define the file’s formatting (e.g., field delimiter, column names, line delimiter).

An error comes up when configuring the formatting for the file, saying “Failure while attempting to retrieve metadata information for table HDFSLocal2.dremiotest."sf-salaries-2014.csv".”

I’m logged in as the “admin” user in Dremio, and the file is owned by the “admin” user in HDFS with full permissions for everyone:

[root@sandbox-hdp ~]# hdfs dfs -ls hdfs://172.17.0.2/dremiotest/sf-salaries-2014.csv

-rwxrwxrwx 1 admin hdfs 4267623 2018-04-23 14:10 hdfs://172.17.0.2/dremiotest/sf-salaries-2014.csv

Could you try using “172.17.0.2” instead of “localhost” in the HDFS source definition?


If permissions are correct, then I suspect some other issue. Can you also share which version of the HDP Sandbox you’re using?

I’ve also tried with 172.17.0.2, but no luck.

I ran fsck on the HDFS file system to check, and it says the block does not exist. However, if I check the block without the “Generation Stamp” (i.e., 1838), the block exists. Below are the commands:

[root@sandbox-hdp ~]# hdfs fsck -blockId blk_1073742662_1838
Connecting to namenode via http://sandbox-hdp.hortonworks.com:50070/fsck?ugi=root&blockId=blk_1073742662_1838+&path=%2F
FSCK started by root (auth:SIMPLE) from /172.17.0.2 at Mon Apr 23 14:41:53 UTC 2018

Block blk_1073742662_1838 does not exist

[root@sandbox-hdp ~]# hdfs fsck -blockId blk_1073742662
Connecting to namenode via http://sandbox-hdp.hortonworks.com:50070/fsck?ugi=root&blockId=blk_1073742662+&path=%2F
FSCK started by root (auth:SIMPLE) from /172.17.0.2 at Mon Apr 23 15:12:07 UTC 2018

Block Id: blk_1073742662
Block belongs to: /dremiotest/sf-salaries-2014.csv
No. of Expected Replica: 1
No. of live Replica: 1
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0

Here is the sandbox version:

[root@sandbox-hdp ~]# sandbox-version
Sandbox information:
Created on: 01_02_2018_10_47_41
Hadoop stack version: Hadoop 2.7.3.2.6.4.0-91
Ambari Version: 2.6.1.0-143
Ambari Hash: 2989989d67edacff7e9db702b4cf0c080556dddc
Ambari build: Release : 143
Java version: 1.8.0_161
OS Version: CentOS release 6.9 (Final)

Is there a recommended way to overcome this issue? It’s hard to convince someone to use Dremio if the feature is unstable.

@Hai_Pham are you using the latest HDP Sandbox as well?

Sorry for the delay, folks; it sometimes takes a bit of time to do some testing.

My guess is that Dremio was installed outside of the sandbox, not inside it. As far as I can tell, this won’t work out of the box for two reasons:

  • HDP 2.6 doesn’t create a port mapping for the HDFS datanode.
  • The IP used inside the sandbox (172.17.0.2) is not directly accessible from outside the sandbox.

Since Dremio needs to connect directly to the datanode to read data, and the datanode IP address returned by the namenode is not reachable from outside the sandbox (the block ID is fine, however), Dremio tries to connect and times out.
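One quick way to confirm this from outside the sandbox is to test whether the datanode’s data-transfer port is reachable. The IP is taken from this thread; the port is the Hadoop 2.x default for dfs.datanode.address, which is an assumption about this particular setup:

```shell
# The namenode RPC port (8020) may be mapped, but the datanode transfer port
# usually is not; a connection timeout here matches the behavior described above.
# 50010 is the default datanode port on Hadoop 2.x (Hadoop 3.x uses 9866).
nc -vz -w 5 172.17.0.2 50010
```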

The only workaround I can advise is to install Dremio inside the sandbox so that networking is no longer an issue. You might then want to update port forwarding so that you can access the Dremio UI from outside the VM (by changing the HDP sandbox script, or by using a command like ssh -L 9047:localhost:9047 root@sandbox.hortonworks.com -p 2222).

You might also want to reach out to HortonWorks to see if the sandbox can be configured to enable full HDFS access from outside the sandbox.


Old thread, but posting for posterity. I hit the same issue (Dremio logs show BlockMissingException) and was able to fix it by forcing the HDFS client (Dremio, in this instance) to connect to the datanodes by hostname rather than by their internal Docker IPs. To do so:

  • Configure your /etc/hosts file to map the Hadoop Docker container’s hostname to the loopback address.
  • Expose port 9866 (the HDFS datanode server port for data transfer) from the Docker container to the host machine.
  • When creating the source in Dremio, add the connection property dfs.client.use.datanode.hostname: true under advanced settings.
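A minimal sketch of the steps above, assuming a self-managed Hadoop container; the container name (hadoop), hostname (hadoop.docker), and image name are all placeholders for your own setup:

```shell
# 1. Map the container's hostname to loopback on the host, so the hostname
#    the namenode hands back to the client resolves to the published port.
echo "127.0.0.1  hadoop.docker" | sudo tee -a /etc/hosts

# 2. Publish the namenode RPC port and the datanode data-transfer port
#    (9866 is the Hadoop 3.x default) when starting the container.
docker run -d --name hadoop --hostname hadoop.docker \
  -p 8020:8020 -p 9866:9866 my-hadoop-image

# 3. In the Dremio HDFS source, under advanced settings, add the
#    connection property:
#      dfs.client.use.datanode.hostname = true
```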