Not able to connect HDFS to Dremio!

Hello all, any help with the following problem would be appreciated.

I have deployed Dremio through Docker. I have installed Hadoop 3.2.1 manually on my system and have put data into HDFS through the command line. When I try to add HDFS as a source in Dremio via the NameNode host and port, it does not work; it always shows "No source available".

Thanks in advance for helping out!

Regards,
Pranav Kotak


@pranavkotak

The server.log on the master coordinator should give more information on why Dremio was unable to add the source. Can you please check server.log? If you are unable to identify the issue, please send us the server.log.
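As a sketch, assuming Dremio was started in a container named dremio (the container name and the in-container log path are assumptions, not something confirmed in this thread):

# The Dremio Docker image typically sends its logs to the container's stdout:
docker logs dremio
# If server.log is written inside the container instead, tail it there
# (/opt/dremio/log/server.log is the conventional Dremio log location):
docker exec -it dremio tail -n 200 /opt/dremio/log/server.log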

Thanks
Bali

Log:

2020-12-13 09:31:16,891 [start-k] WARN c.d.e.catalog.ManagedStoragePlugin - Error starting new source: k
java.lang.Exception: Unavailable: Call From 466b7d8406f5/["IP add"] to localhost:9820 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$newStartSupplier$1(ManagedStoragePlugin.java:529)
    at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$nameSupplier$3(ManagedStoragePlugin.java:591)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-12-13 09:31:16,892 [start-k] INFO c.d.s.s.LocalSchedulerService - Cancelling task metadata-refresh-wakeup-k

The above is the main error: Connection refused. I have already checked everything: the port is active and Hadoop is working fine, but Dremio is not able to connect. Also, Dremio has been deployed through Docker.
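For reference, here is how the port can be checked on the Windows host where Hadoop runs (a sketch; 9820 is the port shown in the log above, and jps/netstat are standard JDK/Windows tools):

rem Should list the NameNode and DataNode processes:
jps
rem Should show a LISTENING entry for the NameNode RPC port from the log:
netstat -ano | findstr "9820"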

@pranavkotak

From your Docker container, are you able to do the following? Also make sure you have the right port; it will be in hdfs-site.xml.

telnet <namenode-host> <port>

Have you copied core-site.xml and hdfs-site.xml to the Dremio conf directory?
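A minimal sketch of that connectivity check, assuming the container is named dremio and the NameNode listens on namenode-host:8020 (both are placeholders):

# Open a shell inside the Dremio container ("dremio" is an assumed name):
docker exec -it dremio bash
# From inside the container, test the NameNode RPC port:
telnet namenode-host 8020
# If telnet is not available in the image, nc does the same check:
nc -zv namenode-host 8020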

Thanks
Bali

Hello @balaji.ramaswamy,

Here is the content of my hdfs-site.xml file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\Users\prana\Downloads\hadoop-3.2.1\data\dfs\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\Users\prana\Downloads\hadoop-3.2.1\data\dfs\datanode</value>
  </property>
</configuration>

Also, how do I access the Docker files? Do I need to add port configuration to the file above? If yes, what do I have to add?

Also, where do I find the conf/ directory? I cannot find it in the Hadoop folder.

Thank You!!

Hi @pranavkotak

Your hdfs-site.xml seems to lack a lot of information, and we also need your core-site.xml. Are you able to go through the docs below and set them up one by one? You can skip the sections for YARN.


Once you complete the setup, reach out to us if you still have issues.

Also, to add HDFS as a source, you need to find the port the NameNode is listening on and make sure the Dremio coordinator and executors are able to talk to the NameNode host on that port; the default is usually 8020 or 9000.
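As a sketch, the configured NameNode URI (and therefore the port) can be printed on the Hadoop side with the standard hdfs getconf command; the output shown is only an example:

# Print the NameNode URI that clients connect to (scheme://host:port):
hdfs getconf -confKey fs.defaultFS
# example output: hdfs://localhost:8020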

Hello @balaji.ramaswamy,

I have added the required configuration to core-site.xml. I have also checked that the port (8020) is listening; the port configured for Hadoop is 8020. Yet I still get the original Connection refused error.

I do not understand what the issue is. On the Hadoop side, everything is working fine; I don't know why Dremio is unable to get the data from HDFS.

Thank You!!!

@pranavkotak

It looks like the Docker container is unable to reach the Hadoop NameNode. Are you able to at least ping the NameNode from the Docker container?
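A sketch of that check (the container name dremio and the host name namenode-host are placeholders):

# Ping the NameNode host from inside the Dremio container:
docker exec -it dremio ping -c 3 namenode-host
# Note: inside a container, "localhost" refers to the container itself,
# not to the machine running Docker, so the NameNode must be reachable
# by a real host name or IP.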

Can you try something slightly different? Can you set up Hadoop in Docker and see if you are able to connect?
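One way to sketch that test is to put both containers on the same user-defined Docker network so they can reach each other by container name (the network name below is an assumption; any HDFS image would be run the same way):

# Create a shared network and start Dremio on it:
docker network create hadoop-net
docker run -d --name dremio --network hadoop-net -p 9047:9047 dremio/dremio-oss
# Start a Hadoop/HDFS container on the same network, then use its container
# name as the NameNode host when adding the HDFS source in Dremio.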