Is Dremio unable to access Hadoop 3.1.0?

I just installed a Hadoop 3.1.0 cluster from scratch.
Some ports changed in this release, so I'm not sure why I cannot connect Dremio to this new Hadoop cluster.
Looking at the Dremio dependencies, it seems that Dremio currently uses Hadoop 2.7.1, so maybe the two cannot work together.
Can you please confirm?
Thank you so much

Could you share any server log with us?

You're right that some ports changed with Hadoop 3.0, so make sure you use the correct port when connecting to HDFS. The previous default port was 8020, but the new one would be 9820.
You might also be using the MapR distribution of Dremio instead of the generic one (which would be based on ASF Hadoop 2.8.0). Can you please confirm that too?
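One quick way to tell which port the NameNode is actually listening on is to try a plain TCP connection from the Dremio host. Below is a minimal Python sketch, assuming the Dremio machine can reach the NameNode and that 8020 and 9820 are the candidate ports discussed here. The script name and the host you pass in are placeholders; nothing in it is a Dremio or Hadoop API.

```python
import socket
import sys

# Candidate NameNode RPC ports discussed in this thread (assumption: one of them is in use).
CANDIDATE_PORTS = [8020, 9820]


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_ports.py <namenode-host>
    host = sys.argv[1]
    for port in CANDIDATE_PORTS:
        state = "open" if port_is_open(host, port) else "closed/unreachable"
        print(f"{host}:{port} -> {state}")
```

An open port only shows that something is listening there; it does not prove that it is the NameNode RPC endpoint, so it is worth cross-checking against the cluster config.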

I just tried to add an HDFS data source in Dremio, not MapR or anything like that.
I configured it to use port 9820, but it didn't work.
Let me double-check tomorrow.
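When double-checking, it can help to read the port straight from the cluster's own config rather than relying on defaults: the NameNode address an HDFS client connects to comes from fs.defaultFS in core-site.xml. A small sketch that pulls it out follows. The file path you pass in and the fallback port of 8020 (used only when fs.defaultFS has no explicit port) are assumptions; adjust them for your setup.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional, Tuple
from urllib.parse import urlparse


def default_fs_from_core_site(xml_text: str) -> Optional[str]:
    """Return the value of fs.defaultFS from core-site.xml content, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return (prop.findtext("value") or "").strip()
    return None


def namenode_host_port(default_fs: str, fallback_port: int = 8020) -> Tuple[Optional[str], int]:
    """Split an hdfs://host[:port] URI; fallback_port is an assumed default, not read from Hadoop."""
    parsed = urlparse(default_fs)
    return parsed.hostname, parsed.port or fallback_port


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python show_default_fs.py /path/to/core-site.xml
    with open(sys.argv[1], encoding="utf-8") as fh:
        fs = default_fs_from_core_site(fh.read())
    print("fs.defaultFS =", fs)
    if fs:
        print("NameNode host/port =", namenode_host_port(fs))
```

The host and port printed here are what the HDFS source in Dremio would need to match.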

To clarify, there are 2 flavors of Dremio: the regular one (based on Apache Hadoop 2.8.0), and a MapR-based one (based on the MapR version of Hadoop 2.7.1).

To add to what @laurent said: it looks like you are trying to use an Apache Hadoop 3.1.0 cluster with some version of Dremio community edition. Not sure where Hadoop version 2.7.1 came from, as Dremio uses 2.8.0 for non-MapR distros.
In any case, if there are incompatibilities between the HDFS 2.x client and the HDFS 3.x server, there would be issues.
Please let us know what errors you see in the Dremio logs.

The web GUI shows the message "Failed to fetch", but nothing is written to server.log.
As you can see below, server.log was last updated yesterday (2 May).
Can you please advise how to find out what the root cause is?

-rw-rw-r-- 1 ec2-user ec2-user 4743 May 3 01:45 access.log
drwxrwxr-x 2 ec2-user ec2-user 4096 May 3 01:42 archive
drwxrwxr-x 3 ec2-user ec2-user 4096 May 2 08:37 json
-rw-rw-r-- 1 ec2-user ec2-user 1488 Apr 26 09:13 queries.json
-rw-rw-r-- 1 ec2-user ec2-user 9971 May 3 00:10 server.gc
-rw-rw-r-- 1 ec2-user ec2-user 6172 Apr 25 21:07 server.gc.1
-rw-rw-r-- 1 ec2-user ec2-user 5561 Apr 23 07:59 server.gc.2
-rw-rw-r-- 1 ec2-user ec2-user 5093 Apr 23 07:01 server.gc.3
-rw-rw-r-- 1 ec2-user ec2-user 4905 Apr 23 06:46 server.gc.4
-rw-rw-r-- 1 ec2-user ec2-user 4905 Apr 23 06:26 server.gc.5
-rw-rw-r-- 1 ec2-user ec2-user 109036 May 2 10:10 server.log
-rw-rw-r-- 1 ec2-user ec2-user 2290736 Apr 26 08:26 server.out
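To narrow down a "Failed to fetch" error, a first step could be to see which log files were written most recently and whether server.log or server.out contain any errors at all. Below is a rough sketch, assuming you pass the Dremio log directory (the one listed above) as the first argument. The ERROR/Exception keyword match is a simple heuristic, not a specification of Dremio's log format.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Sequence


def recent_files(log_dir: Path) -> List[Path]:
    """Return regular files in log_dir, newest modification time first."""
    files = [p for p in log_dir.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def error_lines(lines: Iterable[str],
                keywords: Sequence[str] = ("ERROR", "Exception")) -> List[str]:
    """Keep only lines that mention one of the keywords (heuristic filter)."""
    return [line.rstrip("\n") for line in lines if any(k in line for k in keywords)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_logs.py /path/to/dremio/log
    log_dir = Path(sys.argv[1])
    print("Most recently modified files:")
    for path in recent_files(log_dir)[:5]:
        print("  ", path.name)
    for name in ("server.log", "server.out"):
        path = log_dir / name
        if path.exists():
            with path.open(encoding="utf-8", errors="replace") as fh:
                # Show only the last 20 matching lines per file.
                for line in error_lines(fh)[-20:]:
                    print(f"{name}: {line}")
```

If neither file changes when the error appears in the GUI, that in itself is useful to report back in this thread.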

@yufeldman: yes, you are right. I use Dremio community edition with an Apache Hadoop 3.1.0 cluster.
If I use the Hadoop distribution from Hortonworks (I guess 2.8; my teammate set it up), it works as expected.

I tried installing Hadoop 2.8.3, and it works well with Dremio.
Maybe something changed in Hadoop 3.1.0 that I'm not aware of.
I used the same steps to configure Hadoop for both versions. In the NameNode web GUI, I see nodes running in Hadoop 2.8.3, but no nodes running in Hadoop 3.1.0.
Hence, I will move forward with Hadoop 2.8.3 and come back to Hadoop 3.1.0 when I have time.
Thank you so much
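If the 3.1.0 NameNode UI shows no live nodes, that suggests the 3.1.0 cluster itself may not be healthy yet, independent of Dremio. The NameNode also exposes this count over its /jmx HTTP endpoint, so a sketch like the one below could compare both clusters. It assumes the default NameNode web UI ports (50070 for Hadoop 2.x, 9870 for Hadoop 3.x) and the NumLiveDataNodes attribute of the FSNamesystemState bean. If your hdfs-site.xml overrides the HTTP address, use that port instead.

```python
import json
import sys
import urllib.request

# Assumed default NameNode web UI ports: Hadoop 2.x and Hadoop 3.x respectively.
WEB_UI_PORTS = (50070, 9870)


def live_datanodes(jmx_json: str) -> int:
    """Extract NumLiveDataNodes from a NameNode /jmx FSNamesystemState response."""
    for bean in json.loads(jmx_json).get("beans", []):
        if "NumLiveDataNodes" in bean:
            return int(bean["NumLiveDataNodes"])
    raise ValueError("NumLiveDataNodes not found in JMX response")


def fetch_jmx(host: str, port: int, timeout: float = 5.0) -> str:
    """Fetch the FSNamesystemState bean from the NameNode HTTP server."""
    url = (f"http://{host}:{port}/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystemState")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python live_datanodes.py <namenode-host>
    host = sys.argv[1]
    for port in WEB_UI_PORTS:
        try:
            count = live_datanodes(fetch_jmx(host, port))
            print(f"{host}:{port} live DataNodes = {count}")
        except (OSError, ValueError) as exc:
            # URLError is an OSError subclass; bad JSON raises ValueError.
            print(f"{host}:{port} -> {exc}")
```

A count of 0 on the 3.1.0 cluster would point to the DataNode logs on that cluster as the next place to look.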

I want to clarify a few things here.

  • Dremio is currently validated with most Hadoop 2.x-based distributions from Apache, Cloudera, Hortonworks and MapR. We'll post a certification matrix shortly to make this clearer.
  • We’ve not yet certified Dremio with the Hadoop 3.x series of releases. This should be announced in the coming months.
  • Dremio’s main community distribution ships with a Hadoop 2.8.x-based client. This client is compatible with all the versions of distributions mentioned above.
  • Dremio's MapR edition ships with MapR's latest client, which is based on Hadoop 2.7.x. This client will likely work with several editions of the other Hadoop distributions as well, but it should generally be used only with MapR distributions, since its Hadoop base is older than the one the standard community edition ships with.

The last point matters because you mentioned you are running a version of Dremio community edition that has a Hadoop 2.7.x jar. This suggests that you are using the MapR edition against Hortonworks instead of the standard edition. While this will likely work, we strongly recommend using the standard edition (and not the MapR one).
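To check which edition you have, you could look at the version in the Hadoop client jars bundled with your Dremio install: a 2.8.x version would point to the standard edition, while a 2.7.x (MapR) version would point to the MapR edition. Below is a sketch that scans a directory tree for hadoop-common / hadoop-hdfs jars. The install path is whatever you pass in, because the exact jar directory layout inside Dremio is an assumption here; the recursive search avoids depending on it.

```python
import re
import sys
from pathlib import Path
from typing import Dict, Iterable

# Matches e.g. "hadoop-common-2.8.3.jar" -> ("common", "2.8.3").
# Vendor suffixes such as "-mapr-1710" are kept as part of the version string.
HADOOP_JAR = re.compile(r"^hadoop-(common|hdfs|hdfs-client)-(\d+\.\d+\.\d+[\w.\-]*)\.jar$")


def hadoop_versions(jar_names: Iterable[str]) -> Dict[str, str]:
    """Map Hadoop artifact name to version for every matching jar file name."""
    found: Dict[str, str] = {}
    for name in jar_names:
        match = HADOOP_JAR.match(name)
        if match:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hadoop_jars.py /path/to/dremio
    names = (p.name for p in Path(sys.argv[1]).rglob("*.jar"))
    for artifact, version in sorted(hadoop_versions(names).items()):
        print(f"hadoop-{artifact}: {version}")
```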



Good news! But I didn't find any announcement or docs about Hadoop 3 compatibility. Is it still a work in progress?

Thank you !

Hadoop 3.x is not yet certified with Dremio, but we do plan to address this in an upcoming release.

And how about today? Still no HDP 3.x support?

Any updates on this thread? Is HDFS 3 supported or not? Hasn't Apache Drill supported this filesystem since version 1.17?


Dremio should be able to add HDFS as a source; below is the doc.