I set up a Hive source but believe I need to add Connection String Options to make it functional. The source definition window contains this note:
These options will be added to your Hive connection string. Please see the Dremio documentation for a list of commonly used connection string options
I have searched in vain for this list; it is nowhere to be found.
I believe I need to set some of these options because “No Data” is displayed in the main window once the source is defined. Queries against the source don't error out; instead they just seem to consume cycles while the timer shows the query as still running.
New to Dremio and trying to build some knowledge step by step. Any help here would be most appreciated.
But it really depends on which additional properties you set or use to connect to the Hive Metastore (either through hive-site.xml or on the command line) for your other applications.
For instance, when I connect to Hive through DBViz I specify just the host, port, and my credentials. I am guessing there are connection string options that would let me include my user ID and password. I just find it odd that I'm not getting an authentication error when trying to connect.
Dremio is a service, so it does not connect to the Hive Metastore as the end user logged into Dremio, but as the user running the Dremio process, and it will use impersonation.
So as a user logged in to Dremio, you may not have permission to view some (or any) of the data.
Another thing to point out: Dremio connects to the Hive Metastore service, not to the HiveServer2 service, which is the one that accepts a username/password as credentials. The Hive Metastore accepts only Kerberos credentials or no credentials; username/password is not an option when connecting to it. You can find the Hive Metastore hostname in your hive-site.xml, in the property hive.metastore.uris.
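A quick way to pull the metastore endpoints out of hive-site.xml is to parse it as the standard Hadoop property XML (<configuration>/<property>/<name>/<value>). This is a minimal sketch, assuming hive.metastore.uris holds one or more comma-separated thrift://host:port URIs; the helper names are hypothetical, not part of Dremio or Hive.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def hadoop_property(xml_text: str, name: str):
    """Return the value of a <property> in a Hadoop-style *-site.xml, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def metastore_endpoints(xml_text: str):
    """Split hive.metastore.uris (comma-separated thrift:// URIs) into (host, port) pairs."""
    value = hadoop_property(xml_text, "hive.metastore.uris")
    if not value:
        return []
    endpoints = []
    for uri in value.split(","):
        parsed = urlparse(uri.strip())  # e.g. thrift://metastore.example.com:9083
        endpoints.append((parsed.hostname, parsed.port))
    return endpoints


if __name__ == "__main__":
    # Hypothetical sample; in practice read the text of your own hive-site.xml.
    sample = (
        "<configuration><property><name>hive.metastore.uris</name>"
        "<value>thrift://metastore.example.com:9083</value></property></configuration>"
    )
    print(metastore_endpoints(sample))
```

The host and port it returns are what go into the Hive source definition in Dremio, rather than the HiveServer2 host and port used by JDBC tools such as DBViz.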
yufeldman and vvv, I greatly appreciate your replies; you were right on target. I changed my connection string to use the Hive Metastore port (9083), and once I did this I was able to see the directory tree in Hadoop and navigate to the folder containing the Hive tables I was interested in. Now I just need to figure out why Dremio can't read these tables. yufeldman, you alluded to the possibility of not having permission to view certain data when going this route. What confuses me is this: since one does not connect to the Hive Metastore with one's own credentials, how is access to the tables granted?
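Before digging into permissions, it can help to confirm from the Dremio machine that the metastore port is reachable at all. This is a minimal sketch, assuming a plain TCP connect is enough as a first check; it only proves that something is listening on the port and does not speak Thrift, so it cannot confirm that the listener is really the Hive Metastore. The function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Only checks that a listener exists; it does not verify the Thrift protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Replace with the host from hive.metastore.uris (hypothetical name here).
    print(port_is_open("metastore.example.com", 9083))
```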
Could you look under “Jobs” (in the example image it is in the top menu)? You will probably see the failed job marked with a red hexagon, with a more detailed page next to it. Could you also click on “Profile” (highlighted in red in the image)?
Isn't the fact that I'm seeing the Hive folders in Dremio an indication that the host is resolvable? I only run into this issue when trying to access the Hive tables.
Showing Hive folders and tables (from the Hive Metastore metadata) is not the same as fetching the actual data from the NameNode/DataNodes, and it looks like the exception is related to NameNode hostname resolution.
Just an idea:
Could you add the following as an additional property when configuring the Hive source:
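Since browsing folders only involves the metastore, while reading the data requires resolving the NameNode host, one way to test this is to resolve the NameNode hostname on the machines where Dremio runs. This is a minimal sketch, assuming you already know the NameNode hostname (for example from the hdfs:// location of one of the tables); the function name and hostname are hypothetical.

```python
import socket


def resolve_host(host: str):
    """Return the sorted distinct IP addresses `host` resolves to on this machine, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_host("namenode.example.com"))  # hypothetical NameNode hostname
```

An empty list on a Dremio node, while the same name resolves on your other Hadoop clients, would point to DNS or /etc/hosts differences on that node.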
name: fs.defaultFS
value: hdfs://<namenode_host>:8020
@maverick if your HDFS cluster uses HA, you would have to provide the fully qualified hostname of the active (master) NameNode.
Alternatively, it is probably possible to use the cluster (nameservice) name, but you would need to add your HDFS configuration (hdfs-site.xml or similar) under <DREMIO_HOME>/conf so that Dremio can pick up the HA configuration for the HDFS cluster.
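To find the fully qualified NameNode hostnames for an HA cluster, you can read them from hdfs-site.xml. This is a minimal sketch, assuming the standard HDFS HA keys dfs.nameservices, dfs.ha.namenodes.<nameservice> and dfs.namenode.rpc-address.<nameservice>.<namenode id>; the function names are hypothetical. The file does not say which NameNode is currently active; `hdfs haadmin -getServiceState <namenode id>` on a cluster node reports that.

```python
import xml.etree.ElementTree as ET


def site_properties(xml_text: str) -> dict:
    """Collect every <name>/<value> pair of a Hadoop *-site.xml into a dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def namenode_rpc_addresses(xml_text: str) -> dict:
    """Map each HA nameservice to {namenode id: host:port RPC address}.

    A NameNode id without a matching rpc-address property maps to None.
    """
    props = site_properties(xml_text)
    result = {}
    nameservices = [s.strip() for s in props.get("dfs.nameservices", "").split(",") if s.strip()]
    for ns in nameservices:
        nn_ids = [n.strip() for n in props.get(f"dfs.ha.namenodes.{ns}", "").split(",") if n.strip()]
        result[ns] = {nn: props.get(f"dfs.namenode.rpc-address.{ns}.{nn}") for nn in nn_ids}
    return result
```

The host:port of the active NameNode is what you would put in the fs.defaultFS value (hdfs://<host>:<port>); with the nameservice approach, the same file copied under <DREMIO_HOME>/conf lets the nameservice name be used instead.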