HIVE connection error

Hi!

I just installed Dremio on my Hortonworks cluster. Everything seems to be working up to the point of table previews: the YARN containers are running, the Hive source is attached (metastore on the default port 9083), and I can see my Hive DB. But when I select a table, I get the following for the preview:

     PARSE ERROR: Failure parsing the query. 

(org.apache.calcite.sql.parser.SqlParseException) Encountered “<EOF>” at line 0, column 0. Was expecting one of:…

fd4f0aa8-7848-42e4-897b-666bc175b9d3.zip (4.4 KB)

Also, if I try to run a select * query on one of the tables:

(java.lang.RuntimeException) MetaException(message:Your client does not appear to support insert-only tables. To skip capability checks, please set metastore.client.capability.check to false. This setting can be set globally, or on the client for the current metastore session. Note that this may lead to incorrect results, data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client capabilities via get_table_req API.)

d6208915-601d-458f-b2b5-fc046cb250a4.zip (4.3 KB)

I do not know where it goes wrong. This is a fresh install of Hadoop 3.1, with all updates, Hive, etc.

Dremio is running as the ‘root’ user, and Ranger permissions are all set.

I am able to connect to HDFS datasource and see the files and everything there.

The Hive server is running on port 10000 and Thrift on 10001… Am I not supposed to connect to those with Dremio?

Let me know, thanks

Hi @dataedgehungary

The first error is a parsing error. Can you please check the syntax of the query?

The second error seems to come from the fact that you are using a transactional table. Can you please send me the output of “describe formatted PONTMASTERDB.BOLT;”

Thanks
@balaji.ramaswamy

Hi,

The first error comes from literally just clicking on the table, when it tries to load the preview.

Yes, I am using a transactional (ACID) table.

| Detailed Table Information | Table(tableName:vasarlas_2011068, dbName:pontmasterdb, owner:root, createTime:1548252921, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:bigint, comment:null), FieldSchema(name:kreditkartyaid, type:bigint, comment:null), FieldSchema(name:vevoid, type:bigint, comment:null), FieldSchema(name:partnerid, type:bigint, comment:null), FieldSchema(name:boltid, type:bigint, comment:null), FieldSchema(name:kartyaszam, type:string, comment:null), FieldSchema(name:esemenydatumido, type:string, comment:null), FieldSchema(name:azonosito, type:string, comment:null), FieldSchema(name:vasarlasosszeg, type:int, comment:null), FieldSchema(name:forrastipus, type:int, comment:null), FieldSchema(name:forrasid, type:bigint, comment:null), FieldSchema(name:penztargepazonosito, type:string, comment:null), FieldSchema(name:datumido, type:string, comment:null), FieldSchema(name:verzio, type:bigint, comment:null), FieldSchema(name:eszkozid, type:bigint, comment:null), FieldSchema(name:penztarosid, type:bigint, comment:null), FieldSchema(name:oper, type:string, comment:null)], location:hdfs://master1.hadoop:8020/data/warehouse/tablespace/managed/hive/pontmasterdb.db/vasarlas_2011068, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=, line.delim=\n, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=575641648, numRows=0, rawDataSize=0, transactional_properties=insert_only, numFiles=4, transient_lastDdlTime=1548252930, bucketing_version=2, comment=Imported by sqoop on 2019/01/23 09:15:16, transactional=true}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER, writeId:1) |
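
For reference, a managed insert-only table like the one above would be declared roughly as follows. This is a hedged, abbreviated reconstruction, not the exact DDL Sqoop generated:

    -- Hypothetical, abbreviated DDL for a Hive 3 managed insert-only text table
    CREATE TABLE pontmasterdb.vasarlas_2011068 (
      id              BIGINT,
      boltid          BIGINT,
      vasarlasosszeg  INT,
      esemenydatumido STRING
    )
    ROW FORMAT DELIMITED
    STORED AS TEXTFILE
    TBLPROPERTIES (
      'transactional'='true',
      'transactional_properties'='insert_only'
    );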

I managed to get further by setting “hive.metastore.capability.check” to false.
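
For reference, here is a minimal sketch of that setting as a Hive session property; the key name comes from the error message above, and whether it has to go into the Hive configuration or into the Dremio source properties may differ per setup:

    -- Hedged sketch: skip the metastore capability check for the current session.
    -- The same key/value can also be set globally instead.
    SET metastore.client.capability.check=false;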

But now I have other errors, mainly due to the subdirectory trees…

[Error Id: e2495f6b-13dd-4696-8b1e-c6f7fb807cfe on slave21.hadoop:-1]

(java.lang.RuntimeException) java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSplit

My workflow is built on incremental updates from my RDBMS into Hive with Sqoop.

Hi @dataedgehungary

A couple of questions:

  1. What version of Hive is this?
  2. Your table properties say the input format is “text”:

inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,

But the error message says “ORC”

java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSplit

Does this mean your underlying files are actually ORC, even though the table definition says text?

Thanks
@balaji.ramaswamy

Hive 3.1.0.3.1.0.0-78

It is stored as text… I have no idea where the ORC comes from.
After setting the capability check to false, I get the following error:

(java.io.IOException) Not a file: hdfs:/

Then I set hive.mapred.supports.subdirectories to true.
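
A minimal sketch of those session settings; hive.mapred.supports.subdirectories is the property mentioned above, and the recursive-input property is an assumption about what is commonly paired with it:

    -- Hedged sketch: let Hive input formats recurse into subdirectories.
    SET hive.mapred.supports.subdirectories=true;
    -- Commonly set alongside it (an assumption, not taken from this thread):
    SET mapreduce.input.fileinputformat.input.dir.recursive=true;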

Then I get the ORC error…

I decided to rewrite my Sqoop logic. If I create the Hive tables beforehand as EXTERNAL Parquet tables and then append to them with sqoop import, Dremio works properly.
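
A sketch of the pre-created table, with a hypothetical, abbreviated column list and HDFS location (the real schema is the one from the describe output earlier); Sqoop then appends Parquet files under that location on each incremental import:

    -- Hedged sketch of the pre-created EXTERNAL Parquet table;
    -- columns and LOCATION are illustrative only.
    CREATE EXTERNAL TABLE IF NOT EXISTS pontmasterdb.vasarlas_ext (
      id              BIGINT,
      boltid          BIGINT,
      vasarlasosszeg  INT,
      esemenydatumido STRING
    )
    STORED AS PARQUET
    LOCATION '/data/external/pontmasterdb/vasarlas_ext';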

Does it mean we can’t use ACID Transaction tables currently at all?

Hi dataedgehungary,

Sqoop can’t write to an EXTERNAL table or an ACID table, at least up to Sqoop 1.4.4 and Hive 2.1; it writes to a Hive internal managed table (non-ACID).

You can do an incremental load with Sqoop into a Hive managed table and then update your ACID Hive table using INSERT, UPDATE, or MERGE statements.
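
For example, a hedged sketch of that second step, assuming a staging table loaded by Sqoop and an id key column (both names are assumptions, not from this thread):

    -- Merge freshly sqooped staging rows into the ACID target table;
    -- table and column names are hypothetical.
    MERGE INTO pontmasterdb.vasarlas_acid AS t
    USING pontmasterdb.vasarlas_staging AS s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET
      vasarlasosszeg  = s.vasarlasosszeg,
      esemenydatumido = s.esemenydatumido
    WHEN NOT MATCHED THEN INSERT VALUES
      (s.id, s.boltid, s.vasarlasosszeg, s.esemenydatumido);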

Dremio can read from an ACID table using the Hive metastore connection (an HDFS connection is not supported for ACID tables); the catch is that this is slower than reading from a Parquet table in Dremio.

One more thing:

Hive ACID tables ==> ORC files.

Dremio is built around Parquet files and doesn’t support ORC files yet!

So to read ACID Hive tables in Dremio, you have to use the Hive metastore connection.

@dataedgehungary

To add to @ssky’s explanation: even though Dremio needs Hive to read any ORC file, it is only to get the location of the file on HDFS. Once we get that, we directly access the file using our vectorized ORC reader (we do not use HS2).
