HIVE connection error

Hi!

I just installed Dremio on my Hortonworks cluster. Everything seems to be working up to the point of table previews: the YARN containers are running, the Hive source is attached (metastore on the default port 9083), and I can see my Hive DB. But when I select a table, I get the following for the preview:

     PARSE ERROR: Failure parsing the query. 

(org.apache.calcite.sql.parser.SqlParseException) Encountered “<EOF>” at line 0, column 0. Was expecting one of:…

fd4f0aa8-7848-42e4-897b-666bc175b9d3.zip (4.4 KB)

Also, if I try to run a select * query on one of the tables:

(java.lang.RuntimeException) MetaException(message:Your client does not appear to support insert-only tables. To skip capability checks, please set metastore.client.capability.check to false. This setting can be set globally, or on the client for the current metastore session. Note that this may lead to incorrect results, data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client capabilities via get_table_req API.)

d6208915-601d-458f-b2b5-fc046cb250a4.zip (4.3 KB)

I do not know where it goes wrong. This is a fresh install of Hadoop 3.1, with all updates, Hive, etc.

Dremio is running as the ‘root’ user, and Ranger permissions are all set.

I am able to connect to HDFS datasource and see the files and everything there.

The Hive server is running on port 10000 and Thrift on 10001… Am I not supposed to connect to those with Dremio?

Let me know, thanks

Hi @dataedgehungary

The first error is a parsing error. Can you please check the syntax of the query?

The second error seems to come from the fact that you are using a transactional table. Can you please send me the output of “describe formatted PONTMASTERDB.BOLT;”

Thanks
@balaji.ramaswamy

Hi,

The first error comes from literally just clicking on the table, when it tries to load the preview.

Yes, I am using a transactional (ACID) table.

| Detailed Table Information | Table(tableName:vasarlas_2011068, dbName:pontmasterdb, owner:root, createTime:1548252921, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:bigint, comment:null), FieldSchema(name:kreditkartyaid, type:bigint, comment:null), FieldSchema(name:vevoid, type:bigint, comment:null), FieldSchema(name:partnerid, type:bigint, comment:null), FieldSchema(name:boltid, type:bigint, comment:null), FieldSchema(name:kartyaszam, type:string, comment:null), FieldSchema(name:esemenydatumido, type:string, comment:null), FieldSchema(name:azonosito, type:string, comment:null), FieldSchema(name:vasarlasosszeg, type:int, comment:null), FieldSchema(name:forrastipus, type:int, comment:null), FieldSchema(name:forrasid, type:bigint, comment:null), FieldSchema(name:penztargepazonosito, type:string, comment:null), FieldSchema(name:datumido, type:string, comment:null), FieldSchema(name:verzio, type:bigint, comment:null), FieldSchema(name:eszkozid, type:bigint, comment:null), FieldSchema(name:penztarosid, type:bigint, comment:null), FieldSchema(name:oper, type:string, comment:null)], location:hdfs://master1.hadoop:8020/data/warehouse/tablespace/managed/hive/pontmasterdb.db/vasarlas_2011068, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=, line.delim=\n, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=575641648, numRows=0, rawDataSize=0, transactional_properties=insert_only, numFiles=4, transient_lastDdlTime=1548252930, bucketing_version=2, comment=Imported by sqoop on 2019/01/23 09:15:16, transactional=true}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER, writeId:1) |
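
For reference, a managed insert-only table like the one above would be declared roughly as follows. This is a hedged, abbreviated reconstruction, not the exact DDL Sqoop generated:

    -- Hypothetical, abbreviated DDL for a Hive 3 managed insert-only text table
    CREATE TABLE pontmasterdb.vasarlas_2011068 (
      id              BIGINT,
      boltid          BIGINT,
      vasarlasosszeg  INT,
      esemenydatumido STRING
    )
    ROW FORMAT DELIMITED
    STORED AS TEXTFILE
    TBLPROPERTIES (
      'transactional'='true',
      'transactional_properties'='insert_only'
    );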

I managed to get further by setting “hive.metastore.capability.check” to false.
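
For reference, here is a minimal sketch of that setting as a Hive session property; the key name comes from the error message above, and whether it has to go into the Hive configuration or into the Dremio source properties may differ per setup:

    -- Hedged sketch: skip the metastore capability check for the current session.
    -- The same key/value can also be set globally instead.
    SET metastore.client.capability.check=false;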

But now I have other errors, mainly due to the subdirectory trees…

[Error Id: e2495f6b-13dd-4696-8b1e-c6f7fb807cfe on slave21.hadoop:-1]

(java.lang.RuntimeException) java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSplit

My workflow is built on incremental updates from my RDBMS into Hive with Sqoop.

Hi @dataedgehungary

A couple of questions:

  1. What version of Hive is this?
  2. Your table properties say the input format is “text”:

inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,

But the error message says “ORC”

java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSplit

Does this mean your underlying files are actually ORC, even though the table definition says text?

Thanks
@balaji.ramaswamy

Hive 3.1.0.3.1.0.0-78

It is stored as text… I have no idea where the ORC comes from.
After setting the capability check to false, I get the following error:

(java.io.IOException) Not a file: hdfs:/

Then I set hive.mapred.supports.subdirectories to true.
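
A minimal sketch of those session settings; hive.mapred.supports.subdirectories is the property mentioned above, and the recursive-input property is an assumption about what is commonly paired with it:

    -- Hedged sketch: let Hive input formats recurse into subdirectories.
    SET hive.mapred.supports.subdirectories=true;
    -- Commonly set alongside it (an assumption, not taken from this thread):
    SET mapreduce.input.fileinputformat.input.dir.recursive=true;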

Then I get the ORC error…

I decided to rewrite my Sqoop logic. If I create the Hive tables beforehand as EXTERNAL Parquet tables and then append to them with sqoop import, Dremio works properly.
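
A sketch of the pre-created table, with a hypothetical, abbreviated column list and HDFS location (the real schema is the one from the describe output earlier); Sqoop then appends Parquet files under that location on each incremental import:

    -- Hedged sketch of the pre-created EXTERNAL Parquet table;
    -- columns and LOCATION are illustrative only.
    CREATE EXTERNAL TABLE IF NOT EXISTS pontmasterdb.vasarlas_ext (
      id              BIGINT,
      boltid          BIGINT,
      vasarlasosszeg  INT,
      esemenydatumido STRING
    )
    STORED AS PARQUET
    LOCATION '/data/external/pontmasterdb/vasarlas_ext';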

Does it mean we can’t use ACID Transaction tables currently at all?

Hi dataedgehungary,

Sqoop can’t write to an EXTERNAL table or an ACID table, at least up to Sqoop 1.4.4 and Hive 2.1; it writes to a Hive internal managed table (non-ACID).

You can do an incremental load with Sqoop into a Hive managed table and then update your ACID Hive table using INSERT, UPDATE, or MERGE statements.
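
For example, a hedged sketch of that second step, assuming a staging table loaded by Sqoop and an id key column (both names are assumptions, not from this thread):

    -- Merge freshly sqooped staging rows into the ACID target table;
    -- table and column names are hypothetical.
    MERGE INTO pontmasterdb.vasarlas_acid AS t
    USING pontmasterdb.vasarlas_staging AS s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET
      vasarlasosszeg  = s.vasarlasosszeg,
      esemenydatumido = s.esemenydatumido
    WHEN NOT MATCHED THEN INSERT VALUES
      (s.id, s.boltid, s.vasarlasosszeg, s.esemenydatumido);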

Dremio can read from an ACID table using the Hive metastore connection (an HDFS connection is not supported for ACID tables); the catch is that this is slower than reading from a Parquet table in Dremio.

One more thing:

Hive ACID tables ==> ORC files.

Dremio is built around Parquet files and doesn’t support ORC files yet!

So to read ACID Hive tables in Dremio, you have to use the Hive metastore connection.

@dataedgehungary

To add to @ssky’s explanation: even though Dremio needs Hive to read any ORC file, it is only to get the location of the file on HDFS. Once we get that, we directly access the file using our vectorized ORC reader (we do not use HS2).
