First we had an error where the executors couldn’t find the SerDe class, so we added the jars in dremio.conf under both “provisioning.yarn.classpath” and “provisioning.yarn.app.classpath”, and that got us past it.
I googled a lot and checked the Dremio source, starting with ScanWithHiveReader, but I cannot figure out what the issue is. The error suggests the class is not the right type (it must implement or extend something), and I’m not sure why.
Is Dremio expected to work with the JSON SerDe on Hive external tables or just external tables in general? Did we do something very wrong?
There’s one more small question: is it possible to limit the data pulled by the Preview functionality (for example, by including a “TOP/LIMIT x” in the JDBC SQL queries)?
It currently seems to pull the entire source table (we have some huge ones) and then does the limit after it gets all the data.
We checked all the settings / configs - didn’t find anything for this.
We are using Dremio 4.1.8.
I did what you suggested, and it’s good to know: this way we could remove the “provisioning.yarn.classpath” settings from dremio.conf, and we don’t get the “class not found” exception either.
But with this change, we still get the same “(java.lang.RuntimeException) java.lang.ClassCastException: class org.apache.hive.hcatalog.data.JsonSerDe” error.
It looks like Dremio cannot use the Hive JSON SerDes for external tables. I don’t know where this error comes from, but it looks like it just doesn’t like the SerDe class, and we tried two different libraries.
Does the Hive SerDe class need to be somehow specially crafted for Dremio? Does anyone know if there’s a Hive JSON SerDe compatible with Dremio?
Is anyone else using Hive external tables on JSON structured data in Dremio?
Note: we have some tables using “org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe” and they work fine. So how is this SerDe different from the JSON ones?
If I move the jar to hive3.d, I get the “ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe” again. So “hive3-ee.d” looks to be the right directory, but for some reason Dremio doesn’t like that class.
The error, “java.lang.ClassCastException: class”, is pretty self-explanatory, so I was just wondering whether these JSON SerDes are not supported for Hive 3.
As a workaround we will probably migrate some processes to parquet files.
Regarding the second question, the Preview of a dataset: is it possible to avoid pulling the entire table from the external source and applying the limit only afterwards? That is, just pull the data with a limit, e.g. if it’s a JDBC source, issue a TOP / LIMIT X query?
Is there any support for this, some configuration that we missed? We have some SQL Server tables with 500M rows, and the preview attempts to bring everything into Dremio just for previewing…
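To make the question concrete, here is a minimal sketch of the two behaviours I mean, using Python’s sqlite3 module as a stand-in for the JDBC source. The table name `big_table`, the row counts, and the helper function names are all made up for illustration, and this says nothing about how Dremio actually builds its preview queries. It only shows the difference between truncating after the fetch and sending the limit to the source.

```python
import sqlite3


def make_source(rows=10_000):
    """Build an in-memory table standing in for a large external source (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO big_table (id, payload) VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(rows)),
    )
    conn.commit()
    return conn


def preview_without_pushdown(conn, limit):
    """What we seem to observe: fetch every row, then keep only the first `limit`."""
    rows = conn.execute("SELECT id, payload FROM big_table ORDER BY id").fetchall()
    return rows[:limit], len(rows)  # (preview rows, rows transferred from the source)


def preview_with_pushdown(conn, limit):
    """What we would like: the limit is part of the query sent to the source.

    SQLite uses LIMIT; SQL Server would use SELECT TOP (n) ... instead.
    """
    rows = conn.execute(
        "SELECT id, payload FROM big_table ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    return rows, len(rows)


if __name__ == "__main__":
    source = make_source()
    _, transferred_all = preview_without_pushdown(source, 100)
    _, transferred_limited = preview_with_pushdown(source, 100)
    print("rows transferred without pushdown:", transferred_all)
    print("rows transferred with pushdown:", transferred_limited)
```

Both functions return the same preview rows; the only difference is how many rows leave the source, which is what matters for our 500M-row tables.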
Hi @balaji.ramaswamy,
We are using Dremio 4.9.1 now.
May I ask whether Dremio supports Hive tables in JSON format?
Or do we need to import an external .jar package and store the table in TEXTFILE format using the SQL below?
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
Hi @jduong,
We are using Dremio 4.9.1 now.
May I ask whether Dremio supports Hive tables in JSON format?
Or do we need to import the external .jar package below (hive-hcatalog-core-0.13.1-cdh5.3.6.jar)
and store the table in TEXTFILE format using the SQL below?
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE