Write to Dremio (Nessie catalog) through Spark

Hi,
I created a Dremio JDBC connection in Spark (Scala) and I'm able to read the tables from the NessieCatalog through it.

  1. When I try to write the dataframe into the NessieCatalog, it fails:
    df_hive_rename1.write.mode("append").jdbc(jdbcUrl, "NessieCatalog.silver.hivecreate", connectionProperties)
    "Illegal use of dynamic parameter"
    SQL Query INSERT INTO "NessieCatalog"."Silver"."dremsilvertable" ("department_id","department_name","location") VALUES (?,?,?)

  2. I tried connecting to Dremio (Nessie catalog) to access it through spark.sql but couldn't find any resources for it. Is it possible to connect?

  3. Is it possible to write the dataframe via spark.sql as in point no. 2?

@gakshat1107 Let's narrow down the issue to either Nessie or the actual SQL: are you able to run the same insert via the Dremio UI? I see you are trying to use dynamic parameters, which are not supported.

@balaji.ramaswamy
Yes, I’m able to run the same SQL through Dremio UI.
For dynamic parameters - I didn't know that dynamic parameters can't be used. I found a discussion just yesterday where you explained that dynamic parameters aren't supported. Thanks for the explanation.
I tried a different method using spark.sql and I'm now able to insert data through Spark too.
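Roughly, that method registers Nessie as an Iceberg catalog on the SparkSession instead of going through the Dremio JDBC driver. A minimal sketch; the Nessie/MinIO endpoints, credentials, warehouse path, and catalog name below are placeholders, not values from this thread:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register Nessie as an Iceberg catalog so spark.sql can read and
// write it directly. Endpoints, paths, and names are placeholders.
val spark = SparkSession.builder()
  .appName("nessie-spark")
  .config("spark.sql.extensions",
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions," +
    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
  .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
  .config("spark.sql.catalog.nessie.uri", "http://nessie-host:19120/api/v1")
  .config("spark.sql.catalog.nessie.ref", "main")
  .config("spark.sql.catalog.nessie.warehouse", "s3a://etltest/")
  .config("spark.hadoop.fs.s3a.endpoint", "http://minio-host:9000")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()

// Plain SQL now works against the Nessie catalog, no JDBC prepared
// statements (and hence no dynamic parameters) involved:
spark.sql("CREATE TABLE IF NOT EXISTS nessie.silver.hivecreate (department_id BIGINT, department_name STRING, location STRING) USING iceberg")
spark.sql("INSERT INTO nessie.silver.hivecreate VALUES (1, 'HR', 'NYC')")
```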

But there’s some other issue I’m facing now:

  1. I created a table through spark.sql in Spark (Scala), inserted data into it, and read it back through the same method and also from the Dremio UI - successful
  2. Then I inserted some data through the Dremio UI into the same table and read it through Dremio - successful
  3. But when I try spark.sql, it throws the error below. Is it because Dremio changes some metadata?:
    ERROR BaseReader: Error reading file(s): s3://etltest/Silver/region4_adfad9d4-bcec-4c49-a104-20b6940009f1/19465862-bf0b-ac9c-4ce5-2e28bd08ce00/0_0_0.parquet
    java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:275)
    at org.xerial.snappy.Snappy.uncompress(Snappy.java:553)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.SnappyDecompressor.uncompress(SnappyDecompressor.java:30)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.NonBlockedDecompressor.decompress(NonBlockedDecompressor.java:73)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.iceberg.shaded.org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:286)

@gakshat1107 In each of these scenarios, what was the Iceberg catalog?

@balaji.ramaswamy I'm using the NessieCatalog for the above scenarios, and MinIO is being used as the storage

@gakshat1107 I was able to insert rows via Dremio and read from Spark, but I used a Hive catalog. Can you please try the below:

  • Create an Iceberg table using the Hive catalog
  • Insert some rows via Spark
  • Query table via Spark
  • Query table via Dremio
  • Insert additional rows into the table via Dremio
  • Query the table via Dremio
  • Query the table via Spark
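The Spark side of the steps above can be sketched as follows (the catalog, database, and table names are placeholders, and this assumes an Iceberg catalog backed by the Hive metastore is already configured on the session):

```scala
// Sketch of the Spark-side steps, assuming an Iceberg catalog named
// "hive_cat" backed by the Hive metastore is configured on the session.
// All names below are placeholders.
spark.sql("CREATE TABLE hive_cat.db1.t1 (id BIGINT, data STRING) USING iceberg")
spark.sql("INSERT INTO hive_cat.db1.t1 VALUES (1, 'a'), (2, 'b')")
spark.sql("SELECT * FROM hive_cat.db1.t1").show()
// ...then query the table and insert additional rows via the Dremio UI,
// and repeat the SELECT here to check that both engines agree.
```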

Sure @balaji.ramaswamy, I’ll try that and get back to you

Hi @balaji.ramaswamy, I set up Hive and tried the following:

  1. Created a table directly in Hive:
    a. Able to read it in Spark
    b. Tried to insert data into the table through Dremio, getting "Table [HiveCatalog.iceee1.testing] is not configured to support DML operations"

  2. Created a HiveCatalog table through Dremio:
    a. Inserted the data into that table
    b. Tried to read the data through Hive, getting the error: FAILED: SemanticException [Error 10055]: Line 1:12 Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat 'dremtest': The class is class org.apache.hadoop.mapred.FileOutputFormat (need to look into it)
    c. Tried to read the data through spark-scala, getting the same java.lang.IllegalArgumentException as with Nessie

@gakshat1107 Can you please send the create table statements done on Hive and Dremio?

Hi @balaji.ramaswamy,
Please find the statements below:

  1. Table created through hive:
    CREATE TABLE iceee1.testing(
    id bigint,
    data string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    's3a://iceee/iceee1/testing'
    TBLPROPERTIES (
    'spark.sql.create.version'='3.4.3',
    'spark.sql.sources.schema'='{"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"data","type":"string","nullable":true,"metadata":{}}]}',
    'transient_lastDdlTime'='1722922828')

  2. Table created through Dremio:
    CREATE EXTERNAL TABLE iceee1.dremtest(
    id int,
    data string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.FileInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.mapred.FileOutputFormat'
    LOCATION
    's3a://iceee/iceee1/dremtest'
    TBLPROPERTIES (
    'commit.manifest.target-size-bytes'='153600',
    'compatibility.snapshot-id-inheritance.enabled'='true',
    'metadata_location'='s3a://iceee/iceee1/dremtest/metadata/00001-a3c32621-004c-4db8-8fc9-dae18fda7f7a.metadata.json',
    'previous_metadata_location'='s3a://iceee/iceee1/dremtest/metadata/00000-697ea1a5-b7bd-4dab-9430-60b731ea1ed7.metadata.json',
    'table_type'='ICEBERG',
    'transient_lastDdlTime'='1724159515',
    'write.parquet.compression-codec'='zstd')

Hi @balaji.ramaswamy, please note I didn't give any properties while creating the tables; I only gave the columns.

@gakshat1107 The table you have created through Hive is in text format:

STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
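One way to avoid the metastore entry defaulting to a plain text table is to declare it as Iceberg explicitly when creating it from Spark. A sketch, assuming the Spark session is configured with an Iceberg-enabled Hive catalog (the table name here is a placeholder, not from the thread):

```scala
// Sketch: create the table with USING iceberg so Hive doesn't register it
// as a text-format table. Assumes an Iceberg-enabled Hive catalog on the
// Spark session; the table name is a placeholder.
spark.sql("""
  CREATE TABLE iceee1.testing_iceberg (id BIGINT, data STRING)
  USING iceberg
  LOCATION 's3a://iceee/iceee1/testing_iceberg'
""")
```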

@balaji.ramaswamy ,

  1. I'll change that property and try to insert through Dremio.
  2. The issue is I'm not able to read a table that was created using Dremio, or any data I've inserted through the Dremio UI into the Hive table.

Refer to below:
Created HiveCatalog table through Dremio:
a. Inserted the data into that table
b. Tried to read the data through Hive, getting the error: FAILED: SemanticException [Error 10055]: Line 1:12 Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat 'dremtest': The class is class org.apache.hadoop.mapred.FileOutputFormat (need to look into it)
c. Tried to read the data through spark-scala, getting the same java.lang.IllegalArgumentException as with Nessie

@gakshat1107 Both your create tables are not in Iceberg format - see the SerDe above; that is the reason you cannot insert into this table

Hi @balaji.ramaswamy,

  1. Yes, the Hive table is not in Iceberg format; I'll correct it and then try again.
  2. But the Hive table created through Dremio is in Iceberg format: 'table_type'='ICEBERG'

@gakshat1107 Although you mentioned table_type=ICEBERG, the SerDe seems to be text. Can you run a select * from iceee.iceee1.dremtest and send me the profile?

Instead of the above command, can you try the below via Dremio, then insert rows, and then try to read via Dremio followed by Spark or Hive?

CREATE TABLE iceee.iceee1.dremtest(
id int,
data string)

Hi @balaji.ramaswamy, sorry for the late reply

  1. I created the table as suggested by you
    CREATE TABLE HiveCatalog.iceee1.dremtest1(
    id int,
    data varchar(200));

  2. I ran the show create table command and got the below output:
    CREATE EXTERNAL TABLE iceee1.dremtest1(
    id int,
    data string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.FileInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.mapred.FileOutputFormat'
    LOCATION
    's3a://iceee/iceee1/dremtest1'
    TBLPROPERTIES (
    'commit.manifest.target-size-bytes'='153600',
    'compatibility.snapshot-id-inheritance.enabled'='true',
    'metadata_location'='s3a://iceee/iceee1/dremtest1/metadata/00001-c0150820-eb52-4f26-b584-cd0560eff1d7.metadata.json',
    'previous_metadata_location'='s3a://iceee/iceee1/dremtest1/metadata/00000-283ce5d6-487e-4cb3-a913-840e8c105bf7.metadata.json',
    'table_type'='ICEBERG',
    'transient_lastDdlTime'='1725860751',
    'write.parquet.compression-codec'='zstd')

  3. I queried the table through spark and got the error:
    spark.sql("select * from iceee1.dremtest1").show(5000,5000)
    java.lang.RuntimeException: java.lang.InstantiationException

  4. I queried the table through Hive and it gave me no output

@gakshat1107 Is HiveCatalog in your step #1 a Dremio source?

Hi @balaji.ramaswamy, yes, I created that table by connecting to Hive in Dremio

Hi @balaji.ramaswamy, can you please try with Nessie once too?
Because we have to use Nessie, not Hive.