Write to Dremio (Nessie catalog) through Spark

Hi,
I created a Dremio JDBC connection in Spark (Scala) and I'm able to read the tables from the NessieCatalog through it.

  1. When I try to write the dataframe into the NessieCatalog, it fails:
    df_hive_rename1.write.mode("append").jdbc(jdbcUrl, "NessieCatalog.silver.hivecreate", connectionProperties)
    "Illegal use of dynamic parameter"
    SQL Query INSERT INTO "NessieCatalog"."Silver"."dremsilvertable" ("department_id","department_name","location") VALUES (?,?,?)

  2. I tried connecting to Dremio (Nessie catalog) to access it through spark.sql but couldn't find any resources for it. Is it possible to connect?

  3. Is it possible to write the dataframe via spark.sql as in point no. 2?

@gakshat1107 Let's narrow down the issue to either Nessie or the actual SQL: are you able to run the same insert via the Dremio UI? I see you are trying to use dynamic parameters, which are not supported.

@balaji.ramaswamy
Yes, I’m able to run the same SQL through Dremio UI.
For dynamic parameters - I didn't know that dynamic parameters can't be used. I found a discussion just yesterday where you explained that dynamic parameters aren't supported. Thanks for the explanation.
I tried a different method using spark.sql and I'm now able to insert data through Spark too.
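Roughly, that method registers Nessie as an Iceberg catalog on the SparkSession instead of going through the Dremio JDBC driver. A minimal sketch; the Nessie/MinIO endpoints, credentials, warehouse path, and catalog name below are placeholders, not values from this thread:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register Nessie as an Iceberg catalog so spark.sql can read and
// write it directly. Endpoints, paths, and names are placeholders.
val spark = SparkSession.builder()
  .appName("nessie-spark")
  .config("spark.sql.extensions",
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions," +
    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
  .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
  .config("spark.sql.catalog.nessie.uri", "http://nessie-host:19120/api/v1")
  .config("spark.sql.catalog.nessie.ref", "main")
  .config("spark.sql.catalog.nessie.warehouse", "s3a://etltest/")
  .config("spark.hadoop.fs.s3a.endpoint", "http://minio-host:9000")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()

// Plain SQL now works against the Nessie catalog, no JDBC prepared
// statements (and hence no dynamic parameters) involved:
spark.sql("CREATE TABLE IF NOT EXISTS nessie.silver.hivecreate (department_id BIGINT, department_name STRING, location STRING) USING iceberg")
spark.sql("INSERT INTO nessie.silver.hivecreate VALUES (1, 'HR', 'NYC')")
```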

But there’s some other issue I’m facing now:

  1. I created a table through spark.sql in Spark (Scala), inserted data into it, and read it back through the same method and also from the Dremio UI - successful
  2. Then I inserted some data through the Dremio UI into the same table and read it through Dremio - successful
  3. But when I try spark.sql, it throws the error below. Is it because Dremio changes some metadata?:
    ERROR BaseReader: Error reading file(s): s3://etltest/Silver/region4_adfad9d4-bcec-4c49-a104-20b6940009f1/19465862-bf0b-ac9c-4ce5-2e28bd08ce00/0_0_0.parquet
    java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:275)
    at org.xerial.snappy.Snappy.uncompress(Snappy.java:553)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.SnappyDecompressor.uncompress(SnappyDecompressor.java:30)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.NonBlockedDecompressor.decompress(NonBlockedDecompressor.java:73)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.iceberg.shaded.org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:286)

@gakshat1107 In each of these scenarios, what was the Iceberg catalog?

@balaji.ramaswamy I'm using the NessieCatalog for the above scenarios, and MinIO is being used as the storage

@gakshat1107 I was able to insert rows via Dremio and read from Spark, but I used a Hive catalog. Can you please try the below:

  • Create an Iceberg table using the Hive catalog
  • Insert some rows via Spark
  • Query table via Spark
  • Query table via Dremio
  • Insert additional rows into the table via Dremio
  • Query the table via Dremio
  • Query the table via Spark
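The Spark side of the steps above can be sketched as follows (the catalog, database, and table names are placeholders, and this assumes an Iceberg catalog backed by the Hive metastore is already configured on the session):

```scala
// Sketch of the Spark-side steps, assuming an Iceberg catalog named
// "hive_cat" backed by the Hive metastore is configured on the session.
// All names below are placeholders.
spark.sql("CREATE TABLE hive_cat.db1.t1 (id BIGINT, data STRING) USING iceberg")
spark.sql("INSERT INTO hive_cat.db1.t1 VALUES (1, 'a'), (2, 'b')")
spark.sql("SELECT * FROM hive_cat.db1.t1").show()
// ...then query the table and insert additional rows via the Dremio UI,
// and repeat the SELECT here to check that both engines agree.
```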

Sure @balaji.ramaswamy, I’ll try that and get back to you

Hi @balaji.ramaswamy, I set up Hive and tried the following:

  1. Created a table directly in Hive:
    a. Able to read it in Spark
    b. Tried to insert data into the table through Dremio, getting "Table [HiveCatalog.iceee1.testing] is not configured to support DML operations"

  2. Created a HiveCatalog table through Dremio:
    a. Inserted the data into that table
    b. Tried to read the data through Hive, getting the error: FAILED: SemanticException [Error 10055]: Line 1:12 Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat 'dremtest': The class is class org.apache.hadoop.mapred.FileOutputFormat (need to look into it)
    c. Tried to read the data through spark-scala, getting the same java.lang.IllegalArgumentException as with Nessie

@gakshat1107 Can you please send the create table statements done on Hive and Dremio?

Hi @balaji.ramaswamy,
Please find the statements below:

  1. Table created through hive:
    CREATE TABLE iceee1.testing(
    id bigint,
    data string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    's3a://iceee/iceee1/testing'
    TBLPROPERTIES (
    'spark.sql.create.version'='3.4.3',
    'spark.sql.sources.schema'='{"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"data","type":"string","nullable":true,"metadata":{}}]}',
    'transient_lastDdlTime'='1722922828')

  2. Table created through Dremio:
    CREATE EXTERNAL TABLE iceee1.dremtest(
    id int,
    data string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.FileInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.mapred.FileOutputFormat'
    LOCATION
    's3a://iceee/iceee1/dremtest'
    TBLPROPERTIES (
    'commit.manifest.target-size-bytes'='153600',
    'compatibility.snapshot-id-inheritance.enabled'='true',
    'metadata_location'='s3a://iceee/iceee1/dremtest/metadata/00001-a3c32621-004c-4db8-8fc9-dae18fda7f7a.metadata.json',
    'previous_metadata_location'='s3a://iceee/iceee1/dremtest/metadata/00000-697ea1a5-b7bd-4dab-9430-60b731ea1ed7.metadata.json',
    'table_type'='ICEBERG',
    'transient_lastDdlTime'='1724159515',
    'write.parquet.compression-codec'='zstd')

Hi @balaji.ramaswamy, please note I didn't give any properties while creating the tables; I only gave the columns.

@gakshat1107 The table you have created through Hive is in text format:

STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
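One way to avoid the metastore entry defaulting to a plain text table is to declare it as Iceberg explicitly when creating it from Spark. A sketch, assuming the Spark session is configured with an Iceberg-enabled Hive catalog (the table name here is a placeholder, not from the thread):

```scala
// Sketch: create the table with USING iceberg so Hive doesn't register it
// as a text-format table. Assumes an Iceberg-enabled Hive catalog on the
// Spark session; the table name is a placeholder.
spark.sql("""
  CREATE TABLE iceee1.testing_iceberg (id BIGINT, data STRING)
  USING iceberg
  LOCATION 's3a://iceee/iceee1/testing_iceberg'
""")
```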

@balaji.ramaswamy ,

  1. I'll change that property and try to insert through Dremio.
  2. The issue is I'm not able to read a table that was created using Dremio, or any data I've inserted through the Dremio UI into the Hive table.

Refer to below:
Created HiveCatalog table through Dremio:
a. Inserted the data into that table
b. Tried to read the data through Hive, getting the error: FAILED: SemanticException [Error 10055]: Line 1:12 Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat 'dremtest': The class is class org.apache.hadoop.mapred.FileOutputFormat (need to look into it)
c. Tried to read the data through spark-scala, getting the same java.lang.IllegalArgumentException as with Nessie

@gakshat1107 Both your create tables are not in Iceberg format - see the SerDe above; that is the reason you cannot insert into this table

Hi @balaji.ramaswamy,

  1. Yes, the Hive table is not in Iceberg format; I'll correct it and then try again.
  2. But the Hive table created through Dremio is in Iceberg format: 'table_type'='ICEBERG'

@gakshat1107 Although you mentioned table_type=ICEBERG, the SerDe seems to be text. Can you run a select * from iceee.iceee1.dremtest and send me the profile?

Instead of the above command, can you try the below via Dremio, then insert rows, and then try to read via Dremio followed by Spark or Hive?

CREATE TABLE iceee.iceee1.dremtest(
id int,
data string)

Hi @balaji.ramaswamy, sorry for the late reply

  1. I created the table as suggested by you
    CREATE TABLE HiveCatalog.iceee1.dremtest1(
    id int,
    data varchar(200));

  2. I ran the show create table command and got the below output:
    CREATE EXTERNAL TABLE iceee1.dremtest1(
    id int,
    data string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.FileInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.mapred.FileOutputFormat'
    LOCATION
    's3a://iceee/iceee1/dremtest1'
    TBLPROPERTIES (
    'commit.manifest.target-size-bytes'='153600',
    'compatibility.snapshot-id-inheritance.enabled'='true',
    'metadata_location'='s3a://iceee/iceee1/dremtest1/metadata/00001-c0150820-eb52-4f26-b584-cd0560eff1d7.metadata.json',
    'previous_metadata_location'='s3a://iceee/iceee1/dremtest1/metadata/00000-283ce5d6-487e-4cb3-a913-840e8c105bf7.metadata.json',
    'table_type'='ICEBERG',
    'transient_lastDdlTime'='1725860751',
    'write.parquet.compression-codec'='zstd')

  3. I queried the table through spark and got the error:
    spark.sql("select * from iceee1.dremtest1").show(5000,5000)
    java.lang.RuntimeException: java.lang.InstantiationException

  4. I queried the table through Hive and it gave me no output

@gakshat1107 Is HiveCatalog in your step #1 a Dremio source?

Hi @balaji.ramaswamy, yes, I created that table by connecting to Hive in Dremio

Hi @balaji.ramaswamy, can you please try with Nessie once too?
Because we have to use Nessie, not Hive.