Hi @balaji.ramaswamy
I’ve followed all the above steps but am still facing the issue with the Nessie catalog.
I created a table through spark.sql in Spark (Scala), inserted data into it, and read it back both through the same method and from the Dremio UI - successful.
Then I inserted some data through the Dremio UI into the same table and read it through Dremio - successful.
But when I try to read it through spark.sql, it throws the error below. Is it because Dremio changes some metadata?
ERROR BaseReader: Error reading file(s): s3://etltest/Silver/region4_adfad9d4-bcec-4c49-a104-20b6940009f1/19465862-bf0b-ac9c-4ce5-2e28bd08ce00/0_0_0.parquet
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:275)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:553)
at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.SnappyDecompressor.uncompress(SnappyDecompressor.java:30)
at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.NonBlockedDecompressor.decompress(NonBlockedDecompressor.java:73)
at org.apache.iceberg.shaded.org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.iceberg.shaded.org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:286)
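To help isolate the failure, a check I can run next is reading the failing file directly with Spark, bypassing the Iceberg read path. A rough sketch - the path is copied from the error above, switched to the s3a scheme, and assumes my S3A credentials config is in place:

// Rough check: can plain Spark/Parquet decode the Dremio-written file?
// If this also fails inside SnappyDecompressor, the file or its codec is
// the problem rather than the Nessie/Iceberg layer.
val raw = spark.read.parquet(
  "s3a://etltest/Silver/region4_adfad9d4-bcec-4c49-a104-20b6940009f1/19465862-bf0b-ac9c-4ce5-2e28bd08ce00/0_0_0.parquet")
raw.show()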
Connected to the Spark shell using:

spark-shell \
  --jars /usr/local/lib/zstd-jni-1.5.2-5.jar,/data/spark/jars/dremio-jdbc-driver-11.0.0-202011171636110752-16ab953d.jar,/data/spark/jars/iceberg-spark-runtime-3.4_2.12-1.5.2.jar,/data/spark/jars/aws-java-sdk-1.11.901.jar,/data/spark/jars/aws-java-sdk-bundle-1.11.901.jar,/data/spark/jars/aws-java-sdk-dynamodb-1.11.901.jar,/data/spark/jars/aws-java-sdk-kms-1.11.901.jar,/data/spark/jars/aws-java-sdk-core-1.11.901.jar,/data/spark/jars/aws-java-sdk-s3-1.11.901.jar,/data/spark/jars/hadoop-aws-3.2.4.jar \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
  --conf "spark.executor.extraJavaOptions=-Djava.library.path=/usr/local/lib" \
  --conf "spark.driver.extraJavaOptions=-Djava.library.path=/usr/local/lib" \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
Created a session to connect to Nessie:

val sparknessie = SparkSession.builder()
  .appName("IcebergNessieExample")
  .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.5.2,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.94.4")
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
  .config("spark.sql.catalog.nessie.uri", "http://10.**.196.***:19120/api/v1")
  .config("spark.sql.catalog.nessie.ref", "main")
  .config("spark.sql.catalog.nessie.authentication.type", "NONE")
  .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
  .config("spark.sql.catalog.nessie.warehouse", "s3a://etltest")
  .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
  .getOrCreate()
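Since my question is whether Dremio changed any metadata, I can also inspect the table's history from this session. A sketch, assuming the table from the query below is nessie.test_spark_dremio_iceberg - the .snapshots and .files metadata tables are standard Iceberg:

// List snapshots and data files to see what Dremio's insert added
// (namespace assumed; adjust to the actual table path).
sparknessie.sql("SELECT committed_at, snapshot_id, operation FROM nessie.test_spark_dremio_iceberg.snapshots").show(false)
sparknessie.sql("SELECT file_path, file_format FROM nessie.test_spark_dremio_iceberg.files").show(false)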
spark-sql> select * from test_spark_dremio_iceberg;
4 Pat Cash
3 Ivan Lendl
2 Boris Becker
1 Stefan Edberg
Time taken: 0.13 seconds, Fetched 4 row(s)
spark-sql>
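In case the session had cached metadata from before Dremio's insert, one more thing I can try (a sketch, same table name as above):

// Drop any cached metadata and re-read the table at the current Nessie ref.
spark.sql("REFRESH TABLE test_spark_dremio_iceberg")
spark.sql("SELECT * FROM test_spark_dremio_iceberg").show()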
Hi @balaji.ramaswamy,
The Hive catalog had the same issue.
Can you please share the configuration and jars you are using to connect Spark to Hive?
I'll try a similar configuration for Nessie too.