Issues with equality deletes in 25.0.0 and 25.1.0 (Hadoop catalog)

We have an Iceberg table created with the Iceberg Sink Connector. I am unable to read it from Dremio 25.0.0, getting the following error:

IllegalArgumentException: equalityFields and equalityVectors list sizes do not match

We are using the Hadoop catalog, but I am wondering whether this is related to the following statement in the release notes (even though it mentions only Hive and Glue), and whether we have to wait for a new OSS release to be able to access the table:

  • Issues may occur when reading Apache Iceberg tables with equality deletes from Hive or Glue sources. To resolve this issue, upgrade to version 25.0.4.

Side note: I have already tested access to the table from Spark, and it works perfectly fine.
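
Roughly, the Spark-side check looks like this (a sketch only: the catalog name, warehouse path, and table identifier are placeholders for the redacted values, and the fs.s3a.* credential/endpoint settings from the connector config are omitted):

from pyspark.sql import SparkSession

# Sketch only: warehouse path and table name are placeholders for the redacted
# values; S3A credentials and endpoint settings are omitted here.
spark = (
    SparkSession.builder
    .appName("iceberg-read-check")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Hadoop catalog named "iceberg", matching the connector's catalog settings.
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "s3a://redacted/")
    .getOrCreate()
)

# The table reads fine from Spark; the equality deletes are applied as expected.
spark.sql("SELECT count(*) FROM iceberg.redacted_ns.redacted_tablename").show()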

@dotjdk

Checking on this internally and will get back to you

Thanks,
Bali


Any update on this? It is a huge blocker for us.

We migrated from a MongoDB collection to the S3 Iceberg Sink Connector mainly because we cannot do an incremental refresh based on modifiedTS on the Mongo collection: Dremio generates duplicates when the collection is not append-only. Continuously doing a full refresh is not an option either on a collection with 90 million rows, because it takes far too long and Dremio gets stuck loading at around 60 million rows (the exact same count every time).

Now we are facing this issue where Dremio cannot read the Iceberg table because of its delete files, and unfortunately the connector does not support copy-on-write.

@dotjdk We have a similar issue that was resolved in 25.0.4. Did you say you have already tested on 25.0.4?

No, we are on OSS, so we don't have access to 25.0.4 :frowning:

@dotjdk You can try it on 25.1, which will be released soon

Thanks @balaji.ramaswamy … I will look forward to that and report back when I have tried it.

Same issue in 25.1.0

@dotjdk Let me check and get back to you

@dotjdk Is this reproducible? If yes, what are the steps to reproduce this issue?

Yes. We are using the Tabular Iceberg Sink Connector to store a Kafka Changelog topic in Iceberg with the following connector config:

{
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "topics": "redacted-changelog",
  "name": "redacted-iceberg-sink",
  "tasks.max": "1",
  "iceberg.control.commit.interval-ms": "30000",
  "iceberg.tables": "redacted_ns.redacted_tablename",
  "iceberg.tables.default-id-columns": "redacted, redacted, redacted",
  "iceberg.tables.default-partition-by": "bucket(redacted, 5)",
  "iceberg.tables.upsert-mode-enabled": "True",
  "iceberg.tables.schema-case-insensitive": "True",
  "iceberg.tables.auto-create-enabled": "True",
  "iceberg.tables.auto-create-props.write.metadata.compression-codec": "gzip",
  "iceberg.tables.auto-create-props.write.distribution-mode": "range",
  "iceberg.tables.auto-create-props.write.metadata.previous-versions-max": "500",
  "iceberg.tables.auto-create-props.write.metadata.delete-after-commit.enabled": "true",
  "iceberg.tables.auto-create-props.write.target-file-size-bytes": "268435456",
  "iceberg.catalog.default-namespace": "redacted_ns",
  "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "iceberg.catalog.s3.path-style-access": "true",
  "iceberg.catalog.s3.delete-enabled": "false",
  "iceberg.catalog.type": "hadoop",
  "iceberg.catalog": "iceberg",
  "iceberg.catalog.s3.endpoint": "http://redacted",
  "iceberg.catalog.zookeeper.connectionString": "redacted:2181",
  "iceberg.catalog.lock-impl": "our.custom.ZookeeperLocker",
  "iceberg.catalog.warehouse": "s3a://redacted/",
  "iceberg.catalog.s3.access-key-id": "redacted",
  "iceberg.catalog.s3.secret-access-key": "redacted",
  "iceberg.hadoop.fs.s3a.secret.key": "redacted",
  "iceberg.hadoop.fs.s3a.access.key": "redacted",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "iceberg.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.endpoint": "http://redacted",
  "iceberg.hadoop.fs.s3a.multipart.size": "100M",
  "iceberg.hadoop.fs.s3a.multipart.threshold": "2G",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://redacted:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://redacted:8081"
}
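
If it helps: as far as I understand, "iceberg.tables.upsert-mode-enabled": "True" is what makes the connector write an equality delete for every upserted key, which is where the equality delete files come from. They can be listed from Spark through Iceberg's delete_files metadata table; a sketch, reusing the session and placeholder names from the check posted earlier:

# Sketch: list the table's delete files to confirm equality deletes are present.
# Assumes the same Spark session / Hadoop catalog setup as the earlier check;
# catalog and table names are placeholders for the redacted ones.
spark.sql("""
    SELECT content, file_path, record_count
    FROM iceberg.redacted_ns.redacted_tablename.delete_files
    -- content = 1 marks a position delete file, content = 2 an equality delete file
""").show(truncate=False)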

The table is readable in Spark, but it causes the reported error when accessed from Dremio.