Issues with equality deletes in 25.0.0 and 25.1.0 (Hadoop catalog)

We have an Iceberg table created with the Iceberg Sink Connector. I am unable to read it from Dremio 25.0.0, getting the following error:

IllegalArgumentException: equalityFields and equalityVectors list sizes do not match

We are using the Hadoop catalog, but I am wondering whether it is related to this statement in the release notes (even though it mentions Hive and Glue only), and whether we have to wait for a new OSS release to be able to access the table:

  • Issues may occur when reading Apache Iceberg tables with equality deletes from Hive or Glue sources. To resolve this issue, upgrade to version 25.0.4.

Side note: I have already tested table access from Spark, and it works perfectly fine.
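For context on what a reader has to reconcile here: per the Iceberg spec, an equality delete file records a list of equality field IDs plus rows of values for exactly those fields, and a data row is dropped when it matches one of those rows on those fields. A minimal spec-level sketch of the semantics (hypothetical column names; this is not Dremio's code):

```python
# Sketch of Apache Iceberg equality-delete semantics: a data row is removed
# when its projection onto the equality fields matches a delete tuple.
# Column names ("id", "region", "amount") are hypothetical.

def apply_equality_deletes(rows, equality_fields, delete_tuples):
    """Drop every row whose values on equality_fields appear in delete_tuples."""
    deleted = set(delete_tuples)
    return [r for r in rows if tuple(r[f] for f in equality_fields) not in deleted]

rows = [
    {"id": 1, "region": "eu", "amount": 10},
    {"id": 2, "region": "us", "amount": 20},
    {"id": 1, "region": "us", "amount": 30},
]

# Delete file written on (id, region): both writer and reader must agree on
# the field list, which is the invariant the Dremio error is about.
survivors = apply_equality_deletes(rows, ["id", "region"], [(1, "eu")])
assert [r["id"] for r in survivors] == [2, 1]
```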

@dotjdk

Checking on this internally and will get back to you.

Thanks,
Bali


Any update on this? It is a huge blocker for us.

We migrated from a Mongo collection to the S3 Iceberg Sink Connector mainly because we can’t do incremental refresh based on modifiedTS on the Mongo collection: Dremio generates duplicates when the collection is not append-only. Continuously doing a full refresh is not an option on a collection with 90 million rows, because it takes far too long and Dremio gets stuck loading at sixty-something million rows (the same exact count every time).

Now we are facing this issue of Dremio not being able to read the Iceberg table with delete files. And unfortunately the connector doesn’t support copy-on-write.
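The reason the table ends up with equality deletes at all is the upsert-mode setting in the connector config: conceptually, for each changelog record the sink writes an equality delete on the key columns followed by an insert, so a reader only sees the latest version per key after applying the deletes. A hedged sketch of that fold (hypothetical event shape and key names, not the connector's internals):

```python
# Sketch of upsert/changelog semantics: later events on the same key replace
# earlier ones, and delete events tombstone the key. The "_op" field and
# key names are hypothetical illustration, not the connector's wire format.

def apply_changelog(events, key_fields):
    """Fold a keyed changelog into latest-state rows (delete-then-insert per key)."""
    state = {}
    for ev in events:
        key = tuple(ev[f] for f in key_fields)
        if ev.get("_op") == "delete":
            state.pop(key, None)  # tombstone: drop the current row, if any
        else:
            state[key] = {k: v for k, v in ev.items() if k != "_op"}  # upsert
    return list(state.values())

events = [
    {"_op": "upsert", "id": 1, "val": "a"},
    {"_op": "upsert", "id": 1, "val": "b"},  # overwrites the first version
    {"_op": "delete", "id": 2},
]
assert apply_changelog(events, ["id"]) == [{"id": 1, "val": "b"}]
```

This is why the table cannot simply be rewritten as append-only on the sink side: the deletes carry the update semantics.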

@dotjdk We have a similar issue that was resolved in 25.0.4. Did you say you have already tested on 25.0.4?

No, we are on OSS, so we don’t have access to 25.0.4 :frowning:

@dotjdk You can try it on 25.1, which will be released soon

Thanks @balaji.ramaswamy … I will look forward to that and report back when I have tried it.

Same issue in 25.1.0

@dotjdk Let me check and get back to you

@dotjdk Is this reproducible? If yes, what are the steps to reproduce this issue?

Yes. We are using the Tabular Iceberg Sink Connector to store a Kafka changelog topic in Iceberg, with the following connector config:

{
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "topics": "redacted-changelog",
  "name": "redacted-iceberg-sink",
  "tasks.max": "1",
  "iceberg.control.commit.interval-ms": "30000",
  "iceberg.tables": "redacted_ns.redacted_tablename",
  "iceberg.tables.default-id-columns": "redacted, redacted, redacted",
  "iceberg.tables.default-partition-by": "bucket(redacted, 5)",
  "iceberg.tables.upsert-mode-enabled": "True",
  "iceberg.tables.schema-case-insensitive": "True",
  "iceberg.tables.auto-create-enabled": "True",
  "iceberg.tables.auto-create-props.write.metadata.compression-codec": "gzip",
  "iceberg.tables.auto-create-props.write.distribution-mode": "range",
  "iceberg.tables.auto-create-props.write.metadata.previous-versions-max": "500",
  "iceberg.tables.auto-create-props.write.metadata.delete-after-commit.enabled": "true",
  "iceberg.tables.auto-create-props.write.target-file-size-bytes": "268435456",
  "iceberg.catalog.default-namespace": "redacted_ns",
  "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "iceberg.catalog.s3.path-style-access": "true",
  "iceberg.catalog.s3.delete-enabled": "false",
  "iceberg.catalog.type": "hadoop",
  "iceberg.catalog": "iceberg",
  "iceberg.catalog.s3.endpoint": "http://redacted",
  "iceberg.catalog.zookeeper.connectionString": "redacted:2181",
  "iceberg.catalog.lock-impl": "our.custom.ZookeeperLocker",
  "iceberg.catalog.warehouse": "s3a://redacted/",
  "iceberg.catalog.s3.access-key-id": "redacted",
  "iceberg.catalog.s3.secret-access-key": "redacted",
  "iceberg.hadoop.fs.s3a.secret.key": "redacted",
  "iceberg.hadoop.fs.s3a.access.key": "redacted",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "iceberg.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.endpoint": "http://redacted",
  "iceberg.hadoop.fs.s3a.multipart.size": "100M",
  "iceberg.hadoop.fs.s3a.multipart.threshold": "2G",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://redacted:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://redacted:8081"
}

The table is readable in Spark, but it causes the reported error when accessed from Dremio.

@dotjdk is this still an issue? If so, please provide a query profile that shows the read error.

Yes, it is still an issue in Dremio 25.2.0. I don’t have any data on Kafka for which I can share a profile without exposing sensitive data. The best I can do is provide the stack trace from the profile.

The issue is easily reproducible using the connector config I provided.

Initially, I didn’t bucket the data, but I ran into the issue of Dremio not supporting global equality deletes on unpartitioned tables.

SYSTEM ERROR: IllegalArgumentException: equalityFields and equalityVectors list sizes do not match

SqlOperatorImpl TABLE_FUNCTION
Location 1:7:4
ErrorOrigin: EXECUTOR
[Error Id: 6ce2248d-cd63-49cb-9245-d7f4d99a7158 on dremio-executor-0.dremio-cluster-pod.mass-dev-dremio-analytics.svc.cluster.local:0]

  (java.lang.RuntimeException) java.lang.IllegalArgumentException: equalityFields and equalityVectors list sizes do not match
    com.dremio.exec.store.iceberg.deletes.EqualityDeleteFileReader.buildHashTable():101
    com.dremio.exec.store.iceberg.deletes.LazyEqualityDeleteTableSupplier.lambda$get$1():58
    java.util.stream.ReferencePipeline$3$1.accept():197
    java.util.ArrayList$ArrayListSpliterator.forEachRemaining():1625
    java.util.stream.AbstractPipeline.copyInto():509
    java.util.stream.AbstractPipeline.wrapAndCopyInto():499
    java.util.stream.ReduceOps$ReduceOp.evaluateSequential():921
    java.util.stream.AbstractPipeline.evaluate():234
    java.util.stream.ReferencePipeline.collect():682
    com.dremio.exec.store.iceberg.deletes.LazyEqualityDeleteTableSupplier.get():63
    com.dremio.exec.store.iceberg.deletes.EqualityDeleteFilter.setup():110
    com.dremio.exec.store.parquet.UnifiedParquetReader.setup():282
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():180
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():151
    com.dremio.exec.store.parquet.ParquetCoercionReader.setup():87
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():84
    com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():192
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():180
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():128
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():257
    com.dremio.sabot.driver.StraightPipe.pump():55
    com.dremio.sabot.driver.Pipeline.doPump():134
    com.dremio.sabot.driver.Pipeline.pumpOnce():124
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():690
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():595
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1274
    com.dremio.sabot.task.AsyncTaskWrapper.run():130
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():281
    com.dremio.sabot.task.slicing.SlicingThread.run():186
  Caused By (java.lang.IllegalArgumentException) equalityFields and equalityVectors list sizes do not match
    com.google.common.base.Preconditions.checkArgument():143
    com.dremio.exec.store.iceberg.deletes.EqualityDeleteHashTable$Builder.<init>():98
    com.dremio.exec.store.iceberg.deletes.EqualityDeleteFileReader.buildHashTable():88
    com.dremio.exec.store.iceberg.deletes.LazyEqualityDeleteTableSupplier.lambda$get$1():58
    java.util.stream.ReferencePipeline$3$1.accept():197
    java.util.ArrayList$ArrayListSpliterator.forEachRemaining():1625
    java.util.stream.AbstractPipeline.copyInto():509
    java.util.stream.AbstractPipeline.wrapAndCopyInto():499
    java.util.stream.ReduceOps$ReduceOp.evaluateSequential():921
    java.util.stream.AbstractPipeline.evaluate():234
    java.util.stream.ReferencePipeline.collect():682
    com.dremio.exec.store.iceberg.deletes.LazyEqualityDeleteTableSupplier.get():63
    com.dremio.exec.store.iceberg.deletes.EqualityDeleteFilter.setup():110
    com.dremio.exec.store.parquet.UnifiedParquetReader.setup():282
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():180
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():151
    com.dremio.exec.store.parquet.ParquetCoercionReader.setup():87
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():84
    com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():192
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():180
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():128
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():257
    com.dremio.sabot.driver.StraightPipe.pump():55
    com.dremio.sabot.driver.Pipeline.doPump():134
    com.dremio.sabot.driver.Pipeline.pumpOnce():124
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():690
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():595
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1274
    com.dremio.sabot.task.AsyncTaskWrapper.run():130
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():281
    com.dremio.sabot.task.slicing.SlicingThread.run():186
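The Caused By frame is a Guava Preconditions.checkArgument inside EqualityDeleteHashTable$Builder. Without Dremio's source at hand, the invariant it appears to enforce is one materialized column vector per equality field ID recorded in the delete file's metadata; in miniature (hypothetical names, mirroring Guava's checkArgument, not Dremio's actual code):

```python
# Minimal reproduction of the *shape* of the failing check. check_argument
# mimics Guava's Preconditions.checkArgument, which raises
# IllegalArgumentException; here we use ValueError as the Python analogue.

def check_argument(condition, message):
    if not condition:
        raise ValueError(message)

def build_hash_table(equality_fields, equality_vectors):
    # The reader must have resolved exactly one column vector for each
    # equality field ID listed by the delete file.
    check_argument(
        len(equality_fields) == len(equality_vectors),
        "equalityFields and equalityVectors list sizes do not match",
    )
    return dict(zip(equality_fields, equality_vectors))

# Three id-columns configured on the sink, but only two vectors resolved by
# the reader -> the same message as in the profile.
try:
    build_hash_table([1, 2, 3], [["..."], ["..."]])
except ValueError as e:
    assert "do not match" in str(e)
```

If that reading is right, the mismatch would point at the reader failing to resolve one of the three default-id-columns (e.g. a schema/case or field-ID mapping issue) rather than at the delete files themselves, which would also explain why Spark reads the table fine.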

