Issues with equality deletes in 25.0.0 and 25.1.0 (Hadoop catalog)

We have an Iceberg table created with the Iceberg Sink Connector. I am unable to read it from Dremio 25.0.0, getting the following error:

IllegalArgumentException: equalityFields and equalityVectors list sizes do not match

We are using the Hadoop catalog, but I am wondering whether this is related to the following statement in the release notes (even though it mentions only Hive and Glue), and whether we have to wait for a new OSS release to be able to access the table:

  • Issues may occur when reading Apache Iceberg tables with equality deletes from Hive or Glue sources. To resolve this issue, upgrade to version 25.0.4.

Side note: I have already tested access to the table from Spark, and it works perfectly fine.
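
Roughly, the Spark-side check looks like this (a sketch only: the catalog name, warehouse path, and table identifier are placeholders for the redacted values, and the fs.s3a.* credential/endpoint settings from the connector config are omitted):

from pyspark.sql import SparkSession

# Sketch only: warehouse path and table name are placeholders for the redacted
# values; S3A credentials and endpoint settings are omitted here.
spark = (
    SparkSession.builder
    .appName("iceberg-read-check")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Hadoop catalog named "iceberg", matching the connector's catalog settings.
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "s3a://redacted/")
    .getOrCreate()
)

# The table reads fine from Spark; the equality deletes are applied as expected.
spark.sql("SELECT count(*) FROM iceberg.redacted_ns.redacted_tablename").show()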

@dotjdk

Checking on this internally and will get back to you

Thanks,
Bali


Any update on this? It is a huge blocker for us.

We migrated from a MongoDB collection to the S3 Iceberg Sink Connector mainly because we cannot do an incremental refresh based on modifiedTS on the Mongo collection: Dremio generates duplicates when the collection is not append-only. Continuously doing a full refresh is not an option either on a collection with 90 million rows, because it takes far too long and Dremio gets stuck loading at around 60 million rows (the exact same count every time).

Now we are facing this issue where Dremio cannot read the Iceberg table because of its delete files, and unfortunately the connector does not support copy-on-write.

@dotjdk We have a similar issue that was resolved in 25.0.4. Did you say you have already tested on 25.0.4?

No, we are on OSS, so we don't have access to 25.0.4 :frowning:

@dotjdk You can try it on 25.1, which will be released soon

Thanks @balaji.ramaswamy … I will look forward to that and report back when I have tried it.

Same issue in 25.1.0

@dotjdk Let me check and get back to you

@dotjdk Is this reproducible? If yes, what are the steps to reproduce this issue?

Yes. We are using the Tabular Iceberg Sink Connector to store a Kafka Changelog topic in Iceberg with the following connector config:

{
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "topics": "redacted-changelog",
  "name": "redacted-iceberg-sink",
  "tasks.max": "1",
  "iceberg.control.commit.interval-ms": "30000",
  "iceberg.tables": "redacted_ns.redacted_tablename",
  "iceberg.tables.default-id-columns": "redacted, redacted, redacted",
  "iceberg.tables.default-partition-by": "bucket(redacted, 5)",
  "iceberg.tables.upsert-mode-enabled": "True",
  "iceberg.tables.schema-case-insensitive": "True",
  "iceberg.tables.auto-create-enabled": "True",
  "iceberg.tables.auto-create-props.write.metadata.compression-codec": "gzip",
  "iceberg.tables.auto-create-props.write.distribution-mode": "range",
  "iceberg.tables.auto-create-props.write.metadata.previous-versions-max": "500",
  "iceberg.tables.auto-create-props.write.metadata.delete-after-commit.enabled": "true",
  "iceberg.tables.auto-create-props.write.target-file-size-bytes": "268435456",
  "iceberg.catalog.default-namespace": "redacted_ns",
  "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "iceberg.catalog.s3.path-style-access": "true",
  "iceberg.catalog.s3.delete-enabled": "false",
  "iceberg.catalog.type": "hadoop",
  "iceberg.catalog": "iceberg",
  "iceberg.catalog.s3.endpoint": "http://redacted",
  "iceberg.catalog.zookeeper.connectionString": "redacted:2181",
  "iceberg.catalog.lock-impl": "our.custom.ZookeeperLocker",
  "iceberg.catalog.warehouse": "s3a://redacted/",
  "iceberg.catalog.s3.access-key-id": "redacted",
  "iceberg.catalog.s3.secret-access-key": "redacted",
  "iceberg.hadoop.fs.s3a.secret.key": "redacted",
  "iceberg.hadoop.fs.s3a.access.key": "redacted",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "iceberg.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.endpoint": "http://redacted",
  "iceberg.hadoop.fs.s3a.multipart.size": "100M",
  "iceberg.hadoop.fs.s3a.multipart.threshold": "2G",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://redacted:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://redacted:8081"
}
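
If it helps: as far as I understand, "iceberg.tables.upsert-mode-enabled": "True" is what makes the connector write an equality delete for every upserted key, which is where the equality delete files come from. They can be listed from Spark through Iceberg's delete_files metadata table; a sketch, reusing the session and placeholder names from the check posted earlier:

# Sketch: list the table's delete files to confirm equality deletes are present.
# Assumes the same Spark session / Hadoop catalog setup as the earlier check;
# catalog and table names are placeholders for the redacted ones.
spark.sql("""
    SELECT content, file_path, record_count
    FROM iceberg.redacted_ns.redacted_tablename.delete_files
    -- content = 1 marks a position delete file, content = 2 an equality delete file
""").show(truncate=False)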

The table is readable in Spark, but it causes the reported error when accessed from Dremio.