Unable to Query Iceberg Data in Dremio - Data Written by Kafka Connector

Issue Description
I am using a Kafka Connect Sink Connector to write data to Iceberg tables with the following configuration:

connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
tasks.max=2
topics=transactions
iceberg.tables.dynamic-enabled=true
iceberg.tables.route-field=table
iceberg.tables.auto-create-enabled=true
iceberg.tables.default-id-columns=id
iceberg.tables.upsert-mode-enabled=true
iceberg.tables.default-partition-by=date

What I’ve Tried

• I verified that the data files exist in the S3 bucket. Path: warehouse/bill_cbfaff37-06d3-4437-a4cd-67e96e3ba498/data/date=2024-03-04

• I ran:

ALTER TABLE nessie.juantu_aaa REFRESH METADATA;

The data is written successfully, and I can see the files in the S3 data directory. However, when I try to query these tables in Dremio, no data is returned.
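In case it helps with diagnosis, the file-level metadata can also be inspected from Dremio itself. A minimal sketch, assuming a Dremio version that exposes the Iceberg metadata functions:

-- Lists every file referenced by the table's current snapshot.
-- If this returns rows while a plain SELECT returns nothing,
-- the metadata is visible and the problem is in reading the files.
SELECT * FROM TABLE(table_files('nessie.juantu_aaa'));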

Suspected Issue

It seems Dremio is unable to read the Iceberg metadata or data files generated by the Kafka Connector. This might be due to a difference in how metadata or partitions are handled between the two systems.

Questions

  1. Does Dremio support querying Iceberg tables written by external tools like Kafka Connect, especially when using upsert mode?

  2. Are there specific configurations needed in Dremio or the connector to ensure compatibility?

  3. Is there a way to force Dremio to recognize these data files?

Any insights or suggestions would be greatly appreciated!

Sounds like you’re hitting the same issue we did. See Issues with equality deletes in 25.0.0 - and 25.1.0 (hadoop catalog).

The connector’s upsert mode writes equality delete files, and Dremio doesn’t support reading tables that contain them. This is still an issue in 25.2.0. Doing a full rewrite of the table after each update works, but that’s not really a viable option.
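For reference, the rewrite can be done from an engine that does handle equality deletes, such as Spark. A rough sketch, assuming a Spark session with the same catalog registered as nessie and the Iceberg SQL extensions enabled:

-- Rewrites data files and applies any attached delete files.
-- 'delete-file-threshold' => '1' forces every data file with at
-- least one delete attached to be rewritten, which removes the
-- equality deletes from the new snapshot.
CALL nessie.system.rewrite_data_files(
  table   => 'juantu_aaa',
  options => map('delete-file-threshold', '1')
);

Once the current snapshot no longer references delete files, Dremio can read it, but as said, running this after every update isn’t really practical.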

Append-only tables written by the connector work, though.
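So if upserts aren’t strictly required, switching the connector to append-only mode avoids the problem entirely. Roughly, based on the config in your post:

# Append-only: no equality delete files are written,
# so Dremio can read the table.
iceberg.tables.upsert-mode-enabled=false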