Issue Description
I am using the Iceberg sink connector for Kafka Connect to write data to Iceberg tables, with the following configuration:
connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
tasks.max=2
topics=transactions
iceberg.tables.dynamic-enabled=true
iceberg.tables.route-field=table
iceberg.tables.auto-create-enabled=true
iceberg.tables.default-id-columns=id
iceberg.tables.upsert-mode-enabled=true
iceberg.tables.default-partition-by=date
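For context, with these settings each record should be routed to the table named in its table field, and auto-create should produce a format-v2 table with id as the identifier column, partitioned by date. Roughly the equivalent Dremio DDL (a sketch only; the column list is hypothetical, since the actual schema is inferred from the record value):
-- Sketch of the table the connector's auto-create effectively produces.
-- The id and amount columns are hypothetical; the real schema comes from
-- the Kafka record value. "date" is quoted because it is a reserved word.
CREATE TABLE nessie.juantu_aaa (
  id BIGINT,
  amount DOUBLE,
  "date" DATE
) PARTITION BY ("date");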
What I’ve Tried
• I verified that the data files exist in the S3 bucket. Path: warehouse/bill_cbfaff37-06d3-4437-a4cd-67e96e3ba498/data/date=2024-03-04
• I ran:
ALTER TABLE nessie.juantu_aaa REFRESH METADATA;
The data is written successfully, and I can see the files in the S3 data directory. However, when I try to query these tables in Dremio, no data is returned.
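Would checking the snapshot history through Dremio's Iceberg metadata functions help here? Something like the following (assuming table_history is available in my Dremio version, and using the same nessie.juantu_aaa path as above):
-- Check whether Dremio resolves the table and sees any committed snapshots.
SELECT * FROM TABLE(table_history('nessie.juantu_aaa'));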
Suspected Issue
It seems Dremio is unable to read the Iceberg metadata or data files generated by the Kafka connector. This might be due to a difference in how the two systems handle metadata or partitions. One possibility: upsert mode writes Iceberg format-v2 equality delete files, and not every query engine fully supports reading those.
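To narrow this down, would comparing the files Dremio tracks against the files in S3 be a reasonable next step? For example (again assuming Dremio's table_files metadata function is available):
-- List the data files Dremio associates with the table. An empty result,
-- while files exist under .../data/date=2024-03-04, would suggest the
-- connector's commit is not visible on the Nessie branch Dremio reads.
SELECT * FROM TABLE(table_files('nessie.juantu_aaa'));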
Questions
- Does Dremio support querying Iceberg tables written by external tools like Kafka Connect, especially when using upsert mode?
- Are there specific configurations needed in Dremio or the connector to ensure compatibility?
- Is there a way to force Dremio to recognize these data files?
Any insights or suggestions would be greatly appreciated!