Dremio does not support reading Apache Iceberg v2 tables with positional deletes

I’m using the latest Dremio Community edition to test querying an Iceberg table written by a Spark job. I found that a feature mentioned in the docs does not work:

1. My Iceberg format version is 2.
2. I use merge-on-read mode for deletes.
3. I can see positional delete files in the table info: ‘total-delete-files’: ‘1’, ‘total-position-deletes’: ‘1’, ‘total-equality-deletes’: ‘0’.
4. Spark returns the correct result.
5. Dremio returns all rows without applying the delete file.
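
For anyone reproducing this, the check in step 3 can be sketched in Python. This is only an illustration: the function name is hypothetical, and the dictionary is typed in by hand from the summary values quoted above rather than read from a live table.

```python
def pending_deletes(summary):
    """Summarize the delete counters of an Iceberg snapshot summary.

    Summary values are strings, so they are converted to int here.
    Missing keys are treated as zero.
    """
    delete_files = int(summary.get("total-delete-files", "0"))
    position = int(summary.get("total-position-deletes", "0"))
    equality = int(summary.get("total-equality-deletes", "0"))
    return {
        "delete_files": delete_files,
        "position_deletes": position,
        "equality_deletes": equality,
        # Any delete file means a reader has to merge deletes at query time.
        "needs_merge": delete_files > 0,
    }


# Values copied from the table info in step 3.
summary = {
    "total-delete-files": "1",
    "total-position-deletes": "1",
    "total-equality-deletes": "0",
}
result = pending_deletes(summary)
print(result)
```

If `needs_merge` is true and an engine still returns the deleted rows, that engine is reading the data files without applying the delete files.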

@fiona Which source is this?

What do you mean by source? My data is on S3.

The issue could be that merge-on-read is not supported.

@fiona

If it’s a filesystem source (which is true in your case, as it is S3), changes won’t be visible until either a scheduled or a manual metadata refresh.

Have you tried running “ALTER PDS REFRESH METADATA” to see if it works?

Hi Balaji, thanks for your time. I reformatted the table, which should have the same effect as refreshing the metadata, right? But it still does not work.

Hi @fiona Are you getting the same error again?

Hi Balaji, there is no system error; it just does not show the correct data, as I described above.

@fiona

I have checked with our Iceberg team and will get back to you with the right answer.

Thanks
Bali

Hi @fiona, I’m working with @balaji.ramaswamy on troubleshooting this.

Can you provide the following information:

  • Dremio version. Support for reading tables with positional deletes was added in version 23. This includes v2 tables created and modified in Spark with merge-on-read enabled.
  • Spark version.
  • Your Spark configuration for the Iceberg catalog, excluding any secrets. These would be all the spark.sql.catalog.* properties you have configured for your S3 Iceberg catalog, excluding access key/secret key/etc.
  • If you can reproduce the issue on a small Iceberg table that you can share, it would also help if you could zip up the entire table (data + metadata) and share it with us.
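
One way to collect the catalog properties without leaking credentials is sketched below in Python. Everything here is an assumption for illustration: the catalog name `my_catalog`, the property values, and the list of substrings treated as secret markers are all placeholders, so the output should still be reviewed by eye before posting.

```python
# Substrings that suggest a key holds a credential (an assumed, non-exhaustive list).
SECRET_MARKERS = ("secret", "access-key", "access.key", "password", "token", "credential")


def shareable_catalog_conf(conf, prefix="spark.sql.catalog."):
    """Keep only catalog properties whose key does not look like a credential."""
    shared = {}
    for key, value in conf.items():
        if not key.startswith(prefix):
            continue  # not a catalog property
        lowered = key.lower()
        if any(marker in lowered for marker in SECRET_MARKERS):
            continue  # looks like a secret, leave it out
        shared[key] = value
    return shared


# Hypothetical configuration; names and values are placeholders.
conf = {
    "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.my_catalog.type": "hadoop",
    "spark.sql.catalog.my_catalog.warehouse": "s3a://example-bucket/warehouse",
    "spark.sql.catalog.my_catalog.s3.secret-access-key": "REDACTED",
    "spark.executor.memory": "4g",
}
shared = shareable_catalog_conf(conf)
for key in sorted(shared):
    print(f"{key}={shared[key]}")
```

In a PySpark session the input dictionary could be built with `dict(spark.sparkContext.getConf().getAll())`. The filter only inspects keys, so a secret embedded in a value (for example inside a URI) would not be caught.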

I suspect the problem is related to path/URI normalization differences between Dremio and Spark. Being able to look at the table metadata and delete files should help confirm whether this is the case.
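
To illustrate the kind of mismatch I have in mind, here is a minimal Python sketch. It is not Dremio's or Spark's actual code. A position delete file stores rows of (file_path, pos), and a reader has to match file_path against the data file it is scanning. If one side writes `s3a://` and the other resolves `s3://`, a plain string comparison finds no deletes. The bucket, paths, and function names below are made up.

```python
from urllib.parse import urlsplit

# Schemes treated as equivalent for this illustration (an assumption).
S3_SCHEMES = {"s3", "s3a", "s3n"}


def normalize(path):
    """Map the S3 scheme variants to a single scheme, leaving the rest of the URI as is."""
    parts = urlsplit(path)
    scheme = "s3" if parts.scheme in S3_SCHEMES else parts.scheme
    return f"{scheme}://{parts.netloc}{parts.path}"


def rows_to_delete(data_file, position_deletes, normalized):
    """Return the row positions deleted from data_file.

    position_deletes is a list of (file_path, pos) pairs, the two
    columns stored in an Iceberg position delete file.
    """
    if normalized:
        target = normalize(data_file)
        return sorted(pos for path, pos in position_deletes if normalize(path) == target)
    return sorted(pos for path, pos in position_deletes if path == data_file)


# The reader resolves the data file with one scheme...
data_file = "s3://example-bucket/warehouse/db/t/data/00000-0.parquet"
# ...while the delete file was written with another.
position_deletes = [("s3a://example-bucket/warehouse/db/t/data/00000-0.parquet", 3)]

print(rows_to_delete(data_file, position_deletes, normalized=False))  # []
print(rows_to_delete(data_file, position_deletes, normalized=True))   # [3]
```

With the raw comparison the delete is silently ignored and every row comes back, which matches the symptom reported above. Comparing the file_path values in the delete file with the data file paths in the manifests should show whether this is what is happening.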

Thanks
Scott