Hello. I ve encoutered strange(at least for me) behavior when deleting records from Iceberg table out of the box. Aws Glue Datasource.
I ve created unpartitioned Test table using CTA with 20k rows (>170 columns if it matters) and 5mb of total size:
CREATE TABLE Table
LOCALSORT BY (SortColumn)
AS
SELECT ..170columns.. FROM ...
After that checking it’s files: SELECT * FROM TABLE( table_files( "Table"'))
Single <5mb file is created.
Than i ran query to delete 200 rows DELETE FROM "Table" where SortColumn=10;
Rows Deleted 200
Again ran SELECT * FROM TABLE( table_files( "Table"'))
Now 6 files were produced, each just 1mb of size(total of >6mb vs <5mb before)
Running OPTIMIZE TABLE "Table"
does what is expected: 1 file yielded again.
Doing similar DELETE from Athena produces such setup:
Is this by design? Splitting such a small file into 6 doesn’t sound bery optimal for me. How can i avoid it.
Dremio supports both COW and MOR as described here: Lakehouse Strategies: Copy-on-Write vs. Merge-on-Read
Your example above has COW with Dremio and MOR with Athena which is why POSITION_DELETE files are being generated.
As for why COW is producing more files, it has to do with parallelism of the DELETE query to re-write the original data file with the deleted rows excluded. It’s likely more optimal to parallelize the write when the data size is much larger.
I tried using merge-on-read.
CREATE TABLE TableMOR
LOCALSORT BY (SortColumn)
TBLPROPERTIES ('format-version' = '2', 'write.delete.mode' = 'merge-on-read', 'write.merge.mode' = 'merge-on-read', 'write.update.mode' = 'merge-on-read')
AS
SELECT * FROM ...
BUt got error: The target iceberg table's write.delete.mode table-property is set to 'merge-on-read', but dremio does not support this write property at this time. Please alter your write.delete.mode table property to 'copy-on-write' to proceed.
I am on AWS 25.0.7 build.
But anyway my original question is the same - is this normal that delete creates 6 files in COW? I i will issue another delete will it create +6 files?)
I mean when i use COW with INSERT/MERGE/UPDATE it keeps 1 or 2(if new records inserted) files. Than why parallelism was not applied in such case?)
I can not test COW Delete in Athena to compare cos it uses Trino that has no COW mode.
FYI: We are doing DELETE cos MERGE doesnt fit our need, cos we need INSERT OVERWRITE logic.
So which write modes does dremio support at the moment?
Another weird behavior…
When i run OPTIMIZE against single partition(truncate function is used to partition by col using
OPTIMIZE TABLE Table REWRITE DATA USING BIN_PACK
-- single partition
FOR PARTITIONS col IN (173500)
(MIN_INPUT_FILES = 1)
It produces 5 files…
When i run this command without Partitions clause nothing is compacted or 1 file is produced for the same modified partition as expected.
If change
FOR PARTITIONS col IN (173500, ...)
to include multiply partitions again it works fine(either nothing happens or single file per partition is produced).
partition total file size is small. Here is how it splits file into 5 for above example for no reason…
so each file is way below default Dremio target file size and even if target size would be below it wont explain why calling OPTIMIZE over all table or multiply partitions do not cause file split in this case.