Create Table NOT as iceberg

@balaji.ramaswamy

I am using two Iceberg tables as an illustration.

The one in l0 is constantly getting new data via MERGE.

I perform vacuum (expire snapshot retain 1) and optimise after every MERGE.

However, its data size has grown to 31GiB.

I ran this SQL to total the record_count and file_size_in_bytes of the table's data files:



select sum("record_count"), sum("file_size_in_bytes") from TABLE( table_files( 'Minio.finance.l0.XXXX' ) );

It reports around 3.5GB, which is VERY DIFFERENT from the actual size of 31GiB.

========

However, I then recreated the table with CTAS (and put it in backup_2025_03_26):

create table Minio.finance.backup_2025_03_26."XXXXXX"

as (SELECT * FROM Minio.finance.l0."XXXXXX")

The file size shrank to 3.3GiB. This is more in line with what the original statistics say.

Note that I have always expired all snapshots, so the table should not bloat to 31GiB.
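
As a sanity check on the snapshot expiry, a query like the one below could confirm how many snapshots the table still tracks. This is only a sketch: it assumes Dremio's table_snapshot metadata function is available for this source, and it reuses the same placeholder table name as the query above. With "retain 1" after every MERGE, the expected count would be 1.

```sql
-- Count the snapshots that the Iceberg metadata still tracks.
-- 'Minio.finance.l0.XXXX' is the same placeholder used earlier in this post.
select count(*) as snapshot_count
from TABLE( table_snapshot( 'Minio.finance.l0.XXXX' ) );
```

If the count is 1 and the storage size is still far above what table_files reports, the extra bytes are presumably in files that no snapshot references any more.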

I feel that I hit something similar to what is reported here (Iceberg file size on dremio - #13 by dacopan)

=======

In the original Iceberg table (with regular MERGE of new data):

-> There are multiple data file folders. Those XLDIR directories were created after the vacuum operation, when snapshots got deleted.

However, I think there are still dangling files from old snapshots among the data files.

Meanwhile, in the recreated table (via CTAS), there’s just ONE data file folder.

So I suppose that many of the data file folders in the original Iceberg table are actually useless (or else their contents would have been copied to the new Iceberg table via CTAS).
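
One way to test that guess, as a rough sketch: list the data files that the current snapshot still references and compare their paths with the folders that exist on MinIO. This assumes table_files exposes a file_path column alongside file_size_in_bytes; the table name is the same placeholder as above.

```sql
-- List every data file referenced by the current snapshot, with its size.
-- Folders on storage that never appear in file_path are not referenced
-- by the live table.
select "file_path", "file_size_in_bytes"
from TABLE( table_files( 'Minio.finance.l0.XXXX' ) )
order by "file_path";
```

Any data folder that does not show up in this result would be a candidate for the unreferenced files that account for the gap between 3.5GB and 31GiB.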