I have a NAS source called lake, and I create tables using the CREATE TABLE lake.xxxx statement.
It was solved by setting the compression to gzip instead of snappy when writing.
Thank you so much for your answer.
I had a similar issue with VACUUM: it did not reduce the Iceberg file size at all.
I changed the compression to gzip and voila, it is working.
ALTER TABLE Minio.finance.l0.stockanalysis_balance_sheet_history SET TBLPROPERTIES (
'write.parquet.compression-codec' = 'gzip'
);
Then just do the normal VACUUM and it is WORKING!!!
My Thread: OPTIMIZE and VACUUM table cannot reduce iceberg metadata file size - #8 by Ken
@Ken What was your original compression?
My original compression was the default, which is zstd.
@Ken We are checking on this and will get back to you.
Hi @dacropan,
I have the same observation: the table files tracked by Dremio (via the select * from table(table_files('XXX')) query, where XXX is the fully qualified path of the table) are far fewer than the parquet data files under the Iceberg folder on disk.
My setup is Dremio with Minio as the object storage.
My thinking is that these are orphan data files which Dremio fails to delete.
My solution is to run a list object function in Minio and delete those which are not tracked by Dremio. Below is my solution for your reference.
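The core of the idea is a set difference between what sits in the bucket and what Dremio tracks. Here is a minimal sketch of only that comparison step, not the full script. The names find_orphans and object_key are hypothetical, and it assumes you have already collected the object keys from a Minio listing and the file_path values from the table_files query. It also assumes the tracked paths may carry a scheme and bucket prefix (for example s3://bucket/...), which may not match your setup. Note that table_files reflects the current snapshot, so files still referenced by older retained snapshots would show up as orphans here; review the list before deleting anything.

```python
from typing import Iterable, List


def object_key(path: str, bucket: str) -> str:
    """Reduce a tracked file path to a bucket-relative object key.

    Assumption: paths look like 's3://bucket/dir/file.parquet',
    '/bucket/dir/file.parquet', or are already bucket-relative.
    """
    if "://" in path:
        # Drop the scheme, e.g. 's3://' or 's3a://'.
        path = path.split("://", 1)[1]
    path = path.lstrip("/")
    prefix = bucket + "/"
    if path.startswith(prefix):
        path = path[len(prefix):]
    return path


def find_orphans(
    listed_keys: Iterable[str],
    tracked_paths: Iterable[str],
    bucket: str,
) -> List[str]:
    """Return parquet object keys present in the bucket but not tracked."""
    tracked = {object_key(p, bucket) for p in tracked_paths}
    return sorted(
        key
        for key in listed_keys
        # Only consider data files; leave metadata files alone.
        if key.endswith(".parquet") and key not in tracked
    )


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    listed = ["tbl/data/a.parquet", "tbl/data/b.parquet"]
    tracked = ["s3://lake/tbl/data/a.parquet"]
    for key in find_orphans(listed, tracked, "lake"):
        print(key)
```

The function only reports candidates. Deleting them is a separate step that you run after checking the list.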