Iceberg file size on Dremio

Hi @balaji.ramaswamy, we ran some new tests that reproduce the same behavior.
I'm attaching more information that may help: the output of the queries below, run before and after the vacuum, plus the vacuum query profile.

select * from TABLE( table_history( 'lake.prod.posts' ) );

select * from TABLE( table_snapshot( 'lake.prod.posts' ) );

select * from TABLE( table_files( 'lake.prod.posts' ) );

Before vacuum: 233 GB on disk, more than 4030 table files.

After vacuum: 209 GB on disk.

select SUM(file_size_in_bytes) from TABLE( table_files( 'lake.prod.posts' ) );

-- returns 2151709359
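
In case it helps with the comparison, the file count and the total size in GB can be read in one pass. This is only a sketch: it assumes the same `table_files` function and `file_size_in_bytes` column used above, and the aliases `file_count` and `total_gb` are names I made up for illustration.

```sql
-- Sketch: number of live data files and their total size in decimal GB.
-- file_count and total_gb are illustrative aliases, not Dremio columns.
select
  COUNT(*) as file_count,
  SUM(file_size_in_bytes) / 1000000000.0 as total_gb
from TABLE( table_files( 'lake.prod.posts' ) );
```

Running it before and after the vacuum gives two figures that can be compared directly with the on-disk sizes.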

As you can see, after the vacuum we still have 209 GB on disk, which does not match the real data size and does not even match the sum of file_size_in_bytes (2151709359 bytes, about 2.15 GB).

I'm also attaching the folder hierarchy before (test3.txt) and after (test3_after.txt) the vacuum.

vacuum3.zip (3,8 MB)

Thanks for your help. I think this is a good case for optimizing Dremio.

I'm available for anything you need, including access to our environment and data, or a remote session if necessary.