Understanding Dremio partition folder names

Hello friends, I want to understand how Dremio organizes folders when it creates partitions in an Iceberg table. For example, we have the following table and partition definition (NAS source):

CREATE TABLE lake2.prod.posts
(
    ID                        varchar,
    ID_POST                   varchar,
    ID_PAGE                   varchar,
    ENGAGEMENT                double,
    IS_ROOT                   integer,
    ...
    TYPE                      varchar,
    IMPRESSIONS_PAID          integer,
    IMPRESSIONS_ORGANIC       integer,
    IMPRESSION_TOTAL          integer,
    REACH_TOTAL               integer,
    CLICKS_TOTAL              integer,
    CREATED_DATE              timestamp
)
partition by (bucket(400, ID_PAGE), TYPE, month(CREATED_DATE));

After ingesting about 185M rows, the server has 57865 folders, and inside each of them there are 2 folders corresponding to the “type” partition. But why 57865 folders if I only defined 400 buckets?

The folder structure doesn’t need to mimic the partition spec. That’s how Hive partitioning works, but this is Iceberg partitioning. Also, the folder structure is specific to the query engine, since the Iceberg spec doesn’t stipulate how folders should be laid out. A partition value is a distinct tuple built from the transformed value of each partition column (here: bucket, TYPE, month), so the number of distinct partitions can be far larger than the 400 buckets. What really matters is that the manifest list and manifest files map the partition values to their corresponding data files.
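To make the "distinct tuple" idea concrete, here is a minimal Python sketch that counts partition tuples for a few made-up rows. It is an illustration under assumptions, not Dremio's or Iceberg's actual code: the helper names (`bucket`, `month`, `partition_tuple`, `count_partitions`, `max_partitions`) and the sample rows are hypothetical, and `zlib.crc32` is only a stand-in for the 32-bit Murmur3 hash that the Iceberg bucket transform uses.

```python
import zlib
from datetime import datetime

NUM_BUCKETS = 400  # matches bucket(400, ID_PAGE) in the table definition


def bucket(value: str, n: int = NUM_BUCKETS) -> int:
    # Stand-in hash for illustration only: Iceberg hashes with 32-bit Murmur3,
    # so real bucket numbers will differ from what CRC32 produces here.
    return zlib.crc32(value.encode("utf-8")) % n


def month(ts: datetime) -> int:
    # The Iceberg month transform yields the number of months since 1970-01.
    return (ts.year - 1970) * 12 + (ts.month - 1)


def partition_tuple(row: dict) -> tuple:
    # One partition value = one tuple of the transformed partition columns.
    return (bucket(row["ID_PAGE"]), row["TYPE"], month(row["CREATED_DATE"]))


def count_partitions(rows) -> int:
    # Number of distinct partition tuples actually present in the data.
    return len({partition_tuple(r) for r in rows})


def max_partitions(num_types: int, num_months: int, n: int = NUM_BUCKETS) -> int:
    # Upper bound: every bucket combined with every type and every month.
    return n * num_types * num_months


# Hypothetical sample rows: one page, two types, two months.
rows = [
    {"ID_PAGE": "p1", "TYPE": "photo", "CREATED_DATE": datetime(2024, 1, 5)},
    {"ID_PAGE": "p1", "TYPE": "photo", "CREATED_DATE": datetime(2024, 1, 20)},
    {"ID_PAGE": "p1", "TYPE": "video", "CREATED_DATE": datetime(2024, 1, 20)},
    {"ID_PAGE": "p1", "TYPE": "photo", "CREATED_DATE": datetime(2024, 2, 1)},
]

if __name__ == "__main__":
    print(count_partitions(rows))                     # distinct tuples in the sample
    print(max_partitions(num_types=2, num_months=24)) # bound for hypothetical inputs
```

Even though all four sample rows fall into a single bucket, they produce several partition tuples because TYPE and month also vary. The same effect explains why a table with 400 buckets can end up with far more than 400 partitions.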

Thank you @Benny_Chow. So if I want to read/write this Iceberg table, can I do it only with Dremio? Or could I use another engine/library, since everything is referenced in the manifests?

Of course the latter! That’s the value of Iceberg’s open table format.
