Partition filter not used when in clause

We have a table “posts” partitioned:

    partition by (bucket( 300, ID_PAGE ), month(CREATED_DATE))

when run this query: Dremio use partition filter on ID_PAGE

SELECT * FROM lake.prod.posts WHERE ID_PAGE ='7070561331787219974'

but when run this query: Dremio NO USE partition filter on ID_PAGE

SELECT * FROM lake.prod.posts WHERE ID_PAGE IN (SELECT ID_PAGE FROM lake.prod.spaces WHERE ID_SPACE = '6654026945b0245faa756c7a')

Query profiles:
71f242df-6a47-4d3a-ac31-7b88ace22efa.zip (20,3 KB)

4ad63fd5-58c7-475d-8a93-e524f27f18c6.zip (25,8 KB)

Please @Benny_Chow oe @balaji.ramaswamy can you make me a help Please?

@dacopan Thanks for uploading the profiles, let me review and get back to you

1 Like

@dacopan When using a subquery, direct pruning does not happen instead partitions are pruned via runtimefiltering. I do see on table_functiion 01-xx-06 runtime filter is getting applied but is not effective as zero partitions are getting pruned, see operator metrics under the table function and see column num_partitions_pruned. Expand operator details and you can see the filter arrived even before the probe side started scanning. Do you have the executor log file when this query was run?

yes, give a minutes to upload log

5655036a-3c55-4ceb-bd0d-771adba1488b.zip (41,8 KB)

log4.zip (14,7 KB)

@dacopan Are you able to try and run this on 2 executors? I am still investigating

thank you by your help, let me prepare infraestructure to run on 2 executors

Hello, Maybe you could validate something additional that would help me improve this? On my side, I have not been able to obtain the necessary infrastructure to port as you suggested.

This may be related to a known bug (DX-86309/DX-86575), I suggest you retry in the upcoming 25 release, as the fix should be included in it.

Thanks, Bogdan

1 Like

Hey, @dacopan! trying to repro this case, and already have query with a similar plan. But case with NUM_PARTITIONS_PRUNED 0 and NUM_RUNTIME_FILTERS 1 I can get only in case when SELECT from IN returns to many records, so partitions weren’t pruned. So I need more details:

  • are sure that some partitions should be pruned with this condition ( filter “not applied” correctly or it just “not effective”)?
  • if you could run this query with debug logs and share it - it would be very helpful!
  • did you try it with another condition in IN query?

Hello @Ivan_Chesnov please give a day to make a new test and sent all you are requesting me.

sorry by delay, @Ivan_Chesnov here you have all test

Test_partitions.zip (269,6 KB)

because of

    partition by (bucket( 300, ID_PAGE ), month(CREATED_DATE))

dremio will not prune non-identity partitions - it’s by design.

Hello @Ivan_Chesnov please can you explain more what meaning that why non-identity partitions

Also why partition pruning works when put direct values in where clause but not works when use subselect

@dacopan Partition pruning happens when given direct value, when a sub query is used, partitions will get pruned via runtime filtering

Hi @balaji.ramaswamy yes this behavior already you explained me, but @Ivan_Chesnov said that dremio not pruning with non-identity partitions so now I’m confused how partition pruning works when direct value vs subselect because if not pruning works with subselect, have partition is not usefull

@dacopan I took a closer look, the filter condition although a partition column, the plan on the query with the direct filter, the filter is applied as part of the DATA scan TABLE_FUNCTION 01-04. If it is a partition column it would be applied as part of the manifestContent=[DATA]) IcebergManifestList scan 02-03

Let me see why that part is mixed up?

Do you have the create table statement you used for creating this table?

Hello @balaji.ramaswamy sure this is the DDL


CREATE TABLE lake.prod.posts
(
    ID                        varchar,
    ID_POST                   varchar,
    ID_PAGE                   varchar,
    ENGAGEMENT                double,
    CREATED_DATE_STR          varchar,
    MODIFIED_DATE_STR         varchar,
    TIMESTAMP_DATE            timestamp,
    INTERACTIONS              integer,
    SHARES                    integer,
    SENTIMENT                 varchar,
    GENDER                    varchar,
    IS_ROOT                   integer,
    USERNAME                  varchar,
    NAME                      varchar,
    REGION                    varchar,
    PROVINCE                  varchar,
    CITY                      varchar,
    CONTEXT                   varchar,
    USERNAME_COMMENTS         varchar,
    COMMENTS                  integer,
    REACTIONS_POST            integer,
    SORRY_COMPARATIVE         integer,
    ANGER_COMPARATIVE         integer,
    HAHA_COMPARATIVE          integer,
    LOVE_COMPARATIVE          integer,
    WOW_COMPARATIVE           integer,
    LIKE_COMPARATIVE          integer,
    TEXT                      varchar,
    PUBLICATION_TYPE          varchar,
    PUBLICATION_TYPE_COAUTHOR integer,
    VIDEO_VIEWS               integer,
    CONCEPTS                  varchar ARRAY,
    EMOJIS                    varchar ARRAY,
    VERBS                     varchar ARRAY,
    HASHTAGS                  varchar ARRAY,
    RED                       integer,
    RED_NAME                  varchar,
    TYPE                      varchar,
    IS_RT                     integer,
    MEDIA                     varchar,
    AGE                       varchar,
    EMOTIONS                  integer,
    PERSONALITY               integer,
    TALK_CATEGORY             integer,
    IMPRESSIONS_PAID          integer,
    IMPRESSIONS_ORGANIC       integer,
    IMPRESSION_TOTAL          integer,
    REACH_TOTAL               integer,
    CLICKS_TOTAL              integer,
    CREATED_DATE              timestamp,
    MODIFIED_DATE             timestamp
)
    --@formatter:off
    partition by (bucket( 300, ID_PAGE ), TYPE, month(CREATED_DATE));