Cannot use partitions filters using VDS

Hello,
We are evaluating Dremio Enterprise (Using it via AWS Marketplace).
And we are facing some strange behavior.

We have a PDS that have a “change log of events” like this:

ID, STATUS, DATE, dir0, dir1, dir2
1, status1, 2021-09-15, year=2021, month=09, day=15
1, status2, 2021-09-16, year=2021, month=09, day=16
1, status3, 2021-09-20, year=2021, month=09, day=20
2, status1, 2021-09-13, year=2021, month=09, day=13
2, status2, 2021-09-14, year=2021, month=09, day=14

This PDS is partitioned using year, month and day.

But we only want the latest row for each ID based on the date column, then we created an VDS using this query:

SELECT ID, STATUS, DATE, dir0, dir1, dir2 FROM
(select *,
            row_number() over (partition by ID order by DATE desc) as line_num
    from RAW_CHANGELOG_TABLE) nested_0 where line_num = 1

The problem is, when we try to query this VDS using some partition filters, it seems that those filter are not used by the “PDS” from the source.
We always get memory exceeded even if we query something small like:

SELECT ID,STATUS,DATE FROM VDS_DATASET WHERE dir0='year=2021' and dir1='month=09' limit 10;

It seems that every time the query is doing a full scan at the PDS level.

There is some way to solve this problem?
Thanks!

1 Like

Hi @dfdf

Are you able to provide the qurty profile?

Yes sure, here it is.

e511d02d-75f8-402f-a95b-39efc53d507a.zip (39,8,KB)

@dfdf I see 27 row groups read per thread and total of 24 threads, does that look not pruned? I see this query completed successfully

How many row groups does this data set have in total?

Sorry @dfdf This query was accelerated, are you able to turn on planner.verbose_profile, run the query again and send us the profile?