Does Dremio support dynamic partition pruning on reflections?
I have a virtual dataset called “my_detail_view” which has a raw reflection on it that is partitioned by effective_date.
In the below query the values for effective_date are supplied explicitly and judging by the Input Record counts on the Job overview, dremio loads only the data for the partitions with the effective_date ‘20190118’, ‘20190117’ which is an expected number of records.
select distinct r.effective_date, r.source
from "NAS_POC".my_detail_view r
where r.effective_date in ('20190118', '20190117')
Input Bytes: 573.60 MB
Input Records: 25,934,501
However, in the the below query, the value of the effective_date is retrieved dynamically from a subquery (which returns values ‘20190118’, ‘20190117’). In this case however, dremio loads all the data into memory rather than just for the partitions which utilizes significantly more memory and takes longer to run.
select distinct r.effective_date, r.source
from "NAS_POC".my_detail_view r
where r.effective_date in (select effective_date from NAS_POC."dates")
Input Bytes: | 4.78 GB |
---|---|
Input Records: | 221,172,150 |