Dynamic partition pruning


#1

Does Dremio support dynamic partition pruning on reflections?

I have a virtual dataset called “my_detail_view” which has a raw reflection on it that is partitioned by effective_date.

In the below query the values for effective_date are supplied explicitly and judging by the Input Record counts on the Job overview, dremio loads only the data for the partitions with the effective_date ‘20190118’, ‘20190117’ which is an expected number of records.

	select distinct r.effective_date, r.source
	from "NAS_POC".my_detail_view r
	where r.effective_date in ('20190118', '20190117')

Input Bytes: 573.60 MB
Input Records: 25,934,501

However, in the the below query, the value of the effective_date is retrieved dynamically from a subquery (which returns values ‘20190118’, ‘20190117’). In this case however, dremio loads all the data into memory rather than just for the partitions which utilizes significantly more memory and takes longer to run.

	select distinct r.effective_date, r.source
	from "NAS_POC".my_detail_view r
	where r.effective_date in (select effective_date from NAS_POC."dates") 
Input Bytes: 4.78 GB
Input Records: 221,172,150

#2

I’m afraid not at the moment. As partition pruning is done during planning, partition values have to be known before execution.


#3

hi @laurent

Is this a feature that is planned to be implemented in dremio at some point? Is there an alternative way you would suggest?

Our use case is running a report from tableau which needs data from a dataset for a dynamic set of dates based on another query. Even though we created a reflection with a partition on the effective_date, dremio will not take advantage of it in this case and loads all records into memory. The effect is that more memory is needed for the query to run and sometimes hits memory limits we have.


#4

It might be implemented at some point, although it’s not on our immediate roadmap at the moment. Unfortunately I don’t have an alternative to offer you.