I have a question about whether Dremio supports dynamic partitioning in queries.
Will Dremio add partitions to the query for optimization and enhanced performance, based on a client’s request?
For example, let’s say I have a table called “Employee” with a date column s_date and we maintain an indexed column for s_date in an integer format (YYYYMMDD). If a client submits a query with s_date in the WHERE condition, is there any feature in Dremio that can automatically identify the corresponding partition index and rewrite the query by adding the partition column for improved performance?
Additionally, I would like to know what is parameterized queries in Dremio and how they function in dremio ?
@JoiceJacob Depends on what source, if it is a RDBMS then the query is pushed down to the database example Oracle and use indexes/partition indexes directly on Oracle Server
While if this is Parquet or Iceberg then if the filter has a partition column then it would be a partition prune vs if a non-partition column then would be a filter push down. Happy to go over internals of the differences between the 2 if interested
Support for parameterized queries is in the roadmap but currently not there. Let us start with a profile of the job you are trying to analyze. Kindly attach the profile and we can go from there
I have a table called cust_rec_data that contains columns like customer_number, event_date, recharge_amount, recharge_channel, and part_idx (which represents the month extracted from event_date in MM format). The event_date column is indexed. Since the table holds a large amount of data for the last 12 months, we use part_idx for partitioning. When integrating Dremio with BI tools like Superset, the user wants to view the sum of recharge amounts by recharge_channel for a specific range between different event_date values. They don’t expect to manually select part_idx because they’ve already selected the date range based on event_date. Can we manipulate the query in Dremio to automatically add the relevant part_idx values based on the selected date range, without user input, so that the query runs much faster?