Query Pushdown Issue with Dynamic Date Condition in Dremio

Hi team,
I am encountering an issue with query pushdown when using dynamic date conditions. My use case involves executing queries on a dynamic date condition, and I’ve observed that the query pushdown is not happening as expected. This is leading to performance issues when working with big data sets.
When I use a hardcoded date condition, the date filter is applied within the JDBC step, resulting in efficient query execution. However, when I use a dynamic date condition, a separate filter step is executed in Dremio, which affects performance. The query plan shows that the initial query is executed without any date filter, and the filter step is applied afterward within Dremio.
Below are the queries I am working with:
Query with hardcoded date condition:

select Cust_Sub_Id AS cust_sub_id,Msisdn as msisdn,Line_Type as line_type,OG_Bundle_Call_Revenue from “EDA Customer 360”.EDA.“Voice_Seg_Fct” where Event_Date between TO_DATE(‘2024-05-01’) and TO_DATE(‘2024-05-31’)

Query with dynamic date condition:

select Cust_Sub_Id AS cust_sub_id,Msisdn as msisdn,Line_Type as line_type,OG_Bundle_Call_Revenue from “EDA Customer 360”.EDA.“Voice_Seg_Fct” where Event_Date >= TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1) ,‘YYYY-MM-DD’,0) and Event_Date <= LAST_DAY(DATE_SUB(now(), CAST(1 AS INTERVAL MONTH)))

In the query plan for the dynamic date condition, the filter step is executed separately within Dremio, resulting in reduced performance.
Query Plan with hardcoded date condition:

Jdbc(sql=[SELECT Cust_Sub_Id AS cust_sub_id, Msisdn AS msisdn, Line_Type AS line_type, OG_Bundle_Call_Revenue
FROM (SELECT Voice_Seg_Fct.Msisdn, Voice_Seg_Fct.Cust_Sub_Id, Voice_Seg_Fct.Event_Date, Voice_Seg_Fct.OG_Bundle_Call_Revenue, Voice_Seg_Fct.Line_Type
FROM EDA.Voice_Seg_Fct) AS Voice_Seg_Fct
WHERE Event_Date >= CAST(‘2024-05-01’ AS DATE) AND Event_Date <= CAST(‘2024-05-31’ AS DATE)]) : rowType = RecordType(BIGINT cust_sub_id, VARCHAR(65536) msisdn, VARCHAR(65536) line_type, DECIMAL(16, 4) OG_Bundle_Call_Revenue): rowcount = 1.0, cumulative cost = {3.0 rows, 52.0 cpu, 50.0 io, 50.0 network, 0.0 memory}, id = 157631

Query Plan with dynamic date condition:

00-06 SelectionVectorRemover : rowType = RecordType(VARCHAR(65536) Msisdn, BIGINT Cust_Sub_Id, DATE Event_Date, DECIMAL(16, 4) OG_Bundle_Call_Revenue, VARCHAR(65536) Line_Type): rowcount = 1.0, cumulative cost = {4.0 rows, 63.00005 cpu, 50.0 io, 50.0 network, 0.0 memory}, id = 158099
00-07 Filter(condition=[AND(>=($2, TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(2024-06-07 06:49:15.001:TIMESTAMP(3)), 2:INTERVAL MONTH), 1), ‘YYYY-MM-DD’:VARCHAR(10), 0)), <=($2, 2024-05-31))]) : rowType = RecordType(VARCHAR(65536) Msisdn, BIGINT Cust_Sub_Id, DATE Event_Date, DECIMAL(16, 4) OG_Bundle_Call_Revenue, VARCHAR(65536) Line_Type): rowcount = 1.0, cumulative cost = {3.0 rows, 62.00005 cpu, 50.0 io, 50.0 network, 0.0 memory}, id = 158098
00-08 Project(Msisdn=[$0], Cust_Sub_Id=[$1], Event_Date=[$2], OG_Bundle_Call_Revenue=[$3], Line_Type=[$4]) : rowType = RecordType(VARCHAR(65536) Msisdn, BIGINT Cust_Sub_Id, DATE Event_Date, DECIMAL(16, 4) OG_Bundle_Call_Revenue, VARCHAR(65536) Line_Type): rowcount = 1.0, cumulative cost = {2.0 rows, 50.00005 cpu, 50.0 io, 50.0 network, 0.0 memory}, id = 158097
00-09 Jdbc(sql=[SELECT Voice_Seg_Fct.Msisdn, Voice_Seg_Fct.Cust_Sub_Id, Voice_Seg_Fct.Event_Date, Voice_Seg_Fct.OG_Bundle_Call_Revenue, Voice_Seg_Fct.Line_Type
FROM EDA.Voice_Seg_Fct]) : rowType = RecordType(VARCHAR(65536) Msisdn, BIGINT Cust_Sub_Id, DATE Event_Date, DECIMAL(16, 4) OG_Bundle_Call_Revenue, VARCHAR(65536) Line_Type): rowcount = 1.0, cumulative cost = {1.0 rows, 50.0 cpu, 50.0 io, 50.0 network, 0.0 memory}, id = 157959

Dremio version : 24
Note : I am executing the above queries in SingleStore and have connected to SingleStore using the MySQL connector.

Is there a way to ensure that the date filter is applied within the JDBC step for dynamic date conditions, similar to the hardcoded date condition? Any insights or recommendations to improve the query pushdown for dynamic conditions would be greatly appreciated.
Please find the profile query for the above queries attached below.
Query with Harded Date.zip (19.3 KB)
Query with Dynamic Date.zip (20.5 KB)

Thank you!

I think you can change this to:

TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1))

And it should work. The 3 arg version of TO_DATE needs to take a string as input.

FYI, you can enable verbose profiles and observe where the constant folding occurs. It’s in the Reduce Expressions phase of planning.

1 Like

Hi @Benny_Chow Thanks for the response.
If changing the date condition as suggested resulted in a syntax error. Please find the error attached below.

Hi @Benny_Chow
When I enabled verbose profiles and examined the constant folding in the Reduce Expressions phase of query planning, I noticed that the query for the first date of last month is not being replaced, whereas the query for the last date is being replaced. Can you help me with this issue?

LogicalFilter(condition=[AND(>=($4, TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(2024-06-10 07:37:02.315:TIMESTAMP(3)), 2:INTERVAL MONTH), 1), ‘YYYY-MM-DD’:VARCHAR(10), 0)), <=($4, 2024-05-31))]): rowcount = 1.0, cumulative cost = {inf}, id = 174964.

Please find the full Reduce Expressions Planning Attached Below
Reduce Expressions Planning.zip (1.6 KB)

With my first suggestion, was the constant folding happening? If so, then what SQL was pushed down into mysql? This should be available in the verbose profile.

Thanks for you response.
The query you suggested, TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)), 1)), results in a syntax error. I have attached a screenshot of the syntax error in the previous comment.

I checked the constant folding for the following query:

SELECT
Cust_Sub_Id AS cust_sub_id,
Msisdn AS msisdn,
Line_Type AS line_type,
OG_Bundle_Call_Revenue
FROM
“EDA Customer 360”.EDA.“Voice_Seg_Fct”
WHERE
Event_Date >= TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)), 1), ‘YYYY-MM-DD’, 0)
AND Event_Date <= LAST_DAY(DATE_SUB(now(), CAST(1 AS INTERVAL MONTH)));

The constant folding for TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)), 1), 'YYYY-MM-DD', 0) results in TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(2024-06-10 07:37:02.315:TIMESTAMP(3)), 2:INTERVAL MONTH), 1), 'YYYY-MM-DD':VARCHAR(10), 0). In contrast, LAST_DAY(DATE_SUB(now(), CAST(1 AS INTERVAL MONTH))) is correctly folded to 2024-05-31.Reduced Expression Planning is attached in the previous comment.

From my understanding, TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)), 1), 'YYYY-MM-DD', 0) is not being constant folded correctly, whereas LAST_DAY(DATE_SUB(now(), CAST(1 AS INTERVAL MONTH))) is. Please correct me if I am wrong.

Could you kindly provide an update on the previous scenario and suggest an alternative approach.

Also,Is it possible to create a view with dynamic parameters, such as date variables passed via REST API, in Dremio?

If you run:

SELECT TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1))

Does the constant folding occur?

Then, if you run:

SELECT TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1)), * from EDA .Voice_Seg_Fct

Does the constant folding occur?

If no, to either, please provide the verbose profiles. It’s a zip file you get from downloading it from the jobs page.

For your second question, you can create a parameterized view using a tabular UDF. Here’s an example: Tabular User-Defined Functions Unveiled | Dremio

Thanks for the reply @Benny_Chow
I would like to bring to your attention that while running SELECT TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1)) (query 1), constant folding is happening. However, when running SELECT TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1)), * FROM EDA.Voice_Seg_Fct (query 2) and SELECT Cust_Sub_Id AS cust_sub_id, Msisdn AS msisdn, Line_Type AS line_type, OG_Bundle_Call_Revenue FROM "EDA Customer 360".EDA."Voice_Seg_Fct" WHERE Event_Date >= TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)),1)) AND Event_Date <= LAST_DAY(DATE_SUB(now(), CAST(1 AS INTERVAL MONTH))) (query 3), I am encountering a syntax error. Attached below are the error screenshots, verbose profile files for the above queries, and additional context. Can you please assist in identifying and resolving the syntax issue?



query 1 - verbose profile.zip (13.9 KB)
query 2 - verbose profile.zip (85.4 KB)
query 3 - verbose profile.zip (26.2 KB)

Hi @Benny_Chow

I have created two functions to calculate the first and last days of the previous month:

CREATE FUNCTION first_day_last_month() RETURNS DATE
RETURN (
SELECT TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(NOW()), CAST(2 AS INTERVAL MONTH)), 1), ‘YYYY-MM-DD’, 0)
);

CREATE FUNCTION last_day_last_month() RETURNS DATE
RETURN (
SELECT LAST_DAY(DATE_SUB(NOW(), CAST(1 AS INTERVAL MONTH)))
);

When I try to execute the following query using these functions, query pushdown is still not happening:

SELECT
Cust_Sub_Id AS cust_sub_id,
Msisdn AS msisdn,
Line_Type AS line_type,
OG_Bundle_Call_Revenue
FROM
“EDA Customer 360”.EDA.“Voice_Seg_Fct”
WHERE
Event_Date >= first_day_last_month()
AND Event_Date <= last_day_last_month();

Additionally, I created a function to return the first day of the previous month with the query you suggested, but I am encountering a syntax error when executing it:

Query :

SELECT TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(NOW()), CAST(2 AS INTERVAL MONTH)), 1), ‘YYYY-MM-DD’, 0);

Can you please suggest a solution for this.

I tested the following query in MySQL and it worked without any syntax errors, and the query pushdown occurred:

SELECT * FROM “Customer 360 Mysql”.“DREMIO_TEST_DB_82”.“customer_360_dremio_test”
WHERE Event_Date >= TO_DATE(DATE_ADD(DATE_SUB(LAST_DAY(now()), CAST(2 AS INTERVAL MONTH)), 1))
AND Event_Date <= LAST_DAY(DATE_SUB(now(), CAST(1 AS INTERVAL MONTH)));

However, when running the same query with SingleStore DB, I encountered syntax issues. I connected the SingleStore DB data source using the MySQL source. Could this issue be related to SingleStore?

Well that explains it then. I tested the constant folding and pushdown using mysql and it worked fine too. So, at first, I thought maybe there was a configuration issue with your mysql server. Now that you say you are actually using singlestore db, then you should use the singlestore connector and not mysql connector. Can you try this out?

@Benny_Chow
Thanks for the response.
While attempting to connect to my SingleStore DB using the SingleStore Data source, I am encountering a “500 - Request failed” error. However, I am able to connect using the same credentials with MySQL as the Data Source.
I am using dremio version : 24.3.2