Good morning.
My organization is in the process of moving our reporting from our production database into Dremio.
We created a VDS using the following query:
SELECT
to_char( T.CREATION_DATETIME, 'YYYY-MM-DD' ) "Activity Date",
t.LOCATION_ID "Location",
COUNT( * ) "Moves"
FROM prod.MYSCHEMA.MY_TABLE T
WHERE T.SERVICE_CODE = 'INGAT' AND
T.COMPLETE_DATETIME IS NOT NULL AND
T.CANCELLATION_DATETIME IS NULL AND
T.CREATION_DATETIME >= TO_DATE ( '05-01-2021', 'MM-DD-YYYY') AND
T.CREATION_DATETIME < TO_DATE ( '06-01-2021', 'MM-DD-YYYY')
and t.LOCATION_ID in ('AB')
GROUP BY
t.LOCATION_ID,
to_char( T.CREATION_DATETIME, 'YYYY-MM-DD' )
order by
t.LOCATION_ID,
to_char( T.CREATION_DATETIME, 'YYYY-MM-DD' )
We can run this query directly against Oracle in less than a second, but when we try to pull it into dremio it takes up to 12 minutes. This is the query Oracle receives from dremio for the above:
SELECT *
FROM (SELECT "Activity Date", CAST('AB' AS VARCHAR(4000)) "Location", COUNT(*) "Moves"
FROM (SELECT CAST('AB' AS VARCHAR(4000)) "Location", TO_CHAR("CREATION_DATETIME", 'YYYY-MM-DD') "Activity Date"
FROM
(SELECT "MY_TABLE"."SERVICE_CODE", "MY_TABLE"."LOCATION_ID", "MY_TABLE"."CREATION_DATETIME", "MY_TABLE"."COMPLETE_DATETIME", "MY_TABLE"."CANCELLATION_DATETIME"
FROM "MYSCHEMA"."MY_TABLE") "MY_TABLE"
WHERE "SERVICE_CODE" = 'INGAT' AND "CANCELLATION_DATETIME" IS NULL AND
"CREATION_DATETIME" >= TIMESTAMP '2021-05-01 00:00:00.000' AND
"CREATION_DATETIME" < TIMESTAMP '2021-06-01 00:00:00.000' AND
"LOCATION_ID" = 'AB' AND "COMPLETE_DATETIME" IS NOT NULL) "MY_TABLE"
GROUP BY "Location", "Activity Date") "MY_TABLE"
ORDER BY "Location", "Activity Date"
“MY_TABLE” is a large table with millions of records.The unfiltered subquery against MY_TABLE is causing a full table scan and tanking performance.
What is the solution to this? I thought dremio was supposed to push filters down to the source.