VDS sending Full Table Scan to Oracle

hmarchman-jones · June 25, 2021, 2:18pm

Good morning.
My organization is in the process of moving our reporting from our production database into Dremio.

We created a VDS using the following query:

SELECT   
    to_char( T.CREATION_DATETIME, 'YYYY-MM-DD' ) "Activity Date",
    t.LOCATION_ID "Location",
          COUNT( * ) "Moves"
    FROM   prod.MYSCHEMA.MY_TABLE T                               
   WHERE   T.SERVICE_CODE = 'INGAT' AND
                   T.COMPLETE_DATETIME IS NOT NULL AND
                   T.CANCELLATION_DATETIME IS NULL AND
                   T.CREATION_DATETIME >=  TO_DATE ( '05-01-2021', 'MM-DD-YYYY') AND
                   T.CREATION_DATETIME < TO_DATE ( '06-01-2021', 'MM-DD-YYYY') 
                   and t.LOCATION_ID in ('AB')
GROUP BY   
          t.LOCATION_ID,
          to_char( T.CREATION_DATETIME, 'YYYY-MM-DD' )
order by 
    t.LOCATION_ID,
    to_char( T.CREATION_DATETIME, 'YYYY-MM-DD' )

We can run this query directly against Oracle in less than a second, but when we try to pull it into dremio it takes up to 12 minutes. This is the query Oracle receives from dremio for the above:

SELECT *
FROM (SELECT "Activity Date", CAST('AB' AS VARCHAR(4000)) "Location", COUNT(*) "Moves"
FROM (SELECT CAST('AB' AS VARCHAR(4000)) "Location", TO_CHAR("CREATION_DATETIME", 'YYYY-MM-DD') "Activity Date"
FROM 
(SELECT "MY_TABLE"."SERVICE_CODE", "MY_TABLE"."LOCATION_ID", "MY_TABLE"."CREATION_DATETIME", "MY_TABLE"."COMPLETE_DATETIME", "MY_TABLE"."CANCELLATION_DATETIME"
FROM "MYSCHEMA"."MY_TABLE") "MY_TABLE"
WHERE "SERVICE_CODE" = 'INGAT' AND "CANCELLATION_DATETIME" IS NULL AND
 "CREATION_DATETIME" >= TIMESTAMP '2021-05-01 00:00:00.000' AND
  "CREATION_DATETIME" < TIMESTAMP '2021-06-01 00:00:00.000' AND
   "LOCATION_ID" = 'AB' AND "COMPLETE_DATETIME" IS NOT NULL) "MY_TABLE"
GROUP BY "Location", "Activity Date") "MY_TABLE"
ORDER BY "Location", "Activity Date"

“MY_TABLE” is a large table with millions of records.The unfiltered subquery against MY_TABLE is causing a full table scan and tanking performance.

What is the solution to this? I thought dremio was supposed to push filters down to the source.

balaji.ramaswamy · June 29, 2021, 7:51am

@hmarchman-jones The reason this is happening is because Dremio maps Oracle date to a timestamp as Oracle date can also store time portion. This lead to the above problem so in our upcoming release we allow users to define the mapping, so in your case if you map it to a DATE it should use the index, even now it is pushing down but the index is not getting used resulting in a FTS

hmarchman-jones · July 1, 2021, 2:50pm

Thank you for the response.
Are you referring to the table index in Oracle, or indexes in Dremio?
Is there a way to work around this by casting the Activity Date field to another datatype?

balaji.ramaswamy · July 2, 2021, 8:12am

@hmarchman-jones Talking about the index on the date column inside Oracle, the upcoming version would allow you to map Oracle DATE to a Dremio DATE (instead of a TIMESTAMP) and hence pushdown into Oracle as a DATE and the index should be used

Topic		Replies	Views
VDS using native SQL	11	3031	November 27, 2018
Timetype in oracle	3	1198	August 14, 2018
Oracle Making Full table Scans in some cases	7	509	September 5, 2023
How to keep time when connecting to an Oracle column with a Date Type?	5	1092	July 7, 2020
Dremio adds 'TIMESTAMP' word when connects to Oracle DB	6	383	December 7, 2023

VDS sending Full Table Scan to Oracle

Related topics