Oracle Making Full table Scans in some cases

I am attempting to use a list of Indexed ID’s stored in Dremio as a way to quickly lookup entries in a table with 200+ million rows. IOn oracle, the ID’s are stored as the Data Type “NUMBER”

When I run a query like

select * from oracle.big_table where big_table_id = 123456789

the query runs and I get the results back in less than a second.

Even if I run something like

select * from oracle.big_table where big_table_id in (List of 100 ID’s)

I get good performance.

But if I have a table in Dremio with the same 100 ID’s (dremio.table_ids) and try to run

select * from oracle.big_table where big_table_id in (select ID from dremio.table_ids)

It takes forever to run. Is there any way to make this work?

@sgoldsmith Wondering if the push down to Oracle is causing a FTS, any chance you can send us the 3 job profiles?

e0f4d974-a14b-43f1-aebc-eeae94fe16a1.zip (12.3 KB)
1efd4d45-f535-489b-9ca9-b0c06b9b8b8d.zip (10.4 KB)
9579f45a-3d5c-4d95-afed-fb6ff7f7323a.zip (10.2 KB)

I recreated the scenario, here are the different profiles.

@sgoldsmith I see 3 profiles

  • Job ID# 1b1302e9-cc8c-32e5-6b73-31b0e6adca00 I see the filter gets pushed down and query completes in 0.24s

  • Job ID# 1b130214-c8d3-ae86-1ba2-db98095ed200 It is a IN FILTER rewritten as OR and filter does get push down and query completes in 0.26s

  • Job ID# 1b1301f3-8d23-d05d-f8c7-9d727c04b700 is a join between an Oracle Table and a PARQUET dataset, there is no push down possible in this and query completes in 1.3s

What is your specific concern?

Do you have the profile when it runs forever? Cancel after it runs for say 30m and send us the profile

Here is the profile of the long run.
e6eacacd-183b-4533-95be-4692229cd409.zip (18.8 KB)

@sgoldsmith Looks like there is no FILTER in the query and it is selecting all records from Oracle, since the query is cancelled, not sure how many records in the Oracle table. How many records are there in HOLDINGDBO.POSITION_DETAIL? Since the inner query is from PARQUET, the result of the inner query cannot be pushed down into Oracle as they are 2 different sources in Oracle

I cancelled the query after about an hour. The HOLDINGDBO.POSITION_DETAIL has around 4.3 billion records.

@sgoldsmith Currently pushing down results from another source’s results into a different source. Can we archive the table and then create a reflection on the live table?