I am attempting to use a list of indexed IDs stored in Dremio as a way to quickly look up entries in a table with 200+ million rows. In Oracle, the IDs are stored with the data type NUMBER.
When I run a query like
select * from oracle.big_table where big_table_id = 123456789
the query runs and I get the results back in less than a second.
Even if I run something like
select * from oracle.big_table where big_table_id in (list of 100 IDs)
I get good performance.
But if I have a table in Dremio with the same 100 ID’s (dremio.table_ids) and try to run
select * from oracle.big_table where big_table_id in (select ID from dremio.table_ids)
It takes forever to run. Is there any way to make this work?
@sgoldsmith Wondering if the pushdown to Oracle is causing a full table scan (FTS). Any chance you can send us the 3 job profiles?
@sgoldsmith I see 3 profiles
1b1302e9-cc8c-32e5-6b73-31b0e6adca00: the filter gets pushed down and the query completes in 0.24s.
1b130214-c8d3-ae86-1ba2-db98095ed200: an IN filter rewritten as ORs; the filter does get pushed down and the query completes in 0.26s.
1b1301f3-8d23-d05d-f8c7-9d727c04b700: a join between an Oracle table and a Parquet dataset; no pushdown is possible here, and the query completes in 1.3s.
What is your specific concern?
Do you have the profile from when it runs forever? Cancel it after it runs for, say, 30 minutes and send us the profile.
Here is the profile of the long run.
e6eacacd-183b-4533-95be-4692229cd409.zip (18.8 KB)
@sgoldsmith Looks like there is no filter in the pushed-down query, so it is selecting all records from Oracle. Since the query was cancelled, I am not sure how many records are in the Oracle table. How many records are there in
HOLDINGDBO.POSITION_DETAIL? Since the inner query is against a Parquet dataset, its result cannot be pushed down into Oracle, as the two tables are in two different sources.
I cancelled the query after about an hour. HOLDINGDBO.POSITION_DETAIL has around 4.3 billion records.
@sgoldsmith Currently, pushing the results from one source down into a different source is not supported. Can we archive the table and then create a reflection on the live table?
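One client-side workaround (not suggested in the thread, just a sketch) is to fetch the small ID list first and rewrite the probe query as literal IN lists, since the thread's profiles show that literal IN filters do push down to Oracle. The table and column names below mirror the example queries above; SQLite is used as a stand-in for both sources so the sketch is runnable.

```python
import sqlite3

# Stand-in: in practice the lookup IDs come from the Dremio dataset
# (dremio.table_ids) and the big table lives in Oracle. Here one
# in-memory SQLite database plays both roles for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (big_table_id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO big_table VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 1001)])

# Pretend these were fetched with: SELECT ID FROM dremio.table_ids
lookup_ids = [5, 42, 123, 999]

def chunked(seq, size):
    """Yield successive fixed-size chunks of a sequence."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

rows = []
for chunk in chunked(lookup_ids, 100):
    # Build a literal IN list; each chunk becomes a simple indexed
    # lookup the source database can satisfy without a full table scan.
    placeholders = ",".join("?" * len(chunk))
    sql = f"SELECT * FROM big_table WHERE big_table_id IN ({placeholders})"
    rows.extend(conn.execute(sql, chunk).fetchall())

print(sorted(r[0] for r in rows))
```

Chunking at 100 IDs per query matches the size that performed well in the original test; Oracle itself caps literal IN lists at 1000 elements, so some chunk limit is needed regardless.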