Hi All,
We have 2 tables, one table only has 6 rows, and the other one has million rows. We would like to join these 2 tables.
Here is our sql
SELECT * ORACLE_TABLE
INNER JOIN MSSQL_TABLE ON MSSQL_TABLE.car_model = ORACLE_TABLE.car_model
Note: MSSQL_TABLE( only has 6 rows), ORACLE_TABLE( has million rows)
We saw Dremio pulls the whole rows from those tables, and then do the join. The joined result is correct, but I just wonder if there is a way to hint Dremio to select the small table first(MSSQL_TABLE), and then push down the result to the larger table(ORACLE_TABLE) ? This can avoid pulling the whole table data to Dremio.
@dacopan
Thanks, but it can’t work. I saw Dremio still pull the whole large table. Below is the plan, I’m seeking if there is any way for Dremio to execute “ParquetScan 02-01” first and then push down the result to the “Jdbc 01-10” later.
It seems you are joining across two data sources - table a -> sql server and table b -> oracle … and you are expecting it to do a push-down?! It will never happen.
Extract the 6 rows table car_model distinct value, hardcode it in your oracle SQL using an in clause. Something like this:
SELECT * ORACLE_TABLE
ORACLE_TABLE.car_model in (‘lambo’,‘fusion’,‘model s’,‘mini foo’,‘lx 560’,‘doogi’,‘turbo’)