I tested a scenario with two tables (in RedShift & MySQL) joined on a key to return single row. Both of the following queries involved full-table scan. Are we supposed to make any settings?
Qry-01:
SELECT a.*
FROM ds_redshift a JOIN ds_mysql b ON a.id = b.id
WHERE a.id = 123 and b.id = 123
Qry-02:
SELECT a.*
FROM ds_redshift a JOIN ds_mysql b ON a.id = b.id
WHERE a.id = 123
This is a known issue that will be fixed in an upcoming release. In the mean time, would you be able to restructure your query so that the filters explicitly happen before the join using sub-queries?
Eg for query 1:
SELECT a.* from (SELECT * from ds_redshift where id = 123) a join (SELECT * FROM ds_mysql where ID = 123) b ON a.id = b.id
query 2:
SELECT a.* from (SELECT * FROM ds_redshift where id = 123) a join ds_mysql b on a.id = b.id
@jduong I don’t have it handy… but, the query I shared is exactly the one being executed. I confirmed on both DB Server consoles that full table scan is happening. While I will try to fetch query profile next time, I am not sure why full table scan is initiated for this query as it it not complex and the predicate is clearly on a single id column
The reason is that the join won’t get pushed down (since this is across two different sources), but the planner is not pushing the filter to each join operand (this is being fixed).
hi @jduong I tested with sub-queries on the same large tables, which returned “No Data”. I created two test tables (across two datasources) and they returned the result (ds_redshift+ ds_mysql worked fine, which is test table with 3 or 4 records)
I think the later query works, because the key in join operation happens to the first member of the select…this took us few iterations to figure out Not sure if this is getting addressed in the upcoming 2.1 release?