This is working:
select t1.* from “mysql_a”.“db1”.“tb1” t1, “mongo_a”.“db1”.“tb1” t2
where t1.field_a = t2.field_b While this is not:
select t1.* from “mysql_a”.“db1”.“tb1” t1, “mongo_a”.“db1”.“tb1” t2
where t1.field_a != t2.field_b.
Error from Dremio is :
This query cannot be planned possibly due to either a cartesian join or an inequality join
Both field_a and field_b are string. Please help how to write “not equal” ? Thank you
you can try left join :
select t1.*
from “mysql_a”.“db1”.“tb1” t1 left join “mongo_a”.“db1”.“tb1” t2
on t1.field_a = t2.field_b
where t2.field_b is null
As the error you pasted, dremio don’t support cartesian join (without on clause) or an inequality join (on t1.field_a != t2.field_b). Your second sql is a cartesian join, so error occured. My above reply is an optional way to get the result same as your second sql. So if field_a = “abc” while filed_b = “def”, then my sql above will generate result like:
t1.field_a t1.field_b t1.field_c
abc xxx xxx
and your second sql will have the same result too (disregard of same lines and null lines).
But if you want to know why dremio don’t support cartesian join or inequality join. I think it’s because hard to implement and not urgent needs. Hive does not support inequality join in it’s early version too. Not official answer
Dremio does allow != as a filter condition on any non-JOIN query with datatypes that support that comparison.
It also supports != as the condition on the key some JOINs.
Dremio’s query planner will try to formulate an execution plan that doesn’t involve a cartesian join or inequality join, because these are slow to compute and take up a lot space during execution. But sometimes there’s no way around this and it will report back to you with an error like the one you saw.
We’re always making improvements to the query planner, so this may change in future releases.
Can you share the query profile? @hanbingwinter was kind enough to share a query re-write, but there might be other ways if we saw the query text and planning.