Bug: Wrong plan for query is giving the wrong result

Bug: Dremio’s planning only considers one of the parent tables and somehow ignores part of the query with the other.

My query:

select
*
from haygot.haygot.crm_leadevent le
where
le.id not in (select event_id from haygot.haygot.crm_communication)

It has two parents

haygot.haygot.crm_leadevent and
haygot.haygot.crm_communication)

Query profile attached.
b2674da4-7ff2-4a25-b09d-c0e303444f00.zip (11.1 KB)

Here’s the query that worked.

select
count(1)
from haygot.haygot.crm_leadevent le
where
le.id not in (select event_id from haygot.haygot.crm_communication where event_id is not null)

Query profile for the workaround
a4846703-ffef-4b39-b3ad-2c7f495f4706.zip (11.8 KB)

But I need to know the root cause of the bug. We need to be able to trust the results we get are correct. Please help.

@joejk

For the query you are saying is wrong results, we are pushing down the below SQL. Can you please run the below SQL on Postgres and see if you get wrong results? Also can you check if under advanced options “Enable Legacy Dialect” is checked or unchecked?

SELECT “t”.“id”, “t”.“lead_id”, “t”.“lead_campaign_id”, “t”.“event_type” COLLATE “C”, “t”.“created_on”, “t”.“modified_on”, “t”.“scheduled_at”, “t”.“finished_at”, “t”.“rescheduled_to”, “t”.“status” COLLATE “C”, “t”.“defaulted_by” COLLATE “C”, “t”.“root_communication_id”, “t”.“otp” COLLATE “C”, “t”.“phone” COLLATE “C”, “t”.“parent_id”, “t”.“assigned_to_id”, “t”.“modified_at”, “t”.“sprint_id”, “t”.“is_demo_confirmed”, “t”.“event_confirmed_by_id”, “t”.“is_paid”, “t”.“event_started_at”
FROM (SELECT “crm_leadevent”.“id”, “crm_leadevent”.“lead_id”, “crm_leadevent”.“lead_campaign_id”, “crm_leadevent”.“event_type” COLLATE “C”, “crm_leadevent”.“created_on”, “crm_leadevent”.“modified_on”, “crm_leadevent”.“scheduled_at”, “crm_leadevent”.“finished_at”, “crm_leadevent”.“rescheduled_to”, “crm_leadevent”.“status” COLLATE “C”, “crm_leadevent”.“defaulted_by” COLLATE “C”, “crm_leadevent”.“root_communication_id”, “crm_leadevent”.“otp” COLLATE “C”, “crm_leadevent”.“phone” COLLATE “C”, “crm_leadevent”.“parent_id”, “crm_leadevent”.“assigned_to_id”, “crm_leadevent”.“modified_at”, “crm_leadevent”.“sprint_id”, “crm_leadevent”.“is_demo_confirmed”, “crm_leadevent”.“event_confirmed_by_id”, “crm_leadevent”.“is_paid”, “crm_leadevent”.“event_started_at”, “crm_communication”.“$f0”, “crm_communication”.“$f1”, “crm_leadevent”.“id” AS “id0”
FROM “haygot”.“crm_leadevent”
INNER JOIN (SELECT COUNT(*) AS “$f0”, COUNT(“event_id”) AS “$f1”
FROM “haygot”.“crm_communication”) AS “crm_communication” ON TRUE) AS “t”
LEFT JOIN (SELECT “event_id”, MIN(1) AS “$f1”
FROM “haygot”.“crm_communication”
GROUP BY “event_id”) AS “crm_communication0” ON “t”.“id0” = “crm_communication0”.“event_id”
WHERE “t”.“$f0” = 0 OR ((“crm_communication0”.“$f1” IS NULL AND “t”.“$f1” >= “t”.“$f0”) AND “t”.“id0” IS NOT NULL

Super thanks for the quick reply!
Enable legacy is unchecked.
Dremio couldn’t parse the query you gave. It’s showing an error for COLLATE.
Am I missing anything here?

@joejk,

Can you please run the SQL on Postgres?

There was a bracket to be closed at the end. Hopefully, I did it correctly.
This returned 0 rows. It should have returned > 0 rows.

SELECT “t”.“id”,
“t”.“lead_id”,
“t”.“lead_campaign_id”,
“t”.“event_type” COLLATE “C”,
“t”.“created_on”,
“t”.“modified_on”,
“t”.“scheduled_at”,
“t”.“finished_at”,
“t”.“rescheduled_to”,
“t”.“status” COLLATE “C”,
“t”.“defaulted_by” COLLATE “C”,
“t”.“root_communication_id”,
“t”.“otp” COLLATE “C”,
“t”.“phone” COLLATE “C”,
“t”.“parent_id”,
“t”.“assigned_to_id”,
“t”.“modified_at”,
“t”.“sprint_id”,
“t”.“is_demo_confirmed”,
“t”.“event_confirmed_by_id”,
“t”.“is_paid”,
“t”.“event_started_at”
FROM
(SELECT
“crm_leadevent”.“id”,
“crm_leadevent”.“lead_id”,
“crm_leadevent”.“lead_campaign_id”,
“crm_leadevent”.“event_type” COLLATE “C”,
“crm_leadevent”.“created_on”,
“crm_leadevent”.“modified_on”,
“crm_leadevent”.“scheduled_at”,
“crm_leadevent”.“finished_at”,
“crm_leadevent”.“rescheduled_to”,
“crm_leadevent”.“status” COLLATE “C”,
“crm_leadevent”.“defaulted_by” COLLATE “C”,
“crm_leadevent”.“root_communication_id”,
“crm_leadevent”.“otp” COLLATE “C”,
“crm_leadevent”.“phone” COLLATE “C”,
“crm_leadevent”.“parent_id”,
“crm_leadevent”.“assigned_to_id”,
“crm_leadevent”.“modified_at”,
“crm_leadevent”.“sprint_id”,
“crm_leadevent”.“is_demo_confirmed”,
“crm_leadevent”.“event_confirmed_by_id”,
“crm_leadevent”.“is_paid”,
“crm_leadevent”.“event_started_at”,
“crm_communication”.“$f0”,
“crm_communication”.“$f1”,
“crm_leadevent”.“id” AS “id0”
FROM “haygot”.“crm_leadevent”
INNER JOIN
(SELECT COUNT(*) AS “$f0”,
COUNT(“event_id”) AS “$f1”
FROM “haygot”.“crm_communication”
) AS “crm_communication” ON TRUE
) AS “t”
LEFT JOIN
(SELECT “event_id”,
MIN(1) AS “$f1”
FROM “haygot”.“crm_communication”
GROUP BY “event_id”
) AS “crm_communication0”
ON “t”.“id0” = “crm_communication0”.“event_id”
WHERE
“t”.“$f0” = 0
OR ((
“crm_communication0”.“$f1” IS NULL
AND “t”.“$f1” >= “t”.“$f0”
)
AND “t”.“id0” IS NOT NULL
)