Not using distinct on a self join causing loss of data

Table Employee has SnapshotDate, Employee_ID, Employee_Full_Name, Employee_Display_Name and is filtered to a list of departments via IN (,) and to a status of “Active”
Table Relation is a relationship table that has the relationship between EmployeeIDs. It contains SnapshotDate, Employee_ID1, Employee_ID2.

I have the 2 queries below, Query 2 is providing the correct row count. To me it seems like the optimizer is applying the filters to Employee incorrectly, like the filter in the self join (via relationship) is being applied to the base table. I have rewritten to use a CTE to generate the relationship with manager name and am getting the correct result, but curious on why this is happening. Help?

Query 1: ~100,000 rows, time 17s

SELECT
e.snapshot_date,
e.employee_id,
e.employee_full_name,
mgr.employee_display_name as ManagerName
FROM employee e

left join relation r
on e.snapshot_date = r.snapshot_date
and e.employee_id = r.employee_id1
and r.relation_type_id = ‘FullTimeEmployees’

left join employee mgr
on r.snapshot_date = mgr.snapshot_date
and r.employee_id2 = mgr.employee_id

Where e.Status_name = ‘Active’
And e.Department in (‘Accounting’,
‘HumanResources’,
‘Engineering’,
‘Retail’)

Query 2: ~1,000,000 rows, time 1m11s

SELECT DISTINCT
e.snapshot_date,
e.employee_id,
e.employee_full_name,
mgr.employee_display_name as ManagerName
FROM employee e

left join relation r
on e.snapshot_date = r.snapshot_date
and e.employee_id = r.employee_id1
and r.relation_type_id = ‘FullTimeEmployees’

left join employee mgr
on r.snapshot_date = mgr.snapshot_date
and r.employee_id2 = mgr.employee_id

Where e.Status_name = ‘Active’
And e.Department in (‘Accounting’,
‘HumanResources’,
‘Engineering’,
‘Retail’)

Hi Justin, which dremio version are you using ?

Sorry about that, it’s on prem
Build
15.8.0-202107142036100714-d600589a
Edition
Enterprise Edition

bumping for visibility

Is there any thing else I need to do to get help on this?

@JustinB

I would like to see the plan generated for both SQL. Can we please have the 2 profiles?