Table Employee has SnapshotDate, Employee_ID, Employee_Full_Name, Employee_Display_Name and is filtered to a list of departments via IN (,) and to a status of “Active”
Table Relation is a relationship table that has the relationship between EmployeeIDs. It contains SnapshotDate, Employee_ID1, Employee_ID2.
I have the 2 queries below, Query 2 is providing the correct row count. To me it seems like the optimizer is applying the filters to Employee incorrectly, like the filter in the self join (via relationship) is being applied to the base table. I have rewritten to use a CTE to generate the relationship with manager name and am getting the correct result, but curious on why this is happening. Help?
Query 1: ~100,000 rows, time 17s
SELECT
e.snapshot_date,
e.employee_id,
e.employee_full_name,
mgr.employee_display_name as ManagerName
FROM employee e
left join relation r
on e.snapshot_date = r.snapshot_date
and e.employee_id = r.employee_id1
and r.relation_type_id = ‘FullTimeEmployees’
left join employee mgr
on r.snapshot_date = mgr.snapshot_date
and r.employee_id2 = mgr.employee_id
Where e.Status_name = ‘Active’
And e.Department in (‘Accounting’,
‘HumanResources’,
‘Engineering’,
‘Retail’)
Query 2: ~1,000,000 rows, time 1m11s
SELECT DISTINCT
e.snapshot_date,
e.employee_id,
e.employee_full_name,
mgr.employee_display_name as ManagerName
FROM employee e
left join relation r
on e.snapshot_date = r.snapshot_date
and e.employee_id = r.employee_id1
and r.relation_type_id = ‘FullTimeEmployees’
left join employee mgr
on r.snapshot_date = mgr.snapshot_date
and r.employee_id2 = mgr.employee_id
Where e.Status_name = ‘Active’
And e.Department in (‘Accounting’,
‘HumanResources’,
‘Engineering’,
‘Retail’)