Speed difference between Window Functions and Self-Join

Hi all,

I have the below 2 queries. Both queries try to find the last row in a table.

  • Query 1: Get Minimum then Self-join
with min_shareholding_data as (
SELECT sehk_code, participant_id, min(as_of_date_tz08) as as_of_date_tz08 FROM "l1-hk-trading".fact."fct_hk_stock_shareholding_history"
where as_of_date_tz08 >= '2024-01-01'
group by 1,2
)

select orig.as_of_date_tz08, orig.sehk_code, orig.participant_id, orig.shareholding_amount from min_shareholding_data "min"
left outer join "l1-hk-trading".fact."fct_hk_stock_shareholding_history" orig
using (sehk_code, participant_id, as_of_date_tz08)

ecd49274-91ef-49b7-824b-ffb065aed4ed.zip (109.4 KB)

Accelerated by 2 Raw Reflections, it ran for 2 seconds.

  • Query 2: Use row_number() window function and get the earliest row

select as_of_date_tz08, sehk_code, participant_id, shareholding_amount from "l1-hk-trading".fact."fct_hk_stock_shareholding_history" 
where as_of_date_tz08 >= '2024-01-01'
qualify row_number() over(partition by sehk_code, participant_id order by as_of_date_tz08 desc) = 1

5b1a3d2a-f52a-4036-95b2-ba75c826453a.zip (32.5 KB)

========

My understanding is, both queries should have run theoretically at the same speed.

Also, what puzzles mee speed. why there’s a difference in the raw reflections used.

===

The two raw reflections are

@Ken The first query does not have a Window function. The second query uses Window operator and no join. The entire time is spent on the Window operator. The planner evaluated both reflections and used the first one. The scan time is what the reflection would have mattered and that is anyway fast. You can override the planner and use the below optimizer hints and see if the other reflection is faster. But the time is all spent on the WINDOW operator. If you create a VDS with the WINDOW operator and then create a raw reflection on that VDS then fire the query on the VDS with just the filter column, you would see better performance but the reflection build will take time of the current query with the WINDOW operator