Performance concern on OVER() function

jasoncai · July 6, 2020, 9:58am

Is there any performance concern with using the window function OVER()?

I ask because we have a SQL query that uses over() and it performed badly for a period of time. Later, the query was fast again. The slowness may also have been due to system overload at that time (many processes were using Dremio).

So I would like to learn whether there is any performance concern with using over() in Dremio. Our query is a simple use case like the following (the table is partitioned on column4):
select * from (
  select column1, row_number() over (partition by column2 order by column3) rowNum
  from table
  where column1 in ()
  and column4 = '2020-06-01'
) t
where t.rowNum = 1
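To make the intent of the query concrete, here is a minimal sketch of the same "first row per partition" pattern. It runs against an in-memory SQLite database (3.25 or later, for window function support) purely as a stand-in; it says nothing about how Dremio plans or executes the query. The table name `events`, the sample rows, and the values in the IN list are all hypothetical. The point it illustrates is standard SQL semantics: the WHERE filters are applied before the window function, so only the matching rows are partitioned and sorted.

```python
import sqlite3

# Hypothetical stand-in data. SQLite is used only to show the semantics of
# the pattern; it does not reflect Dremio's execution or performance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (column1 TEXT, column2 TEXT, column3 INTEGER, column4 TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("a", "g1", 2, "2020-06-01"),
        ("b", "g1", 1, "2020-06-01"),
        ("c", "g2", 5, "2020-06-01"),
        ("d", "g2", 3, "2020-06-02"),  # dropped by the column4 filter
        ("e", "g3", 1, "2020-06-01"),  # dropped by the IN list
    ],
)

# Same shape as the query in the post; the IN list values are assumed.
query = """
SELECT column1, column2
FROM (
    SELECT column1, column2,
           ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column3) AS rowNum
    FROM events
    WHERE column1 IN ('a', 'b', 'c', 'd')
      AND column4 = '2020-06-01'
) t
WHERE t.rowNum = 1
ORDER BY column2
"""

# One row per column2 group: the row with the smallest column3 that
# survived the filters.
rows = conn.execute(query).fetchall()
print(rows)
```

In this toy data, only three rows reach the window function, which is why selective filters on the partition column and on column1 matter for how much data has to be sorted per group.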

My questions are:

Will this kind of over() use case cause any performance issues?
Will a query like this that uses over() be much slower when the Dremio cluster lacks resources? (That is, if the query is slow at certain times, is over() the root cause?)
Are there any suggestions or best practices for using over() to get the best performance?

As Dremio is unlike an RDBMS, please advise. Thanks.

Rakesh_Malugu · July 6, 2020, 11:38am

Hello @jasoncai

Can you attach the query profiles for both the slow run and the fast run?

We need to figure out what caused the slowness in the first run.

Thanks,
@Rakesh_Malugu
