Error when running window ROW_NUMBER function

Chris_du_Toit · October 11, 2017, 4:19am

Any advice on getting my window function query to work would be great. I have created a dataset based on some SQL Server data and running the following query gives an error of “Cannot convert RexNode to equivalent Dremio expression. RexNode Class: org.apache.calcite.rex.RexOver, RexNode Digest: DENSE_RANK() OVER (PARTITION BY $16 ORDER BY $4 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)”

select
Invoice_ID
,TaskID
,ROW_NUMBER() OVER(partition by TaskID order by Invoice_ID DESC) as unq
,Invoice_Status
,BillableRate
FROM “@admin”."apientry.trftb_StageWFMInvoiceEntries"
where TaskID IS NOT NULL

jduong · October 11, 2017, 4:30am

Hi @Chris_du_Toit,

There’s currently a problem running window functions and OVER clauses when connecting Dremio to relational databases.

We’re investigating this issue and will let you know when we have more information.

Chris_du_Toit · October 11, 2017, 4:36am

Thanks for the quick reply. How do I turn a query from relational to a query on Dremio? In other words, how do I move the data from the relational source to Dremio’s in-memory structures?

kelly · October 11, 2017, 10:19am

When you access data from any source, the data is read into Apache Arrow in-memory buffers automatically.

You can also create Data Reflections, which maintain physically optimized representations of the source data in Dremio’s Reflection Store. Dremio’s cost-based optimizer will then consider all reflections when generating a query plan, and will usually determine running the query on one or more reflections is more cost effective than pushing the query into the source.

You can read more about Data Reflections here: http://docs.dremio.com/acceleration/

Chris_du_Toit · October 16, 2017, 1:07am

Thanks, I am now learning the Dremio and I see a reflection based on a RDBMS source also doesn’t support the OVER clause.
(1) Is there any workaround I can currently bring in to have the OVER clause work or is this just a feature/fix I need to wait for?
(2) Also, what’s the best way to monitor progress on this becoming available?

kelly · October 16, 2017, 1:33am

Can you attach the query profile?

Also, are you still seeing this on 1.2.2?

Kelly

Chris_du_Toit · October 16, 2017, 2:29am

Hi Kelly,

Yes, I have upgraded to 1.2.2 from 1.2.1. Please find the query profile here d2d22ab4-195e-4bab-ae09-b66cf0801b08.zip (2.7 KB)

Thanks for the quick response to this. It is really appreciated and makes for a great user experience.

Chris

Chris_du_Toit · October 17, 2017, 2:57am

Hi Kelly,

Any idea what I might be doing wrong?

balaji.ramaswamy · October 17, 2017, 5:59pm

Hi @Chris_du_Toit,

Currently analyzing the profile you have uploaded. Will get back to you in a few …

Thanks,
@balaji.ramaswamy

balaji.ramaswamy · October 17, 2017, 11:03pm

Hi @Chris_du_Toit,

If you have a JSON or a CSV file that has these columns the query will work fine. We have a bug when we need to push this to a RDBMS (JDBC pushdown) . We are working on the bug and it should arrive in our next release.

Kindly let me know if you have any further questions

Thanks,
@balaji.ramaswamy

Chris_du_Toit · October 18, 2017, 7:57pm

Thanks, what is the best way to stay up to date with new releases?

kelly · October 19, 2017, 2:40pm

Hi Chris,

We announce new releases here. You can expect ~1/month for the next several months.

Kelly

Chris_du_Toit · December 4, 2017, 9:59pm

Hi Kelly,

I know it’s not in the Release Notes for v1.3, but I just wanted to let you know that the WINDOW function error is still popping up v1.3.

BTW - Love the row numbers in the query windows.

Thanks
Chris

can · January 25, 2018, 2:57am

@Chris_du_Toit following up on your comment on Possible View of Dremio Roadmap.

We’re currently targeting 1.6 (2 releases out) for fixing this issue – it turned out to be a more involved fix than initially estimated.

gpaparisteidis · September 4, 2019, 12:07pm

Hey, has this been fixed? Am seeing an issue on Rank() window function, where ordering is ignored, and would like to know if it is related with that.

ben · September 4, 2019, 4:03pm

@gpaparisteidis, it’s possible, as we’ve fixed some errors around Window functions.

What version of Dremio are you using? Can you share a query profile for the failed job?

gpaparisteidis · September 5, 2019, 8:08am

Hey @ben , thanks for your response.
This the query profile of the job. 0474078f-a8f6-4497-80d5-a6d2710adcf5.zip (9.2 KB)
As I said the problem lies with the ordering of the results, where ORDER BY DESC seems to be ignored.

I am using version 3.1.1

ben · September 5, 2019, 3:02pm

Hello @gpaparisteidis,

So the issue is not an error, but what looks like “wrong results”.

In your query Dremio actually pushes the query down into the underlying source, MS SQL Server:

SELECT
    [IssueType] COLLATE LATIN1_GENERAL_BIN2 AS [IssueType],
    [ActualImpact],
    RANK() OVER (PARTITION 
BY
    [IssueType] COLLATE LATIN1_GENERAL_BIN2  
ORDER BY
    CASE 
        WHEN [ActualImpact] IS NULL THEN 1 
        ELSE 0 
    END DESC,
    [ActualImpact]) AS [issue_rank] 
FROM
    [1-BaApi].[dbo].[Issues] 
WHERE
    [ActualImpact] IS NOT NULL]

But Dremio orders null values as highest while SQL Server orders them lowest so that’s why you see the CASE statement in the pushed down query and probably the unexpected ordering. Does that explain what you are seeing?

gpaparisteidis · September 17, 2019, 3:26pm

Hi @ben, thanks for your response and sorry for the delayed reply, I was on holidays.

While I understand how Dremio and SQL Server order NULL values differently, what I don’t get is how is this affecting the results in my case, since there is a WHERE clause that only returns NON-NULL values.

I addition to that, the query that you have posted above seems to be defining ordering between NULL and NON-NULL values, however what I am seeing is un-ordered (or to be more specific ordered in ASC) results between NON-NULL values.

SELECT IssueType, ActualImpact,

RANK() OVER (PARTITION BY IssueType ORDER BY ActualImpact ASC) issue_rank

FROM "1-BaApi"."1-BaApi".dbo.Issues
WHERE ActualImpact IS NOT NULL

SELECT IssueType, ActualImpact,

RANK() OVER (PARTITION BY IssueType ORDER BY ActualImpact DESC) issue_rank

FROM "1-BaApi"."1-BaApi".dbo.Issues
WHERE ActualImpact IS NOT NULL

These two queries produce the same results.

Appreciate your help.

Topic		Replies	Views
Join row_number over function in dremio getting Wrong Result	13	4586	September 5, 2019
Possible View of Dremio Roadmap	4	3393	July 3, 2019
Window Functions not working after patch	7	1480	July 15, 2020
Window Functions Order By Not working	0	1728	August 22, 2019
Performance concern on OVER() function	1	1110	July 6, 2020

Error when running window ROW_NUMBER function

Related topics