Error when running window ROW_NUMBER function

Any advice on getting my window function query to work would be great. I have created a dataset based on some SQL Server data and running the following query gives an error of “Cannot convert RexNode to equivalent Dremio expression. RexNode Class: org.apache.calcite.rex.RexOver, RexNode Digest: DENSE_RANK() OVER (PARTITION BY $16 ORDER BY $4 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)”

select
Invoice_ID
,TaskID
,ROW_NUMBER() OVER(partition by TaskID order by Invoice_ID DESC) as unq
,Invoice_Status
,BillableRate
FROM “@admin”."apientry.trftb_StageWFMInvoiceEntries"
where TaskID IS NOT NULL

Hi @Chris_du_Toit,

There’s currently a problem running window functions and OVER clauses when connecting Dremio to relational databases.

We’re investigating this issue and will let you know when we have more information.

Thanks for the quick reply. How do I turn a query from relational to a query on Dremio? In other words, how do I move the data from the relational source to Dremio’s in-memory structures?

When you access data from any source, the data is read into Apache Arrow in-memory buffers automatically.

You can also create Data Reflections, which maintain physically optimized representations of the source data in Dremio’s Reflection Store. Dremio’s cost-based optimizer will then consider all reflections when generating a query plan, and will usually determine running the query on one or more reflections is more cost effective than pushing the query into the source.

You can read more about Data Reflections here: http://docs.dremio.com/acceleration/

Thanks, I am now learning the Dremio and I see a reflection based on a RDBMS source also doesn’t support the OVER clause.
(1) Is there any workaround I can currently bring in to have the OVER clause work or is this just a feature/fix I need to wait for?
(2) Also, what’s the best way to monitor progress on this becoming available?

Can you attach the query profile?

Also, are you still seeing this on 1.2.2?

Kelly

Hi Kelly,

Yes, I have upgraded to 1.2.2 from 1.2.1. Please find the query profile here d2d22ab4-195e-4bab-ae09-b66cf0801b08.zip (2.7 KB)

Thanks for the quick response to this. It is really appreciated and makes for a great user experience.

Chris

Hi Kelly,

Any idea what I might be doing wrong?

Hi @Chris_du_Toit,

Currently analyzing the profile you have uploaded. Will get back to you in a few …

Thanks,
@balaji.ramaswamy

Hi @Chris_du_Toit,

If you have a JSON or a CSV file that has these columns the query will work fine. We have a bug when we need to push this to a RDBMS (JDBC pushdown) . We are working on the bug and it should arrive in our next release.

Kindly let me know if you have any further questions

Thanks,
@balaji.ramaswamy

Thanks, what is the best way to stay up to date with new releases?

Hi Chris,

We announce new releases here. You can expect ~1/month for the next several months.

Kelly

Hi Kelly,

I know it’s not in the Release Notes for v1.3, but I just wanted to let you know that the WINDOW function error is still popping up v1.3.

BTW - Love the row numbers in the query windows.

Thanks
Chris

@Chris_du_Toit following up on your comment on Possible View of Dremio Roadmap.

We’re currently targeting 1.6 (2 releases out) for fixing this issue – it turned out to be a more involved fix than initially estimated.

Hey, has this been fixed? Am seeing an issue on Rank() window function, where ordering is ignored, and would like to know if it is related with that.

@gpaparisteidis, it’s possible, as we’ve fixed some errors around Window functions.

What version of Dremio are you using? Can you share a query profile for the failed job?

Hey @ben , thanks for your response.
This the query profile of the job. 0474078f-a8f6-4497-80d5-a6d2710adcf5.zip (9.2 KB)
As I said the problem lies with the ordering of the results, where ORDER BY DESC seems to be ignored.

I am using version 3.1.1

Hello @gpaparisteidis,

So the issue is not an error, but what looks like “wrong results”.

In your query Dremio actually pushes the query down into the underlying source, MS SQL Server:

SELECT
    [IssueType] COLLATE LATIN1_GENERAL_BIN2 AS [IssueType],
    [ActualImpact],
    RANK() OVER (PARTITION 
BY
    [IssueType] COLLATE LATIN1_GENERAL_BIN2  
ORDER BY
    CASE 
        WHEN [ActualImpact] IS NULL THEN 1 
        ELSE 0 
    END DESC,
    [ActualImpact]) AS [issue_rank] 
FROM
    [1-BaApi].[dbo].[Issues] 
WHERE
    [ActualImpact] IS NOT NULL]

But Dremio orders null values as highest while SQL Server orders them lowest so that’s why you see the CASE statement in the pushed down query and probably the unexpected ordering. Does that explain what you are seeing?

Hi @ben, thanks for your response and sorry for the delayed reply, I was on holidays.

While I understand how Dremio and SQL Server order NULL values differently, what I don’t get is how is this affecting the results in my case, since there is a WHERE clause that only returns NON-NULL values.

I addition to that, the query that you have posted above seems to be defining ordering between NULL and NON-NULL values, however what I am seeing is un-ordered (or to be more specific ordered in ASC) results between NON-NULL values.

SELECT IssueType, ActualImpact,

RANK() OVER (PARTITION BY IssueType ORDER BY ActualImpact ASC) issue_rank

FROM "1-BaApi"."1-BaApi".dbo.Issues
WHERE ActualImpact IS NOT NULL
SELECT IssueType, ActualImpact,

RANK() OVER (PARTITION BY IssueType ORDER BY ActualImpact DESC) issue_rank

FROM "1-BaApi"."1-BaApi".dbo.Issues
WHERE ActualImpact IS NOT NULL

These two queries produce the same results.

Appreciate your help.