Any advice on getting my window function query to work would be great. I have created a dataset based on some SQL Server data and running the following query gives an error of “Cannot convert RexNode to equivalent Dremio expression. RexNode Class: org.apache.calcite.rex.RexOver, RexNode Digest: DENSE_RANK() OVER (PARTITION BY $16 ORDER BY $4 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)”
select
Invoice_ID
,TaskID
,ROW_NUMBER() OVER(partition by TaskID order by Invoice_ID DESC) as unq
,Invoice_Status
,BillableRate
FROM “@admin”."apientry.trftb_StageWFMInvoiceEntries"
where TaskID IS NOT NULL
Thanks for the quick reply. How do I turn a query from relational to a query on Dremio? In other words, how do I move the data from the relational source to Dremio’s in-memory structures?
When you access data from any source, the data is read into Apache Arrow in-memory buffers automatically.
You can also create Data Reflections, which maintain physically optimized representations of the source data in Dremio’s Reflection Store. Dremio’s cost-based optimizer will then consider all reflections when generating a query plan, and will usually determine running the query on one or more reflections is more cost effective than pushing the query into the source.
Thanks, I am now learning the Dremio and I see a reflection based on a RDBMS source also doesn’t support the OVER clause.
(1) Is there any workaround I can currently bring in to have the OVER clause work or is this just a feature/fix I need to wait for?
(2) Also, what’s the best way to monitor progress on this becoming available?
If you have a JSON or a CSV file that has these columns the query will work fine. We have a bug when we need to push this to a RDBMS (JDBC pushdown) . We are working on the bug and it should arrive in our next release.
Kindly let me know if you have any further questions
Hey @ben , thanks for your response.
This the query profile of the job. 0474078f-a8f6-4497-80d5-a6d2710adcf5.zip (9.2 KB)
As I said the problem lies with the ordering of the results, where ORDER BY DESC seems to be ignored.
So the issue is not an error, but what looks like “wrong results”.
In your query Dremio actually pushes the query down into the underlying source, MS SQL Server:
SELECT
[IssueType] COLLATE LATIN1_GENERAL_BIN2 AS [IssueType],
[ActualImpact],
RANK() OVER (PARTITION
BY
[IssueType] COLLATE LATIN1_GENERAL_BIN2
ORDER BY
CASE
WHEN [ActualImpact] IS NULL THEN 1
ELSE 0
END DESC,
[ActualImpact]) AS [issue_rank]
FROM
[1-BaApi].[dbo].[Issues]
WHERE
[ActualImpact] IS NOT NULL]
But Dremio orders null values as highest while SQL Server orders them lowest so that’s why you see the CASE statement in the pushed down query and probably the unexpected ordering. Does that explain what you are seeing?
Hi @ben, thanks for your response and sorry for the delayed reply, I was on holidays.
While I understand how Dremio and SQL Server order NULL values differently, what I don’t get is how is this affecting the results in my case, since there is a WHERE clause that only returns NON-NULL values.
I addition to that, the query that you have posted above seems to be defining ordering between NULL and NON-NULL values, however what I am seeing is un-ordered (or to be more specific ordered in ASC) results between NON-NULL values.
SELECT IssueType, ActualImpact,
RANK() OVER (PARTITION BY IssueType ORDER BY ActualImpact ASC) issue_rank
FROM "1-BaApi"."1-BaApi".dbo.Issues
WHERE ActualImpact IS NOT NULL
SELECT IssueType, ActualImpact,
RANK() OVER (PARTITION BY IssueType ORDER BY ActualImpact DESC) issue_rank
FROM "1-BaApi"."1-BaApi".dbo.Issues
WHERE ActualImpact IS NOT NULL