Dremio as a Virtualization Layer

Hi!

I am evaluating Dremio as a virtualization layer (SQL → SQL dialect) and find it very appealing. The ARP framework is a great abstraction and there are a lot of things that I like about it, but I do have specific questions that I am hoping the community will help me answer:

  1. Is the ARP framework closed source? If so, is the plan to make it open, or will it be an Enterprise feature?

  2. My underlying DB has aggregation functions that Dremio does not support, so Calcite errors out even if the ARP mapping defines the function as valid. I could wrap the call in an external query, but that defeats the purpose of a virtualization layer. I could also create a User Defined Function in Dremio, which would make the syntax valid, but that requires one UDF per unsupported aggregation. Is there another way?

  3. I didn’t see any syntax in ARP for pushing down window functions, and in my testing ALL window functions are being pushed down (log message: No rewriting signature found. Returning default unparsing syntax. ). Is this correct? If not, how can I define which window functions to push down?

  4. ARP defines the field “supports_over”, but in the experiments I ran for 3) the window functions with the OVER clause were always pushed down, which confuses me about the purpose of the “supports_over” field. What does it do? In general, where can I find documentation on the different fields available for ARP?

  5. I don’t see any DDL/DML support. Is that something planned on the roadmap?

Thanks a lot for the help. I do like how ARP is structured and find the declarative approach superior to what other tools provide!

Alejandro

@adelcast

I have posted these questions internally and will get back to you. If you do not hear from us in a couple of days, please ping here and I will make sure to follow up.

Thanks
Bali

thanks @balaji.ramaswamy! will keep an eye on the post

@adelcast, for item (2), could you use an external query and then wrap it in a VDS?
For example:

CREATE OR REPLACE VDS target.interpolated AS
SELECT *
FROM table (vertica.external_query('SELECT slice_time, TS_FIRST_VALUE(traffic, ''LINEAR'') traffic FROM machine_fact
TIMESERIES slice_time AS ''2 seconds'' OVER(PARTITION BY cell_name ORDER BY datetimetimezone)'))

Then you can expose the VDS as a virtualisation layer.

Thanks for the suggestion @Michael_Flower. I should have mentioned this, but in my use case I have a custom query engine on top of the virtualization layer, so I don’t know the structure of the queries ahead of time: they are created dynamically based on a user-facing UI.

Hi @adelcast,

Answers to a few of your questions:

  1. The framework is currently not open, but is usable by anyone. I cannot comment on whether it will become open in the future.

  2. The ARP framework does conditional pass-down, and if the underlying source cannot handle the execution then Dremio takes care of it. The framework is built around Dremio’s execution engine and as such is tied to the functionality that Dremio supports, so to get additional aggregations that Dremio itself does not support you’d need to use external_query right now.

  3. Window functions are handled only at a basic level by the ARP configuration framework right now, because of the breadth and complexity of passing them down adequately to the underlying sources. They’re underdocumented at the moment (again, a future enhancement). If there’s a specific thing you’re trying to do that is causing a problem, more details would be helpful.

  4. It appears that supports_over is exposed but not yet hooked up and as such does not yet have functionality associated with it, hence what you’re seeing. This is planned for a future update.

  5. Dremio itself doesn’t support DDL/DML at this point, so ARP will not. Again, I unfortunately cannot comment about the upcoming roadmap for this.
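
The conditional pass-down described in answer 2 can be pictured as a lookup against the functions a connector declares. The following Python sketch is only an illustration of that idea, not Dremio's implementation: the `DECLARED_AGGREGATES` table, the `ENGINE_AGGREGATES` set, the `plan_aggregate` helper, and the type names are all hypothetical.

```python
# Hypothetical model of conditional pass-down; not Dremio's actual code.
# Maps a function name to the argument signatures the source declares.
DECLARED_AGGREGATES = {
    "sum": [("integer",), ("double",)],
    "avg": [("double",)],
}

# Aggregations the engine itself can execute (assumed for illustration).
ENGINE_AGGREGATES = {"sum", "avg", "min", "max", "count"}


def plan_aggregate(name, arg_types):
    """Return where an aggregate call would run under this model."""
    name = name.lower()
    if name not in ENGINE_AGGREGATES:
        # In this model the engine rejects the call before the connector
        # mapping is consulted, mirroring the error described in question 2.
        raise ValueError(f"unknown aggregate function: {name}")
    if tuple(arg_types) in DECLARED_AGGREGATES.get(name, []):
        return "source"  # the connector declares a matching signature
    return "engine"      # fall back to running it in the engine
```

Under these assumptions, a function the engine does not know (such as a source-specific aggregate) never reaches the lookup, which is why declaring it in the mapping alone would not help.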


@Kyle_Porter thanks for sharing!

Regarding Window functions, an example query:

SELECT "col1", "col2", "col3", DENSE_RANK() OVER (PARTITION BY "col2", "col3" ORDER BY "col4") FROM myschema.mytable

I see the whole statement always being pushed down to the DB, which was a bit of a surprise. I would have expected DENSE_RANK to run on Dremio if the ARP file doesn’t define it. So what I would like is a way to determine whether DENSE_RANK should be pushed down or not.

@adelcast thanks for that. So if there were a high-level check that simply looked at the type of window function and passed or rejected it, would that satisfy the need?

@Kyle_Porter I would envision something similar to what we have for aggregations: a pass that checks ARP to see whether the underlying DB supports the window function. If not, the window function runs on the Dremio layer.

Then I would summarize the asked-for-changes to be:

  • High-level checks around window function support
  • supports_over being enabled
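
A minimal Python sketch of how the two requested changes could fit together, purely as an illustration of the proposal in this thread. Only the `supports_over` name comes from the discussion; the `window_functions` key and the `window_runs_on` helper are hypothetical and do not describe any existing ARP field or Dremio API.

```python
# Hypothetical decision rule for the proposed window-function check.
# "supports_over" is the field discussed above; "window_functions" is an
# invented key used only for this sketch.
def window_runs_on(func_name, capabilities):
    """Decide where a window function call would run under the proposal."""
    if not capabilities.get("supports_over", False):
        # The source cannot take an OVER clause at all.
        return "dremio"
    declared = {n.lower() for n in capabilities.get("window_functions", [])}
    # Push down only functions the connector explicitly declares.
    return "source" if func_name.lower() in declared else "dremio"
```

For example, with `{"supports_over": True, "window_functions": ["ROW_NUMBER"]}`, a `DENSE_RANK` call would stay on the Dremio layer while `ROW_NUMBER` would be pushed down.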