I am currently evaluating Dremio as an alternative to our existing stack.
I was wondering what version of Apache Calcite Dremio 3.3 is using, and if there are any plans to upgrade it to at least version 1.18 (released on 2018-12-21).
The reason is that I am dealing with an EAV model and would like to interact with it using the JSON functions that were added to Calcite in 1.18, however trying to use any of these functions in Dremio 3.3 just returns a ‘no match found for function signature’ error.
Dremio claims to remove the need for ETL, but at least for this use-case I still need to use an ETL tool.
Which functions are you interested in from calcite? We don’t directly expose calcite functions however we do support json manipulation. Most notably you can select or filter on json subfields:
SELECT "business.json".hours.Monday FROM "business.json"
where "business.json".attributes.GoodForKids='True'
To answer the Apache Calcite version part more specifically, Dremio uses a version closely similar to Apache Calcite 1.16, and we are regularly updating the version.
That said, using a newer version of Calcite doesn’t mean that new functions added to Calcite are automatically available in Dremio: this is because Dremio and Calcite do not use the same execution engine, and simply support a different set of functions.
Like @rymurr, Dremio already has some support for JSON functions, and so you might be able to not use an ETL anymore. If not, feel free to share which functions would be missing, and our product team could then evaluate them for inclusion in the product.
Thanks guys, its a bit clearer now. I was missing the connection that you’re just using Calcite’s parser and you’ve implemented your own execution engine.
My use-case was actually pivoting a table that looks something like:
itemId | attribute | value
1 | name | foo
1 | color | green
1 | size | small
You are correct, currently there is no way to pivot a table into a json document. In your example above what Dremio can do is select itemId, "value" as size where "attribute"='size'. That will produce the same result as your example above. Does that help at all?