Implementing a custom `FLATTEN` function

Hi

I’m the dev behind Dremio UDF GIS.
I’m making progress and have even started implementing aggregation functions, with success so far.

I was wondering about the ability/possibility to implement a custom FLATTEN function.
Say I have a big VarBinary blob that contains a structured list of smaller VarBinary instances. I’d like to be able to flatten that, i.e. call my function, which applies my custom deserialization logic and returns the subordinate VarBinary values as separate rows, much like the built-in FLATTEN does for raw JSON arrays.

Looking around the source code, I found DummyFlatten, but from its name and implementation it looks like the magic happens somewhere else, probably because flatten belongs more to the physical phase than to the function layer.

Any idea on how to at least start implementing such a functionality?
Or even tap into the existing flattening capabilities (e.g. have a custom function treated as a flattener)?

Any idea on how to achieve this?

You are correct that flatten cannot be a regular function, as it changes the number of rows in the relation. The code for flatten is here in the Dremio engine:

The way you can accomplish this is with the ComplexWriter interface in your function to produce an array/list after parsing the binary structure yourself, and then use your new function in conjunction with flatten to achieve the desired result.
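To make the "parse the binary structure yourself" step concrete, here is a minimal sketch in plain Java. The container format is purely an assumption for illustration (each record is a 4-byte big-endian length followed by that many payload bytes), and `BlobSplitter`, `split` and `join` are hypothetical names, not part of Dremio. In a real UDF, each element returned by `split` would be written into the output list through the ComplexWriter instead of being collected into a `List`; that wiring depends on the Dremio version and is left out here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical container format (an assumption, not the poster's real format):
// a sequence of records, each a 4-byte big-endian length followed by the payload.
final class BlobSplitter {
    private BlobSplitter() {}

    // Splits one big blob into its subordinate byte arrays.
    static List<byte[]> split(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob); // ByteBuffer is big-endian by default
        List<byte[]> parts = new ArrayList<>();
        while (buf.hasRemaining()) {
            if (buf.remaining() < Integer.BYTES) {
                throw new IllegalArgumentException("truncated length prefix");
            }
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                throw new IllegalArgumentException("invalid record length: " + len);
            }
            byte[] part = new byte[len];
            buf.get(part);
            parts.add(part);
        }
        return parts;
    }

    // Builds a blob in the same hypothetical format; only used for the demo.
    static byte[] join(List<byte[]> parts) {
        int size = 0;
        for (byte[] p : parts) {
            size += Integer.BYTES + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : parts) {
            buf.putInt(p.length).put(p);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] blob = join(Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4}));
        for (byte[] part : split(blob)) {
            System.out.println(Arrays.toString(part)); // prints [1, 2, 3] then [4]
        }
    }
}
```

Once the function returns these elements as a list, the built-in flatten can turn that list into one row per element.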

An example use of this interface can be found in the mappify/kvgen function, which can be used to “discover” which keys are inside a JSON object without having to know the full schema of your JSON structure ahead of time.

Here is an example of kvgen used with flatten.

SELECT flatten(mappify(CONVERT_FROMJSON('{a:5, b:10}'))) FROM (VALUES(1))

Result:

EXPR$0
{"key":"a","value":5}
{"key":"b","value":10}

So you can write a function that takes in your varbinary column, returns a list, and replaces mappify (KVGEN) in this query.

Thanks for helping me help the Dremio community :) I’ll have a look and let you know.