Elasticsearch query - extract all fields from complex data type (JSON)

Question:
Can dremio automatically expand a complex type (eg: JSON) into columns from an elasticsearch query result?

I’m running a query on elasticsearch which contains a JSON document in as a field:

SELECT d.myJSONField FROM myElasticIndex d

which returns:

{
messageType:“MyType”
messageId:“12345”
field1:“val1”
field2:“val2”
}

I know i can manually select the fields as columns like so:

SELECT d.myJSONField.messageType as messageType, d.myJSONField.messageId as messageId FROM myElasticIndex d

However if the JSON is long this can become tedious, is Dremio able to do this for all the fields without listing them one by one in the SQL?

Hi,

I am afraid we don’t have a convenient way to do this right now. I’ve opened a internal ticket for this.

thanks,
Doron

We have an experimental function called kvgen that actually should help you achieve what you need.

For example given a column called complex with value:

{"foo": 1, "bar": 2} 

Using the following SQL:

SELECT KVGEN("complex") FROM ...

will result in:

[
  {"key":"foo","value":"1"},
  {"key":"bar","value":2}
]

You can then use unnest to achieve what you need.

Thanks for the reply Doron.

It works for the simple case but if the JSON is more complicated with nested fields or if the values are not all of the same type it throws the below error:

Mappify/kvgen does not support heterogeneous value types. All values in the input map must be of the same type. The field [field2] has a differing type [Utf8].

Does the “Extract” field function have capability to pick multiple fields from the complex type (JSON) instead of one by one?

It doesn’t currently handle multiple fields, but if you toggle the SQL window at the top you can see the syntax used. Perhaps you can generate the query programmatically?

Thanks Kelly, yes, generating programatically will work, it would be a good feature if Dremio can do this automatically for our less technical folks who might use it.

1 Like

Agree this would be a good feature, and it is something we have on our roadmap.