Cannot use more than one JSON field in query

We have a string column called metadata which is JSON.

SELECT
    t."metadata".team_id,
    t."metadata".team_name
FROM
(
    SELECT
        s.customer_id,
        s."value" AS value_current,
        s."value" AS value_prior,
        s."value" AS value_change,
        convert_from(s."metadata", 'JSON') AS "metadata"
    FROM
        dev.backend.entity."signal".v1."signal" s
    WHERE
        s.name = 'PerformanceTeam' AND
        s.date_ts = (SELECT MAX(date_ts) FROM dev.backend.entity."signal".v1."signal" WHERE customer_id = s.customer_id) AND
        s.ref_type = 'team'
) t

This query returns an error if we use more than one field in the JSON. Any of the fields we use are fine if we only use one. If we use more than one, it fails.

Funny enough, this work around which is horrible works:

SELECT
    t.customer_id,
    t2."metadata".team_id AS team_id,
    t2."metadata".team_name AS team_name,
    t.value_current,
    t.value_prior,
    t.value_change
FROM
(
    SELECT
        id,
        s.customer_id,
        s."value" AS value_current,
        s."value" AS value_prior,
        s."value" AS value_change
    FROM
        dev.backend.entity."signal".v1."signal" s
    WHERE
        s.name = 'PerformanceTeam' AND
        s.date_ts = (SELECT MAX(date_ts) FROM dev.backend.entity."signal".v1."signal" WHERE customer_id = s.customer_id) AND
        s.ref_type = 'team'
) t,
(
    SELECT
        id,
        convert_from(s."metadata", 'JSON') AS "metadata"
    FROM
        dev.backend.entity."signal".v1."signal" s
    WHERE
        s.name = 'PerformanceTeam' AND
        s.date_ts = (SELECT MAX(date_ts) FROM dev.backend.entity."signal".v1."signal" WHERE customer_id = s.customer_id) AND
        s.ref_type = 'team'
) t2
WHERE t.id = t2.id

Here’s the profile for the first query.

f38161e5-09d8-4c2d-828d-b39e12b0f5da.zip (5.6 KB)

Hi @jhaynie

What type is “metadata”, is it a complex object?

Thanks
@balaji.ramaswamy

Yes, its stored as a STRING field in parquet. But the string value is a JSON object with multiple fields.

It looks something like this:

{"active":true, "team_id":"4fa4a5e4578444b5", "team_name":"some team name too", "team_count":7, "team_salary":300000, "cost_percentile":0.333333333333333, "signals":[{"name":"CycleTime","value":68},{"name":"Throughput","value":8},{"name":"OnTimeDelivery","value":78}]}

Hi @jhaynie

Can you please send us "parquet-tools meta for this Parquet file?

Download, Install Parquet Tools

Thanks
@balaji.ramaswamy

Here’s a dump of one of the parquet files for the query

out.zip (1.1 KB)

Thanks @jhaynie

Was able to reproduce the issue. Let me get bak to you on what is going on here

Thanks
@balaji.ramaswamy

1 Like

Hi @balaji.ramaswamy We’re facing similar issues while trying to extract multiple fields from the json column extracted from parquet. Also, dremio doesn’t seem to identify the nested json column by itself.

Could you share what the issue might be and any potential solution?

@aakash

If you have a JSON struct from a text file, Dremio might mark it as a string. All you need to do is

"SELECT CONVERT_FROMJSON(column_name) as col_name

Thanks
Bali

Thanks @balaji.ramaswamy . That seems to work for queries such as below:

select nested_0.blob1.col1 as col1 from (SELECT CONVERT_FROMJSON(blob1) as blob1 from “test.parquet”) nested_0

However, when we try to query more than one columns from the nested data structure such as below, we get the error: Using CONVERT_FROM(*, ‘JSON’) is only supported against string literals and direct table references of types VARCHAR and VARBINARY.

select nested_0.blob1.col1 as col1, nested_0.blob1.col1 as col2 from (SELECT CONVERT_FROMJSON(blob1) as blob1 from “test.parquet”) nested_0

Is there a way to achieve querying multiple columns?

@aakashsingh

Try this, using the UI, do the extract option and see what SQL is generated. Do you see the struct symbol for the column datatype? Or does it still show “Abc”?

Hi, am running into the same issue. My JSON string looks like OP too format-wise, and fyi I do see the struct symbol in the column.

@nld-8832 You have to give a column name, what is your CONVERT_FROMJSON command?

Is there any other workaround for this issue? I am seeing the exact issue but unlike the OP I don’t have any other fields/unique columns that can help me write a join condition to separate the JSON and non-JSON fields in 2 separate t. @balaji.ramaswamy

@priyanka Do you have one list/array or struct that has more structs inside, can you share a sample format with dummy data?