We have a string column called metadata which is JSON.
SELECT
t."metadata".team_id,
t."metadata".team_name
FROM
(
SELECT
s.customer_id,
s."value" AS value_current,
s."value" AS value_prior,
s."value" AS value_change,
convert_from(s."metadata", 'JSON') AS "metadata"
FROM
dev.backend.entity."signal".v1."signal" s
WHERE
s.name = 'PerformanceTeam' AND
s.date_ts = (SELECT MAX(date_ts) FROM dev.backend.entity."signal".v1."signal" WHERE customer_id = s.customer_id) AND
s.ref_type = 'team'
) t
This query returns an error if we use more than one field in the JSON. Any of the fields we use are fine if we only use one. If we use more than one, it fails.
Funny enough, this work around which is horrible works:
SELECT
t.customer_id,
t2."metadata".team_id AS team_id,
t2."metadata".team_name AS team_name,
t.value_current,
t.value_prior,
t.value_change
FROM
(
SELECT
id,
s.customer_id,
s."value" AS value_current,
s."value" AS value_prior,
s."value" AS value_change
FROM
dev.backend.entity."signal".v1."signal" s
WHERE
s.name = 'PerformanceTeam' AND
s.date_ts = (SELECT MAX(date_ts) FROM dev.backend.entity."signal".v1."signal" WHERE customer_id = s.customer_id) AND
s.ref_type = 'team'
) t,
(
SELECT
id,
convert_from(s."metadata", 'JSON') AS "metadata"
FROM
dev.backend.entity."signal".v1."signal" s
WHERE
s.name = 'PerformanceTeam' AND
s.date_ts = (SELECT MAX(date_ts) FROM dev.backend.entity."signal".v1."signal" WHERE customer_id = s.customer_id) AND
s.ref_type = 'team'
) t2
WHERE t.id = t2.id
Yes, its stored as a STRING field in parquet. But the string value is a JSON object with multiple fields.
It looks something like this:
{"active":true, "team_id":"4fa4a5e4578444b5", "team_name":"some team name too", "team_count":7, "team_salary":300000, "cost_percentile":0.333333333333333, "signals":[{"name":"CycleTime","value":68},{"name":"Throughput","value":8},{"name":"OnTimeDelivery","value":78}]}
Hi @balaji.ramaswamy Weâre facing similar issues while trying to extract multiple fields from the json column extracted from parquet. Also, dremio doesnât seem to identify the nested json column by itself.
Could you share what the issue might be and any potential solution?
Thanks @balaji.ramaswamy . That seems to work for queries such as below:
select nested_0.blob1.col1 as col1 from (SELECT CONVERT_FROMJSON(blob1) as blob1 from âtest.parquetâ) nested_0
However, when we try to query more than one columns from the nested data structure such as below, we get the error: Using CONVERT_FROM(*, âJSONâ) is only supported against string literals and direct table references of types VARCHAR and VARBINARY.
select nested_0.blob1.col1 as col1, nested_0.blob1.col1 as col2 from (SELECT CONVERT_FROMJSON(blob1) as blob1 from âtest.parquetâ) nested_0
Is there a way to achieve querying multiple columns?
Try this, using the UI, do the extract option and see what SQL is generated. Do you see the struct symbol for the column datatype? Or does it still show âAbcâ?
Is there any other workaround for this issue? I am seeing the exact issue but unlike the OP I donât have any other fields/unique columns that can help me write a join condition to separate the JSON and non-JSON fields in 2 separate t. @balaji.ramaswamy