Performing Aggregations on a List field

aida_teja · October 11, 2020, 4:51pm

Hi,

I’ve been struggling to consume JSON market research data using Dremio. A few fields in this data are lists of strings (e.g. favorite_brands), and I’m trying to calculate the average age by the favorite_brands column, i.e. something like:
select favorite_brands, avg(age) from my_table group by favorite_brands

What I’ve already tried:

1. select FLATTEN(favorite_brands), avg(age) from my_table group by favorite_brands, which didn’t work (Err: Dremio doesn’t support using FLATTEN in the GROUP BY clause.)
2. UNNEST favorite_brands to create one new record per list value. This approach has a few downsides:
a. Dremio removes the records where the favorite_brands list is empty, resulting in data loss.
b. average/count/sum calculations on any of the measures are no longer valid because of the duplicate records.
c. data redundancy is exponential when there are 5-10 list fields in my data.
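
For reference, here is a minimal sketch of how the per-brand average from the first attempt might be written so that FLATTEN stays out of the GROUP BY clause: flatten in a subquery, then group on the resulting alias. The names brand, avg_age and flattened are made up for illustration, and I have not verified this against a specific Dremio version.

```sql
-- Sketch only: "brand", "avg_age" and "flattened" are illustrative aliases.
-- FLATTEN runs in the inner SELECT; the outer query groups on its alias,
-- so FLATTEN itself never appears in the GROUP BY clause.
SELECT brand, AVG(age) AS avg_age
FROM (
  SELECT FLATTEN(favorite_brands) AS brand, age
  FROM my_table
) AS flattened
GROUP BY brand
```

Assuming a list never repeats the same brand, each record contributes its age once per brand it lists, which is what a per-brand average needs. This does not address downside (a), since records with an empty favorite_brands list are still dropped, and any total across all records would still have to be computed on the unflattened table.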

This scenario applies whether the source is a nested JSON file, MongoDB, or Elasticsearch. Is there an alternative way to calculate the same thing?

Thanks,
Teja
