I have big flat data file (about 10M rows). File consist of several attribute columns and several measure columns. I need fast query that give me distinct values of attributes. This query should give results similar to the query below:
SELECT DISTINCT ‘dims_1’ as attr, “dims_1” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_2’ as attr, “dims_2” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_3’ as attr, “dims_3” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
I’m going to use results of this query to build attribute filter in GUI.
I tried to use aggregate reflection for this purpose and then SELECT * FROM, but dremio claims that aggregated reflection not match for that query and use only raw reflection for file.
I query data source test1.PERFORMANCE_TEST_10M_ATTR_QUAERY to get all rows:
SELECT *
FROM test1.PERFORMANCE_TEST_10M_ATTR_QUERY_TO_CUBE
I use this data source to get all (attribute name, value) tuples.
Data source definition looks like this:
SELECT attr, val, 1 as cnt
FROM test1.PERFORMANCE_TEST_10M_ATTR_QUAERY
(i added artificial column cnt to have measure in aggregated reflection)
Parent data source (PERFORMANCE_TEST_10M_ATTR_QUAERY) is defined as follows:
SELECT DISTINCT ‘dims_1’ as attr, “dims_1” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_2’ as attr, “dims_2” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_3’ as attr, “dims_3” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_4’ as attr, “dims_4” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_5’ as attr, “dims_5” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_6’ as attr, “dims_6” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_7’ as attr, “dims_7” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_8’ as attr, “dims_8” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_9’ as attr, “dims_9” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dims_10’ as attr, “dims_10” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dim_big_1’ as attr, “dim_big_1” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
UNION ALL
SELECT DISTINCT ‘dim_big_2’ as attr, “dim_big_2” as val
FROM test1.PERFORMANCE_TEST_10M_conv_big_attrs
PERFORMANCE_TEST_10M_conv_big_attrs refers to csv file.
Hi,
I wanted to have a list of “distinct” elements in column to know what can I use later in filters. I expected that “distinct” will check in some meta data in aggregations/accelerations.
I rewrote query with “group” by and “count(*)” and split aggregation to have single aggregation for single column. That helps a lot. Now it use accelerations and speed up a lot.