I’m currently integrate Dremio with a BI tool. One common task in our BI tool is to support subtotals for results.
Normally, if the target database support “grouping sets”, we will use it, like
select country,
city,
sum(sales) as sales,
grouping_id() as __grouping_id
from orders
group by
GROUPING SETS (
(country, city),
(country)
()
)
But “Dremio doesn’t support ROLLUP, CUBE or GROUPING SETS functionality”, this is fine, because many DB don’t support those OLAP operations.
In this case, we will generate SQL by using UNION ALL, like:
select country, city, sum(sales) as sales, 0 as __grouping_id
from orders
group by country, city
union all
select country, null as city, sum(sales) as sales, 1 as __grouping_id
from orders
group by country
But it will encounter “We do not support NULL as a return data type in SELECT. Cast it to a different type.”. Yes, I know Arrow format has strict NullType, and if that query is generated by hand, I can explicitly cast that NULL field to exact type, like StringType.
But, for the BI tool, we may not know what is that exact Field Type without first run some SQL against database (that is resource consuming).
So, in this situation, since we are in UNION ALL, and in some sub-clause, we can know that field’s data type. I wonder, whether Dremio can do some optimization, for this UNION ALL case, to auto deduce the missing Type for NULL in some subclause.
Thanks