UNION ALL with NULL support to make BI integration easier

I’m currently integrate Dremio with a BI tool. One common task in our BI tool is to support subtotals for results.

Normally, if the target database support “grouping sets”, we will use it, like

select country,
city,
sum(sales) as sales,
grouping_id() as __grouping_id
from orders
group by
GROUPING SETS (
(country, city),
(country)
()
)

But “Dremio doesn’t support ROLLUP, CUBE or GROUPING SETS functionality”, this is fine, because many DB don’t support those OLAP operations.

In this case, we will generate SQL by using UNION ALL, like:

select country, city, sum(sales) as sales, 0 as __grouping_id
from orders
group by country, city
union all
select country, null as city, sum(sales) as sales, 1 as __grouping_id
from orders
group by country

But it will encounter “We do not support NULL as a return data type in SELECT. Cast it to a different type.”. Yes, I know Arrow format has strict NullType, and if that query is generated by hand, I can explicitly cast that NULL field to exact type, like StringType.

But, for the BI tool, we may not know what is that exact Field Type without first run some SQL against database (that is resource consuming).

So, in this situation, since we are in UNION ALL, and in some sub-clause, we can know that field’s data type. I wonder, whether Dremio can do some optimization, for this UNION ALL case, to auto deduce the missing Type for NULL in some subclause.

Thanks

@Baoqi Can you please share the profile? Want to check if the error is happening in planning or execution?

Sure, the profile is attached. 6684a3ea-f0c0-4c10-9253-449045100aab.zip (6.5 KB)

@Baoqi

Try this instead

select cast(null as timestamp) as “year__0”

Yes, the explicitly cast works for this case. But since the SQL is auto generated by software, it is hard to know the correct type without first run it. Especially when that field itself is a complex calculation.

@Baoqi
Currently using null without casting to a data type is not supported

@Baoqi we probably aren’t going to add the feature you’re asking for here any time soon, but the good news we are actually working on adding GROUPING SETS, and it will be available in the near future.

Also, in the mean time, a quick hack you can do to coerce the type for the null is to wrap it in a case statement:

select country, case when 1=1 then null else city end as city,

Hi, Steven, Thanks a lot for you suggestion. Glad to know Dremio will support GROUPING SETS, it is very helpful.

And thanks a lot for the smart “case when 1=1 then null else city end” solution, it works great! I will first use this way to generate SQL, Thanks!