Variant datatypes in Snowflake seem to cause problems with Dremio

I’m seeing some strange behavior with “limit 100” (or any “limit” additions, not just with “100”). Here is an example I hope makes it clear:

This query:

SELECT
cast(player_start_date as date) as player_start_date,
cast(eventDate as date) as eventDate,
datediff(cast(eventDate as date),cast(player_start_date as date)) as days_between
from "Snowflake".MYDB.RAW."EVENT_GAMETRANSACTION_RAW"
;

Submitted via “Preview”, Dremio returns with this error:

GandivaException: Not a valid date value "2023-07-04 00:00:00.000"

Looking in Snowflake, I see that Dremio sent this query (when run via “Preview”):

SELECT 
"EVENT_GAMETRANSACTION_RAW"."PLAYER_START_DATE", "EVENT_GAMETRANSACTION_RAW"."EVENTDATE"
FROM "MYDB"."RAW"."EVENT_GAMETRANSACTION_RAW"
LIMIT 2000

Running that query in directly in Snowflake, I get the following results:

player_start_date     eventdate
2023-06-27	"2023-07-04 00:00:00.000"
2023-06-22	"2023-07-04 00:00:00.000"
2023-06-07	"2023-07-04 00:00:00.000"
2023-06-27	"2023-07-04 00:00:00.000"

player_start_date is stored as a “Date” datatype in Snowflake
eventdate is stored as a variant datatype in Snowflake

Generally speaking, it seems the “variant” datatype causes problems for Dremio. It seems like I would need to be able to “force” Dremio to have Snowflake do the datatype conversion, rather than Dremio asking for the raw value from Snowflake. It seems as though Dremio doesn’t actually do the datatype conversion when the column is of a variant datatype.

How should I be approaching this kind of situation?

I don’t see “variant” as a supported datatype here: Data Types | Dremio Documentation

It seems this behavior is inconsistent. For some queries, I can see Dremio submits a query to Snowflake that casts the variant as the datatype. For others, I can see Dremio doesn’t ask Snowflake to cast the variant as the datatype I specified; in such cases, Dremio returns “null” values.

For cases where integers or floats are stored in a Snowflake variant field, I can generally get the value out as expected. However, varchars seem to be where the problems show up.

@capnjosh Can you please upload the profile, would like to see what was the pushdown

I’ve made some progress on this. It seems that when Dremio gets a datetime value stored in a Snowflake variant column, it gets back that value with double quotes on either side. So for this error:

GandivaException: Not a valid date value "2023-07-04 00:00:00.000"

The problem is that I need to manually remove those double quotes before I try to cast the value
as a date datatype.

Not all values stored in a Snowflake Variant datatype return to Dremio with literal double quotes, but as a general rule, we should expect to have to manually remove them before using the values in any subsequent query operations.

Also, I found that if I need to do some comparison operations, I need to first remove the double quotes in a subquery and then do the comparison operations in an outer query. Something like this (though I can’t tell if this would be subject to the whims of the query engine):

select
*,
case
  when platform_cleaned = 'Android' then 'Google Play'
  when ilike(platform_cleaned ,'X%') then 'Xbox'
  else platform_cleaned
end as platform_standardized
from
(
s  elect
  substr(platform, 2, length(platform) - 2) as platform_cleaned
  from "Snowflake".MYDB.RAW."EVENT_GAMETRANSACTION_RAW"
)

When I tried to remove the double quotes in-line like this, I got weird errors:

select
*,
case
  when substr(platform, 2, length(platform) - 2) = 'Android' then 'Google Play'
  when ilike(substr(platform, 2, length(platform) - 2),'X%') then 'Xbox'
  else substr(platform, 2, length(platform) - 2)
end as platform_standardized
from "Snowflake".MYDB.RAW."EVENT_GAMETRANSACTION_RAW"

@capnjosh What were the errors when you removed double quotes?

The errors go away after I manually remove the double-quotes that come from variant data types in Snowflake.