TO_DATE with a group by produces dates with 0019-10-27 however when you query for records that have this date they do not appear in the data set.
Also those records in the group by have a very small amount of data relative to the rest of the groupings.
@kprifogle Can you post some of the original dates, before they were incorrectly converted by
TO_DATE? What is the base data type for the column you are casting to date? (string/long/?)
If it’s a timestamp, that includes a timezone, Gandiva doesn’t support those (from what I’ve read in the code). So the presence/absence of a timezone on your date could change the path through the code.
It doesn’t have a timestamp. Its just straight dremio Date time objects. You can guess what the values are before TO_DATE, it doesn’t have the 0019 instead it has 2019. It seems like just a small fraction of the records get cast to these invalid values in the group by.
I’ve done some group by’s in Dremio over MS SQL, but I’m pretty sure that is being pushed down through JDBC to the database and isn’t being run through Dremio.
I still think the way you are getting these dates is important. What format string are you using to convert them to a
YYYY-MM-DDTHH:MM format? I tried this specific format string on your dates and Dremio wouldn’t parse them.
@kprifogle, can you share the query where you are seeing this?
Here is the query:
SELECT dt, COUNT(*) FROM (SELECT TO_DATE(time) as dt FROM data_table) GROUP BY dt ORDER BY dt
and here are some example data rows for time in the table:
The data type of this column in dreimo is valid DateTime stamp
Outside of this query no data appears to have dates leading with 0019, so this behavior only appears as a result of the group by or order operation or TO_DATE operation (likely as a result of their combination?), ie the query itself produces this new date.