TO_DATE produces bogus data

TO_DATE with a group by produces dates with 0019-10-27 however when you query for records that have this date they do not appear in the data set.

DT,EXPR$1
0019-11-19T00:00,36
0019-10-27T00:00,47
0019-11-14T00:00,25
0019-11-21T00:00,33
2019-11-11T00:00,4638613
2019-11-14T00:00,4280537
0019-12-01T00:00,38
2019-11-29T00:00,3484464
2019-11-26T00:00,6476143
0019-10-28T00:00,36
0019-10-26T00:00,40
0019-11-15T00:00,42
0019-11-13T00:00,30
2019-11-16T00:00,5663933
2019-10-26T00:00,4929759
0019-11-28T00:00,17
0019-11-25T00:00,51
0019-11-26T00:00,58
0019-11-20T00:00,30
2019-11-23T00:00,6208627
2019-11-28T00:00,1723201
0019-11-10T00:00,45
0019-11-12T00:00,12
0019-10-25T00:00,32

Also those records in the group by have a very small amount of data relative to the rest of the groupings.

@kprifogle Can you post some of the original dates, before they were incorrectly converted by TO_DATE? What is the base data type for the column you are casting to date? (string/long/?)

More

If it’s a timestamp, that includes a timezone, Gandiva doesn’t support those (from what I’ve read in the code). So the presence/absence of a timezone on your date could change the path through the code.

It doesn’t have a timestamp. Its just straight dremio Date time objects. You can guess what the values are before TO_DATE, it doesn’t have the 0019 instead it has 2019. It seems like just a small fraction of the records get cast to these invalid values in the group by.

I’ve done some group by’s in Dremio over MS SQL, but I’m pretty sure that is being pushed down through JDBC to the database and isn’t being run through Dremio.

I still think the way you are getting these dates is important. What format string are you using to convert them to a YYYY-MM-DDTHH:MM format? I tried this specific format string on your dates and Dremio wouldn’t parse them.

@kprifogle, can you share the query where you are seeing this?

Here is the query:

SELECT dt, COUNT(*) FROM (SELECT TO_DATE(time) as dt FROM data_table) GROUP BY dt ORDER BY dt

and here are some example data rows for time in the table:

2019-12-11 11:11:00.000
2019-12-11 11:11:00.000
2019-12-11 16:47:00.000

The data type of this column in dreimo is valid DateTime stamp

Outside of this query no data appears to have dates leading with 0019, so this behavior only appears as a result of the group by or order operation or TO_DATE operation (likely as a result of their combination?), ie the query itself produces this new date.