unable to coerce from the file’s data type “int64” to the column’s data type “timestamp” in table “lake.games.parq”, column “gameDatetime” and file “/lake/games.parq”
Pandas reads these files without any problem, so it seems the Dremio engine is the one having trouble. I guess the setup data was never validated before the event was set up. The event deadline should be extended to account for this.
The error occurs because the data files store timestamps with nanosecond precision, which Dremio doesn’t support.
To fix this, use pandas or Polars to convert the timestamps to millisecond precision.
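Here is a minimal pandas sketch of that conversion. It assumes pandas with the pyarrow engine installed. The function names `downcast_timestamps_to_ms` and `write_ms_parquet` are hypothetical, and this is not the script from the repo. It floors every datetime column to whole milliseconds, then asks pyarrow to store timestamps at millisecond precision when writing the Parquet file:

```python
import pandas as pd


def downcast_timestamps_to_ms(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every datetime column floored to whole milliseconds."""
    out = df.copy()
    # "datetime" matches naive datetime64 columns, "datetimetz" matches tz-aware ones.
    for col in out.select_dtypes(include=["datetime", "datetimetz"]).columns:
        # Drop the sub-millisecond (micro/nanosecond) part of each value.
        out[col] = out[col].dt.floor("ms")
    return out


def write_ms_parquet(df: pd.DataFrame, path: str) -> None:
    """Write df to Parquet with timestamps stored at millisecond precision."""
    # With engine="pyarrow", extra keyword arguments are forwarded to
    # pyarrow.parquet.write_table; coerce_timestamps="ms" sets the stored unit.
    downcast_timestamps_to_ms(df).to_parquet(path, engine="pyarrow", coerce_timestamps="ms")
```

For example, `write_ms_parquet(pd.read_parquet("/lake/games.parq"), "games_ms.parq")` should write a copy whose `gameDatetime` column is stored in milliseconds. The output filename is made up for illustration. Flooring first means pyarrow never has to discard sub-millisecond digits by itself, so the write shouldn't fail with a truncation error.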
That’s fine, since you’re free to work with the data however you like to complete the exercise.
I’ll make sure these kinds of limitations, and ways to work around them, are made clearer in future hackathons. I’ll discuss the deadline with the team.
You can use more recent data if you’d like; this is just one possible dataset.
I’ve created a repo with the updated datasets, along with the script used to update them, in case other datasets you find also have nanosecond timestamps.