NFL data in Parquet shows “unable to coerce” error

As part of the Dremio 2025 Football Playoff Hackathon, after loading the NFL datasets and trying to query them in Dremio, I get the following error:

select * from "games.parq"

unable to coerce from the file’s data type “int64” to the column’s data type “timestamp” in table “lake.games.parq”, column “gameDatetime” and file “/lake/games.parq”

Any help is appreciated.

Will try to replicate this on Monday and report back on findings.

games_fixed.zip (3.7 KB)
Try the attached file.

I fixed the “gameDatetime” column.

Thanks for the help. This is happening for all the Parquet files in the dataset. Could you please do the same for all five datasets?

@AlexMerced May I use Streamlit for visualization?

Yes, you can use anything you want as long as the data is pulled from Dremio. There are some posts on using Streamlit with Dremio on the Dremio blog.

I’m getting the same error. I tried casting the column with no success; I’ll try updating the Parquet files.

Pandas is able to read these files without any problems. It seems the Dremio engine is the one having the problem, and I guess the dataset was never validated before the event was set up. The event deadline should be extended to account for this.

To clarify:

  • The error occurs because the data files store timestamps with nanosecond precision, which Dremio doesn’t support
  • To fix this, use pandas or polars to convert the timestamps to millisecond precision
  • This is fine, since you are free to transform the data however you like to complete the exercise

I’ll make these kinds of limitations, and ways to work around them, clearer in future hackathons. I’ll discuss the deadline with the team.

The data only covers 2022.

Is the task to analyze the data and make predictions for 2025?
Isn’t this dataset too small for that?

Are we allowed to use additional datasets specific to 2024-25?

You can use more recent data if you’d like. This is just one possible dataset.

I created a repo with the updated datasets, along with the script used to update them, in case other datasets you find also have nanosecond timestamps.

Keep an eye out for an email about the deadline.

I was able to fix the nanosecond timestamps in the datasets.

The only confirmation I need is whether I can use more recent data too.
Will the deadline be moved? :stuck_out_tongue_winking_eye: