This is a fairly critical bug.
Using Dremio 26.0.0 OSS, when I create an Iceberg table on top of AWS Glue:
- Dremio doesn’t have a timestamp with timezone type, so the date column is just a Timestamp
- The Parquet files that get created record a standard timestamp column:
optional int64 field_id=6 created_at (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
- AWS Glue records a timestamp column (timestamptz is not supported)
- However… the Iceberg metadata JSON file uses a timestamptz column.
When I try to read the table using pyiceberg, it looks only at the JSON metadata and fails due to a type incompatibility between Timestamp and Timestamptz.
You need to make sure timestamp columns in Iceberg tables are properly recorded using the timestamp type.
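For anyone who wants to check their own tables, here is a minimal sketch of how to see which type the Iceberg metadata JSON records for timestamp columns. It uses only the standard library. The sample metadata is hand-written for illustration and is not the actual file from my table: the `id` column is made up, and only `created_at` with field id 6 mirrors the Parquet schema above. The helper name `find_timestamp_fields` is hypothetical.

```python
import json


def find_timestamp_fields(metadata: dict) -> dict:
    """Return {column name: Iceberg type} for top-level timestamp-like columns
    in the table's current schema."""
    current_id = metadata.get("current-schema-id", 0)
    # Format v2 metadata keeps a "schemas" list; v1 has a single "schema" object.
    schemas = metadata.get("schemas") or [metadata["schema"]]
    schema = next(s for s in schemas if s.get("schema-id", 0) == current_id)
    return {
        field["name"]: field["type"]
        for field in schema["fields"]
        # Nested types (struct, list, map) are dicts; only primitives are strings.
        if isinstance(field["type"], str) and field["type"].startswith("timestamp")
    }


# Hand-written sample shaped like an Iceberg metadata file (illustrative only).
sample = json.loads("""
{
  "format-version": 2,
  "current-schema-id": 0,
  "schemas": [
    {"type": "struct", "schema-id": 0, "fields": [
      {"id": 1, "name": "id", "required": false, "type": "long"},
      {"id": 6, "name": "created_at", "required": false, "type": "timestamptz"}
    ]}
  ]
}
""")

print(find_timestamp_fields(sample))  # {'created_at': 'timestamptz'}
```

If a column shows up as `timestamptz` here while the Parquet footer says `isAdjustedToUTC=false`, that is the mismatch described in this post.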
@sheinbergon Are you able to send the “CREATE TABLE” DDL from Glue?
@balaji.ramaswamy I’m not sure what you mean. I can provide the actual JSON manifest of the table, showing the type is indeed timestamptz even though Dremio only uses timestamp. Would that help?
@sheinbergon I would like to reproduce the issue locally, so I thought I would get the table DDL and create the table on my side.
Thank you for bringing this up and for sharing the details, @sheinbergon. You’re right - in Dremio OSS 26.0.0, Iceberg tables created with timestamp columns may be recorded in the metadata as timestamptz, even though Dremio only supports timestamp. As you noted, this can cause compatibility issues with external readers.
@Icaro_Seara Thank you for acknowledging this issue. Do you have any plans to fix this behavior? It’s a serious bug.
Also, is this bug present in Dremio Cloud?
Hi @sheinbergon, we actually released a fix for this issue in Dremio Software Enterprise Edition, Community Edition and OSS 26.0.5 on September 10.
That’s super, 10x for fixing this!
Hey folks, a pyiceberg user reported running into this issue in apache/iceberg-python Issue #2663, "Cannot promote timestamp to timestamptz" error when loading Dremio created table. I see that it’s been resolved in version 26.0.5, and the user also confirmed this in the thread.
I dug into the fix on the Dremio side and found this change, Release 26.0.5 · dremio/dremio-oss@799ccbd · GitHub
It looks like Dremio can potentially write the TIMESTAMPMILLI data type in Parquet with adjustToUtc=true. I don’t think this is in accordance with the Iceberg spec.
- Timestamp without timezone should always write Parquet with adjustToUtc=false
- Timestamp with timezone should always write Parquet with adjustToUtc=true
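To spell out the rule in the two bullets above, here is a rough sketch of the consistency check I have in mind between the Iceberg column type and the Parquet timestamp annotation. The names `EXPECTED_UTC_FLAG` and `is_consistent` are made up for illustration and are not from the Dremio or pyiceberg code; the flag corresponds to `isAdjustedToUTC` in the Parquet schema.

```python
# Iceberg primitive type -> value the Parquet isAdjustedToUTC flag should carry.
EXPECTED_UTC_FLAG = {
    "timestamp": False,    # timestamp without timezone
    "timestamptz": True,   # timestamp with timezone
}


def is_consistent(iceberg_type: str, parquet_is_adjusted_to_utc: bool) -> bool:
    """Check that a Parquet timestamp annotation matches the Iceberg column type."""
    if iceberg_type not in EXPECTED_UTC_FLAG:
        raise ValueError(f"not a timestamp type covered by this sketch: {iceberg_type}")
    return EXPECTED_UTC_FLAG[iceberg_type] == parquet_is_adjusted_to_utc


# The combination from the original report: metadata says timestamptz,
# while the Parquet file says isAdjustedToUTC=false.
print(is_consistent("timestamptz", False))  # False
```

The case I am worried about after the fix is the opposite one, `is_consistent("timestamp", True)`, which this check also flags as inconsistent.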
Wanted to follow up and double check with y’all. LMK if I’m missing something here.