I was doing a test run to read an Airbyte-written S3 parquet file using Dremio. For that, I put a sample .parquet file in my S3 source folder and ran an Airbyte sync to another folder. Then I tried to read both the source file and the copied file from S3 using Dremio.
The source file is readable, but the Airbyte-copied file is not. From the Dremio logs I got this error:
Unable to coerce from the file’s data type “timestamp” to the column’s data type “int64” in table “2022_07_11_1657578941501_0.parquet”, column “_ab_source_file_last_modified.member0”
Why is the parquet file created by Airbyte not readable?
I tried to read the copied file from a Python notebook and it is readable there.
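This is roughly how I got the schema details from pyarrow.parquet (a minimal sketch, assuming a local copy of the file):

```python
import pyarrow.parquet as pq

# Read only the footer metadata; no row data is loaded.
schema = pq.read_schema("2022_07_11_1657578941501_0.parquet")
print(schema)
```

It printed the following: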
_airbyte_ab_id: string not null
_airbyte_emitted_at: timestamp[ms, tz=UTC] not null
extra: double
mta_tax: double
VendorID: int32
ehail_fee: int32
trip_type: double
RatecodeID: double
tip_amount: double
fare_amount: double
DOLocationID: int32
PULocationID: int32
payment_type: double
tolls_amount: double
total_amount: double
trip_distance: double
passenger_count: double
store_and_fwd_flag: string
_ab_source_file_url: string
congestion_surcharge: double
lpep_pickup_datetime: string
improvement_surcharge: double
lpep_dropoff_datetime: string
_ab_source_file_last_modified: struct<member0: timestamp[us, tz=UTC], member1: string>
  child 0, member0: timestamp[us, tz=UTC]
  child 1, member1: string
_airbyte_additional_properties: map<string, string ('_airbyte_additional_properties')>
  child 0, _airbyte_additional_properties: struct<key: string not null, value: string not null> not null
      child 0, key: string not null
      child 1, value: string not null
-- schema metadata --
parquet.avro.schema: '{"type":"record","name":"s3_taxi_data","fields":[{"' + 1750
writer.model.name: 'avro'
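For what it's worth, the column Dremio complains about also reads fine with pyarrow; the member0/member1 children appear to come from an Avro union in the writer schema (note writer.model.name: 'avro' in the metadata above). A minimal sketch of inspecting just that column, again assuming a local copy of the file:

```python
import pyarrow.parquet as pq

# Pull only the column from the Dremio error message.
table = pq.read_table(
    "2022_07_11_1657578941501_0.parquet",
    columns=["_ab_source_file_last_modified"],
)
col = table.column("_ab_source_file_last_modified")

# pyarrow reports struct<member0: timestamp[us, tz=UTC], member1: string>
print(col.type)
# The first few values decode without any coercion error.
print(col.slice(0, 3).to_pylist())
```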
2022_07_11_1657578941501_0.parquet.zip (2.9 MB)
Dremio version
Build : 22.0.0-202206221430090603-1fa4049f
Edition : Community Edition
I've attached the file. Any help is appreciated.