Confluent Kafka Iceberg sink connector with local MinIO - solution

Hi, I am writing this in case someone is trying the Confluent Kafka Iceberg sink connector to write data into Iceberg and runs into the same issues I had. This is the solution that worked for me.

Configuration:
Source: PostgreSQL DB (or any other)
Target: Iceberg catalog in local MinIO storage, with the Hadoop catalog type.
This is my working connector config file:

{
  "iceberg.catalog.s3a.endpoint": "http://192.168.1.8:9000",
  "iceberg.catalog.s3.endpoint": "http://192.168.1.8:9000",
  "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "iceberg.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
  "iceberg.fs.defaultFS": "s3a://data2/",
  "iceberg.catalog.uri": "http://192.168.1.8:9000",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.endpoint": "http://192.168.1.8:9000",
  "iceberg.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "iceberg.catalog.warehouse": "s3a://data2/",
  "iceberg.catalog.type": "hadoop",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "fs.s3a.endpoint": "http://192.168.1.8:9000",
  "name": "icebergsink2",
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "transforms": [
    ""
  ],
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "topics": [
    "customer2"
  ],
  "iceberg.tables": [
    "customer2"
  ]
}
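
As a rough sketch of how to submit this config, you can use the Kafka Connect REST API. This assumes the Connect worker is reachable on localhost:8083 and the JSON above is saved as icebergsink2.json (both the port and the file name are assumptions, adjust to your setup):

# Register (or update) the connector via the Connect REST API.
# PUT .../connectors/<name>/config accepts the flat config shown above.
curl -X PUT -H "Content-Type: application/json" \
  --data @icebergsink2.json \
  http://localhost:8083/connectors/icebergsink2/config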

Confluent Kafka is running in a Docker container.
In order to run the connector successfully, two JAR files must be added to the Iceberg connector's lib directory, which is usually located under /usr/share/confluent_hub_components.
The two necessary JARs are aws-java-sdk-bundle-1.12.625.jar and hadoop-aws-3.3.6.jar.
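
As a sketch, the two JARs can be pulled from Maven Central into the connector's lib folder inside the Connect container. The exact folder name under /usr/share/confluent_hub_components depends on how the connector was installed, so treat the path below as an assumption:

# Assumed connector install path inside the Connect container; adjust to your install.
LIB_DIR=/usr/share/confluent_hub_components/tabular-iceberg-kafka-connect/lib

# Fetch the two JARs from Maven Central, then restart the Connect worker.
curl -L -o "$LIB_DIR/hadoop-aws-3.3.6.jar" \
  https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.6/hadoop-aws-3.3.6.jar
curl -L -o "$LIB_DIR/aws-java-sdk-bundle-1.12.625.jar" \
  https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.625/aws-java-sdk-bundle-1.12.625.jar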

After these settings, I still got errors (an AWS credentials error). Then I realized that although fs.s3a.endpoint in the settings above points to my local MinIO, the connector still tries to reach the global AWS endpoint.
The solution is to set the environment variable AWS_S3_ENDPOINT in Confluent. (This can be done in the docker-compose.yaml file, in the connect container section.)

AWS_S3_ENDPOINT=http://192.168.1.8:9000

These are the other environment variables that are needed:
AWS_SECRET_ACCESS_KEY=key_generated_in_minio
AWS_REGION=us-east-1
AWS_S3_ENDPOINT=http://192.168.1.8:9000
AWS_ACCESS_KEY_ID=keyid_generated_in_minio
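
Put together, the relevant part of docker-compose.yaml looks roughly like this. The service name "connect" and the placeholder credentials are assumptions, use the keys generated in your MinIO:

  # Sketch of the Connect service in docker-compose.yaml (service name assumed).
  connect:
    environment:
      AWS_ACCESS_KEY_ID: keyid_generated_in_minio
      AWS_SECRET_ACCESS_KEY: key_generated_in_minio
      AWS_REGION: us-east-1
      AWS_S3_ENDPOINT: "http://192.168.1.8:9000"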

With these settings the Iceberg sink runs and data is ingested into Iceberg. The tables can then be queried from Dremio.
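
For a quick check from Dremio, a query along these lines works once MinIO is added there as a source. The source name "minio" and the path are assumptions based on the data2 warehouse bucket above:

-- Assuming a Dremio source named "minio" pointing at the data2 bucket.
SELECT *
FROM minio.data2."customer2"
LIMIT 10;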


Thanks for the write-up @tolgaevren