Confluent Kafka Iceberg sink connector with local MinIO - solution

Hi, I am writing this in case someone is trying the Confluent Kafka Iceberg sink connector to write data into Iceberg and runs into the same issues I had. This is the solution that worked for me.

Configuration:
Source: PostgreSQL DB (or any other)
Target: Iceberg catalog in local MinIO storage, with the Hadoop catalog type.
This is my working connector config file:

{
  "iceberg.catalog.s3a.endpoint": "http://192.168.1.8:9000",
  "iceberg.catalog.s3.endpoint": "http://192.168.1.8:9000",
  "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "iceberg.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
  "iceberg.fs.defaultFS": "s3a://data2/",
  "iceberg.catalog.uri": "http://192.168.1.8:9000",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.endpoint": "http://192.168.1.8:9000",
  "iceberg.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "iceberg.catalog.warehouse": "s3a://data2/",
  "iceberg.catalog.type": "hadoop",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "fs.s3a.endpoint": "http://192.168.1.8:9000",
  "name": "icebergsink2",
  "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
  "transforms": [
    ""
  ],
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "topics": [
    "customer2"
  ],
  "iceberg.tables": [
    "customer2"
  ]
}
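
As a rough sketch of how to submit this config, you can use the Kafka Connect REST API. This assumes the Connect worker is reachable on localhost:8083 and the JSON above is saved as icebergsink2.json (both the port and the file name are assumptions, adjust to your setup):

# Register (or update) the connector via the Connect REST API.
# PUT .../connectors/<name>/config accepts the flat config shown above.
curl -X PUT -H "Content-Type: application/json" \
  --data @icebergsink2.json \
  http://localhost:8083/connectors/icebergsink2/config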

Confluent Kafka is running in a Docker container.
In order to run the connector successfully, two JAR files must be added to the Iceberg connector's lib directory, which is usually located under /usr/share/confluent_hub_components.
The two necessary JARs are aws-java-sdk-bundle-1.12.625.jar and hadoop-aws-3.3.6.jar.
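
As a sketch, the two JARs can be pulled from Maven Central into the connector's lib folder inside the Connect container. The exact folder name under /usr/share/confluent_hub_components depends on how the connector was installed, so treat the path below as an assumption:

# Assumed connector install path inside the Connect container; adjust to your install.
LIB_DIR=/usr/share/confluent_hub_components/tabular-iceberg-kafka-connect/lib

# Fetch the two JARs from Maven Central, then restart the Connect worker.
curl -L -o "$LIB_DIR/hadoop-aws-3.3.6.jar" \
  https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.6/hadoop-aws-3.3.6.jar
curl -L -o "$LIB_DIR/aws-java-sdk-bundle-1.12.625.jar" \
  https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.625/aws-java-sdk-bundle-1.12.625.jar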

After these settings, I still got errors (an AWS credentials error). Then I realized that although fs.s3a.endpoint in the settings above points to my local MinIO, the connector still tries to reach the global AWS endpoint.
The solution is to set the environment variable AWS_S3_ENDPOINT in Confluent. (This can be done in the docker-compose.yaml file, in the connect container section.)

AWS_S3_ENDPOINT=http://192.168.1.8:9000

These are the other environment variables that are needed:
AWS_SECRET_ACCESS_KEY=key_generated_in_minio
AWS_REGION=us-east-1
AWS_S3_ENDPOINT=http://192.168.1.8:9000
AWS_ACCESS_KEY_ID=keyid_generated_in_minio
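
Put together, the relevant part of docker-compose.yaml looks roughly like this. The service name "connect" and the placeholder credentials are assumptions, use the keys generated in your MinIO:

  # Sketch of the Connect service in docker-compose.yaml (service name assumed).
  connect:
    environment:
      AWS_ACCESS_KEY_ID: keyid_generated_in_minio
      AWS_SECRET_ACCESS_KEY: key_generated_in_minio
      AWS_REGION: us-east-1
      AWS_S3_ENDPOINT: "http://192.168.1.8:9000"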

With these settings the Iceberg sink runs and data is ingested into Iceberg. The tables can then be queried from Dremio.
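
For a quick check from Dremio, a query along these lines works once MinIO is added there as a source. The source name "minio" and the path are assumptions based on the data2 warehouse bucket above:

-- Assuming a Dremio source named "minio" pointing at the data2 bucket.
SELECT *
FROM minio.data2."customer2"
LIMIT 10;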


Thanks for the write-up @tolgaevren