Read Iceberg table from S3

Hi, I have a Dremio installation running with an S3 data lake. It contains an Iceberg dataset created by Nessie:

s3://<my-bucket>/database/messages

Inside it I have metadata and data folders:

s3://<my-bucket>/database/messages/data/<file>.parquet

s3://<my-bucket>/database/messages/metadata/<meta>.json
s3://<my-bucket>/database/messages/metadata/<meta>.avro

When I click the Dremio button to format the folder as a table, it does recognize the Iceberg format, but when I click Save, it shows:

Failed to get iceberg metadata.

The only reference to this error I could find is here: Dremio
But it seems to be related to existing tables, not to ones being promoted.

I also tried enabling the support flags described here: Dremio
But no luck either!

I tried changing many things, including adding S3 connection properties (iceberg.catalog_type = nessie and iceberg.namespace = ltm_db), but it still fails to load the table with the same error.

It seems that Dremio is trying to fetch the table using the whole S3 path as the table name, and that is what fails.
See error message below

Does anyone have an idea how to make this work?

2022-03-14 11:15:24,769 [grpc-default-executor-49] ERROR c.d.s.nessie.ContentsApiService - GetContents failed with a NessieNotFoundException.
org.projectnessie.error.NessieContentsNotFoundException: Could not find contents for key 'ltm_db./dummy-data-lake-bucket/datalake_dev/ltm_db' in reference 'main'.
at org.projectnessie.services.impl.ContentsApiImpl.getContents(ContentsApiImpl.java:61)
at com.dremio.service.nessie.ContentsApiService.getContents(ContentsApiService.java:47)
at com.dremio.service.nessieapi.ContentsApiGrpc$MethodHandlers.invoke(ContentsApiGrpc.java:232)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.opentracing.contrib.grpc.TracingServerInterceptor$2.onHalfClose(TracingServerInterceptor.java:231)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
2022-03-14 11:15:24,770 [qtp1612894078-150] DEBUG c.d.e.s.i.n.IcebergNessieTableOperations - Metadata location was not found for table: ltm_db./dummy-data-lake-bucket/datalake_dev/ltm_db
127.0.0.1 - - [14/Mar/2022:11:15:24 +0000] "PUT /apiv2/source/poc/folder_format/dummy-data-lake-bucket/datalake_dev/ltm_db HTTP/1.1" 400 181 "http://localhost:58803/source/poc/folder/dummy-data-lake-bucket/datalake_dev/ltm_db" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
2022-03-14 11:15:25,397 [scheduler-17] DEBUG c.d.e.s.options.SystemOptionManager - Background fetch system option from kv store started.
2022-03-14 11:15:25,397 [scheduler-17] DEBUG c.d.e.s.options.SystemOptionManager - Up to now, there are 353682 cache calls and 1406 kv store call for system options
10.5.176.46 - - [14/Mar/2022:11:15:26 +0000] "GET / HTTP/1.1" 200 2591 "-" "kube-probe/1.21+"
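To illustrate the symptom in the log above, here is a rough sketch, assuming Nessie content keys are dot-separated namespace and table-name elements (the key string is copied verbatim from the error; the "expected" table name is hypothetical):

```python
# Nessie content keys are dot-separated, e.g. "ltm_db.messages" for table
# "messages" in namespace "ltm_db". The key in the error log instead carries
# the whole S3 folder path as the table-name element:
observed_key = "ltm_db./dummy-data-lake-bucket/datalake_dev/ltm_db"

namespace, table_name = observed_key.split(".", 1)
print(namespace)   # the configured iceberg.namespace: "ltm_db"
print(table_name)  # an S3 path, not a table name: "/dummy-data-lake-bucket/datalake_dev/ltm_db"

# A well-formed key would look like (table name is a guess for illustration):
expected_key = "ltm_db.messages"
```

No table is registered in Nessie under that path-shaped key, which is why getContents returns NessieContentsNotFoundException.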

@joao Are the tables already promoted? If yes, can you unpromote (remove formatting) and promote again to see if the issue goes away?

Hi @balaji.ramaswamy

No, the tables can't be promoted in the first place because of the issue above.

I have the same problem. I can't promote an Iceberg table from S3 (local MinIO). The table was created by Trino.

@zsvoboda The Iceberg table is stored in MinIO; do you know which tool and version created it? For example, Spark 3.2?

In our experience, there is a bug in Dremio when reading Iceberg from an S3 data lake. The only solution that worked for us was using a Hive metastore…

There's an Iceberg catalog associated with every Iceberg table. This is described here: Apache Iceberg: An Architectural Look Under the Covers | Dremio

You can get the error "Failed to get iceberg metadata" when promoting an Iceberg table if Dremio can't determine the "current metadata pointer".

So, for example, if you create an Iceberg table using Nessie and point Dremio directly at the physical table location, there will be no "version-hint.text" file to indicate which metadata file is current.
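For filesystem ("Hadoop") tables, the current metadata pointer is a small version-hint.text file sitting next to the vN.metadata.json files. A minimal sketch of that resolution, for illustration only (real readers should use the Iceberg library; the directory layout assumed here is the standard Hadoop-table naming):

```python
import os

def current_metadata(metadata_dir):
    """Resolve the current metadata file of a filesystem (Hadoop) Iceberg table.

    Reads version-hint.text if present. Without it there is no reliable way
    to pick the current metadata file from the directory alone -- you need a
    catalog, which is exactly why promoting a Nessie-created table by its
    bare S3 path fails.
    """
    hint = os.path.join(metadata_dir, "version-hint.text")
    if os.path.exists(hint):
        with open(hint) as f:
            version = int(f.read().strip())
        return os.path.join(metadata_dir, f"v{version}.metadata.json")
    raise FileNotFoundError(
        "no version-hint.text: current metadata pointer must come from a catalog"
    )
```

A Nessie-created table stores that pointer in the Nessie catalog instead of in a file, so pointing Dremio at the folder bypasses the only place the pointer lives.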

@Benny_Chow Is there any documentation that provides a minimal example of how to set up a basic Iceberg table for read/write from Dremio? I am struggling with the same issue.

Hi @mg66. You might want to look at Getting Started with Apache Iceberg Using AWS Glue and Dremio | Dremio to get started. With Dremio v22 (coming soon) you will be able to do DML operations.

Thanks for the link @Dipankar_Mazumdar. Is there any documentation for a catalog that can be used outside of AWS?
