I am currently using Dremio 24.2 on AKS.
In our deployment, we use Azure Storage as the main backend for storing files.
We have a requirement to be able to roll back transactions, so we added Nessie to provide transactional operations across tables.
Since Nessie only supports S3, we configured S3Proxy to expose Azure Storage through an S3-compatible interface, so the communication flow is Dremio → Nessie → S3Proxy → Azure.
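For context, the S3Proxy side is configured roughly along these lines (a sketch with placeholder values, not our actual settings), using the jclouds azureblob provider to back the S3 API with the storage account:

    # s3proxy.conf - illustrative sketch, placeholder credentials
    s3proxy.endpoint=http://0.0.0.0:8080
    s3proxy.authorization=aws-v2-or-v4
    s3proxy.identity=local-access-key
    s3proxy.credential=local-secret-key
    jclouds.provider=azureblob
    jclouds.identity=<storage-account-name>
    jclouds.credential=<storage-account-key>

On the Dremio side, the source is pointed at the proxy endpoint with path-style access enabled (fs.s3a.endpoint and fs.s3a.path.style.access=true as connection properties).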
All of these steps seem to work for read operations, but we are encountering an error on write operations:
com.dremio.common.exceptions.UserException: org.apache.iceberg.exceptions.RuntimeIOException: Failed to write manifest list file
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:926)
at com.dremio.common.exceptions.UserException$Builder.buildSilently(UserException.java:993)
at com.dremio.exec.planner.sql.handlers.direct.CreateEmptyTableHandler.callCatalogCreateEmptyTableWithCleanup(CreateEmptyTableHandler.java:129)
at com.dremio.exec.planner.sql.handlers.direct.CreateEmptyTableHandler.createEmptyTable(CreateEmptyTableHandler.java:219)
at com.dremio.exec.planner.sql.handlers.direct.CreateEmptyTableHandler.toResult(CreateEmptyTableHandler.java:97)
at com.dremio.exec.planner.sql.handlers.commands.DirectWriterCommand.plan(DirectWriterCommand.java:100)
at com.dremio.exec.work.foreman.AttemptManager.plan(AttemptManager.java:571)
at com.dremio.exec.work.foreman.AttemptManager.lambda$run$4(AttemptManager.java:462)
at com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3(ReleasableBoundCommandPool.java:140)
at com.dremio.service.commandpool.CommandWrapper.run(CommandWrapper.java:70)
at com.dremio.context.RequestContext.run(RequestContext.java:109)
at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$4(ContextMigratingExecutorService.java:226)
at com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run(ContextMigratingExecutorService.java:206)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: org.apache.iceberg.exceptions.RuntimeIOException: Failed to write manifest list file
at com.dremio.exec.store.iceberg.model.IcebergTableCreationCommitter.commit(IcebergTableCreationCommitter.java:93)
and further down:
Caused by: java.io.IOException: regular upload failed: java.lang.IllegalArgumentException: Input is expected to be encoded in multiple of 2 bytes but found: 17
at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:346)
at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.putObject(S3ABlockOutputStream.java:577)
at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:392)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at com.dremio.exec.hadoop.FSDataOutputStreamWrapper.close(FSDataOutputStreamWrapper.java:85)
at com.dremio.io.FilterFSOutputStream.close(FilterFSOutputStream.java:101)
at com.dremio.exec.store.dfs.LoggedFileSystem$LoggedFSOutputStream.close(LoggedFileSystem.java:318)
at com.dremio.exec.store.iceberg.DremioOutputFile$PositionOutputStreamWrapper.close(DremioOutputFile.java:113)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:461)
at org.apache.iceberg.avro.AvroFileAppender.close(AvroFileAppender.java:94)
at org.apache.iceberg.ManifestListWriter.close(ManifestListWriter.java:65)
at org.apache.iceberg.SnapshotProducer.$closeResource(SnapshotProducer.java:315)
at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:315)
... 28 common frames omitted
The error "Input is expected to be encoded in multiple of 2 bytes but found: 17"
occurs because Azure Storage does not return an MD5 hash as the ETag (see here for more details). The solution would be to set the following option on the S3 client that Dremio uses internally:
System.setProperty("com.amazonaws.services.s3.disablePutObjectMD5Validation", "1");
in order to skip the MD5 checksum validation. Is there a way to do this?
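For example, we assume we could try passing it as a JVM option to the coordinator/executors via DREMIO_JAVA_SERVER_EXTRA_OPTS in dremio-env (or extraStartParams in the Helm chart), something like:

    # dremio-env - our assumption, not confirmed to reach the internal S3 client
    DREMIO_JAVA_SERVER_EXTRA_OPTS="-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true"

but we are not sure whether this property is honored by the S3A client Dremio uses for Iceberg writes, or whether there is a supported source property or support key we should use instead.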
Thanks for the support
Riccardo