Using Dremio 24.3 CE with a Ceph 17.2.7 cluster for distributed storage: Service: Amazon S3; Status Code: 400; Error Code: XAmzContentSHA256Mismatch

I am using Dremio 24.3 CE with a Ceph 17.2.7 cluster for distributed storage. When I start the Dremio cluster, the error below is reported and the service exits. I’m sure the S3 configuration is correct.

org.apache.hadoop.fs.s3a.AWSBadRequestException: PUT 0-byte object  on dist/uploads: com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 400; Error Code: XAmzContentSHA256Mismatch; Request ID: tx0000068258129cbf14590-00658a7371-76ebb-xibahe; S3 Extended Request ID: 76ebb-xibahe-sjzx; Proxy: null), S3 Extended Request ID: 76ebb-xibahe-sjzx:XAmzContentSHA256Mismatch: null (Service: Amazon S3; Status Code: 400; Error Code: XAmzContentSHA256Mismatch; Request ID: tx0000068258129cbf14590-00658a7371-76ebb-xibahe; S3 Extended Request ID: 76ebb-xibahe-sjzx; Proxy: null)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:119)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:322)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
        at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:318)
        at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:293)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:4536)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.access$1900(S3AFileSystem.java:260)
        at org.apache.hadoop.fs.s3a.S3AFileSystem$MkdirOperationCallbacksImpl.createFakeDirectory(S3AFileSystem.java:3465)
        at org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:121)
        at org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:45)
        at org.apache.hadoop.fs.s3a.impl.ExecutingStoreOperation.apply(ExecutingStoreOperation.java:76)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2341)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2360)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:3432)
        at com.dremio.plugins.util.ContainerFileSystem.mkdirs(ContainerFileSystem.java:476)
        at com.dremio.exec.hadoop.HadoopFileSystem.mkdirs(HadoopFileSystem.java:291)
        at com.dremio.io.file.FilterFileSystem.mkdirs(FilterFileSystem.java:92)
        at com.dremio.exec.store.dfs.LoggedFileSystem.mkdirs(LoggedFileSystem.java:127)
        at com.dremio.dac.homefiles.HomeFileSystemStoragePlugin.start(HomeFileSystemStoragePlugin.java:106)
        at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$newStartSupplier$2(ManagedStoragePlugin.java:623)
        at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$nameSupplier$4(ManagedStoragePlugin.java:694)
        at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$refreshState$8(ManagedStoragePlugin.java:1080)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

@as10128

The extended request ID would usually tell you what the error means, but only AWS support has the error descriptions. Can you check whether you can write something from the Dremio coordinator and executor to the bucket defined in the dist:/// location?

Also, I want to check whether the syntax in dremio.conf is right; are you able to send that? You also need a core-site.xml with the entries in the document below; can you please make sure that nothing is missed?
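
For reference, a minimal sketch of what the distributed-storage entry in dremio.conf typically looks like for an S3-compatible store; the bucket and folder names below are placeholders, not values from this thread:

paths: {
  # Hypothetical bucket and folder; replace with your own.
  dist: "dremio+s3a://bucket1/dremio"
}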

@as10128 From the Dremio executor and coordinator, are you able to read/write into that bucket?
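
One quick way to verify, assuming s3cmd is configured with the same credentials on those hosts (s3://bucket1 is a placeholder for the bucket in your dist:/// path):

# Run on the coordinator and on each executor.
echo test > /tmp/dremio-write-test.txt
s3cmd put /tmp/dremio-write-test.txt s3://bucket1/dremio-write-test.txt
s3cmd get s3://bucket1/dremio-write-test.txt /tmp/dremio-read-test.txt
s3cmd del s3://bucket1/dremio-write-test.txt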

I am deleting your last update as it contains your access and secret keys.

@balaji.ramaswamy Yes, it can be accessed.

@as10128 Sorry, one last thing. Are you able to send the bucket policies?

@balaji.ramaswamy
I use Ceph Object Storage. I don’t have bucket policies configured; the user has full permissions by default. I used the s3cmd tool to look at the policy:

s3cmd info s3://bucket1

s3://bucket1/ (bucket):
   Location:  sjzx
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    none
   CORS:      none
   ACL:       superuser: FULL_CONTROL

@as10128 Is Ceph object storage considered an S3-compatible storage?

@balaji.ramaswamy
Yes. It provides object storage functionality with an interface that is compatible with a large subset of the Amazon S3 RESTful API.
https://docs.ceph.com/en/latest/radosgw/s3/

@as10128 Can you please try adding the below and see if the connection works:

<property>
    <name>dremio.s3.compat</name>
    <value>true</value>
</property>
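
For context, a sketch of the core-site.xml this property usually sits in, following the “Configuring Dremio for Minio” pattern for S3-compatible stores; the endpoint and credential values here are placeholders, not values from this thread:

<configuration>
    <property>
        <name>fs.dremioS3.impl</name>
        <value>com.dremio.plugins.s3.store.S3FileSystem</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>ACCESS_KEY_PLACEHOLDER</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>SECRET_KEY_PLACEHOLDER</value>
    </property>
    <property>
        <!-- Hypothetical RGW endpoint; replace with your own. -->
        <name>fs.s3a.endpoint</name>
        <value>rgw.example.com:8080</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
    <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
    </property>
</configuration>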

@balaji.ramaswamy I followed the “Configuring Dremio for Minio” documentation, and I have already added this parameter.

@as10128 One last test, if possible: does connecting to Minio or regular S3 work? This is just to narrow down whether this is specific to CEPH.

Also, any luck getting the error description from AWS for that extended request ID?

@balaji.ramaswamy Minio works normally, but I don’t have a Minio cluster, just a test environment.

@as10128 Minio is also considered S3 compatible. Are you able to connect to any S3 other than CEPH? If you do not have one, that is totally fine; I was just checking.

I had the same error in my cluster. It seemed to be a Ceph error, not a Dremio failure.
My guess was that it corresponds to bug 2270529 – [rgw][s3-tests]: test_object_anon_put_write_access failed with "An error occurred (XAmzContentSHA256Mismatch) when calling the PutObject operation".

=> My fix was to update Ceph to version 19.1.1 in the Kubernetes cluster. Then the “XAmzContentSHA256Mismatch” error was gone and the connection worked correctly.
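
For anyone checking their own deployment, a rough sketch of confirming the running RGW version and starting an upgrade; the ceph orch commands assume a cephadm-managed cluster, while Rook-managed Kubernetes clusters upgrade by changing the image tag in the CephCluster resource instead:

# Show which version each daemon, including radosgw, is running.
ceph versions

# On a cephadm-managed cluster, upgrade to the release with the fix.
ceph orch upgrade start --ceph-version 19.1.1
ceph orch upgrade status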

Thanks for the update, @schwarztrinker. This will help if someone wants to connect via CEPH.