Kubernetes deployment and connecting to an S3 data lake

Hello all,

Our team is evaluating Dremio and is currently deploying it in our Kubernetes cluster (on EKS). We have successfully set up the cluster using the following Helm chart:

However, when we try to connect to an S3 data source, the Dremio cluster fails to authenticate to S3. We have set up a new IAM role with the following policy attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutBucketTagging",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:DeleteObject",
                "s3:DeleteBucket",
                "s3:CreateBucket"
            ],
            "Resource": [
                "<s3 bucket names>"
            ]
        }
    ]
}
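One detail worth checking when filling in the Resource placeholder: bucket-level actions such as s3:ListBucket and s3:GetBucketLocation apply to the bucket ARN, while object-level actions such as s3:GetObject apply to ARNs of the form bucket/*. Below is a minimal Python sketch that builds a policy with the two kinds of resource split into separate statements. The function name build_s3_policy and the bucket name in the usage note are hypothetical and only for illustration.

```python
import json

# Actions that operate on the bucket itself (resource: arn:aws:s3:::bucket).
BUCKET_ACTIONS = [
    "s3:ListBucket",
    "s3:GetBucketLocation",
    "s3:PutBucketTagging",
    "s3:DeleteBucket",
    "s3:CreateBucket",
]

# Actions that operate on objects (resource: arn:aws:s3:::bucket/*).
OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
]


def build_s3_policy(bucket_names):
    """Return an IAM policy document (as a dict) for the given bucket names."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": BUCKET_ACTIONS, "Resource": bucket_arns},
            {"Effect": "Allow", "Action": OBJECT_ACTIONS, "Resource": object_arns},
        ],
    }


def policy_json(bucket_names):
    """Render the policy as indented JSON, ready to paste into the IAM console."""
    return json.dumps(build_s3_policy(bucket_names), indent=4)
```

For example, policy_json(["example-bucket"]) renders a document whose first statement targets arn:aws:s3:::example-bucket and whose second targets arn:aws:s3:::example-bucket/*.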

Within the S3 setup in the Dremio GUI, we have selected “EC2 Metadata” and entered the following for the IAM role (using the fully qualified ARN): arn:aws:iam::xxxxxxxxxxx:role/<role_name>

However, when we try this, we get the following error in the console:

Caused by: java.util.concurrent.ExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Credential Verification failed.

We suspect the issue is simply that the Dremio cluster isn’t authorized to assume the IAM role we created. However, we are struggling to identify the “right” way to grant that. Ideally, this would be something we can set in the Helm values.yaml (e.g. specify the IAM role to assume, similar to how EC2 deployments work), but we aren’t sure whether this is supported in the Kubernetes deployment or whether there is another recommended way.

Any guidance would be greatly appreciated.

@rajiv-wiser Could you send us the entire server.log?

Hello @balaji.ramaswamy , unfortunately it looks like the instance was taken down in favor of the AWS Marketplace version. However, I was able to gather the stack trace produced when the error occurred if that helps:

2021-07-14 04:34:30,131 [start-wiser-test-data-lakehouse] WARN  com.dremio.common.util.Retryer - Retry attempt 9 for the failure at com.dremio.plugins.s3.store.S3FileSystem:verifyCredentials:187, Error - Unable to parse Json String.
2021-07-14 04:34:31,463 [start-wiser-test-data-lakehouse] WARN  c.d.e.catalog.ManagedStoragePlugin - Error starting new source: wiser-test-data-lakehouse
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Credential Verification failed.
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
	at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3976)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4960)
	at com.dremio.exec.store.dfs.FileSystemPlugin.newFileSystem(FileSystemPlugin.java:360)
	at com.dremio.exec.store.dfs.FileSystemPlugin.createFS(FileSystemPlugin.java:348)
	at com.dremio.exec.store.dfs.FileSystemPlugin.createFS(FileSystemPlugin.java:344)
	at com.dremio.exec.store.dfs.FileSystemPlugin.createFS(FileSystemPlugin.java:335)
	at com.dremio.exec.store.dfs.FileSystemPlugin.start(FileSystemPlugin.java:686)
	at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$newStartSupplier$1(ManagedStoragePlugin.java:545)
	at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$nameSupplier$3(ManagedStoragePlugin.java:613)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Credential Verification failed.
	at com.dremio.plugins.s3.store.S3FileSystem.verifyCredentials(S3FileSystem.java:193)
	at com.dremio.plugins.s3.store.S3FileSystem.setup(S3FileSystem.java:173)
	at com.dremio.plugins.util.ContainerFileSystem.initialize(ContainerFileSystem.java:167)
	at com.dremio.exec.store.dfs.FileSystemPlugin$1.lambda$load$0(FileSystemPlugin.java:212)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at com.dremio.exec.store.dfs.FileSystemPlugin$1.load(FileSystemPlugin.java:217)
	at com.dremio.exec.store.dfs.FileSystemPlugin$1.load(FileSystemPlugin.java:194)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
	... 14 common frames omitted
Caused by: com.dremio.common.util.Retryer$OperationFailedAfterRetriesException: software.amazon.awssdk.core.exception.SdkClientException: Unable to parse Json String.
	at com.dremio.common.util.Retryer.call(Retryer.java:60)
	at com.dremio.plugins.s3.store.S3FileSystem.verifyCredentials(S3FileSystem.java:187)
	... 26 common frames omitted

If you still require the full server log, I can work with the team to stand the cluster back up as it was and recreate the issue.

Hi @rajiv-wiser
As far as I’m aware, Dremio doesn’t currently support IAM roles for Service Accounts in EKS.

So to make use of that IAM role, you’ll need to create a policy that allows your EKS nodes to assume the S3 role via their EC2 instance profile.
You’ll also need to make sure the IMDSv2 hop limit on your nodes is at least 2, so that the pod can reach the instance metadata service.
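To check the hop limit from inside the pod, one option is to repeat the IMDSv2 handshake by hand: a PUT to the token endpoint, then a GET with that token. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes Python is available in the container and that the default IMDS address 169.254.169.254 is in use.

```python
import urllib.request

# Default link-local address of the EC2 instance metadata service.
IMDS_BASE = "http://169.254.169.254"


def build_token_request(base_url=IMDS_BASE, ttl_seconds=60):
    """IMDSv2 step 1: a PUT request that asks for a short-lived session token."""
    return urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_role_request(token, base_url=IMDS_BASE):
    """IMDSv2 step 2: a GET request that lists the instance profile role name."""
    return urllib.request.Request(
        f"{base_url}/latest/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": token},
    )


def probe_imds(timeout=2.0):
    """Run both steps and return the role name reported by the metadata service.

    Raises urllib.error.URLError (or a timeout) if the service is unreachable.
    """
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_role_request(token), timeout=timeout) as resp:
        return resp.read().decode().strip()
```

Running probe_imds() from a Python shell inside the Dremio pod should return the worker node role name. If the token request times out, the hop limit is a likely suspect, since with a limit of 1 the PUT response typically does not make it back across the extra network hop to the container.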

ec2_role
Add this policy to your worker node EC2 role

{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Action": "sts:AssumeRole",
		"Resource": "arn:aws:iam::xxxxxxxxxxx:role/s3_role"
	}]
}

s3_role
You have already created this role

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutBucketTagging",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:DeleteObject",
                "s3:DeleteBucket",
                "s3:CreateBucket"
            ],
            "Resource": [
                "<s3 bucket names>"
            ]
        }
    ]
}

S3 Role - Trust Relationship
Update your existing S3 role to trust the EC2 worker node IAM role

{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Principal": {
			"AWS": "arn:aws:iam::xxxxxxxxxxx:role/ec2_role"
		},
		"Action": "sts:AssumeRole"
	}]
}
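Since the setup depends on two documents agreeing with each other (the node role's sts:AssumeRole permission and the S3 role's trust relationship), a quick sanity check can catch a mistyped ARN. Here is a simplified Python sketch. The function name is hypothetical, and it deliberately ignores wildcards, Condition blocks and account-root principals, so treat it as a typo check rather than a full IAM evaluation.

```python
def trust_allows_assume_role(trust_policy, principal_arn):
    """Return True if some Allow statement lets principal_arn call sts:AssumeRole.

    Simplified: exact string match on the principal ARN only.
    """
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue

        # "Action" may be a single string or a list of strings.
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue

        # "Principal" is normally a dict; its "AWS" entry may be a string or a list.
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue
        arns = principal.get("AWS", [])
        if isinstance(arns, str):
            arns = [arns]
        if principal_arn in arns:
            return True
    return False


# The trust relationship from this post, with the placeholder account id kept as-is.
EXAMPLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::xxxxxxxxxxx:role/ec2_role"},
        "Action": "sts:AssumeRole",
    }],
}
```

Calling trust_allows_assume_role(EXAMPLE_TRUST_POLICY, "arn:aws:iam::xxxxxxxxxxx:role/ec2_role") returns True, while any other role ARN returns False.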

This isn’t an ideal situation as it would mean that other pods scheduled to the node can also assume your S3 read/write role. But unfortunately, that’s the only option at the moment.
I’ve gotten around this concern by using dedicated worker groups for Dremio, which is useful anyway as the resource requirements are different from our other workloads.

Thank you for this. We’ll give it a try and follow up with what we experience.

Hello gpdenny,

I tried what you suggested:

  • Created an IAM role with permissions on an S3 bucket.
  • Modified the worker nodes’ IAM policy to allow sts:AssumeRole on the role I created.
  • Modified that role’s trust relationship to trust the worker nodes’ role.
  • Updated the hop limit (to 3) on the VMs where Dremio is installed. I also installed AWS CLI version 2 on those nodes.

But, I’m still getting errors with the connection to S3:

ERROR c.dremio.exec.catalog.PluginsManager - Exception while creating source.
com.dremio.common.exceptions.UserException: Could not connect to S3 source. Check your S3 data source settings and credentials.

Caused by: java.util.concurrent.ExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Credential Verification failed.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)

Caused by: com.dremio.common.util.Retryer$OperationFailedAfterRetriesException: software.amazon.awssdk.core.exception.SdkClientException: Unable to parse Json String.
at com.dremio.common.util.Retryer.call(Retryer.java:60)

Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
at [Source: (String)""; line: 1, column: 0]

ERROR c.d.exec.catalog.CatalogServiceImpl - Exception encountered: Could not connect to S3 source. Check your S3 data source settings and credentials.
com.dremio.common.exceptions.UserException: Could not connect to S3 source. Check your S3 data source settings and credentials.
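The innermost cause in the trace above shows the parser being handed an empty string (Source: (String)""), which suggests that whatever endpoint the AWS SDK queried for credentials returned an empty body rather than malformed JSON. As a rough illustration only, here is a Python sketch of a parser that separates the two cases. The function name is hypothetical, and the key names assume the response shape of the instance metadata security-credentials endpoint, which may not be the endpoint involved here.

```python
import json

# Keys expected in a temporary-credentials response (assumed shape).
REQUIRED_KEYS = ("AccessKeyId", "SecretAccessKey", "Token")


def parse_credentials(body):
    """Parse a credentials response, reporting an empty body explicitly."""
    if not body.strip():
        # Mirrors "No content to map due to end-of-input": nothing came back.
        raise ValueError("empty response body: the credentials endpoint returned no content")
    data = json.loads(body)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return data
```

An empty body points at connectivity or at the role lookup itself rather than at the S3 permissions, so the metadata endpoint is worth checking from inside the pod before revisiting the policies.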

Could someone take another look at this? Any other suggestions?

Thanks in advance,