AWS S3 access using EC2 role instead of Access Keys

Hi,

I’m looking at Dremio in an AWS environment. At the moment it seems the only authentication option for S3 sources is an Access Key and Secret. We’re not able to use access keys due to security considerations. Is it possible to configure Dremio to use an IAM Role attached to the cluster EC2 instances for permissions, as described here?
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html
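As background: a role attached to an EC2 instance surfaces temporary credentials through the instance metadata service, which is what SDK credential chains read. A quick way to check a role is visible from the instance itself (the `imds_creds_url` helper and the `my-dremio-role` role name are hypothetical; the IMDSv1 path shown is only resolvable from inside EC2):

```shell
# Hypothetical helper: prints the instance-metadata URL for a named role's
# temporary credentials. The link-local 169.254.169.254 endpoint only
# answers from inside an EC2 instance.
imds_creds_url() {
  echo "http://169.254.169.254/latest/meta-data/iam/security-credentials/$1"
}

# From the instance you could then fetch the temporary credentials with, e.g.:
#   curl -s "$(imds_creds_url my-dremio-role)"
imds_creds_url my-dremio-role
```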

thanks, Nathan


Hey @Nathan_Griffiths this is on our radar. Haven’t gotten to it yet, we’ll reach out once available.


Thanks Can, I’ll wait to hear back once this configuration is supported. Cheers.

Could you just change the anonymous client in S3FileSystem.java to return new AmazonS3Client()? That way the SDK will attempt to build the credentials from the EC2 instance metadata. Then follow the instructions to rebuild and start up the server.

Just a follow-up: I was able to get this to work by changing AmazonS3Client() here to AmazonS3Client(clientConf).

Then, updating this s3a configuration to:

final List<Property> finalProperties = new ArrayList<>();
finalProperties.add(new Property(FileSystem.FS_DEFAULT_NAME_KEY, "dremioS3:///"));
finalProperties.add(new Property("fs.dremioS3.impl", S3FileSystem.class.getName()));
finalProperties.add(new Property(MAXIMUM_CONNECTIONS, String.valueOf(DEFAULT_MAX_CONNECTIONS)));
finalProperties.add(new Property("fs.s3a.fast.upload", "true"));
finalProperties.add(new Property("fs.s3a.fast.upload.buffer", "disk"));
finalProperties.add(new Property("fs.s3a.fast.upload.active.blocks", "4")); // 256mb (so a single parquet file should be able to flush at once).
finalProperties.add(new Property("fs.s3a.threads.max", "24"));
finalProperties.add(new Property("fs.s3a.multipart.size", "67108864")); // 64mb
finalProperties.add(new Property("fs.s3a.max.total.tasks", "30"));
finalProperties.add(new Property("fs.s3a.server-side-encryption-algorithm", "AES256"));

if(accessKey != null){
  finalProperties.add(new Property(ACCESS_KEY, accessKey));
  finalProperties.add(new Property(SECRET_KEY, accessSecret));
} else {
  finalProperties.add(new Property("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider"));
}

if (properties != null && !properties.isEmpty()) {
  finalProperties.addAll(properties);
}

Now I can connect to S3 buckets via the instance profile attached to the EC2 instance I’m on. I select “public bucket” in the S3 menu in the UI (my buckets are not actually public), because otherwise Dremio wants to use S3 credentials to look for all buckets.

s3a tries a sequence of credential providers, so by deleting the hardcoded credential provider that forced it to anonymous, we allow s3a to cycle through the options, including IAM instance profiles.
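The fallback behavior described above can be sketched as a chain-of-responsibility. This is purely illustrative (plain JDK code, not the real AWS SDK or s3a classes): each provider is tried in order and the first one that can supply credentials wins, which is why a hardcoded anonymous provider short-circuits the instance-profile lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of a credential provider chain. Provider names and
// the resolve() helper are hypothetical; the ordering loosely mirrors the
// idea behind the s3a / AWS SDK default chain.
public class CredentialChainSketch {

    // Returns the name of the first provider that can resolve credentials,
    // given a map standing in for "is this source of credentials available".
    static String resolve(Map<String, Boolean> available) {
        String[] chain = {"static-config", "environment", "instance-profile"};
        for (String provider : chain) {
            if (available.getOrDefault(provider, false)) {
                return provider;
            }
        }
        return "anonymous"; // last resort: unauthenticated access
    }

    public static void main(String[] args) {
        Map<String, Boolean> onEc2 = new LinkedHashMap<>();
        onEc2.put("instance-profile", true); // role attached to the instance
        System.out.println(resolve(onEc2)); // prints "instance-profile"
    }
}
```

Forcing the chain to a fixed provider (as the original hardcoded anonymous client effectively did) is equivalent to never iterating past the first entry.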

Hey Andrew,

We have several different requests for further S3 control, so we plan to enhance this along the lines you propose. We expect your change to work for your situation. Feel free to propose it as a PR on GitHub if you like and we’ll evaluate it for inclusion in future versions of Dremio.

Thanks!

Do you have a development guide for the OSS platform? How do I get the app running on a development machine (macOS, IntelliJ?) with hot reloading?
Thanks

Hi, any word on when this feature will be available? I work for a large company that is on AWS and this is a blocker for any possible adoption of Dremio.

@Femi_Anthony should be available soon (actively being QAed right now), we’ll announce once available.


Hi, just wondering if there’s been any further progress on this issue? cheers

Hey @Nathan_Griffiths yes! This is available in our 3.0 release.

Here is a screenshot of the updated S3 Auth options.


Excellent news! Thank you.

How do I connect to AWS S3 using an AWS access key? I see an option for the access & secret key but no place to provide the s3:// hostname.

Any idea how to connect to S3 hosted on-premises?

@madantv

You only have to enter the AWS access and secret keys. Once you hit the Save button, Dremio will automatically add all the buckets associated with your AWS account (there is no need to specify any bucket names).
If you only want specific buckets added, you can go to the advanced options and whitelist them. Please see the documentation for more details.

We have temporary AWS credentials issued via STS (AWS Security Token Service) that expire every hour (so ACCESS_KEY_ID, SECRET_KEY_ID and SESSION_TOKEN need to be refreshed hourly). Do you have any suggestions or recommendations on how to set up the Dremio S3 configuration to refresh the credentials every hour?

@librian If your source is configured with an access key and secret key, then the source needs to be modified via the API or the UI.
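For reference, hadoop-aws does have S3A properties for session credentials; whether a given Dremio version exposes them (e.g. through advanced source options) is not stated in this thread, so treat the fragment below as a sketch of the underlying Hadoop configuration only. The placeholder values are not real credentials.

```xml
<!-- Sketch: standard hadoop-aws properties for STS session credentials. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>TEMPORARY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>TEMPORARY_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.session.token</name>
  <value>SESSION_TOKEN</value>
</property>
```

Note that statically configured session credentials will not refresh themselves when they expire; if the cluster runs on EC2, the instance-profile approach discussed earlier in the thread sidesteps the hourly-refresh problem, since instance-metadata credentials are rotated automatically.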