I’m looking at running Dremio in an AWS environment. At the moment it seems the only authentication option for S3 sources is an access key and secret. We’re not able to use access keys due to security considerations. Is it possible to configure Dremio to use an IAM role attached to the cluster’s EC2 instances for permissions, as described here? https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html
Could you just change the anonymous client in S3FileSystem.java so that it does: return new AmazonS3Client()? That way the SDK will attempt to build the credentials from the EC2 instance metadata. Then follow the instructions to rebuild and start up the server.
final List<Property> finalProperties = new ArrayList<>();
finalProperties.add(new Property(FileSystem.FS_DEFAULT_NAME_KEY, "dremioS3:///"));
finalProperties.add(new Property("fs.dremioS3.impl", S3FileSystem.class.getName()));
finalProperties.add(new Property(MAXIMUM_CONNECTIONS, String.valueOf(DEFAULT_MAX_CONNECTIONS)));
finalProperties.add(new Property("fs.s3a.fast.upload", "true"));
finalProperties.add(new Property("fs.s3a.fast.upload.buffer", "disk"));
finalProperties.add(new Property("fs.s3a.fast.upload.active.blocks", "4")); // 4 x 64 MB = 256 MB (so a single parquet file should be able to flush at once).
finalProperties.add(new Property("fs.s3a.threads.max", "24"));
finalProperties.add(new Property("fs.s3a.multipart.size", "67108864")); // 64 MB
finalProperties.add(new Property("fs.s3a.max.total.tasks", "30"));
finalProperties.add(new Property("fs.s3a.server-side-encryption-algorithm", "AES256"));
if (accessKey != null) {
    // Explicit keys were supplied in the source configuration: use them.
    finalProperties.add(new Property(ACCESS_KEY, accessKey));
    finalProperties.add(new Property(SECRET_KEY, accessSecret));
} else {
    // No keys supplied: fall back to the EC2 instance profile.
    finalProperties.add(new Property("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider"));
}
if (properties != null && !properties.isEmpty()) {
    finalProperties.addAll(properties);
}
Now I can connect to S3 buckets via the instance profile attached to the EC2 instance I’m on. I select the public bucket option in the S3 menu in the UI (my buckets are not public), because otherwise Dremio wants to use S3 credentials to look for all buckets.
s3a has a sequence of credential providers that it tries in turn, so by deleting the hardcoded credential provider that was setting it to anonymous, we allow s3a to cycle through a few options, including IAM instance profiles.
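To illustrate the idea of cycling through credential providers, here is a minimal sketch in plain Java. It is not the s3a or AWS SDK implementation: the Credentials class, the CredentialsProvider interface, the resolveFirst method, and the order of the providers in main are all hypothetical and only show the "try each provider until one answers" pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CredentialChainSketch {

    /** Hypothetical stand-in for a key pair; not an AWS SDK type. */
    static final class Credentials {
        final String accessKey;
        final String secretKey;
        final String source; // which provider produced these credentials

        Credentials(String accessKey, String secretKey, String source) {
            this.accessKey = accessKey;
            this.secretKey = secretKey;
            this.source = source;
        }
    }

    /** Hypothetical provider: returns empty when it has nothing to offer. */
    interface CredentialsProvider {
        Optional<Credentials> resolve();
    }

    /** Walks the chain in order and returns the first credentials found. */
    static Credentials resolveFirst(List<CredentialsProvider> chain) {
        for (CredentialsProvider provider : chain) {
            Optional<Credentials> found = provider.resolve();
            if (found.isPresent()) {
                return found.get();
            }
        }
        throw new IllegalStateException("No provider in the chain supplied credentials");
    }

    public static void main(String[] args) {
        // Assumed order, for illustration only: configured keys, environment, instance profile.
        CredentialsProvider configuredKeys = () -> Optional.empty(); // no keys in the source config
        CredentialsProvider environment = () -> Optional.empty();    // no environment variables set
        CredentialsProvider instanceProfile =
                () -> Optional.of(new Credentials("ROLE_KEY", "ROLE_SECRET", "instance-profile"));

        Credentials chosen =
                resolveFirst(Arrays.asList(configuredKeys, environment, instanceProfile));
        System.out.println("Credentials came from: " + chosen.source);
    }
}
```

With a hardcoded anonymous provider there is effectively a one-element chain, which is why removing it lets the later options, such as the instance profile, be reached.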
We have several different requests for further S3 control so we plan to enhance this similar to how you propose. We expect your change to work for your situation. Feel free to propose as a PR on Github if you like and we’ll evaluate inclusion in future versions of Dremio.
Hi, any word on when this feature will be available? I work for a large company that is on AWS and this is a blocker for any possible adoption of Dremio.
You only have to enter the AWS access key and secret key. Once you hit the Save button, Dremio will add all the buckets associated with your AWS account automatically (there is no need to specifically mention any bucket names).
If you only want a specific bucket to be added, you may go to the advanced options and whitelist it. Please see the documentation for more details.
We have temporary AWS credentials, issued via STS (AWS Security Token Service), that expire every hour, so ACCESS_KEY_ID, SECRET_KEY_ID and SESSION_TOKEN need to be refreshed every hour. Do you have any suggestions or recommendations on how to set up the Dremio S3 configuration to refresh the credentials every hour?
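For what it’s worth, the general pattern for hourly-expiring credentials is to cache them and fetch a new set shortly before expiry. Below is a minimal sketch of that pattern in plain Java. It is not a Dremio or AWS SDK API: the SessionCredentials class, the fetcher supplier (which would wrap whatever call obtains new credentials from STS), and the five-minute refresh margin are all assumptions for illustration, and nothing in this thread confirms that Dremio can be wired to such a holder without code changes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class RefreshingCredentialsSketch {

    /** Hypothetical holder for temporary credentials; not an AWS SDK type. */
    static final class SessionCredentials {
        final String accessKeyId;
        final String secretKey;
        final String sessionToken;
        final Instant expiresAt;

        SessionCredentials(String accessKeyId, String secretKey,
                           String sessionToken, Instant expiresAt) {
            this.accessKeyId = accessKeyId;
            this.secretKey = secretKey;
            this.sessionToken = sessionToken;
            this.expiresAt = expiresAt;
        }
    }

    private final Supplier<SessionCredentials> fetcher; // assumed to call STS in real use
    private final Duration refreshMargin;               // refresh this long before expiry
    private SessionCredentials current;

    RefreshingCredentialsSketch(Supplier<SessionCredentials> fetcher, Duration refreshMargin) {
        this.fetcher = fetcher;
        this.refreshMargin = refreshMargin;
    }

    /** Returns cached credentials, fetching new ones when missing or close to expiry. */
    synchronized SessionCredentials get(Instant now) {
        if (current == null || !now.isBefore(current.expiresAt.minus(refreshMargin))) {
            current = fetcher.get();
        }
        return current;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-01-01T00:00:00Z");
        int[] calls = {0};
        // Fake fetcher: each call returns credentials expiring one hour later than the last.
        Supplier<SessionCredentials> fake = () -> {
            calls[0]++;
            return new SessionCredentials("AK" + calls[0], "SK", "TOKEN",
                    start.plus(Duration.ofHours(calls[0])));
        };
        RefreshingCredentialsSketch holder =
                new RefreshingCredentialsSketch(fake, Duration.ofMinutes(5));

        System.out.println(holder.get(start).accessKeyId);                              // first fetch
        System.out.println(holder.get(start.plus(Duration.ofMinutes(30))).accessKeyId); // cached
        System.out.println(holder.get(start.plus(Duration.ofMinutes(56))).accessKeyId); // refreshed
    }
}
```

Passing the current time in as a parameter keeps the sketch easy to test. Note that if the cluster can use an instance profile as discussed earlier in the thread, the role credentials are temporary as well and the AWS SDK’s instance profile provider refreshes them itself.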