Using S3 as a Distributed Storage for Reflections

I have configured my dremio.conf file so that distributed storage points to an S3 bucket, as follows:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  dist: "s3://datasprints-dremio-test-dist-storage/"
}

I also configured the core-site.xml file with the access key and secret key of the bucket owner, as follows:

[...]
    <name>fs.s3a.access.key</name>
    <description>AWS access key ID.</description>
    <value>I HAVE PUT MY ACCESS KEY HERE</value>
[...]
    <name>fs.s3a.secret.key</name>
    <description>AWS secret key.</description>
    <value>I HAVE ALSO PUT MY SECRET KEY HERE</value>
[...]

Despite that, Dremio stores the reflections locally, in a folder called pdfs (the default storage directory).
Here is the terminal output:

[ec2-user@ip-**-*-***-*** dremio]$ sudo updatedb
[ec2-user@ip-**-*-***-*** dremio]$ sudo locate dremio | grep pdfs
[...]
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_22.parquet
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_23.parquet
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_24.parquet
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_25.parquet
[...]

I have already restarted the machine after changing the config files.
So, what exactly am I doing wrong here?
Why is it storing the files locally instead of in the specified S3 bucket?

Edit: this is the job profile of one of the reflection queries whose output was stored locally:
6ef909c5-f953-4227-adcb-1162cae77c19.zip (11.7 KB)

P.S.: Just to give a definitive answer to this thread, the path syntaxes that I tested and that worked are:

dremioS3:///path/to/folder (note the three slashes, ///, and the capital S in S3)

s3a://path/to/folder
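
Applied to the dremio.conf from the top of the thread, the two working forms might look roughly like the sketch below. The bucket name is the one from the original config; the trailing folder name "dremio" is a hypothetical placeholder, and placing the bucket name as the first path segment after dremioS3:/// is an assumption based on the pattern above rather than something verified here.

```
paths: {
  # local path, unchanged from the original config
  local: "/var/lib/dremio"

  # form 1: dremioS3 scheme, three slashes, capital S
  # (assumed: bucket name first, then a hypothetical folder)
  dist: "dremioS3:///datasprints-dremio-test-dist-storage/dremio"

  # form 2 (alternative, use only one dist entry): s3a scheme, bucket right after the two slashes
  # dist: "s3a://datasprints-dremio-test-dist-storage/dremio"
}
```

Note that the original config used the s3:// scheme, which is neither of the two forms reported as working.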

Hi, is there any explanation for the three slashes at the start? I thought it was a typo in the documentation.

Did you ever solve this? We are stuck in the same spot as you. Everything is set up just as documented, and we are currently getting:

Caused by: java.io.IOException: Unable to find bucket named xxx-xxx-xxx-xxx.
at com.dremio.plugins.util.ContainerFileSystem.getFileSystemForPath(ContainerFileSystem.java:295) ~[dremio-s3-plugin-3.0.8-201812270118560286-801500d.jar:3.0.8-201812270118560286-801500d]

In my dremio.conf, I used

dist: s3a://xxx/yyy

where xxx is our S3 bucket name and yyy is the path to where our Dremio folders (accelerator, etc.) are stored.

It works as expected.

@robbob,
Which version of Dremio are you using? (You can go to Help -> About Dremio in the UI.)

In dremio.conf, what does your dist path have for a value?

Is your S3 policy as described in our docs? https://docs.dremio.com/deployment/distributed-storage.html#amazon-s3