I have configured my dremio.conf file so that my distributed storage points to an S3 bucket, as follows:
paths: {
# the local path for dremio to store data.
local: "/var/lib/dremio"
# the distributed path Dremio data including job results, downloads, uploads, etc
dist: "s3://datasprints-dremio-test-dist-storage/"
}
I also configured the core-site.xml file with the access key and secret key of the bucket owner, as follows:
[...]
<name>fs.s3a.access.key</name>
<description>AWS access key ID.</description>
<value>I HAVE PUT MY ACCESS KEY HERE</value>
[...]
<name>fs.s3a.secret.key</name>
<description>AWS secret key.</description>
<value>I HAVE ALSO PUT MY SECRET KEY HERE</value>
[...]
Despite that, Dremio still stores its data (for example, reflection files) locally, in a folder called pdfs, which is the default distributed storage directory.
Here you can see the terminal output:
[ec2-user@ip-**-*-***-*** dremio]$ sudo updatedb
[ec2-user@ip-**-*-***-*** dremio]$ sudo locate dremio | grep pdfs
[...]
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_22.parquet
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_23.parquet
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_24.parquet
/var/lib/dremio/pdfs/accelerator/dc1ee27b-dcf8-44ce-9ece-459701c9dd07/5febd106-3706-44dd-815a-c4fccd8668bd_0/1_0_25.parquet
[...]
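`updatedb` plus `locate` only reports files that existed when the index was last rebuilt. As an alternative, here is a minimal Python sketch that walks the local pdfs folder directly and lists any Parquet files in it. It assumes the local path is `/var/lib/dremio`, as in the dremio.conf above. The function name `local_pdfs_parquet_files` is hypothetical and is not part of Dremio.

```python
from pathlib import Path
from typing import List


def local_pdfs_parquet_files(local_root: str = "/var/lib/dremio") -> List[Path]:
    """Return every .parquet file under <local_root>/pdfs, sorted by path.

    Hypothetical helper: an empty list means nothing Parquet-shaped is being
    written to the local pdfs folder (e.g. reflections under pdfs/accelerator).
    """
    pdfs_dir = Path(local_root) / "pdfs"
    if not pdfs_dir.is_dir():
        # No local pdfs folder at all, so nothing is stored locally there.
        return []
    # rglob searches all subdirectories recursively.
    return sorted(pdfs_dir.rglob("*.parquet"))


if __name__ == "__main__":
    for path in local_pdfs_parquet_files():
        print(path)
```

Running it again after a reflection refresh shows whether new files still appear locally, without having to rerun `updatedb`.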
I have already restarted the machine after changing the config files.
So, what exactly am I doing wrong here?
Why is it storing locally instead of storing on the S3 bucket specified?
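To rule out a typo in the core-site.xml credentials shown earlier, a small Python sketch can parse the file's standard Hadoop `<configuration>/<property>/<name>/<value>` layout and report which of `fs.s3a.access.key` and `fs.s3a.secret.key` are missing or empty. The helper names are hypothetical. This only confirms that the keys are present. It does not show which storage Dremio actually uses, because that is controlled by the `dist` path in dremio.conf.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The two S3A credential properties set in the core-site.xml above.
REQUIRED_S3A_KEYS = ("fs.s3a.access.key", "fs.s3a.secret.key")


def read_hadoop_properties(xml_text: str) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value> (empty string if no value)."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_s3a_credentials(xml_text: str) -> List[str]:
    """Return the required S3A keys that are absent or have an empty value."""
    props = read_hadoop_properties(xml_text)
    return [key for key in REQUIRED_S3A_KEYS if not props.get(key)]
```

For example, `missing_s3a_credentials(open("core-site.xml").read())` returns an empty list when both keys have values.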
Edit: here is the job profile of one of the reflection queries whose output was stored locally:
6ef909c5-f953-4227-adcb-1162cae77c19.zip (11.7 KB)
P.S.: To give a definitive answer to this thread, these are the path syntaxes I tested that worked:
dremioS3:///path/to/folder (note the three slashes, ///, and the capital S in S3)
s3a://path/to/folder
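The fix comes down to the URI scheme of the `dist` entry. The original `s3://` value kept data in the local pdfs folder, while `dremioS3:///` and `s3a://` worked. As a hypothetical sanity check (not a Dremio tool), a short Python sketch can extract the scheme from a dremio.conf `dist:` line and compare it case-sensitively against those two schemes. The regex is an assumption: it handles simple single-line `dist:` entries, not full HOCON. The list of schemes comes only from this thread.

```python
import re
from typing import Optional

# Schemes reported to work in this thread (case-sensitive: dremioS3, not dremios3).
REPORTED_WORKING_SCHEMES = ("dremioS3", "s3a")

# Matches an uncommented `dist:` line and captures the scheme before the first ':'.
# The opening quote is optional so the default pdfs://"${paths.local}"/pdfs also parses.
_DIST_RE = re.compile(r'^\s*dist\s*:\s*"?([A-Za-z][A-Za-z0-9+.-]*):', re.MULTILINE)


def dist_scheme(conf_text: str) -> Optional[str]:
    """Return the URI scheme of the first `dist:` entry, or None if there is none."""
    match = _DIST_RE.search(conf_text)
    return match.group(1) if match else None


def uses_reported_s3_scheme(conf_text: str) -> bool:
    """True if the dist scheme is one of the schemes reported to work above."""
    return dist_scheme(conf_text) in REPORTED_WORKING_SCHEMES
```

Applied to the original config, `dist_scheme` returns `s3`, so the check fails. A line such as `dist: "dremioS3:///datasprints-dremio-test-dist-storage/"` would pass.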