Saving Reflections in S3

I have an S3 bucket configured as a data source. When files are dropped into this bucket, I am able to open them in Dremio. I’d like to save my virtual datasets and reflections in different folders in the same S3 bucket, but I have not been able to configure this correctly. Can you provide some guidance on how to configure this behavior?

@summersmd

Virtual datasets (VDS) are simply SQL definitions stored in spaces, and spaces are just entries in our internal database.

To store reflections on S3, please follow the documentation below:

Reflections on S3

Thanks
@balaji.ramaswamy


I’ve updated our dremio.conf file as follows:


paths: {
  # the local path for dremio to store data.
  #local: "/app/dremio/data"

  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
  dist: "dremioS3:///xxx-yyy-us-e1-nprod-project.s3.amazonaws.com/Dremio_poc"

  # location for catalog database (if master node)
  db: ${paths.local}/db,

  spilling: [${paths.local}/spill]

  # storage area for the accelerator cache.
  accelerator: ${paths.dist}/accelerator

  # staging area for json and csv ui downloads
  downloads: ${paths.dist}/downloads

  # stores uploaded data associated with user home directories
  uploads: ${paths.dist}/uploads

  # stores data associated with the job results cache.
  results: ${paths.dist}/results

  # shared scratch space for creation of tables.
  scratch: ${paths.dist}/scratch
}

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}

Here, xxx-yyy-us-e1-nprod-project.s3.amazonaws.com/Dremio_poc is the URL of our S3 bucket.
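For comparison, here is a minimal sketch of the dist entry written with only the bucket name. It assumes the bucket is called xxx-yyy-us-e1-nprod-project (the URL above with the .s3.amazonaws.com suffix dropped) and that the dremioS3:/// scheme expects <bucket_name>/<folder> rather than an S3 endpoint hostname. Check this format against the Reflections on S3 documentation for your Dremio version before relying on it.

```hocon
paths: {
  # Assumption: the dremioS3 scheme takes a bucket name followed by a
  # folder path, not the bucket's s3.amazonaws.com endpoint hostname.
  dist: "dremioS3:///xxx-yyy-us-e1-nprod-project/Dremio_poc"
}
```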

I'm not sure what to do with the following lines now that we've commented out the local: "/app/dremio/data" line:
  # location for catalog database (if master node)
  db: ${paths.local}/db,

  spilling: [${paths.local}/spill]
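A minimal sketch of one way to handle these lines, assuming only the distributed store is meant to move to S3: leave local uncommented so that db (the catalog database) and spilling keep resolving to local disk, while dist and the paths derived from it (accelerator, downloads, uploads, results, scratch) point at S3. The directory is the one from the original file. If the existing catalog lives under /app/dremio/data/db, commenting out local would point db at a different, empty directory, which may be related to the "No Cluster Identity found" error reported below.

```hocon
paths: {
  # Keep the local path uncommented: the catalog database and the
  # spill directories below are resolved relative to it.
  local: "/app/dremio/data"

  # dist (and accelerator, downloads, uploads, results, scratch,
  # which all derive from ${paths.dist}) stays as configured for S3.

  # location for catalog database (if master node)
  db: ${paths.local}/db,

  spilling: [${paths.local}/spill]
}
```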

I've updated our core-site.xml file with our access and secret keys.

Dremio will not start after this configuration change. The server.out file contains:

Dremio is exiting. Failure while starting services.
com.dremio.common.exceptions.UserException: No Cluster Identity found
        at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:773)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:170)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:160)
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:97)
Fri Jun  7 16:22:06 UTC 2019 Starting dremio on xxx.yyy.com
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 124995
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 124995
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The server.log file contains:
2019-06-07 16:22:57,888 [main] ERROR ROOT - Dremio is exiting. Failure while starting services.
com.dremio.common.exceptions.UserException: No Cluster Identity found
        at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:773) ~[dremio-common-3.2.3-201905301727340724-7419ad0.jar:3.2.3-201905301727340724-7419ad0]
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:170) ~[dremio-dac-daemon-3.2.3-201905301727340724-7419ad0.jar:3.2.3-201905301727340724-7419ad0]
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:160) ~[dremio-dac-daemon-3.2.3-201905301727340724-7419ad0.jar:3.2.3-201905301727340724-7419ad0]
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:97) ~[dremio-dac-daemon-3.2.3-201905301727340724-7419ad0.jar:3.2.3-201905301727340724-7419ad0]

We're looking to have our reflections saved to our S3 bucket.  

Is it possible to have more than one storage location, so that some reflections are saved to local disk while others are saved to S3?

How can we get Dremio's Parquet files into S3?