Dremio Distributed Storage Question

I’m trying to configure Dremio so that reflections are stored in an S3 bucket.

I’ve updated core-site.xml and am wondering what needs to change in the paths section of the dremio.conf file.

I have attributes for:
local: /app/data
dist: s3a://
db: [${paths.local}/spill]

I’m not sure what each of these is used for. Can anyone provide some clarity as to what each of these parameters does? Am I using this correctly?

Both the db and spill paths are local paths, meaning you’d want to configure them to point to either network attached storage (NAS) or storage attached directly to the Dremio node.

On executors, the spill directory is where Dremio writes records to disk when they don’t fit in memory, for example while a query performs very large sorts or aggregations.

The db path is where your Dremio metadata is stored on the coordinator.
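As a rough sketch, the local-path settings look something like this in dremio.conf. The key names db and spilling, and the default values shown, are assumed from Dremio’s reference configuration and may differ between releases, so check the dremio.conf that ships with your version. Spill locations are given as a list, which lets an executor spread spilling across several disks.

```hocon
paths: {
  # Root for everything Dremio keeps on local (or NAS-backed) disk
  local: "/app/data"

  # Coordinator only: metadata store (assumed default shown)
  db: ${paths.local}/db

  # Executors: where large sorts/aggregations spill when memory runs out.
  # A list, so several disks can be used (assumed default shown)
  spilling: [${paths.local}/spill]
}
```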

The dist path, on the other hand, points to distributed storage (such as S3 or HDFS) where query results, reflections (the accelerator directory), user uploads, downloads, and scratch tables are stored.
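To illustrate how those areas hang off dist, here is a sketch of the derived sub-paths. The key names and layout are assumed from Dremio’s reference configuration and are shown only for illustration. Normally you set dist alone and leave these derived keys at their defaults.

```hocon
paths: {
  # Placeholder bucket/folders
  dist: "dremioS3:///<bucket_name>/<folder1>/<folder2>"

  # Assumed defaults: each area is a sub-folder of dist
  # Reflections
  accelerator: ${paths.dist}/accelerator
  # Query (job) results
  results: ${paths.dist}/results
  # User uploads
  uploads: ${paths.dist}/uploads
  # Staging area for downloads
  downloads: ${paths.dist}/downloads
  # Scratch tables
  scratch: ${paths.dist}/scratch
}
```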

See our docs for more information on dist and local.

I would rewrite your config to eliminate the explicit reference to db, so it would just have:

local: "/app/data"
dist: "dremioS3:///<bucket_name>/<folder1>/<folder2>"

All the local directories, such as db and spill, will now be under /app/data.
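Put together in dremio.conf, the suggested change would look roughly like this. The bucket and folder names are placeholders, and the enclosing paths block is assumed from the standard dremio.conf layout. Because db and spill are not set explicitly, they fall back to their defaults under paths.local.

```hocon
paths: {
  # Local storage: db (coordinator) and spill (executors)
  # default to sub-folders of this path
  local: "/app/data"

  # Distributed storage on S3 via the dremioS3 scheme; reflections,
  # results, uploads, downloads and scratch tables are stored here
  dist: "dremioS3:///<bucket_name>/<folder1>/<folder2>"
}
```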

See our docs for more information about distributed storage on S3.