Different distributed storage location for cache locations

dfleckinger · May 17, 2018, 7:01pm

Hi,
Dremio uses paths.dist as the cache location for accelerator data, CREATE TABLE AS tables, job results, downloads, and uploads, as indicated in the documentation.
I would like to keep accelerator data on the local Dremio nodes,
but use S3 for downloads, results, scratch, and uploads.
Is it possible to specify that in dremio.conf?

Thanks

yufeldman · May 17, 2018, 7:05pm

Please look here: http://docs.dremio.com/advanced-administration/configuration-files.html?h=dremio.conf

You can't keep acceleration data completely node-local, but you can use a different location for acceleration data than for some of the other caches.
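
As a rough sketch of what that can look like in dremio.conf: each cache location has its own key under paths and defaults to a subdirectory of paths.dist, so individual locations can be pointed elsewhere. The key names below follow the default configuration described on the linked page. The local path and the `<shared-storage-uri>` placeholder are assumptions for illustration, and the exact URI form for S3 depends on the Dremio version, so check the documentation for your release.

```
paths: {
  # Local directory on each node (example value, adjust to your install).
  local: "/var/lib/dremio"

  # Default distributed location: node disks via pdfs.
  dist: "pdfs://"${paths.local}"/pdfs"

  # Acceleration data stays under paths.dist (this is the default).
  accelerator: ${paths.dist}/accelerator

  # Placeholder, not a real URI: substitute the shared-storage location
  # for your bucket in the form your Dremio version expects.
  downloads: "<shared-storage-uri>/downloads"
  uploads: "<shared-storage-uri>/uploads"
  results: "<shared-storage-uri>/results"
  scratch: "<shared-storage-uri>/scratch"
}
```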

dfleckinger · May 18, 2018, 6:28am

Indeed. Thanks a lot @yufeldman !

swarren · May 18, 2018, 6:48am

By “node local”, do you mean local storage that is shared among all the nodes? I think it would be great to use ephemeral storage on AWS servers for the acceleration cache, but that storage is local to a single node. Or would that work?

dfleckinger · May 18, 2018, 7:06am

I understand “node local” as the disk space on the local Dremio nodes. Dremio uses pdfs to distribute the storage over the nodes.

swarren · May 18, 2018, 1:27pm

This storage plugin allows to query local filesystem metadata on remote nodes instances. However, it only allows to open and/or create file on the local filesystem.

I see where this would allow other nodes to read parquet files written locally to a node.

I got this reply when I asked this question…

But your reply makes sense as well.

can · May 18, 2018, 6:52pm

@swarren @dfleckinger hope this helps:

Dremio has 2 modes for distributed storage (reflections, downloads, etc.):

1. Use local node disks. In this mode, Dremio leverages available disk space across nodes to store data and also allows remote reads on this data across the cluster. However, we do not do replication (think HDFS), which means if a node goes down, the reflection will be marked as invalid and will need to be refreshed.
2. Use a shared storage layer (recommended for production): S3, HDFS, ADLS, etc.
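
In dremio.conf, the choice between the two modes is made through the value of paths.dist. A minimal sketch follows; the HDFS host, port, and path are made-up example values, and other shared stores such as S3 or ADLS may need additional settings not shown here.

```
# Mode 1: local node disks, exposed across the cluster through pdfs
paths.dist: "pdfs://"${paths.local}"/pdfs"

# Mode 2: shared storage layer (HDFS shown; use one mode or the other, not both)
# paths.dist: "hdfs://namenode.example.com:8020/dremio"
```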
