Dremio uses paths.dist as the cache location for accelerator data, CREATE TABLE AS tables, job results, downloads, and uploads, as indicated in the documentation.
I would like to keep accelerator data on the local Dremio nodes,
but use S3 for downloads, results, scratch, and uploads.
Is it possible to specify that in dremio.conf?
Please look here: http://docs.dremio.com/advanced-administration/configuration-files.html?h=dremio.conf
Note that you can't keep acceleration data completely node-local, but you can use one location for acceleration data and a different one for some of the other stores.
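As a rough sketch of what that page describes: the `paths` section of dremio.conf defines each store relative to `paths.dist` by default, and individual entries can be pointed elsewhere. The key names below are the ones I recall from the linked configuration page. The `/var/lib/dremio` path, the `my-bucket` bucket, its folders, and the `s3a://` scheme are all placeholder assumptions. The exact S3 URI scheme and any extra Hadoop or credential settings depend on your Dremio version, so check the documentation before using this.

```hocon
paths: {
  # Local path where this node stores its data (placeholder value).
  local: "/var/lib/dremio"

  # Default distributed store: pdfs over the nodes' local disks.
  dist: "pdfs://"${paths.local}"/pdfs"

  # Acceleration data stays under paths.dist, i.e. on the Dremio nodes.
  accelerator: ${paths.dist}/accelerator

  # Assumed overrides: bucket, folders, and URI scheme are placeholders.
  downloads: "s3a://my-bucket/dremio/downloads"
  uploads: "s3a://my-bucket/dremio/uploads"
  results: "s3a://my-bucket/dremio/results"
  scratch: "s3a://my-bucket/dremio/scratch"
}
```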
Indeed. Thanks a lot @yufeldman !
By “node local”, do you mean local storage that is shared among all the nodes? I think it would be great to use ephemeral storage on AWS servers for the acceleration cache, but that storage is local to a single node. Would that still work?
I understand “node local” to mean the disk space on the local Dremio nodes. Dremio uses pdfs to distribute the storage over the nodes.
This storage plugin allows querying local filesystem metadata on remote node instances. However, it only allows opening and/or creating files on the local filesystem.
I see where this would allow other nodes to read parquet files written locally to a node.
I got this reply when I asked this question…
But your reply makes sense as well.
@swarren @dfleckinger hope this helps:
Dremio has 2 modes for distributed storage (reflections, downloads, etc.):
- Use local node disks. In this mode, Dremio leverages available disk space across nodes to store data and also allows remote reads of this data across the cluster. However, we do not replicate the data (unlike HDFS), which means that if a node goes down, the reflections stored on it will be marked as invalid and will need to be refreshed.
- Use a shared storage layer (recommended for production): S3, HDFS, ADLS, etc.
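In dremio.conf terms, the two modes roughly come down to what `paths.dist` points at. This is a minimal sketch, not a verified configuration: the `/var/lib/dremio` path, the `namenode-host` name, port 8020, and the `/dremio` folder are placeholders, and HDFS is used here only as one example of a shared layer.

```hocon
paths: {
  # Local path where this node stores its data (placeholder value).
  local: "/var/lib/dremio"

  # Mode 1: local node disks through pdfs (no replication).
  dist: "pdfs://"${paths.local}"/pdfs"

  # Mode 2: shared storage layer; use this line instead of the one above.
  # Host, port, and folder are placeholders.
  # dist: "hdfs://namenode-host:8020/dremio"
}
```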