Hi,
Dremio uses paths.dist as the cache location for holding accelerator data, CREATE TABLE AS tables, job results, downloads, and uploads, as indicated in the documentation.
I would like to keep accelerator data on local Dremio nodes,
but use S3 for downloads, results, scratch, and uploads.
Is it possible to specify that in dremio.conf?
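For example, something along these lines is what I have in mind. The per-purpose keys (accelerator, downloads, results, scratch, uploads) and the S3 URI format are my guess at how dremio.conf might expose this, not something I have confirmed in the docs:

```
paths: {
  local: "/var/lib/dremio"

  # keep reflections on node-local disk (guessed key name)
  accelerator: ${paths.local}"/accelerator"

  # everything else on S3 (guessed key names and URI format)
  downloads: "dremioS3:///my-bucket/dremio/downloads"
  results:   "dremioS3:///my-bucket/dremio/results"
  scratch:   "dremioS3:///my-bucket/dremio/scratch"
  uploads:   "dremioS3:///my-bucket/dremio/uploads"
}
```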
By “node local” do you mean local storage that is shared among all the nodes? I think it would be great to use ephemeral storage on AWS servers for the acceleration cache, but that is local to a single node. Would that work?
This storage plugin allows querying local filesystem metadata on remote node instances. However, it only allows opening and/or creating files on the local filesystem.
I see how this would allow other nodes to read Parquet files written locally to a node.
Dremio has 2 modes for distributed storage (reflections, downloads, etc.):
1. Use local node disks. In this mode, Dremio leverages available disk space across nodes to store data and also allows remote reads of this data across the cluster. However, we do not do replication (think HDFS), which means that if a node goes down, the reflection will be marked as invalid and will need to be refreshed.
2. Use a shared storage layer (recommended for production): S3, HDFS, ADLS, etc. A minimal example of pointing dremio.conf at S3 is sketched below.
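For the shared-storage mode, a dremio.conf sketch pointing all distributed storage at S3 might look like this. The bucket name and folder are placeholders, and the exact dremioS3 URI format and credential setup (e.g. core-site.xml) should be checked against the documentation for your Dremio version:

```
paths: {
  # node-local directory for metadata and spill
  local: "/var/lib/dremio"

  # all distributed storage (reflections, downloads, results, uploads, scratch) on S3
  dist: "dremioS3:///my-bucket/dremio"
}
```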