Clarification on reflections and local storage

I set up a 6-node Dremio cluster on bare-metal Ubuntu machines. All nodes store data in a local directory named /data/dremio and have the executor role, except the master, which has both the executor and coordinator roles.

I created a reflection for a dataset, and it was written to a directory named with a UUID on one of the executors. Assuming the original dataset does not change, if I want to make the reflection available to all the executors for testing purposes, can I simply copy the reflection directory to the other nodes?

Dremio comes pre-configured out of the box with what we call PDFS. This stands for pseudo-distributed filesystem, and it allows Dremio to stripe a dataset across a collection of nodes. In the case you described, Dremio can see the data from all nodes but will always schedule reads of that data on the node that holds it (what we call “hard affinity”). Dremio will use the reflection as long as all the nodes that were involved in creating it are still running when the reflection is considered for a query. If some of those nodes are removed from the cluster after the reflection is created, Dremio will fall back to reading from the original source until the reflection is regenerated.
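To make the rule above concrete, here is a minimal sketch of the decision as I described it. It is only a toy model, not Dremio's actual planner code: the function names and the node names (node1 through node6) are hypothetical and chosen for illustration.

```python
# Toy model of the rule described above; NOT Dremio's real implementation.
# All names here (functions, node1..node6) are hypothetical.

def reflection_usable(reflection_nodes, live_nodes):
    """A reflection is usable only if every node holding a piece of it is still running."""
    return set(reflection_nodes) <= set(live_nodes)


def choose_read_path(reflection_nodes, live_nodes):
    """Return 'reflection' when it can be used, otherwise fall back to 'source'."""
    if reflection_usable(reflection_nodes, live_nodes):
        return "reflection"
    return "source"


def schedule_reads(reflection_nodes):
    """Hard affinity: each piece is read on the node that stores it."""
    return {node: node for node in reflection_nodes}


cluster = ["node1", "node2", "node3", "node4", "node5", "node6"]

# Small reflection that landed on a single executor, all nodes up.
print(choose_read_path(["node3"], cluster))

# Same reflection after node3 has been removed from the cluster.
print(choose_read_path(["node3"], [n for n in cluster if n != "node3"]))
```

In this model the first call picks the reflection and the second falls back to the source, which matches the behaviour described above when a node that holds part of the reflection leaves the cluster.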

If you write a larger reflection and/or enable reflection partitioning, you will likely see portions of the dataset spread across all nodes. In your case, the dataset was probably small enough that Dremio happened to use only a single machine to persist the reflection.
