We’ve seen this handled a few different ways, and the right approach depends on the sophistication of your data consumers.
One model treats reflection administration purely as a data engineering job. In this model, people who log into Dremio to search the catalog, find datasets, and build new datasets can “upvote” a dataset to signal that it needs reflections.
The admin can then see all of the votes in one place (an Enterprise Edition feature).
Data engineers can then decide which reflections would provide the best value relative to the resources they consume. Today the data engineer makes these decisions, but we are developing this area of the product to be more automated and to provide smarter recommendations across datasets and workloads. In practice, a relatively small number of reflections can typically accelerate a wide range of workloads.
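To make the “best value for the resources” trade-off concrete, here is a minimal sketch of one way a data engineer might rank candidate reflections. It is not how Dremio chooses reflections and uses no Dremio API: `Candidate`, `pick_reflections`, the per-reflection `cost` (a stand-in for storage plus refresh work) and the `covers` sets (which queries a reflection could accelerate) are all hypothetical inputs you would estimate yourself. The technique is a plain greedy budgeted-coverage heuristic, which also shows why a few broad reflections can cover many workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate reflection (not a Dremio API object)."""
    name: str
    cost: float              # assumed relative cost (storage + refresh); must be > 0
    covers: frozenset        # IDs of queries/workloads it could accelerate


def pick_reflections(candidates: list, budget: float) -> list:
    """Greedy heuristic: repeatedly take the candidate that accelerates the most
    not-yet-covered queries per unit of cost, skipping any that would exceed the budget."""
    chosen = []
    covered = set()
    remaining = list(candidates)
    spent = 0.0
    while remaining:
        # Marginal benefit per unit cost, given what is already covered.
        def gain(c: Candidate) -> float:
            return len(c.covers - covered) / c.cost

        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # nothing left adds coverage
        remaining.remove(best)
        if spent + best.cost > budget:
            continue  # too expensive for what is left of the budget
        chosen.append(best)
        covered |= best.covers
        spent += best.cost
    return chosen


if __name__ == "__main__":
    # Hypothetical candidates and query IDs, for illustration only.
    candidates = [
        Candidate("agg_sales_by_region", 2.0, frozenset({"q1", "q2", "q3", "q4"})),
        Candidate("raw_orders_sorted_by_date", 3.0, frozenset({"q4", "q5"})),
        Candidate("agg_sales_by_day", 2.0, frozenset({"q2", "q3"})),
    ]
    print([c.name for c in pick_reflections(candidates, budget=5.0)])
```

With a budget of 5.0 this picks the broad aggregation reflection first, then the raw reflection for the one remaining query, and skips `agg_sales_by_day` because everything it covers is already accelerated. A greedy heuristic is not guaranteed to be optimal, but it is easy to reason about and mirrors how the choice is usually made by hand.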
In another model, data consumers decide for themselves what to accelerate. For this to work, they need to be more sophisticated: for example, they must judge whether an aggregation reflection or a raw reflection is more appropriate for their queries, and whether sorting the data would help. This model tends to produce more reflections, but that is usually fine because the cost is primarily storage and reflection maintenance. Dremio is also fairly smart about optimizing the daisy chaining of reflection updates to minimize the load on the source system.
In both models, Dremio’s workload management features (Enterprise Edition) can help you manage how resources are allocated between reflection maintenance jobs and user jobs.
Does that help?