I’m sure I’m missing something obvious. I created a VDS, then created a raw reflection that includes all columns as “Display”. However, when I do a “select *” on the VDS, the reflection is not used – it appears in the job status as “Did not cover query”.
I also created an aggregation reflection but it too is not being used on a simple aggregated query.
Reflections are associated with the query tree that defines the VDS rather than the VDS itself. This is generally beneficial as it allows the matching process to match on reflections that may be associated with ancestor or sibling datasets, allowing for more frequent cache hits.
In your example, the underlying query for the VDS is fairly complex with several joins, aggregations, and sub-queries. As a result, the matching process to determine whether the reflection covers your query is more complex and sometimes will not match. The more simple the underlying query for a VDS the more likely the match will be successful.
There are some enhancements to the matching algorithm we are planning for a release in the near future that will consider the relationship of the VDS to the reflection. We will report back here to let you know when you can give it a try.
Thank you Kelly. That makes sense, but is disappointing. I was hoping to be able to use Dremio to materialize this complex, long running query and join it with other sources to make it query-able in Tableau.
Is there anything to guide me on how complex the VDS query can be before its reflections won’t be used? Or more detail on why a particular reflection can’t cover a query?
The enhancement you mentioned sounds like it would be beneficial.
@Joe sorry you ran into this. We’re actively working on an improvement here so that the complexity of a VDS is irrelevant for reflection matching purposes when you are directly querying that VDS. As long as it has relevant reflections your query should get accelerated (e.g. raw/agg reflection on VDS, as you mentioned, should match if applicable).
Thank you, Can. That should greatly improve the usefulness.
Incidentally, are there any plans to introduce an explain plan with the query window? It would be great to be able to see which reflections will be used before actually executing the query (as well as seeing which underlying databases will be called, etc.).
We’re also thinking of a few ways to do “what-if” type analysis for reflections/queries in general. However, we don’t have a timeline on this yet – not short term.
And just for me to understand, is the plan different for creating a reflection on a VDS from the plan for running the query in the underlying VDS? I ask because I have seen the query gets accelerated but the reflection doesn’t when there is either a join or there is a group
We’ve made a number of significant changes to the Dremio query planner since last year. If you are on the newest version of Dremio OSS or Community and you are still not seeing your reflections chosen for a particular query, please provide the profile here and we can take a look.
And just for me to understand, is the plan different for creating a reflection on a VDS from the plan for running the query in the underlying VDS? I ask because I have seen the query gets accelerated but the reflection doesn’t when there is either a join or there is a group
If you build a reflection on you.vds, reflection build job would get accelerated if there are applicable reflections. Can you describe the work flow you are trying to create and include query profiles?
I have been trying to write the queries in all sorts of manner to get it to pick up the reflection
I am used to periscopedata’s cache. For large queries, I create a cache and run queries on that. It fails on dremio because of what @kelly described as Reflections are associated with the query tree that defines the VDS rather than the VDS itself.
824b8acb-c367-4e07-a413-928dec761d5c.zip (79,0 KB)
We are experimenting same issues on Dremio 3.3. We have a VDS, and created an Aggregation Reflection with some of the fields of that VDS, plus one measure. A simple query like “SELECT sum(pax) FROM VDS WHERE custom_field=‘XX’” is not using the Aggregation Reflection created for this case. Or are we wrong, and the Aggregation Reflection is only working on fields that do exist in the Anchor dataset?
Attached profile for that query
A bit more on this, after 1.5 days building the aggregation reflection, the “Reflection” tab in the VDS has a red icon saying “Reflection cannot accelerate. Multiple attempts to build reflection failed, will not reattempt”. There is no more information, how could we solve this? I’ve attached the profile