Bug in the process of refreshing a physical dataset: dependent reflections refreshed too soon?

I have a physical dataset composed of Parquet files, with an aggregation reflection defined on it.
When I add daily files to this dataset, I trigger the refresh using the Catalog API (/api/v3/catalog/{id}/refresh).
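
For readers who want to reproduce the call, here is a minimal sketch of how that refresh could be triggered from Python using only the standard library. The base URL, the dataset id and the token are placeholders, and the `_dremio` prefix on the Authorization header is an assumption about token-based login that should be checked against the installed version.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url, dataset_id, token):
    """Build (but do not send) the POST request for /api/v3/catalog/{id}/refresh."""
    # Percent-encode the id in case it contains characters that are not URL-safe.
    encoded_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{encoded_id}/refresh"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            # Assumption: token obtained from a prior login, sent with the "_dremio" prefix.
            "Authorization": f"_dremio{token}",
            "Content-Type": "application/json",
        },
    )


def trigger_refresh(base_url, dataset_id, token):
    """Send the refresh request and return the HTTP status code."""
    request = build_refresh_request(base_url, dataset_id, token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `trigger_refresh("http://localhost:9047", "<dataset id>", "<token>")` would send the request. Building the request separately makes it possible to inspect the URL and headers without contacting the server.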

What happens in terms of Jobs:

  1. First there is a REFRESH REFLECTION for the aggregation reflection.
  2. Then, in parallel, 1 job for LOAD MATERIALIZATION METADATA for the reflection, and a refresh of the dependent reflections.

When I look at the overview of the refresh jobs of the dependent reflections, the Age shown in the “Accelerated by” section is 22h:51m:00s, so they do not use the latest materialization of the reflection, but the previous one.

Shouldn’t the refresh of the dependent reflections happen after the “LOAD MATERIALIZATION” job, rather than starting at the same time?
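
The ordering asked about here can be sketched as a client-side illustration: trigger the parent refresh, wait until its new materialization is reported as ready, and only then refresh the dependents. This is not how Dremio schedules refreshes internally. The three callables are hypothetical, because the thread does not identify an endpoint for checking readiness or for refreshing a single reflection.

```python
import time


def wait_until(predicate, timeout_s=600.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True; return False once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def refresh_in_order(refresh_parent, parent_is_ready, refresh_dependents, **wait_kwargs):
    """Refresh the parent, wait for it to be ready, then refresh the dependents.

    All three arguments are hypothetical callables supplied by the caller.
    """
    refresh_parent()
    if not wait_until(parent_is_ready, **wait_kwargs):
        raise TimeoutError("parent materialization was not ready in time")
    refresh_dependents()
```

With this shape, the dependents are never refreshed against the previous materialization: either the parent is confirmed ready first, or the function raises.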

I’m using Community version 2.1.6.

It seems like you are refreshing the dataset metadata via the API. Why not refresh the reflection via the API right afterwards, using https://docs.dremio.com/rest-api/reflections/endpoints.html?

The endpoint /api/v3/catalog/{id}/refresh is exactly the one called by the Dremio UI when clicking the “Refresh Now” button in the “Reflection Refresh” screen, so I used the same one.

But which endpoint would you suggest I use? I found no “refresh” endpoint in the reflections API.

Are the dependent reflections on virtual datasets that derive from the physical and do those virtual datasets derive from other physical datasets as well? That should help us reproduce what you are describing.

Sorry, I did not get the first part of the question,
but yes, the virtual datasets of the dependent reflections derive from other physical datasets, which are themselves accelerated.

It seems worse in 3.0.1.

When I click “Refresh Now”, both the physical dataset reflections and the virtual dataset reflections are updated at the same time, whereas I would expect the physical dataset reflections to be updated first, then the virtual dataset reflections.

Hey @dfleckinger, that’s strange.

A few questions:

  • What’s the definition of the VDS? Specifically I’m looking to see if there are joins.
  • Can you share a quick description/diagram of your PDS/VDS chains?
  • Can you double-check that your reflection(s) on the PDS cover the VDS reflections? Are you using raw reflection(s) on the PDS? Ideally, a past and a recent query profile, so we can compare before and after, would be helpful.