PDS Refresh Dependent Reflections does not order the refresh, does extra work



When I refresh the dependent reflections of a PDS, Dremio chooses an order in which to refresh them (which is very helpful), but it seems to have gotten the order wrong in this case:

Reflection A refreshes, with no other reflections queued (good!).
Reflections B and C (which were both built using reflection A) are then queued.
Reflection C was chosen to refresh first, but it uses reflection B to accelerate. (oh no…)
Reflection B is then refreshed…
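
To make the expected ordering concrete, here is a minimal sketch of the dependency chain described above. It is not Dremio's scheduler; the `depends_on` map and the `refresh_order` helper are hypothetical names used only to illustrate that B should refresh before C, because C is accelerated by B.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: reflection -> reflections it is built from.
depends_on = {
    "A": set(),        # built directly on the PDS
    "B": {"A"},        # built using reflection A
    "C": {"A", "B"},   # built using A, and accelerated by B
}


def refresh_order(deps):
    """Return an order in which every reflection follows its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = refresh_order(depends_on)
print(order)  # ['A', 'B', 'C']
```

With this ordering C is built once, after B, instead of being refreshed twice.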

Now, when my queries are accelerated by reflection B, they return different data than when they are accelerated by reflection C.

The PDS refresh was done using api/catalog/refresh.
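
For reference, a minimal sketch of how such a refresh call might be built. It assumes the endpoint meant above is Dremio's v3 REST call `POST /api/v3/catalog/{id}/refresh`. The base URL, dataset ID, token, and the `Bearer` authorization scheme are placeholders and assumptions, so check them against your Dremio version. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url, dataset_id, token):
    """Build (but do not send) a POST request that asks Dremio to refresh
    the reflections that depend on a physical dataset.

    Assumptions: the v3 catalog refresh path and a Bearer token header.
    """
    encoded_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{encoded_id}/refresh"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values; replace with your coordinator URL, dataset ID and token.
request = build_refresh_request("http://localhost:9047", "my-dataset-id", "my-token")
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would actually send it.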

Any advice on how to proceed with this?


I jumped the gun a bit as I was quite concerned, but it turns out that when reflection B finished it then went back and queued reflection C again…

It did extra work, but at least my data is ok. Thanks!

Thanks for the update @Saltxwater

@Saltxwater How did you check the order in which these reflections ran? Is there a mechanism in Dremio that shows details such as when a reflection refresh begins and ends?

I'm running into a problem where a reflection is not picking up new data files in a folder, and reviewing these reflection details may help me troubleshoot it.

Hi @KrishnaPG,
I saw the reflections building on the Jobs page in the UI. By default you only see “UI, External Tools” jobs, but if you click that dropdown you can also select “Accelerator”, which includes all reflection builds. Each job shows its start time and the time it spent in the queue.

Regarding your reflections not picking up new data files, make sure you refresh the metadata on your PDS. This causes Dremio to collect information about the files in your directory. Without it, new files will be missed.

You can do this with the sql:
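
The statement itself did not come through in the post. A minimal sketch, assuming Dremio's `ALTER PDS <path> REFRESH METADATA` statement (newer versions may use `ALTER TABLE` instead); the `refresh_metadata_sql` helper and the example path are hypothetical:

```python
def refresh_metadata_sql(dataset_path):
    """Build the SQL that refreshes metadata for a physical dataset.

    dataset_path is a list of path parts, e.g. [source, folder, dataset].
    Each part is double-quoted, with embedded quotes doubled.
    """
    quoted = ".".join('"' + part.replace('"', '""') + '"' for part in dataset_path)
    return f"ALTER PDS {quoted} REFRESH METADATA"


# Hypothetical dataset path; substitute your own source and folder names.
print(refresh_metadata_sql(["mysource", "folder", "events"]))
# ALTER PDS "mysource"."folder"."events" REFRESH METADATA
```

The resulting statement can be run from the SQL editor in the Dremio UI.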

Thank you @Saltxwater