Reflections stop working

Today when I signed in, all the reflections on my cluster suddenly showed 0B in the advanced settings along with an “…”, but the history showed that each reflection's last activity was a successful build. The reflections also were not used in any queries. The only thing I can think to do is rebuild them, but it's really bad that they sometimes seem to become unlinked from the queries.

What is causing this?


@kprifogle

Can you filter for failed jobs and look at the error in the job profile?

If you did not recreate them and just refreshed them, then the old reflection id would still exist, and you can search for that reflection id on the jobs page and look at those jobs.

If you dropped and recreated the reflection, then you have to filter failed jobs and search for “REFRESH REFLECTION”, as the id would have changed.
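If you have exported the job list rather than using the jobs page, the same filter can be sketched in a few lines. This is only an illustration: the record shape (`id`, `state`, `sql` keys), the function name, and the sample data are assumptions, not Dremio's actual export format.

```python
from typing import Iterable, Optional


def failed_refresh_jobs(jobs: Iterable[dict], reflection_id: Optional[str] = None) -> list:
    """Return failed jobs whose SQL text is a reflection refresh.

    If reflection_id is given, keep only jobs whose SQL mentions that id
    (useful when the reflection was refreshed, not dropped and recreated).
    """
    matches = []
    for job in jobs:
        sql = job.get("sql", "")
        if job.get("state") != "FAILED":
            continue  # only failed jobs are of interest here
        if "REFRESH REFLECTION" not in sql.upper():
            continue  # skip ordinary queries
        if reflection_id is not None and reflection_id not in sql:
            continue  # a specific reflection id was requested and is absent
        matches.append(job)
    return matches


# Hypothetical job records, for illustration only.
sample_jobs = [
    {"id": "job-1", "state": "COMPLETED", "sql": "REFRESH REFLECTION 'abc' AS 'm1'"},
    {"id": "job-2", "state": "FAILED", "sql": "REFRESH REFLECTION 'abc' AS 'm2'"},
    {"id": "job-3", "state": "FAILED", "sql": "SELECT * FROM t"},
]
print([job["id"] for job in failed_refresh_jobs(sample_jobs)])
```

Passing the old reflection id as the second argument covers the "refreshed, not recreated" case; leaving it out covers the case where the id has changed.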

The id for the reflection hasn’t changed. I looked at the profile and there was no error. As far as I can tell, the reflection was last successfully built with no errors, yet I see the 0B and the reflections are not used. It just happened again, for the second time today. This makes reflections practically unusable.

So I notice that if I go into the syslogs, the reflection is trying to be refreshed, so it's behaving as though it wants to be refreshed according to a schedule. Then I went into the metadata and checked: its refresh policy was still set to 9 hours, and the neverRefresh and neverExpire booleans were not in the source metadata.

This makes me think that the issue is a UI bug. I will try manually updating those boolean fields via the REST API and see if that makes a difference. Have you fixed any bugs with refresh settings in the UI since Dremio 3.2?

I manually updated the reflection using a PUT request to the REST API, and those boolean attributes still do not appear. Is there a bug with updates to those boolean “never” fields in the REST API for Dremio 3.2?

I’m going to just set refresh time and refresh expiration to 99 million weeks to get around the issue. I think something is buggy in those boolean fields.

Turns out this didn’t solve the problem. I just signed in and the reflections had become disassociated again. @balaji.ramaswamy do you have any idea what's happening from looking at the above queries?

@kprifogle,

reflection_bad: Is a query that is getting accelerated. Is the wrong reflection getting selected?
reflection_bad2: Is a refresh reflection on “Version-1.transactions.Raw.transactions_union_delta”?
reflection_bad3: Is a refresh on “Version-1.transactions.Raw.transactions_bridge”?
delete_bad: is a drop reflection

I also see you are on a very old version 3.2. Both refreshes have a non-zero count.

We have some important bug fixes/enhancements from 3.2 to 4.0. I would recommend upgrading first. Rebuild the reflections, and then let's draw a baseline and solve any issues arising after that.

reflection_bad: This is a query that is not selecting the reflection because reflection_bad2 is going bad. reflection_bad2 is the create reflection job, and nothing seems to be wrong with it. Same with reflection_bad3. delete_bad is Dremio deleting the reflection after whatever happens to it that disassociates it from the table.

What do you mean by “both refreshes have a non-zero count”? They show as having 0B on the reflection page.

Is upgrading the only option?

@balaji.ramaswamy I found that one of the sources that was being joined into the reflection that was expiring incorrectly (or whatever you want to call what it was doing with the 0B and ellipses) had bad reflection refresh settings (3 hours expire and 9 hours refresh). This happened due to another issue I just posted about. Even though it didn’t itself have any reflections I think it was causing things to break possibly because the other reflections may have picked up those settings? I think this may actually be the bug. (fingers crossed)
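For anyone wanting to sanity-check their own settings for this kind of mismatch, here is a minimal sketch. It assumes a materialization stops being usable once its expiry elapses, so an expiry shorter than the refresh interval leaves a gap in every cycle. The function name is hypothetical, and this is plain arithmetic, not a Dremio API call.

```python
def unusable_hours_per_cycle(refresh_hours: float, expire_hours: float) -> float:
    """Hours in each refresh cycle with no unexpired materialization.

    Assumes the reflection is rebuilt every refresh_hours and each build
    expires expire_hours after it completes. Returns 0.0 when the expiry
    is at least as long as the refresh interval.
    """
    return max(0.0, refresh_hours - expire_hours)


# The settings described above: refresh every 9 hours, expire after 3 hours.
print(unusable_hours_per_cycle(refresh_hours=9, expire_hours=3))  # 6.0
# An expiry at or above the refresh interval leaves no gap.
print(unusable_hours_per_cycle(refresh_hours=9, expire_hours=12))  # 0.0
```

Under that assumption, the 3 hour expire / 9 hour refresh combination would leave roughly 6 hours per cycle where the reflection cannot be used.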

This appears to have been the issue. The source with bad reflection refresh settings was causing all its dependent downstream reflections to “break” with the ellipses and 0B. I no longer have this issue.

Nope, I was wrong. The issue returned. I think the issue is actually related to recycling the master node.