Reflection refresh failures

Hi, we are running version 20.1.0-202202061055110045-36733c65 on Kubernetes and observe the following behavior: sooner or later, usually after several days of uptime, all reflections start to fail.

Our data is in Iceberg tables or Elasticsearch; some reflections are on Elasticsearch, some on Iceberg, and some on joins between them - it doesn’t matter which.

All failures look similar:

2023-01-02 13:54:46,337 [dremio-general-100] INFO  c.d.s.reflection.ReflectionManager - reflection 93d480ed-7433-4bc4-a1c4-465f3ca3203b (Raw Reflection) is due for refresh
org.projectnessie.error.NessieContentsNotFoundException: Could not find contents for key 'dremio.reflections./opt/dremio/dist/accelerator/93d480ed-7433-4bc4-a1c4-465f3ca3203b/f8f6e105-bac5-4576-be46-f8f9e65c4853_0' in reference 'main'.
2023-01-02 14:54:47,914 [dremio-general-109] INFO  c.d.s.reflection.ReflectionManager - reflection 93d480ed-7433-4bc4-a1c4-465f3ca3203b (Raw Reflection) is due for refresh
2023-01-02 14:54:47,916 [dremio-general-109] WARN  c.d.s.reflection.ReflectionManager - failed to refresh reflection 93d480ed-7433-4bc4-a1c4-465f3ca3203b (Raw Reflection)
2023-01-02 15:54:47,999 [dremio-general-112] INFO  c.d.s.reflection.ReflectionManager - reflection 93d480ed-7433-4bc4-a1c4-465f3ca3203b (Raw Reflection) is due for refresh
2023-01-02 15:54:48,000 [dremio-general-112] WARN  c.d.s.reflection.ReflectionManager - failed to refresh reflection 93d480ed-7433-4bc4-a1c4-465f3ca3203b (Raw Reflection)

Lines with “failed to refresh” are followed by:

io.grpc.StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at com.dremio.service.nessieapi.TreeApiGrpc$TreeApiBlockingStub.getReferenceByName(TreeApiGrpc.java:354)
	at com.dremio.exec.store.iceberg.nessie.IcebergNessieTableOperations.getBranchRef(IcebergNessieTableOperations.java:101)
	at com.dremio.exec.store.iceberg.nessie.IcebergNessieTableOperations.doRefresh(IcebergNessieTableOperations.java:69)

At this point, if we manually disable and re-enable a reflection, it gets created and works until the first refresh - then the cycle repeats.

Any ideas what can be done other than “upgrade to the latest and greatest” version?

Thanks.

Hi @Alexarl

NessieContentsNotFoundException means that Nessie could not find the object for the given key.

If the metadata and parquet files for the Iceberg table located under

/opt/dremio/dist/accelerator/93d480ed-7433-4bc4-a1c4-465f3ca3203b/f8f6e105-bac5-4576-be46-f8f9e65c4853_0

still exist, then I would recommend upgrading to a more stable version of Nessie, starting with Dremio 21.4+. Otherwise, if the files are not there, then it’s not a Nessie catalog issue and somehow those files are being physically removed.
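
To tell those two cases apart quickly, here is a minimal sketch (check_materialization is a hypothetical helper, not a Dremio API; it assumes the materialization directory is reachable on the local filesystem and that the Iceberg metadata is stored as .json/.avro files alongside the .parquet data):

    import os

    def check_materialization(root):
        """Report whether the Iceberg metadata and parquet files of a
        materialization still exist on disk (hypothetical diagnostic helper)."""
        if not os.path.isdir(root):
            print(root + ": directory is gone -> files were physically removed")
            return
        metadata, data = [], []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name.endswith((".json", ".avro")):
                    metadata.append(os.path.join(dirpath, name))
                elif name.endswith(".parquet"):
                    data.append(os.path.join(dirpath, name))
        print("%d metadata file(s), %d parquet file(s)" % (len(metadata), len(data)))
        if metadata and data:
            print("files are intact -> likely a Nessie catalog issue; consider upgrading")
        else:
            print("files are missing -> something is physically removing them")

    check_materialization(
        "/opt/dremio/dist/accelerator/"
        "93d480ed-7433-4bc4-a1c4-465f3ca3203b/"
        "f8f6e105-bac5-4576-be46-f8f9e65c4853_0")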

Hi,
Just to clarify - when I disable and re-enable a reflection for the same view, does this path (/opt/dremio/dist/accelerator/93d480ed-7433-4bc4-a1c4-465f3ca3203b/f8f6e105-bac5-4576-be46-f8f9e65c4853_0) stay the same, i.e. is it defined by the view rather than by the reflection instance?

Or do I have to recheck it after I see the error, but before I re-enable the reflection?

Here’s an explanation of the path components:

  • /opt/dremio/dist/accelerator/ - “paths.accelerators” as configured in dremio.conf
  • 93d480ed-7433-4bc4-a1c4-465f3ca3203b - the reflection ID
  • f8f6e105-bac5-4576-be46-f8f9e65c4853 - the initial refresh’s materialization ID. If the reflection is incrementally refreshed, the same ID is reused.
  • _0 - the job attempt number
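
To make the layout concrete, here is a small sketch that splits such a path into the components above (split_materialization_path is a hypothetical helper; it assumes the default <paths.accelerators>/<reflection-id>/<materialization-id>_<attempt> layout):

    import os

    def split_materialization_path(path):
        """Break an accelerator path into its parts (hypothetical helper,
        assumes the default <root>/<reflection>/<materialization>_<attempt> layout)."""
        root, reflection_id = os.path.split(os.path.dirname(path))
        materialization_id, _, attempt = os.path.basename(path).rpartition("_")
        return {
            "accelerator_root": root,                   # paths.accelerators from dremio.conf
            "reflection_id": reflection_id,             # stable across disable/re-enable
            "materialization_id": materialization_id,   # a new one on each re-enable
            "attempt": int(attempt),                    # job attempt number
        }

    print(split_materialization_path(
        "/opt/dremio/dist/accelerator/"
        "93d480ed-7433-4bc4-a1c4-465f3ca3203b/"
        "f8f6e105-bac5-4576-be46-f8f9e65c4853_0"))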

When you disable and re-enable a reflection, the reflection ID stays the same, but a new materialization (with a new materialization ID) is generated - so the full path does not stay the same.

You need to check the physical folder after the error and before re-enabling the reflection.
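
In other words: while the reflection is still in the failed state, inspect the exact path from the NessieContentsNotFoundException message (for example with a check like the sketch above); once you re-enable the reflection, a new materialization ID is generated and any later error will point at a different path.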