Executors are crashing due to ProtocolNotRegisteredException

Our dremio-executor pods are currently crashing and restarting frequently with this exception:

2023-01-18 13:04:30,481 [FABRIC-rpc-event-queue] ERROR c.d.s.fabric.EnterpriseFabricServer - ProtocolNotRegisteredException: Protocol not registered with Fabric
com.dremio.common.exceptions.UserException: ProtocolNotRegisteredException: Protocol not registered with Fabric
    at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:885)
    at com.dremio.exec.rpc.RpcBus$ResponseSenderImpl.sendFailure(RpcBus.java:297)
    at com.dremio.exec.rpc.RpcBus$ResponseSenderImpl.access$700(RpcBus.java:230)
    at com.dremio.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:477)
    at com.dremio.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:96)
    at com.dremio.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:338)
    at com.dremio.common.SerializedExecutor.execute(SerializedExecutor.java:129)
    at com.dremio.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:372)
    at com.dremio.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:343)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:308)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:422)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:750)
Caused by: com.dremio.exec.rpc.UserRpcException: com.dremio.exec.rpc.RpcException: Protocol not registered with Fabric
    at com.dremio.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:476)
    ... 32 common frames omitted
Caused by: com.dremio.exec.rpc.RpcException: Protocol not registered with Fabric
    at com.dremio.services.fabric.FabricMessageHandler.handle(FabricMessageHandler.java:61)
    at com.dremio.services.fabric.FabricServer.handle(FabricServer.java:85)
    at com.dremio.services.fabric.FabricServer.handle(FabricServer.java:40)
    at com.dremio.exec.rpc.RpcBus.handle(RpcBus.java:91)
    at com.dremio.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:472)
    ... 32 common frames omitted
Caused by: com.dremio.services.fabric.ProtocolNotRegisteredException: Protocol not registered with Fabric
    at com.dremio.services.fabric.FabricMessageHandler.handle(FabricMessageHandler.java:59)
    ... 36 common frames omitted

What could be causing this?

We also see the following error; could this be the root cause?

2023-01-18 13:04:30,504 [<my-mongodb-instance>] WARN  c.d.p.mongo.metadata.MongoVersion - Could not get the mongo compatibility version from the server. Defaulting to detected version: 4.4
com.mongodb.MongoCommandException: Command failed with error 13 (Unauthorized): 'not authorized on admin to execute command { getParameter: true, featureCompatibilityVersion: true, $db: "admin", $clusterTime: { clusterTime: Timestamp(1674047061, 1), signature: { hash: BinData(0, 008590A4FADC39BEC714555FC4940D700A44D274), keyId: 7132011658519511202 } }, lsid: { id: UUID("b7889134-bd8c-4a0c-8fdc-2802579f5362") }, $readPreference: { mode: "primaryPreferred" } }' on server <my-mongodb-instance>.mongodb.net:27017. The full response is {"operationTime": {"$timestamp": {"t": 1674047061, "i": 1}}, "ok": 0.0, "errmsg": "not authorized on admin to execute command { getParameter: true, featureCompatibilityVersion: true, $db: \"admin\", $clusterTime: { clusterTime: Timestamp(1674047061, 1), signature: { hash: BinData(0, 008590A4FADC39BEC714555FC4940D700A44D274), keyId: 7132011658519511202 } }, lsid: { id: UUID(\"b7889134-bd8c-4a0c-8fdc-2802579f5362\") }, $readPreference: { mode: \"primaryPreferred\" } }", "code": 13, "codeName": "Unauthorized", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1674047061, "i": 1}}, "signature": {"hash": {"$binary": "AIWQpPrcOb7HFFVfxJQNcApE0nQ=", "$type": "00"}, "keyId": {"$numberLong": "7132011658519511202"}}}}
	at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:175)
	at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:303)
	at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:259)
	at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)
	at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:450)
	at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:72)
	at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:226)
	at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:269)
	at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:131)
	at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:123)
	at com.mongodb.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:343)
	at com.mongodb.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:334)
	at com.mongodb.operation.CommandOperationHelper.executeCommandWithConnection(CommandOperationHelper.java:220)
	at com.mongodb.operation.CommandOperationHelper$5.call(CommandOperationHelper.java:206)
	at com.mongodb.operation.OperationHelper.withReadConnectionSource(OperationHelper.java:463)
	at com.mongodb.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:203)
	at com.mongodb.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:198)
	at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:59)
	at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:196)
	at com.mongodb.client.internal.MongoDatabaseImpl.executeCommand(MongoDatabaseImpl.java:194)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:163)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:158)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:148)
	at com.dremio.plugins.mongo.metadata.MongoVersion.getVersionForConnection(MongoVersion.java:93)
	at com.dremio.plugins.mongo.metadata.MongoTopology.getMinVersion(MongoTopology.java:219)
	at com.dremio.plugins.mongo.metadata.MongoTopology.getClusterTopology(MongoTopology.java:167)
	at com.dremio.plugins.mongo.connection.MongoConnectionManager.getTopology(MongoConnectionManager.java:179)
	at com.dremio.plugins.mongo.connection.MongoConnectionManager.validateConnection(MongoConnectionManager.java:338)
	at com.dremio.plugins.mongo.connection.MongoConnectionManager.connect(MongoConnectionManager.java:150)
	at com.dremio.plugins.mongo.MongoStoragePlugin.start(MongoStoragePlugin.java:184)
	at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$newStartSupplier$1(ManagedStoragePlugin.java:546)
	at com.dremio.exec.catalog.ManagedStoragePlugin.lambda$nameSupplier$3(ManagedStoragePlugin.java:614)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

We added this MongoDB instance as an external data source, and we recently upgraded the MongoDB server version. Could that be the problem?
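Error 13 (Unauthorized) in the warning above means the MongoDB user Dremio connects with is not allowed to run `getParameter` against the `admin` database, which Dremio uses to read `featureCompatibilityVersion`. One possible fix, sketched below, is to grant that user the built-in `clusterMonitor` role (which includes the `getParameter` privilege). The user name `dremioUser` and the connection string are placeholders, not values from your deployment; run this as an account with user-administration privileges:

```shell
# Hypothetical: replace the connection string and "dremioUser" with your own values.
# clusterMonitor includes the getParameter privilege needed for
# featureCompatibilityVersion; run as a user with userAdmin rights on "admin".
mongosh "mongodb+srv://<admin-user>@<my-mongodb-instance>.mongodb.net/admin" --eval '
  db.grantRolesToUser("dremioUser", [ { role: "clusterMonitor", db: "admin" } ])
'
```

Note that the log line is only a WARN and says Dremio falls back to the detected version (4.4), so this may be cosmetic rather than the cause of the crashes.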

@h.hansmeier The Fabric port seems to be having an issue, but that could itself be a symptom, so the root cause may lie elsewhere. The second error you had, on MongoDB, was a command failure and should not cause a crash.

Would you be able to send the full server.log from around the time this happened? Ideally for the entire day, so we can see the events leading up to the crash. Could you also send us the info below?

  • Total pod memory
  • Output of `ps -ef | grep dremio` from an executor
  • GC log along with the server.log (from the day the crash happens)
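For a Kubernetes deployment, something like the following may help collect the requested diagnostics. This is a sketch: the namespace, pod name, and log paths below are assumptions (the log directory is the Dremio default `DREMIO_LOG_DIR`; your chart may write logs elsewhere or to stdout), so adjust them to your deployment:

```shell
# Hypothetical namespace and pod name; adjust to your cluster.
NS=dremio
POD=dremio-executor-0

# Total pod memory: the memory limit configured on the container(s)
kubectl get pod -n "$NS" "$POD" \
  -o jsonpath='{.spec.containers[*].resources.limits.memory}'

# Dremio process and its JVM flags from the executor
kubectl exec -n "$NS" "$POD" -- ps -ef | grep dremio

# Copy out server.log and the GC log (default Dremio log location; may differ)
kubectl cp "$NS/$POD:/opt/dremio/log/server.log" ./server.log
kubectl cp "$NS/$POD:/opt/dremio/log/server.gc" ./server.gc
```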