Exception in RPC communication after cluster migration

After we migrated our Dremio 19.5 instance to a new AKS cluster, we are seeing many of these (and other) errors in the logs:

2022-09-16 07:09:36,226 [FABRIC-8] INFO c.d.s.fabric.EnterpriseFabricServer - [FABRIC]: Channel closed /10.18.108.53:45678 <--> /10.18.100.247:47514 (fabric server)
2022-09-16 07:09:36,226 [FABRIC-8] INFO com.dremio.exec.rpc.MessageDecoder - Channel is closed, discarding remaining 168 byte(s) in buffer.
2022-09-16 07:10:06,227 [FABRIC-9] ERROR c.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.18.108.53:45678 <--> /10.18.100.247:53776 (fabric server). Closing connection.
io.netty.handler.codec.CorruptedFrameException: Expected to read a tag of 10 but actually received a value of 69. Happened after reading 0 message.
at com.dremio.exec.rpc.MessageDecoder.checkTag(MessageDecoder.java:223)
at com.dremio.exec.rpc.MessageDecoder.decodeMessage(MessageDecoder.java:143)
at com.dremio.exec.rpc.MessageDecoder.decode(MessageDecoder.java:69)
at com.dremio.services.fabric.FabricProtobufLengthDecoder.decode(FabricProtobufLengthDecoder.java:37)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:750)
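For context on what the CorruptedFrameException is saying: a protobuf tag byte encodes a field number and a wire type. The sketch below decodes the two values from the error message; it is an illustration of standard protobuf tag encoding, not Dremio code (the function name `decode_tag` is ours).

```python
# Decode a protobuf tag byte: (field_number << 3) | wire_type.
def decode_tag(tag_byte: int):
    field_number = tag_byte >> 3
    wire_type = tag_byte & 0x07
    return field_number, wire_type

# The expected tag 10 is field 1 with wire type 2 (length-delimited),
# i.e. the start of a well-formed protobuf message.
print(decode_tag(10))

# The observed byte 69 decodes to field 8 / wire type 5, and is also
# ASCII 'E' -- which may hint that plain-text, non-protobuf bytes
# (e.g. from a health probe or proxy) reached the fabric port.
print(decode_tag(69), chr(69))
```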

What can we do?

@h.hansmeier Check the GC logs on 10.18.100.247 and see if there is a Full GC pause; the log should also print the largest heap objects.
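To scan for such pauses, something like the sketch below can be run against the executor's GC log. The log path and line format are assumptions (the sample uses the JDK unified logging format from `-Xlog:gc*`); adjust the pattern to whatever format your JVM actually emits.

```python
import re

# Sample lines in JDK unified GC log format (assumed; replace `sample`
# with the contents of the real GC log file on 10.18.100.247).
sample = """\
[2022-09-16T07:09:30.101+0000] GC(41) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 12.345ms
[2022-09-16T07:09:35.900+0000] GC(42) Pause Full (G1 Compaction Pause) 1900M->1850M(2048M) 30125.772ms
"""

# Match Full GC events and capture their pause duration in milliseconds.
pattern = re.compile(r"Pause Full.*?([\d.]+)ms")
for line in sample.splitlines():
    m = pattern.search(line)
    if m:
        print(f"Full GC pause of {m.group(1)} ms: {line}")
```

A multi-second Full GC pause around the timestamps in the fabric log would be a strong hint that the executor stalled long enough for the coordinator to close the channel.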