ERROR c.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication

Does anyone know what this means?

2021-05-27 16:29:36,308 [UserServer-1] INFO c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /10.244.44.95:31010 <–> /10.244.2.0:19041 (user client)
2021-05-27 16:29:36,308 [UserServer-1] WARN c.d.e.s.o.SessionOptionManagerFactoryImpl - Could not find session with sessionId 9550b01e-a28a-463b-ac86-828a8487016b.
2021-05-27 16:29:36,308 [UserServer-1] INFO com.dremio.ConnectionLog - [9550b01e-a28a-463b-ac86-828a8487016b] Connection Closed
2021-05-27 16:29:36,308 [UserServer-1] INFO c.d.s.r.u.UserProtobufLengthDecoder - Channel is closed, discarding remaining 709 byte(s) in buffer.
2021-05-27 16:29:36,309 [UserServer-1] ERROR c.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.244.44.95:31010 <–> /10.244.2.0:7449 (user client). Closing connection.
io.netty.handler.codec.CorruptedFrameException: Expected to read a tag of 10 but actually received a value of 3. Happened after reading 0 message.
at com.dremio.exec.rpc.MessageDecoder.checkTag(MessageDecoder.java:212)
at com.dremio.exec.rpc.MessageDecoder.decodeMessage(MessageDecoder.java:132)
at com.dremio.exec.rpc.MessageDecoder.decode(MessageDecoder.java:105)
at com.dremio.sabot.rpc.user.UserProtobufLengthDecoder.decode(UserProtobufLengthDecoder.java:35)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
2021-05-27 16:29:36,309 [UserServer-1] INFO c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /10.244.44.95:31010 <–> /10.244.2.0:7449 (user client)
2021-05-27 16:29:36,314 [UserServer-1] WARN c.d.e.s.o.SessionOptionManagerFactoryImpl - Could not find session with sessionId 3a80d037-99d8-4993-a206-29df5b0f6f2b.
2021-05-27 16:29:36,314 [UserServer-1] INFO com.dremio.ConnectionLog - [3a80d037-99d8-4993-a206-29df5b0f6f2b] Connection Closed

@Shirisha This means that client like Tableau Desktop (10.244.2.0) was connected to either a Dremio coordinator or executor (10.244.44.95). For some reason the client dropped the connection, send us logs from 10.244.44.95 and we can check what happened. If possible send us logs from 10.244.2.0, both logs when the problem happened “2021-05-27 16:29”

Hello I got same issue, Can I have any help here?
I installed in docker( latest windows version)
and tried dremio(latest version && 15.7.0)
Any time when I clicked “Open In Browser” I got this issue.

Logs In Docker.zip (8.7 KB)

@Hao It seems like the 172.17.0.1 is closing the connection, is this your laptop’s IP address?

2021-07-22 01:12:23,797 [UserServer-1] INFO c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /172.17.0.3:31010 <--> /172.17.0.1:48246 (user client)

Hi, I have the same issue with dremio 19.1 on EKS deployed by helm chart_v2.
On UI i see a lot of exceptions like

Could not connect to the Dremio server.

Also every time when I try to connect to big table with a lot of files. I got error and server log show:

 An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: org.eclipse.jetty.io.EofException

May it be somehow related to ALB controller configuration?
Maybe I need to add session affinity?
I use absolutely default config, except reduced CPU to 7.

@GrigorievNick What are your heap and direct memory settings on your coordinator? Do you see a Full GC pause when this issue happens?

@balaji.ramaswamy I have 64Gb per node, 48 I think is a heap(chart compute it).
No, I don’t see top big Stop the world pause or too frequent GC call. And my current installation is used only by me, and there are only 3-5 tables for now.

@GrigorievNick If you are able to upload logs, I can take a look and see what happened

@balaji.ramaswamy yes, please
deremio-coordinator-log.zip (600.8 КБ)

@GrigorievNick Dremio started at starting dremio 2021-12-11T12:31:10.635+0000: [GC pause (G1 Evacuation Pause) (young), 0.0059140 secs] and that is the first line in the log, any chance you have the previous log?

@balaji.ramaswamy I am sorry I don’t understand, your question.

starting dremio

is the first log which Dremio Coordinator writes after JVM start.
At least during deployment to EKS with default log configuration from dremio chart_v2.
And it looks reasonable, later in logs appeared classpath and configurations.

2021-12-11 12:31:11,597 [main] INFO  com.dremio.common.config.SabotConfig - Configuration and plugin file(s) identified in 169ms.
Base Configuration:
	- jar:file:/opt/dremio/jars/dremio-common-19.1.0-202111160130570172-0ee00450.jar!/sabot-default.conf

Intermediate Configuration and Plugin files, in order of precedence:
	- jar:file:/opt/dremio/jars/dremio-adls-plugin-19.1.0-202111160130570172-0ee00450.jar!/sabot-module.conf
	- jar:file:/opt/dremio/jars/
....

@GrigorievNick You are right, Dremio has started on 9047, if UI is not opening up then is there a load balancer or proxy?

2021-12-11 12:31:28,660 [main] INFO com.dremio.dac.server.DremioServer - Started on http://localhost:9047

Yes @balaji.ramaswamy, by default in EKS dremio use Service with LoadBalancer type. Which actually means it uses NLB.

ср, 22 дек. 2021 г., 23:59 Balaji Ramswamy via Dremio <dremio@discoursemail.com>:

@GrigorievNick Can we make sure, there are no issues there? Can you try JDBC/ODBC queries?

Hi @balaji.ramaswamy Sorry for the delay in response.
The problem was in l4 ELB load Balancer.
With ALB all good. I believe it’s somehow related to default socket timeout and no session affinity.