For the past few days I have been observing the Dremio GUI not responding for a few minutes; after that it reconnects automatically and starts working again. I am attaching some screenshots and logs taken during this behaviour.
After I restarted the Dremio master, the issue above was resolved, but restarting the container every time is not a good practice, as I run Dremio on a production server and I want zero downtime.
Kindly send us the server.log and server.gc* files from the Dremio log folder, and specify the time (including time zone) when the Dremio UI was slow.
I don't see that folder in the container.
For server.log and server.gc*, one thing I can do is capture the container log to a file using kubectl logs -f dremio-master-0 > dremio-master-0.log
I did not realize it is Kubernetes. Those logs would be gone now, since you restarted. The next time this problem happens, first run “kubectl logs dremio-master-0 > dremio-master-0.log”, move the log to your local laptop (or somewhere safe), and then restart Dremio. Send us the saved log file.
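For anyone else hitting this, here is a minimal sketch of collecting the logs before restarting the pod; the /opt/dremio/log path is an assumption and may differ in your image, and the local destination folder is just an example:

# Save the coordinator's console output locally before any restart
kubectl logs dremio-master-0 > dremio-master-0.log

# Check whether server.log / server.gc* exist inside the pod
# (log directory below is an assumption; adjust for your image)
kubectl exec dremio-master-0 -- ls /opt/dremio/log

# If the files are there, copy the whole log directory out of the pod
kubectl cp dremio-master-0:/opt/dremio/log ./dremio-logs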
I have collected the logs and got this error:
org.apache.commons.dbcp2.datasources.PooledConnectionAndInfo@7cf19e37
java.lang.IllegalStateException: Object not currently part of this pool
dremio-master-0.zip (2.1 MB)
ERROR c.d.services.fabric.FabricClient - RpcException: This daemon doesn’t support coordination operations.
com.dremio.common.exceptions.UserException: RpcException: This daemon doesn’t support coordination operations.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:776) ~[dremio-common-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.rpc.RpcBus$ResponseSenderImpl.sendFailure(RpcBus.java:287) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.rpc.RpcBus$ResponseSenderImpl.access$700(RpcBus.java:220) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:467) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:96) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:328) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.common.SerializedExecutor.execute(SerializedExecutor.java:129) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:362) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at com.dremio.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:333) [dremio-services-base-rpc-4.0.2-201910020123580864-a98a0b9.jar:4.0.2-201910020123580864-a98a0b9]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) [netty-handler-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:315) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:429) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1421) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [netty-common-4.1.38.Final.jar:4.1.38.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.38.Final.jar:4.1.38.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]
That is expected when the executors are up but the coordinator is not yet fully up. When the coordinator restarts while some of the executors still have queries running (or have only just come up), there is a window of time during which the coordinator can receive messages before it has fully started, so it rejects them, believing it is not a coordinator.
Thanks
@balaji.ramaswamy