I have a custom plugin. When a query runs for a long time (around 5 minutes), it fails with the error below. It seems that ZooKeeper (ZK) stops receiving the heartbeat from the executor node. Do you have any idea what might cause this?
SYSTEM ERROR: ForemanException: One or more nodes lost connectivity during query. Identified nodes were [xxx:-1].
(com.dremio.exec.work.foreman.ForemanException) One or more nodes lost connectivity during query. Identified nodes were [xxx:-1].
com.dremio.exec.work.foreman.QueryManager$3.nodesUnregistered():660
com.dremio.exec.work.foreman.AttemptManager.nodesUnregistered():187
com.dremio.exec.work.protector.Foreman.nodesUnregistered():154
com.dremio.exec.work.protector.ForemenWorkManager$NodeStatusListener.nodesUnregistered():294
com.dremio.service.coordinator.AbstractServiceSet.nodesUnregistered():39
com.dremio.service.coordinator.zk.ZKServiceSet.updateEndpoints():127
com.dremio.service.coordinator.zk.ZKServiceSet.access$000():39
com.dremio.service.coordinator.zk.ZKServiceSet$EndpointListener.cacheChanged():53
org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():177
org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():173
org.apache.curator.framework.listen.ListenerContainer$1.run():93
org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():297
org.apache.curator.framework.listen.ListenerContainer.forEach():85
org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():171
org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():522
org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():516
org.apache.curator.framework.listen.ListenerContainer$1.run():93
org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():297
org.apache.curator.framework.listen.ListenerContainer.forEach():85
org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():514
org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run():773
java.util.concurrent.Executors$RunnableAdapter.call():511
java.util.concurrent.FutureTask.run():266
java.util.concurrent.Executors$RunnableAdapter.call():511
java.util.concurrent.FutureTask.run():266
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
More logs from the server:
2019-10-30 06:52:12,014 [Curator-ServiceCache-0] WARN c.d.e.w.protector.ForemenWorkManager - Foreman 2246cd46-49c6-320b-f6bd-26f6e2ecbb00 failed to handle unregistered nodes [address: "xxx.dev.net"
user_port: -1
fabric_port: 46011
roles {
sql_query: false
java_executor: true
master: false
}
start_time: 1572417900659
provision_id: "container_e86_1572151378458_0126_01_000002"
max_direct_memory: 60129542144
available_cores: 8
]
java.lang.RuntimeException: Exceptions caught during event processing
at com.dremio.common.EventProcessor.processEvents(EventProcessor.java:116) ~[dremio-common-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.common.EventProcessor.sendEvent(EventProcessor.java:63) ~[dremio-common-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$StateSwitch.addEvent(AttemptManager.java:714) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.addToEventQueue(AttemptManager.java:724) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$CompletionListenerImpl.succeeded(AttemptManager.java:169) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:561) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.QueryManager.access$900(QueryManager.java:81) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:511) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.QueryManager$NodeTracker.nodeDead(QueryManager.java:529) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.QueryManager$3.nodesUnregistered(QueryManager.java:666) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.nodesUnregistered(AttemptManager.java:187) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.protector.Foreman.nodesUnregistered(Foreman.java:154) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.protector.ForemenWorkManager$NodeStatusListener.nodesUnregistered(ForemenWorkManager.java:294) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.AbstractServiceSet.nodesUnregistered(AbstractServiceSet.java:39) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.zk.ZKServiceSet.updateEndpoints(ZKServiceSet.java:127) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.zk.ZKServiceSet.access$000(ZKServiceSet.java:39) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.zk.ZKServiceSet$EndpointListener.cacheChanged(ZKServiceSet.java:53) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply(ServiceCacheImpl.java:177) [curator-x-discovery-2.12.0.jar:na]
at org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply(ServiceCacheImpl.java:173) [curator-x-discovery-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.12.0.jar:na]
at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [curator-client-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.12.0.jar:na]
at org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent(ServiceCacheImpl.java:171) [curator-x-discovery-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.12.0.jar:na]
at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [curator-client-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773) [curator-recipes-2.12.0.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
Caused by: java.lang.IllegalStateException: null
at com.google.common.base.Preconditions.checkState(Preconditions.java:429) ~[guava-20.0.jar:na]
at com.dremio.exec.work.foreman.AttemptManager$AttemptResult.close(AttemptManager.java:513) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.moveToState(AttemptManager.java:693) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.access$1900(AttemptManager.java:86) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$StateSwitch.processEvent(AttemptManager.java:719) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$StateSwitch.processEvent(AttemptManager.java:711) ~[dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.common.EventProcessor.processEvents(EventProcessor.java:105) ~[dremio-common-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
... 37 common frames omitted
2019-10-30 06:52:12,015 [Curator-ServiceCache-0] WARN c.d.exec.work.foreman.QueryManager - Nodes [xxx.dev.net:-1] no longer registered in cluster. Canceling query 2246ce7d-34d1-74bf-4644-37a6bd7a4d00
2019-10-30 06:52:12,017 [Curator-ServiceCache-0] ERROR c.d.exec.work.foreman.AttemptManager - ForemanException: One or more nodes lost connectivity during query. Identified nodes were [xxx.dev.net:-1].
com.dremio.common.exceptions.UserException: ForemanException: One or more nodes lost connectivity during query. Identified nodes were [xxx.dev.net:-1].
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:773) ~[dremio-common-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$AttemptResult.close(AttemptManager.java:540) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.moveToState(AttemptManager.java:669) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.access$1900(AttemptManager.java:86) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$StateSwitch.processEvent(AttemptManager.java:719) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$StateSwitch.processEvent(AttemptManager.java:711) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.common.EventProcessor.processEvents(EventProcessor.java:105) [dremio-common-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.common.EventProcessor.sendEvent(EventProcessor.java:63) [dremio-common-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$StateSwitch.addEvent(AttemptManager.java:714) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.addToEventQueue(AttemptManager.java:724) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager$CompletionListenerImpl.failed(AttemptManager.java:174) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.QueryManager$3.nodesUnregistered(QueryManager.java:659) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.foreman.AttemptManager.nodesUnregistered(AttemptManager.java:187) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.protector.Foreman.nodesUnregistered(Foreman.java:154) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.exec.work.protector.ForemenWorkManager$NodeStatusListener.nodesUnregistered(ForemenWorkManager.java:294) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.AbstractServiceSet.nodesUnregistered(AbstractServiceSet.java:39) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.zk.ZKServiceSet.updateEndpoints(ZKServiceSet.java:127) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.zk.ZKServiceSet.access$000(ZKServiceSet.java:39) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.service.coordinator.zk.ZKServiceSet$EndpointListener.cacheChanged(ZKServiceSet.java:53) [dremio-services-coordinator-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply(ServiceCacheImpl.java:177) [curator-x-discovery-2.12.0.jar:na]
at org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply(ServiceCacheImpl.java:173) [curator-x-discovery-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.12.0.jar:na]
at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [curator-client-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.12.0.jar:na]
at org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent(ServiceCacheImpl.java:171) [curator-x-discovery-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.12.0.jar:na]
at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [curator-client-2.12.0.jar:na]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.12.0.jar:na]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773) [curator-recipes-2.12.0.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
Caused by: com.dremio.exec.work.foreman.ForemanException: One or more nodes lost connectivity during query. Identified nodes were [xxx.dev.net:-1].
at com.dremio.exec.work.foreman.QueryManager$3.nodesUnregistered(QueryManager.java:660) [dremio-sabot-kernel-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
... 28 common frames omitted