I did a fresh install of Dremio Community Edition (dremio-community-4.0.1-201909191652190301-211720e) on my Mac.
I created some sources (MySQL, Excel).
I can see the schemas and tables, but every query fails after running for a long time (about 1 minute) with this error: Error setting up remote fragment execution
Any idea what the issue is?
I haven’t been able to run a single query so far. I have downloaded the query profile, but who do I send it to, given that I am using the Community Edition?
It looks like there was a problem with node “au10739”, which is acting as both coordinator and executor. Please check the Dremio logs on that server for error messages. Also look at /var/log/messages or dmesg for any errors.
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225) ~[curator-client-2.12.0.jar:na]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-2.12.0.jar:na]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117) ~[curator-client-2.12.0.jar:na]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) ~[curator-framework-2.12.0.jar:na]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) ~[curator-framework-2.12.0.jar:na]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) ~[curator-framework-2.12.0.jar:na]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) ~[curator-framework-2.12.0.jar:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:835) ~[na:na]
2019-09-25 20:15:16,406 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
2019-09-25 20:15:16,432 [Curator-ServiceCache-0] INFO c.d.s.c.TaskLeaderStatusListener - New Leader node for task MASTER au10739:45678 registered itself
It looks like you lost the connection to ZooKeeper. Does this happen every time? What is the total RAM on your Mac?
8 GB, and I haven’t had a successful query via Dremio yet. The MySQL database I am connected to via Dremio runs in a Docker container. I can connect to the database directly via MySQL Workbench and query it without any problem.
I’m seeing the same problems with our Linux tar install after we upgraded to 4.0.0. We didn’t change anything in the configuration file. Are there new defaults or settings in 4.0.0 that the upgrade process may have missed?
Eventually the master coordinator loses touch with ZooKeeper and the Dremio server crashes.
server.log:2019-09-25 11:47:41,571 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
server.log:2019-09-25 11:47:41,581 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
server.log:2019-09-25 11:49:39,098 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
server.log: at com.dremio.service.coordinator.zk.ZKClusterClient$1$1.call(ZKClusterClient.java:233) [dremio-services-coordinator-4.0.1-201909191652190301-211720e.jar:4.0.1-201909191652190301-211720e]
server.log: at com.dremio.service.coordinator.zk.ZKClusterClient$1$1.call(ZKClusterClient.java:217) [dremio-services-coordinator-4.0.1-201909191652190301-211720e.jar:4.0.1-201909191652190301-211720e]
server.log:2019-09-25 11:50:22,846 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
server.log:2019-09-25 11:50:22,861 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
server.log:2019-09-25 11:51:09,625 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
I see your coordinator is constantly losing ZK connectivity:
> 2020-02-24 02:02:00,838 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
> 2020-02-24 02:02:02,726 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
> 2020-02-24 02:02:02,748 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
This can be either:
#1 Your ZooKeeper is unstable. Check the ZK logs.
#2 Your coordinator is constantly doing garbage collection. Check the server.gc logs in your coordinator's log folder for the same time period.
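To compare the two possibilities, it can help to pull every ZK state change out of server.log with its timestamp and then look at the ZK and GC logs around the same moments. Below is a minimal Python sketch, assuming the log lines look like the excerpts pasted in this thread. The function name `zk_state_changes` is made up for illustration and is not part of Dremio.

```python
import re
from datetime import datetime

# Matches lines like the ones pasted in this thread, e.g.
#   2019-09-25 11:47:41,571 [Curator-ConnectionStateManager-0] INFO ... - ZK connection state changed to LOST
# A leading "server.log:" (grep output) or "> " (quote) prefix is tolerated
# because the pattern is searched for anywhere in the line.
ZK_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*"
    r"ZK connection state changed to (\w+)"
)


def zk_state_changes(lines):
    """Return a list of (timestamp, state) for every ZK state change line."""
    changes = []
    for line in lines:
        m = ZK_LINE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            changes.append((ts, m.group(2)))
    return changes


# Sample input copied from the log excerpt earlier in this thread.
sample = [
    "server.log:2019-09-25 11:47:41,571 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST",
    "server.log:2019-09-25 11:47:41,581 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED",
    "server.log:2019-09-25 11:49:39,098 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED",
]
for ts, state in zk_state_changes(sample):
    print(ts.isoformat(sep=" ", timespec="milliseconds"), state)
```

To run it on a real file, pass the open file object instead of `sample`, for example `zk_state_changes(open("server.log", errors="replace"))`; the path here is only a placeholder.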
I get the same error message, but I cannot find the root of the problem. I have attached my server.log file and my dremio-env file. I would be glad if anyone could help me out. dremio-env.zip (628,6 KB)
@vincent_mayer I see the same pattern, where your ZooKeeper connection is constantly going into a SUSPENDED state. Can you check whether your GC logs show Full GC pauses? You can upload your GC log to https://gceasy.io/ and it will report
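For a quick local look at the GC log before uploading it, here is a rough Python sketch that lists Full GC pauses above a threshold. It assumes the log uses either the JDK 8 style (a `Full GC` entry ending in `<seconds> secs]`) or the JDK 9+ unified logging style (a `Pause Full` entry ending in `<millis>ms`). Which format your server.gc actually uses depends on the JVM and its flags, so treat both patterns as assumptions. `full_gc_pauses` is a made-up helper name, and the 1 second default threshold is arbitrary.

```python
import re

# Assumed JDK 8 style: "[Full GC (<cause>) ..., <seconds> secs]"
JDK8_FULL = re.compile(r"Full GC.*?(\d+\.\d+) secs\]")
# Assumed JDK 9+ unified logging style: "Pause Full (<cause>) ... <millis>ms"
UNIFIED_FULL = re.compile(r"Pause Full.*?(\d+\.\d+)ms")


def full_gc_pauses(lines, min_seconds=1.0):
    """Yield (line_number, pause_seconds) for Full GC pauses >= min_seconds."""
    for n, line in enumerate(lines, start=1):
        m = JDK8_FULL.search(line)
        if m:
            pause = float(m.group(1))  # already in seconds
        else:
            m = UNIFIED_FULL.search(line)
            if not m:
                continue  # not a Full GC line in either assumed format
            pause = float(m.group(1)) / 1000.0  # milliseconds to seconds
        if pause >= min_seconds:
            yield n, pause
```

Calling `list(full_gc_pauses(open("server.gc", errors="replace")))` returns the line numbers and durations of long pauses, which can then be compared with the times at which the ZK connection went SUSPENDED or LOST in server.log.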
Thanks for your help. I have analyzed the server.gc file. Is this the right one?
The website you mentioned has identified memory problems, but I am not able to find any possible solutions in this report. Any ideas? dremio-env.zip (652,6 KB)