Error setting up remote fragment execution

@balaji.ramaswamy

I did a fresh install of Dremio Community Edition on my Mac (dremio-community-4.0.1-201909191652190301-211720e).

I created some sources (MySQL, Excel).

I can see the schemas and tables, but every query fails after running for about a minute with this error:
Error setting up remote fragment execution

Any idea of the issue?

I haven’t been able to run a single query so far. I have downloaded the query profile, but who do I send it to, given that I am using the Community Edition?

Thanks

563f1d4c-45cb-4619-b0a2-10c09ddc7112.zip (6.8 KB)

Attached is the query profile

@mystic

It looks like there was a problem with node “au10739”, which is acting as both coordinator and executor. Please check the Dremio logs on that server for error messages. Also look at /var/log/messages or dmesg for any errors.

Thanks
@balaji.ramaswamy

@balaji.ramaswamy Dremio is running locally on my Mac laptop. It was installed from the tar file available for download. au10739 is the hostname of my laptop.

Check the Dremio log on your local install to see if you find any errors.

Thanks @balaji.ramaswamy. I checked the server.log file and it has the following error:

2019-09-25 20:15:16,125 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (19248)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225) ~[curator-client-2.12.0.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-2.12.0.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117) ~[curator-client-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) ~[curator-framework-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) ~[curator-framework-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) ~[curator-framework-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) ~[curator-framework-2.12.0.jar:na]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:835) ~[na:na]
2019-09-25 20:15:16,406 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
2019-09-25 20:15:16,432 [Curator-ServiceCache-0] INFO c.d.s.c.TaskLeaderStatusListener - New Leader node for task MASTER au10739:45678 registered itself

Also, should I look for anything specific in dmesg?

I tried the following and nothing came up:

sudo dmesg | grep -i "dremio"

sudo dmesg | grep -i "error"

Looks like you lost the connection to ZooKeeper. Does this happen every time? What is the total RAM on your Mac?


8 GB, and I haven’t had a single successful query via Dremio so far. The MySQL database I am connecting to via Dremio runs in a Docker container. I can connect to the database directly via MySQL Workbench and query it without any problem.

Please send me the dremio-env file from your conf folder.

I’m seeing the same problems with our Linux tar install after we upgraded to 4.0.0. We didn’t change anything in the configuration file. Are there new defaults or settings in 4.0.0 that the upgrade process may have missed?

Eventually the Master Coordinator loses touch with ZK and the Dremio server crashes.

server.log:2019-09-25 11:47:41,571 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
server.log:2019-09-25 11:47:41,581 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
server.log:2019-09-25 11:49:39,098 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
server.log:     at com.dremio.service.coordinator.zk.ZKClusterClient$1$1.call(ZKClusterClient.java:233) [dremio-services-coordinator-4.0.1-201909191652190301-211720e.jar:4.0.1-201909191652190301-211720e]
server.log:     at com.dremio.service.coordinator.zk.ZKClusterClient$1$1.call(ZKClusterClient.java:217) [dremio-services-coordinator-4.0.1-201909191652190301-211720e.jar:4.0.1-201909191652190301-211720e]
server.log:2019-09-25 11:50:22,846 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
server.log:2019-09-25 11:50:22,861 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
server.log:2019-09-25 11:51:09,625 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
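A grep excerpt like the one above can be tallied to see how often the connection drops. The following is only a rough sketch, not part of Dremio: the helper name `zk_state_changes` is hypothetical, and the regular expression assumes the log line layout shown in this thread (with or without the `server.log:` grep prefix).

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the excerpt in this thread:
# 2019-09-25 11:47:41,571 [...] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
STATE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"ZK connection state changed to (\w+)"
)


def zk_state_changes(lines):
    """Return (timestamp, state) pairs for every ZK state change found."""
    changes = []
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            changes.append((ts, match.group(2)))
    return changes


# Lines copied from the excerpt above (stack frames omitted).
sample = [
    "server.log:2019-09-25 11:47:41,571 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST",
    "server.log:2019-09-25 11:47:41,581 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED",
    "server.log:2019-09-25 11:49:39,098 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED",
    "server.log:2019-09-25 11:50:22,846 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST",
    "server.log:2019-09-25 11:50:22,861 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED",
    "server.log:2019-09-25 11:51:09,625 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED",
]

changes = zk_state_changes(sample)
counts = Counter(state for _, state in changes)
for state, n in sorted(counts.items()):
    print(f"{state}: {n}")

# To scan a whole log instead, pass an open file:
#   with open("server.log") as f:
#       changes = zk_state_changes(f)
```

The timestamps are kept alongside each state so they can be lined up with other logs for the same time window.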

@david.lee

Can you please send your server.log?

@balaji.ramaswamy - dremio-env file attached

dremio-env.zip (1.6 KB)

@balaji.ramaswamy - All good now. Seems like it was a Java version issue.

Worked with Java 8
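For anyone else hitting this: the `java.base/` prefix on the stack frames in the server.log excerpt earlier in the thread only appears on Java 9 or later, which fits with the fix being a switch to Java 8. Below is a small sketch for reading the major version out of `java -version` output. The function name `java_major_version` is made up for illustration, and the sample version strings are examples, not output from this thread.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_221" -> 8) and the scheme
    used since Java 9 ("12.0.2" -> 12).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy versions are written as 1.<major>.
    if first == 1 and second is not None:
        return int(second)
    return first


# Illustrative inputs (assumed examples, not taken from the thread).
print(java_major_version('java version "1.8.0_221"'))
print(java_major_version('openjdk version "12.0.2" 2019-07-16'))

# On a real machine the text could be captured like this; note that
# `java -version` writes to stderr rather than stdout:
#   import subprocess
#   out = subprocess.run(["java", "-version"],
#                        capture_output=True, text=True).stderr
#   print(java_major_version(out))
```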

I have the same issue. However, I’m using dremio/dremio-oss docker container (v.18.09.3). Here’s a copy of the server log:

dremio.log.zip (36.6 KB)

Thanks

@christopherrbyrd

I see your coordinator is constantly losing ZK connectivity:

> 2020-02-24 02:02:00,838 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
> 2020-02-24 02:02:02,726 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
> 2020-02-24 02:02:02,748 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED 

This can be caused by either of the following:

#1 Your ZooKeeper is unstable: check the ZK logs
#2 Your coordinator is constantly doing garbage collection: check the server.gc logs in your coordinator’s log folder for the same time period

Thanks
Bali

I get the same error message but cannot find the root cause. I have attached my server.log file and my dremio-env file. I’d be glad for any help.
dremio-env.zip (628,6 KB)

@vincent_mayer I see the same pattern, where your ZooKeeper connection is constantly going into a SUSPENDED state. Can you check whether your GC logs show Full GC pauses? You can upload your GC log to https://gceasy.io/ and it will generate a report.
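As a quick local check before uploading anything, a few lines of Python can list the long Full GC pauses. This is only a sketch under assumptions: it expects the Java 8 style GC log written with `-XX:+PrintGCDetails` and date stamps (Java 9+ unified logging looks different), the sample lines are made up for illustration and are not from anyone's log in this thread, and the 1-second threshold and the name `long_full_gc_pauses` are arbitrary choices.

```python
import re

# Wall-clock pause time as printed in "[Times: user=... sys=..., real=... secs]".
REAL_RE = re.compile(r"real=(\d+\.\d+) secs")


def long_full_gc_pauses(lines, threshold_secs=1.0):
    """Return (datestamp, seconds) for Full GC lines at or above the threshold."""
    pauses = []
    for line in lines:
        if "Full GC" not in line:
            continue
        match = REAL_RE.search(line)
        if match and float(match.group(1)) >= threshold_secs:
            # The date stamp is everything before the first ": ".
            datestamp = line.split(": ", 1)[0]
            pauses.append((datestamp, float(match.group(1))))
    return pauses


# Made-up sample lines in the assumed Java 8 format.
sample = [
    "2020-01-01T10:00:01.000+0000: 10.100: [GC (Allocation Failure) [PSYoungGen: 1024K->128K(2048K)] 4096K->3200K(10240K), 0.0101230 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]",
    "2020-01-01T10:00:05.000+0000: 14.100: [Full GC (Ergonomics) [PSYoungGen: 128K->0K(2048K)] [ParOldGen: 8000K->7900K(8192K)] 8128K->7900K(10240K), [Metaspace: 2000K->2000K(1056768K)], 2.5012345 secs] [Times: user=4.10 sys=0.02, real=2.50 secs]",
    "2020-01-01T10:00:09.000+0000: 18.100: [Full GC (Ergonomics) [PSYoungGen: 64K->0K(2048K)] [ParOldGen: 7900K->7000K(8192K)] 7964K->7000K(10240K), [Metaspace: 2000K->2000K(1056768K)], 0.2001234 secs] [Times: user=0.30 sys=0.00, real=0.20 secs]",
]

for datestamp, secs in long_full_gc_pauses(sample):
    print(f"{datestamp}  Full GC pause of {secs:.2f}s")
```

Pauses found this way can then be compared against the times at which server.log reports the ZK connection going to SUSPENDED or LOST.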

Thanks for your help. I have analyzed the server.gc file. Is this the right one?

The website you mentioned has identified memory problems, but I can’t work out any possible solutions from the report. Any ideas?
dremio-env.zip (652,6 KB)

@vincent_mayer I see that the GC algorithm in use is very old. Can you send the output of the command below? Which version of Dremio is this?

ps -ef | grep dremio