Error setting up remote fragment execution

mystic · September 25, 2019, 11:51am

I did a fresh install of Dremio Community Edition (dremio-community-4.0.1-201909191652190301-211720e) on my Mac.

I created some sources (MySQL, Excel).

I can see the schemas and tables, but every query fails after running for about a minute with this error:
Error setting up remote fragment execution

Any idea of the issue?

I haven’t been able to run a single query so far. I have downloaded the query profile, but who do I send it to, given that I am using the Community Edition?

Thanks

mystic · September 25, 2019, 11:53am

563f1d4c-45cb-4619-b0a2-10c09ddc7112.zip (6.8 KB)

Attached is the query profile

balaji.ramaswamy · September 25, 2019, 2:56pm

@mystic

It looks like there was a problem with node “au10739”, which is acting as both coordinator and executor. Kindly check the Dremio logs on that server for any error messages. Also look at /var/log/messages or dmesg for any errors.

Thanks
@balaji.ramaswamy

mystic · September 25, 2019, 7:49pm

@balaji.ramaswamy Dremio is running locally on my Mac laptop. It was installed from the tar file available for download. au10739 is the name of my laptop.

balaji.ramaswamy · September 25, 2019, 8:02pm

Check the Dremio log on your local install to see if you find any errors.

mystic · September 25, 2019, 8:21pm

Thanks @balaji.ramaswamy. I checked the server.log file and it has the following error:

2019-09-25 20:15:16,125 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (19248)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225) ~[curator-client-2.12.0.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-2.12.0.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117) ~[curator-client-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) ~[curator-framework-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) ~[curator-framework-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) ~[curator-framework-2.12.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) ~[curator-framework-2.12.0.jar:na]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:835) ~[na:na]
2019-09-25 20:15:16,406 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
2019-09-25 20:15:16,432 [Curator-ServiceCache-0] INFO c.d.s.c.TaskLeaderStatusListener - New Leader node for task MASTER au10739:45678 registered itself

mystic · September 25, 2019, 8:31pm

Also, should I look for anything specific on dmesg?

I tried the following and nothing came up:

sudo dmesg | grep -i "dremio"

sudo dmesg | grep -i "error"

balaji.ramaswamy · September 25, 2019, 8:43pm

Looks like you lost the connection to ZooKeeper. Does this happen every time? What is the total RAM on your Mac?

at java.base/java.lang.Thread.run(Thread.java:835) ~[na:na]
2019-09-25 20:15:16,406 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
2019-09-25 20:15:16,432 [Curator-ServiceCache-0] INFO c.d.s.c.TaskLeaderStatusListener - New Leader node for task MASTER au10739:45678 registered itself
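
One simple thing to rule out is whether anything is listening on the ZooKeeper address from the connection string in the log (localhost:2181). A minimal Python sketch, assuming only that host and port; the helper name `zk_port_open` is made up for illustration, and an open port does not prove that ZooKeeper itself is healthy:

```python
import socket


def zk_port_open(host="localhost", port=2181, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `zk_port_open()` while Dremio is running should return True if the port accepts connections. If it returns False, the connection loss is probably not just a slow response.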

mystic · September 25, 2019, 10:40pm

8 GB, and I haven’t had a successful query via Dremio yet. The MySQL database I connect to via Dremio runs in a Docker container. I can connect to it directly via MySQL Workbench and query it without any problem.

balaji.ramaswamy · September 25, 2019, 10:57pm

Please send me the dremio-env file from the conf folder.

david.lee · September 25, 2019, 11:25pm

I’m seeing the same problems with our Linux tar install after we upgraded to 4.0.0. We didn’t change anything in the configuration file. Are there new defaults or settings in 4.0.0 that the upgrade process may have missed?

Eventually the Master Coordinator loses touch with ZK and the Dremio server crashes.

server.log:2019-09-25 11:47:41,571 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
server.log:2019-09-25 11:47:41,581 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
server.log:2019-09-25 11:49:39,098 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
server.log:     at com.dremio.service.coordinator.zk.ZKClusterClient$1$1.call(ZKClusterClient.java:233) [dremio-services-coordinator-4.0.1-201909191652190301-211720e.jar:4.0.1-201909191652190301-211720e]
server.log:     at com.dremio.service.coordinator.zk.ZKClusterClient$1$1.call(ZKClusterClient.java:217) [dremio-services-coordinator-4.0.1-201909191652190301-211720e.jar:4.0.1-201909191652190301-211720e]
server.log:2019-09-25 11:50:22,846 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
server.log:2019-09-25 11:50:22,861 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED
server.log:2019-09-25 11:51:09,625 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
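
To get a feel for how often the connection flaps, the state-change lines can be tallied. This is a rough Python sketch that assumes the log lines look like the ones quoted above; the function name `count_zk_states` and the `server.log` path in the usage note are placeholders, not anything Dremio provides:

```python
import re
from collections import Counter

# Matches the ZKClusterClient state-change messages quoted in this thread.
STATE_RE = re.compile(r"ZK connection state changed to (\w+)")


def count_zk_states(lines):
    """Count how many times each ZK connection state appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For a real log this could be called as `count_zk_states(open("server.log"))`, with the path adjusted to wherever the log actually lives.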

balaji.ramaswamy · September 25, 2019, 11:26pm

@david.lee

Can you please send your server.log?

mystic · September 26, 2019, 12:42am

@balaji.ramaswamy - dremio-env file attached

dremio-env.zip (1.6 KB)

mystic · September 26, 2019, 2:57am

@balaji.ramaswamy - All good now. Seems like it was a Java version issue.

Worked with Java 8
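
For anyone else checking this: the `java.base/` prefix on the frames in the stack trace above only appears on Java 9 or later, so the failing run was not on Java 8. Below is a small Python sketch for turning the quoted version string printed by `java -version` into a major version number. The function name is hypothetical, and it only handles the common `1.x` and `N.x.y` forms:

```python
import re


def java_major_version(version):
    """Map a Java version string to its major version, e.g. '1.8.0_221' -> 8."""
    parts = re.split(r"[._\-+]", version.strip())
    major = int(parts[0])
    # Releases before Java 9 report themselves as 1.x (1.8 means Java 8).
    if major == 1 and len(parts) > 1:
        return int(parts[1])
    return major
```

If this returns anything other than 8 for the Java that Dremio 4.0.x starts with, switching to Java 8 is worth trying, since that is what resolved it here.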

christopherrbyrd · February 24, 2020, 5:00am

I have the same issue, but I’m using the dremio/dremio-oss Docker container (v.18.09.3). Here’s a copy of the server log:

dremio.log.zip (36.6 KB)

Thanks

balaji.ramaswamy · February 24, 2020, 6:38am

@christopherrbyrd

I see your coordinator is constantly losing ZK connectivity:

> 2020-02-24 02:02:00,838 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to SUSPENDED
> 2020-02-24 02:02:02,726 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
> 2020-02-24 02:02:02,748 [Curator-ConnectionStateManager-0] INFO  c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED

This can be either:

#1 Your ZooKeeper is unstable - check the ZK logs
#2 Your coordinator is constantly doing garbage collection - check the server.gc logs in your coordinator's log folder for the same time window

Thanks
Bali
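
To know which time windows to look up in the ZK or GC logs, the disconnect periods can be pulled out of server.log. A minimal Python sketch, assuming the timestamp and message format of the lines quoted above; `disconnect_windows` is an illustrative name, and a SUSPENDED or LOST with no later RECONNECTED is simply ignored:

```python
import re
from datetime import datetime

# Timestamp plus the ZKClusterClient state-change message, as quoted in this thread.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*ZK connection state changed to (\w+)"
)


def disconnect_windows(lines):
    """Return (start, end) datetimes from the first SUSPENDED/LOST to the next RECONNECTED."""
    windows = []
    start = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        state = match.group(2)
        if state in ("SUSPENDED", "LOST") and start is None:
            start = stamp  # connection dropped: open a window
        elif state == "RECONNECTED" and start is not None:
            windows.append((start, stamp))  # connection restored: close the window
            start = None
    return windows
```

Each returned pair gives a start and end time that can be compared against pauses reported in the GC log for the same period.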

vincent_mayer · March 4, 2023, 6:42pm

I get the same error message but cannot find the root of the problem. I have attached my server.log file and my dremio-env file. I would be glad if anyone could help me out.
dremio-env.zip (628,6 KB)

balaji.ramaswamy · March 6, 2023, 4:44am

@vincent_mayer I see the same pattern, where your ZooKeeper connection is constantly going into a SUSPENDED state. Can you check whether your GC logs show Full GC pauses? You can upload your GC log to https://gceasy.io/ and it will report

vincent_mayer · March 6, 2023, 9:56am

Thanks for your help. I have analyzed the server.gc file. Is this the right one?

The website you mentioned has identified memory problems. But I am not able to retrieve any possible solutions from this report. Any ideas?
dremio-env.zip (652,6 KB)

balaji.ramaswamy · March 10, 2023, 6:40am

@vincent_mayer I see the GC algorithm you are using is very old. Can you send the output of the command below? What version of Dremio is this?

ps -ef | grep dremio
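
The part of that output that matters for the GC question is the collector flag on the Java command line. As a rough Python sketch, assuming the JVM was started with one of the usual `-XX:+Use...GC` options (the function name and the sample command line in the usage note are invented for illustration):

```python
import re

# HotSpot collector selection flags look like -XX:+UseG1GC or -XX:+UseParallelGC.
GC_FLAG_RE = re.compile(r"-XX:\+(Use\w+GC)\b")


def gc_flags(command_line):
    """Return the collector flags (without the -XX:+ prefix) found in a JVM command line."""
    return GC_FLAG_RE.findall(command_line)
```

For example, `gc_flags("java -Xmx4096m -XX:+UseG1GC -cp app.jar Main")` returns `["UseG1GC"]`. An empty list means no collector was set explicitly, so the JVM default for that Java version is in use.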
