Exceeded timeout (30000) while waiting after sending work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes

Hello everyone.
I’m facing this error in Dremio Cloud.

CONNECTION
b3e4344e-fdd5-4bda-a00e-3609c60afa9f.zip (69.6 KB)
ERROR: Exceeded timeout (30000) while waiting after sending work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes

Node(s) that did not respond 10.13.5.198

Do you know what’s happening?

@Filipe.Souza The node at the listed IP did not respond to another Dremio node. This could be an RPC taking too long or a full GC pause.

If you look at the time taken for “Starting” under “State Durations” in the Query tab of the raw profile, you will see it was 30 seconds:

Starting:
30,007ms

Usually this should be less than 100ms. This tells me the executors are very busy: even assigning the work fragments to them takes 30s.
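If you want to check several profiles quickly, here is a rough Python sketch that flags any state whose duration is above a threshold. It assumes you have copied the “State Durations” text out of the profile in the same shape as above (a `Name:` line followed by a value such as `30,007ms`). The function names and the 100ms default are only illustrative and are not part of Dremio.

```python
import re


def parse_state_durations(text):
    """Parse 'Name:' lines followed by a value like '30,007ms' into {name: ms}.

    Assumption: the text was pasted from the profile in that two-line layout.
    """
    durations = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A state name such as 'Starting:'
            current = line[:-1]
            continue
        match = re.fullmatch(r"([\d,]+)\s*ms", line)
        if match and current is not None:
            # Drop the thousands separator before converting
            durations[current] = int(match.group(1).replace(",", ""))
            current = None
    return durations


def slow_states(durations, threshold_ms=100):
    """Return only the states that took longer than threshold_ms."""
    return {name: ms for name, ms in durations.items() if ms > threshold_ms}


if __name__ == "__main__":
    pasted = "Starting:\n30,007ms\n"
    print(slow_states(parse_state_durations(pasted)))
```

With the value from this thread, “Starting” is the state that gets flagged.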

Can you please check whether there was a long GC pause, or whether the executor log shows any WARN messages from the time this query ran?
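For anyone who does have access to the executor log, a small Python sketch like the one below can pull out the WARN lines around the time of the query. The log line format is an assumption (a `YYYY-MM-DD HH:MM:SS,mmm` timestamp at the start of the line, followed by the level); adjust the pattern to whatever your log actually looks like. The function name and the 60-second window are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: '2023-05-01 12:00:03,123 [thread] WARN logger - message'
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s.*?\bWARN\b")


def warn_lines_near(lines, query_start, window_seconds=60):
    """Return WARN lines whose timestamp is within window_seconds of query_start."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Not a WARN line, or not in the assumed timestamp format
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(ts - query_start) <= window:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    sample = [
        "2023-05-01 12:00:03,123 [exec-1] WARN  c.example.Foo - slow response",
        "2023-05-01 12:00:04,000 [exec-1] INFO  c.example.Foo - ok",
    ]
    for hit in warn_lines_near(sample, datetime(2023, 5, 1, 12, 0, 0)):
        print(hit)
```

The sample lines are made up for the demo and are not real Dremio log output.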

Hi @balaji.ramaswamy
In AWS we cannot access the logs of the machines created by Dremio Cloud; we can only check resource usage, as below.

Is this GC configuration parameterized within the machines?
If so, is there any way to make this change?
All machines are managed exclusively by Dremio itself.