ExecutionSetupException: "One or more nodes lost connectivity during query" during reflection creation

Hi,

I am getting "ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were ." This is happening in the latest Dremio versions, 24.3.2 and 25.0. It occurs for both reflection jobs and regular queries.

I have already gone through the Dremio discussion on “ExecutionSetupException: One or more nodes lost connectivity during query” and enabled the garbage collection properties and logs, but I am not getting anything useful from the logs.

I have also enabled metrics for Dremio. Neither direct memory nor heap memory is reaching its maximum, so I have not been able to find anything useful so far.

I am deploying Dremio in a k8s cluster with the latest chart.

Please provide proper steps to debug the issue.

@aaasif04 This happens when either the executor has a long GC pause or too many queries cause the executor to back up.
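To check the first cause, long GC pauses, a small script can pull the slow pauses out of a GC log. This is only a rough sketch: it assumes the GC log uses the JVM unified logging format (JDK 9+), where pause lines contain the word "Pause" and end with a duration such as `12.345ms`. The function name `long_gc_pauses` and the 1000 ms threshold are made up for illustration, so adjust both if your logs look different.

```python
import re

# Assumption: pause lines look like
#   [..][info][gc] GC(2) Pause Full (Allocation Failure) 1000M->900M(1024M) 2500.500ms
# i.e. they contain "Pause" and end with "<number>ms".
PAUSE_RE = re.compile(r"Pause.*?(\d+(?:\.\d+)?)ms\s*$")


def long_gc_pauses(lines, threshold_ms=1000.0):
    """Return (duration_ms, line) for every GC pause at or above threshold_ms."""
    slow = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match is None:
            continue  # not a pause line (e.g. a concurrent phase)
        duration_ms = float(match.group(1))
        if duration_ms >= threshold_ms:
            slow.append((duration_ms, line.rstrip()))
    return slow
```

It can be run over an open log file, for example `long_gc_pauses(open("gc.log"))`, where `gc.log` is a placeholder for whatever file name your GC logging options write to.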

Can you please provide the server.log and all GC logs from the executor? Its name or IP should be printed in the error message. Please also send the profile or Job ID so we can search the logs. Send the server.log that covers the time the query ran. To find it, do the following:

Log on to the executor (IP or name) printed in the error message
cd /archive

zgrep <jobID> server*
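
If zgrep is not available in the executor container, the same search can be sketched in Python. This is a minimal example, not a Dremio tool: the function name `find_job_lines` is hypothetical, and it assumes the archived logs are gzip files ending in `.gz` while the current log is plain text.

```python
import glob
import gzip
import os


def find_job_lines(log_dir, job_id, pattern="server*"):
    """Return (file name, line) pairs for every log line that mentions job_id."""
    matches = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # Assumption: archived logs are gzip-compressed, current logs are plain text.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if job_id in line:
                    matches.append((os.path.basename(path), line.rstrip("\n")))
    return matches
```

Call it with the archive directory and the Job ID from the failed job, for example `find_job_lines("/archive", "<jobID>")`, and send the server log files it reports.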