Workers in Provisioning or Disconnected


#1

I have an issue with Dremio: very often one or more workers end up in the Provisioning or Disconnected state, and queries then fail with errors like these:

  • Unable to acquire queue resources for query within timeout. Timeout for large queue was set at 300 seconds.

  • Exceeded timeout (45000) while waiting send intermediate work fragments to remote nodes. Sent 9 and only heard response back from 0 nodes

I use Dremio (Build 3.0.1-201811132128360291-804fe82, Community Edition) connected to Hive.

How can I solve this problem?
I can’t stop and restart the workers every day.


#2

Hello @marco,

If queries are reporting these errors, it’s likely because you are submitting jobs when you do not have enough Dremio workers (executors) available, or because the workers are losing connectivity. When your cluster is fully provisioned (that is, all of your available workers are active), can you rerun these queries successfully?
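
To make the "fully provisioned" check concrete, here is a minimal Python sketch of the idea. It is only an illustration: the function names, the `expected_workers` parameter, and the "Active" state label are assumptions made up for this example, not part of any Dremio API. How you collect the list of worker states (for example, by reading them off the provisioning page) is left out.

```python
from collections import Counter

# Hypothetical label for a healthy worker. "Provisioning" and "Disconnected"
# are the states mentioned in this thread; "Active" is an assumed name.
ACTIVE = "Active"


def summarize_workers(states):
    """Count how many workers are in each state."""
    return Counter(states)


def fully_provisioned(states, expected_workers):
    """Return True only when at least expected_workers workers are active."""
    return summarize_workers(states)[ACTIVE] >= expected_workers


# Example: 4 workers expected, but two are not active yet.
observed = ["Active", "Provisioning", "Disconnected", "Active"]
if fully_provisioned(observed, expected_workers=4):
    print("All workers active: rerun the query.")
else:
    print("Cluster not fully provisioned:", dict(summarize_workers(observed)))
```

If the queries succeed whenever this kind of check passes, the problem is the workers dropping out rather than the queries themselves.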