ZooKeeper's role in Dremio

As per the documentation, Dremio utilizes Apache ZooKeeper behind the scenes for cluster coordination.
In a Kubernetes deployment, if the master coordinator goes down, the Kubernetes infrastructure is responsible for Dremio HA.
So what is ZooKeeper used for in Dremio?

@Ayush.goyal Even if we do not need HA in the K8s world, ZK is what connects the coordinator and the executors. ZK also constantly checks the coordinator's heartbeat. For example, if the coordinator is stuck in consecutive full GCs, ZK will time out, the node will lose master status and go down, and K8s will immediately bring up another pod as the master.
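
To picture the heartbeat check described above, here is a minimal toy model. It is not Dremio or ZooKeeper code; the function name, the 10 s heartbeat interval, and the sample timestamps are made up for illustration. It only shows the idea that a long stall between heartbeats (for example, back-to-back full GCs) exceeds the session timeout.

```python
def session_expired(heartbeat_times, session_timeout):
    """Return True if any gap between consecutive heartbeats exceeds the timeout.

    heartbeat_times: ascending timestamps (seconds) at which a heartbeat arrived.
    session_timeout: maximum allowed silence (seconds) before the session is dropped.
    """
    gaps = (
        later - earlier
        for earlier, later in zip(heartbeat_times, heartbeat_times[1:])
    )
    return any(gap > session_timeout for gap in gaps)


# Hypothetical coordinator sending a heartbeat every 10 s.
healthy = [0, 10, 20, 30, 40]

# Same coordinator, but frozen for 45 s between t=20 and t=65
# (standing in for consecutive full GCs).
stalled = [0, 10, 20, 65, 75]

print(session_expired(healthy, session_timeout=30))  # False
print(session_expired(stalled, session_timeout=30))  # True
```

In the second case the node would be treated as gone even though the process itself is still running, which is why it loses master status.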

@balaji.ramaswamy Is there any documentation on how the communication between the master and the executors happens?

Does the executor register its cluster IP with ZooKeeper, and do the master and the coordinators then get these details from ZooKeeper and call the executor?

@unni Yes, ZK does the tracking. Also, when an executor does not respond within the ZK timeout (30s), you will get a ZK SUSPENDED, which will cause queries to be cancelled.
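
As a rough sketch of the tracking described above, the registry below keeps an entry per executor only while heartbeats keep arriving within the timeout. This is an in-memory stand-in, not ZooKeeper's or Dremio's API; `ToyRegistry`, its methods, and the executor names are hypothetical, and only the 30 s timeout comes from this thread.

```python
class ToyRegistry:
    """In-memory stand-in for a coordination service that tracks live nodes."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self._last_seen = {}  # node name -> time of last heartbeat (seconds)

    def register(self, node, now):
        """Record a node as alive as of time `now`."""
        self._last_seen[node] = now

    def heartbeat(self, node, now):
        """Refresh a node's entry; unknown nodes must register again first."""
        if node in self._last_seen:
            self._last_seen[node] = now

    def live_nodes(self, now):
        """Drop entries that have been silent longer than the timeout, then list the rest."""
        self._last_seen = {
            node: seen
            for node, seen in self._last_seen.items()
            if now - seen <= self.session_timeout
        }
        return sorted(self._last_seen)


registry = ToyRegistry(session_timeout=30)
registry.register("executor-0", now=0)
registry.register("executor-1", now=0)

# Only executor-0 keeps responding.
registry.heartbeat("executor-0", now=25)

# At t=40, executor-1 has been silent for 40 s and is dropped.
print(registry.live_nodes(now=40))  # ['executor-0']
```

A coordinator reading this list would no longer see `executor-1`, which is the toy equivalent of the executor dropping out of the cluster view.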

Hi @balaji.ramaswamy, is it possible to mitigate the full GCs that cause the coordinator to become stuck?

@borasy Once a full GC happens, we do not have any options. But we should find out why the full GC is happening and avoid it.

Are you able to send your GC logs so we can look at them and see the root cause?