My master is running, but I am able to connect only one executor. The other executors throw the following error:
Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
com.dremio.common.exceptions.UserException: The source ["__jobResultsStore"] is currently unavailable. Info: [[Message{level=ERROR, msg=Failure to create directory /data/DremioData/pdfs/results.}]].
My path setting in dremio.conf is
paths: {
# the local path for dremio to store data.
local: "/data/DremioData/"
If I stop executor 1 and start executor 2 with the same settings, it works. But if I then start executor 1, it throws the same error.
Did you check that you have the right permissions on the /data/DremioData directory, so that the user that starts Dremio can write to it?
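One way to verify this on each node is to try creating a file in the directory as the user that runs Dremio. The following is only a minimal sketch: the helper name `can_write_to` is made up for illustration, and the path is the one from the dremio.conf quoted in this thread.

```python
import os
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if the current user can create a file inside `directory`.

    Hypothetical helper for illustration; not part of Dremio.
    """
    if not os.path.isdir(directory):
        return False
    try:
        # Creating a real temporary file is a more direct test than
        # os.access(), which can be misleading on some filesystems.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Path taken from the dremio.conf in this thread.
    print(can_write_to("/data/DremioData"))
```

Run it as the same OS user that starts the Dremio service, on every executor node, since permissions can differ from node to node.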
Yes, I have full permissions, and it works when I run only the first executor. It's always the second executor that fails. If I change the sequence, then again the last executor fails with the same error.
Last or second? Did you try three executors?
3 nodes: the first one is master and coordinator,
the 2nd is the first executor,
the 3rd is the 2nd executor.
So I am able to connect only 1 executor (irrespective of sequence).
Do you think paths.local can be the reason behind it, since this setting doesn't exist on the executor?
executor config
services: {
coordinator.enabled: false,
coordinator.master.enabled: false,
executor.enabled: true
}
zookeeper: "xx.xxx.xx.xx:2181"
Your coordinator/master and executor dremio.conf should be pretty much the same except roles such as:
How many executors have you tested in your environment with a single master?
We support as many as needed.
The reason I was asking about three is to see whether it is the second executor that is failing for you or the last one. When you have two executors, second == last, so it is hard to differentiate.
OK, so actually only one executor works, irrespective of the order in which the executors are started.
Do I need to use paths.dist instead of paths.local?
By default, paths.dist is derived from paths.local, like: pdfs://"${paths.local}"/pdfs
What’s your dremio.conf on both executors?
same setting on both executors
paths: {
# the local path for dremio to store data.
local: "/data/DremioData/"
# the distributed path Dremio data including job results, downloads, uploads, etc
dist: "pdfs://"${paths.local}"/pdfs"
}
services: {
coordinator.enabled: false,
coordinator.master.enabled: false,
executor.enabled: true
}
zookeeper: "xx.xxx.236.41:2181"
It is really very strange. Could you look at:
- server.out on failing executor node
- server.log on master coordinator
To see if there are more clues there.
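If the logs are large, a small script can narrow them down to the lines that name a Java exception. This is a rough sketch rather than a Dremio tool: `find_exceptions` and the regular expression are assumptions chosen for illustration, and the location of server.out and server.log depends on your installation.

```python
import re

# Matches Java-style class names ending in "Exception" or "Error",
# e.g. "java.net.UnknownHostException" (case-sensitive on purpose).
EXCEPTION_PATTERN = re.compile(r"\w+(?:Exception|Error)\b")


def find_exceptions(log_lines):
    """Return (line_number, line) pairs for lines that name an exception.

    Hypothetical helper for illustration; line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if EXCEPTION_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

For example, `find_exceptions(open("server.out"))` would list the matching lines of a server.out file in the current directory.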
Great. As you suggested, I checked the executor logs and found that the executors were not able to see each other due to a DNS issue. The error in the executor 1 log was:
java.net.UnknownHostException: executor2
To work around the DNS issue, I updated /etc/hosts on all nodes (master and all executors), adding all the IPs, and then restarted the master and the executors one by one. Now all the executors are working fine and visible in the Dremio UI.
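For anyone hitting the same UnknownHostException, a quick check is to confirm that every node can resolve every other node's hostname before starting Dremio. Below is a minimal sketch using Python's standard `socket` module. The function name `unresolved_hosts` is made up, and apart from `executor2` (which appears in the log above) the hostnames in the usage note are placeholders for your own.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that this node cannot resolve to an IP address.

    Hypothetical helper for illustration. `resolve` defaults to the system
    resolver, which consults /etc/hosts and DNS as configured on the node.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (a subclass of OSError) is raised when
            # the name cannot be resolved.
            failed.append(name)
    return failed
```

Calling `unresolved_hosts(["master", "executor1", "executor2"])` on each node should return an empty list once /etc/hosts (or DNS) is correct everywhere.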
Thanks a lot for all the support