Dremio-master-coordinator unable to access volume cause CrashLoopBackOff

Nicolas-Malgat · December 20, 2022, 4:40pm

Hello everyone,
I am trying to deploy Dremio on Azure Kubernetes Service. I have already modified values.yaml to fit my cluster capacity, and the zookeeper and executor pods are running without problems.
I don’t really know what’s going on, but I have collected a few clues.
As you may have guessed, my pod dremio-master-0 crashes when dremio-master-coordinator tries to access /opt/dremio/data:

java.io.IOException: path /opt/dremio/data is not writable.
        at com.dremio.dac.daemon.PathUtils.checkWritePath(PathUtils.java:58)
        at com.dremio.dac.daemon.DACDaemon.<init>(DACDaemon.java:159)
        at com.dremio.dac.daemon.DACDaemon.newDremioDaemon(DACDaemon.java:316)
        at com.dremio.dac.daemon.DACDaemon.newDremioDaemon(DACDaemon.java:324)
        at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:103)

I also get an error when the coordinator tries to write to the /opt/dremio/log/ folder; this is the fatal error for the pod:

at java.io.FileNotFoundException: /opt/dremio/log/hive.deprecated.function.warning.log (No such file or directory)

So I inspected the pod and found that the initContainer “upgrade-task” returns the log below.
I don’t think this is normal behaviour:

Database not found. Skipping upgrade.

I tried to work around the problem by running the container as root (user 0):

- name: dremio-master-coordinator
  image: {{ $.Values.image }}:{{ $.Values.imageTag }}
  imagePullPolicy: IfNotPresent
  securityContext:
    runAsUser: 0

This gives me multiple errors.

This is where I am now. I don’t understand how to fix this, but maybe you have encountered one of these errors.

balaji.ramaswamy · December 22, 2022, 7:43am

@Nicolas-Malgat Why is /opt/dremio/data not writable? Or is it a false message?

Nicolas-Malgat · December 22, 2022, 10:14am

Okay, I had edited a few files in the Dremio chart to follow older installation documentation that is not relevant for me. I’m now back to the original GitHub chart with my custom values.yaml.

The FileNotFoundException occurred because I had removed the following initContainers: wait-for-zookeeper and chown-data-directory.
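For anyone hitting the same “not writable” message, a quick way to see who owns the data directory and whether the current user can write to it is sketched below. This is only a diagnostic sketch, not Dremio’s own check; it assumes Python is available wherever you run it (it may not be inside the Dremio image), and the function names are made up for illustration.

```python
import os
import tempfile


def describe_write_access(path):
    """Return ownership and writability details for `path` as seen by this process."""
    info = os.stat(path)
    return {
        "path": path,
        "owner_uid": info.st_uid,
        "owner_gid": info.st_gid,
        "mode": oct(info.st_mode & 0o777),
        "process_uid": os.getuid(),
        # os.access evaluates the permission bits for this process's real uid/gid.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


def can_create_file(path):
    """Try to actually create a temporary file in `path`; True on success."""
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False
```

Calling `describe_write_access("/opt/dremio/data")` as the same user the coordinator runs as shows whether the owner uid of the volume differs from the process uid, which is the situation a chown step is meant to fix.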

The “Illegal base64 character 5f” error is caused by an underscore character (0x5f is “_”). It happened because I copy-pasted the core-site.xml here.
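A small check like the one below can catch this before deploying. It is a sketch that assumes the value being checked is a standard Base64 string (as a storage access key is); the function names are hypothetical, and the hex code it reports is in the same form as the “5f” in the error message.

```python
import base64
import string

# Characters allowed in standard (non-URL-safe) Base64, including padding.
STANDARD_B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def illegal_base64_chars(value):
    """Return (index, character, hex code) for each character outside the alphabet."""
    return [
        (i, ch, format(ord(ch), "x"))
        for i, ch in enumerate(value)
        if ch not in STANDARD_B64_ALPHABET
    ]


def is_valid_base64(value):
    """True if `value` decodes as strict standard Base64."""
    try:
        base64.b64decode(value, validate=True)
        return True
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
```

A placeholder such as `ACCESS_KEY` left in a copied config would be flagged by `illegal_base64_chars` at the position of the underscore.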

I encountered an additional problem: authentication failed on my Azure Data Lake Storage Gen2 (ADLS Gen2) account because I used the wrong access key.

Caused by: java.lang.RuntimeException: {"error":{"code":"AuthenticationFailed","message":"Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:bdef46e3-a01f-0050-76e5-15dc12000000\nTime:2022-12-22T09:12:59.0961111Z","detail":
{"AuthenticationErrorDetail":
"The MAC signature found in the HTTP request 'AmSVQNkzlS90LXdYqISo8kQDAofFp5LleihGmwHHuq4=' is not the same as any computed signature. Server used following string to sign: 'GET\n\n\n\n\n\n\n\n\n\n\n\nx-ms-client-request-id:86ff6f9b-9ada-4d36-8c78-0c55a4d7a3ca\nx-ms-date:Thu, 22 Dec 2022 09:12:59 GMT\nx-ms-version:2019-07-07\n\/dlsdremiodev\/\ncontinuation:\nmaxresults:100\nresource:account'."}}}
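To illustrate why a wrong access key produces this “MAC signature ... is not the same as any computed signature” message: with shared-key authentication the client signs a string with HMAC-SHA256 using the Base64-decoded account key, and the server recomputes the signature with the key it has on record. The sketch below uses made-up keys and a shortened string-to-sign, so it is an illustration of the mechanism rather than a working Azure client.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign, account_key_b64):
    """Return Base64(HMAC-SHA256(base64-decoded key, UTF-8 string-to-sign))."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical keys for illustration only; these are not real account keys.
correct_key = base64.b64encode(b"correct-key-bytes").decode("ascii")
wrong_key = base64.b64encode(b"another-key-bytes").decode("ascii")

# Shortened, illustrative string-to-sign (the real one has more fields).
string_to_sign = "GET\n\nx-ms-date:Thu, 22 Dec 2022 09:12:59 GMT\n/account/\nresource:account"

client_signature = sign(string_to_sign, wrong_key)     # what the client sends
server_signature = sign(string_to_sign, correct_key)   # what the server computes
signatures_match = hmac.compare_digest(client_signature, server_signature)
```

With different keys `signatures_match` is `False` even though the signed string is identical, which is why the fix is on the key side and not in the request itself.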

balaji.ramaswamy · December 22, 2022, 3:39pm

@Nicolas-Malgat Are we all clear now, or do we still have any open issues? Sorry, just asking so we can help.

Nicolas-Malgat · December 23, 2022, 10:16am

I have new problems; maybe they need a separate post.

My main problem right now is that I can’t connect to the service: there is no reaction from the pods when I try to reach the public IP on port 9047.

In the logs I got a few messages that you have already answered on community.dremio.com:

“Could not find session with sessionId”
For this one I connected to my master pod’s shell, but netstat is not installed and I couldn’t find out how to become superuser to install it.
“Master coordinator is down”
I tried to access the logs and added the configuration below under “env:” after line 81 of dremio-master.yaml:

        - name: DREMIO_LOG_TO_CONSOLE
          value: "0"
        - name: DREMIO_LOG_DIR
          value: "/opt/dremio/data/log"

but the logs are still written to /opt/dremio/log/:

java.io.FileNotFoundException: /opt/dremio/log/server.log (No such file or directory)

nicolas_logs.zip (32.7 KB)

I am attaching a short and a full version of my kubectl logs to help. The short version is a grep for the WARN and Exception keywords. I also collected logs on a more powerful cluster to test whether capacity was the problem; those files are labeled “POWER”.
Some liveness probe and readiness probe pod events occur, but I think there is nothing to worry about since they don’t show up in my “powerful cluster” test.

balaji.ramaswamy · December 26, 2022, 1:35am

@Nicolas-Malgat Did you see the error below? This is from the executor:

java.util.concurrent.ExecutionException: java.net.UnknownHostException: dremio-master-0.dremio-cluster-pod.default.svc.cluster.local: Name or service not known
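A quick way to confirm whether a name like this resolves from a given machine or pod is a plain DNS lookup. The sketch below assumes Python is available where it runs, and the function name is made up for illustration.

```python
import socket


def resolves(hostname):
    """Return the sorted addresses `hostname` resolves to, or [] if lookup fails."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Same class of failure as "Name or service not known".
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({entry[4][0] for entry in results})
```

Running `resolves("dremio-master-0.dremio-cluster-pod.default.svc.cluster.local")` from inside the cluster versus from a laptop should give different results, since `*.svc.cluster.local` names are only meant to resolve inside the cluster.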

Nicolas-Malgat · January 5, 2023, 9:45am

Hey, I’m back at work.

I checked and saw that error, but I don’t really have a clue what to do about it.
Am I supposed to set a “publicly accessible DNS name” (i.e. a static URL) for my AKS cluster?
I don’t think I can set this address in my values.yaml.
I’m using Microsoft Azure for my cluster.

EDIT: using a different internet connection than my company’s, I get the Dremio login page.

balaji.ramaswamy · January 8, 2023, 5:04am

@Nicolas-Malgat That’s great. Is there a firewall blocking the port when you try from within your office network?
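To compare the two networks, a plain TCP connection test against the service’s public IP and port 9047 is usually enough. This is a minimal sketch with a hypothetical helper name; it only tells you whether a TCP connection can be opened, not whether Dremio itself is healthy.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `port_is_reachable(public_ip, 9047)` is `True` from a home connection and `False` from the office network, the block is most likely on the office side and not in the cluster.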

Nicolas-Malgat · January 10, 2023, 10:23pm

Yes, exactly.
I think the issue can be closed. Thank you for the help.

balaji.ramaswamy · January 15, 2023, 1:01am

Thanks for the update @Nicolas-Malgat, and you’re most welcome.
