Dremio in Kubernetes cluster not coming up

Hi All,

I am deploying Dremio in a Kubernetes cluster installed on Docker Enterprise Edition, using the Helm charts below.


After deployment the pods are not coming up and stay in Pending status, and I am getting: Error from server (BadRequest): pod dremio-executor-0 does not have a host assigned

Has anyone faced this kind of issue before? Do you have any thoughts on what is going wrong here?

PS C:\Users\E99887\Desktop\Kube\charts> kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE        NOMINATED NODE
dremio-7cb585bd6d-bfdqj          0/1     Pending   0          20h
dremio-executor-0                0/1     Pending   0          13s
dremio-executor-1                0/1     Pending   0          13s
dremio-master-0                  0/1     Pending   0          13s
tiller-deploy-69c64d7945-g75lf   1/1     Running   0          8d    192.168.190.3   s01cdk004
zk-0                             0/1     Pending   0          13s
zk-1                             0/1     Pending   0          13s
zk-2                             0/1     Pending   0          10s

PS C:\Users\E99887\Desktop\Kube\charts> kubectl exec dremio-executor-0 -- ls -la /
Error from server (BadRequest): pod dremio-executor-0 does not have a host assigned
PS C:\Users\E99887\Desktop\Kube\charts>

@paulsuk1982

It could be that your cluster does not have enough resources. Try

kubectl describe pod dremio-executor-0

and check why it is pending. If the message is along the lines of no resources being available to satisfy the CPU/memory requirements, the underlying nodes are not big enough. The default values.yaml contains values that we expect users to run in production environments; you can reduce the CPU/memory and counts in values.yaml and try again (see the illustrative snippet below). Note that, depending on what you are trying to process, the amount of memory/CPU available to Dremio matters.

If the reason for Pending is something else, please post the error here.
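For illustration only, a scaled-down values.yaml could look roughly like the snippet below. The exact key names, units, and defaults depend on the chart version you are using, so treat every number here as a placeholder rather than a recommendation.

coordinator:
  cpu: 2          # placeholder; the shipped default is sized for production
  memory: 4096    # placeholder; check whether your chart expects MB here
  count: 0        # extra coordinators beyond the master
executor:
  cpu: 2
  memory: 4096
  count: 1        # fewer executors while resources are limited
zookeeper:
  cpu: 0.5
  memory: 1024
  count: 1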

As you said, the values.yaml file is geared toward production deployments, so we customized the CPU/memory specifications and pod counts in values.yaml for the master/executor/ZooKeeper nodes and limited the number of pods we spin up based on the resources available in our cluster.

Could this have anything to do with the nodeSelector value? It is commented out in values.yaml; will the cluster node that each pod spins up on be decided dynamically when deploying a release?

Apart from the warning below, I am not seeing any other error in the pod describe output.

Conditions:
  Type           Status
  PodScheduled   False

Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  43s (x296 over 4h50m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims

nodeSelector allows you to schedule your pods on nodes that carry the labels you specify.
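For example (illustrative only; the dremio-role label is a made-up placeholder, and s01cdk004 is simply the worker node visible in your pod listing):

kubectl label nodes s01cdk004 dremio-role=worker

# then, under the pod spec in the chart templates or values.yaml:
nodeSelector:
  dremio-role: worker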

Do you have taints defined on the nodes in your Kubernetes cluster? It looks like no node is available for scheduling the pods. It also looks like you are running this from a Windows machine; can you share some more details about your Kubernetes environment?
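To check what taints are actually set, something along these lines should work with a standard kubectl (s01cdk004 is just the node name from your pod listing):

kubectl describe node s01cdk004    # look for the "Taints:" line in the output
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'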

@nsen Thanks for all of your responses and updates so far; it feels like we have great community support in Dremio.

Our Kubernetes cluster is installed on Docker EE, version v1.11.9-docker-1, and is based on the OS image Ubuntu 16.04.5 LTS.
I tried the approaches below:

As I checked, the nodes have default taints pre-configured by the Docker EE cluster. I tried untainting the nodes; the command reports them as untainted (see below), but they do not actually get untainted.

kubectl taint nodes q01cdk001 com.docker.ucp.orchestrator.kubernetes-
node/q01cdk001 untainted
After untainting, the issue is the same and the pods are not getting scheduled; as I said, the nodes are not actually untainted, and the taints still show in the node describe output.

Option 2:
I used the nodeSelector option with a key/value pair to schedule the pods on one node, but even that is not working; the pods are launched but not scheduled onto any node.

Trying other options…

To get pods deployed onto nodes that have taints on them, you have to add tolerations to the deployment templates (see the sketch below the link). Let us know how it goes.
https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
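A rough sketch of what that could look like under the pod spec (spec.template.spec) of the chart templates, using the taint key from your earlier kubectl taint command; the NoSchedule effect below is an assumption, so use whatever key and effect kubectl describe node actually reports for your nodes:

tolerations:
- key: "com.docker.ucp.orchestrator.kubernetes"   # taint key from your earlier kubectl taint command
  operator: "Exists"                              # tolerate the taint regardless of its value
  effect: "NoSchedule"                            # assumption; match the effect shown by kubectl describe node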

@nsen
I tried adding tolerations to the deployment templates for both manager and orchestrator nodes, but it is not working; the pods are still in Pending status, waiting for a host to be assigned. As I said, we are using Kubernetes integrated with a Docker EE UCP cluster, and it seems tolerations get added to Kubernetes objects automatically while deploying, so even tolerations we add to the pod spec are overridden by the UCP toleration. There may be some configuration changes needed in UCP to allow service accounts to schedule pods; I am exploring those options.

For the time being, I tested with a Tomcat chart deployed from the Helm repo and it is up and running, but the pods deployed by the Dremio chart are not being scheduled. See below: the pod is scheduled on a Kubernetes worker by default, and the tiller pod is also scheduled on the same worker node. No workload can be scheduled on manager nodes as per the configuration.

edm-tomcat-7f6b48887-gtkwm 1/1 Running 0 36m 197.1.100.100 s01cdk004

Is scheduling being refused because the persistent volumes are not mounted?

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  18m                 default-scheduler  persistentvolumeclaim "dremio-executor-volume-dremio-executor-0" not found
  Warning  FailedScheduling  60s (x18 over 18m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims

@nsen
I think the issue is with the persistent volumes. I tried deploying other Helm charts like Redis and Postgres, and none of those pods come up either, due to PVCs. Could you please help or shed some light on configuring PVCs for Dremio?
As mentioned in values.yaml, it will use the default storage class if nothing is specified; I am not sure what else to define.

Thanks for all your help so far, appreciate it.

Can you check the status of the persistent volumes?

kubectl describe pvc <pvc-name>

It could be related to permission issues… I know Ranger requires some extra privileges; your K8s may also require some extra privileges.

The issue is in PVC binding; below is the status from describe pvc. I think we need to specify a storage class or a persistent volume name (see the checks sketched after the events below).

Mounted By:  zk-1
Events:
  Type    Reason         Age                   From                         Message
  ----    ------         ----                  ----                         -------
  Normal  FailedBinding  93s (x3684 over 15h)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
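For reference, a few standard checks show whether there is anything for the claim to bind to (a minimal sketch; <pvc-name> is a placeholder, same convention as above):

kubectl get storageclass          # is any storage class present / marked (default)?
kubectl get pv                    # are there any pre-created persistent volumes at all?
kubectl describe pvc <pvc-name>   # shows the binding events, as in the output above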

Based on your error, check out the Docker docs on creating a storage class for your environment. For example, https://docs.docker.com/ee/ucp/kubernetes/storage/configure-aws-storage/ covers creating storage classes on AWS. (Using AKS in Azure or EKS in AWS does create a default storage class.)
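As an illustration, the AWS variant from that doc looks roughly like the sketch below; the provisioner is entirely environment-specific, so an on-prem Docker EE cluster would need a different provisioner, and the class name here is a placeholder:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard                                          # placeholder name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # default class, so PVCs with no class set use it
provisioner: kubernetes.io/aws-ebs                        # AWS EBS, per the linked doc; replace for on-prem
parameters:
  type: gp2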

@nsen We are in the process of creating PVs in the Kubernetes cluster so the PVCs can be bound in the pod configuration (roughly along the lines of the sketch below). I will let you know how that goes; thanks everyone for the feedback and input so far.
I think we have great Dremio community support and active work going on, and I am loving it.
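For what it is worth, a hand-created PV for this kind of test could look roughly like this; hostPath is only suitable for single-node experiments, and the name, path, and size are placeholders. A claim will only bind if its storage class and requested size are compatible with the PV.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dremio-master-pv              # placeholder
spec:
  capacity:
    storage: 10Gi                     # must be at least the size the PVC requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/dremio-master         # placeholder path on the node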

@nsen or @kelly_stirman

We are having an issue with dynamic storage provisioning in our Kubernetes cluster that we are still trying to resolve, so the Helm deployment is not working.
We would like to deploy manually instead, so could you please let us know where these YAML files are in the GitHub repo? It would be really helpful if you could send the path so we can take a look.

kubectl create -f zookeeper.yaml
kubectl create -f dremio-configmap-minimum.yaml
kubectl create -f dremio-master-volume-hostpath.yaml
kubectl create -f dremio-master-volume-pvc.yaml
kubectl create -f dremio-master.yaml
kubectl create -f dremio-service-ui.yaml
kubectl create -f dremio-service-client.yaml
kubectl create -f dremio-executor.yaml
kubectl scale --replicas=5 rs/dremio-executor

++ @kelly
looping kelly for help

@paulsuk1982

Those files are not available; the recommended direction is to use Helm.

You can look at the templates directory for the templatized versions of those files. Or, you can do a helm install --debug --dry-run . to generate usable template files and go from there.
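For example, something along these lines from the chart directory (Helm 2 syntax, matching the tiller pod in your listing; the file name is a placeholder, and the --debug output includes non-manifest sections that need trimming before applying):

helm install --debug --dry-run . > rendered.yaml
# or, on Helm 2.8+, render the templates without a tiller round-trip:
helm template . > rendered.yaml
kubectl create -f rendered.yaml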