Dremio Kubernetes dremio-master-0 deploy problem

Hello, I am trying to deploy Dremio on my local machine, following this guide:

The other pods come up successfully, but dremio-master-0 does not.

Describing the pod shows the following:

Warning  Unhealthy  26s (x8 over 33s)  kubelet            Startup probe failed: Get "http://10.244.0.19:9047/": dial tcp 10.244.0.19:9047: connect: connection refused

The logs show the following errors:

2023-11-15 23:42:15,684 [main] INFO  c.dremio.exec.catalog.PluginsManager - Result of storage plugin startup:
        INFORMATION_SCHEMA: success (0ms). Healthy
        __jobResultsStore: success (127ms). Healthy
        sys: success (0ms). Healthy

2023-11-15 23:42:15,703 [scheduler-1] INFO  c.d.exec.catalog.CatalogServiceImpl - Creating SysFlight source plugin.
2023-11-15 23:42:15,730 [scheduler-1] WARN  c.d.e.catalog.MetadataSynchronizer - Source 'sys' sync failed unexpectedly. Will try again later
java.lang.NullPointerException: Master coordinator is down
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
        at com.dremio.service.conduit.client.ConduitProviderImpl.getOrCreateChannelToMaster(ConduitProviderImpl.java:180)
        at com.dremio.plugins.sysflight.SysFlightStoragePlugin.getFlightClient(SysFlightStoragePlugin.java:109)
        at com.dremio.plugins.sysflight.SysFlightStoragePlugin.getFlightTableList(SysFlightStoragePlugin.java:239)
        at com.dremio.plugins.sysflight.SysFlightStoragePlugin.listDatasetHandles(SysFlightStoragePlugin.java:191)
        at com.dremio.exec.catalog.MetadataSynchronizer.getDatasetHandleListing(MetadataSynchronizer.java:172)
        at com.dremio.exec.catalog.MetadataSynchronizer.synchronizeDatasets(MetadataSynchronizer.java:186)
        at com.dremio.exec.catalog.MetadataSynchronizer.go(MetadataSynchronizer.java:136)
        at com.dremio.exec.catalog.SourceMetadataManager$RefreshRunner.refreshFull(SourceMetadataManager.java:466)
        at com.dremio.exec.catalog.SourceMetadataManager$AdhocRefresh.run(SourceMetadataManager.java:532)
        at com.dremio.exec.catalog.SourceMetadataManager.refresh(SourceMetadataManager.java:212)
        at com.dremio.exec.catalog.ManagedStoragePlugin.refresh(ManagedStoragePlugin.java:1198)
        at com.dremio.exec.catalog.CatalogServiceImpl.refreshSource(CatalogServiceImpl.java:380)
        at com.dremio.exec.catalog.CatalogServiceImpl.lambda$start$0(CatalogServiceImpl.java:270)
        at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:248)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Can anyone help?
Thanks.

Hi @zhaoxixiang,

Please answer the following:

  1. What scheme and port do you see in dremio-master.yaml?
  2. In values.yaml, was the port changed under coordinator:, or was TLS enabled? (See the sketch after this list for the section I mean.)
  3. Can you attach values.yaml along with the full startup log?
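
For reference, here is a rough sketch of the values.yaml section I am asking about. The exact keys can differ between chart versions, so treat this as an illustration rather than the chart’s authoritative layout:

    # Illustrative values.yaml fragment (key names may vary by chart version).
    # The startup probe calls the web UI on this port, over http unless TLS is enabled.
    coordinator:
      web:
        port: 9047
        tls:
          enabled: false   # if true, the probe scheme needs to be https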

Thanks, Bogdan

Thanks for your help, I figured it out myself.

Hi @zhaoxixiang,

That’s great! For the community’s benefit, can you let us know what the problem was?

Cheers, Bogdan

Sure! I made a silly mistake: fs.s3a.endpoint for the AWS storage (MinIO) was not set correctly, so the master could not come up correctly. My final S3 storage setup is below:

    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>must set the right address</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>

Hello @zhaoxixiang, I am facing the same issue you described above. I am using an OVH S3 bucket as distributed storage. Below is my configuration in the values.yaml file:

distStorage:
  type: "aws"
  aws:
    bucketName: "dremio-bucket"
    path: "/"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      secret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.gra.io.cloud.ovh.net</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>

Not sure why I am getting this error. Where did you find the extraProperties name/value pairs? Is there any documentation available for these extraProperties?

Please help.
Thanks.

@Abdul What is the error you are getting?

@balaji.ramaswamy I am getting the below error in the dremio-master-0 pod events:

Startup probe failed: Get "http://10.2.4.8:9047/": dial tcp 10.2.4.8:9047: connect: connection refused

There are no error messages in the logs of the master pod, though. I deployed Dremio on an OVH Cloud Kubernetes cluster; the flavor of the underlying node is “d2-8”. I took the Helm chart from the GitHub repo below:

@balaji.ramaswamy Dremio is running after removing the below property from extraProperties:

   <property>
     <name>fs.s3a.connection.ssl.enabled</name>
     <value>false</value>
   </property>
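
That leaves my working extraProperties for the OVH endpoint looking like this (the same block as in my earlier post, just without the SSL property):

    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.gra.io.cloud.ovh.net</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>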

However, I can access the UI through port forwarding but can’t access it through an ingress. I am using APISIX as the ingress controller, and I am getting the below error in the UI:

Bad Message 431

reason: Request Header Fields Too Large

I opened a new issue about this 6 days ago; the link is below. Please check that issue as well. I used two different Helm charts, one taken from Artifact Hub and the other from the Dremio GitHub repo. Both give the same error when Dremio is deployed behind the APISIX ingress controller.

@Abdul Do you see any warnings in the proxy logs about the proxy truncating the headers?
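
If the proxy’s default header limits turn out to be the problem, one place to experiment is the nginx header buffer settings that APISIX exposes in its config.yaml. The snippet below is only a sketch; the http_configuration_snippet key and the buffer sizes are assumptions you should verify against your APISIX version:

    # Assumed APISIX config.yaml fragment: raises nginx's request header limits.
    # Verify that your APISIX version supports http_configuration_snippet.
    nginx_config:
      http_configuration_snippet: |
        client_header_buffer_size 16k;
        large_client_header_buffers 4 32k;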