Dremio Kubernetes dremio-master-0 deploy problem

Hello, I am trying to deploy Dremio on my local machine, following this guide:

The other pods come up successfully, but dremio-master-0 does not.

Describing the pod shows the following:

Warning  Unhealthy  26s (x8 over 33s)  kubelet            Startup probe failed: Get "http://10.244.0.19:9047/": dial tcp 10.244.0.19:9047: connect: connection refused

The logs show the following errors:

2023-11-15 23:42:15,684 [main] INFO  c.dremio.exec.catalog.PluginsManager - Result of storage plugin startup:
        INFORMATION_SCHEMA: success (0ms). Healthy
        __jobResultsStore: success (127ms). Healthy
        sys: success (0ms). Healthy

2023-11-15 23:42:15,703 [scheduler-1] INFO  c.d.exec.catalog.CatalogServiceImpl - Creating SysFlight source plugin.
2023-11-15 23:42:15,730 [scheduler-1] WARN  c.d.e.catalog.MetadataSynchronizer - Source 'sys' sync failed unexpectedly. Will try again later
java.lang.NullPointerException: Master coordinator is down
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
        at com.dremio.service.conduit.client.ConduitProviderImpl.getOrCreateChannelToMaster(ConduitProviderImpl.java:180)
        at com.dremio.plugins.sysflight.SysFlightStoragePlugin.getFlightClient(SysFlightStoragePlugin.java:109)
        at com.dremio.plugins.sysflight.SysFlightStoragePlugin.getFlightTableList(SysFlightStoragePlugin.java:239)
        at com.dremio.plugins.sysflight.SysFlightStoragePlugin.listDatasetHandles(SysFlightStoragePlugin.java:191)
        at com.dremio.exec.catalog.MetadataSynchronizer.getDatasetHandleListing(MetadataSynchronizer.java:172)
        at com.dremio.exec.catalog.MetadataSynchronizer.synchronizeDatasets(MetadataSynchronizer.java:186)
        at com.dremio.exec.catalog.MetadataSynchronizer.go(MetadataSynchronizer.java:136)
        at com.dremio.exec.catalog.SourceMetadataManager$RefreshRunner.refreshFull(SourceMetadataManager.java:466)
        at com.dremio.exec.catalog.SourceMetadataManager$AdhocRefresh.run(SourceMetadataManager.java:532)
        at com.dremio.exec.catalog.SourceMetadataManager.refresh(SourceMetadataManager.java:212)
        at com.dremio.exec.catalog.ManagedStoragePlugin.refresh(ManagedStoragePlugin.java:1198)
        at com.dremio.exec.catalog.CatalogServiceImpl.refreshSource(CatalogServiceImpl.java:380)
        at com.dremio.exec.catalog.CatalogServiceImpl.lambda$start$0(CatalogServiceImpl.java:270)
        at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:248)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Can anyone help?
Thanks.

Hi @zhaoxixiang,

Please answer the following:

  1. What scheme and port do you see in dremio-master.yaml?
  2. In values.yaml, was the port changed under coordinator:, or was TLS enabled? (See the sketch after this list for the section I mean.)
  3. Can you attach values.yaml along with the full startup log?
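
For reference, here is a rough sketch of the values.yaml section I am asking about. The exact keys can differ between chart versions, so treat this as an illustration rather than the chart’s authoritative layout:

    # Illustrative values.yaml fragment (key names may vary by chart version).
    # The startup probe calls the web UI on this port, over http unless TLS is enabled.
    coordinator:
      web:
        port: 9047
        tls:
          enabled: false   # if true, the probe scheme needs to be https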

Thanks, Bogdan

Thanks for your help, I figured it out myself.

Hi @zhaoxixiang,

That’s great! For the community’s benefit, can you let us know what the problem was?

Cheers, Bogdan

Sure! I made a silly mistake: fs.s3a.endpoint for the AWS storage (MinIO) was not set correctly, so the master could not come up correctly. My final S3 storage setup is below:

    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>must set the right address</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>

Hello @zhaoxixiang, I am facing the same issue you described above. I am using an OVH S3 bucket as distributed storage. Below is my configuration in the values.yaml file:

distStorage:
  type: "aws"
  aws:
    bucketName: "dremio-bucket"
    path: "/"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      secret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.gra.io.cloud.ovh.net</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>

Not sure why I am getting this error. Where did you find the extraProperties name/value pairs? Is there any documentation available for these extraProperties?

Please help.
Thanks.

@Abdul What is the error you are getting?

@balaji.ramaswamy I am getting the below error in the dremio-master-0 pod events:

Startup probe failed: Get "http://10.2.4.8:9047/": dial tcp 10.2.4.8:9047: connect: connection refused

There are no error messages in the logs of the master pod, though. I deployed Dremio on an OVH Cloud Kubernetes cluster; the flavor of the underlying node is “d2-8”. I took the Helm chart from the GitHub repo below:

@balaji.ramaswamy Dremio is running after removing the below property from extraProperties:

   <property>
     <name>fs.s3a.connection.ssl.enabled</name>
     <value>false</value>
   </property>
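
That leaves my working extraProperties for the OVH endpoint looking like this (the same block as in my earlier post, just without the SSL property):

    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.gra.io.cloud.ovh.net</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>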

However, I can access the UI through port forwarding but can’t access it through an ingress. I am using APISIX as the ingress controller, and I am getting the below error in the UI:

Bad Message 431

reason: Request Header Fields Too Large

I opened a new issue about this 6 days ago; the link is below. Please check that issue as well. I used two different Helm charts, one taken from Artifact Hub and the other from the Dremio GitHub repo. Both give the same error when Dremio is deployed behind the APISIX ingress controller.

@Abdul Do you see any warnings in the proxy logs about the proxy truncating the headers?
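
If the proxy’s default header limits turn out to be the problem, one place to experiment is the nginx header buffer settings that APISIX exposes in its config.yaml. The snippet below is only a sketch; the http_configuration_snippet key and the buffer sizes are assumptions you should verify against your APISIX version:

    # Assumed APISIX config.yaml fragment: raises nginx's request header limits.
    # Verify that your APISIX version supports http_configuration_snippet.
    nginx_config:
      http_configuration_snippet: |
        client_header_buffer_size 16k;
        large_client_header_buffers 4 32k;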