Failure while starting services. com.dremio.datastore.DatastoreException: Process user (dremio) doesn't match local catalog db owner (root). Please run process as root

Hello Dremio Community,

Hope to get some insights and help here.

I am new to Dremio, but I am looking to dig deep as the DevOps engineer who maintains the Dremio OSS instance for our data analyst team, who are the actual users.

(Pardon if this is too much of a story)
A few months back one of our junior teammates created a custom Kubernetes setup of Dremio (with loose manifest scripts) - this one does not have separate master/coordinator/executor roles - just a single instance (no ZooKeeper).

This works fine and now has a lot of data/queries set up in it. I need to perform a Dremio version upgrade and also move this to a Helm-based installation in a new K8s cluster (AKS).

So I followed the routine - I did kubectl exec -it into the single-instance Dremio pod and ran a backup using dremio-admin (note: the service was running when I did this, but as mentioned in the docs, that seems to be OK).

I then kubectl cp'd it to my Windows machine, after first creating a .tar.gz on the Dremio instance itself so I only needed to copy a single file.
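For reference, the sequence looked roughly like this (pod name, paths, and the admin credential placeholders are illustrative, not my exact values):

```shell
# inside the old single-instance pod (service still running)
kubectl exec -it dremio-pod -- bash
./bin/dremio-admin backup -u <admin-user> -p <admin-password> -d /tmp/dremio-backup
tar -czf /tmp/dremio-backup.tar.gz -C /tmp dremio-backup
exit
# copy the single archive down to the local machine
kubectl cp dremio-pod:/tmp/dremio-backup.tar.gz ./dremio-backup.tar.gz
```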

Next I set up a Helm Dremio installation - I am using the AzureFiles storage class on AKS - and on first installation it works fine, but obviously it does not yet have all the data/tables/queries from our old Dremio.

So I used the Helm switch --set DremioAdmin=true and then kubectl exec -it into the admin pod - again kubectl cp'd the backup tar.gz to the AzureFiles master PVC. I extracted the tar to a folder and tried running ./dremio-admin restore -d /somepath -r. This did not go well - it says the restore happened, but there was also an error about "operation not permitted".

So I started editing the Helm _v2 folder, updated the dremio-admin manifest, and added

    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 0

and did a helm upgrade - after that the restore worked.

I also ran dremio-admin clean with -c, -o, and -i, as well as dremio-admin upgrade - all of these went well too.

Switching back with --set DremioAdmin=false, dremio-master-0 now fails - after a bit of digging I found the following error:

2020-11-05T04:42:43.411+0000: [GC (Allocation Failure) [PSYoungGen: 136808K->8187K(260096K)] 136912K->17255K(432128K), 0.0148694 secs] [Times: user=0.03 sys=0.01, real=0.01 secs] 
2020-11-05T04:42:43.887+0000: [GC (Allocation Failure) [PSYoungGen: 260091K->14307K(266240K)] 269159K->40859K(438272K), 0.0357166 secs] [Times: user=0.09 sys=0.01, real=0.04 secs] 
2020-11-05T04:42:44.408+0000: [GC (Allocation Failure) [PSYoungGen: 266211K->25573K(523264K)] 292763K->55072K(695296K), 0.0183565 secs] [Times: user=0.03 sys=0.01, real=0.01 secs] 
2020-11-05 04:42:44,691 [main] INFO  c.d.common.scanner.ClassPathScanner - Scanning packages [com.dremio.sabot.task.slicing.SlicingTaskPool, com.dremio.dac, com.dremio.dac.support.SupportService, com.dremio.service.cachemanager, com.dremio.plugins.azure, com.dremio.extra.exec.store.dfs, com.dremio.exec.planner.acceleration.substitution, com.dremio.options, com.dremio.telemetry.api, com.dremio.service.namespace, com.dremio.plugins.adl.store, com.dremio.plugins.mongo, com.dremio.telemetry.utils, com.dremio.plugins.s3.store, com.dremio.provision.yarn.service, com.dremio.service.jobtelemetry.server.store, com.dremio.service.users, com.dremio.exec.fn.hive, com.dremio.service.accelerator, com.dremio.service.reflection, com.dremio.service.voting, com.dremio.exec.store.jdbc, com.dremio.exec.store.hive, com.dremio.exec.store.dfs, com.dremio.resource, com.dremio.resource.basic, com.dremio.exec.store.mock, com.dremio.common.logical, com.dremio.exec.store.dfs, com.dremio.exec.server.options, com.dremio.exec.store.hive, com.dremio.exec.store.hive.exec, com.dremio.dac, com.dremio.dac.support.SupportService, com.dremio.dac.cmd, com.dremio.dac.cmd.upgrade, com.dremio.extras.plugins.elastic, com.dremio.provision, com.dremio.services.configuration, com.dremio.services.configuration.ConfigurationStore, com.dremio.exec.store.jdbc, com.dremio.exec.store.dfs, com.dremio.joust.geo, com.dremio.exec.ExecConstants, com.dremio.exec.catalog, com.dremio.exec.compile, com.dremio.exec.expr, com.dremio.exec.physical, com.dremio.exec.planner.physical, com.dremio.exec.server.options, com.dremio.exec.store, com.dremio.exec.store.dfs.implicit.ImplicitFilesystemColumnFinder, com.dremio.exec.rpc.user.security, com.dremio.sabot, com.dremio.sabot.op.aggregate.vectorized, com.dremio.sabot.rpc.user, com.dremio.service.jobs, com.dremio.plugins.mongo, com.dremio.service.execselector.ExecutorSelectionService, com.dremio.datastore, com.dremio.exec.store.hive, com.dremio.plugins.elastic, com.dremio.exec.store, 
org.apache.hadoop.hive] in locations [jar:file:/opt/dremio/jars/dremio-ce-services-cachemanager-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-azure-storage-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-sabot-kernel-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-options-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-telemetry-api-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-client-base-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-namespace-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-adls-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-mongo-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-telemetry-utils-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-s3-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-yarn-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-users-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-jdbc-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-hive2-plugin-launcher-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-hdfs-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-resourcescheduler-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-hive-plugin-common-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-dac-daemon-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-elasticsearch-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-provision-common-4.3.0-202005130340290582-323f05d9.jar!/, 
jar:file:/opt/dremio/jars/dremio-services-configuration-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-jdbc-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-nas-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-sabot-joust-java-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-coordinator-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-mongo-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-services-execselector-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-hive3-plugin-launcher-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-hive2-plugin-launcher-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-ce-hive3-plugin-launcher-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-elasticsearch-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/dremio-pdfs-plugin-4.3.0-202005130340290582-323f05d9.jar!/, jar:file:/opt/dremio/jars/3rdparty/dremio-hive2-exec-shaded-4.3.0-202005130340290582-323f05d9.jar!/] took 1551ms
2020-11-05 04:42:44,802 [main] INFO  c.d.d.a.LegacyKVStoreProviderAdapter - Starting LegacyKVStoreProviderAdapter.
2020-11-05 04:42:44,803 [main] INFO  c.d.d.a.LegacyKVStoreProviderAdapter - Starting underlying KVStoreProvider.
2020-11-05 04:42:44,803 [main] INFO  c.d.datastore.LocalKVStoreProvider - Starting LocalKVStoreProvider
2020-11-05 04:42:44,818 [main] INFO  c.d.d.a.LegacyKVStoreProviderAdapter - Stopping LegacyKVStoreProviderAdapter by stopping underlying KVStoreProvider.
2020-11-05 04:42:44,818 [main] INFO  c.d.datastore.LocalKVStoreProvider - Stopping LocalKVStoreProvider
2020-11-05 04:42:44,822 [main] ERROR ROOT - Dremio is exiting. Failure while starting services.
com.dremio.datastore.DatastoreException: Process user (dremio) doesn't match local catalog db owner (root).  Please run process as root.
	at com.dremio.datastore.ByteStoreManager.verifyDBOwner(ByteStoreManager.java:165)
	at com.dremio.datastore.ByteStoreManager.start(ByteStoreManager.java:192)
	at com.dremio.datastore.CoreStoreProviderImpl.start(CoreStoreProviderImpl.java:171)
	at com.dremio.datastore.LocalKVStoreProvider.start(LocalKVStoreProvider.java:151)
	at com.dremio.datastore.adapter.LegacyKVStoreProviderAdapter.start(LegacyKVStoreProviderAdapter.java:65)
	at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:181)
	at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:141)
	Suppressed: java.lang.IllegalStateException: #start was not invoked, so metadataManager is not available
		at com.google.common.base.Preconditions.checkState(Preconditions.java:508)
		at com.dremio.datastore.ByteStoreManager.getMetadataManager(ByteStoreManager.java:424)
		at com.dremio.datastore.ByteStoreManager.close(ByteStoreManager.java:431)
		at com.dremio.common.AutoCloseables.close(AutoCloseables.java:126)
		at com.dremio.common.AutoCloseables.close(AutoCloseables.java:76)
		at com.dremio.datastore.CoreStoreProviderImpl.close(CoreStoreProviderImpl.java:258)
		at com.dremio.datastore.LocalKVStoreProvider.close(LocalKVStoreProvider.java:195)
		at com.dremio.datastore.adapter.LegacyKVStoreProviderAdapter.close(LegacyKVStoreProviderAdapter.java:84)
		at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:184)
		... 1 common frames omitted

So my assumption is that the newly restored DB folder was owned by root, and since Dremio runs as the dremio:dremio user/group, this error occurs.
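To illustrate my understanding of the check Dremio is doing, here is a minimal local sketch, using a scratch directory as a stand-in for the real /opt/dremio/data/db:

```shell
# Simulate the db-owner check: the user running the process must match
# the owner of the KV store directory. Here a scratch dir stands in for
# Dremio's db folder, so owner and process user naturally match.
DB="$(mktemp -d)/db"
mkdir -p "$DB"
owner="$(stat -c '%U' "$DB")"   # directory owner
me="$(id -un)"                  # current process user
if [ "$owner" = "$me" ]; then
  echo "owner matches process user"
else
  echo "mismatch: owner=$owner process-user=$me (Dremio would refuse to start)"
fi
```

In my failing case, the owner would be root while the process user is dremio - which is exactly what the DatastoreException message says.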

I tried multiple things, all in vain - the master would not start for me. In one of these attempts I set runAsUser: 0 (i.e. run as root) for just the master pod; that did bring the master up, but all operations in the Dremio portal then failed with NativeIOException: Operation not permitted. So I went into the Helm _v2 folder and added the runAsUser: 0 part to each and every container that this Helm installation spins up.

Now everything works like a charm - which is great: I got it working, the backup was restored, we didn't lose any work, and I upgraded and moved to the new AKS cluster using Helm.

My concern is:

  • Running with root privileges in a container is not good practice - thoughts?
  • Because this works only via lots of edits to the Helm package, I will have to repeat them for every future chart update and maintain this custom solution going forward - which defeats the purpose of using a Helm package in the first place.

I need some help here so I can go back to the unedited Helm release and still keep the backup/DB content intact.

@jsinh

Can we not simply change ownership of the “db” folder to dremio:dremio?

@balaji.ramaswamy I did try that at first, but since by default all Dremio containers run as dremio:dremio, I was not able to run that command - it either gives a permission error or has no effect. I was also not able to sudo while running dremio-admin as the "dremio" user.

@jsinh

Who owns the binaries? “dremio” or “root”?

@balaji.ramaswamy I see those files were also owned by root:root. I am using Azure Storage (Azure Files). I assume that by "binaries" you mean the files in /opt/dremio/bin.

@jsinh

Everything under “/opt/dremio”, where is your RocksDB and who owns that folder?

How do I check RocksDB - which folder is it set to be in? A config file? @balaji.ramaswamy

@jsinh

On the coordinator, check what the local path is set to.

@balaji.ramaswamy I don’t have any coordinator running - per the replica set I have that at 0/0 at the moment.

Is it really necessary to set it to a minimum of 1 to make this work?

Here are my custom values when I did the Helm Install

# The Dremio image used in the cluster.
#
# It is *highly* recommended to update the version tag to
# the version that you are using. This will ensure that all
# the pods are using the same version of the software.
#
# Using latest will cause Dremio to potentially upgrade versions
# automatically during redeployments and may negatively impact
# the cluster.
image: dremio/dremio-oss
imageTag: 4.9.1
# 4.3.0

# Annotations, labels, node selectors, and tolerations
#
# annotations: Annotations are applied to the StatefulSets that are deployed.
# podAnnotations: Pod annotations are applied to the pods that are deployed.
# labels: Labels operate much like annotations.
# podLabels: Labels that are applied to the pods that are deployed.
# nodeSelector: Target pods to nodes based on labels set on the nodes. For more
#   information, see https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
# tolerations: Tolerations allow the negation of taints that have been applied to some set of nodes
#   in the Kubernetes cluster so that pods can be scheduled on those tainted nodes.
annotations: {}
podAnnotations: {}
labels: {}
podLabels: {}
nodeSelector:
  agentpool: ap2dremio
tolerations: []

# Dremio Coordinator
coordinator:
  # CPU & Memory
  # Memory allocated to each coordinator, expressed in MB.
  # CPU allocated to each coordinator, expressed in CPU cores.
  cpu: 1
  memory: 4096

  # This count is used for slave coordinators only.
  # The total number of coordinators will always be count + 1.
  count: 0

  # Coordinator data volume size (applies to the master coordinator only).
  # In most managed Kubernetes environments (AKS, GKE, etc.), the size of the disk has a direct impact on
  # the provisioned and maximum performance of the disk.
  volumeSize: 50Gi

  # Uncomment the lines below to use a custom set of extra startup parameters for the coordinator.
  #extraStartParams: >-
  #  -DsomeKey=someValue

  # Extra Init Containers
  # Uncomment the below lines to use a custom set of extra init containers for the coordinator.
  #extraInitContainers: |
  #  - name: extra-init-container
  #    image: {{ $.Values.image }}:{{ $.Values.imageTag }}
  #    command: ["echo", "Hello World"]

  # Extra Volumes
  # Uncomment below to use a custom set of extra volumes for the coordinator.
  #extraVolumes: []

  # Extra Volume Mounts
  # Uncomment below to use a custom set of extra volume mounts for the coordinator.
  #extraVolumeMounts: []

  # Uncomment this value to use a different storage class for the coordinator.
  storageClass: data-dremio

  # These values, when defined, override the provided shared annotations, labels, node selectors, or tolerations.
  # Uncomment only if you are trying to override the chart's shared values.
  #annotations: {}
  #podAnnotations: {}
  #labels: {}
  #podLabels: {}
  #nodeSelector: {}
  #tolerations: []

  # Web UI
  web:
    port: 9047
    tls:
      # To enable TLS for the web UI, set the enabled flag to true and provide
      # the appropriate Kubernetes TLS secret.
      enabled: false

      # To create a TLS secret, use the following command:
      # kubectl create secret tls ${TLS_SECRET_NAME} --key ${KEY_FILE} --cert ${CERT_FILE}
      secret: dremio-tls-secret-ui

  # ODBC/JDBC Client
  client:
    port: 31010
    tls:
      # To enable TLS for the client endpoints, set the enabled flag to
      # true and provide the appropriate Kubernetes TLS secret. Client
      # endpoint encryption is available only on Dremio Enterprise
      # Edition and should not be enabled otherwise.
      enabled: false

      # To create a TLS secret, use the following command:
      # kubectl create secret tls ${TLS_SECRET_NAME} --key ${KEY_FILE} --cert ${CERT_FILE}
      secret: dremio-tls-secret-client

  # Flight Client
  flight:
    port: 32010
    tls:
      # To enable TLS for the Flight endpoints, set the enabled flag to
      # true and provide the appropriate Kubernetes TLS secret.
      enabled: false

      # To create a TLS secret, use the following command:
      # kubectl create secret tls ${TLS_SECRET_NAME} --key ${KEY_FILE} --cert ${CERT_FILE}
      secret: dremio-tls-secret-flight

# Dremio Executor
executor:
  # CPU & Memory
  # Memory allocated to each executor, expressed in MB.
  # CPU allocated to each executor, expressed in CPU cores.
  cpu: 1
  memory: 4096

  # Engines
  # Engine names must be 47 characters or less and consist of lowercase alphanumeric characters or '-'.
  # Note: The number of executor pods will be the length of the array below * count.
  engines: ["default"]
  count: 1

  # Executor volume size.
  volumeSize: 50Gi

  # Uncomment the lines below to use a custom set of extra startup parameters for executors.
  #extraStartParams: >-
  #  -DsomeKey=someValue

  # Extra Init Containers
  # Uncomment the below lines to use a custom set of extra init containers for executors.
  #extraInitContainers: |
  #  - name: extra-init-container
  #    image: {{ $.Values.image }}:{{ $.Values.imageTag }}
  #    command: ["echo", "Hello World"]

  # Extra Volumes
  # Uncomment below to use a custom set of extra volumes for executors.
  #extraVolumes: []

  # Extra Volume Mounts
  # Uncomment below to use a custom set of extra volume mounts for executors.
  #extraVolumeMounts: []

  # Uncomment this value to use a different storage class for executors.
  storageClass: data-dremio

  # Dremio C3
  # Designed for use with NVMe storage devices, performance may be impacted when using
  # persistent volume storage that resides far from the physical node.
  cloudCache:
    enabled: true

    # Uncomment this value to use a different storage class for C3.
    storageClass: data-dremio

    # Volumes to use for C3, specify multiple volumes if there are more than one local
    # NVMe disk that you would like to use for C3.
    #
    # The below example shows all valid options that can be provided for a volume.
    # volumes:
    # - name: "dremio-default-c3"
    #   size: 100Gi
    #   storageClass: "local-nvme"
    volumes:
    - size: 10Gi
      storageClass: data-dremio

  # These values, when defined and not empty, override the provided shared annotations, labels, node selectors, or tolerations.
  # Uncomment only if you are trying to override the chart's shared values.
  #annotations: {}
  #podAnnotations: {}
  #labels: {}
  #podLabels: {}
  #nodeSelector: {}
  #tolerations: []

  # Engine Overrides
  #
  # The settings above are overridable on a per-engine basis. These
  # values here will take precedence and *override* the configured values
  # on a per-engine basis. Engine overrides are matched with the name in the above
  # list of engines.
  #
  # Special per-engine parameters:
  # volumeClaimName: For each engine, you can optionally specify a value for the volume claim name,
  #   this value must be unique to each engine or may cause unintended consequences. This value is
  #   primarily intended for transitioning an existing single engine to a multi-engine configuration
  #   where there may already have been existing persistent volumes.
  #
  # The below example shows all valid options that can be overridden on a per-engine basis.
  # engineOverride:
  #   engineNameHere:
  #     cpu: 1
  #     memory: 122800
  #
  #     count: 1
  #
  #     annotations: {}
  #     podAnnotations: {}
  #     labels: {}
  #     podLabels: {}
  #     nodeSelector: {}
  #     tolerations: []
  #
  #     extraStartParams: >-
  #       -DsomeCustomKey=someCustomValue
  #
  #     extraInitContainers: |
  #       - name: extra-init-container
  #         image: {{ $.Values.image }}:{{ $.Values.imageTag }}
  #         command: ["echo", "Hello World"]
  #
  #     extraVolumes: []
  #     extraVolumeMounts: []
  #
  #     volumeSize: 50Gi
  #     storageClass: managed-premium
  #     volumeClaimName: dremio-default-executor-volume
  #
  #     cloudCache:
  #       enabled: true
  #
  #       storageClass: ""
  #
  #       volumes:
  #       - name: "default-c3"
  #         size: 100Gi
  #         storageClass: ""

# Zookeeper
zookeeper:
  # The Zookeeper image used in the cluster.
  image: k8s.gcr.io/kubernetes-zookeeper
  imageTag: 1.0-3.4.10

  # CPU & Memory
  # Memory allocated to each zookeeper, expressed in MB.
  # CPU allocated to each zookeeper, expressed in CPU cores.
  cpu: 0.5
  memory: 1024
  count: 1

  volumeSize: 10Gi

  # Uncomment this value to use a different storage class for Zookeeper.
  storageClass: data-dremio

  # These values, when defined, override the provided shared annotations, labels, node selectors, or tolerations.
  # Uncomment only if you are trying to override the chart's shared values.
  #annotations: {}
  #podAnnotations: {}
  #labels: {}
  #podLabels: {}
  #nodeSelector: {}
  #tolerations: []

# Control where uploaded files are stored for Dremio.
# For more information, see https://docs.dremio.com/deployment/distributed-storage.html
distStorage:
  # The supported distributed storage types are: local, aws, azure, or azureStorage.
  #
  # local: Not recommended for production use. When using local, dist-caching is disabled.
  # aws: AWS S3, additional parameters required, see "aws" section.
  # azure: ADLS Gen 1, additional parameters required, see "azure" section.
  # azureStorage: Azure Storage Gen2, additional parameters required, see "azureStorage" section.
  type: "azureStorage"

  # AWS S3
  # For more details of S3 configuration, see https://docs.dremio.com/deployment/dist-store-config.html#amazon-s3
  #
  # bucketName: The name of the S3 bucket for distributed storage.
  # path: The path, relative to the bucket, to create Dremio's directories.
  # authentication: Valid types are: accessKeySecret or instanceMetadata.
  #   - Note: Instance metadata is only supported in AWS EKS and requires that the
  #       EKS worker node IAM role is configured with sufficient access rights. At this time,
  #       Dremio does not support using a K8s service account based IAM role.
  # credentials: If using accessKeySecret authentication, uncomment the credentials section below.
  aws:
    bucketName: "AWS Bucket Name"
    path: "/"
    authentication: "metadata"
    # If using accessKeySecret for authentication against S3, uncomment the lines below and use the values
    # to configure the appropriate credentials.
    #
    #credentials:
    #  accessKey: "AWS Access Key"
    #  secret: "AWS Secret"

    # Extra Properties
    # Use the extra properties block to provide additional parameters to configure the distributed
    # storage in the generated core-site.xml file.
    #
    #extraProperties: |
    #  <property>
    #    <name></name>
    #    <value></value>
    #  </property>

  # Azure ADLS Gen 1
  # For more details of Azure ADLS Gen 1 storage configuration, see
  # https://docs.dremio.com/deployment/dist-store-config.html#azure-data-lake-storage-gen1
  #
  # datalakeStoreName: The name of the ADLS Gen 1 store.
  azure:
    datalakeStoreName: "Azure DataLake Store Name"
    path: "/"
    credentials:
      applicationId: "Azure Application ID"
      secret: "Azure Application Secret"
      oauth2Endpoint: "Azure OAuth2 Endpoint"

    # Extra Properties
    # Use the extra properties block to provide additional parameters to configure the distributed
    # storage in the generated core-site.xml file.
    #
    #extraProperties: |
    #  <property>
    #    <name></name>
    #    <value></value>
    #  </property>

  # Azure Storage Gen2
  # For more details of Azure Storage Gen2 storage configuration, see
  # https://docs.dremio.com/deployment/dist-store-config.html#azure-storage
  #
  # accountName: The name of the storage account.
  # filesystem: The name of the blob container to use within the storage account.
  # path: The path, relative to the filesystem, to create Dremio's directories.
  # credentials:
  azureStorage:
    accountName: "datadremio"
    filesystem: "blobdatadremio"
    path: "/"
    credentials:
      accessKey: "<redacted>"

    # Extra Properties
    # Use the extra properties block to provide additional parameters to configure the distributed
    # storage in the generated core-site.xml file.
    #
    #extraProperties: |
    #  <property>
    #    <name></name>
    #    <value></value>
    #  </property>

# Dremio Start Parameters
# Uncomment the below lines to provide extra start parameters to be passed directly to Dremio during startup.
#extraStartParams: >-
#  -DsomeKey=someValue

# Extra Init Containers
# Uncomment the below lines to provide extra init containers to be run first.
#extraInitContainers: |
#  - name: extra-init-container
#    image: {{ $.Values.image }}:{{ $.Values.imageTag }}
#    command: ["echo", "Hello World"]

# Extra Volumes
# Array to add extra volumes to all Dremio resources.
extraVolumes: []

# Extra Volume Mounts
# Array to add extra volume mounts to all Dremio resources, normally used in conjunction with extraVolumes.
extraVolumeMounts: []

# Dremio Service
# The dremio-client service exposes the service for access outside of the Kubernetes cluster.
service:
  type: LoadBalancer

  # NodePort or LoadBalancer
  # These values, when defined and not empty, override the provided shared annotations and labels.
  # Uncomment only if you are trying to override the chart's shared values.
  #annotations: {}
  #labels: {}

  # If the loadBalancer supports sessionAffinity and you have more than one coordinator,
  # uncomment the below line to enable session affinity.
  #sessionAffinity: ClientIP

  # Enable the following flag if you wish to route traffic through a shared VPC
  # for the LoadBalancer's external IP.
  # The chart is setup for internal IP support for AKS, EKS, GKE.
  # For more information, see https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  #internalLoadBalancer: true

  # If you have a static IP allocated for your load balancer, uncomment the following
  # line and set the IP to provide the static IP used for the load balancer.
  # Note: The service type must be set to LoadBalancer for this value to be used.
  # loadBalancerIP: 13.86.102.182

# To use custom storage class, uncomment below.
# Otherwise the default storage class configured for your K8s cluster is used.
storageClass: data-dremio

# For private and protected docker image repository, you should store
# the credentials in a kubernetes secret and provide the secret name
# here.  For more information, see
# https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
#imagePullSecrets:
#  - secretname

@balaji.ramaswamy I set it to 1/1 temporarily and this is what I see.

(screenshot: SS_08-11-2020_000)

@jsinh

Having more coordinators is not an issue, but from the screenshot I do see that "data" (where RocksDB resides) has a different owner (dremio) from the other folders (root) - any particular reason for this?

@balaji.ramaswamy I am not sure either - in the screenshot state, all scripts/manifests are installed using

    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 0

Would that be the reason it shows root as the owner of the other folders while the data folder is still owned by the dremio user?

If I remove the above and continue, the master does not start at all - because the restore I did earlier was done via the root setup (my theory - I might be totally wrong in this assumption).

@jsinh

Let us change ownership of all the folders to dremio and then start Dremio. If you get an error, please send us the error

IMPORTANT: Back up the data folder before you make the changes.

@balaji.ramaswamy - I want to be 100% sure of the instructions you gave. To be clear, here are the things I should do:

  • Take a backup of the data folder.
  • Change ownership of all folders to dremio for the master, coordinators, and executor(s)
  • Remove the runAsUser: 0 changes I added in the helm package to deploy
  • Helm upgrade again without the root and try to let it run as dremio

Am I understanding it correctly?
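If helpful, here is how I imagine the steps above as commands (pod name, paths, and the release/chart names are assumptions based on my setup, not exact values):

```shell
# 1. & 2. inside the master pod (still running as root at this point):
kubectl exec -it dremio-master-0 -- bash
cp -a /opt/dremio/data /opt/dremio/data.bak       # backup of the data folder first
chown -R dremio:dremio /opt/dremio/data           # hand ownership back to the service user
exit
# 3. & 4. redeploy without the runAsUser: 0 edits:
helm upgrade dremio ./charts/dremio_v2 -f values.yaml
```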

Also, if there are errors starting the pods, how and where do I gather the logs that I would need to submit for further investigation?

Please guide or confirm

@jsinh

Let us do one more step before all this. Are you able to send us your Helm charts so we can take a complete look?

Thanks
Bali

dremio_payload_1.zip (47.9 KB)

Sure - here is the Helm package I customized. Please don't judge the CPU/memory values - I am still experimenting before finalizing this for PROD use.

It also includes a redacted values file used for the helm install, and a log file I found from my previous experiments. Hope this helps @balaji.ramaswamy

@jsinh

Can you please send us the output of the below command from the Dremio Master-coordinator node?

ps -ef | grep dremio

@balaji.ramaswamy Sorry for the delay - it is the festive season in my country.

Here is the output for dremio-master-0

root         1     0  4 Nov12 ?        05:43:20 /usr/local/openjdk-8/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -Djava.library.path=/opt/dremio/lib -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Ddremio.plugins.path=/opt/dremio/plugins -Xmx768m -XX:MaxDirectMemorySize=768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -Dio.netty.maxDirectMemory=0 -Dio.netty.tryReflectionSetAccessible=true -DMAPR_IMPALA_RA_THROTTLE -DMAPR_MAX_RA_STREAMS=400 -XX:+UseG1GC -Dzookeeper=zk-hs:2181 -Dservices.coordinator.enabled=true -Dservices.coordinator.master.enabled=true -Dservices.coordinator.master.embedded-zookeeper.enabled=false -Dservices.executor.enabled=false -Dservices.conduit.port=45679 -cp /opt/dremio/conf:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/*:/usr/local/openjdk-8/lib/tools.jar com.dremio.dac.daemon.DremioDaemon
root     13161 13154  0 12:41 pts/0    00:00:00 grep dremio

This one is from dremio-coordinator-0

root         1     0 64 12:42 ?        00:00:45 /usr/local/openjdk-8/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -Djava.library.path=/opt/dremio/lib -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Ddremio.plugins.path=/opt/dremio/plugins -Xmx768m -XX:MaxDirectMemorySize=768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -Dio.netty.maxDirectMemory=0 -Dio.netty.tryReflectionSetAccessible=true -DMAPR_IMPALA_RA_THROTTLE -DMAPR_MAX_RA_STREAMS=400 -XX:+UseG1GC -Dzookeeper=zk-hs:2181 -Dservices.coordinator.enabled=true -Dservices.coordinator.master.enabled=false -Dservices.coordinator.master.embedded-zookeeper.enabled=false -Dservices.executor.enabled=false -Dservices.conduit.port=45679 -cp /opt/dremio/conf:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/*:/usr/local/openjdk-8/lib/tools.jar com.dremio.dac.daemon.DremioDaemon
root       136   129  0 12:44 pts/0    00:00:00 grep dremio

This one from dremio-executor-0

root         1     0 11 Nov12 ?        17:28:26 /usr/local/openjdk-8/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -Djava.library.path=/opt/dremio/lib -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Ddremio.plugins.path=/opt/dremio/plugins -Xmx512m -XX:MaxDirectMemorySize=768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -Dio.netty.maxDirectMemory=0 -Dio.netty.tryReflectionSetAccessible=true -DMAPR_IMPALA_RA_THROTTLE -DMAPR_MAX_RA_STREAMS=400 -XX:+UseG1GC -Dzookeeper=zk-hs:2181 -Dservices.coordinator.enabled=false -Dservices.coordinator.master.enabled=false -Dservices.coordinator.master.embedded-zookeeper.enabled=false -Dservices.executor.enabled=true -Dservices.conduit.port=45679 -Dservices.node-tag=default -cp /opt/dremio/conf:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/*:/usr/local/openjdk-8/lib/tools.jar com.dremio.dac.daemon.DremioDaemon
root      7075  7068  0 12:44 pts/0    00:00:00 grep dremio

Note: At the moment all of these are running as root, as they still do not start with the default Helm dremio:dremio setup after the backup/restore.

Hope this helps.

@jsinh

I can see why you got the first error. The Dremio process owner and the catalog owner need to be the same. Your Dremio PID runs as root, but the "db" folder under data is owned by dremio. They need to match.

You can either start Dremio as the dremio user or change "data" to root. Before you do this, take a backup of the "data" folder.

Makes sense?
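(For reference, one way to avoid repeating a manual chown after every redeploy is an init container that fixes ownership before Dremio starts. This is a sketch against the chart, not something from the Dremio docs - the container name, volume name, and paths here are assumptions:)

```yaml
# Hypothetical addition to the master StatefulSet template: an init
# container runs as root, chowns the data volume, and then the main
# dremio container can run as the unprivileged dremio user.
initContainers:
  - name: chown-data-directory
    image: dremio/dremio-oss:4.9.1
    securityContext:
      runAsUser: 0
    command: ["chown", "-R", "dremio:dremio", "/opt/dremio/data"]
    volumeMounts:
      - name: dremio-master-volume   # assumed PVC name for the master's data
        mountPath: /opt/dremio/data
```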

@balaji.ramaswamy - I removed the root-privilege changes and tried to start - it does not start, there are no logs, and the master pod just terminates within a few seconds. Is there a way to see logs while the pod container is initializing? Even before I can attach a logger, it terminates without any visibility into what is going on.
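For anyone hitting the same thing, these are the standard kubectl options for a pod that crashes too quickly to attach to (generic kubectl, not Dremio-specific):

```shell
# logs from the previous (crashed) container instance
kubectl logs dremio-master-0 --previous
# events and container state transitions (exit codes, restart reasons)
kubectl describe pod dremio-master-0
# stream logs the moment the container starts on the next restart
kubectl logs -f dremio-master-0
```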