Dremio autoscaling behaviour

Hi,
I am using Dremio OSS 25.0 in a Kubernetes deployment on EKS and I wanted to test out the Kubernetes autoscaling feature. The autoscale configuration I used is below.

nodeLifecycleService:
  enabled: true
  scalingMetrics:
    default:
      cpuAverageUtilization: 50
      enabled: true
  scalingBehavior:
    scaleDown:
      defaultPolicy:
        enabled: true
        value: 2
    scaleUp:
      userDefined:
        - type: Percent
          value: 100
          periodSeconds: 60
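
For context, my understanding (not verified against the rendered manifests) is that these values translate into a standard autoscaling/v2 HorizontalPodAutoscaler targeting the executor StatefulSet, roughly along these lines. I have left out the scaleDown defaultPolicy because I am not sure how the chart maps it.

# Sketch of how I read the values above as a plain autoscaling/v2 HPA.
# The mapping from the chart values to this spec is my assumption, not taken
# from the chart templates; the object names and replica bounds are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dremio-executor            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: dremio-executor          # executors run as a StatefulSet in my deployment
  minReplicas: 1                   # my cluster started at 1 executor
  maxReplicas: 3                   # and scaled up to 3 in my tests
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # cpuAverageUtilization: 50 above
  behavior:
    scaleUp:
      policies:
        - type: Percent            # scaleUp.userDefined above
          value: 100
          periodSeconds: 60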

After deploying the Helm chart with this setting, I noticed the following things:

  1. Initially my executor count was 1. As I started running reflection refreshes on a pretty huge table, CPU usage spiked and the HPA scaled up my executors. But the new executors were not being used to execute the threads of the reflection refresh. Does this mean the new executors won't share the load of existing (already running) jobs?
  2. When I ran reflection refreshes on multiple tables I saw the same behaviour, and in fact, since only my initial executor was being used, the reflection refresh jobs failed due to memory limits.
  3. When my cluster had autoscaled up to 3 executors I submitted another query. At that point one of my executors was scaled down and the query failed with the error - ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were [dremio-executor-dremioscale-0-3.dremio-cluster-pod-dremioscale-0.dremio-scaling.svc.cluster.local:0. Does this mean downscaling will affect running queries and cause them to fail? If so, how can I overcome this?

Am I missing anything that should be configured for autoscaling to work? I hope someone from the Dremio team can help me out with these questions.

@sanchitsk

  • Since already-running jobs have been planned on the existing nodes, they will not dynamically start using the newly autoscaled executors at runtime.
  • Are you saying the reflection refresh did not use the initial set of nodes and ran only on one executor? The degree of parallelism depends on the number of available cores and on row-count estimates. Are you able to send us the job profile of the reflection refresh job that failed with the OOM?
  • If a query has already been planned on a certain executor and that executor is affected as part of a downscale, then yes, the query would be affected. If you open that profile and look in the Planning tab, under the execution planning section, you will see the list of nodes that were selected. Scaling up will not affect already-running queries; they will only use the executors in that list. Similarly for scaling down: if one of the executors in that list is brought down, all running queries that have fragments running on the affected executor will fail.

@balaji.ramaswamy Regarding your second point, I had only one executor at the start since I was testing out how the autoscaler behaves. As a result all my reflection jobs were being run on that one executor, which, reading your first point, makes sense. Let me see if I can get the query profile.

Regarding your third point, is there any way we can control how scaling works? For example, can we tell Dremio not to downscale an executor if queries are running on it?

@sanchitsk

There are two parameters:

  • terminationGracePeriodSeconds is the time Kubernetes waits for a terminating pod to finish its fragments before force-killing it.
  • stabilizationWindowSeconds is the window over which the HPA considers its previous recommendations before acting. So if the load goes up, it must stay above the threshold for 300 seconds before the HPA provisions a new pod.
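
For example, assuming your chart exposes these under the executor and nodeLifecycleService sections (please verify the exact keys against the chart's values.yaml, they may differ by chart version), the two settings could look roughly like this:

# Sketch only: the stabilization-window key under scalingBehavior is assumed to
# mirror the Kubernetes HPA behavior block, and the placement of
# terminationGracePeriodSeconds under executor is also an assumption.
executor:
  # Standard pod-spec field: how long a terminating executor is allowed to keep
  # running (e.g. to drain its query fragments) before it is force-killed.
  terminationGracePeriodSeconds: 1800

nodeLifecycleService:
  enabled: true
  scalingBehavior:
    scaleDown:
      # Assumed to map to HPA behavior.scaleDown.stabilizationWindowSeconds:
      # the HPA keeps using the most conservative recommendation from this
      # window before it actually removes an executor.
      stabilizationWindowSeconds: 300
      defaultPolicy:
        enabled: true
        value: 2
    scaleUp:
      userDefined:
        - type: Percent
          value: 100
          periodSeconds: 60

A longer grace period gives running fragments more time to finish on an executor that is being removed, while a longer scale-down stabilization window makes the HPA slower to remove executors in the first place.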