Autoscaling in Dremio Elastic Execution Engines

Hello Guys,
I have been exploring Dremio for our business use cases, and we are using the latest AWS community edition.

I have observed the scenario below with elastic execution engines.
The elastic execution engine starts all of its nodes even for a small query.
Is this expected behaviour?

Example: I run a small query that limits my dataset to, say, 5 records, yet Dremio still starts all of the nodes (10/20) in that elastic engine.

From the word "elasticity", I assumed the engine adds/decommissions nodes based on query load.

Here is what I’m expecting: for any given elastic engine, start the engine with fewer nodes initially, then launch more nodes if the load becomes higher, autoscaling the engine until it reaches the maximum number of nodes configured for it.
Ex: Spark-on-YARN dynamic allocation, which adds more executors only when load is high, based on parameters such as available YARN memory, CPU cores, etc.

Also, could you let me know how I can configure Dremio to autoscale elastically as per our needs?

Please let us know your thoughts on this.



Autoscaling will not affect running queries. For example, if you have 5 executors and a query uses up all the available CPU and memory, and more executors are then scaled up, the currently running queries cannot dynamically use the new executors, because the planner has already planned for the original number of nodes. Only queries submitted after the scaling has completed would be influenced.


Thanks @balaji.ramaswamy for the info.

Could you also please clarify a couple more things on the same use case:

  1. How can we make use of the autoscaling feature for different workloads? Do we need to create different execution engines and route our queries using rules?
  2. Can I assume there is no autoscaling at the elastic engine level, since it starts all nodes upfront? Or is there a plan to add autoscaling at the elastic engine level in upcoming releases?



#1 Even if you create multiple engines, autoscaling will not influence already running queries. But one advantage of creating multiple engines is that you can size each engine according to the workload routed to it.

To clarify, Dremio does not currently support autoscaling. You can use queues to route certain queries to different engines, and the engines can be configured to start automatically and to stop after a period of being idle. But currently, starting an engine will always start all of its nodes.
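For anyone looking at the routing piece: queries are matched to queues/engines through Workload Management rules, which are boolean conditions evaluated in order. The sketch below is illustrative only; the exact function names (`tag()`, `query_type()`, etc.) and the set of available conditions depend on your Dremio edition and version, so check the Workload Management docs before relying on them:

```sql
-- Illustrative rule conditions, evaluated top-down; the first match wins.
-- Each rule's action routes the query to a queue backed by a specific engine.

-- Rule 1: send queries tagged as ETL to a large engine
tag() = 'etl'

-- Rule 2: send BI-tool traffic (JDBC/ODBC clients) to a smaller engine
query_type() IN ('JDBC', 'ODBC')

-- Rule 3: catch-all for everything else, routed to a default engine
TRUE
```

Combined with auto-start and idle-stop on each engine, this lets you keep a small engine warm for interactive queries while a larger engine only spins up when heavy workloads arrive, which approximates elasticity even without per-query autoscaling.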

Thanks @steven,
This clarifies my question.

My understanding from earlier marketing info was that Dremio does auto-scale, and I think I even remember seeing cost calculations projecting the cost-effectiveness of the solution.

“Starting an engine will always start all of the nodes” sounds a bit worrying to me. Is there a roadmap for true auto-scaling, and when is it likely to be available?

This conversation was referring specifically to elastic engines in Dremio’s AWS Marketplace edition. Autoscaling is included in the recently launched Dremio Cloud offering:

That might be what you were thinking of.