Additional nodes scaled down very quickly

Hi,

Two weeks ago we enabled scaling to a second node because we often saw 100% utilization peaks on our single node. However, we are seeing issues with this setup. As you can see in one of my uploaded images, scaling down from 1 to 0 nodes happens after about 15 minutes, which matches our Last Replica’s Auto-Stop delay. All good so far. Scaling from 2 to 1, however, often happens within a minute of the last processed query. This causes a false failure when querying the job result from ADF. ADF usually has fairly long delays between steps, so when it asks whether a posted job succeeded we get this error:

{"errorMessage":"com.dremio.daas.service.jobs.DaasJobResultsStore: Error in Job '164557f4-478f-30a2-4c55-dc12589a8100': /var/lib/dremio/data/pdfs/results/164557f4-478f-30a2-4c55-dc12589a8100 (URI:pdfs:/// isPDFS:true) Results cannot be retrieved because the engine replica has been deleted. Please rerun the query","context":"","moreInfo":""}

Which makes sense, because the replica has been deleted.

Can we also tweak the scale-down delay of additional nodes besides the main node?

Attached are images of the engine events and our engine setup. Thanks for any input.

Patrick

Hi Patrick,

Thanks for reporting this — the behavior you’re seeing is expected but I understand it’s problematic for your use case.

What’s happening: The “Last Replica’s Auto-Stop delay” only applies to the last replica of an engine — it controls when the engine scales from 1 to 0. Additional replicas follow a different rule: when the concurrency (running queries + waiting queries) is low enough, we scale down after about 1 minute. The goal is to reduce your costs by releasing unnecessary resources as quickly as possible. This is a longstanding design choice, and there is currently no separate configurable delay for non-last replicas.
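To make the rule above concrete, here is a small illustrative sketch (not Dremio’s actual implementation; `capacity`, i.e. how many concurrent queries one replica handles, is an assumed parameter) of how a concurrency-based scale-down decision works: non-last replicas are released as soon as the current concurrency fits on fewer of them.

```python
def desired_replicas(running: int, waiting: int, capacity: int,
                     min_replicas: int = 0) -> int:
    """Illustrative only: how many replicas a concurrency-based
    autoscaler would keep, given current load.

    running/waiting: counts of running and queued queries
    capacity: assumed max concurrent queries per replica
    """
    concurrency = running + waiting
    needed = -(-concurrency // capacity)  # ceiling division
    return max(needed, min_replicas)

# With capacity 10: 11 concurrent queries need 2 replicas,
# but once concurrency drops to <= 10, replica #2 becomes
# eligible for removal (after the ~1 minute grace period),
# while the last replica is instead governed by the
# Last Replica's Auto-Stop delay.
```

The key asymmetry is that only the final 1 → 0 transition uses the configurable Auto-Stop delay; the N → N-1 transitions use the short fixed grace period.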

The error you’re seeing (“Results cannot be retrieved because the engine replica has been deleted”) occurs because query results are stored on the executor’s local disk. When that specific replica is removed, the results go with it.

Workarounds to consider:
1. Increase min replicas: If your workload regularly needs 2 nodes, setting min replicas to 2 avoids the 2 → 1 scale-down entirely; the Auto-Stop delay still governs when the engine stops.
2. Use CTAS (CREATE TABLE AS SELECT): Instead of relying on cached query results, write output directly to a table using CTAS. The data persists in your lakehouse independently of the engine lifecycle.
3. Retrieve results immediately: If your pipeline can be adjusted to fetch query results as part of the same API call (synchronous execution) rather than polling later, the results will be returned before the replica scales down.
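For workaround 3, a minimal sketch of the submit-poll-fetch pattern against Dremio’s v3 REST API (`POST /api/v3/sql`, `GET /api/v3/job/{id}`, `GET /api/v3/job/{id}/results`). The host and token are placeholders for your deployment, and the polling interval is an arbitrary choice; the point is that results are fetched in the same session, before the replica that holds them can be removed.

```python
import json
import time
import urllib.request

# Placeholders for your deployment — not real values.
DREMIO_URL = "https://your-dremio-host"
TOKEN = "your-access-token"

# Dremio job states that mean the job is finished (for better or worse).
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def is_terminal(state: str) -> bool:
    """True once a job has reached a final state."""
    return state in TERMINAL_STATES


def _get(path: str) -> dict:
    req = urllib.request.Request(
        f"{DREMIO_URL}{path}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_and_fetch(sql: str, poll_seconds: float = 2.0) -> dict:
    """Submit a query, wait for it, and fetch results immediately,
    instead of letting an external scheduler poll minutes later."""
    body = json.dumps({"sql": sql}).encode()
    req = urllib.request.Request(
        f"{DREMIO_URL}/api/v3/sql",
        data=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        job_id = json.load(resp)["id"]

    # Poll until the job finishes.
    while True:
        state = _get(f"/api/v3/job/{job_id}")["jobState"]
        if is_terminal(state):
            break
        time.sleep(poll_seconds)

    if state != "COMPLETED":
        raise RuntimeError(f"Job {job_id} ended in state {state}")

    # Fetch results right away, while the executor still holds them.
    return _get(f"/api/v3/job/{job_id}/results")
```

Usage would be something like `rows = run_and_fetch("SELECT ...")` inside the step that submits the query, so the fetch happens seconds, not minutes, after completion.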

We’re aware that the gap between last-replica and non-last-replica scale-down behavior can be surprising, and improving result availability after executor scale-down is something we’re looking at.

Hope this helps — let me know if you have follow-up questions!