Hash aggregate operator with flag exec.operator.aggregate.vectorize.use_spilling_operator enabled causing OOM issues

pavankumargundeti · June 26, 2024, 9:36am

Hi,
With exec.operator.aggregate.vectorize.use_spilling_operator enabled:
[failure-flag-enabled.zip]
On the source PDS we have time-series data with 300 columns, partitioned by day and hour. We created a raw reflection on a view that applies aggregate operators (min, max, count, etc.) at the hour level. The reflection job gets killed with closed-connection exceptions. It fails after a large volume of data accumulates in the hash aggregate operator. No disk spill happens to keep memory pressure down, so the Kube process is eventually killed and the executors restart.
With exec.operator.aggregate.vectorize.use_spilling_operator disabled:
[sucess.zip]
When we ran the same reflection, we did not see any issues, and the large volume of data was processed fine in memory by the hash aggregate operator.

Please let us know whether this is a known issue. What should we do so that, with exec.operator.aggregate.vectorize.use_spilling_operator enabled, the operator spills to disk and the issue is mitigated?
We do see slight performance differences between enabling and disabling the flag.

Dremio version:
24.2.11-202404102036120114-d9c69a0e

Thanks,
Pavan
sucess.zip (1.6 MB)
failure-flag-enabled.zip (1.5 MB)

pavankumargundeti · June 27, 2024, 3:34am

[@Julian_Kiryakov] [@balaji.ramaswamy] can you investigate the memory issue?

pavankumargundeti · June 29, 2024, 9:58am

dd7128a5-bc5b-4b6b-a9ab-af583056c722.zip (1.6 MB)
I don't see any disk spill. This is a success case. But there are scenarios where we get OOM and no disk spill happens. Please let us know the config to enable disk spilling for operators.

pavankumargundeti · June 29, 2024, 10:03am

Here is a scenario where the Hash aggregate operator is failing without using disk spill.
26e4b5db-4b10-41f1-b854-307b2fe75643.zip (729.2 KB)

balaji.ramaswamy · July 2, 2024, 12:24am

@pavankumargundeti

Among the first profiles you uploaded, the failed one seems to show an unresponsive executor. The third and fourth profiles have completely different plans. I see a lot of support keys toggled. Let’s do this: reset the keys below and run the query. If it fails, send the profile.

exec.operator.partitioner.vectorize
exec.operator.sort.external.spill.allocation_density
exec.op.join.spill
exec.operator.copier.complex.vectorize
exec.operator.aggregate.vectorize
exec.operator.aggregate.vectorize.use_spilling_operator

You can download the output of sys.options first so that you can revert later. Alternatively, use alter session.

pavankumargundeti · July 2, 2024, 5:02am

Thanks, Balaji, for your initial observations. I am attaching 3 profiles of reflection jobs that failed with OOM. I toggled/reset the needed flags. Let me know your observations on the hash aggregate operator. These reflections did not have issues when exec.operator.aggregate.vectorize.use_spilling_operator was false.
a8898afd-9dbf-4ace-a5a0-af8a06b5b76b.zip (946.9 KB)
f6bd1360-f121-48f3-8061-b448891e324e.zip (1.5 MB)
fd25a87f-5f82-423c-b915-7fd19def732f.zip (1.5 MB)

pavankumargundeti · July 26, 2024, 8:00am

Any update on this, @balaji.ramaswamy?

balaji.ramaswamy · July 31, 2024, 5:25pm

Apologies for the delay, @pavankumargundeti. I will provide an update this week.

pavankumargundeti · August 1, 2024, 6:15am

Thank you, waiting for your reply.

balaji.ramaswamy · August 8, 2024, 10:38pm

When the support option is set to ‘true’, the spillable hash agg algorithm allocates something called ‘PREALLOCATED_MEMORY’, and if the query is highly parallelized this could be an issue. I am investigating how much this is in MB per thread.

balaji.ramaswamy · August 12, 2024, 5:46am

There are 1800 accumulators which include min, max, sum, count…

Dremio preallocates a batch of memory in the hashtable as part of initDataStructures. The batch size is 256k for all 8 partitions for a single accumulator. As there are 1800 accumulators, we allocate (256 * 1024 * 1800) bytes = 450 MB per operator, which explains the initial preallocated memory required for the operator. This is by design of the Spillable HashAgg operator.
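
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 256k batch size and the 1800 accumulators come from the post above. The function names and the fragment count are hypothetical, and the per-node figure assumes that every parallel fragment runs its own HashAgg operator with the same preallocation, which this thread has not confirmed.

```python
# Back-of-the-envelope check of the preallocation figure quoted in this thread.
# From the post: 256 KiB batch per accumulator, 1800 accumulators.
BATCH_BYTES = 256 * 1024
NUM_ACCUMULATORS = 1800


def prealloc_bytes_per_operator(batch_bytes=BATCH_BYTES,
                                accumulators=NUM_ACCUMULATORS):
    """Bytes preallocated by one HashAgg operator instance."""
    return batch_bytes * accumulators


def to_mib(num_bytes):
    """Convert bytes to MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


def prealloc_bytes_per_node(fragments_per_node):
    """ASSUMPTION: each fragment on a node preallocates independently."""
    return fragments_per_node * prealloc_bytes_per_operator()


if __name__ == "__main__":
    per_operator = prealloc_bytes_per_operator()
    print(f"per operator: {to_mib(per_operator):.0f} MiB")

    # Hypothetical parallelism, for illustration only.
    hypothetical_fragments = 16
    per_node = prealloc_bytes_per_node(hypothetical_fragments)
    print(f"{hypothetical_fragments} fragments: {to_mib(per_node):.0f} MiB")
```

If that assumption holds, the up-front memory grows linearly with the degree of parallelism, which would fit the earlier remark that highly parallelized queries could be an issue.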

pavankumargundeti · August 13, 2024, 3:26am

@balaji.ramaswamy thanks a lot for your analysis. Is there any way we can reduce the preallocation batch size and spill to disk instead? As it stands, this causes more OOM issues and might require more memory for each operator. When exec.operator.aggregate.vectorize.use_spilling_operator is false, it looks like no preallocation happens, which is why the reflections work fine.

balaji.ramaswamy · August 13, 2024, 5:10am

@pavankumargundeti 25.x has Memory Arbiter and Join spilling enabled. Can you please run the query on the latest 25.0.x with default settings, with exec.operator.aggregate.vectorize.use_spilling_operator set back to true, and send us the profile?

pavankumargundeti · August 13, 2024, 7:37am

We have to wait, as we are using SAG-supported Dremio 24.3. Not sure when they will update to 25.x. But thanks for the update.

balaji.ramaswamy · August 13, 2024, 4:20pm

@pavankumargundeti Let me see what other workarounds we can do in the meantime.

Thanks
Bali
