Hash aggregate operator with flag exec.operator.aggregate.vectorize.use_spilling_operator enabled causing OOM issues

Hi,
With exec.operator.aggregate.vectorize.use_spilling_operator enabled:
[failure-flag-enabled.zip]
On the source PDS we have time-series data with 300 columns, partitioned by day and hour. We created a raw reflection on a view that applies aggregate functions (min, max, count, etc.) at the hour level. The job gets killed with closed-connection exceptions. It fails after a large volume of data accumulates in the hash aggregate operator: no disk spill happens to relieve memory pressure, which eventually causes Kubernetes to kill the process and restart the executors.
With exec.operator.aggregate.vectorize.use_spilling_operator disabled:
[sucess.zip]
When we run the same job, we see no issues, and the large volume of data is processed in memory by the hash aggregate operator without problems.

Please let us know whether this is a known issue, and what should be done when exec.operator.aggregate.vectorize.use_spilling_operator is enabled so that the operator spills to disk and avoids this problem.
We do see slight performance differences between enabling and disabling the flags.
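For context on the behavior expected from the spilling operator, here is a minimal conceptual sketch in Python of a hash aggregate that computes min, max and count per group and writes partial results to temporary files once an in-memory group budget is exceeded, then merges them back at the end. It is not Dremio's implementation: `spilling_hash_agg`, the hash-partitioning scheme and the `max_groups` budget are hypothetical and only illustrate how spilling keeps memory bounded.

```python
import pickle
import tempfile


def update(state, value):
    """Fold one value into a (min, max, count) state."""
    if state is None:
        return (value, value, 1)
    lo, hi, n = state
    return (min(lo, value), max(hi, value), n + 1)


def combine(a, b):
    """Merge two partial (min, max, count) states for the same key."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2])


def spilling_hash_agg(rows, max_groups, num_partitions=4):
    """Aggregate (key, value) rows; spill a partition to disk whenever the
    number of in-memory groups exceeds max_groups (hypothetical budget).
    Returns (result_dict, number_of_spills)."""
    tables = [{} for _ in range(num_partitions)]   # in-memory hash tables
    spills = [[] for _ in range(num_partitions)]   # spilled runs per partition
    in_memory = 0
    for key, value in rows:
        p = hash(key) % num_partitions
        if key not in tables[p]:
            in_memory += 1
        tables[p][key] = update(tables[p].get(key), value)
        if in_memory > max_groups:
            # Evict the largest partition: write its partial states to a temp file.
            victim = max(range(num_partitions), key=lambda i: len(tables[i]))
            f = tempfile.TemporaryFile()
            pickle.dump(tables[victim], f)
            f.seek(0)
            spills[victim].append(f)
            in_memory -= len(tables[victim])
            tables[victim] = {}
    spill_count = sum(len(files) for files in spills)
    result = {}
    for p in range(num_partitions):
        merged = tables[p]
        # Merge phase: read back each spilled run for this partition.
        # A real engine would re-partition recursively if this did not fit.
        for f in spills[p]:
            with f:
                for key, state in pickle.load(f).items():
                    merged[key] = combine(merged[key], state) if key in merged else state
        result.update(merged)  # partitions hold disjoint keys
    return result, spill_count
```

With a small budget the sketch spills several times but still returns the same aggregates as a purely in-memory pass, which is the trade-off (extra disk I/O instead of unbounded memory growth) the poster expected when enabling the flag.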

Dremio version:
24.2.11-202404102036120114-d9c69a0e

Thanks,
Pavan

@Julian_Kiryakov @balaji.ramaswamy, can you investigate the memory issue?


dd7128a5-bc5b-4b6b-a9ab-af583056c722.zip (1.6 MB)
I don't see any disk spill in this profile, which is from a successful run. However, there are scenarios where we get OOM errors and no disk spill happens. Please let us know the configuration needed to enable disk spilling for operators.

Here is a scenario where the hash aggregate operator fails without spilling to disk.
26e4b5db-4b10-41f1-b854-307b2fe75643.zip (729.2 KB)

@pavankumargundeti

Of the first profiles you uploaded, the failed one appears to involve an unresponsive executor. The third and fourth profiles have completely different plans, and I see a lot of support keys toggled. Let's do this: reset the keys below and run the query. If it fails, send the profile.

exec.operator.partitioner.vectorize
exec.operator.sort.external.spill.allocation_density
exec.op.join.spill
exec.operator.copier.complex.vectorize
exec.operator.aggregate.vectorize
exec.operator.aggregate.vectorize.use_spilling_operator
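If you prefer to script the reset rather than type each statement, the short Python helper below prints one reset statement per key listed above, ready to paste into the SQL runner. It assumes the `ALTER SYSTEM RESET "<option>"` form; check the SQL reference for your Dremio version before running the output. `SUPPORT_KEYS` and `reset_statements` are hypothetical names used only for this sketch.

```python
# Support keys from the list above that should go back to their defaults.
SUPPORT_KEYS = [
    "exec.operator.partitioner.vectorize",
    "exec.operator.sort.external.spill.allocation_density",
    "exec.op.join.spill",
    "exec.operator.copier.complex.vectorize",
    "exec.operator.aggregate.vectorize",
    "exec.operator.aggregate.vectorize.use_spilling_operator",
]


def reset_statements(keys):
    """Build one system-level RESET statement per key (assumed syntax)."""
    return [f'ALTER SYSTEM RESET "{key}"' for key in keys]


if __name__ == "__main__":
    for stmt in reset_statements(SUPPORT_KEYS):
        print(stmt + ";")
```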

You can download the output of sys.options first so you can revert the changes later. Alternatively, use ALTER SESSION.
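A minimal sketch of the revert step, assuming you copied each option's previous value from the saved sys.options output into a Python dict (that dict shape, and the `sql_literal` and `set_statements` helpers, are hypothetical). It renders `ALTER SYSTEM SET` statements to restore the saved values, or `ALTER SESSION SET` statements when you only want to try a value in your own session; confirm the exact quoting and literal syntax against the Dremio SQL reference for your version.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (assumed formatting rules)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"  # escape single quotes


def set_statements(saved, keys, scope="SYSTEM"):
    """Build SET statements for the given keys from saved option values.
    Keys missing from `saved` are skipped."""
    if scope not in ("SYSTEM", "SESSION"):
        raise ValueError("scope must be SYSTEM or SESSION")
    return [
        f'ALTER {scope} SET "{key}" = {sql_literal(saved[key])}'
        for key in keys
        if key in saved
    ]


if __name__ == "__main__":
    # Hypothetical values captured before the reset.
    saved = {"exec.operator.aggregate.vectorize.use_spilling_operator": False}
    for stmt in set_statements(saved, list(saved)):
        print(stmt + ";")
```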

Thanks, Balaji, for your initial observations. Attached are three profiles of reflection jobs that failed with OOM after I toggled/reset the requested flags. Please share your observations on the hash aggregate operator. These reflections had no issues when exec.operator.aggregate.vectorize.use_spilling_operator was set to false.
a8898afd-9dbf-4ace-a5a0-af8a06b5b76b.zip (946.9 KB)
f6bd1360-f121-48f3-8061-b448891e324e.zip (1.5 MB)
fd25a87f-5f82-423c-b915-7fd19def732f.zip (1.5 MB)