We are running a rather complex query on top of Dremio using a VDS, but on a rather small dataset (50K-200K events).
We have 1 coordinator/master node and 2 executor nodes.
Each executor has 32GB of RAM, 16 cores, and around 15GB of free disk space.
Heap is 10GB, Direct is 18GB, JDK 11.
Dremio version is 24.0.0 (OSS)
When we run a star query on that view, we get the stack trace below.
It looks like the ExternalSort allocator has a reservation of ~20MB but is actually holding ~183MB, against a limit of ~197MB.
Any idea why we might be getting OOM when trying to spill to disk?
@balaji.ramaswamy I’ve sent you the query profile privately.
2023-10-30 12:50:14,468 [e5 - 1ac0597c-3d57-7f68-e3a2-3044ff37b000:frag:18:3] ERROR com.dremio.sabot.driver.SmartOp - Unexpected exception occurred
com.dremio.common.exceptions.UserException: Query was cancelled because it exceeded the memory limits set by the administrator.
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:907)
at com.dremio.sabot.op.sort.external.ExternalSortTracer.prepareAndThrowException(ExternalSortTracer.java:197)
at com.dremio.sabot.op.sort.external.DiskRunManager.spillNextBatch(DiskRunManager.java:616)
at com.dremio.sabot.op.sort.external.MemoryRun.spillNextBatch(MemoryRun.java:328)
at com.dremio.sabot.op.sort.external.ExternalSortOperator.outputData(ExternalSortOperator.java:431)
at com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData(SmartOp.java:209)
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56)
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:124)
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:114)
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:544)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:472)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700(FragmentExecutor.java:106)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:978)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:121)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:249)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
Caused by: org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 67108864 due to memory limit. Current allocation: 34750720
at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:270)
at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:240)
at org.apache.arrow.vector.BaseVariableWidthVector.reallocDataBuffer(BaseVariableWidthVector.java:522)
at com.dremio.sabot.op.common.ht2.Reallocators$VarCharReallocator.ensure(Reallocators.java:84)
at com.dremio.sabot.op.copier.FieldBufferCopier4Util$VariableCopier.copy(FieldBufferCopier4Util.java:201)
at com.dremio.sabot.op.sort.external.VectorCopier4.evalLoop(VectorCopier4.java:54)
at com.dremio.sabot.op.copier.CopierTemplate4.copyRecords(CopierTemplate4.java:115)
at com.dremio.sabot.op.sort.external.DiskRunManager$3.copyRecords(DiskRunManager.java:1028)
at com.dremio.sabot.op.sort.external.DiskRunManager.spillNextBatch(DiskRunManager.java:576)
... 13 common frames omitted
2023-10-30 12:50:14,469 [e5 - 1ac0597c-3d57-7f68-e3a2-3044ff37b000:frag:18:3] INFO c.d.common.memory.MemoryDebugInfo -
Allocation failure:
Detailed Allocator dominators:
Allocator(op:18:3:3:ExternalSort) 20000000/183343024/183343024/197005163 (res/actual/peak/limit) numChildAllocators:2
Allocator(spill_with_snappy) 65536/0/0/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(sort-copy-target) 85746976/262144/101859584/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
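
To make the numbers in the dump easier to read, here is a small Python sketch that parses the ExternalSort allocator line and compares its remaining headroom with the 67108864-byte buffer from the OutOfMemoryException. The helper names (`parse_allocator`, `headroom`) are made up for illustration and are not part of Dremio or Arrow, and the sketch assumes that headroom is simply `limit - actual` as reported in the dump.

```python
import re

# Matches lines such as:
#   Allocator(name) res/actual/peak/limit (res/actual/peak/limit) ...
ALLOCATOR_LINE = re.compile(
    r"Allocator\((?P<name>[^)]*)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)"
)


def parse_allocator(line):
    """Hypothetical helper: pull the four byte counters out of one dump line."""
    match = ALLOCATOR_LINE.search(line)
    if match is None:
        raise ValueError("not an allocator line: " + line)
    counters = {key: int(match[key]) for key in ("res", "actual", "peak", "limit")}
    return {"name": match["name"], **counters}


def headroom(alloc):
    """Assumption: bytes still available = limit - actual."""
    return alloc["limit"] - alloc["actual"]


# Line copied from the allocator dump above.
sort_line = (
    "Allocator(op:18:3:3:ExternalSort) "
    "20000000/183343024/183343024/197005163 "
    "(res/actual/peak/limit) numChildAllocators:2"
)
# Buffer size copied from the OutOfMemoryException message.
requested = 67108864

sort_alloc = parse_allocator(sort_line)
free = headroom(sort_alloc)

mib = 1024 * 1024
print(f"reservation: {sort_alloc['res'] / mib:.1f} MiB")
print(f"actual:      {sort_alloc['actual'] / mib:.1f} MiB")
print(f"limit:       {sort_alloc['limit'] / mib:.1f} MiB")
print(f"headroom:    {free / mib:.1f} MiB")
print(f"requested:   {requested / mib:.1f} MiB")
print("request fits:", requested <= free)
```

Read this way, the operator has roughly 13 MiB left under its limit when the spill copier asks for a single 64 MiB buffer, so the allocation fails even though the data is on its way to disk.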