OOM in ExternalSort when spilling to disk

We are running a fairly complex query on top of Dremio using a VDS, but on a rather small dataset (50K-200K events).

We have 1 coordinator/master node and 2 executor nodes.
Each executor has 32GB, 16 cores, and around 15GB of free space on the disk.
Heap is 10GB, Direct is 18GB, JDK 11.
Dremio version is 24.0.0 (OSS)

When running a star query on that view, we get the stack trace below.

It looks like the external sort allocator reserves ~20MB but needs ~200MB.

Any idea why we might be getting OOM when trying to spill to disk?

@balaji.ramaswamy I've sent you the query profile privately.

2023-10-30 12:50:14,468 [e5 - 1ac0597c-3d57-7f68-e3a2-3044ff37b000:frag:18:3] ERROR com.dremio.sabot.driver.SmartOp - Unexpected exception occurred
com.dremio.common.exceptions.UserException: Query was cancelled because it exceeded the memory limits set by the administrator.
        at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:907)
        at com.dremio.sabot.op.sort.external.ExternalSortTracer.prepareAndThrowException(ExternalSortTracer.java:197)
        at com.dremio.sabot.op.sort.external.DiskRunManager.spillNextBatch(DiskRunManager.java:616)
        at com.dremio.sabot.op.sort.external.MemoryRun.spillNextBatch(MemoryRun.java:328)
        at com.dremio.sabot.op.sort.external.ExternalSortOperator.outputData(ExternalSortOperator.java:431)
        at com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData(SmartOp.java:209)
        at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56)
        at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:124)
        at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:114)
        at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:544)
        at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:472)
        at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700(FragmentExecutor.java:106)
        at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:978)
        at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:121)
        at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:249)
        at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
Caused by: org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 67108864 due to memory limit. Current allocation: 34750720
        at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:270)
        at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:240)
        at org.apache.arrow.vector.BaseVariableWidthVector.reallocDataBuffer(BaseVariableWidthVector.java:522)
        at com.dremio.sabot.op.common.ht2.Reallocators$VarCharReallocator.ensure(Reallocators.java:84)
        at com.dremio.sabot.op.copier.FieldBufferCopier4Util$VariableCopier.copy(FieldBufferCopier4Util.java:201)
        at com.dremio.sabot.op.sort.external.VectorCopier4.evalLoop(VectorCopier4.java:54)
        at com.dremio.sabot.op.copier.CopierTemplate4.copyRecords(CopierTemplate4.java:115)
        at com.dremio.sabot.op.sort.external.DiskRunManager$3.copyRecords(DiskRunManager.java:1028)
        at com.dremio.sabot.op.sort.external.DiskRunManager.spillNextBatch(DiskRunManager.java:576)
        ... 13 common frames omitted
2023-10-30 12:50:14,469 [e5 - 1ac0597c-3d57-7f68-e3a2-3044ff37b000:frag:18:3] INFO  c.d.common.memory.MemoryDebugInfo - 
Allocation failure: 
Detailed Allocator dominators:
 Allocator(op:18:3:3:ExternalSort) 20000000/183343024/183343024/197005163 (res/actual/peak/limit) numChildAllocators:2
  Allocator(spill_with_snappy) 65536/0/0/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
  Allocator(sort-copy-target) 85746976/262144/101859584/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0

@balaji.ramaswamy I’ve sent you an updated profile

@sheinbergon Just saw this and will get back to you once I review the profile


@balaji.ramaswamy any response?

@sheinbergon This has to be validated internally to see if there is an issue with memory allocation.

Does this query complete when the system is less busy?

Hi. No, it's not a matter of resources. The dataset is also very small (2000x2000 records) on both ends of the join where the issue happens.

@sheinbergon From your executor, are you able to paste the output of ps -ef | grep dremio?

Sure, here you go:

/usr/lib/jvm/zulu11/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -Djava.library.path=/opt/dremio/lib -Xlog:gc*::time,uptime,tags,level -Ddremio.plugins.path=/opt/dremio/plugins -Xmx4096m -XX:MaxDirectMemorySize=10112m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -Dio.netty.maxDirectMemory=0 -Dio.netty.tryReflectionSetAccessible=true -DMAPR_IMPALA_RA_THROTTLE -DMAPR_MAX_RA_STREAMS=400 -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCHeuristics=compact -XX:ConcGCThreads=4 -Dzookeeper=zk-hs:2181 -Dservices.coordinator.enabled=false -Dservices.coordinator.master.enabled=false -Dservices.coordinator.master.embedded-zookeeper.enabled=false -Dservices.executor.enabled=true -Dservices.conduit.port=45679 -Dservices.node-tag=default -cp /opt/dremio/conf:/opt/dremio/jars/:/opt/dremio/jars/ext/:/opt/dremio/jars/3rdparty/* com.dremio.dac.daemon.DremioDaemon

@sheinbergon

The direct memory on your box is ~10 GB, which seems low; we usually recommend 128 GB executor nodes, with 8 GB of heap, 8 GB for the OS, and the rest for direct memory. I agree there is some optimization we need to do on the Dremio end, but essentially Dremio is running out of direct memory, which is only 10 GB for the entire node.
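
For reference, if you size executor memory through conf/dremio-env rather than raw JVM flags, that split for a 128 GB node would look roughly like the sketch below (example values, assuming the stock dremio-env variables; adjust for your install):

# conf/dremio-env on each executor (example split for a 128 GB node)
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192        # 8 GB heap
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=114688    # ~112 GB direct, leaving ~8 GB for the OS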

@balaji.ramaswamy thank you for your answer, but I don't think pouring in 100GB worth of RAM would help here.

The allocation spec states:

Allocator(op:18:3:3:ExternalSort) 20000000/183343024/183343024/197005163 (res/actual/peak/limit)

So it reserved ~20MB, but actually needed ~183MB.

It just sounds like a calculation bug.
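
To spell out how I read that line, going by the res/actual/peak/limit legend Dremio prints (my interpretation of the fields):

   res    = 20,000,000   -> ~20 MB initial reservation
   actual = 183,343,024  -> ~183 MB allocated at the time of the dump
   peak   = 183,343,024  -> ~183 MB peak allocation
   limit  = 197,005,163  -> ~197 MB cap for this operator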

The out-of-memory exception from the Arrow buffer allocator states:

Caused by: org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 67108864 due to memory limit. Current allocation: 34750720

So it tries to allocate a ~67MB (64 MiB) buffer on top of a current allocation of around 34MB, and hits the memory limit.
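
Putting the two together (my arithmetic; as far as I understand Arrow accounting, a child allocation is also charged against its parent, so the 64 MiB request had to fit under the ExternalSort limit above):

   requested buffer:                         67,108,864 bytes (64 MiB)
   "Current allocation" in the exception:    34,750,720 bytes (~33 MiB)
   ExternalSort actual (from the dump):     183,343,024 bytes
   ExternalSort limit:                      197,005,163 bytes
   183,343,024 + 67,108,864 = 250,451,888 > 197,005,163 -> allocation refused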

The server, when running this query, never passes 5GB of process RSS; it's not like it's trying to process tens of GB. Also, even after doubling these specs (10GB heap, 18GB direct), we get the same error.

  • Can you, or someone else on your team, relate to these concrete numbers?
  • From the profile I sent you, do you have any evidence that it tried to allocate more than 10GB worth of direct memory?
  • If so, how much did it try to allocate?

@balaji.ramaswamy any response?

@sheinbergon Let me probe more and get back to you

Hi @balaji.ramaswamy.

Any updates regarding this issue?

@sheinbergon This does look like an inefficiency. I am currently discussing with the right team what can be done and will update in a couple of days.

Still waiting for an update. Over 2 months now.

@sheinbergon I think what is happening here is that a single record is probably so big that the sort is not able to make progress. Does your customer have very large records?