Query Failure (Query was cancelled because it exceeded the memory limits set by the administrator)

Hi,

I am using raw reflections on VDSs connected to a PostgreSQL database. The system is running on GKE, and I have not changed any memory-related configuration. The executor runs on a node with 128 GB of RAM. When I check the environment variables in the executor pod, I see the following values for the memory-related parameters:

declare -x DREMIO_MAX_DIRECT_MEMORY_SIZE_MB="114608"
declare -x DREMIO_MAX_HEAP_MEMORY_SIZE_MB="8192"
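For reference, these two values together nearly account for the node's 128 GB; the remaining headroom (my own back-of-envelope arithmetic, not a documented Dremio formula) works out to roughly 8 GB:

```shell
# Back-of-envelope check of how the 128 GB node is carved up.
# The "reserved" remainder is my own inference, not a Dremio-documented split.
node_ram_mb=$((128 * 1024))   # 131072 MB on the node
heap_mb=8192                  # DREMIO_MAX_HEAP_MEMORY_SIZE_MB
direct_mb=114608              # DREMIO_MAX_DIRECT_MEMORY_SIZE_MB
reserved_mb=$((node_ram_mb - heap_mb - direct_mb))
echo "left for OS and everything else: ${reserved_mb} MB"
```

So direct memory is already sized close to the physical limit of the node.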

With this configuration the query fails, and I see the following error in the Jobs section:

OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator.

Failure allocating buffer.

Allocator dominators:
Allocator(ROOT) 0/117660477432/117663638456/120175198208 (res/actual/peak/limit) numChildAllocators:13
Allocator(general-workload-allocator) 0/117481009720/117484106296/9223372036854775807 (res/actual/peak/limit) numChildAllocators:1
Allocator(query-1db5409c-d74c-0d13-2205-d182ffed7200) 0/117481009720/117484106296/9223372036854775807 (res/actual/peak/limit) numChildAllocators:20
Allocator(phase-18) 0/38062575008/38062575008/9223372036854775807 (res/actual/peak/limit) numChildAllocators:22
Allocator(frag:18:5) 41242880/3565446272/3565446272/9223372036854775807 (res/actual/peak/limit) numChildAllocators:38
Allocator(op:18:5:31:HashJoinPOP) 1000000/3470016512/3470016512/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:18:5:incoming) 0/37224448/39321600/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(frag:18:3) 41242880/3564618880/3564618880/9223372036854775807 (res/actual/peak/limit) numChildAllocators:38
Allocator(op:18:3:31:HashJoinPOP) 1000000/3468140544/3468140544/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:18:3:incoming) 0/38797312/40370176/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(phase-19) 0/38052908448/38052908448/9223372036854775807 (res/actual/peak/limit) numChildAllocators:22
Allocator(frag:19:6) 41242880/3567396992/3567396992/9223372036854775807 (res/actual/peak/limit) numChildAllocators:38
Allocator(op:19:6:31:HashJoinPOP) 1000000/3465675776/3465675776/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:19:6:incoming) 0/38273024/40370176/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(frag:19:3) 41242880/3565667488/3565667488/9223372036854775807 (res/actual/peak/limit) numChildAllocators:38
Allocator(op:19:3:31:HashJoinPOP) 1000000/3468140544/3468140544/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:19:3:incoming) 0/39845888/41418752/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(background-workload-allocator) 0/74642880/408300608/9223372036854775807 (res/actual/peak/limit) numChildAllocators:1
Allocator(query-1db5445d-6b82-8f4f-4bc5-cfa86d767000) 0/74642880/399666304/9223372036854775807 (res/actual/peak/limit) numChildAllocators:1
Allocator(phase-0) 0/74642880/399666304/9223372036854775807 (res/actual/peak/limit) numChildAllocators:2
Allocator(output-frag:0:0) 524288/2244608/4489216/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(frag:0:0) 10000000/72398272/397421696/9223372036854775807 (res/actual/peak/limit) numChildAllocators:11
Allocator(op:0:0:6:ParquetWriter) 1000000/63398272/385358528/9223372036854775807 (res/actual/peak/limit) numChildAllocators:2
Allocator(ParquetCodecFactory) 0/262144/262144/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(ParquetColEncoder) 0/63136128/385096384/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:0:0:8:Project) 1000000/0/0/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0

[Error Id: adbf02c5-b1a4-4253-9b2d-fb0b8c8d4161 ]

(org.apache.arrow.memory.OutOfMemoryException) Failure allocating buffer.
io.netty.buffer.PooledByteBufAllocatorL.allocate():67
org.apache.arrow.memory.NettyAllocationManager.&lt;init&gt;():77
org.apache.arrow.memory.NettyAllocationManager.&lt;init&gt;():84
org.apache.arrow.memory.NettyAllocationManager$1.create():34
org.apache.arrow.memory.BaseAllocator.newAllocationManager():315
org.apache.arrow.memory.BaseAllocator.newAllocationManager():310
org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation():298
org.apache.arrow.memory.BaseAllocator.buffer():276
org.apache.arrow.memory.BaseAllocator.buffer():240
com.dremio.exec.rpc.MessageDecoder.decodeMessage():172
com.dremio.exec.rpc.MessageDecoder.decode():69
com.dremio.services.fabric.FabricProtobufLengthDecoder.decode():37
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection():498
io.netty.handler.codec.ByteToMessageDecoder.callDecode():437
io.netty.handler.codec.ByteToMessageDecoder.channelRead():276
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():379
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():365
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():357
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead():1410
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():379
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():365
io.netty.channel.DefaultChannelPipeline.fireChannelRead():919
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read():163
io.netty.channel.nio.NioEventLoop.processSelectedKey():714
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized():650
io.netty.channel.nio.NioEventLoop.processSelectedKeys():576
io.netty.channel.nio.NioEventLoop.run():493
io.netty.util.concurrent.SingleThreadEventExecutor$4.run():989
io.netty.util.internal.ThreadExecutorMap$2.run():74
java.lang.Thread.run():750
Caused By (java.lang.OutOfMemoryError) Direct buffer memory
java.nio.Bits.reserveMemory():695
java.nio.DirectByteBuffer.&lt;init&gt;():123
java.nio.ByteBuffer.allocateDirect():311
io.netty.buffer.PoolArena$DirectArena.allocateDirect():758
io.netty.buffer.PoolArena$DirectArena.newChunk():734
io.netty.buffer.PoolArena.allocateNormal():245
io.netty.buffer.PoolArena.allocate():227
io.netty.buffer.PoolArena.allocate():147
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():181
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():214
io.netty.buffer.PooledByteBufAllocatorL.allocate():58
org.apache.arrow.memory.NettyAllocationManager.&lt;init&gt;():77
org.apache.arrow.memory.NettyAllocationManager.&lt;init&gt;():84
org.apache.arrow.memory.NettyAllocationManager$1.create():34
org.apache.arrow.memory.BaseAllocator.newAllocationManager():315
org.apache.arrow.memory.BaseAllocator.newAllocationManager():310
org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation():298
org.apache.arrow.memory.BaseAllocator.buffer():276
org.apache.arrow.memory.BaseAllocator.buffer():240
com.dremio.exec.rpc.MessageDecoder.decodeMessage():172
com.dremio.exec.rpc.MessageDecoder.decode():69
com.dremio.services.fabric.FabricProtobufLengthDecoder.decode():37
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection():498
io.netty.handler.codec.ByteToMessageDecoder.callDecode():437
io.netty.handler.codec.ByteToMessageDecoder.channelRead():276
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():379
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():365
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():357
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead():1410
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():379
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():365
io.netty.channel.DefaultChannelPipeline.fireChannelRead():919
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read():163
io.netty.channel.nio.NioEventLoop.processSelectedKey():714
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized():650
io.netty.channel.nio.NioEventLoop.processSelectedKeys():576
io.netty.channel.nio.NioEventLoop.run():493
io.netty.util.concurrent.SingleThreadEventExecutor$4.run():989
io.netty.util.internal.ThreadExecutorMap$2.run():74
java.lang.Thread.run():750

Since the value of "DREMIO_MAX_DIRECT_MEMORY_SIZE_MB" is already very high, what options are left for me to try? The query I am running performs a UNION over the output of three SELECT statements. Each of those individual SELECT statements runs fine without any errors. I would appreciate any help/insights.
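To see which operators dominate, the allocator dump can be sorted by its "actual" bytes (the second field of res/actual/peak/limit). A minimal sketch (the file name is my own; the sample lines are copied from the dump above):

```shell
# Triage the allocator dump: rank allocators by their "actual" bytes,
# i.e. the second '/'-separated field of res/actual/peak/limit.
# The sample lines below are copied from the dump in this post.
cat > allocators.txt <<'EOF'
Allocator(op:18:5:31:HashJoinPOP) 1000000/3470016512/3470016512/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:18:5:incoming) 0/37224448/39321600/9223372036854775807 (res/actual/peak/limit) numChildAllocators:0
Allocator(op:0:0:6:ParquetWriter) 1000000/63398272/385358528/9223372036854775807 (res/actual/peak/limit) numChildAllocators:2
EOF
# '-t/' splits on '/', '-k2,2 -rn' sorts numerically descending on field 2;
# the HashJoinPOP line sorts to the top.
sort -t/ -k2,2 -rn allocators.txt | head -3
```

On the full dump this makes it easy to see that the HashJoin operators are the largest consumers.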

Thanks

@SAS080 Kindly send the job profile of the query that failed with the OOM. We can look into it, see which operator used the most memory, and whether it can be brought down.

Hi Balaji,

Thanks for your reply. These queries may contain a lot of proprietary/sensitive information about the domain. Is it possible to use another channel to share the query profile?

Thanks

@SAS080 I tried to send an email to your registered address, but it bounced back.

@balaji.ramaswamy I am sorry about that. I am not sure what is going wrong with my email. I have added an alternate email address to my account. Can you please try now?

@SAS080 I do not see it, but I have sent you my email address so you can send me the profile.