Memory leak error while executing a SQL GROUP BY query against an Elasticsearch index with 580 million records

Hi,

We are trying to run a SQL GROUP BY query on an Elasticsearch dataset that has 589 million records. The query failed with the error 'Failed to spill to disk. Please check space availability', which was shown in the Dremio UI. When we checked the Dremio server logs, we found the following:

2018-05-29 03:21:46,673 [e5 - 24f2faa6-a26e-b615-029f-1ca79c06d601:frag:2:7] ERROR c.d.s.exec.fragment.FragmentExecutor - IllegalStateException: Memory was leaked by query. Memory leaked: (24000000)
Allocator(query-24f2faa6-a26e-b615-029f-1ca79c06d601) 0/24000000/1270123200/9223372036854775807 (res/actual/peak/limit)

com.dremio.common.exceptions.UserException: IllegalStateException: Memory was leaked by query. Memory leaked: (24000000)
Allocator(query-24f2faa6-a26e-b615-029f-1ca79c06d601) 0/24000000/1270123200/9223372036854775807 (res/actual/peak/limit)

at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:746) ~[dremio-common-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.fragment.FragmentExecutor.retire(FragmentExecutor.java:415) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.fragment.FragmentExecutor.finishRun(FragmentExecutor.java:349) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:263) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$800(FragmentExecutor.java:83) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:577) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:92) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:71) [dremio-extra-sabot-scheduler-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]

Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (24000000)
Allocator(query-24f2faa6-a26e-b615-029f-1ca79c06d601) 0/24000000/1270123200/9223372036854775807 (res/actual/peak/limit)

at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:405) ~[arrow-memory-0.8.0-201803292058100752-a14b263-dremio.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.TicketWithChildren.close(TicketWithChildren.java:58) ~[dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.common.AutoCloseables.close(AutoCloseables.java:92) ~[dremio-common-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.common.AutoCloseables.close(AutoCloseables.java:71) ~[dremio-common-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.QueriesClerk.removeQueryTicket(QueriesClerk.java:107) ~[dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.QueryTicket.removePhaseTicket(QueryTicket.java:124) ~[dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.QueriesClerk$FragmentTicket.close(QueriesClerk.java:179) ~[dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.common.DeferredException.suppressingClose(DeferredException.java:181) ~[dremio-common-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
at com.dremio.sabot.exec.fragment.FragmentExecutor.retire(FragmentExecutor.java:399) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
... 6 common frames omitted

Attaching the job profile: 80498a13-e7f0-47ab-ba5a-8c2302b47e4f.zip (25.6 KB)

Kindly advise.

Can you describe the cluster where you are running Dremio?

It is a standalone Dremio instance. The configuration and server details are below.

OS: RHEL 7
Total RAM: 256 GB
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=16384
Number of CPUs: 24

It seems you are hitting an OOM (out of memory) issue caused by the increased data volume.
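For reference, the allocator line in the log can be read as four byte counts in the order given by its own label, (res/actual/peak/limit). The Python sketch below only parses that line and converts the values to MiB; the helper names are hypothetical, and the reading of each field (reserved, currently allocated, peak, and limit) is my understanding of the Arrow allocator summary rather than something stated in this thread.

```python
import re

# Allocator line copied from the log in this thread.
LINE = ("Allocator(query-24f2faa6-a26e-b615-029f-1ca79c06d601) "
        "0/24000000/1270123200/9223372036854775807 (res/actual/peak/limit)")

FIELDS = ("res", "actual", "peak", "limit")


def parse_allocator_line(line):
    """Return the four byte counts from an allocator summary line as a dict."""
    match = re.search(
        r"\)\s+(\d+)/(\d+)/(\d+)/(\d+)\s+\(res/actual/peak/limit\)", line)
    if match is None:
        raise ValueError("not an allocator summary line")
    return dict(zip(FIELDS, (int(group) for group in match.groups())))


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


stats = parse_allocator_line(LINE)
for name in FIELDS:
    print(f"{name:>6}: {stats[name]:>20} bytes ({to_mib(stats[name]):,.1f} MiB)")
```

Read this way, the query still held 24000000 bytes when it was closed, and it peaked at 1270123200 bytes. The limit value equals the largest signed 64-bit integer, which suggests that this query-level allocator had no cap of its own.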

DREMIO_MAX_DIRECT_MEMORY_SIZE_MB sets the direct memory, which is what Dremio uses for query execution. Since you are running a single node on a box with 256 GB of RAM, please raise this limit. I would try 128000, restart Dremio, and run the query again. By the way, another option (and the more common way to deploy Dremio) is to run it in a distributed environment.
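As a rough sanity check on the suggested value, the sketch below compares heap plus direct memory against the total RAM reported in this thread. It assumes 256 GB means 256 * 1024 MB, and the 16 GB reserve for the OS and other processes is an illustrative number, not a Dremio recommendation; the function names are hypothetical.

```python
# Values reported in this thread, in MB as used by the Dremio settings.
TOTAL_RAM_MB = 256 * 1024        # 256 GB box (assuming binary units)
HEAP_MB = 8192                   # DREMIO_MAX_HEAP_MEMORY_SIZE_MB
CURRENT_DIRECT_MB = 16384        # current DREMIO_MAX_DIRECT_MEMORY_SIZE_MB
PROPOSED_DIRECT_MB = 128000      # value suggested in the reply

# Assumption for illustration only: memory to leave for the OS and
# anything else running on the same machine.
MIN_OS_RESERVE_MB = 16 * 1024


def headroom_mb(total_mb, heap_mb, direct_mb):
    """Memory left on the box after Dremio's heap and direct allocations."""
    return total_mb - heap_mb - direct_mb


def fits(total_mb, heap_mb, direct_mb, reserve_mb=MIN_OS_RESERVE_MB):
    """True if the settings leave at least reserve_mb unclaimed."""
    return headroom_mb(total_mb, heap_mb, direct_mb) >= reserve_mb


for label, direct in (("current", CURRENT_DIRECT_MB),
                      ("proposed", PROPOSED_DIRECT_MB)):
    left = headroom_mb(TOTAL_RAM_MB, HEAP_MB, direct)
    print(f"{label:>8}: direct={direct} MB, headroom={left} MB, "
          f"fits={fits(TOTAL_RAM_MB, HEAP_MB, direct)}")
```

Under these assumptions both settings fit, and the proposed value still leaves a large share of the box unclaimed. If other services run on the same machine, they need to come out of that headroom as well.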