Memory leak in Dremio

stifstyle · July 15, 2020, 8:42am

For some reason, free memory on the node running Dremio seems to be leaking (I use a single-node on-premise setup); see the attached screenshot.

After a Dremio restart the memory is freed, but then it starts leaking again.
In server.gc there are tons of GC (Allocation Failure) messages.

Is there any way I can fix it?

Venugopal_Menda · July 20, 2020, 4:24am

Hi @stifstyle, are you using G1GC for garbage collection? If not, please enable G1GC and check whether the failures persist.
https://docs.dremio.com/knowledge-base/g1-garbage-collection.html
Which Dremio version are you using?

@Venugopal_Menda

balaji.ramaswamy · July 20, 2020, 4:44pm

@stifstyle

What are the memory settings on the Dremio node? Send us the server.gc* files and we can review them.

Thanks
Bali

stifstyle · July 21, 2020, 8:37am

@Venugopal_Menda
Thank you very much for your advice; the errors in server.gc disappeared after enabling G1GC, but memory still seems to be leaking.
I use Dremio version 4.3.1-202005202256080999-5dcfb82a.

@balaji.ramaswamy
I uploaded the server.gc.* log files here: https://transfersh.com/K98ez/server.gc.zip

I run Dremio on an AWS i3.2xlarge instance.

The only settings I specified are the following:
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=57000
DREMIO_JAVA_SERVER_EXTRA_OPTS="-XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25"
and the maximum number of file descriptors.

balaji.ramaswamy · July 25, 2020, 7:50am

@stifstyle

The total RAM on an AWS i3.2xlarge instance is about 60 GB.
Your direct memory is 56 GB.
Your heap memory is 4 GB.

That leaves no memory for the OS, so Dremio is probably being killed by the “oom-killer”.

Please check “/var/log/messages” for “oom-killer” entries around the time Dremio goes down.
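
A minimal sketch of that check, assuming a syslog-style log at the path above (the location differs between distributions). The function name `find_oom_events` is made up for illustration, and the extra "out of memory" pattern is an addition that catches the kernel's follow-up kill message:

```shell
#!/usr/bin/env bash
# Sketch: list kernel OOM-killer events from a syslog-style log file.
# find_oom_events is a hypothetical helper name, not a Dremio or OS tool.

find_oom_events() {
  # Default to the path mentioned in this thread; pass another file if needed.
  local log_file="${1:-/var/log/messages}"
  # -i: ignore case, -E: extended regex matching either common kernel wording
  grep -iE 'oom-killer|out of memory' "$log_file"
}

# Example (may need root to read the log):
#   find_oom_events /var/log/messages | tail -n 20
```

If this prints nothing around the time of the restarts, the process was most likely not killed by the kernel.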

Leave at least 4-8 GB for the OS.

If this is a coordinator, increase heap to 12 GB and reduce direct memory by 16 GB (to 40 GB).
If this is an executor, increase heap to 8 GB and reduce direct memory by 12 GB (to 44 GB).
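
The arithmetic behind these numbers can be sketched as below. It takes the 60 GB total and 4 GB heap quoted in this thread at face value; `os_headroom_mb` is a hypothetical helper name used only for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: how much memory (in MB) is left for the OS after heap + direct memory.
# os_headroom_mb is a hypothetical helper; figures are the ones from this thread.

os_headroom_mb() {
  local total_mb=$1 heap_mb=$2 direct_mb=$3
  echo $(( total_mb - heap_mb - direct_mb ))
}

TOTAL_MB=$(( 60 * 1024 ))   # 60 GB total RAM, as assumed above

# Current settings: 4 GB heap + DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=57000
os_headroom_mb "$TOTAL_MB" 4096 57000                           # prints 344

# Coordinator suggestion: 12 GB heap + 40 GB direct
os_headroom_mb "$TOTAL_MB" $(( 12 * 1024 )) $(( 40 * 1024 ))    # prints 8192

# Executor suggestion: 8 GB heap + 44 GB direct
os_headroom_mb "$TOTAL_MB" $(( 8 * 1024 )) $(( 44 * 1024 ))     # prints 8192
```

With the current settings only about 344 MB remain for the OS; either suggested split leaves 8 GB. In dremio-env, the executor split would presumably be DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=45056 plus a heap setting of 8192, assuming the heap variable is named DREMIO_MAX_HEAP_MEMORY_SIZE_MB (check the dremio-env template for your version).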

stifstyle · July 27, 2020, 8:42am

@balaji.ramaswamy

Thanks, I will try to leave more memory for the OS. But Dremio is not actually getting killed, because I monitor free memory on the server and restart Dremio before it runs out. What bothers me is that free memory keeps decreasing until I restart Dremio.

By the way, I use a single-node setup of Dremio; is that bad practice?

balaji.ramaswamy · August 23, 2020, 5:51am

@stifstyle

Yes, running Dremio on a single node is not recommended for production workloads.

Thanks
Bali

aaasif04 · May 6, 2024, 6:13pm

Hi,

According to the Dremio chart for k8s, “direct memory + heap memory = total memory request”. So the official Dremio chart itself leaves no memory for the OS. Why is that?
