Memory leak in Dremio

For some reason, memory on the node running Dremio seems to be leaking: free memory keeps decreasing (I use a single-node on-premise setup); see the attached screenshot.

After a Dremio restart the memory is freed, but then it starts leaking again.
In server.gc there are tons of GC (Allocation Failure) messages.

Is there any way I can fix it?

Hi @stifstyle, are you using G1GC for garbage collection? If not, please enable G1GC and check whether the allocation failures still occur.
https://docs.dremio.com/knowledge-base/g1-garbage-collection.html
Which Dremio version are you using?

@Venugopal_Menda

@stifstyle

What are the memory settings on the Dremio node? Send us the server.gc* files and we can review them.

Thanks
Bali

@Venugopal_Menda
Thank you very much for your advice; the errors in server.gc disappeared after enabling G1GC, but memory still seems to be leaking.
I use Dremio version 4.3.1-202005202256080999-5dcfb82a

@balaji.ramaswamy
Uploaded server.gc.* log files here https://transfersh.com/K98ez/server.gc.zip

I run Dremio on an AWS i3.2xlarge instance.

The only settings I specified are the following:
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=57000
DREMIO_JAVA_SERVER_EXTRA_OPTS="-XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25"
and maximum number of file descriptors.

@stifstyle

The total RAM on an AWS i3.2xlarge instance is about 60 GB
Your direct memory is 56 GB
Your heap memory is 4 GB

You have no space left for the OS, so Dremio is probably getting killed by the “oom-killer”

Please check “/var/log/messages” for “oom-killer” entries around the time Dremio goes down
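A minimal sketch of that log check, assuming the kernel writes its OOM messages to /var/log/messages (on some distributions they end up in /var/log/syslog or the journal instead, and reading the file may need root). The function names `find_oom_lines` and `scan_log` are hypothetical helpers for illustration, not part of Dremio:

```python
import re

# Phrases the Linux kernel commonly logs when the OOM killer fires.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that mention the OOM killer."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Read a log file and return its OOM-related lines."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

Calling `scan_log()` and comparing the timestamps of any returned lines with the times Dremio went down shows whether the kernel killed the process.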

Leave at least 4-8 GB for the OS

If this is a coordinator, increase heap to 12 GB and reduce direct memory by 16 GB (to 40 GB)
If this is an executor, increase heap to 8 GB and reduce direct memory by 12 GB (to 44 GB)
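The arithmetic behind these numbers can be sketched as follows. This is only an illustration using the rounded figures from this thread (60 GB total, 4 GB heap, 56 GB direct); the function name `os_headroom_gb` and the layout labels are made up for the example:

```python
TOTAL_GB = 60  # approximate RAM of the instance, as quoted above


def os_headroom_gb(total_gb, heap_gb, direct_gb):
    """Memory left for the OS after Dremio's heap and direct memory."""
    return total_gb - heap_gb - direct_gb


# (heap GB, direct GB) for the current settings and the two suggestions.
layouts = {
    "current": (4, 56),
    "coordinator": (12, 40),
    "executor": (8, 44),
}

for name, (heap, direct) in layouts.items():
    headroom = os_headroom_gb(TOTAL_GB, heap, direct)
    print(f"{name}: heap={heap} GB, direct={direct} GB, OS headroom={headroom} GB")
```

With the current settings the headroom is zero, while both suggested layouts leave 8 GB for the OS, which is at the top of the 4-8 GB range.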

@balaji.ramaswamy

Thanks, I will try to leave more memory for the OS. But the thing is, Dremio is not getting killed, because I monitor free memory on the server and restart Dremio before it runs out of memory. What bothers me is that free memory keeps decreasing until I restart Dremio.
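For reference, a minimal sketch of this kind of monitoring, assuming a Linux host that exposes /proc/meminfo. It reports both MemFree and MemAvailable, since MemFree excludes memory the kernel is using for the page cache and can therefore shrink even when the memory is still reclaimable. The helper names are hypothetical:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            # Most fields are in kB; a few (e.g. HugePages_*) are counts.
            values[name.strip()] = int(fields[0])
    return values


def memory_snapshot_mb(text):
    """Return MemTotal, MemFree and MemAvailable in MB, when present."""
    info = parse_meminfo(text)
    keys = ("MemTotal", "MemFree", "MemAvailable")
    return {key: info[key] // 1024 for key in keys if key in info}


def read_snapshot(path="/proc/meminfo"):
    """Take a snapshot from the live system."""
    with open(path) as handle:
        return memory_snapshot_mb(handle.read())
```

Logging `read_snapshot()` at intervals shows whether MemAvailable is falling along with MemFree, or only MemFree.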

By the way, I use a single-node Dremio setup; is that bad practice?

@stifstyle

Yes, running Dremio on a single node is not recommended for production workloads

Thanks
Bali

Hi,

According to the Dremio chart for k8s, “direct memory + heap memory = total memory request”. So the official Dremio chart itself leaves no space for the OS. Why is that?