For some reason, free memory on my Dremio node keeps shrinking, as if there were a memory leak (I use a single-node on-premise setup); see the attached screenshot.
After a Dremio restart the memory is freed, but then it starts leaking again.
In server.gc there are tons of GC (Allocation Failure) messages.
Is there any way I can fix it?
Hi @stifstyle, are you using G1GC for garbage collection? If not, please enable G1GC and check whether the allocation failures persist.
https://docs.dremio.com/knowledge-base/g1-garbage-collection.html
Which Dremio version are you using?
@Venugopal_Menda
@stifstyle
What are the memory settings on the Dremio node? Send us the server.gc* files and we can review them.
Thanks
Bali
@Venugopal_Menda
Thank you very much for your advice; errors in server.gc disappeared after implementing G1GC but memory seems to continue leaking.
I use Dremio version 4.3.1-202005202256080999-5dcfb82a
@balaji.ramaswamy
Uploaded server.gc.* log files here https://transfersh.com/K98ez/server.gc.zip
I run Dremio on an AWS i3.2xlarge instance.
The only settings I specified are the following:
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=57000
DREMIO_JAVA_SERVER_EXTRA_OPTS="-XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25"
and maximum number of file descriptors.
@stifstyle
The total RAM on an AWS i3.2xlarge instance is about 60 GB.
Your direct memory is about 56 GB.
Your heap memory is 4 GB.
That leaves no memory for the OS, so Dremio is probably being killed by the “oom-killer”.
Please check “/var/log/messages” for “oom-killer” entries around the time Dremio goes down.
Leave at least 4-8 GB for the OS.
If this is a coordinator, increase heap to 12 GB and reduce direct memory by 16 GB (to 40 GB).
If this is an executor, increase heap to 8 GB and reduce direct memory by 12 GB (to 44 GB).
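The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name `dremio_memory_plan` and the 8 GB default OS reserve are assumptions chosen to match the numbers in this reply, and the output lines are meant to be copied into `dremio-env` by hand.

```shell
#!/usr/bin/env bash
# Sketch: split total RAM (in MB) into OS reserve, heap, and direct memory.
# dremio_memory_plan is a hypothetical helper, not part of Dremio.

dremio_memory_plan() {
  local total_mb=$1 role=$2 os_reserve_mb=${3:-8192}  # assumed 8 GB OS reserve
  local heap_mb
  case "$role" in
    coordinator) heap_mb=12288 ;;  # 12 GB heap, as suggested above
    executor)    heap_mb=8192  ;;  # 8 GB heap, as suggested above
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  # Whatever remains after the OS reserve and heap goes to direct memory.
  local direct_mb=$(( total_mb - os_reserve_mb - heap_mb ))
  if (( direct_mb <= 0 )); then
    echo "not enough RAM for role $role" >&2
    return 1
  fi
  echo "DREMIO_MAX_HEAP_MEMORY_SIZE_MB=$heap_mb"
  echo "DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=$direct_mb"
}

# 60 GB box used as a coordinator: 12 GB heap, 40 GB direct, 8 GB left for the OS.
dremio_memory_plan 61440 coordinator
```

With 61440 MB total, the coordinator case yields 40960 MB (40 GB) of direct memory and the executor case 45056 MB (44 GB), matching the figures above.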
@balaji.ramaswamy
Thanks, I will try to leave more memory for the OS. But Dremio is not actually getting killed, because I monitor free memory on the server and restart Dremio before it runs out. What bothers me is that free memory keeps decreasing until I restart Dremio.
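One possible way to narrow this down is to log the system counters next to the resident size of the Dremio process. On Linux, `MemFree` also shrinks as the kernel fills the page cache, so `MemAvailable` and the process RSS are usually more telling. This is a minimal sketch: the helper names are hypothetical, and the `pgrep` pattern assumes the JVM command line contains "dremio".

```shell
#!/usr/bin/env bash
# Sketch: compare system-wide memory counters with the Dremio process RSS.

# Print one /proc/meminfo field in MB. Reads meminfo-format text on stdin.
meminfo_mb() {
  awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }'
}

# Print the resident set size of a PID in MB (ps reports RSS in KB).
rss_mb() {
  ps -o rss= -p "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

if [ -r /proc/meminfo ]; then
  echo "MemFree_MB=$(meminfo_mb MemFree < /proc/meminfo)"
  echo "MemAvailable_MB=$(meminfo_mb MemAvailable < /proc/meminfo)"
  # Assumption: the oldest process whose command line matches "dremio" is the server.
  pid=$(pgrep -f -o dremio || true)
  if [ -n "$pid" ]; then
    echo "Dremio_RSS_MB=$(rss_mb "$pid")"
  fi
fi
```

If the RSS stays roughly flat while `MemFree` drops and `MemAvailable` holds, the decline is more likely page cache than a leak in the process; if the RSS itself keeps growing, the process is the one consuming the memory.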
By the way, I use single node setup of Dremio; is that a bad practice?
@stifstyle
Yes, running Dremio on a single node is not recommended for production workloads.
Thanks
Bali
Hi,
According to the Dremio chart for k8s, “direct memory + heap memory = total memory request”. So the official Dremio chart itself leaves no room for the OS. Why is that?