Dremio is allocating more than 99% of OS memory

Hello there,

We are facing the following issue: Dremio suddenly consumes more memory than expected, reaching almost 100% of OS memory and possibly bringing the system close to collapse. The service keeps working, but this is extremely dangerous given that it’s a production system.

Why is that? What should we do to avoid this?

Thank you,

Danilo from DataSprints

Adding some information:

This is happening on the coordinator node, which is an m5.2xlarge instance with 32 GB of RAM.
DREMIO_MAX_MEMORY_SIZE_MB is set to 29 GB.

Which tool are you using to view memory consumption?

Dremio should use only what’s allocated to it through the setting you mentioned. The Admin->Node Activity page shows the percentage of that allocation that is in use.

however it is extremely dangerous considering it’s a production system.

Depending on your OS, you should reserve a minimum amount of memory for system services to function. It looks like you’ve reserved 3 GB for everything else on the node. You may want to add memory to the node or reduce Dremio’s allocation.
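
To make the headroom arithmetic concrete, here is a minimal Java sketch. It assumes the 32 GB and 29 GB figures from this thread are binary gigabytes (32 * 1024 MB and 29 * 1024 MB); the class and method names are made up for illustration and are not part of Dremio.

```java
public class MemoryHeadroom {
    // Memory (in MB) left for the OS and every other process on the node.
    static long headroomMb(long totalRamMb, long dremioMaxMb) {
        return totalRamMb - dremioMaxMb;
    }

    // The same headroom expressed as a percentage of total RAM.
    static double headroomPercent(long totalRamMb, long dremioMaxMb) {
        return 100.0 * headroomMb(totalRamMb, dremioMaxMb) / totalRamMb;
    }

    public static void main(String[] args) {
        long totalRamMb = 32L * 1024;  // assumption: 32 GiB of RAM on the node
        long dremioMaxMb = 29L * 1024; // assumption: DREMIO_MAX_MEMORY_SIZE_MB = 29696
        System.out.printf("Headroom: %d MB (%.1f%% of RAM)%n",
                headroomMb(totalRamMb, dremioMaxMb),
                headroomPercent(totalRamMb, dremioMaxMb));
    }
}
```

Under those assumptions, Dremio at its full allocation leaves roughly 3 GB, a little under a tenth of the node's RAM, for everything else.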

Hi Ben,

We are using Grafana to do this:

Dremio Master 1 is the active node.

So, is there a routine or configuration that ensures Dremio frees memory, such as a GC setting? Our cluster isn’t under heavy load; what we see looks like memory being allocated over time and never freed.

Thank you,

Danilo from DataSprints

I’m not sure what goes into the Grafana calculation of these numbers, so I can’t speak to what it’s reporting. If Dremio were using all of that memory, it would leave only a few hundred MB for the system. What do the system tools on that node tell you? For example, top -p [dremio pid] on Linux (top -pid [dremio pid] on macOS).
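
As an alternative to top, the resident memory of a process can be read from the kernel on Linux. The sketch below is illustrative only: it assumes a Linux node with /proc mounted and Java 11 or later, and the class name ProcessRss is hypothetical. Pass the Dremio pid as the first argument; with no argument it reports on itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ProcessRss {
    // Extracts the VmRSS value (in kB) from the lines of /proc/<pid>/status.
    // Returns -1 if no VmRSS line is present.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // The line looks like "VmRSS:    123456 kB"; take the number.
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // "self" resolves to the current process under /proc.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines = Files.readAllLines(Path.of("/proc", pid, "status"));
        long rssKb = parseVmRssKb(lines);
        System.out.printf("Resident memory of pid %s: %d MB%n", pid, rssKb / 1024);
    }
}
```

This value is what the operating system sees for the process, so it can be compared directly against the node's 32 GB rather than against the JVM's own limits.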

Hi @ben.

We are using the Prometheus jmx-exporter to expose all metrics created by Dremio.

So, what you are seeing there is a panel created using these metrics.

If you are using the JMX monitoring, then you are interacting with the MemoryMXBean, which lets you monitor heap and direct memory usage from the memory pools that the JVM has control of. I believe this monitoring doesn’t know anything about system memory usage outside of the Java process being monitored. When Grafana says 99% usage, is there a byte value it reports as well? Is this 99% of 29 GB?
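
For reference, here is a rough Java sketch of the kind of values the JVM exposes through the standard java.lang.management API. MemoryMXBean reports heap and non-heap usage, and BufferPoolMXBean reports the NIO buffer pools such as "direct". The sketch inspects the JVM it runs in, not a remote Dremio process, and whether these beans capture all of Dremio's off-heap allocations is not something this example can confirm. The class name JvmMemoryReport and the helper percentUsed are made up for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class JvmMemoryReport {
    // Percentage of a limit that is in use; -1 when the limit is undefined.
    static double percentUsed(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? 100.0 * usedBytes / maxBytes : -1.0;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        // getMax() returns -1 when the JVM has no defined maximum.
        System.out.printf("heap: used=%d bytes, max=%d bytes (%.1f%%)%n",
                heap.getUsed(), heap.getMax(), percentUsed(heap.getUsed(), heap.getMax()));
        System.out.printf("non-heap: used=%d bytes%n", nonHeap.getUsed());

        // NIO buffer pools (typically "direct" and "mapped") are reported separately.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("buffer pool %s: used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```

The point of the percentage helper is that a figure such as 99% only means something relative to its denominator: 99% of a JVM pool limit is a very different situation from 99% of the node's physical RAM.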