How does Dremio manage heap vs direct memory?

For example, I have a 64 GB instance.
I will set DREMIO_MAX_MEMORY_SIZE_MB to 60 GB and leave the rest for OS use.

  1. How does Dremio split these 60 GB between heap and direct?
  2. Is it safe to max out this setting to match the instance memory capacity (i.e. 64 GB)?

Hi @chafidz

Let me start by answering #2.

#2. Is it safe to max out this setting to match the instance memory capacity (i.e. 64 GB)? It is not safe and not recommended. The OS needs some memory for its own internal operations, and if it finds there is no memory available while performing them, it invokes the oom-killer (search for "oom-killer"). The oom-killer wakes up and looks for the process consuming the most memory; in your case that would be Dremio, and it would get killed. You would see something like the below in your /var/log/messages:

Feb 1 18:07:02 kernel: output.rb:140 invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

It is recommended that you leave at least 1-2 GB to the OS, so 60 GB to Dremio sounds like a good start. Also, if there are other applications running on the same box, you have to make sure the OS does not swap due to lack of memory.
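As a general Linux check (not Dremio-specific), you can confirm whether the oom-killer has already fired by searching the kernel log:

# Search the kernel ring buffer for oom-killer events (human-readable timestamps)
dmesg -T | grep -i "oom"

# Or search the system log on distributions that write to /var/log/messages
grep -i "oom-killer" /var/log/messages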

#1. How does Dremio split these 60 GB between heap and direct?

If you set DREMIO_MAX_MEMORY_SIZE_MB = 60 GB, by default 4 GB would go to heap (this can change if you give Dremio more memory) and the rest to direct. You can run the below command to see how much has gone to heap and how much to direct:

ps -ef | grep dremio

Look for the below 2 parameters (-Xmx is heap and -XX:MaxDirectMemorySize is direct):

-Xmx4096m -XX:MaxDirectMemorySize=8192m

On your coordinator, if you have large datasets, the default value of 4 GB of heap may sometimes not be enough. Going too high on heap, however, can cause long Full GC pauses.
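If the defaults do not fit your workload, you can also pin the split yourself. A minimal sketch, assuming the standard variables in conf/dremio-env (verify the exact names against your Dremio version):

# conf/dremio-env: example split for a 64 GB box, leaving ~4 GB for the OS
# 8 GB heap for planning/metadata work, 52 GB direct for query execution
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=53248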

Kindly let us know if you have any further questions

Thanks
@balaji.ramaswamy


Thanks for the explanation. This helps us a lot!

Hi,

I just need a clear-cut understanding of the memory allocation to discuss with the team. I am currently working with the latest Helm chart for Dremio. In the chart, it is given that, for the executor:

  1. if the memory request is >= 32786, then heap is 8192
  2. else if the memory request is >= 6144, then heap is 4096
  3. else heap is 2048

Only 3 values are possible for heap (a sketch of this logic follows below). We can override this with the environment variable DREMIO_MAX_MEMORY_SIZE_MB, but when I set it, both the executor and master heap change to the value provided in the environment variable. I have tried all these combinations and am getting out-of-heap errors.
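For reference, my reading of the chart's heap calculation boils down to something like the shell sketch below; this is a hypothetical paraphrase of the behavior I described above, not the chart's actual template code:

# Paraphrase of the chart's executor heap sizing (values in MB)
if [ "$MEMORY_REQUEST_MB" -ge 32786 ]; then
  HEAP_MB=8192
  DIRECT_MB=$((MEMORY_REQUEST_MB - HEAP_MB))
elif [ "$MEMORY_REQUEST_MB" -ge 6144 ]; then
  HEAP_MB=4096
  DIRECT_MB=$((MEMORY_REQUEST_MB - HEAP_MB - 2048))  # ~2 GB left unassigned
else
  HEAP_MB=2048
  DIRECT_MB=$((MEMORY_REQUEST_MB - HEAP_MB))
fi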

My questions are:

  1. Why did the criteria come to be these 3 particular values? Is there any significance to them?
  2. For the 1st and 3rd options, the Helm chart takes the exact difference between the total request and the heap as direct memory. That is, as per the chart, for options 1 and 3, "direct memory + heap = total memory request". This violates the advice above to leave some memory for the OS; we can only assume the pod will receive exactly the requested resources. For the 2nd option, however, this is not the case, and there is 2 GB of memory left for usage other than direct and heap.
  3. Why does setting the variable affect both master and executor together? As per the documentation, that is not how Dremio is supposed to behave.
  4. How exactly should I handle heap, direct, and other memory for the Dremio master and executor pods for better performance?
  5. Is there any ratio between all these memory settings that affects Dremio usage and performance?

I have modified the Helm chart to change the heap and direct memory as per my requirements via the chart values file. Even now I have heap-related issues.

  1. Why is there such an option to handle this dynamic heap memory change?
  2. How can I debug these issues, since I am not able to find any logs regarding them in the Dremio pods?

Can you provide a brief overview of all these topics, so that I can fine-tune my spec for Dremio to work properly?

A 6th question is why there is no option to handle this…

@aaasif04

For the master, you need at least a 32 GB pod, as you need 16 GB for heap and 8 GB for direct, leaving the rest for the OS and pod overhead. If you give 32 GB as the memory in values.yaml, the calculator should do this automatically.
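For example, in values.yaml (a sketch following the dremio-cloud-tools chart layout, where memory is given in MB; double-check the field names against your chart version):

coordinator:
  cpu: 8          # illustrative value
  memory: 32768   # 32 GB in MB; the chart's calculator derives heap/direct from this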

For executors, queries use direct memory; we typically recommend 128 GB of RAM and 16 cores.

Out of that, 8 GB will be for heap, to handle the heap-side work behind the joins, aggregations, sorts, etc. that run in direct memory; 4-8 GB goes to the OS; and the rest goes to direct.
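In values.yaml terms, that sizing would look roughly like this (again a sketch against the dremio-cloud-tools layout, with numbers following the recommendation above):

executor:
  cpu: 16
  memory: 131072   # 128 GB in MB: ~8 GB heap, 4-8 GB OS/pod overhead, rest direct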

You can manually override any of these by passing JVM flags in values.yaml under the extraStartParams section of either the coordinator or the executor.
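For example, to force a specific split on the executor (the flag values are illustrative; -Xmx and -XX:MaxDirectMemorySize are the standard JVM flags shown earlier in this thread, passed via the chart's extraStartParams field):

executor:
  extraStartParams: >-
    -Xmx8192m
    -XX:MaxDirectMemorySize=114688m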