My Dremio cluster has 4 executors, each with 10 vCPUs and 32 GB RAM. The Node Activity page shows all 4 nodes periodically (every other minute) running at 90% CPU with 25% RAM. Is this page as accurate as running top on the node? If so, I think I will need to provision more nodes. What’s Dremio’s recommendation on how to monitor the cluster and decide when to provision new nodes?
The memory figure on that page covers only direct memory utilization, so it may not correspond to the physical memory usage of the box. For CPU, is that 90% of one core or of all cores? Also, Dremio will use all the CPU available to it, so seeing 90% utilization is normal.
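To make the "90% of one core" question concrete, here is a minimal sketch of the arithmetic. It assumes the usual Linux top convention that a process's %CPU is reported relative to a single core (so a process saturating a 10 vCPU node shows up to 1000%). The function name is made up for illustration and is not part of Dremio.

```python
def whole_node_cpu_percent(per_core_percent: float, n_cores: int) -> float:
    """Convert a per-core CPU figure (100% = one full core) into a
    percentage of the whole node's CPU capacity.

    Illustrative helper only; the name and signature are assumptions.
    """
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return per_core_percent / n_cores


# On a 10 vCPU executor:
# 900% in per-core terms means the node as a whole is 90% busy.
busy_node = whole_node_cpu_percent(900.0, 10)
# 90% in per-core terms means less than one core is in use.
mostly_idle_node = whole_node_cpu_percent(90.0, 10)
```

If the 90% on the Node Activity page is already a whole-node figure, it would correspond to roughly 900% for the Dremio process in top on a 10 vCPU executor.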
As for your second question about adding more nodes, it depends on two factors:
- Are any queries running out of memory (either heap or direct)?
- When you look at query profiles, do you see consistent sleep time when concurrency is not that high?
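The two checks above can be written down as a rough decision helper. This is only a sketch: the `QuerySummary` record is a hypothetical structure you would fill in by hand from the query profiles, not a Dremio API, and the thresholds (what counts as "low" concurrency or "consistent" sleep time) are assumptions you would need to tune for your workload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuerySummary:
    # Hypothetical record, filled in by hand from a query profile.
    ran_out_of_memory: bool   # heap or direct memory failure
    sleep_fraction: float     # share of run time spent sleeping, 0.0 to 1.0
    concurrent_queries: int   # queries running at the same time


def suggests_more_nodes(
    queries: List[QuerySummary],
    low_concurrency: int = 5,        # assumed cutoff for "not that high"
    sleep_threshold: float = 0.2,    # assumed cutoff for notable sleep time
    consistent_share: float = 0.5,   # assumed share that counts as "consistent"
) -> bool:
    """Return True if either factor points toward adding executors."""
    if not queries:
        return False
    # Factor 1: any query running out of memory.
    if any(q.ran_out_of_memory for q in queries):
        return True
    # Factor 2: consistent sleep time among queries run at low concurrency.
    low = [q for q in queries if q.concurrent_queries <= low_concurrency]
    if not low:
        return False
    sleepy = sum(1 for q in low if q.sleep_fraction >= sleep_threshold)
    return sleepy / len(low) >= consistent_share
```

High CPU on its own does not appear in this check, which matches the point that Dremio using most of the available CPU is expected.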