Dremio stopped automatically java.lang.OutOfMemoryError: Java heap space

Hi,

I have Dremio 3.2 installed on a Kubernetes cluster on Azure, and I am getting an out-of-memory exception.

Here is my configuration:

coordinator: 24GB RAM, 8 CPU

executor: 44GB RAM, 8 CPU

zookeeper: 1GB RAM, 1 CPU

I have not changed anything in dremio-env; I am using the defaults. When I restart the Dremio nodes they stay healthy for a few days, maybe 2-3 days, and after that Dremio stops automatically. Please suggest how to solve this, since all our production instances are based on Dremio.

Here is the log file:

2019-07-26 05:09:02,964 [22c5760a-9177-abbc-8d17-e02bf92dbd00/0:foreman-planning] ERROR c.d.s.commandpool.CommandWrapper - command 22c5760a-9177-abbc-8d17-e02bf92dbd00/0:foreman-planning failed
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.writeCurrentBufferToService(AbfsOutputStream.java:273) ~[hadoop-azure-2.8.5-dremio-r2.jar:na]
at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushInternalAsync(AbfsOutputStream.java:261) ~[hadoop-azure-2.8.5-dremio-r2.jar:na]
at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flush(AbfsOutputStream.java:191) ~[hadoop-azure-2.8.5-dremio-r2.jar:na]
at java.io.FilterOutputStream.flush(FilterOutputStream.java:140) ~[na:1.8.0_212]
at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[na:1.8.0_212]
at com.dremio.exec.store.dfs.FSDataOutputStreamWrapper.flush(FSDataOutputStreamWrapper.java:78) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.store.dfs.FSDataOutputStreamWithStatsWrapper.flush(FSDataOutputStreamWithStatsWrapper.java:60) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.cache.VectorAccessibleSerializable.writeToStream(VectorAccessibleSerializable.java:324) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.store.easy.arrow.ArrowRecordWriter.writeBatch(ArrowRecordWriter.java:131) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.sabot.op.writer.WriterOperator.consumeData(WriterOperator.java:131) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.planner.sql.handlers.commands.DirectWriterCommand.execute(DirectWriterCommand.java:132) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.work.foreman.AttemptManager.plan(AttemptManager.java:390) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.work.foreman.AttemptManager.lambda$run$0(AttemptManager.java:292) ~[dremio-sabot-kernel-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at com.dremio.exec.work.foreman.AttemptManager$$Lambda$304/953554482.get(Unknown Source) ~[na:na]
at com.dremio.service.commandpool.CommandWrapper.run(CommandWrapper.java:62) ~[dremio-services-commandpool-3.2.0-201905102005330382-0598733.jar:3.2.0-201905102005330382-0598733]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]

2019-07-26T05:38:13.783+0000: [Full GC (Ergonomics) [PSYoungGen: 1246720K->1246713K(1282560K)] [ParOldGen: 2796543K->2796543K(2796544K)] 4043263K->4043257K(4079104K), [Metaspace: 151274K->151274K(1271808K)], 2.8596362 secs] [Times: user=21.85 sys=0.00, real=2.86 secs]
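For reference, the figures in that Full GC line can be decoded with a bit of arithmetic (a quick sketch; the numbers below are copied from the log line, sizes in KB):

```python
# Figures taken from the Full GC (Ergonomics) line above, in KB.
old_used_after = 2796543    # ParOldGen after collection
old_capacity = 2796544      # ParOldGen capacity
heap_used_after = 4043257   # whole heap after collection
heap_capacity = 4079104     # whole heap capacity

# The Full GC reclaimed almost nothing: the old generation is still
# full to within 1 KB, and the whole heap is ~4 GB -- consistent with
# a default-sized heap, even though the executor pod has 44 GB of RAM.
old_fill = old_used_after / old_capacity
heap_gb = heap_capacity / (1024 * 1024)
print(f"old gen fill: {old_fill:.6f}, heap capacity: {heap_gb:.2f} GB")
```

In other words, the JVM is collecting back-to-back and freeing nothing, which is exactly the pattern that precedes a `java.lang.OutOfMemoryError: Java heap space`.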

I think configuring dremio-env properly should solve your problem. I've answered a related question here.
In my current deployment (3.1.11), for 64GB executors:
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=53248
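Scaled down to your 44GB executors, a starting point might look like the sketch below. These values are my assumption based on the same ratio (small heap, most memory given to direct memory), not an official recommendation; tune them for your workload:

```shell
# dremio-env -- hypothetical starting values for a 44 GB executor pod.
# Dremio does most query processing in direct (off-heap) memory, so keep
# the heap modest and leave a few GB of headroom for the OS and JVM overhead.
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192      # 8 GB heap
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=32768   # 32 GB direct memory
```

That leaves roughly 4 GB of the pod's 44 GB unallocated as headroom.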

Hope this helps.