Reflection memory limits & lost connectivity

I’m having issues refreshing reflections. The refreshes are ending with the errors below:

1. “Query was cancelled because it exceeded the memory limits set by the administrator”
2. Some reflections end with ExecutionSetupException: One or more nodes lost connectivity during query

Please help me troubleshoot these issues.

I have increased the maximum memory size and the maximum direct memory size as shown below:

DREMIO_MAX_MEMORY_SIZE_MB = 819200
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB = 98304
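
For scale, these two values can be converted to gigabytes. This is a minimal sketch, assuming both settings are plain megabyte counts and that 1 GB = 1024 MB; the helper name `mb_to_gb` is made up for illustration.

```python
def mb_to_gb(megabytes: int) -> float:
    """Convert a size in MB to GB, assuming 1 GB = 1024 MB."""
    return megabytes / 1024


# Values copied from the settings quoted above.
settings_mb = {
    "DREMIO_MAX_MEMORY_SIZE_MB": 819200,
    "DREMIO_MAX_DIRECT_MEMORY_SIZE_MB": 98304,
}

for name, value in settings_mb.items():
    print(f"{name} = {value} MB = {mb_to_gb(value):g} GB")
```

Under that assumption the first value works out to 800 GB and the second to 96 GB, which is worth comparing against the physical RAM of each node.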

Default engine type: 5d.8xlarge, with 1 x master coordinator and 8 x executor nodes

Query Profiles:

0cb74b3d-9a88-45ba-b047-5bbbf9068049.zip (101.3 KB)
ab38c2c4-1f6b-42ee-9346-0ab16397b011.zip (23.8 KB)

What is the maximum direct memory limit that I can assign?
Can you please provide a command reference for accessing all the log files and for changing the memory settings?

Regards,
KK

@Karna For the error “ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were [10.162.17.143:0]”, check the GC logs on that IP. They should be under the Dremio log folder.

For the out-of-memory issue on 10.162.17.186, it looks like you ran out of direct memory. Can you please confirm that you have the same direct memory allocation on all six executors? Run ps -ef | grep dremio on each node and confirm that the two flags below have the same values everywhere. Also make sure -Xmx is 8192.

-Xmx4096m
-XX:MaxDirectMemorySize=8192m
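
One way to compare those two flags across nodes is to paste each node's `ps -ef` line into a small script. The following is a rough sketch, not a Dremio tool: the function names and the sample command line are hypothetical, and it assumes that when a flag appears more than once the last occurrence is the one that takes effect.

```python
import re

# Multipliers that turn a JVM size suffix into megabytes (no suffix = bytes).
_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}

# The two flags named in the reply above, matched as whole arguments.
_FLAG_PATTERNS = {
    "heap_mb": re.compile(r"(?<!\S)-Xmx(\d+)([kKmMgG]?)(?!\S)"),
    "direct_mb": re.compile(
        r"(?<!\S)-XX:MaxDirectMemorySize=(\d+)([kKmMgG]?)(?!\S)"
    ),
}


def parse_jvm_memory_flags(cmdline: str) -> dict:
    """Extract heap and direct memory sizes (in MB) from a java command line."""
    result = {}
    for name, pattern in _FLAG_PATTERNS.items():
        matches = pattern.findall(cmdline)
        if matches:
            # Assumption: the last occurrence of a repeated flag wins.
            number, unit = matches[-1]
            result[name] = int(number) * _TO_MB[unit.lower()]
    return result


def nodes_with_different_flags(cmdlines_by_node: dict) -> list:
    """Return the nodes whose flags differ from the first node's flags."""
    parsed = {node: parse_jvm_memory_flags(c) for node, c in cmdlines_by_node.items()}
    if not parsed:
        return []
    reference = next(iter(parsed.values()))
    return [node for node, flags in parsed.items() if flags != reference]


# Hypothetical sample line; paste real `ps -ef | grep dremio` output instead.
sample = "dremio 1234 1 java -Xmx4096m -XX:MaxDirectMemorySize=8192m ExampleMain"
print(parse_jvm_memory_flags(sample))
```

Collecting one line per executor into a dictionary keyed by IP and passing it to `nodes_with_different_flags` lists any node whose settings differ from the first one.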

Hi @balaji.ramaswamy, which variable are you referring to? There are two XX: variables, and both -Xmx1 and -Xmx. Please check the attachment. How do I set or edit the -Xmx value?

avg.memory

Could you please confirm whether the link below would help us resolve the issue?
https://docs.dremio.com/deployment/aws/aws-edition-executors/

@Karna The screenshot suggests you have direct memory of 341 GB while your heap is 16 GB. The ratio of heap to direct memory should be kept around 1:10, and the recommended heap is 8 GB on the executors. Are you able to scale more horizontally, keeping heap at 8 GB per node and direct memory at something like 100 GB per node, and use 128 GB machines but more of them?
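
The arithmetic behind that advice can be sketched as follows. It assumes the 1:10 guidance means direct memory should be roughly ten times the heap, which is how it reads next to the 8 GB heap and 100 GB direct suggestion; the function name is made up for illustration.

```python
def direct_per_heap(heap_gb: float, direct_gb: float) -> float:
    """Return how many GB of direct memory are configured per GB of heap."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return direct_gb / heap_gb


# Figures quoted in the reply above.
current = direct_per_heap(heap_gb=16, direct_gb=341)
suggested = direct_per_heap(heap_gb=8, direct_gb=100)

print(f"current:   1:{current:.1f} (heap:direct)")
print(f"suggested: 1:{suggested:.1f} (heap:direct)")
```

The current configuration sits at roughly twice the suggested ratio, and the suggested per-node total of 108 GB fits within a 128 GB machine.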

Hi @balaji.ramaswamy, I'm not sure whether the errors below are related, but I am now getting them as well:

1. Query CompileException error: SYSTEM ERROR: CompileException: File ‘com.dremio.exec.compile.DremioJavaFileObject[StreamingAggregatorGen107674.java]’, Line 27, Column 8: StreamingAggregatorGen107674.java:27: error: cannot access java.lang.Object
public class StreamingAggregatorGen107674
class file for java.lang.Object not found (compiler.err.cant.access

2. Error loading the keystore /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.amzn2.0.1.x86_64/jre/lib/security/cacerts.

@Karna These errors should not cause the original error you shared about nodes losing connectivity.