Failure when I create a Raw Reflection on a VDS

Hello All,
I have an issue that occurs when I try to create a Raw Reflection on a VDS. This is the error message:
“Query was cancelled because it exceeded the memory limits set by the administrator. Expected at least 208.79GB bytes, but only had 8.00GB available.”
It says that Dremio needs at least 208.79GB, which is far too much compared to the simple query being processed: this VDS returns only about 5000 records.

NB: The query in this VDS is a UNION ALL of 3 other VDSs, each of which is accelerated by a Raw Reflection.

I’m using Dremio 4.8.0.
Thanks

Hi @farid

Are you able to provide us with a query profile?

@balaji.ramaswamy

We are using Dremio 4.7.3 on Kubernetes.
I have a reflection with a sort on a date column. When I refresh the reflection, I get the error below.
Is this error because of the heap memory or because of the direct memory given to the executor?

The planning screen also shows the error node as dremio-master-0.dremio-cluster-pod.default.svc.cluster.local:31010.

The reflection refresh works when I remove the sort option. The total input bytes after reflection refresh is around 64GB.

RESOURCE ERROR: Query was cancelled because it exceeded the memory limits set by the administrator.
Fragment 3:0
[Error Id: 063905f2-94a9-4180-b9b6-d92532523145 on dremio-executor-0.dremio-cluster-pod.default.svc.cluster.local:0]
  (com.dremio.sabot.exec.OutOfHeapMemoryException) heap monitor detected that the heap is almost full
    com.dremio.sabot.exec.AbstractHeapClawBackStrategy.failQueries():72
    com.dremio.sabot.exec.FailGreediestQueriesStrategy.clawBack():66
    com.dremio.sabot.exec.HeapMonitorThread.checkAndClawBackHeap():143
    com.dremio.sabot.exec.HeapMonitorThread.run():76
Fragment 3:0
com.dremio.sabot.exec.AbstractHeapClawBackStrategy(AbstractHeapClawBackStrategy.java:72)
com.dremio.sabot.exec.FailGreediestQueriesStrategy(FailGreediestQueriesStrategy.java:66)
com.dremio.sabot.exec.HeapMonitorThread(HeapMonitorThread.java:143)
com.dremio.sabot.exec.HeapMonitorThread(HeapMonitorThread.java:76)

The memory settings for the executor are:
- name: DREMIO_MAX_HEAP_MEMORY_SIZE_MB
  value: "8192"
- name: DREMIO_MAX_DIRECT_MEMORY_SIZE_MB
  value: "50808"

@unni

A SORT operation uses some heap until it spills. Can you please enable class histograms and then rerun the query? Once the query fails with the above error, send us:

  • server.gc
  • server.gc.1
  • server.log
  • Query Profile

To enable class histograms and write GC output to a separate file, do the following:

V1

  • Open dremio-master.yaml and dremio-executor.yaml under templates

  • Add below under the DREMIO_JAVA_EXTRA_OPTS section

    -Xloggc:/opt/dremio/data/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=5
    -XX:GCLogFileSize=4000k
    -XX:+PrintClassHistogramBeforeFullGC
    -XX:+PrintClassHistogramAfterFullGC
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/opt/dremio/data
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:MaxGCPauseMillis=500
    -XX:InitiatingHeapOccupancyPercent=25
    -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log

V2

  • Open values.yaml

  • Add the below under the appropriate section, executor or coordinator

    extraStartParams: >-
      -Xloggc:/opt/dremio/data/gc.log
      -XX:+UseGCLogFileRotation
      -XX:NumberOfGCLogFiles=5
      -XX:GCLogFileSize=4000k
      -XX:+PrintClassHistogramBeforeFullGC
      -XX:+PrintClassHistogramAfterFullGC
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=/opt/dremio/data
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:MaxGCPauseMillis=500
      -XX:InitiatingHeapOccupancyPercent=25
      -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log

I will add the options and get back to you. Is there an email address where I can send the profile?

@balaji.ramaswamy I have mailed the profile to your email id.

@unni I do not see a full GC at the time the query was cancelled. Any chance you have the server.log from that executor, “dremio-executor-0.dremio-cluster-pod.default.svc.cluster.local”?

The time range we need is:
Start Time: 2021-05-13 09:28:21 UTC
End Time: 2021-05-13 09:32:24 UTC
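For anyone who needs to cut a large server.log down to a window like this before sharing it, here is a minimal Python sketch. It assumes each log entry starts with a timestamp in the `YYYY-MM-DD HH:MM:SS,mmm` form seen in the server.log excerpts in this thread, and that the log timestamps are in UTC; both are assumptions to verify on your own cluster. The function name `lines_in_window` is made up for this example.

```python
import sys
from datetime import datetime

# Window requested in this thread (assumed to match the log's time zone).
START = datetime(2021, 5, 13, 9, 28, 21)
END = datetime(2021, 5, 13, 9, 32, 24)

# Assumed timestamp layout at the start of each log entry.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # len("2021-05-13 09:28:21,000")


def lines_in_window(lines, start=START, end=END):
    """Yield log lines whose timestamp falls in [start, end], second precision.

    Lines without a leading timestamp (for example stack trace lines)
    follow the decision made for the most recent timestamped line.
    """
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
            keep = start <= ts.replace(microsecond=0) <= end
        except ValueError:
            pass  # continuation line: keep the previous decision
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_log.py server.log > server_window.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for kept in lines_in_window(f):
            sys.stdout.write(kept)
```

Stack trace lines are kept together with the entry they belong to, so the extract stays readable.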

@balaji.ramaswamy Would INFO-level logs be enough for you to get the relevant information?

@unni Yes, just the regular server.log from the executor, covering the time when this issue happens.

2021-06-24 07:01:10,195 [heap-monitoring-thread] INFO  c.d.sabot.exec.HeapMonitorThread - heap usage 7307931672 in pool G1 Old Gen exceeded threshold 7301444403 threshold_cnt 8
2021-06-24 07:01:10,195 [heap-monitoring-thread] INFO  c.d.s.e.FailGreediestQueriesStrategy - Failing query 1f2bd2f5-4ca6-9ba7-b3b0-dc1b81380600 to avoid heap outage

@balaji.ramaswamy This is the error for a particular raw reflection. Sort by date column is disabled.

Heap memory of executor: 8GB
Direct memory of executor: 42GB
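To put the heap monitor message in context, the small Python sketch below parses the log line above and compares the reported numbers with the configured heap. It assumes the relevant maximum is the full 8192 MB set in DREMIO_MAX_HEAP_MEMORY_SIZE_MB; the actual maximum of the G1 Old Gen pool may differ, so treat the percentages as an estimate. The helper name `parse_heap_monitor_line` is made up for this example.

```python
import re

# Log line copied from this thread.
LOG_LINE = (
    "2021-06-24 07:01:10,195 [heap-monitoring-thread] INFO  "
    "c.d.sabot.exec.HeapMonitorThread - heap usage 7307931672 in pool "
    "G1 Old Gen exceeded threshold 7301444403 threshold_cnt 8"
)

# Executor heap from DREMIO_MAX_HEAP_MEMORY_SIZE_MB (assumed to be the pool max).
HEAP_MB = 8192


def parse_heap_monitor_line(line):
    """Extract usage, pool name and threshold (in bytes) from a heap monitor line."""
    match = re.search(
        r"heap usage (\d+) in pool (.+?) exceeded threshold (\d+)", line
    )
    if match is None:
        return None
    return {
        "usage": int(match.group(1)),
        "pool": match.group(2),
        "threshold": int(match.group(3)),
    }


heap_bytes = HEAP_MB * 1024 * 1024
info = parse_heap_monitor_line(LOG_LINE)
usage_pct = info["usage"] / heap_bytes
threshold_pct = info["threshold"] / heap_bytes
print(f"{info['pool']}: usage {usage_pct:.1%}, threshold {threshold_pct:.1%} of {HEAP_MB} MB")
```

Under that assumption the threshold works out to about 85% of the 8 GB heap, so the query was failed once old-generation usage passed roughly that mark, not when the heap was completely exhausted.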

@unni Looks like the heap monitor killed the job to avoid an executor crash. How many executors do you have in this cluster?

We have two executors.

@unni Can you try increasing it to 4 and retry the query?

Ok will try.

Can you please explain the thought process for this suggestion?

@unni I have no clue how many executors you need, but I can certainly see you need more, as you are heap starved. So let's start somewhere and we can fine-tune from there.