OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator

      OUT_OF_MEMORY ERROR: Query was cancelled because it exceeded the memory limits set by the administrator.

Location -  Major Fragment:28, Minor fragment:75
Allocation outcome details:
allocator[op:28:75:incoming] reservation: 0 limit: 9223372036854775807 used: 50331648 requestedSize: 1048576 allocatedSize: 0 localAllocationStatus: success
allocator[frag:28:75] reservation: 6000000 limit: 5497558138880 used: 50331648 requestedSize: 1048576 allocatedSize: 0 localAllocationStatus: success
allocator[phase-28] reservation: 0 limit: 5497558138880 used: 452984832 requestedSize: 1048576 allocatedSize: 0 localAllocationStatus: success
allocator[query-1df168c8-251c-8d94-5ee5-79162f936701] reservation: 0 limit: 5497558138880 used: 2264924160 requestedSize: 1048576 allocatedSize: 0 localAllocationStatus: success
allocator[queue-119631b2-74e9-4020-b1e9-de7a3c038cc8] reservation: 0 limit: 5497558138880 used: 24548152208 requestedSize: 1048576 allocatedSize: 0 localAllocationStatus: success
allocator[ROOT] reservation: 0 limit: 24653070336 used: 24653009808 requestedSize: 1048576 allocatedSize: 0 localAllocationStatus: fail

on this node type

[root@ip-172-31-8-173 dremio]# cat dremio-env
DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio
DREMIO_GC_LOGS_ENABLED="yes"
DREMIO_GC_OPTS="-XX:+UseG1GC"
DREMIO_JAVA_VERSION_CHECK="true"
#DREMIO_JAVA_EXTRA_OPTS=
DREMIO_EXTRA_CLASSPATH=/var/dremio_efs/thirdparty/*
DREMIO_MAX_MEMORY_SIZE_MB=27607
[root@ip-172-31-8-173 dremio]# cat dremio.conf
#
# Copyright (C) 2017-2019 Dremio Corporation. This file is confidential and private property.
#

services.coordinator.enabled: false
services.executor.cache.pctquota.db: 20
services.executor.cache.pctquota.fs: []
services.executor.cache.path.fs: []
debug.dist.caching.enabled: true
paths.local = "/var/lib/dremio/data"
paths.spilling: []
zk.client.retry.unlimited: false,
zk.client.retry.limit: 7
zk.client.retry.initial_timeout_ms: 300000
zk.client.connection_handle.enabled: true
services.node-tag: "high-priority"
zookeeper: "172.31.14.149:2181"

registration.publish-host: "172.31.8.173"
services.executor.cache.path.fs += "/mnt/c1/cache/"
paths.spilling += "/mnt/c1/spilling"
services.executor.cache.pctquota.fs += "95"

on coordinator

[root@ip-172-31-14-149 dremio]# cat dremio.conf
#
# Copyright (C) 2017-2019 Dremio Corporation. This file is confidential and private property.
#

services.executor.enabled: false
debug.dist.caching.enabled: true
paths.local: "/var/lib/dremio"
paths.results: "pdfs://"${paths.local}"/data/results"

# Web server encryption
#services.coordinator.web.ssl.enabled: true
#services.coordinator.web.ssl.auto-certificate.enabled: true
#services.coordinator.web.port: 443
services.coordinator.master.embedded-zookeeper.enabled: false
zookeeper: "172.31.14.149:2181"
paths.accelerator = "dremioS3:///dremio-me-df9038ad-0166-40a9-81b5-5776e3d83b2b-202f450f5cc8814f/dremio/accelerator"
paths.uploads = "dremioS3:///dremio-me-df9038ad-0166-40a9-81b5-5776e3d83b2b-202f450f5cc8814f/dremio/uploads"
paths.downloads = "dremioS3:///dremio-me-df9038ad-0166-40a9-81b5-5776e3d83b2b-202f450f5cc8814f/dremio/downloads"
paths.scratch = "dremioS3:///dremio-me-df9038ad-0166-40a9-81b5-5776e3d83b2b-202f450f5cc8814f/dremio/scratch"
provisioning.coordinator.enableAutoBackups = "true"
paths.metadata = "dremioS3:///dremio-me-df9038ad-0166-40a9-81b5-5776e3d83b2b-202f450f5cc8814f/dremio/metadata"
registration.publish-host: "172.31.14.149"
provisioning.ec2.efs.mountTargetIpAddress = "172.31.10.119"
[root@ip-172-31-14-149 dremio]# cat dremio-env
DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio
DREMIO_GC_LOGS_ENABLED="yes"
DREMIO_GC_OPTS="-XX:+UseG1GC"
DREMIO_JAVA_VERSION_CHECK="true"
DREMIO_DAC_DEAMON_AWS_OPTIONS=""
#DREMIO_JAVA_EXTRA_OPTS=
DREMIO_EXTRA_CLASSPATH=/var/dremio_efs/thirdparty/*
DREMIO_MAX_MEMORY_SIZE_MB=120000

@mary.lang I would like to see the SQL and how many joins are involved, any chance you can send over the job profile?

@mary.lang The main memory consuming operator are the Hash Aggregates and I see 46 of them in the query, although Dremio spills Hash Aggregates, until V19, it does not spill MAX/MIN on varchar columns, I see there are a few of them, can you please confirm and if that is the case, upgrade to 19.x and retry the query?

@balaji.ramaswamy
how can I check if it is spilling max/min on varchar columns?

@mary.lang The version you are on, does not spill max/min on columns that are of type string, your VDS definition should say if you ar doing max/min and if yes then check the data type of those columns in Dremio by doing a describe command and see if any of those are strings