How to handle huge reflections?

#1

Good morning, guys.
I’m trying to create a reflection that is pretty big. The VDS is built from a query that joins 6 other VDSs (~12 million records each), and I’ve created a reflection for each of those 6 VDSs as well. However, when I try to create a reflection for the main VDS (the one that joins the other 6), I get a memory error. Looking at the profile, it seems Dremio consumed a lot of memory in operators like HASH_JOIN (which I know is memory-hungry). My question is: how can I handle problems like this when I need to create this kind of reflection?

Here’s my profile: c7ec2295-879d-4a40-a1b6-bb04fb6f9db7.zip (397.7 KB)

I’m running Dremio in a cluster with 1 coordinator and 2 executors:

Coordinator: c5.xlarge
Executors: c5.2xlarge

#2

Hi @Paulo_Vasconcellos

You do not have enough direct memory to run this query. What are the direct memory settings on your executors? Can you please send the dremio-env file from one of your executors?

Thanks
@balaji.ramaswamy

#3

I’ve configured my executors to limit the memory to 13GB. Here’s the dremio-env file:

```bash
# Copyright (C) 2017-2018 Dremio Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Dremio environment variables used by Dremio daemon
#

#
# Directory where Dremio logs are written
# Default to $DREMIO_HOME/log
#
DREMIO_LOG_DIR=/var/log/dremio

#
# Send logs to console and not to log files. The DREMIO_LOG_DIR is ignored if set.
#
#DREMIO_LOG_TO_CONSOLE=1

#
# Directory where Dremio pidfiles are written
# Default to $DREMIO_HOME/run
#
DREMIO_PID_DIR=/var/run/dremio

#
# Max total memory size (in MB) for the Dremio process
#
# If not set, default to using max heap and max direct.
#
# If both max heap and max direct are set, this is not used
# If one is set, the other is calculated as difference
# of max memory and the one that is set.
#
DREMIO_MAX_MEMORY_SIZE_MB=13100

#
# Max heap memory size (in MB) for the Dremio process
#
# Default to 4096 for server
#
#DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192

#
# Max direct memory size (in MB) for the Dremio process
#
# Default to 8192 for server
#
#DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=10240

#
# Max permanent generation memory size (in MB) for the Dremio process
# (Only used for Java 7)
#
# Default to 512 for server
#
#DREMIO_MAX_PERMGEN_MEMORY_SIZE_MB=512

#
# Garbage collection logging is enabled by default. Set the following
# parameter to "no" to disable garbage collection logging.
#
#DREMIO_GC_LOGS_ENABLED="yes"

#
# The scheduling priority for the server
#
# Default to 0
#
# DREMIO_NICENESS=0
#

#
# Number of seconds after which the server is killed forcibly if it hasn't stopped
#
# Default to 120
#
#DREMIO_STOP_TIMEOUT=120

# Extra Java options
#
#DREMIO_JAVA_EXTRA_OPTS=
```
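
The comments in this file say that when both the heap and the direct limits are set, DREMIO_MAX_MEMORY_SIZE_MB is not used. A minimal sketch of a fragment that makes the split explicit instead of implicit; the values are illustrative only, chosen so they add up to the same 13100 MB, and are not a recommendation from this thread.

```bash
# Illustrative dremio-env fragment (example values, not a recommendation).
# With both limits set, DREMIO_MAX_MEMORY_SIZE_MB is ignored per the comments above.
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=4096
# Direct memory is what execution operators such as HASH_JOIN draw from.
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=9004
```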

#4

Hi @Paulo_Vasconcellos

Out of the 13 GB, 4 GB goes to heap, which is not used during a HASH_JOIN. Hash joins run in direct memory, so each executor has only roughly 9 GB available for them.
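To make the arithmetic concrete, here is a minimal bash sketch of the split rules described in the comments of the dremio-env file above. The function name `split_memory` is hypothetical, and the branch for when only `DREMIO_MAX_MEMORY_SIZE_MB` is set assumes heap takes the 4096 MB server default with the rest going to direct memory; Dremio's actual startup scripts may compute that case differently.

```bash
#!/usr/bin/env bash
# Sketch of the heap/direct split described in the dremio-env comments.
# ASSUMPTION: if only the total is set, heap gets the 4096 MB server
# default and direct memory gets the remainder. Dremio's real startup
# scripts may handle that case differently.

# split_memory (hypothetical helper, not part of Dremio)
# Usage: split_memory TOTAL_MB [HEAP_MB] [DIRECT_MB]
# Pass "" for a value that is not set. Prints "HEAP_MB DIRECT_MB".
split_memory() {
  local total="$1" heap="${2:-}" direct="${3:-}"
  local default_heap=4096

  if [ -n "$heap" ] && [ -n "$direct" ]; then
    # Both set: the total is not used.
    echo "$heap $direct"
  elif [ -n "$heap" ]; then
    # Only heap set: direct is the difference.
    echo "$heap $((total - heap))"
  elif [ -n "$direct" ]; then
    # Only direct set: heap is the difference.
    echo "$((total - direct)) $direct"
  else
    # Neither set (assumption): default heap, remainder is direct.
    echo "$default_heap $((total - default_heap))"
  fi
}

# The executors in this thread: only DREMIO_MAX_MEMORY_SIZE_MB=13100 is set.
split_memory 13100   # prints: 4096 9004
```

Under that assumption, about 9004 MB per executor is left as direct memory, which is the pool operators such as HASH_JOIN draw from.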

Thanks
@balaji.ramaswamy

#5

Got it. What would you suggest I do, @balaji.ramaswamy?

#6

Hi, @balaji.ramaswamy! Any thoughts on this issue? Is my only option to scale up my cluster by increasing the memory size?

#7

@Paulo_Vasconcellos

Send me your profile so I can see how far the query got, and maybe give you an estimate of how much more memory you need.