We are having constant out-of-memory problems when executing this query; I am forwarding the profile. Can you evaluate what we can do to solve this problem?
The error happens once or twice in a row, and sometimes it works on the third attempt. When the error occurred, Dremio's load was high; when it works, is it because the load is low?
How do I resolve this?
Good evening, friends. Has anyone managed to solve this problem? I need help.
Thanks in advance
You have 24 GB of direct memory per node; see the error tab:
Allocator(ROOT) 0/24639199972/24639690542/24639438848 (res/actual/peak/limit) numChildAllocators:18
Out of this, the query that failed, job id
1b2f1b5a-d5fe-de77-abf7-615bfc218600, used 16 GB per node:
Allocator(query-1b2f1b5a-d5fe-de77-abf7-615bfc218600) 0/17171254708/17171598772/9223372036854775807 (res/actual/peak/limit) numChildAllocators:231
There were 10 queries running at the time:
Allocator(queue-25364dcc-ecf7-400d-a7a7-efc8f19617f7) 0/20877793812/24211071416/9223372036854775807 (res/actual/peak/limit) numChildAllocators:10
So you are falling short of direct memory. Currently only Aggs and Sorts can spill to disk, so if a Join is processing in memory and there is no direct memory left, it will run OOM.
A few other query IDs that were running at the same time:
Allocator(query-1b2f1bdd-e529-7102-791c-ef0a7ba00600) 0/1963532056/3708841128/9223372036854775807 (res/actual/peak/limit) numChildAllocators:19
Allocator(query-1b2f21b8-6d1a-235f-e77b-7ffcad54d600) 0/1569822068/2081086528/9223372036854775807 (res/actual/peak/limit) numChildAllocators:7
Allocator(query-1b2f21bc-dac1-7f8f-e502-0c83a37e3700) 0/1097226644/1303792836/9223372036854775807 (res/actual/peak/limit) numChildAllocators:8
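To make the allocator lines above easier to read, here is a small sketch of how the `res/actual/peak/limit` fields can be pulled out and converted to GiB. The regex and helper names are my own, not part of Dremio; the line format is taken from the profile snippets in this thread.

```python
import re

# One allocator line from a Dremio profile's error tab looks like:
#   Allocator(name) res/actual/peak/limit (res/actual/peak/limit) numChildAllocators:N
ALLOC_RE = re.compile(
    r"Allocator\((?P<name>[^)]+)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)"
    r"\s+\(res/actual/peak/limit\)\s+numChildAllocators:(?P<children>\d+)"
)

def parse_allocator(line):
    """Return the allocator fields, with byte counts also converted to GiB."""
    m = ALLOC_RE.search(line)
    if m is None:
        raise ValueError("not an allocator line: %r" % line)
    d = {"name": m.group("name"), "children": int(m.group("children"))}
    for field in ("res", "actual", "peak", "limit"):
        d[field] = int(m.group(field))
        d[field + "_gib"] = round(int(m.group(field)) / 2**30, 2)
    return d

# The failed query's allocator line from this thread:
line = ("Allocator(query-1b2f1b5a-d5fe-de77-abf7-615bfc218600) "
        "0/17171254708/17171598772/9223372036854775807 "
        "(res/actual/peak/limit) numChildAllocators:231")
info = parse_allocator(line)
print(info["actual_gib"])  # ~16 GiB actually allocated when the query failed
```

Running this over every allocator line in a profile makes it quick to see which queries were holding direct memory when the OOM hit.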
@balaji.ramaswamy, good morning,
We have 32 GB on this machine where Dremio is installed on AWS. Based on your analysis of the problem we sent, do you have any recommendation on how we could parameterize the memory in dremio-env?
@Gerbasi Really hard to tell, as I do not know the concurrency you need, the SLAs of the queries, or the type of workload; for example, are these dashboard queries or ad hoc queries?
Ad hoc user queries may need larger per-node memory, while dashboard queries usually select one or two columns with aggregates, using filters on partition columns; creating an aggregate reflection for those would help. If you are limited on memory, another thing we can do is separate the workloads into different times of day, for example running ad hoc queries after the dashboards are done. Increasing memory is also an option; by how much is hard to tell without studying the workload, looking at your profiles, and seeing what is happening.
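For reference, a split that would produce the numbers seen in this profile might look roughly like this in dremio-env. This is only an illustration of where the knobs live, not a recommendation; the exact values depend on your workload, and I am assuming the standard dremio-env memory variables here.

```shell
# dremio-env (illustrative values for a single 32 GB node -- not a recommendation)
#
# Heap serves planning and metadata; direct memory serves query execution
# (joins, aggs, sorts). Leave a few GB free for the OS and other processes.

DREMIO_MAX_HEAP_MEMORY_SIZE_MB=4096      # 4 GB heap
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=24576   # 24 GB direct, matching the limit in the profile
```

With 24 GB of direct memory shared by 10 concurrent queries, a single join-heavy query using 16 GB leaves very little headroom, which is why lowering concurrency or adding memory both help.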