When can old REST job results be garbage collected?

Hello,

We have an out-of-memory issue when using the REST API to fetch Dremio job results for the same SQL query. Memory usage just keeps increasing until we hit an error:

Failure trying to allocate initial reservation for Allocator.

We set -Xmx to 10000m and MaxDirectMemory to 20000m. I suspect Dremio keeps old job results in memory without letting the GC reclaim the space.
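(For reference, that corresponds roughly to the following in conf/dremio-env, assuming a standard tarball install:)

```
# conf/dremio-env — the limits described above
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=10000
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=20000
```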

Is there a way to tell Dremio to keep only a limited number of job results in memory?

Hi,

Can you describe the steps that lead to this? Are you running the same SQL query over and over, using REST to execute it and fetch the results? If so, what is the size of the resulting dataset, and how often are you running it?

I’m running the same SQL query, using REST to execute a job and then fetch the result; the Output Bytes for the job is 152.56 MB. We saw the problem when Dremio was being used by a few BAs.

While digging into the problem, we found we can reproduce it by fetching the result of the same job ID through REST in a loop (one request after another, not in parallel), roughly as sketched below. Memory usage just keeps increasing with each request and eventually ends in the error. We can’t recover from that state without restarting the Dremio instance.
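The reproduction loop is essentially this (a sketch in Java 11+; the coordinator address and token handling are assumptions, and `_dremio{token}` is the standard Authorization header for Dremio’s REST API):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RepeatFetch {
    public static void main(String[] args) throws Exception {
        String coordinator = "http://localhost:9047"; // assumed coordinator address
        String jobId = args[0];                       // ID of the completed job to re-fetch
        String token = args[1];                       // token from POST /apiv2/login

        HttpClient client = HttpClient.newHttpClient();
        for (int i = 0; i < 1000; i++) {
            // Fetch the first page of results for the same job, one request after another.
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create(coordinator + "/api/v3/job/" + jobId + "/results?offset=0&limit=500"))
                    .header("Authorization", "_dremio" + token)
                    .GET()
                    .build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.printf("call %d -> HTTP %d, %d bytes%n", i, resp.statusCode(), resp.body().length());
        }
    }
}
```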

I ran the same SQL query through the ODBC driver, and there is no such memory problem.

We have an in-memory cache for job results; however, fetching the same job results multiple times should not cause that. I will open an internal ticket.

JDBC/ODBC will always be more efficient, so hopefully that is an acceptable workaround.

Thank you for your response. We are planning to use ODBC more for querying, but it’s a little hard since there are no Linux ODBC drivers except for Red Hat/CentOS.

For job results, how long is the cache kept in memory? Can I tweak that?

The cache is cleared if the results have not been accessed for 15 minutes. Are you able to tell whether it’s direct or heap memory that keeps increasing?
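One way to check, as a sketch: if remote JMX is enabled on the Dremio JVM (the port 9010 below is hypothetical), you can poll the JVM’s direct buffer pool MBean. Note that off-heap memory allocated through Unsafe rather than direct ByteBuffers may not show up in this pool:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DirectMemoryProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the Dremio JVM was started with remote JMX on port 9010 (hypothetical).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            // The "direct" buffer pool tracks ByteBuffer.allocateDirect usage.
            ObjectName direct = new ObjectName("java.nio:type=BufferPool,name=direct");
            long used = (Long) conn.getAttribute(direct, "MemoryUsed");
            long capacity = (Long) conn.getAttribute(direct, "TotalCapacity");
            System.out.printf("direct buffers: used=%d bytes, capacity=%d bytes%n", used, capacity);
        }
    }
}
```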

I think it’s direct memory, but I’m not sure.

I did a few more tests by querying the results of 4 different job IDs. Memory increases with each REST call (there were 100+ calls across those 4 jobs). I suspect this is a memory leak in the REST layer rather than in the job results cache.

Moreover, the error comes from Arrow, so I was wondering whether the Accountant tree has released its old nodes: https://github.com/apache/arrow/blob/078b8068f1c7176ff4a56aa95c9b57800aacce82/java/memory/src/main/java/org/apache/arrow/memory/Accountant.java#L81
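For context on that accounting, here is a minimal sketch of Arrow’s allocator tree (recent Arrow Java API; the sizes are arbitrary). A child allocator takes its initial reservation from the parent when created, and only closing the child returns that reservation, which would be consistent with unreleased nodes making the error persist:

```java
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class AllocatorTreeSketch {
    public static void main(String[] args) {
        try (BufferAllocator root = new RootAllocator(16 * 1024)) {
            // The child takes an 8 KB initial reservation out of the root's 16 KB limit.
            // If outstanding reservations would exceed the root's limit, creating another
            // child fails with "Failure trying to allocate initial reservation for Allocator."
            try (BufferAllocator child = root.newChildAllocator("rest-result", 8 * 1024, 16 * 1024)) {
                try (ArrowBuf buf = child.buffer(4 * 1024)) {
                    buf.setByte(0, 1); // touch the buffer
                }
            } // closing the child releases its reservation back to the root's accounting
            System.out.println("allocated after close: " + root.getAllocatedMemory());
        }
    }
}
```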

Hi @doron,

I am also facing the same issue. I have 14M records that I am trying to fetch using the REST API. After hitting it multiple times, I get “One or more nodes ran out of memory while executing the query”. Mine is a single-node deployment with 4 cores and 16 GB of RAM.

Is there any workaround for this other than ODBC/JDBC?

Thanks in advance.

Hi,

You could potentially run multiple smaller queries instead of one large query until we improve the REST API (see the sketch below), though that may not always be possible depending on the type of query.
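As a rough sketch of that approach (it assumes the v3 /api/v3/sql and /api/v3/job endpoints, a hypothetical table myspace.mytable with a sortable id column, and a crude regex in place of a real JSON library):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkedFetch {
    static final String COORD = "http://localhost:9047"; // assumed coordinator address
    static final HttpClient CLIENT = HttpClient.newHttpClient();
    static String token; // from POST /apiv2/login

    public static void main(String[] args) throws Exception {
        token = args[0];
        int chunk = 100_000; // hypothetical chunk size; tune to what fits in memory
        for (int offset = 0; offset < 14_000_000; offset += chunk) {
            // Submit one bounded query per chunk instead of a single 14M-row query.
            String sql = "SELECT * FROM myspace.mytable ORDER BY id LIMIT " + chunk + " OFFSET " + offset;
            String jobId = submit(sql);
            waitForCompletion(jobId);
            String page = get("/api/v3/job/" + jobId + "/results?offset=0&limit=500");
            System.out.printf("chunk at offset %d -> job %s, first page %d bytes%n", offset, jobId, page.length());
        }
    }

    static String submit(String sql) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(COORD + "/api/v3/sql"))
                .header("Authorization", "_dremio" + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"sql\": \"" + sql + "\"}"))
                .build();
        String body = CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
        Matcher m = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"").matcher(body);
        if (!m.find()) throw new IllegalStateException("no job id in: " + body);
        return m.group(1);
    }

    static void waitForCompletion(String jobId) throws Exception {
        while (true) {
            String status = get("/api/v3/job/" + jobId);
            if (status.contains("\"COMPLETED\"")) return;
            if (status.contains("\"FAILED\"") || status.contains("\"CANCELED\"")) {
                throw new IllegalStateException("job " + jobId + " did not complete: " + status);
            }
            Thread.sleep(1000); // poll once per second
        }
    }

    static String get(String path) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(COORD + path))
                .header("Authorization", "_dremio" + token)
                .GET()
                .build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```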

Another option would be to use CTAS, but that would generate a Parquet file.