When can old REST job results be garbage collected?

Hello,

We have an out-of-memory issue when using the REST API to fetch Dremio job results for the same SQL query. Memory usage just keeps increasing until we hit an error:

Failure trying to allocate initial reservation for Allocator.

We set -Xmx to 10000m and MaxDirectMemory to 20000m. I suspect Dremio keeps old job results in memory without letting the GC reclaim the space.
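(For reference, that corresponds roughly to the following in conf/dremio-env, assuming a standard tarball install:)

```
# conf/dremio-env — the limits described above
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=10000
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=20000
```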

Is there a way to tell Dremio to keep only a limited number of job results in memory?

Hi,

Can you describe the steps that lead to this? Are you running the same SQL query over and over, using REST to execute it and fetch the results? If so, what is the size of the resulting dataset, and how often are you running it?

I’m running the same SQL query, using REST to execute a job and then fetch the result; the Output Bytes for the job is 152.56 MB. We saw the problem when Dremio was being used by a few BAs.

While digging into the problem, we found we can reproduce it by fetching the result of the same job ID through REST in a loop (one request after another, not in parallel), roughly as sketched below. Memory usage just keeps increasing with each request and eventually ends in the error. We can’t recover from that state without restarting the Dremio instance.
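The reproduction loop is essentially this (a sketch in Java 11+; the coordinator address and token handling are assumptions, and `_dremio{token}` is the standard Authorization header for Dremio’s REST API):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RepeatFetch {
    public static void main(String[] args) throws Exception {
        String coordinator = "http://localhost:9047"; // assumed coordinator address
        String jobId = args[0];                       // ID of the completed job to re-fetch
        String token = args[1];                       // token from POST /apiv2/login

        HttpClient client = HttpClient.newHttpClient();
        for (int i = 0; i < 1000; i++) {
            // Fetch the first page of results for the same job, one request after another.
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create(coordinator + "/api/v3/job/" + jobId + "/results?offset=0&limit=500"))
                    .header("Authorization", "_dremio" + token)
                    .GET()
                    .build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.printf("call %d -> HTTP %d, %d bytes%n", i, resp.statusCode(), resp.body().length());
        }
    }
}
```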

I ran the same SQL query through the ODBC driver, and there is no such memory problem.

We have an in-memory cache for job results; however, fetching the same job results multiple times should not cause that. I will open an internal ticket.

JDBC/ODBC will always be more efficient, so hopefully that is an acceptable workaround.

Thank you for your response. We are planning to use ODBC more for querying, but it’s a little hard since there are no Linux ODBC drivers except for Red Hat/CentOS.

For job results, how long is the cache kept in memory? Can I tweak that?

The cache is cleared if the results have not been accessed for 15 minutes. Are you able to tell whether it’s direct or heap memory that keeps increasing?
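One way to check, as a sketch: if remote JMX is enabled on the Dremio JVM (the port 9010 below is hypothetical), you can poll the JVM’s direct buffer pool MBean. Note that off-heap memory allocated through Unsafe rather than direct ByteBuffers may not show up in this pool:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DirectMemoryProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the Dremio JVM was started with remote JMX on port 9010 (hypothetical).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            // The "direct" buffer pool tracks ByteBuffer.allocateDirect usage.
            ObjectName direct = new ObjectName("java.nio:type=BufferPool,name=direct");
            long used = (Long) conn.getAttribute(direct, "MemoryUsed");
            long capacity = (Long) conn.getAttribute(direct, "TotalCapacity");
            System.out.printf("direct buffers: used=%d bytes, capacity=%d bytes%n", used, capacity);
        }
    }
}
```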

I think it’s direct memory, but I’m not sure.

I did a few more tests by querying the results of 4 different job IDs. Memory increases with each REST call (there were 100+ calls across those 4 jobs). I suspect this is a memory leak in the REST layer rather than in the job results cache.

Moreover, the error comes from Arrow, so I was wondering whether the Accountant tree has released its old nodes: https://github.com/apache/arrow/blob/078b8068f1c7176ff4a56aa95c9b57800aacce82/java/memory/src/main/java/org/apache/arrow/memory/Accountant.java#L81
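For context on that accounting, here is a minimal sketch of Arrow’s allocator tree (recent Arrow Java API; the sizes are arbitrary). A child allocator takes its initial reservation from the parent when created, and only closing the child returns that reservation, which would be consistent with unreleased nodes making the error persist:

```java
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class AllocatorTreeSketch {
    public static void main(String[] args) {
        try (BufferAllocator root = new RootAllocator(16 * 1024)) {
            // The child takes an 8 KB initial reservation out of the root's 16 KB limit.
            // If outstanding reservations would exceed the root's limit, creating another
            // child fails with "Failure trying to allocate initial reservation for Allocator."
            try (BufferAllocator child = root.newChildAllocator("rest-result", 8 * 1024, 16 * 1024)) {
                try (ArrowBuf buf = child.buffer(4 * 1024)) {
                    buf.setByte(0, 1); // touch the buffer
                }
            } // closing the child releases its reservation back to the root's accounting
            System.out.println("allocated after close: " + root.getAllocatedMemory());
        }
    }
}
```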

Hi @doron,

I am also facing the same issue. I have 14M records that I am trying to fetch using the REST API. After hitting it multiple times, I get “One or more nodes ran out of memory while executing the query”. Mine is a single-node deployment with 4 cores and 16 GB of RAM.

Is there any workaround for this other than ODBC/JDBC?

Thanks in advance.

Hi,

You could potentially run multiple smaller queries instead of one large query until we improve the REST API (see the sketch below), though that may not always be possible depending on the type of query.
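As a rough sketch of that approach (it assumes the v3 /api/v3/sql and /api/v3/job endpoints, a hypothetical table myspace.mytable with a sortable id column, and a crude regex in place of a real JSON library):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkedFetch {
    static final String COORD = "http://localhost:9047"; // assumed coordinator address
    static final HttpClient CLIENT = HttpClient.newHttpClient();
    static String token; // from POST /apiv2/login

    public static void main(String[] args) throws Exception {
        token = args[0];
        int chunk = 100_000; // hypothetical chunk size; tune to what fits in memory
        for (int offset = 0; offset < 14_000_000; offset += chunk) {
            // Submit one bounded query per chunk instead of a single 14M-row query.
            String sql = "SELECT * FROM myspace.mytable ORDER BY id LIMIT " + chunk + " OFFSET " + offset;
            String jobId = submit(sql);
            waitForCompletion(jobId);
            String page = get("/api/v3/job/" + jobId + "/results?offset=0&limit=500");
            System.out.printf("chunk at offset %d -> job %s, first page %d bytes%n", offset, jobId, page.length());
        }
    }

    static String submit(String sql) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(COORD + "/api/v3/sql"))
                .header("Authorization", "_dremio" + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"sql\": \"" + sql + "\"}"))
                .build();
        String body = CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
        Matcher m = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"").matcher(body);
        if (!m.find()) throw new IllegalStateException("no job id in: " + body);
        return m.group(1);
    }

    static void waitForCompletion(String jobId) throws Exception {
        while (true) {
            String status = get("/api/v3/job/" + jobId);
            if (status.contains("\"COMPLETED\"")) return;
            if (status.contains("\"FAILED\"") || status.contains("\"CANCELED\"")) {
                throw new IllegalStateException("job " + jobId + " did not complete: " + status);
            }
            Thread.sleep(1000); // poll once per second
        }
    }

    static String get(String path) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(COORD + path))
                .header("Authorization", "_dremio" + token)
                .GET()
                .build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```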

Another option would be to use CTAS, but that would generate a Parquet file.