Query cancelled because it exceeded the memory limits set by the administrator

Hi, can you please help me with the proper setup?

I have:
1x coordinator (10 CPUs, 10 GB memory)
3x executors (14 CPUs, 50 GB memory each)

I run a simple UPDATE on a table with roughly 700M rows.

Here is the profile:
060f776d-be6d-4ca1-b040-365017fef751.zip (150.9 KB)

Thanks
Jaro

Hi Jaro,

could you please share the executors’ server.log files from when the query failed?
Also, could you please upgrade to 25.0.8, as it has some other important fixes.

Thanks,
Prashanth

Hi Prashanth, thanks for the reply. Attaching the server.log files from the executors.

I’m using the dremio-oss image; is there a new update available for it too?

executor_0_server.zip (76.6 KB)
executor_1_server.zip (81.8 KB)
executor_2_server.zip (77.9 KB)

Our physical configuration is 10 nodes with 16 CPUs and, I think, 64 GB of memory each.

Thanks for your advice
Jaro

Looks like the log files got rolled over. From the profile, the issue happened on 2024-08-27. Could you please upload the log files for that date?

Thanks,
Prashanth

Can you please retry the query with the support key planner.use_max_rowcount set to false?
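
If it is easier, a support key like this can usually be flipped either from the Support section of the Admin settings or via SQL. A minimal sketch (assuming a recent Dremio version; please double-check against the docs for yours):

```sql
-- Turn off the support key at the system level.
ALTER SYSTEM SET "planner.use_max_rowcount" = false;
```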

Thanks,
Prashanth

Hey @prashanthb, I have changed the setting but it failed again.
Jaro

I have run the query again (with the adjusted setting).
executors_server_log.zip (197.1 KB)
b9658bb9-61e4-483f-bfba-da4a8ce5a409.zip (201.1 KB)
Thanks
Jaro

Hi Jaro,

By setting the support key to false we were able to get around the initial issue that was reported.

Looks like you are hitting another issue that we addressed recently; the fix should be part of the 25.1 release sometime in September. Can we circle back to this after the upgrade? Sorry for the inconvenience, and thank you for reporting this.

Thanks,
Prashanth

Hi @prashanthb, thank you for investigating the issue. Not being able to perform larger operations would be a major obstacle for my project. Is there a workaround available before the update is released, please?
Will the update be available for the OSS community edition, please?

Thanks
Jaro

Hi @prashanthb,
I have another instance of the same error, this time from executing a SELECT.
The interesting thing is that the query finishes without the reflection and fails with the reflection.

Here is the profile:
ab6942bf-4988-416a-8e34-c7a635e201b8.zip (228.4 KB)

Thank you for your help.
Jaro

Hi Jaro,

25.1.0 has been released and is available for the OSS community edition.

Thanks,
Prashanth

Hi @prashanthb, we have just performed the update and unfortunately it did not help.

On top of that, we generally observe roughly 5x slower performance on the same data and setup.
331a8d2b-5982-4695-b0ee-5379a9af1466.zip (173.5 KB)

Thanks for helping me.

Jaro

Hi Jaro,
Even though the signature of the error message looks the same, these are three different issues.
By upgrading to 25.1 and setting the support key, two of the issues were resolved. We are looking into the third.
In the meantime, can you increase the executor memory size to maybe 100 GB and try running the queries?

Thanks,
Prashanth

Hi @prashanthb, unfortunately I’m at the node limit and cannot go higher than roughly 50 GB per executor.
I know that it is less than ideal, but it is still plenty of memory. In the meantime we are running with 8 executors.
Is there anything else we can try?
Jaro

@jaroslav_marko

  • Any chance you can avoid the LEFT JOIN and add a partition column filter on the build side? This will pass a runtime filter to the probe side and prune partitions (see the sketch after this list).
  • Have you tried a measure other than median? Just to narrow down the issue.
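
To illustrate the first point, here is a rough sketch of the kind of rewrite I mean. The table and column names (fact_events, dim_dates, event_date) are placeholders, not your actual schema; only the shape of the query matters:

```sql
-- Before (placeholder schema): LEFT JOIN with no filter on the build side,
-- so no runtime filter reaches the probe side and every partition of
-- fact_events gets scanned.
SELECT f.*
FROM fact_events f
LEFT JOIN dim_dates d
  ON f.event_date = d.event_date;

-- After (placeholder schema): INNER JOIN plus a filter on the smaller build
-- side. Assuming fact_events is partitioned by event_date, the join can push
-- a runtime filter on the join key to the probe side and prune partitions.
SELECT f.*
FROM fact_events f
JOIN dim_dates d
  ON f.event_date = d.event_date
WHERE d.event_date >= DATE '2024-01-01';
```

This of course only applies if dropping the LEFT JOIN semantics (keeping unmatched rows) is acceptable for your result.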

Hi @balaji.ramaswamy

Enclosing another three examples that appeared recently; maybe this helps.
20895a23-06eb-4c2f-9ba3-bc6bd5ef113a.zip (228.0 KB)
849b51d5-0ce8-4dbd-9fca-fafb9300d851.zip (398.0 KB)
66d5616d-4706-48cd-b765-0ff87021eac5.zip (521.2 KB)

Please let me know. The strange thing is that when I reran the jobs, they succeeded.

Thanks
Jaro

Hi @balaji.ramaswamy, have you had a chance to look into this issue, please?
Thanks
Jaro

@jaroslav_marko It looks like when the queries failed, the memory arbiter cancelled them because there was not enough memory to run. It will be interesting to see the completed profiles. If a completed profile has the same plan and execution plan, then the failure is probably due to other queries running on the server. If a completed profile has a different plan and execution plan, then we need to investigate why it changed.

Hi @balaji.ramaswamy, attaching the completed profiles. Same data, same query. The cluster is not exposed to users, so there is usually nothing else running.

f6a18036-d750-4676-985c-cd5cff8fb85f.zip (967.6 KB)
85f7418f-8b64-449e-a852-7f017b569f81.zip (1.1 MB)
f88c9780-584f-4338-9b1d-a31396011334.zip (389.5 KB)

Jaro