Query cancelled because it exceeded the memory limits set by the administrator

Hi, can you please help me with the proper setup?

I have:
1x coordinator (10 CPUs, 10 GB memory)
3x executors (14 CPUs, 50 GB memory each)

I run a simple UPDATE on a table with roughly 700M rows.

Here is the profile:
060f776d-be6d-4ca1-b040-365017fef751.zip (150.9 KB)

Thanks
Jaro

Hi Jaro,

could you please share the executors’ server.log files from when the query failed?
Also, could you please upgrade to 25.0.8, as it has some other important fixes.

Thanks,
Prashanth

Hi Prashanth, thanks for the reply. Attaching the server.log files from the executors.

I’m using the dremio-oss image; is there a new update available for it too?

executor_0_server.zip (76.6 KB)
executor_1_server.zip (81.8 KB)
executor_2_server.zip (77.9 KB)

Our physical configuration is 10 nodes with 16 CPUs and, I think, 64 GB of memory each.

Thanks for your advice
Jaro

Looks like the log files got rolled over. From the profile, the issue happened on 2024-08-27. Could you please upload the log files for that date?

Thanks,
Prashanth

Can you please retry the query with the support key planner.use_max_rowcount set to false?
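
If it is easier, a support key like this can usually be flipped either from the Support section of the Admin settings or via SQL. A minimal sketch (assuming a recent Dremio version; please double-check against the docs for yours):

```sql
-- Turn off the support key at the system level.
ALTER SYSTEM SET "planner.use_max_rowcount" = false;
```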

Thanks,
Prashanth

Hey @prashanthb, I have changed the setting but it failed again.
Jaro

I have run the query again (with the adjusted setting).
executors_server_log.zip (197.1 KB)
b9658bb9-61e4-483f-bfba-da4a8ce5a409.zip (201.1 KB)
Thanks
Jaro

Hi Jaro,

By setting the support key to false we were able to get around the initial issue that was reported.

Looks like you are hitting another issue that we addressed recently; the fix should be part of the 25.1 release sometime in September. Can we circle back to this after the upgrade? Sorry for the inconvenience, and thank you for reporting this.

Thanks,
Prashanth

Hi @prashanthb, thank you for investigating the issue. Not being able to perform larger operations would be a major obstacle for my project. Is there a workaround available before the update is released, please?
Will the update be available for the OSS community edition, please?

Thanks
Jaro

Hi @prashanthb,
I have another instance of the same error, this time from executing a SELECT.
The interesting thing is that the query finishes without the reflection and fails with the reflection.

Here is the profile:
ab6942bf-4988-416a-8e34-c7a635e201b8.zip (228.4 KB)

Thank you for your help.
Jaro

Hi Jaro,

25.1.0 has been released and is available for the OSS community edition.

Thanks,
Prashanth

Hi @prashanthb, we have just performed the update and unfortunately it did not help.

On top of that, we generally observe roughly 5x slower performance on the same data and setup.
331a8d2b-5982-4695-b0ee-5379a9af1466.zip (173.5 KB)

Thanks for helping me.

Jaro

Hi Jaro,
Even though the signature of the error message looks the same, these are three different issues.
By upgrading to 25.1 and setting the support key, two of the issues were resolved. We are looking into the third.
In the meantime, can you increase the executor memory size to maybe 100 GB and try running the queries?

Thanks,
Prashanth

Hi @prashanthb, unfortunately I’m at the node limit and cannot go higher than roughly 50 GB per executor.
I know that it is less than ideal, but it is still plenty of memory. In the meantime we are running with 8 executors.
Is there anything else we can try?
Jaro

@jaroslav_marko

  • Any chance you can avoid the LEFT JOIN and add a partition column filter on the build side? This will pass a runtime filter to the probe side and prune partitions (see the sketch after this list).
  • Have you tried a measure other than median? Just to narrow down the issue.
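
To illustrate the first point, here is a rough sketch of the kind of rewrite I mean. The table and column names (fact_events, dim_dates, event_date) are placeholders, not your actual schema; only the shape of the query matters:

```sql
-- Before (placeholder schema): LEFT JOIN with no filter on the build side,
-- so no runtime filter reaches the probe side and every partition of
-- fact_events gets scanned.
SELECT f.*
FROM fact_events f
LEFT JOIN dim_dates d
  ON f.event_date = d.event_date;

-- After (placeholder schema): INNER JOIN plus a filter on the smaller build
-- side. Assuming fact_events is partitioned by event_date, the join can push
-- a runtime filter on the join key to the probe side and prune partitions.
SELECT f.*
FROM fact_events f
JOIN dim_dates d
  ON f.event_date = d.event_date
WHERE d.event_date >= DATE '2024-01-01';
```

This of course only applies if dropping the LEFT JOIN semantics (keeping unmatched rows) is acceptable for your result.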

Hi @balaji.ramaswamy

Enclosing another three examples that appeared recently; maybe this helps.
20895a23-06eb-4c2f-9ba3-bc6bd5ef113a.zip (228.0 KB)
849b51d5-0ce8-4dbd-9fca-fafb9300d851.zip (398.0 KB)
66d5616d-4706-48cd-b765-0ff87021eac5.zip (521.2 KB)

Please let me know. The strange thing is that when I reran the jobs, they succeeded.

Thanks
Jaro

Hi @balaji.ramaswamy, have you had a chance to look into this issue, please?
Thanks
Jaro

@jaroslav_marko It looks like when the queries failed, the memory arbiter cancelled them because there was not enough memory to run. It will be interesting to see the completed profiles. If a completed profile has the same plan and execution plan, then the failure is probably due to other queries running on the server. If a completed profile has a different plan and execution plan, then we need to investigate why it changed.

Hi @balaji.ramaswamy, attaching the completed profiles. Same data, same query. The cluster is not exposed to users, so there is usually nothing else running.

f6a18036-d750-4676-985c-cd5cff8fb85f.zip (967.6 KB)
85f7418f-8b64-449e-a852-7f017b569f81.zip (1.1 MB)
f88c9780-584f-4338-9b1d-a31396011334.zip (389.5 KB)

Jaro