Memory usage understanding

Hello Dremio crowd,

We are constantly running into memory problems, despite having a powerful Azure VM with 1.7TB RAM as executor.
Queries failed with an out-of-memory error even though memory usage never went over 20% for the node, so we figured heap memory was to blame and adjusted the settings in the Kubernetes Helm chart.
That seems to have relieved the problems, but there are still many queries that spill (small “This query was spilled” icon in the jobs list).
Could someone explain why that is happening and what the correct helm settings are for operations that use a lot of memory? My understanding is quite limited despite having read the documentation.

Please share your query profile.

@mdes Only two operators spill: aggregations and sorts. What version of Dremio are you on? As @dacopan rightly said, let us start with the profile.

Thank you very much for getting back to me.
Is there an overview of what is included in the full downloadable query profile, especially with regard to possibly sensitive data?
I am unfortunately not cleared to upload it publicly yet.

In the meantime, would screenshots of the raw profile in the UI be helpful?
We are on version 24.3.2 (community)

If there is some sort of automated query analytics tool, we would be very interested in that as well.

The query profile includes all metrics, stats and metadata about how Dremio runs a query. The only “private” data included is “server paths” and the SQL that you run.

Not sure if this is hijacking the thread, but I do think it’s relevant. If you want me to create a separate thread for it, let me know:

I can’t speak for others, but we are not able to share query profiles in their current form (although we often would like to).
The profile at least contains names of all configured data sources as well as the full SQL and at least “some” information about the reflections used. For my company, the table names and column names themselves are considered sensitive information and cannot be shared, let alone the full SQL in its original form.

We have thought about systematically anonymising the downloaded query profile, keeping the structure of the SQL query and stats/metrics intact but anonymising columns, tables, databases etc., in order to be able to share the profile. However, the prepare_profile_attempt_x and profile_attempt_x files contain a “serialisedPlan”, which is base64-encoded binary data. We aren’t able to assert whether that binary data contains information we consider sensitive, which means we aren’t able to share that either. We could strip those out, but that might break whatever tooling you have to inspect the profile and render the profile useless.
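To make the “strip those out” idea concrete, here is a minimal Python sketch. It assumes the profile files are JSON and that the field is spelled “serialisedPlan” as written above; the sample document, the other key names in it and the `strip_keys` helper are made up for illustration and are not Dremio’s actual profile layout or tooling.

```python
import json

PLACEHOLDER = "<redacted>"


def strip_keys(node, keys=("serialisedPlan",)):
    """Return a copy of a parsed JSON value with the given keys blanked out.

    Walks dicts and lists recursively, so the key is caught at any depth.
    The default key name is taken from the post above; it is an assumption
    about the real file format.
    """
    if isinstance(node, dict):
        return {
            key: PLACEHOLDER if key in keys else strip_keys(value, keys)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_keys(item, keys) for item in node]
    return node


# Hypothetical miniature profile, NOT the real structure.
sample = {
    "query": "SELECT 1",
    "serialisedPlan": "AAECAwQ=",
    "fragments": [
        {"id": 0, "serialisedPlan": "BQYHCA==", "metrics": {"rows": 10}},
    ],
}

cleaned = strip_keys(sample)
print(json.dumps(cleaned, indent=2))
```

This only blanks the opaque blob. Anonymising table and column names inside the SQL text would need a separate step, and whether a profile stripped this way still loads in any inspection tooling is exactly the open question.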

In my opinion, having a way to share anonymised query profiles, that verifiably don’t leak sensitive information, would open the door for you receiving a lot more feedback from non-trivial, real world use cases and datasets.
I see it as a great way for the community to share where Dremio falls short on their use cases, hopefully improving Dremio to the benefit of everyone - which should be a win-win.

Are you aware of any work or talks in this regard @balaji.ramaswamy and @dacopan, or is it even something you see as an issue?


@wundi

That would be a great tool to have. Let me check on this and get back to you.

Thanks
Bali
