GKE Master disk usage

We have deployed Dremio 4.5 on GKE.
1 master node (100 GB disk), 3 executor pods (100 GB disk each)

We are getting a "No space left on device" error on the master pod.
98 GB is used by the catalog directory.

Can you tell me how to debug what is taking up so much disk space?

Dremio stores various things under ./data, e.g. a Lucene index (in db/search) for searching in the “Jobs” UI.
catalog/ contains the actual profiles for previous job executions. If you run many queries and have a long retention period configured, it can take up quite a lot of disk space.
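To see which directories are actually eating the space, you can run du inside the master pod (e.g. via `kubectl exec -it <master-pod> -- sh`). A minimal sketch, assuming the chart's default data path /opt/dremio/data (an assumption; adjust to your deployment):

```shell
# Summarize disk usage per top-level entry under Dremio's data directory,
# sorted so the largest entry comes last. The path /opt/dremio/data is an
# assumption; substitute whatever your Helm values mount.
du -sh /opt/dremio/data/* | sort -h
```

If catalog/ or db/ dominates the output, the job-profile retention settings below are the right knob.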

You can do two things to reduce the disk consumption:

1. Bring the cluster into administration mode and run the "dremio-admin clean" task (instructions are in the Helm chart). Note that this introduces cluster downtime.
2. Set jobs.max.age_in_days under Admin > Support > Support Keys to a value smaller than the default (30 days, I think). Dremio will then do a nightly cleanup (default: 01:00 in the morning; support key: job.cleanup.start_at_hour).
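For the offline cleanup, the procedure looks roughly like this. The StatefulSet name dremio-master and the clean flags are assumptions from my own setup; verify them against `dremio-admin clean -h` and your chart's docs before running anything:

```shell
# Sketch of the offline cleanup; this introduces cluster downtime.
# StatefulSet name and flags are assumptions - check your chart.

# 1. Take the master offline so the KV store is not in use.
kubectl scale statefulset dremio-master --replicas=0

# 2. From a pod or job that mounts the master's data volume, delete old
#    job profiles (here: older than 7 days) and compact the KV store.
dremio-admin clean -j 7 -o

# 3. Bring the master back up.
kubectl scale statefulset dremio-master --replicas=1
```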

Best, Tim

Thank you Tim.

Can you share the other support settings?
I could not find jobs.max.age_in_days setting in the docs.

Hi @unni, it is not documented, but it should still appear if you enter it into the Support key field as described in the link.


Hi, @unni
Big portions of Dremio are open-source. I found the config keys in “ExecConstants.java” in the GitHub project. Here’s the link for the current 4.7 release: https://github.com/dremio/dremio-oss/blob/d255abfabad2c9122e1cdf030ea6bbe8f9b7ce50/sabot/kernel/src/main/java/com/dremio/exec/ExecConstants.java
Since some keys might have been added in Dremio versions more recent than yours, you might need to switch to an older version of ExecConstants via git history.
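If you want to trace when a particular key was introduced or renamed, git's "pickaxe" search (`git log -S`) finds commits that added or removed a given string. A sketch, using jobs.max.age_in_days as the search string and the ExecConstants path from the link above:

```shell
# Find commits that added or removed the key string in ExecConstants.
git clone https://github.com/dremio/dremio-oss.git
cd dremio-oss
git log --oneline -S 'jobs.max.age_in_days' -- \
  sabot/kernel/src/main/java/com/dremio/exec/ExecConstants.java
```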

(I’m not a Dremio engineer, so please re-confirm with Dremio that it is actually okay to change the keys you’ll find in ExecConstants. Some of them are pretty low-level and you might break things by tweaking them.)

Best regards, Tim

@tid Thank you Tim. Will check out the code.