Storage suddenly running out of space

Hello. We suddenly started to experience rapid growth in disk usage, which caused the Dremio UI/API to become unresponsive.

The issue is the folder /mnt/c1/cm/fs. Its size exceeds 200 GB and is quickly growing.
We tried to clean it up, but within a week it grew back to the same size.
What is this folder for? How can I reduce its growth, or find the cause? (It was fine for years.)

Configs attached:

/etc/dremio/dremio.conf
#
# Copyright (C) 2017-2019 Dremio Corporation. This file is confidential and private property.
#

services.executor.enabled: false
debug.dist.caching.enabled: true
paths.local: "/var/lib/dremio"
paths.results: "pdfs://"${paths.local}"/data/results"

#test
services.executor.cache.path.db: "/mnt/c1/cm"
services.executor.cache.path.fs: ["/mnt/c1/cm"]
services.executor.cache.pctquota.db: 50
services.executor.cache.pctquota.fs: [50]
services.executor.cache.ensurefreespace.fs: [20]

# Web server encryption
services.coordinator.web.ssl.enabled: true
services.coordinator.web.ssl.auto-certificate.enabled: false
services.coordinator.web.ssl.keyStore: "/etc/dremio/ssl/...."
services.coordinator.web.ssl.keyStorePassword: "...."
#services.coordinator.web.port: 443

services.coordinator.master.embedded-zookeeper.enabled: false
zookeeper: "10.0.158.65:2181"
paths.accelerator = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/accelerator"
paths.uploads = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/uploads"
paths.downloads = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/downloads"
paths.scratch = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/scratch"
provisioning.coordinator.enableAutoBackups = "true"

paths.metadata = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/metadata"
paths.gandiva = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/gandiva"
paths.system_iceberg_tables = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/system_iceberg_tables"
paths.results_cache = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/data/results_cache"
paths.node_history = "dremioS3:///dremio-me-047e62a8-7fab-4b4d-825e-3f022ee4e3c4-.../dremio/node_history"
registration.publish-host: "10.0.158.65"
provisioning.ec2.efs.mountTargetIpAddress = "10.0.146.31"
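
For context on the cache settings above: the pctquota values are percentages of the volume the cache lives on, so the configured numbers imply a byte budget. A back-of-envelope sketch, using an illustrative 400 GiB volume size (not taken from the actual host) and assuming pctquota is applied against the volume's total size:

```shell
# Illustrative only: how a pctquota of 50 translates into a byte budget.
total_kib=419430400   # hypothetical 400 GiB volume, expressed in KiB
pct=50                # services.executor.cache.pctquota.fs from dremio.conf
quota_gib=$(( total_kib * pct / 100 / 1024 / 1024 ))
echo "fs cache quota: ${quota_gib} GiB"
```

With a 400 GiB volume and a 50% quota, the fs cache alone is allowed roughly 200 GiB before eviction, which would be consistent with the ~200 GB observed under /mnt/c1/cm/fs.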
dremio-env
#JAVA_HOME in AWSE points to jre, point it to the jdk instead


DREMIO_LOG_DIR=/var/log/dremio
DREMIO_PID_DIR=/var/run/dremio
DREMIO_GC_LOGS_ENABLED="yes"
DREMIO_JAVA_VERSION_CHECK="true"
DREMIO_GC_LOG_FILENAME="server-%t.gc"
#DREMIO_JAVA_EXTRA_OPTS=
DREMIO_EXTRA_CLASSPATH=/var/dremio_efs/thirdparty/*
DREMIO_MAX_MEMORY_SIZE_MB=19089
JAVA_HOME=/usr/lib/jvm/java-11-openjdk
DREMIO_GC_OPTS="-XX:+UseG1GC -Xlog:gc:file=/var/log/dremio/server-%t.gc::filecount=20,filesize=8M"

Version: 25.2.20-202510310050480576-b3eb2d13
Edition: AWS Edition (activated)

@vladislav-stolyarov Can you please send the output of du -sh /mnt/c1/

Done:
388G /mnt/c1/

Sorry, my bad. Please run the commands below and send the output:

cd /mnt/c1/

du -sh *

65M buffer
327G cm
72G db
111M etc
16K lost+found
113M results
21M s3Backup
8.0K security
32K spilling
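
To see where the space under cm actually sits, a minimal helper for drilling further down (a sketch assuming GNU du and sort are available; largest_under is a name made up here, not a Dremio tool):

```shell
# largest_under DIR: list the ten largest entries directly under DIR,
# biggest first, with human-readable sizes.
largest_under() {
  du -sh "$1"/* 2>/dev/null | sort -rh | head -10
}
# Example: largest_under /mnt/c1/cm
```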

@vladislav-stolyarov It is all C3 (Columnar Cloud Cache) files. It is a good problem to have, in the sense that you are running a lot of queries. Can you add engines and redirect some workloads to the new engines?

The 72 GB under “db” is probably RocksDB. Just to confirm that, are you able to run the command below and send the output?

cd /mnt/c1/db

du -sh *

du -sh *
3.1M blob
93G catalog
312K metadata
136M search
Seems it is also growing rapidly.

Anyway, is there a way to configure less aggressive caching, limit the cache size, or safely do some periodic cleanup?
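
For reference, one low-effort way to find out which folder is actually growing is to snapshot sizes on a schedule and compare. A minimal sketch (snapshot_sizes is a hypothetical helper name; the paths in the example are illustrative):

```shell
# snapshot_sizes BASE LOG: append a timestamped "du -s" snapshot of every
# top-level entry under BASE to LOG; run it daily (e.g. from a cron script)
# and diff consecutive snapshots to see which directory is growing.
snapshot_sizes() {
  echo "== $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$2"
  du -s "$1"/* >> "$2" 2>/dev/null
}
# Example: snapshot_sizes /mnt/c1 /var/log/dremio-growth.log
```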

@vladislav-stolyarov C3 usage is a good problem to have, as the more C3 files there are, the better the engine performs. Is adding engines/executors not an option?

I would say it can be an option as a last resort. Current cluster performance is quite acceptable, and we do not want to increase costs if possible. It will also be cheaper to increase the EBS volume for now.
Anyway, since I still do not understand how cache invalidation (if any) works here:

  1. Won't the same issue happen on the extra node?
  2. Can we still somehow affect the cache size/invalidation? Also, is removing cm safe?

Note: within the last 7 days the cm folder size has not changed (it is still 327 GB); only db grew from 72 GB to 99 GB. Is there any way to decrease the db size?

@vladislav-stolyarov The main contributor to the db folder is jobs. By default it holds 30 days of history. Do you need 30 days of job history, or can it be reduced? Is this on a VM or K8s? Are you able to send dremio.conf or values.yaml?
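
If job history dominates the db folder and its size scales roughly linearly with the retention window, the effect of a shorter window can be estimated. A rough sketch using the sizes reported in this thread (the linear-scaling assumption is mine; the actual retention mechanism and setting should be confirmed with Dremio support/docs before relying on this):

```shell
# Rough, linear estimate: scale the observed db size by a shorter
# hypothetical retention window (sizes taken from this thread).
db_gib=99       # observed /mnt/c1/db size in GiB
old_days=30     # default job-history retention
new_days=7      # hypothetical reduced retention
est=$(( db_gib * new_days / old_days ))
echo "estimated db size at ${new_days} days: ~${est} GiB"
```

With these numbers the estimate comes out at about 23 GiB, i.e. roughly a quarter of the current db footprint.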