Storage running full under /cm/fs/

I am running Dremio OSS 24 in Docker and have mounted a volume for local storage as well as a MinIO bucket:

dremio.conf

paths: {
  # the local path for dremio to store data.
  local: ${DREMIO_HOME}"/data"

  # https://docs.dremio.com/software/deployment/dist-store-config/
  # the distributed path Dremio data including job results, downloads, uploads, etc
  #dist: "pdfs://"${paths.local}"/pdfs"
  #dist: "hdfs://<NAMENODE_HOST>:8020/path"}
  dist: "dremioS3:///dremio/"
}

Now I can see that the path /var/lib/docker/volumes/iod_dremio_app/_data/cm/fs, which contains many numbered subdirectories, is ~93 GB in size. I have some reflections enabled, but the web interface tells me that they are between 1 and 200 MB each, so I am not sure why the storage requirement is so high.

I tried cleaning up with dremio-admin clean, but could not reduce the space used.
Here is the output of one of the dremio-admin clean commands I ran:

dremio-admin-log.zip (1,6 KB)

[root@toolx01 fs]# ls -laht
total 4.9M
drwxr-xr-x. 131 root root 4.0K Oct  5 14:49 .
drwxr-xr-x.   2 root root    6 Oct  5 14:49 boostedSubDir
drwxr-xr-x.   2 root root  16K Oct  5 13:57 000042
drwxr-xr-x.   2 root root  16K Oct  5 13:57 000050
drwxr-xr-x.   2 root root  16K Oct  5 13:57 000093
drwxr-xr-x.   2 root root  16K Oct  5 13:57 000031
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000023
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000018
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000045
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000086
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000038
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000094
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000044
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000043
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000105
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000079
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000036
drwxr-xr-x.   2 root root  16K Oct  5 13:56 000017
drwxr-xr-x.   2 root root  16K Oct  5 12:01 000104
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000049
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000124
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000110
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000028
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000115
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000111
drwxr-xr-x.   2 root root  84K Oct  3 16:08 000022
drwxr-xr-x.   2 root root  80K Oct  3 16:08 000034
.......
many more
.......
drwxr-xr-x.   2 root root 8.0K Aug 21 16:00 000116
drwxr-xr-x.   2 root root 8.0K Aug 21 16:00 000118
drwxr-xr-x.   2 root root 8.0K Aug 21 16:00 000112
drwxr-xr-x.   2 root root 8.0K Aug 21 16:00 000107
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000060
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000055
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000058
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000056
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000051
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000065
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000053
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000066
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000061
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000063
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000054
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000067
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000062
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000064
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000052
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000057
drwxr-xr-x.   2 root root  16K Aug  7 12:00 000059
drwxr-xr-x.   2 root root 8.0K Jul 20 08:00 000037
drwxr-xr-x.   2 root root 8.0K Jul 20 08:00 000041
drwxr-xr-x.   2 root root 8.0K Jul 11 14:47 000026
drwxr-xr-x.   2 root root 8.0K Jul 11 14:47 000025
drwxr-xr-x.   2 root root 8.0K Jul 11 12:00 000027
drwxr-xr-x.   2 root root 8.0K Jul 11 12:00 000024

There are 132 folders, each between 200 MB and 2.5 GB in size.
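
The per-folder sizes can be checked with something like the following sketch. The function name largest_dirs is made up for illustration; it relies only on du, sort, and head, and reports sizes in KiB as du -sk does.

```shell
# Hypothetical helper: list the largest immediate subdirectories of a directory.
# $1 = directory to inspect (e.g. the cm/fs path), $2 = how many entries to show (default 10).
largest_dirs() {
  dir="${1%/}"
  count="${2:-10}"
  # du -sk prints "<size in KiB><TAB><path>" for each subdirectory;
  # sort -rn orders by size, largest first.
  du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_dirs /var/lib/docker/volumes/iod_dremio_app/_data/cm/fs 10 would show the ten biggest numbered folders.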

What is causing this large storage need? What can I do to minimize it?
Thanks!

@tha These look like the C3 cache files for both data and reflections. One level up, do you see a folder called “cm”? Under “cm” you should see two folders, “db” and “fs”.

So I turned off caching by setting reflection.cloud.cache.enabled to false two weeks ago. Still, Dremio did not evict the old, now unused cache files. Should I delete them manually?

If yes, should I just delete the db and fs folders?

I also read that Dremio uses 70 percent of the total available disk space for the specified database and file system mount paths. Can this number be adapted?

@tha Yes, you can clean up the old files manually. The 70% limit can be changed; the documentation link above should explain that.
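
As a rough sketch of what that change might look like in dremio.conf: the key names below are an assumption based on my recollection of the cloud cache documentation, and the value 40 is only an illustration, so please verify both against the docs for your Dremio version before using them.

```
# Assumed key names; check the cloud cache documentation for your version.
# Percentage of the disk that the cache database may use.
services.executor.cache.pctquota.db: 40
# Percentage of the disk that the cache file system directories may use
# (one entry per configured fs path).
services.executor.cache.pctquota.fs: [40]
```

A lower quota caps how much the cm folder can grow, at the cost of fewer blocks being served from local disk.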

Hello @balaji.ramaswamy
I'm working with the Community Edition of Dremio.
I see that the cm/fs/ folder is occupying a lot of space after adding S3 storage and running a few queries.
47G cm/
5.6G db/
6.4G pdfs/
55K security/
87K spill/
50K zk/

Any suggestions for optimizing or removing older files to manage the storage efficiently?

@Hunter Those are your cloud cache files, which let Dremio read blocks from local disk rather than going to your dist store. It means your users are querying Dremio more :)

Is it possible to increase the disk size? It's a good problem to have.

Hi @balaji.ramaswamy
are you saying that the storage is filling up because I'm using the local UI to query?

I was just using the UI to test performance…

If I use a JDBC client to read data from Dremio, then I won't have this issue, right?

Can I just rm -rf it? I mean, when nothing is querying anything?

Yep, you can just

rm -rf /opt/dremio/data/cm/fs/*

Then restart the service.
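
A slightly more defensive variant of the same idea, as a sketch only. The function name clear_cache_dir is hypothetical; it refuses to run unless the path ends in /cm/fs, and it assumes Dremio is stopped or idle while it runs.

```shell
# Hypothetical helper: empty a Dremio cloud cache fs directory, keeping the directory itself.
# $1 = path to the cache directory; it must end in /cm/fs.
clear_cache_dir() {
  dir="${1%/}"
  # Guard against a mistyped path: only accept something that ends in /cm/fs.
  case "$dir" in
    */cm/fs) ;;
    *) echo "refusing: $dir does not end in /cm/fs" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # Remove everything directly inside the directory, but not the directory itself.
  find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

For the path in the command above, that would be clear_cache_dir /opt/dremio/data/cm/fs.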

@quangbilly79 Eventually it will fill up again. If you look at their listing, the cm folder, which contains the cache files, is the biggest consumer. This helps query performance: the more we cache, the fewer reads go to the object store and the more come from local SSD. So if the query volume is high, it is better that @Hunter increases disk space.