Clean up job results every hour

Hi, is it possible to set up Dremio to clean up the job results every 5 minutes or every hour? Currently, I have results.max.age_in_days=0, but it cleans only after 24 hours. Or is there any other method to run dremio-admin clean while Dremio is running and clean up everything?

@alexandreio dremio-admin clean does not clean results. Unfortunately, the support key does not take anything < 1.

The results files can be cleaned manually; you can do it at a regular frequency via a script.
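A minimal sketch of such a script, not an official Dremio tool. The function name clean_old_results is hypothetical, and the location of the results directory is an assumption that has to be checked against the paths configured for your deployment. It relies on the -mmin, -empty, and -delete options available in GNU and BSD find.

```shell
#!/usr/bin/env bash
# Sketch: delete job-result files older than a given age.
# ASSUMPTION: the caller passes the directory where this deployment
# stores job results; verify the path before pointing the script at it.

clean_old_results() {
    local results_dir="$1"   # directory holding job result files
    local max_age_min="$2"   # remove files last modified more than this many minutes ago

    # Refuse to run against a missing directory instead of failing silently.
    [ -d "$results_dir" ] || { echo "no such directory: $results_dir" >&2; return 1; }

    # Delete old files, leaving the top-level directory itself in place.
    find "$results_dir" -mindepth 1 -type f -mmin "+$max_age_min" -delete

    # Prune empty subdirectories that are also older than the cutoff.
    # A directory emptied by the step above has a fresh mtime, so it is
    # removed on a later run rather than immediately.
    find "$results_dir" -mindepth 1 -type d -empty -mmin "+$max_age_min" -delete
}
```

To run it hourly, one option is a cron entry that sources the script and calls, for example, clean_old_results /opt/dremio/data/pdfs/results 60. That path is only a guess based on the directory layout shown later in this thread. Results removed this way will no longer be available for the corresponding jobs.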

Are you running all your queries via UI or REST API?

@balaji.ramaswamy I’m already cleaning the results, but I also need to run dremio-admin clean -o. Removing only the result files frees up only 20% of the disk space; if I run dremio-admin clean -o afterwards, I’m able to free up 50% of the disk space.
I’m running it on Docker, version 24.2.

@alexandreio

dremio-admin clean -o only cleans orphan splits. This is needed only if you have non-Parquet files, you have disabled unlimited splits, or you are on a version < 21. A restart removes SST files that are not needed for recovery, and that could be the reason you are seeing a gain. Can you send me your KVStore report by running the API call below? (The token needs to be changed.)

curl --location --request GET 'localhost:9047/apiv2/kvstore/report?store=none' \
--header 'Authorization: _dremioirr8hj6qn' > kvstore_summary.zip

kvstore_summary.zip (1.8 KB)

Hi @balaji.ramaswamy, sorry for the delay. Here it is. I just removed the source names from the sources.json. This summary was generated when the disk usage was > 85%.

I have two MySQL tables and one Azure Data Lake Gen2 connection. Inside the data lake I have some folders with zip files that I don’t query or otherwise use in Dremio, and two folders with a lot of subfolders containing Parquet files (raw, Delta, and Iceberg).

@alexandreio I do not see more than 10 GB there. Can you send the output of the two commands below?

df -h <mount on which db exists>
du -sh * <on mount where db exists>

Thanks
Bali

Sure. I’ll wait to fill the disk again before I run these commands.

@balaji.ramaswamy

df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          64G   54G  9.2G  86% /
tmpfs            64M     0   64M   0% /dev
tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
/dev/sda2        64G   54G  9.2G  86% /localFiles
tmpfs           7.8G     0  7.8G   0% /proc/acpi
tmpfs           7.8G     0  7.8G   0% /proc/scsi
tmpfs           7.8G     0  7.8G   0% /sys/firmware

/opt/dremio/data$ du -sh -- * | sort -rh
21G db
6.4G pdfs
85M cm
1.3M zk
8.0K dremio
4.0K spill

@alexandreio I need to know the volume size of /opt/dremio/data. Can you please send the output of the command below?

df -h /opt/dremio/data

Thanks,
Bali

Hi @balaji.ramaswamy

64G

@alexandreio Any chance the du -sh output for /opt/dremio/data was captured after a restart? The numbers do not add up, as your du -sh only shows about 27 GB of usage. The reason I am asking is that after a restart RocksDB removes files that are no longer needed for recovery.
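For reference, the mismatch can be checked in one step. In the figures posted above, du totals roughly 27.5G (21G + 6.4G plus a few small directories) while df reports 54G used. The following is a rough sketch; the function name usage_gap_kb is hypothetical. It prints the filesystem's used space, the size of the directory tree, and the difference, all in KiB.

```shell
#!/usr/bin/env bash
# Sketch: compare what df reports as used on the filesystem that holds a
# directory with what du finds inside that directory.

usage_gap_kb() {
    local dir="$1"
    local used_kb dir_kb

    # df -P forces one line per filesystem; column 3 is "Used" in 1K blocks.
    used_kb=$(df -Pk "$dir" | awk 'NR==2 {print $3}')

    # du -sk prints the total size of the directory tree in 1K blocks.
    dir_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')

    # Output: used on filesystem, used by directory, difference.
    echo "$used_kb $dir_kb $(( used_kb - dir_kb ))"
}
```

Calling usage_gap_kb /opt/dremio/data would show how much of the used space lies outside the Dremio data directory. A large gap means the space is taken by other data on the same filesystem, or by files that were deleted but are still held open by a process. It does not by itself point to Dremio.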