Hello everyone, I am executing several queries in my Dremio cluster. As they run, the free disk space on the coordinator node keeps decreasing, and eventually the node runs out of space and the queries can no longer execute.
I only need to retrieve the result of each query before moving on to the next one. I do not need the jobs or any other data to be saved on the coordinator node. How can I make Dremio execute the queries without storing the rest of that data on the coordinator?
@gaston_guerra, by default, Dremio writes the results of UI/API queries to storage on the executors (the results path configured in dremio.conf). If you run many queries, that storage can fill up before it is automatically cleaned up. Do you have the coordinator node also configured as an executor?
You can adjust the number of days of query results that are saved with the results.max.age_in_days support key.
@gaston_guerra, check dremio.conf on the coordinator node. Does it also have executor.enabled: true? Alternatively, the “Node Activity” page in the Dremio UI would show the node as both coordinator and executor.
To set results.max.age_in_days, go to Admin -> Support, scroll to “Support Keys”, enter the key, and click Show. Then enter the desired value. I believe 0 is the same as disabling cleanup, but I will need to confirm this.
You can also try setting planner.output_limit_size (in the same way as above) to a lower value (the default is 1,000,000). This limits the number of result records written to disk and available to be paged back to the UI or API. The idea is that you will generally be using Dremio for analytic queries with small result sets, not consuming millions of records via API requests.
But this all depends on whether the coordinator runs out of space because it’s also configured as an executor. Can you observe specifically what directories are filling up?
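To see which directories are filling up, something like the following might help. It is only a minimal sketch: largest_dirs is a hypothetical helper name, not a Dremio tool, and it simply wraps du, sort, and head to print the largest entries directly under a directory you choose.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dremio): list the largest entries
# directly under a directory, biggest first.
# Usage: largest_dirs <directory> [count]
largest_dirs() {
    dir="$1"
    count="${2:-10}"   # show 10 entries unless told otherwise
    # du -sk prints one "size-in-KiB <tab> path" line per entry;
    # sort -rn orders those lines by size, largest first.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_dirs /var/lib/dremio 5 would print the five largest entries under that directory, assuming that is where your Dremio data lives. Running it a few times while the queries execute should show which directory is growing.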
Thank you very much for showing me the “Support Keys” option, I didn’t know about it. If I set results.max.age_in_days to 0, does that mean the results will never be cleaned up?
I am using Dremio because it can query across two different data sources, Elasticsearch and an Oracle database. Since this is a job that analyzes logs, it reviews many records, in the millions.
I have noticed that many .sst and .log files are being generated in the directory /var/lib/dremio/db/catalog, and they consume a lot of space. Is there a way to clean them up automatically?
As Ben explained, the results only affect an executor node. If your coordinator is running out of space, it is most likely because of your RocksDB store (the db folder).
cd <DREMIO_HOME>/data/db (this could have been customized)
du -sh *
df -h .
Bring down the executors and the coordinator
cd <DREMIO_HOME>/bin
./dremio-admin clean
Save the report to a file
Start the Dremio coordinator and the executors
Send us the saved file
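If it helps, the "run clean and save the report" part of the steps above could be wrapped roughly like this. This is a sketch under assumptions: save_clean_report is a hypothetical helper name, the Dremio home and report path are whatever you pass in, and it assumes dremio-admin clean prints its report to the terminal when run without options. Only run it after the coordinator and executors have been stopped.

```shell
#!/bin/sh
# Hypothetical helper: run "dremio-admin clean" from <dremio_home>/bin
# and save everything it prints to a report file.
# Usage: save_clean_report <dremio_home> <report_file>
save_clean_report() {
    dremio_home="$1"
    report_file="$2"
    if [ ! -x "$dremio_home/bin/dremio-admin" ]; then
        echo "dremio-admin not found or not executable in $dremio_home/bin" >&2
        return 1
    fi
    # Mirror the manual steps: cd into bin, then run ./dremio-admin clean.
    # stdout and stderr are both captured so nothing in the report is lost.
    ( cd "$dremio_home/bin" && ./dremio-admin clean ) > "$report_file" 2>&1
}
```

For example, save_clean_report "$DREMIO_HOME" /tmp/dremio-clean-report.txt (both values are placeholders) would leave the report in that file, ready to attach here.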