How to avoid storing job results under the $DREMIO_HOME location when querying in Dremio
Hi,
If I understand you correctly, you want to change the location where we store the job results cache?
Go to your Dremio installation dir, open conf/dremio.conf and add the following to the paths section:
results: /location/of/resultscache
For future reference, our configuration options are documented here - you can configure the individual paths of everything we store.
You can make adjustments to dremio.conf to point to a different set of directories:
paths: {
  # the local path for dremio to store data.
  local: "/data/dremio",   # <---- directory you want to use to store internal data
  # the distributed path Dremio data including job results, downloads, uploads, etc
  dist: "pdfs://data/dremio/pdfs"   # <---- that's the directory you are interested in changing
  # Below directories you don't need to specify if you keep them co-located
  # storage area for the accelerator cache.
  accelerator: ${paths.dist}/accelerator
  # staging area for json and csv ui downloads
  downloads: ${paths.dist}/downloads
  # stores uploaded data associated with user home directories
  uploads: ${paths.dist}/uploads
  # stores data associated with the job results cache.
  results: ${paths.dist}/results
}
Hi,
How can I stop results from being stored in the results cache once there are more than 10 cached results? I am dealing with larger data, and it consumes more memory every time I execute a single query.
Is there any way to avoid the results cache altogether? Or do I need to allocate a separate location that has enough space?
Hi,
Sounds like you want to reduce the amount of job results Dremio keeps around. By default, Dremio cleans up job results that are older than 30 days. We have a system option called results.max.age_in_days which controls the cleanup behavior.
To set a system option, as an Administrator click on Admin in the top right. Then click on Advanced Settings at the bottom of the left hand sidebar. At the bottom of the page, type results.max.age_in_days into the input field and press Show. After you edit the value, a Save button will appear.
Once you press Save, you will need to restart Dremio for this specific setting to take effect.
I tried this but still see old data under the results folder. Can I simply delete the directories under data/pdfs/results?
Can you confirm that you restarted Dremio after changing the setting? That could be an issue in Dremio.
You can delete the results, they are mainly used when viewing datasets using the UI. Dremio will recreate the data if needed.
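If you would rather script the manual cleanup than delete directories by hand, here is a minimal sketch in Python. It is not something Dremio ships. It assumes the results folder holds one subdirectory per job (which is what the paths in the server log suggest) and uses each directory's modification time as a stand-in for its age. The function names find_old_result_dirs and purge_old_results are made up for this example, and dry_run defaults to True so nothing is removed until you ask for it.

```python
import shutil
import time
from pathlib import Path


def find_old_result_dirs(results_root, max_age_days, now=None):
    """Return subdirectories of results_root last modified more than max_age_days ago.

    Assumption: each job's results live in their own subdirectory and the
    directory mtime is a reasonable proxy for the age of the job result.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    root = Path(results_root)
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.stat().st_mtime < cutoff
    )


def purge_old_results(results_root, max_age_days, dry_run=True, now=None):
    """List (dry_run=True) or delete (dry_run=False) old job result directories."""
    old_dirs = find_old_result_dirs(results_root, max_age_days, now=now)
    for d in old_dirs:
        if dry_run:
            print(f"would delete {d}")
        else:
            shutil.rmtree(d)
            print(f"deleted {d}")
    return old_dirs
```

For example, calling purge_old_results("/tmp/dremio/data/pdfs/results", 7) only prints the directories older than 7 days. Pass dry_run=False once the list looks right. Adjust the path to wherever your results directory actually lives.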
Thanks Doron. I did restart several times. Let me know if there are any informative log messages (in server.out) that I should be looking out for to indicate that "purging" is taking place.
I will take your advice and just delete them manually for now.
Please check log/server.log. If the directory is cleaned up, there will be log entries of the following pattern:
INFO c.d.service.jobs.JobResultsStore - Deleted job output directory - /tmp/dremio/data/pdfs/results/266d658d-3a6f-9aa5-fb41-4c3c19996f00
If there is an error while deleting, there will be log entries like this:
WARN c.d.service.jobs.JobResultsStore - Could not delete job output directory : /tmp/dremio/data/pdfs/results/266d65a7-1115-1380-63f6-50fd617ebb00
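To check a large server.log for these entries without reading it by eye, something like the following Python sketch may help. It matches only the two message shapes quoted above. The regular expression and the function name summarize_cleanup are my own for illustration, and real log lines may carry extra prefixes (timestamp, thread name) or differ between Dremio versions, which is why it searches within each line rather than matching the whole line.

```python
import re

# Matches the two JobResultsStore messages quoted in this thread.
# The separator before the path is " - " for deletions and " : " for failures.
LOG_PATTERN = re.compile(
    r"(?P<level>INFO|WARN)\s+c\.d\.service\.jobs\.JobResultsStore - "
    r"(?P<action>Deleted|Could not delete) job output directory\s*[-:]\s*"
    r"(?P<path>\S+)"
)


def summarize_cleanup(lines):
    """Split matching log lines into (deleted_paths, failed_paths)."""
    deleted, failed = [], []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # unrelated log entry
        if match.group("action") == "Deleted":
            deleted.append(match.group("path"))
        else:
            failed.append(match.group("path"))
    return deleted, failed
```

You could feed it an open file, for example summarize_cleanup(open("log/server.log")). If both lists come back empty after a restart, the cleanup has probably not run yet or is logging somewhere else.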