Questions on cloud cache

Hi Dremio Team,
Can we get some insights on cloud caching?

After enabling cloud cache:
How do I correlate cached data with the physical/queried data?
How do I know from the profile whether a query is using the cache?
How do I manually clean up the cache db and files directories?

Thanks

@smora

How do I correlate cached data with the physical/queried data? - Do you want to compare them? If so, query the source data using an external tool and compare the results.
How do I know from the profile whether a query is using the cache? - Yes, you can tell from the profile.
How do I manually clean up the cache db and files directories? - We control this by setting the percentage property; see “Deleting files from cache”.

Hi @balaji.ramaswamy

  1. If I run the query select * from A from the UI, will it cache all the data from A? And by going into the cache directory/files folder, can I see some reference to table A?
    If we know exactly what data is cached, we can pre-populate the cache before actual user/reporting activity happens.

  2. Which part of the query profile indicates whether the cache is used? The only message I see in the profile is something like “PERMISSION_CACHE_HIT (0 ms)”.

  3. This link describes completely wiping out the cache and disabling the cache on a source. Can I selectively clean up some cache folders by manually deleting them from the cache files directory?

Any update on @smora's question? I am interested in cleaning up some cache folders.

#1 Yes, if you run select * from the UI (Run), that should cache all rows, so subsequent queries on that dataset should use the C3 cache. The files are under the configured folder but are not readable.
#2 Expand the Parquet_Scan operator, scroll to the Operator_metrics section, and scroll to the right; you will see “NUM_CACHE_HITS” and “NUM_CACHE_MISSES” (see attached screenshot).
#3 Currently this is not possible.
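
If it helps with #2, here is a minimal sketch for turning the two counters into a hit ratio. It assumes you copy NUM_CACHE_HITS and NUM_CACHE_MISSES by hand from the Parquet_Scan operator metrics into a dict; the function name and the dict layout are hypothetical and not part of any Dremio API.

```python
def cache_hit_ratio(metrics):
    """Return hits / (hits + misses), or None if the scan reported no cache lookups.

    `metrics` is a plain dict keyed by the metric names shown in the profile.
    The dict layout is an assumption for illustration, not a Dremio format.
    """
    hits = int(metrics.get("NUM_CACHE_HITS", 0))
    misses = int(metrics.get("NUM_CACHE_MISSES", 0))
    total = hits + misses
    if total == 0:
        # No lookups recorded, so a ratio would be meaningless.
        return None
    return hits / total


# Hypothetical values copied from one Parquet_Scan operator.
example = {"NUM_CACHE_HITS": 90, "NUM_CACHE_MISSES": 10}
print(cache_hit_ratio(example))
```

A ratio near 1 on a repeated query would suggest the scan was served mostly from C3, while a ratio near 0 suggests the data was read from the source.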

Queries run via the UI are truncated to roughly 1M records. Does Dremio still fetch all the data from the data lake and store it in C3, or does it cache only the Parquet files required for the ~1M records?

I have exactly the same question.
In addition, I want to understand: can I disable the cache for only one table?
Some of my tables have very fat columns, which are never used in queries, only in the ETL flow.