How long is Postgres cache?

Hi everyone,

I have an annoying issue with Postgres as a data source. Dremio seems to cache the data for a very long time: if I change a value in a table, I still get the old value hours later (after 6 hours, still the old value). I’m querying the table directly on the Dremio data source, not through views, and there are no reflections.
Metadata caching is set to the lowest values: Dataset Discovery every 1 minute, and the same for Dataset Details, with expiration after 3 minutes. I also deleted the results files in the pdfs folder, but I still get the old value.
However, if I create a duplicate data source (same postgres db with same login), I get the new data from the cloned one and still the old data from the original data source.
Of course I can’t just recreate the data source every time… Is there a way to disable the caching and force Dremio to always get fresh data?


PS: We are still using Dremio 2.0.5. We tried 2.1.4, but there were too many errors and we had no time to investigate.

Update: our fault.
It looks like there was a reflection on a view that uses that table, so even when querying the data source directly, Dremio was using that reflection. This means there is no real bug or problem. Still, I wonder if there is a way to force a particular query not to use reflections.

Thanks and sorry :slight_smile:

Hi Luca, thanks for reporting back. You probably figured this out, but if you look at the job history you can easily see if a data reflection was used. This tutorial on data reflections shows some of the screens:

Currently there’s no way to bypass the cost-based optimizer (perhaps one day we will support a hint for this). If it determines that using a reflection is more efficient, it will rewrite the query to do so.

One option is to create a new source to the same database - reflections don’t span sources - so as long as you don’t create reflections on the second connection to that source you can be sure those queries won’t be accelerated.
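
To make the behavior above concrete, here is a toy Python model of it. This is only a sketch, not Dremio's actual optimizer: the real one is cost-based, while this one always picks a matching reflection. The names `pg_main`, `pg_clone`, `orders`, `Reflection` and `plan` are all made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Reflection:
    source: str             # hypothetical name of the source the reflection belongs to
    tables: FrozenSet[str]  # tables in that source covered by the reflection


def plan(source: str, table: str, reflections: Sequence[Reflection]) -> str:
    """Toy planner: report where a query on source.table would be answered from."""
    for r in reflections:
        # A reflection only matches queries against its own source.
        if r.source == source and table in r.tables:
            return "reflection"  # may serve stale data until the reflection refreshes
    return "source"              # query goes to the database itself


# One reflection exists, built (via a view) on top of pg_main.orders.
reflections = [Reflection("pg_main", frozenset({"orders"}))]

print(plan("pg_main", "orders", reflections))   # reflection
print(plan("pg_clone", "orders", reflections))  # source
```

In this model, a direct query on `pg_main.orders` is still answered from the reflection even though the reflection was defined on a view, which matches what Luca saw. The same table reached through the second connection `pg_clone` always goes to the database, as long as no reflection is created there.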