I have an annoying issue with Postgres as a data source. It seems like Dremio caches the data for a very long time: if I change a value in a table, I still get the old value hours later (after 6 hours it's still the old value). I'm querying the table directly on the Dremio data source, not through views, and there are no reflections.
Metadata caching is set to the lowest values: Dataset Discovery every 1 minute, Dataset Details every 1 minute with expiration after 3 minutes. I also deleted the result files in the pdfs folder, but I'm still getting the old value.
However, if I create a duplicate data source (same Postgres db with the same login), I get the new data from the cloned source and still the old data from the original one.
Of course I can't recreate the data source every time… is there a way to disable the caching and force Dremio to always fetch fresh data?
Thanks
ps: We are still using Dremio 2.0.5; we tried 2.1.4 but there were too many errors and we had no time to investigate.
update: our fault.
Looks like there was a reflection on a view that uses that table. So even when querying the data source directly, Dremio was using that reflection. This means there is no real bug or problem; still, I wonder if there is a way to force a query not to use reflections when asked not to.
Hi Luca, thanks for reporting back. You probably figured this out already, but if you look at the job history you can easily see whether a data reflection was used. This tutorial on data reflections shows some of the screens:
Currently there's no way to bypass the cost-based optimizer (perhaps one day we will support a hint for this). If it determines that using a reflection is more efficient, it will rewrite the query to do so.
One option is to create a second source pointing to the same database. Reflections don't span sources, so as long as you don't create any reflections on that second source, you can be sure queries against it won't be accelerated.
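For anyone landing on this thread from a newer Dremio release: if the stale data turns out to be cached metadata rather than a reflection, metadata for a physical dataset can also be refreshed on demand with SQL. This is a sketch only; the source and table names below are placeholders, and as far as I know this statement is not available in the 2.0.x line discussed above.

```sql
-- Ask Dremio to re-read this table's metadata from the underlying
-- Postgres source right away, instead of waiting for the scheduled
-- refresh. "mypg" and "public.orders" are hypothetical names.
ALTER PDS mypg.public.orders REFRESH METADATA;
```

Note that this refreshes dataset metadata only; reflections are maintained separately on their own refresh schedule, so a stale reflection (as in this thread) would not be fixed by it.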