I have a pipeline that runs every 5 minutes and pushes new files to Azure Blob Storage. Those containers are the source of PDSs in Dremio, each with an incremental Raw Reflection set to a 1 hr refresh (configured in the UI).
The goal is to be able to see those new files (through the PDS) ideally every 5 min (as soon as they are uploaded to Azure).
Given that Azure caches metadata and the Reflections are, in effect, another layer of cache, we need to refresh the metadata and reflection programmatically as part of the pipeline.
To do this, I wrote a Python wrapper around the API to execute a metadata refresh (SQL API) and a reflection refresh (catalog API) of the PDS.
Both calls succeed and I can see them in the Job list in the UI.
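For reference, here is a minimal sketch of what such a wrapper might look like, not the exact code I run. It assumes the `POST /api/v3/sql` and `POST /api/v3/catalog/{id}/refresh` endpoints, the `ALTER PDS ... REFRESH METADATA` statement, and a token obtained from the login endpoint and sent as `_dremio<token>`. The host, the table path, and all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def _post(url, token, body=None):
    """Build (but do not send) an authenticated POST request."""
    data = json.dumps(body).encode("utf-8") if body is not None else b""
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            # Assumption: token auth in the "_dremio<token>" form.
            "Authorization": f"_dremio{token}",
            "Content-Type": "application/json",
        },
    )


def build_metadata_refresh(base_url, token, table_path):
    """SQL API request that refreshes the metadata of a PDS.

    table_path is a list of path parts, e.g. ["source", "container", "folder"].
    """
    # Quote each path part, doubling any embedded double quotes.
    quoted = ".".join('"' + part.replace('"', '""') + '"' for part in table_path)
    sql = f"ALTER PDS {quoted} REFRESH METADATA"
    return _post(f"{base_url}/api/v3/sql", token, {"sql": sql})


def build_reflection_refresh(base_url, token, dataset_id):
    """Catalog API request that refreshes the reflections on a PDS."""
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return _post(f"{base_url}/api/v3/catalog/{safe_id}/refresh", token)


def send(request):
    """Send a prepared request and return the decoded JSON body, if any."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        return json.loads(raw) if raw else None
```

In the pipeline, the metadata request is sent first and the reflection request second, e.g. `send(build_metadata_refresh(base_url, token, path))` followed by `send(build_reflection_refresh(base_url, token, dataset_id))`. The SQL API call returns as soon as the job is submitted, so the metadata refresh may still be running when the reflection refresh is triggered unless the job status is polled in between.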
For the Reflection refresh, I see 2 jobs:
- REFRESH REFLECTION 'e24742f9-e61f-43b5-b5b3-7712844f40cc' AS '8755176e-1f34-4cc1-9545-9bccd57149b2'
- LOAD MATERIALIZATION METADATA "e24742f9-e61f-43b5-b5b3-7712844f40cc"."8755176e-1f34-4cc1-9545-9bccd57149b2"
However, when I look in the Job list at subsequent jobs that hit that PDS Reflection, the Reflection appears to be more than 5 hours old. I can see the Reflection being used (flame icon), and next to the name of the Reflection that was used it says “Age: 5hr 22min”. Even if I weren’t refreshing via the API, it should have refreshed every hour.
Any idea what’s going on? How can I tell if the reflection was actually refreshed with the latest data?