Last modified date for catalog virtual data set

Dremio shows a long list of dots to the right of the Query Editor, one for each past version of the query for a virtual data set. I would like to know when the most recent version was last edited. I looked for this in the Catalog API but only see the createdAt key there. I also looked in the sys and information_schema schemas. The closest match, I think, is the reflections modified_at column, but I believe that refers to the reflection of the virtual dataset, not to when the query definition was last modified.

My gut says this is a feature request. I would request exposing more of the version (and tag) related data to the API and to the information schema views.

I noticed this when I found mismatched data after querying datasets downstream of a reflected dataset that I had changed, even though seemingly “new” reflections had run on them. I had changed the meaning of a column in the upstream dataset but reused the same column name.

Workarounds I have considered include:

  • Add columns when making edits. However, renaming columns is difficult for end users.
  • Keep two copies of the physical data source (dev and prod), apply changes only to the dev version, disable and re-enable reflections, then rewire all downstream data sets to the dev version until the next “global refresh” or “maintenance refresh” runs. I think that would require some coding to identify and swap data sources in the FROM clauses.
  • Harvest the last modified date from the Dremio logs, detect all downstream reflections, then script disabling and re-enabling reflections on each, possibly as a background process.
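
For the "detect all downstream" step in the last workaround, here is a minimal sketch of the graph walk I have in mind. It assumes I have already built a mapping from each dataset to the datasets it selects from (for example by parsing the FROM clauses). The `parents` dict, the dataset names and the `downstream_of` function are all hypothetical and not part of any Dremio API.

```python
from collections import deque


def downstream_of(changed, parents):
    """Return every dataset that depends, directly or indirectly, on `changed`.

    `parents` maps a dataset name to the list of datasets it selects from.
    The result is in breadth-first order, nearest dependents first.
    """
    # Invert the parent map so we can walk from a dataset to its children.
    children = {}
    for dataset, sources in parents.items():
        for source in sources:
            children.setdefault(source, []).append(dataset)

    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage, for illustration only.
parents = {
    "staging.orders_clean": ["source.orders"],
    "marts.orders_daily": ["staging.orders_clean"],
    "marts.orders_by_region": ["staging.orders_clean", "source.regions"],
    "reports.exec_summary": ["marts.orders_daily", "marts.orders_by_region"],
}

affected = downstream_of("staging.orders_clean", parents)
print(affected)
```

Each name in `affected` would then be a candidate for having its reflections disabled and re-enabled by a script.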

FYI, we are reflecting everything during the design phase for speed. Our biggest table is in the <25GB range, and we don’t expect many changes to columns down the road.

@datocrats-org

Are you looking for the last modified timestamp of the VDS?

Yes, exactly that. I’m hoping to compare it to the metadata refresh time for the parent PDS and the reflection refresh times for any related PDS and VDS too, if possible.
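
To make the comparison concrete, here is a rough sketch of the check I would like to run. It assumes all the timestamps are available as ISO 8601 strings. The VDS last modified value is exactly what I cannot get today, so `vds_modified_at` is a hypothetical input, and the function and reflection names are made up for illustration.

```python
from datetime import datetime


def stale_reflections(vds_modified_at, pds_metadata_refreshed_at, reflection_refreshed_at):
    """Return the names of reflections refreshed before the newest upstream change.

    `reflection_refreshed_at` maps a reflection name to its last refresh time.
    All timestamps are ISO 8601 strings with an explicit UTC offset.
    """
    newest_change = max(
        datetime.fromisoformat(vds_modified_at),
        datetime.fromisoformat(pds_metadata_refreshed_at),
    )
    return [
        name
        for name, refreshed in reflection_refreshed_at.items()
        if datetime.fromisoformat(refreshed) < newest_change
    ]


# Hypothetical values, for illustration only.
stale = stale_reflections(
    vds_modified_at="2020-05-02T09:30:00+00:00",
    pds_metadata_refreshed_at="2020-05-01T23:00:00+00:00",
    reflection_refreshed_at={
        "orders_raw": "2020-05-02T10:00:00+00:00",
        "orders_agg": "2020-05-01T12:00:00+00:00",
    },
)
print(stale)
```

Anything returned here was refreshed before the latest upstream edit and would need another refresh before I trust its results.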

We are abandoning the reflect-everything approach and instead plan to reflect only the PDS and the final VDS. The PDS still appears to do some metadata detection that can misbehave if you change data types without changing column names, depending on the order in which you query the data set. While designing queries, it would help to refresh the metadata more completely through the VDS and PDS layers to make the editing process faster. Earlier, we may have been accomplishing this by forcing a reflection refresh at each juncture.

@datocrats-org

A VDS is just a view, so refreshing metadata applies only to a PDS. Maybe just refresh the reflection on the VDS?

Thanks
Bali