If the refresh policy is set to refresh every 1 day, how to do
set time for when to refresh, if the ETL load finishes at 11PM, we want to setup Dremio reflection to refresh after 11PM, is there a SQL command or some other way to specify the time
can we refresh on demand, usually the ETL that loads DW (source for Dremio once a day), in case we run it more than once, how can we kick off the refresh of Dremio reflections. Any command, back-end, SQL or some other API to achieve this?
Is there a system view which can be queried to see the last refresh date time for each reflection?
set time for when to refresh, if the ETL load finishes at 11PM, we want to setup Dremio reflection to refresh after 11PM, is there a SQL command or some other way to specify the time
Currently, there is no way to specify a fixed time to start the refresh. The support of this feature is definitely in our roadmap.
One workaround is to enable the reflection at 11 PM on the first day so that it will run every day at starting around 11 PM or little later (depending on the workload of the system
can we refresh on demand, usually the ETL that loads DW (source for Dremio once a day), in case we run it more than once, how can we kick off the refresh of Dremio reflections. Any command, back-end, SQL or some other API to achieve this?
Currently, there is no way to trigger a refresh on demand.
This is feature is in our immediate road map.
For a “Full update”, you could achieve an “on-demand” refresh by disabling and enabling a reflection.
Since disabling a reflection deletes any previously cached data, this will not be useful for incremental update
Is there a system view which can be queried to see the last refresh date time for each reflection?
You can see the last refresh time of a reflection by clicking the jobs link of the specific reflection. The time of the last “CREATE TABLE” in the Job pages should give you the refresh date time.
Another way is to run a query “select * from sys.materializations where layout_id=<layout_id>”
The “create time” should give you the last refresh time.
You can obtain the layout_id for a reflection by clicking the jobs link of the specific reflection and copying the id shown in the job . See attached -
Is there a query to get layout_id as well? in a production system, we need to know as of what time the data is current in Dremio, so that this can be published in someway to end users. There may be multiple data sources which may have 100s of jobs, there should be a way to know thru some query, the UI is also getting this information from some data source, the same can be put into a system view or query
You may be able to use the following system tables to obtain the layout id
sys.accelerations
sys.layouts
In this model, there may be one acceleration record for a dataset (identified by the table column in sys.accelerations).
Each acceleration can have multiple layouts (sys.layouts). A layout correspond to a reflection on a dataset.
The layouts are differentiated as raw or aggregate and also contain the various properties like field names, sort fields etc.
So you may be able to obtain the layout_id by querying the sys.layouts with the acceleration_id, type, display etc.
We also have a way to name a reflection, but currently we do not show the name of the reflection as a column in sys.layouts . This would have made it much easier to obtain the layout_id from sys.layouts using the name. We will fix this soon.
Thanks, is there a data model available for sys tables? or some document which tells what is stored in each sys tables and if this is related to another table what is the relationship)
I can see following tables is sys, how to describe these tables in SQL GUI or some other CLI or using tool which can connect to dremio thru JDBC driver
Yes the API will respect the refresh type. Its treated just like a scheduled refresh.
If a incremental refresh detects that the dataset it is on has changed (schema has changed, or for virtual datasets the SQL has changed), it will do a full refresh. To force a full refresh, you can disable and enable the reflection as that will force a rebuild. I’ve opened an internal ticket to allow forcing it as that sounds like a good thing to have.
Hi,
Our use case is we get data at different times and as soon as we get new data we have to schedule a refresh (Full or incremental) how do we do that.Because its not a structural change its a data change.
@doron I’m trying this API enpoint to refresh a VDS, but I’m receiving 204 response code and nothing happens. Does this API works for VDS or are PDS only?