How to refresh on demand

If the refresh policy is set to refresh every 1 day, how to do

  1. set time for when to refresh, if the ETL load finishes at 11PM, we want to setup Dremio reflection to refresh after 11PM, is there a SQL command or some other way to specify the time

  2. can we refresh on demand, usually the ETL that loads DW (source for Dremio once a day), in case we run it more than once, how can we kick off the refresh of Dremio reflections. Any command, back-end, SQL or some other API to achieve this?

  3. Is there a system view which can be queried to see the last refresh date time for each reflection?

image

2 Likes

Hi @pradeep,

set time for when to refresh, if the ETL load finishes at 11PM, we want to setup Dremio reflection to refresh after 11PM, is there a SQL command or some other way to specify the time

Currently, there is no way to specify a fixed time to start the refresh. The support of this feature is definitely in our roadmap.
One workaround is to enable the reflection at 11 PM on the first day so that it will run every day at starting around 11 PM or little later (depending on the workload of the system

can we refresh on demand, usually the ETL that loads DW (source for Dremio once a day), in case we run it more than once, how can we kick off the refresh of Dremio reflections. Any command, back-end, SQL or some other API to achieve this?

Currently, there is no way to trigger a refresh on demand.
This is feature is in our immediate road map.

For a “Full update”, you could achieve an “on-demand” refresh by disabling and enabling a reflection.
Since disabling a reflection deletes any previously cached data, this will not be useful for incremental update

Is there a system view which can be queried to see the last refresh date time for each reflection?

You can see the last refresh time of a reflection by clicking the jobs link of the specific reflection. The time of the last “CREATE TABLE” in the Job pages should give you the refresh date time.

Another way is to run a query “select * from sys.materializations where layout_id=<layout_id>”
The “create time” should give you the last refresh time.

You can obtain the layout_id for a reflection by clicking the jobs link of the specific reflection and copying the id shown in the job . See attached -

Please note that we have not finalized the sys tables and hence the table names and columns are subject to change.

Is there a query to get layout_id as well? in a production system, we need to know as of what time the data is current in Dremio, so that this can be published in someway to end users. There may be multiple data sources which may have 100s of jobs, there should be a way to know thru some query, the UI is also getting this information from some data source, the same can be put into a system view or query

You may be able to use the following system tables to obtain the layout id

sys.accelerations
sys.layouts

In this model, there may be one acceleration record for a dataset (identified by the table column in sys.accelerations).
Each acceleration can have multiple layouts (sys.layouts). A layout correspond to a reflection on a dataset.
The layouts are differentiated as raw or aggregate and also contain the various properties like field names, sort fields etc.

So you may be able to obtain the layout_id by querying the sys.layouts with the acceleration_id, type, display etc.

We also have a way to name a reflection, but currently we do not show the name of the reflection as a column in sys.layouts . This would have made it much easier to obtain the layout_id from sys.layouts using the name. We will fix this soon.

Thanks, is there a data model available for sys tables? or some document which tells what is stored in each sys tables and if this is related to another table what is the relationship)

I can see following tables is sys, how to describe these tables in SQL GUI or some other CLI or using tool which can connect to dremio thru JDBC driver

image

I was able to get structure of sys tables, if there is a model or document it will be helpful

CREATE TABLE sys.fragments
(
hostname VARCHAR(65536)
, queryId VARCHAR(65536)
, majorFragmentId INTEGER
, minorFragmentId INTEGER
, memoryUsage BIGINT
, rowsProcessed BIGINT
, startTime TIMESTAMP
, blocks VARCHAR(65536)
, thread INTEGER
, schedulerInfo VARCHAR(65536)
, sleeping BIGINT
, blocked BIGINT
)
;

Refresh source metadata cache…
Any update on this feature? Did it happen in a recent release?

Hi,

In 2.0.0 you can request refreshes of a dataset’s reflections using https://docs.dremio.com/rest-api/catalog/endpoints.html#refreshing-a-catalog-entity.

@doron, does this API respect the incremental refresh property? I.e. will it refresh incrementally if refresh policy is incremental?

What we ideally need is support of the following schedule:

  • Refresh incrementally at specific times or intervals during the day
  • Full refresh (reconciliation) at a specific time of a day/week

The latter is needed to account for amendments/corrections in older data which won’t be picked up by the incremental refresh.

Ideally I’d like to be able to switch off automated refresh in the GUI completely and trigger either incremental or full refresh via API.

Yes the API will respect the refresh type. Its treated just like a scheduled refresh.

If a incremental refresh detects that the dataset it is on has changed (schema has changed, or for virtual datasets the SQL has changed), it will do a full refresh. To force a full refresh, you can disable and enable the reflection as that will force a rebuild. I’ve opened an internal ticket to allow forcing it as that sounds like a good thing to have.

1 Like

Hi,
Our use case is we get data at different times and as soon as we get new data we have to schedule a refresh (Full or incremental) how do we do that.Because its not a structural change its a data change.

Thanks & Regards
A.Dinesh

https://docs.dremio.com/rest-api/catalog/endpoints.html#refreshing-a-catalog-entity --> this link is not accessible

The new location for the documentation is https://docs.dremio.com/rest-api/catalog/post-catalog-id-refresh.html.

@doron I’m trying this API enpoint to refresh a VDS, but I’m receiving 204 response code and nothing happens. Does this API works for VDS or are PDS only?

It only works for pds and it’s an async call - we currently don’t notify when the refresh finishes.

1 Like

Any other workaround for this?

Hi Doron,
how to get my VDS ID ? I only know name of my VDS,is there any system table to fetch this ID?

Thanks
Dinesh

@Dinesh

You can get the id via the path using this API.

@doron, there is any documentation about the sys tables? With the relationship and what rules for each table.

Thanks!

@pbofonseca Which system table are you interested in? We are still working on improving them so they are usually not documented at this time.