I have a Parquet Hive table that I have externalized via S3 in Dremio. Queries run fine, but when I change the data in S3, add more data, or reload the table, Dremio does not reflect the changes. I am not sure why. Is there a command I need to issue for Dremio to immediately reflect the changes?
Hello @rajupillai
The default metadata refresh interval is 1 hour. If you want to see changes more quickly, you need to set it to 1 minute (which is not recommended unless you are doing some quick testing).
Right click on source S3 >> Edit details >> Metadata >> Dataset Discovery >> Fetch every
1 Minute
But if I am using Dremio as a real-time analytics system, then even refreshing every minute won’t work. Is there a reason for this 1-minute minimum in Dremio? Can we issue a command at the end of the batch process to refresh the metadata, or have Dremio always read current data?
For example, Apache Drill (which I think is what Dremio was based on) doesn’t have this problem; it always accesses current data. I don’t think Presto has this issue either.
I am assuming the free edition. Is this a free-edition limitation?
Dremio caches metadata for performance reasons: collecting metadata can be time consuming, and metadata doesn’t change often. Refreshing metadata on every request would be expensive and slow down queries unnecessarily. The behavior is the same in the community and enterprise editions.
You can tell Dremio to forget a dataset’s metadata (https://docs.dremio.com/sql-reference/sql-commands/datasets.html#forgetting-physical-dataset-metadata) or trigger a full refresh using SQL commands.
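For example, at the end of a batch load you could issue something like the following from a Dremio SQL client. The dataset path `s3.mybucket.mytable` is a placeholder for your own source path, and the exact keyword (`ALTER TABLE` vs. `ALTER PDS`) depends on your Dremio version — check the docs linked above for your release:

```sql
-- Re-collect the dataset's metadata immediately after a batch load
-- (placeholder path; older Dremio releases use ALTER PDS instead of ALTER TABLE)
ALTER TABLE s3.mybucket.mytable REFRESH METADATA;

-- Or drop the cached metadata entirely so it is re-collected
-- the next time the dataset is queried
ALTER TABLE s3.mybucket.mytable FORGET METADATA;
```

Running the refresh as the final step of your batch pipeline avoids lowering the global fetch interval for every dataset.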