I have a Parquet Hive table that I have externalized via S3 in Dremio. Queries run fine, but when I change the data in S3, add more data, or reload the table, Dremio does not reflect the changes. I am not sure why. Is there a command I need to issue for Dremio to immediately reflect the changes?
Hello @rajupillai
The default metadata refresh interval is 1 hour. If you want to see changes more quickly, you need to set it to 1 minute (which is not recommended unless you are doing some quick testing).
Right click on source S3 >> Edit details >> Metadata >> Dataset Discovery >> Fetch every
1 Minute
But if I am using Dremio as a real-time analytics system, then even refreshing every minute won’t work. Is there a reason for this 1-minute minimum in Dremio? Can we issue a command at the end of the batch process to refresh the metadata, or have Dremio always read current data?
For example, Apache Drill (which I think is what Dremio was based on) doesn’t have this problem; it always accesses current data. I don’t think Presto has this issue either.
I am assuming the free edition. Is this a free-edition limitation?
Dremio caches metadata for performance reasons: collecting metadata can be time consuming, and metadata doesn’t change often. Refreshing metadata on every request would be expensive and slow down queries unnecessarily. The behavior is the same in the community and enterprise editions.
You can tell Dremio to forget a dataset’s metadata (https://docs.dremio.com/sql-reference/sql-commands/datasets.html#forgetting-physical-dataset-metadata) or trigger a full refresh using SQL commands.
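For example, at the end of a batch load you could issue something like the following from a Dremio SQL client. The dataset path `s3.mybucket.mytable` is a placeholder for your own source path, and the exact keyword (`ALTER TABLE` vs. `ALTER PDS`) depends on your Dremio version — check the docs linked above for your release:

```sql
-- Re-collect the dataset's metadata immediately after a batch load
-- (placeholder path; older Dremio releases use ALTER PDS instead of ALTER TABLE)
ALTER TABLE s3.mybucket.mytable REFRESH METADATA;

-- Or drop the cached metadata entirely so it is re-collected
-- the next time the dataset is queried
ALTER TABLE s3.mybucket.mytable FORGET METADATA;
```

Running the refresh as the final step of your batch pipeline avoids lowering the global fetch interval for every dataset.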