Folder caching/refreshes

(running version 2.1 at the moment)

I have a folder in a NAS source with multiple TSV files in it. I configured the folder while data was still being written into it, and found that queries only ran against the files that were there when the folder was configured. Subsequent runs of a simple “select count(*)” query did not show an increased row count, despite new files being written into the folder.

Am I missing something simple here? Thanks.

Did you have a look at this page from the docs?

https://docs.dremio.com/advanced-administration/metadata-caching.html

That page specifically says: “Dataset Discovery option is not available for file-system sources such as HDFS, MapR-FS or NAS”. This is a NAS source.

The first setting on that page (Dataset Discovery) only pertains to the discovery of new tables.

The second setting is the one you’re interested in. You essentially have three options:

  1. Dremio refreshes its catalog on a schedule you define.
  2. You explicitly refresh the metadata via a REST API call.
  3. Dremio refreshes its catalog at query time.

The last option will make query planning more expensive, but you can try it out to see how much it affects your performance.
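
For the second option, here is a rough sketch of what an explicit refresh could look like from a script. It does not use a dedicated refresh endpoint; instead it submits an `ALTER PDS ... REFRESH METADATA` statement through the SQL endpoint. Treat the details as assumptions to check against the docs for your version: the `/api/v3/sql` path, the `_dremio` token prefix in the `Authorization` header, and whether either is available on 2.1. The path `nas.landing.tsv_drop`, the host, and the token are made-up placeholders.

```python
import json
import urllib.request


def quote_path(parts):
    """Join path components into a double-quoted, dot-separated identifier."""
    for part in parts:
        if '"' in part:
            # Escaping rules are not covered here, so refuse rather than guess.
            raise ValueError("double quotes in path components are not handled")
    return ".".join('"' + part + '"' for part in parts)


def refresh_sql(parts):
    """Build the statement asking Dremio to re-read a dataset's metadata."""
    return "ALTER PDS " + quote_path(parts) + " REFRESH METADATA"


def build_sql_request(base_url, token, sql):
    """Prepare (but do not send) a POST carrying the SQL statement.

    Assumption: the SQL endpoint lives at /api/v3/sql and expects a token
    prefixed with "_dremio"; verify both for your Dremio version.
    """
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v3/sql",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "_dremio" + token,
        },
        method="POST",
    )


def submit(request):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical folder path, host and token; replace with your own.
    statement = refresh_sql(["nas", "landing", "tsv_drop"])
    request = build_sql_request("http://localhost:9047", "YOUR_TOKEN", statement)
    print(statement)
    # submit(request)  # uncomment once the values above point at a real server
```

You could run something like this from cron or at the end of whatever job writes the files, so the refresh happens right after new data lands rather than on a fixed schedule.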