Directory as Dataset > new files seem not to update query results

Hi,

I have a set of Parquet files that all have the same structure and are stored in a specific directory.
Over time new parquet files are added to this directory.

I have configured Dremio to make a dataset out of all files in that directory. However, when new files arrive in that directory, the query results do not reflect the data in those new files.
When I go to the folder (in datasets), choose remove format, and then reapply the same format, the query results do reflect the new situation.
No reflections have been applied.

How can I avoid this, so that Dremio refetches the results based on the content of those new files?

Dremio version : Build 4.0.1-201909191652190301-211720e

Kind regards !

Hi @geertschneider

You would need to wait for Dremio to learn that new files have been added; see the documentation link below.

Caching Metadata
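
If waiting for the scheduled refresh is not an option, one possibility is to trigger a metadata refresh yourself right after new files land. Below is a minimal Python sketch of that idea. It assumes the `ALTER PDS <path> REFRESH METADATA` statement from the Dremio 4.x SQL reference and, for the optional submission step, the `/api/v3/sql` REST endpoint with an `Authorization: _dremio<token>` header. The source name, folder name, base URL, and token are hypothetical placeholders, so please check everything against the docs for your version.

```python
import json
import urllib.request
from typing import Sequence


def quote_identifier(part: str) -> str:
    """Double-quote one path component, doubling any embedded double quotes."""
    return '"' + part.replace('"', '""') + '"'


def refresh_metadata_sql(path_parts: Sequence[str]) -> str:
    """Build an ALTER PDS ... REFRESH METADATA statement for a dataset path.

    Assumes the Dremio 4.x syntax; newer releases may use a different keyword.
    """
    if not path_parts:
        raise ValueError("dataset path must have at least one component")
    dotted = ".".join(quote_identifier(p) for p in path_parts)
    return f"ALTER PDS {dotted} REFRESH METADATA"


def submit_sql(base_url: str, token: str, sql: str) -> dict:
    """Submit a SQL statement over REST and return the parsed JSON response.

    Assumption: the endpoint path and the auth header format match your
    Dremio version. base_url and token are supplied by the caller.
    """
    request = urllib.request.Request(
        f"{base_url}/api/v3/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"_dremio{token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical source and folder names, for illustration only.
statement = refresh_metadata_sql(["my_source", "parquet_dir"])
print(statement)
```

The printed statement can also be pasted into the SQL editor as is. Calling `submit_sql` from the job that writes the Parquet files would refresh the metadata only when new data actually arrives, instead of polling on a short interval.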

Hi,

thanks for your answer - I’ve changed the metadata refresh setting from the default (1 hour) to 5 minutes.
I’ll check whether this indeed solves the issue.
I was not aware that extra files were considered metadata, as the docs only mention databases, tables, indexes, etc.

Thanks already !

Hi,

making the metadata refresh more frequent helps when querying the data source directly. So indeed the new data is now part of the results.
However, there is a strange side effect - I’ve made a virtual dataset (VDS) on top of this dataset (folder).
If I do a count on both - I see different results:
VDS : 8945852
Initial Data source : 15894738

I guess they get out of date due to different refresh rates ?

@geertschneider

A VDS is just a view. Check whether it contains a filter, or whether you ran a preview instead of a full run.

Also, 5-minute refreshes might be too expensive.

Hi,

no, the only thing I do is a data conversion (string > timestamp).
The strange thing is that later on the VDS had a higher record count than the data source.

After a while things stabilized and both started returning the same results.
But that makes the output of the queries unpredictable.

@geertschneider

Please send me the query profiles for both counts:

VDS : 8945852
Initial Data source : 15894738

Share a Query Profile