Hi,
I have a set of Parquet files that all have the same structure and are located in a specific directory.
Over time new parquet files are added to this directory.
I have configured Dremio to make a dataset out of all files in that directory. However, when new files arrive in that directory, the query results do not reflect the data in those new files.
However, when I go to the folder (in Datasets), choose Remove Format, and then reapply the same format, the query results reflect the new situation.
No reflections have been applied.
How can I avoid this, so that Dremio refreshes the results based on the content of those new files?
Dremio version : Build 4.0.1-201909191652190301-211720e
Kind regards !
Hi @geertschneider
You would need to wait for Dremio to learn that new files have been added; see the documentation link below:
Caching Metadata
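If waiting for the next scheduled refresh is not an option, an on-demand metadata refresh can be requested with SQL. The sketch below only builds the statement text. It assumes the `ALTER PDS ... REFRESH METADATA` syntax is available in your Dremio version (please check the SQL reference for your build), and the source and folder names `my_source` and `parquet_dir` are hypothetical placeholders.

```python
def quote_path(parts):
    """Join dataset path components, wrapping each in double quotes."""
    for part in parts:
        if '"' in part:
            # Escaping rules are not covered here, so refuse such names.
            raise ValueError(f"unsupported character in path component: {part!r}")
    return ".".join(f'"{part}"' for part in parts)


def refresh_metadata_sql(parts):
    """Build the statement that asks Dremio to re-read a physical dataset's metadata.

    Assumption: ALTER PDS <path> REFRESH METADATA is supported by the
    Dremio version in use.
    """
    return f"ALTER PDS {quote_path(parts)} REFRESH METADATA"


if __name__ == "__main__":
    # Hypothetical source and folder names; replace with your own.
    print(refresh_metadata_sql(["my_source", "parquet_dir"]))
```

The resulting statement can be run from the SQL editor, or from whatever job writes the new Parquet files, right after a file lands.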
Hi,
Thanks for your answer. I’ve changed the default setting (1 hour) to 5 minutes.
I’ll check if this indeed solves the issue.
I was not aware that extra files were considered metadata, as the docs mention databases, tables, indexes, etc.
Thanks already !
Hi,
Making the metadata refresh more frequent helps when querying the data source directly; the new data is indeed now part of the results.
However, there is a strange side effect: I’ve made a virtual dataset (VDS) on top of this dataset (folder).
If I do a count on both, I see different results:
VDS : 8945852
Initial Data source : 15894738
I guess they get out of sync due to different refresh rates?
@geertschneider
A VDS is just a view; check whether there is a filter on it or whether you ran a preview.
Also, 5-minute refreshes might be too expensive.
Hi,
No, the only thing I do is a data conversion (string > timestamp).
The strange thing is that later on the VDS returned a higher record count than the data source.
After a while things stabilized and both started to return the same results.
But that makes the output of the queries unpredictable.
@geertschneider
Please send me profiles of
VDS : 8945852
Initial Data source : 15894738
Share a Query Profile