I created a datasource from an S3 bucket that contains JSON files (records or documents).
New JSON files arrive in the bucket continuously.
However, when I query the datasource with Dremio, the new records do not appear.
Is this pattern supported with Dremio + S3?
Or should the datasource be MongoDB or something else?
How frequently do new files arrive in the bucket? What is the metadata refresh policy for your source? The default is 1 hour, meaning the source's metadata is updated every hour; if a new file is created in the source within that hour, it will not be visible in Dremio until the next refresh.
So you need to set the metadata refresh interval according to your workload, i.e. how frequently new files are written and how frequently newly created files are queried from Dremio.
What is the reason for refreshing metadata 1000 times per second? If the entire source does not need to be refreshed, you can refresh only the datasets you need by using the command below.
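The command itself is not preserved in this thread. As a rough sketch only, and assuming it was Dremio's per-dataset `ALTER PDS ... REFRESH METADATA` statement (newer Dremio versions document the same operation as `ALTER TABLE ... REFRESH METADATA`), the statement could be built like this. The helper name `refresh_metadata_sql` and the path `s3source / my-bucket / events` are hypothetical and used only for illustration.

```python
def refresh_metadata_sql(dataset_path):
    """Build a per-dataset metadata refresh statement.

    Assumption: the target accepts Dremio's
    `ALTER PDS <path> REFRESH METADATA` syntax.
    `dataset_path` is a list of path parts, e.g. source, bucket, folder.
    """
    if not dataset_path:
        raise ValueError("dataset_path must contain at least one part")
    for part in dataset_path:
        # Keep the sketch simple: refuse names that would need escaping.
        if not part or '"' in part:
            raise ValueError(f"unsupported path part: {part!r}")
    # Quote each part so names with dashes or dots stay intact.
    quoted = ".".join(f'"{part}"' for part in dataset_path)
    return f"ALTER PDS {quoted} REFRESH METADATA"


if __name__ == "__main__":
    # Hypothetical dataset path; replace with your own source and folder.
    print(refresh_metadata_sql(["s3source", "my-bucket", "events"]))
```

The resulting statement can be run from the Dremio SQL editor or from any client that submits SQL to Dremio, for example right after a batch of new files has landed, so that only the affected dataset is refreshed instead of the whole source.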
Thanks for your reply.
Is there a timeline for it?
Dremio is a great product with the best performance.
It is unfortunate that Dremio cannot support streaming the way Delta Lake and Hudi do.
And streaming is becoming more and more important.