S3: streaming JSON files in a bucket as new documents / records


I created a datasource from an S3 bucket that contains JSON files (records or documents).
New JSON files are continuously arriving in the bucket.
However, when I query the datasource with Dremio, the new records do not appear.
Is this pattern supported with Dremio + S3?
Or should the datasource be MongoDB or something else?

Hi @mlb

How frequently are new files arriving in the bucket? What is the metadata refresh policy for your source? The default is 1 hour, meaning the metadata for the source is updated every hour; if a new file is created in the source within that hour, it will not be visible in Dremio until the next refresh.

So you need to set the metadata refresh interval according to your workload, i.e. how frequently new files are written and how soon newly created files need to be queried from Dremio.
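
To make the trade-off concrete, here is a rough Python sketch of the arithmetic. It is not Dremio code: the function names, the optional refresh duration, and the 60-second floor are assumptions made up for illustration, so check the actual minimum interval your Dremio version allows.

```python
def worst_case_lag_seconds(refresh_interval_s: float,
                           refresh_duration_s: float = 0.0) -> float:
    """Upper bound on how long a new file can stay invisible to queries.

    A file that lands just after a refresh starts has to wait one full
    interval for the next refresh, plus the time that refresh takes.
    """
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    if refresh_duration_s < 0:
        raise ValueError("refresh duration cannot be negative")
    return refresh_interval_s + refresh_duration_s


def pick_refresh_interval_seconds(max_acceptable_lag_s: float,
                                  refresh_duration_s: float = 0.0,
                                  floor_s: float = 60.0) -> float:
    """Largest interval that keeps the worst-case lag within the target.

    floor_s is an assumed lower limit on the interval (hypothetical value);
    if the target lag is below it, the floor wins and the target is missed.
    """
    interval = max_acceptable_lag_s - refresh_duration_s
    return max(interval, floor_s)


if __name__ == "__main__":
    # Default policy from this thread: metadata refreshed every hour.
    print(worst_case_lag_seconds(3600))
    # Files should be queryable within 5 minutes; a refresh takes ~30 s.
    print(pick_refresh_interval_seconds(300, refresh_duration_s=30))
```

A shorter interval means fresher results but more frequent listing of the bucket, so pick the largest interval your queries can tolerate.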


Thanks @Venugopal_Menda.
I figured this out and reduced it to 1 minute. I am considering MongoDB as a real-time alternative datasource.