Looking at using dremio to query S3. I have a folder which pyspark writes files into. The path of the folder is always constant - so I set dremio up to query it as a “directory” as per https://docs.dremio.com/data-sources/files-and-directories.html.
The first time, it works fine. However, when I run the job again, pyspark writes another file into the folder to replace the old one (with a slightly different name), and when I go to query it again with Dremio I get a "File Not Found" error.
I would have thought that dremio would just query whatever files are in the containing directory, whatever the name?
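To illustrate what is likely happening (this is a sketch of the general pattern, not Dremio's internals): Spark's overwrite mode deletes the old part files and writes new ones with freshly generated names, so any cached reference to the old file name goes stale. A minimal local-filesystem simulation, with no Spark or S3 involved:

```python
import tempfile
import uuid
from pathlib import Path

def overwrite_output(folder: Path) -> Path:
    """Mimic Spark's 'overwrite' save mode: delete the existing part
    files, then write a new part file with a generated name."""
    for old in folder.glob("part-*"):
        old.unlink()
    new_file = folder / f"part-{uuid.uuid4().hex}.parquet"
    new_file.write_text("new data")
    return new_file

folder = Path(tempfile.mkdtemp())
first = overwrite_output(folder)    # first job run
cached = first                      # a query engine caches this exact path

second = overwrite_output(folder)   # second run replaces the file

print(cached.exists())   # False - the cached path is gone -> "File Not Found"
print(second.exists())   # True  - re-listing the folder would find the new file
```

Re-listing the directory (i.e. refreshing metadata) finds the new file, which matches the behaviour described in the replies below.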
Dremio caches metadata for performance reasons, and the default is to refresh it every hour. If you edit the source, you will need to expand Advanced Options (we are working on making the source configuration easier to use), where you can configure the metadata caching options.
We usually recommend adding new files over time.
Thank you - it looks like the smallest refresh interval is one hour? Can it be faster?
You can go down to 1 minute as the smallest refresh interval - note that, depending on the size of your S3 data and the network between Dremio and S3, a metadata refresh can take some time.
Hi again, thanks for your help - I'm having trouble finding the advanced options. Can you post some screenshots showing how to get there?
Sure, you need to click on Show Advanced Options, shown here with an arrow:
Hey - thanks so much, that worked a treat!