The first time it works fine. However, when I run the job again and PySpark writes another file into the folder to replace the old one (with a slightly different name), querying it again with Dremio gives me a "File Not Found" error.
I would have thought that Dremio would just query whatever files are in the containing directory, regardless of their names?
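For reference, the write pattern I'm describing looks roughly like this (a minimal sketch; the bucket and path are placeholders, and the DataFrame is just dummy data):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("overwrite-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# "overwrite" removes the previous output and writes new part-* files whose
# names differ from the last run, even though the containing folder is the same.
df.write.mode("overwrite").parquet("s3a://my-bucket/output/")  # placeholder path
```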
Dremio caches metadata for performance reasons, and by default it refreshes every hour. If you edit the source, you will need to expand Advanced Options (we are working on making source configuration easier to use), where you can adjust the metadata caching settings.
The smallest refresh interval you can set is 1 minute. Note that, depending on the size of your S3 data and the network between Dremio and S3, a metadata refresh can take some time.
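If you don't want to wait for the scheduled refresh, you can also trigger one manually with Dremio's metadata refresh statement. Below is a rough sketch of doing that from Python over ODBC; it assumes the Dremio ODBC driver is installed, a DSN named "Dremio" is configured, and the source/folder names are placeholders (the exact statement form can vary by Dremio version, e.g. `ALTER TABLE ... REFRESH METADATA` in newer releases):

```python
import pyodbc

# Sketch only: assumes a configured Dremio ODBC DSN named "Dremio";
# replace the credentials and dataset path with your own.
conn = pyodbc.connect("DSN=Dremio;UID=your_user;PWD=your_password", autocommit=True)
cursor = conn.cursor()

# Force an immediate metadata refresh for the folder-backed dataset
# instead of waiting for the scheduled interval to elapse.
cursor.execute('ALTER PDS "my-s3-source"."output" REFRESH METADATA')

conn.close()
```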