We are using dremio to query S3 using AWS glue. We have encountered issues with metadata inconsistencies leading to failed queries. Our datasets are asynchronously mutating which leads to the inconsistency.
The exception on the query we get is “(java.io.FileNotFoundException) Version of file changed <path of file>”. When rerunning the query, the same failure is present. After doing the explicit “ALTER PDS <dataset> REFRESH METADATA”, rerunning the query succeeds. We would like to know if there is any existing features we can use to mitigate this. Questions:
Can queries trigger a metadata refresh on a failure? If this is the default case then why do we see these failures repeating?
I know in the settings we can set metadata to refresh after a fixed time but this is not sufficient for our asynchronously mutating datasets. Is there anything else we can do other than setting the refresh to execute more frequently?