Failure mode for Dremio metadata refresh for asynchronously mutating data sets

Hello,

We are using Dremio to query S3 using AWS Glue. We have run into metadata inconsistencies that cause queries to fail. Our datasets mutate asynchronously, which leads to the inconsistency.

The exception we get on the query is “(java.io.FileNotFoundException) Version of file changed <path of file>”. Rerunning the query produces the same failure. After an explicit “ALTER PDS <dataset> REFRESH METADATA”, rerunning the query succeeds. We would like to know whether there are any existing features we can use to mitigate this. Questions:

  1. Can queries trigger a metadata refresh on a failure? If this is the default behavior, why do we see these failures repeating?

  2. I know the settings let us refresh metadata on a fixed interval, but this is not sufficient for our asynchronously mutating datasets. Is there anything else we can do other than making the refresh run more frequently?

@galevy

If you want to see data in Dremio instantly, you need to run the ALTER PDS command. Ideally, when a file is removed, Dremio should forget the file and retry on the fly.
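Until that retry happens on the Dremio side, one option is to do it from the client. Here is a minimal Python sketch, not a built-in Dremio feature: it runs the query, and if the error text contains "Version of file changed" it issues the ALTER PDS refresh and tries again. The names `query_with_refresh`, `run_sql`, and `max_attempts` are hypothetical; `run_sql` stands for whatever function your client (JDBC, ODBC, REST, Arrow Flight) uses to execute one SQL statement, and I am assuming it raises an exception whose message includes the Dremio error text.

```python
from typing import Any, Callable

# Marker text copied from the error message quoted earlier in this thread.
STALE_METADATA_MARKER = "Version of file changed"


def query_with_refresh(
    run_sql: Callable[[str], Any],  # hypothetical: executes one SQL statement
    query: str,
    dataset: str,  # dataset path, already quoted the way Dremio expects
    max_attempts: int = 3,
) -> Any:
    """Run `query`; on a stale-metadata error, refresh `dataset` and retry."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_sql(query)
        except Exception as exc:
            # Only handle the stale-file error; re-raise everything else.
            if STALE_METADATA_MARKER not in str(exc):
                raise
            last_error = exc
            # No point refreshing after the final attempt.
            if attempt < max_attempts:
                run_sql(f"ALTER PDS {dataset} REFRESH METADATA")
    raise last_error
```

You would call it as `query_with_refresh(run_sql, "SELECT ...", '"source"."folder"."table"')`. Keep `max_attempts` small, since a dataset that keeps changing under the query can fail again right after the refresh.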

Please send the query profile when you see this error.

The query needs 7 total runs to succeed:

6 failed attempts with the 7th succeeding:

Profile of a failed attempt that does metadata retrieval but still gets the version change error:

Profile of 7th attempt that finally succeeded:

We would like to understand the reason for the consecutive failures.

@galevy

As Dremio was scanning files under that folder, it encountered 7 different schemas and kept learning each new one.

Thanks
Bali

@galevy Do you still have this issue?