Query failures due to inconsistent metadata and S3 Versioning Support


While running queries on mutating S3 data, we encounter FileNotFound exceptions, which we have traced to a mismatch between the timestamp in Dremio’s metadata and the S3 object’s current timestamp.

The S3AsyncByteReader does not seem to have the same behavior, btw.

We have enabled asynchronous access when possible, so we are wondering why the sync client is used at all, and why the sync and async readers behave differently.

Also, using S3 versioned objects (and storing versions in Dremio metadata instead of timestamps) would be a way to guarantee metadata is never inconsistent (but can be stale) and queries don’t fail. Does Dremio support using version metadata?
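
To make the idea concrete, here is a minimal sketch of the difference between a timestamp check and a version-pinned read. It is an in-memory simulation, not Dremio code and not the S3 API: `VersionedStore`, `read_if_unmodified`, `read_version` and `StaleMetadataError` are hypothetical names made up for illustration. (On real S3 with versioning enabled, GetObject accepts a VersionId parameter, which is what the pinned read stands in for.)

```python
from dataclasses import dataclass, field


class StaleMetadataError(Exception):
    """Raised when the object changed after metadata was collected."""


@dataclass
class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned bucket."""
    # key -> list of (version_id, mtime, data); the last entry is current
    objects: dict = field(default_factory=dict)
    clock: int = 0

    def put(self, key, data):
        # Every write creates a new version and keeps the old ones.
        self.clock += 1
        versions = self.objects.setdefault(key, [])
        version_id = f"v{len(versions) + 1}"
        versions.append((version_id, self.clock, data))
        return version_id, self.clock

    def read_if_unmodified(self, key, expected_mtime):
        # Timestamp-based read: fails once the object has been replaced.
        _vid, mtime, data = self.objects[key][-1]
        if mtime != expected_mtime:
            raise StaleMetadataError(f"{key} changed since metadata refresh")
        return data

    def read_version(self, key, version_id):
        # Version-pinned read: returns exactly what the metadata described.
        for vid, _mtime, data in self.objects[key]:
            if vid == version_id:
                return data
        raise KeyError(f"{key} has no version {version_id}")


def demo():
    store = VersionedStore()
    key = "t/part-0.parquet"
    # Metadata is collected here: remember version id and timestamp.
    version_id, mtime = store.put(key, b"old rows")
    # The object is replaced before the query runs.
    store.put(key, b"new rows")
    try:
        store.read_if_unmodified(key, mtime)
        timestamp_read = "ok"
    except StaleMetadataError:
        timestamp_read = "failed"
    pinned = store.read_version(key, version_id)
    return timestamp_read, pinned
```

In this sketch the timestamp-based read fails after the replace, while the version-pinned read still returns the old (stale but consistent) contents.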


It looks like the file was deleted after Dremio had collected metadata for it. We have an open internal ticket to address this. Dremio currently handles this case when files are deleted on Azure storage: it collects metadata on the fly and retries, so the profile will show 2 attempts. It does not do this on S3.
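
As a rough illustration of the "collect metadata on the fly, 2 attempts" behavior described above, here is a minimal sketch of a refresh-and-retry loop. It is not Dremio's implementation; `run_with_metadata_refresh`, `FileChangedError` and the callables passed in are hypothetical names used only for this example.

```python
class FileChangedError(Exception):
    """Hypothetical error: a file no longer matches the cached metadata."""


def run_with_metadata_refresh(run_query, refresh_metadata, cached_metadata,
                              max_attempts=2):
    """Run a query; on a metadata mismatch, refresh metadata and retry.

    Returns (result, attempts_used).
    """
    metadata = cached_metadata
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query(metadata), attempt
        except FileChangedError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Re-collect metadata on the fly before the next attempt.
            metadata = refresh_metadata()


def demo_retry():
    # Current state of storage vs. what was cached earlier (assumed data).
    current = {"part-0.parquet": 2}
    cached = {"part-0.parquet": 1}

    def run_query(metadata):
        if metadata != current:
            raise FileChangedError("metadata is out of date")
        return "query ok"

    def refresh_metadata():
        return dict(current)

    return run_with_metadata_refresh(run_query, refresh_metadata, cached)
```

With stale cached metadata, the first attempt fails, the metadata is refreshed, and the second attempt succeeds, which matches a profile showing 2 attempts.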

Thanks for explaining.
Our files (S3 objects) are never deleted, just modified (replaced).


If file names change, you will run into this issue.