Issues with a view (VDS) defined on an auto-refreshing/expiring PDS

  • Our data consists of Parquet files stored in S3.
  • We refresh the dataset metadata every 3 hours and expire it every 8 hours.
  • Direct queries against the PDS seem to work as expected, with auto formatting enabled.
  • The Parquet files contain schemas from different versions (generally forward compatible).

However, we recently started building views (VDS) on top of our PDS.
These views seem to break every time the dataset metadata refreshes, failing with schema-learning errors:

Error while expanding view...
FAILED, Exception com.dremio.common.exceptions.UserRemoteException: SCHEMA_CHANGE ERROR: New schema found. Please reattempt the query. Multiple attempts may be necessary to fully learn the schema.
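Since the error message itself asks for the query to be reattempted, one client-side stop-gap is to retry a bounded number of times when a SCHEMA_CHANGE error comes back. This is a minimal sketch, assuming a hypothetical `run_query` callable that submits the query and raises an exception whose text contains Dremio's error message; `run_query` and `query_with_schema_retry` are illustrative names, not part of any Dremio client API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Marker text taken from the SCHEMA_CHANGE error message quoted above.
SCHEMA_CHANGE_MARKER = "SCHEMA_CHANGE ERROR"


def query_with_schema_retry(
    run_query: Callable[[], T],  # hypothetical: submits the query, raises on failure
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
) -> T:
    """Call run_query, retrying only while it fails with a schema-learning error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return run_query()
        except Exception as exc:
            # Re-raise anything that is not a schema-learning failure,
            # and give up once the attempt budget is spent.
            if SCHEMA_CHANGE_MARKER not in str(exc) or attempt >= max_attempts:
                raise
            time.sleep(delay_seconds)
```

This only works around the error on the client side; it does not address why the learned schema is incomplete after each refresh.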

During the refresh, we get a handful of these warnings:

[metadata-refresh-modifiable-scheduler-17] WARN  c.d.e.s.p.ParquetFormatDatasetAccessor - Cannot convert parquet schema to dremio schema using parquet-arrow schema converter, fall back to generate schema from first parquet file
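The warning says Dremio falls back to generating the schema from the first Parquet file. If that file happens to be written with an older schema version, fields added in later versions are absent from the learned schema, and each of them could later surface as a "new schema found" error. A minimal sketch of that effect, modelling each file's schema as a hypothetical field-name to type-name mapping (the files and fields below are made up for illustration; this is not how Dremio represents schemas internally):

```python
from typing import Dict, List

Schema = Dict[str, str]  # field name -> type name (illustrative model only)


def union_schema(schemas: List[Schema]) -> Schema:
    """Merge schemas, keeping the first type seen for each field."""
    merged: Schema = {}
    for schema in schemas:
        for field, ftype in schema.items():
            merged.setdefault(field, ftype)
    return merged


def fields_missing_from_first(schemas: List[Schema]) -> List[str]:
    """Fields present in some file but absent from the first file's schema."""
    if not schemas:
        return []
    first = schemas[0]
    return sorted(f for f in union_schema(schemas) if f not in first)


# Hypothetical files written by successive versions of a proto message.
files = [
    {"id": "int64", "name": "string"},                     # v1
    {"id": "int64", "name": "string", "email": "string"},  # v2 adds email
    {"id": "int64", "name": "string", "email": "string",
     "created_at": "int64"},                               # v3 adds created_at
]

print(fields_missing_from_first(files))  # ['created_at', 'email']
```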

The Parquet files are created using the Parquet proto writer.

Any idea why the schema isn't being learned properly on the first attempt?

@sheinbergon Do you have heterogeneous schemas (rather than schemas from different versions)? Which version of Dremio is this?
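One rough way to tell the two cases apart across a set of files, assuming "schemas from different versions" means fields are only added or dropped while existing fields keep their types, is to look for any field name that appears with more than one type. A minimal sketch, again modelling each file's schema as a hypothetical field-name to type-name mapping; the `classify` labels are illustrative and not Dremio terminology:

```python
from typing import Dict, List, Set

Schema = Dict[str, str]  # field name -> type name (illustrative model only)


def conflicting_fields(schemas: List[Schema]) -> Dict[str, Set[str]]:
    """Return fields that appear with more than one type across files."""
    seen: Dict[str, Set[str]] = {}
    for schema in schemas:
        for field, ftype in schema.items():
            seen.setdefault(field, set()).add(ftype)
    return {f: types for f, types in seen.items() if len(types) > 1}


def classify(schemas: List[Schema]) -> str:
    """'heterogeneous' if any field changes type, else 'versioned'."""
    return "heterogeneous" if conflicting_fields(schemas) else "versioned"


additive = [{"id": "int64"}, {"id": "int64", "email": "string"}]
conflicting = [{"id": "int64"}, {"id": "string"}]

print(classify(additive))     # versioned
print(classify(conflicting))  # heterogeneous
```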