Unable to refresh metadata for the dataset (due to concurrent updates). Please retry

Hi,

We have many files generated into an S3 folder structure like the example below, sometimes in bursts of 100 over a few seconds and sometimes a couple every few minutes. We use partition column inference on the source.

└── day=10
    ├── hour=08
    │   ├── second=19
    │   │   └── data.parquet
    │   ├── second=20
    │   │   └── data.parquet
    │   ├── second=21
    │   │   └── data.parquet

After each file is written, we run a Metadata Refresh for it via the API, similar to the following:

ALTER TABLE "xyz"."table1" REFRESH METADATA FOR PARTITIONS ( "day"='10', "hour"='08', "second"='19');

On average we are seeing refreshes take 30-60 seconds per file, which seems surprising for a 2 MB Parquet file with ~400 columns and ~1,000 rows, but perhaps that is expected?

After two or more concurrent REFRESH METADATA calls (within a few seconds of each other), we start seeing jobs fail with the error:

Unable to refresh metadata for the dataset (due to concurrent updates). Please retry.

and sometimes from the same refresh calls we have seen other errors:

com.dremio.common.exceptions.UserRemoteException: CONCURRENT_MODIFICATION ERROR: Unable to refresh metadata for the dataset (due to concurrent updates). Please retry.

NessieReferenceConflictException: Values of existing and expected content for key 'dremio.internal./dremio-b14a93fa-6981-4b9e-998e-c036056fc230/metadata/6e859ad3-dc84-4555-af79-78025d289af4' are different.

We need the data to be available in Dremio as soon as possible, which is why we can't rely on scheduled refreshes.

Any help understanding how we can best refresh the metadata, other than just retrying until everything eventually succeeds?

Any help would be most appreciated. :pray:

Cheers,
Jonathan

Hi @jdwills ,

I’d be happy to look into this for you further. Would you mind private messaging me your organization ID, as well as any job IDs for these failed metadata refreshes?

Cindy

Thanks @cindy.la, PM'd you.