Unable to refresh metadata for the dataset (due to concurrent updates). Please retry

Hi,

We have many files generated into an S3 folder structure like the example below, sometimes in bursts of 100 over a few seconds and sometimes a couple every few minutes. We use partition column inference on the source.

└── day=10
    ├── hour=08
    │   ├── second=19
    │   │   └── data.parquet
    │   ├── second=20
    │   │   └── data.parquet
    │   ├── second=21
    │   │   └── data.parquet

After each file is written, we run a Metadata Refresh for it via the API, similar to the following:

ALTER TABLE "xyz"."table1" REFRESH METADATA FOR PARTITIONS ( "day"='10', "hour"='08', "second"='19');

On average we are seeing refreshes take 30-60 seconds per file, which seems surprising for a 2 MB Parquet file with ~400 columns and ~1,000 rows, but perhaps that is expected?

After two or more concurrent REFRESH METADATA calls (within a few seconds of each other), we start seeing jobs fail with the error:

Unable to refresh metadata for the dataset (due to concurrent updates). Please retry.

and sometimes from the same refresh calls we have seen other errors:

com.dremio.common.exceptions.UserRemoteException: CONCURRENT_MODIFICATION ERROR: Unable to refresh metadata for the dataset (due to concurrent updates). Please retry.

NessieReferenceConflictException: Values of existing and expected content for key 'dremio.internal./dremio-b14a93fa-6981-4b9e-998e-c036056fc230/metadata/6e859ad3-dc84-4555-af79-78025d289af4' are different.

We need the data to be available in Dremio as soon as possible, which is why we can't rely on scheduled refreshes.

Any help understanding how we can best refresh the metadata, other than just retrying until everything eventually succeeds?

Any help would be most appreciated. :pray:

Cheers,
Jonathan

Hi @jdwills ,

I’d be happy to look into this for you further. Would you mind private messaging me your organization ID, as well as any job IDs for these failed metadata refreshes?

Cindy

Thanks @cindy.la, PM'd you.