Unable to do METADATA REFRESH FOR PARTITIONS

rohitshetty · October 18, 2023, 4:12pm

I am running Dremio Cloud (not Dremio Software) with S3 as a source.
I formatted a folder of Parquet files that looks like this:

└── year=2023
    ├── month=08
    │   ├── day=19
    │   │   └── userdata.parquet
    │   ├── day=20
    │   │   └── userdata.parquet
    │   └── day=21
    │       └── userdata.parquet
    └── month=09 (Added later after first ingestion - I am trying to refresh metadata for this)
        └── day=15
            └── userdata.parquet

I then queried the table and saw all my data correctly.
After that, I added new data to month=09/day=15.

Running the following statement gives me the error shown after it.

ALTER TABLE "xyz"."standalone-test_1" REFRESH METADATA FOR PARTITIONS (
    dir0 = 'year=2023',
    dir1 = 'month=09',
    dir2 = 'day=15',
    "year" = '2023',
    "month" = '09',
    "day" = '15'
);

“Input error. Expected partition dir0”

Another screenshot, which I got by accidentally misnaming one of the partitions and cannot attach due to new-signup restrictions, shows the list of partitions. I can confirm that I have included all of them, but I am still unable to refresh the metadata for the partition.

REFRESH METADATA without “FOR PARTITIONS” does work as intended, though.

What am I doing wrong? Please help.

I am attaching the folder for reference.
test_parquet.zip (154.9 KB)

lenoyjacob · October 18, 2023, 7:55pm

@rohitshetty, Welcome to Dremio Community!

Try the following command. It works for me on your sample dataset:

ALTER TABLE path.to.dataset REFRESH METADATA FOR PARTITIONS (
    dir0 = 'year=2023', 
    dir1 = 'month=08', 
    dir2 = 'day=21'
);

jdwills · October 18, 2023, 9:08pm

Hi @lenoyjacob,

One thing I think @rohitshetty forgot to mention (we work together) is that the source requires “Enable partition column inference” to be set, which creates partitions for the inferred columns too. If we do not include those partitions in the REFRESH command we get the following error:

When we do include them we get the original error “Input error. Expected partition dir0”.

Do you think you can run your test again with the partition inference enabled?

All the help is very much appreciated.

Cheers,
Jonathan

lenoyjacob · October 19, 2023, 6:26am

Yup, looks like a bug. I’ve raised a ticket internally. As a workaround, disable partition inference and use the metadata refresh query I posted above.

For regular queries, you can create a view that abstracts away the “year=”, “month=” and “day=” prefixes in the dir0, dir1 and dir2 values, using something like split_part().
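
A minimal sketch of such a view, assuming partition inference is disabled so that dir0, dir1 and dir2 hold values like 'year=2023'. The view path "my_space"."userdata_by_date" is hypothetical, and the aliases assume the Parquet files do not already contain columns named year, month or day:

```sql
-- Hypothetical view path; replace it with a space or folder you can write to.
CREATE OR REPLACE VIEW "my_space"."userdata_by_date" AS
SELECT
    t.*,
    -- 'year=2023' -> '2023' (the text after the '=' sign)
    SPLIT_PART(t.dir0, '=', 2) AS "year",
    SPLIT_PART(t.dir1, '=', 2) AS "month",
    SPLIT_PART(t.dir2, '=', 2) AS "day"
FROM "xyz"."standalone-test_1" AS t;
```

The derived columns come back as strings; cast them if numeric values are needed.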

rohitshetty · October 19, 2023, 10:42am

Thank you @lenoyjacob, and thank you for raising the ticket too.

I disabled partition inference, and used metadata refresh just as you did, and can confirm it works.

I am wondering what the implications of not using partition inference are. Would there be performance penalties?

lenoyjacob · October 19, 2023, 3:41pm

@rohitshetty @jdwills Quick update. Turns out this has been fixed in 24.2.3 and 23.2.3. And should be fixed in the next release cycle for Dremio Cloud.

There shouldn’t be a performance implication using the workaround. You should be able to see partition pruning happening in the raw profile of the query. IMO, partition inference is more of a convenience feature.
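
One way to check this is to filter on the directory columns and then look for pruning in the raw profile of that query. A sketch, assuming partition inference is disabled and using the table name from the original post:

```sql
-- Filters on dir0/dir1/dir2 should let the planner skip the other partition folders.
SELECT COUNT(*) AS row_count
FROM "xyz"."standalone-test_1"
WHERE dir0 = 'year=2023'
  AND dir1 = 'month=09'
  AND dir2 = 'day=15';
```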

rohitshetty · October 19, 2023, 6:42pm

Thanks!

“And should be fixed in the next release cycle for Dremio Cloud.”

That is great to hear! Do you know the approximate time when that would be?

Thank you again for all your guidance so far!

lenoyjacob · November 17, 2023, 12:07am

This should now be fixed. It was part of the November 16th, 2023 update of Dremio Cloud: Changelog | Dremio Documentation.
