Goal: reflect the latest content on PDS via v3 API.
My source is S3, and the files there change often,
e.g. a column is removed from or added to the original file on S3.
I don’t want to forget the PDS metadata and re-promote the file to a PDS, because my VDSs depend on the PDS.
I found that the v2.file_format API used by the Dremio UI does what I want to achieve.
It automatically syncs with the latest file on S3.
The problem is that it’s not in the documentation, and I don’t want to rely on an undocumented API in a production system.
I tried the edit_catalog API to reformat the PDS, and also ALTER PDS REFRESH METADATA, but neither works for my use case.
Is there any v3-compatible API that does the job of the file_format API?
Thanks Doron. I read the documentation several times but couldn’t figure out how to make it automatically detect new or removed fields.
Should I manually specify all fields if a column is removed or added? Or is there a specific argument for that?
Adding on to the above reply,
case 1. new column addition
Nothing needs to be done; the PDS gets updated when the source file on S3 changes.
case 2. existing column deletion
The old column names are still there, showing null values.
How can I delete the old fields using edit-catalog?
I added fields to the payload but it doesn’t edit the fields. Probably my payload is wrong, but I don’t know how to set this up. Thanks in advance.
You don’t need to specify columns, Dremio will figure them out for you.
Dremio’s schema learning is additive: it won’t remove dropped columns. What you can do is run a metadata refresh SQL command (docs); you can do a FORGET or a FORCE UPDATE to clear old metadata.
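Since schema learning is additive, only removed columns call for extra work. A minimal sketch of that decision, assuming you can obtain both column lists yourself (for example from the catalog entry of the PDS and from the header of the current S3 file); the names diff_columns and needs_forget_and_repromote are hypothetical helpers, not part of any Dremio API:

```python
from typing import Iterable, List, NamedTuple


class SchemaDiff(NamedTuple):
    added: List[str]    # columns in the file but not yet in the PDS
    removed: List[str]  # columns still in the PDS but gone from the file


def diff_columns(pds_columns: Iterable[str], file_columns: Iterable[str]) -> SchemaDiff:
    """Compare the columns Dremio knows about with the columns in the current file."""
    pds = list(pds_columns)
    current = list(file_columns)
    pds_set, current_set = set(pds), set(current)
    added = [c for c in current if c not in pds_set]
    removed = [c for c in pds if c not in current_set]
    return SchemaDiff(added, removed)


def needs_forget_and_repromote(diff: SchemaDiff) -> bool:
    """Added columns are learned on their own; only removed columns linger in the PDS."""
    return bool(diff.removed)


if __name__ == "__main__":
    diff = diff_columns(["id", "name", "old_col"], ["id", "name", "new_col"])
    print(diff, needs_forget_and_repromote(diff))
```

With a check like this, the heavier forget step only has to run when a column has actually disappeared from the source file.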
Thank you Doron. I was able to remove the old columns by forgetting the metadata and creating the PDS again.
But I’m not sure about REFRESH METADATA FORCE UPDATE.
It doesn’t drop unused columns.
I tested the commands below, and none of them drops the old columns.
ALTER PDS path REFRESH METADATA FORCE UPDATE
ALTER PDS path REFRESH METADATA AVOID PROMOTION FORCE UPDATE
ALTER PDS path REFRESH METADATA AVOID PROMOTION FORCE UPDATE DELETE WHEN MISSING
Am I missing something? Or, as you said, is field management additive, so old columns can’t be removed by REFRESH METADATA?
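For reference, the forget-and-recreate approach that did work can be sketched as two requests. This is only a sketch under assumptions: it assumes the forget step is the SQL statement ALTER PDS ... FORGET METADATA submitted through POST /api/v3/sql, and that the file is promoted again through POST /api/v3/catalog/{id} with a URL-encoded "dremio:/..." id; the base URL, the path, and the format body below are placeholders. The functions only build the requests, they do not send them.

```python
import json
from typing import Dict, List
from urllib.parse import quote


def quote_path(parts: List[str]) -> str:
    """Quote each path component with double quotes, doubling embedded quotes."""
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def forget_request(base_url: str, parts: List[str]) -> Dict:
    """Request that submits the FORGET METADATA statement as a SQL job (assumed endpoint)."""
    sql = f"ALTER PDS {quote_path(parts)} FORGET METADATA"
    return {"method": "POST", "url": f"{base_url}/api/v3/sql", "body": {"sql": sql}}


def promote_request(base_url: str, parts: List[str], fmt: Dict) -> Dict:
    """Request that promotes the file to a PDS again (assumed endpoint and body shape)."""
    raw_id = "dremio:/" + "/".join(parts)
    return {
        "method": "POST",
        "url": f"{base_url}/api/v3/catalog/{quote(raw_id, safe='')}",
        "body": {
            "entityType": "dataset",
            "id": raw_id,
            "path": list(parts),
            "type": "PHYSICAL_DATASET",
            "format": fmt,
        },
    }


if __name__ == "__main__":
    # Placeholder values: replace with your own host, source path, and file format.
    path = ["s3source", "bucket", "data.parquet"]
    for req in (
        forget_request("http://localhost:9047", path),
        promote_request("http://localhost:9047", path, {"type": "Parquet"}),
    ):
        print(req["method"], req["url"])
        print(json.dumps(req["body"], indent=2))
```

Each request would still need an Authorization header for your deployment, and the SQL job should be confirmed as finished before the promote request is sent.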