Even though the “Save As” functionality saves a new dataset and doesn’t unlink it, it makes you think it’s actually renaming the dataset, which is presumably not the intention of “Save As”.
If you open a query based on a PDS (such as a direct database table, an HDFS Parquet file, etc.) and write a query on top of it, we will link that to the original PDS. If you open the saved version (called a VDS) and write a query on top of that, or modify the SQL and do “Save As”, then we will only link it to the original PDS and not to the VDS it is based on.
For example,
zips.json is a JSON file on S3, considered a PDS.
I write the SQL below and save it as a VDS.
SELECT * FROM "zips.json" WHERE state = 'CA'
Say we name it “zips_ca”. It will be linked to zips.json, as shown in the screenshot below.
Now say we want to create another view with the SQL below:
SELECT * FROM "zips_ca" WHERE state = 'CA' AND city = 'LOS ANGELES'
As above, if we base this SQL on the VDS “zips_ca” and name the new VDS “zips_state_city”, it will be linked to “zips_ca”, as in the screenshot below.
So now, if I rename or drop the parent VDS zips_ca, we will get the warning:
Changing the name of this dataset will disconnect 1 dependent datasets. Make a copy to preserve these connections.
Instead, if I base “zips_state_city” directly on the zips.json PDS, then both VDSs are independent and are not linked.
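To make the linking rule concrete, here is a rough toy model in Python. It is only a sketch of the behaviour described above, not Dremio's API or implementation: the `parents` mapping and the `dependents` and `rename_warning` helpers are hypothetical names made up for illustration, and the model assumes each dataset is linked only to the dataset named in its FROM clause.

```python
# Toy model of dataset lineage. This is NOT Dremio's API, just an illustration.
# Each dataset maps to the dataset named in its FROM clause (None for a PDS).
parents = {
    "zips.json": None,             # PDS: the JSON file on S3
    "zips_ca": "zips.json",        # VDS built on the PDS
    "zips_state_city": "zips_ca",  # VDS built on the VDS
}


def dependents(name, parents):
    """Return the datasets whose direct parent is `name`, sorted by name."""
    return sorted(child for child, parent in parents.items() if parent == name)


def rename_warning(name, parents):
    """Return a warning like the one quoted above, or None if nothing depends on `name`."""
    count = len(dependents(name, parents))
    if count == 0:
        return None
    return f"Changing the name of this dataset will disconnect {count} dependent datasets."


# zips_state_city is based on zips_ca, so renaming zips_ca yields a warning.
print(rename_warning("zips_ca", parents))

# Hypothetical alternative: base zips_state_city directly on the PDS instead.
independent = dict(parents, zips_state_city="zips.json")
print(rename_warning("zips_ca", independent))
```

In the first case the model reports one dependent for zips_ca. In the second case both VDSs hang off zips.json, so renaming zips_ca affects nothing else.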
Thanks
@balaji.ramaswamy