I have imported a CSV file from HDFS into Dremio; initially it has 4 columns. To test metadata management, I deleted one column from the HDFS file and tried to refresh the physical dataset with the command below:
ALTER TABLE <dataset_path> REFRESH METADATA
The command executed successfully, but I can still see the dropped column (with null values) when querying the dataset.
Thanks for your response. I executed the commands below:
ALTER TABLE <dataset_path> FORGET METADATA
ALTER TABLE <dataset_path> REFRESH METADATA
SELECT * FROM <dataset_path>
But now the columns are not split on the delimiter.
Can you try to unnest, then extract the columns and save as a VDS? To the right of the column heading there should be an inverted arrow; clicking that, you should see the Unnest option.
Then, after the unnest, click the dots (…) to the right of any data row and you should be able to do an extract.
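Alternatively, if you'd rather stay in SQL, you may be able to re-apply the text format with Dremio's table-function options for reading delimited files. This is a minimal sketch, assuming a comma delimiter and a header row; the option names follow the text-format settings and may vary by version, so check the SQL reference for your release:
SELECT *
FROM TABLE(<dataset_path> (type => 'text', fieldDelimiter => ',', extractHeader => true))
This reads the file with explicit format options instead of relying on the format saved on the promoted dataset.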
Re-formatting the physical dataset resolved the issue.
But my question is: do we need to re-format each time the metadata changes in HDFS? Can’t it be done with some property or SQL command?
@Monika_Goel did the configuration for the HDFS source change between metadata refreshes? In some instances, where a “metadata-impacting” change is made to the source (e.g. changing connection settings such as the NameNode URI), Dremio will reset existing metadata with the expectation that you’re now working with a new set of files/datasets.
Otherwise, we’d expect existing formats to always be preserved between metadata refreshes.
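If the source settings didn’t change, one thing worth trying before FORGET METADATA is a forced refresh. Recent Dremio versions document optional clauses on the refresh statement; this is a sketch under that assumption, so check the SQL reference for your release:
ALTER TABLE <dataset_path> REFRESH METADATA FORCE UPDATE
FORCE UPDATE asks Dremio to re-read metadata even where it looks unchanged, which can help pick up schema changes such as a dropped column, and unlike FORGET METADATA it shouldn’t discard the dataset’s format.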
Hi balaji,
Was this issue fixed in the latest version? As mentioned on another page (see below), ‘forget metadata’ loses the reflections and their definitions. I don’t want to re-format, since that is costly; is there any other approach we can choose to refresh metadata when columns change?
If I execute ‘forget metadata’ on a PDS and there are no changes in the Parquet file (no columns added/deleted/updated), does it impact existing VDSs? This PDS will be auto-promoted, since we enabled auto-promotion in the data-source-level config.
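In the meantime, before running ‘forget metadata’ anywhere, I plan to snapshot the existing reflection definitions so they can be recreated if they are dropped. A sketch, assuming the sys.reflections system table available in recent versions (the exact columns vary by release):
SELECT *
FROM sys.reflections
That at least leaves a record of which reflections exist on the PDS before the metadata is reset.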