Refresh HDFS Metadata not working

Hi Team,

I imported a CSV file from HDFS into Dremio; initially it had 4 columns. To test metadata management, I deleted one column from the HDFS file and tried to refresh the physical dataset with the command below:
ALTER TABLE <dataset_path> REFRESH METADATA

The command executed successfully, but I can still see the dropped column, with null values, when querying the dataset.

Followed: How to setting metadata in dremio system



However, the dropped column is not visible in the dataset format settings.

Any help will be much appreciated.

Thanks.

Hi @Monika_Goel

Can you please try to forget the metadata first and then refresh again?

Thanks
@balaji.ramaswamy

Hi Balaji,

Thanks for your response. I executed the commands below:
ALTER TABLE <dataset_path> FORGET METADATA
ALTER TABLE <dataset_path> REFRESH METADATA
SELECT * FROM <dataset_path>
But now the columns are not split on the delimiter.
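
To rule out a problem in the file itself, its header line can be checked outside Dremio. The following is a minimal sketch, assuming a comma delimiter and a locally available copy of the first lines of the file; the sample data and the helper name `header_columns` are hypothetical and not part of Dremio.

```python
import csv
import io


def header_columns(text, delimiter=","):
    """Return the column names found on the first line of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


# Hypothetical sample: three columns remain after one was dropped.
sample = "id,name,city\n1,Asha,Pune\n"

# Parsed with the expected delimiter: one entry per column.
print(header_columns(sample))

# Parsed with a wrong delimiter: the whole line comes back as one column,
# which is the same symptom as an unsplit dataset.
print(header_columns(sample, delimiter="|"))
```

If the header splits correctly here, the file is fine and the delimiter in the dataset format settings is the more likely culprit.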


Hi @Monika_Goel

Can you try to unnest, then extract the columns and save the result as a VDS? To the right of the column heading there should be an inverted arrow; clicking it should show the unnest option.

Then, after the unnest, click on the dots (…) to the right of any data row and you should be able to do an extract.

Thanks
@balaji.ramaswamy

Sorry @Monika_Goel

Before that, can you please try formatting the dataset again with the right options and see whether you get only 3 columns this time?

Thanks
@balaji.ramaswamy

Re-formatting the physical dataset resolved the issue.
But my question is: do we need to re-format each time the metadata changes in HDFS? Can’t it be done with some property or a SQL command?

Thanks for looking into it.
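
For scripting, the two statements used earlier in this thread can be generated per dataset and then submitted through whichever SQL client is in use. The following is a minimal sketch that only builds the statement strings; the helper name `metadata_statements` and the dataset path are hypothetical, and how the statements are submitted is left out.

```python
def metadata_statements(dataset_path, forget_first=False):
    """Build the metadata statements for one dataset, in execution order."""
    statements = []
    if forget_first:
        # In this thread, the format had to be applied again after
        # FORGET METADATA, so it is optional here and off by default.
        statements.append(f"ALTER TABLE {dataset_path} FORGET METADATA")
    statements.append(f"ALTER TABLE {dataset_path} REFRESH METADATA")
    return statements


# Hypothetical dataset path, for illustration only.
for sql in metadata_statements('hdfs_source."data"."sample.csv"', forget_first=True):
    print(sql)
```

Keeping the forget step optional reflects what happened in this thread: a plain refresh leaves the format settings alone, while forgetting the metadata made re-formatting necessary.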

Hi @Monika_Goel

Ideally not. I'm not sure why only reformatting fixed it this time. Did you use the same formatting options this time too, or did they change?

Thanks,
@balaji.ramaswamy

Hi Balaji,

I haven’t changed anything in the formatting options. They are the same as for the previous metadata.

Thanks.

@Monika_Goel did the configuration for the HDFS source change between metadata refreshes? In some instances, where a “metadata impacting” change is made to the source (e.g. changing connection settings like NameNode URI etc.), Dremio will reset existing metadata with the expectation that you’re now working with a new set of files/datasets.

Otherwise, we’d expect existing formats to always be preserved between metadata refreshes.

Hi balaji,
Has this issue been fixed in the latest version? As mentioned on another page (see below), ‘forget metadata’ will lose the reflections and their definitions. I don’t want to re-format the dataset, as that would be costly. Is there any other approach we can use to refresh metadata when columns change?

hi balaji,

If I execute ‘forget metadata’ on a PDS, and there are no changes in the Parquet file (no column added/deleted/updated), does it impact existing VDSs? This PDS will be auto-promoted, as we enabled auto promotion in the data source level config.

@ljd520cc

ALTER PDS REFRESH METADATA should discover the new columns. Is it not doing so? If not, then you may have to do a “SELECT * FROM PDS” and run