I created some Delta Lake tables on my Ubuntu machine and stored them in HDFS.
I connected HDFS to Dremio and formatted the tables as Delta tables in Dremio.
Now I have enabled Delta Lake column mapping for one table and renamed a column. Querying the table with Spark shows that the change is active. I tried refreshing the metadata in Dremio so that the change becomes visible there too, and got the following error message: “Missing version file 3”. I think the problem is that Delta Lake column mapping doesn’t create a new Parquet file with the data, just a new .json file in the delta log that records the schema metadata change (the column rename).
What can I do to reflect these changes in Dremio?
Does anyone have experience with using Delta Lake column mapping and then querying the tables in Dremio after an ALTER TABLE table REFRESH METADATA?
I still hope that I am doing something wrong.
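To check which commit versions actually exist in the delta log, and whether the rename commit only carries metadata, something like the following could help. It is a minimal sketch, assuming the table's _delta_log directory has been copied to the local filesystem first (for example with hdfs dfs -get); the function names are made up for illustration.

```python
import json
import re
from pathlib import Path

# Delta commit files use a 20-digit zero-padded version, e.g. 00000000000000000002.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def list_commit_versions(log_dir):
    """Return the sorted commit versions found in a local copy of _delta_log."""
    versions = []
    for entry in Path(log_dir).iterdir():
        match = COMMIT_RE.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


def summarize_commit(log_dir, version):
    """Return the action types of one commit and the column mapping mode it sets, if any."""
    path = Path(log_dir) / f"{version:020d}.json"
    actions = []
    mapping_mode = None
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            action = json.loads(line)  # each line of a commit file is one JSON action
            actions.extend(action.keys())
            meta = action.get("metaData")
            if meta:
                config = meta.get("configuration", {})
                mapping_mode = config.get("delta.columnMapping.mode", mapping_mode)
    return actions, mapping_mode
```

If list_commit_versions returns [0, 1, 2], then a reader asking for version 3 is looking for a commit that was never written. A commit whose actions contain metaData but no add entries changed only the schema or table properties and added no data files.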
@RolandR Can you please try
ALTER PDS FORGET METADATA
followed by ALTER PDS REFRESH METADATA and then retry the query?
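For reference, the two statements with the table path filled in might look like the output of the sketch below. The source name hdfs and the folder names are hypothetical, and the helper only builds the SQL text; it would still have to be run in the Dremio SQL editor or through a client.

```python
def quote_path(parts):
    """Wrap each path component in double quotes and join the components with dots."""
    for part in parts:
        if '"' in part:
            raise ValueError(f"unsupported character in path component: {part!r}")
    return ".".join(f'"{part}"' for part in parts)


def metadata_reset_statements(parts):
    """Build the FORGET METADATA and REFRESH METADATA statements for one table path."""
    table = quote_path(parts)
    return [
        f"ALTER PDS {table} FORGET METADATA",
        f"ALTER PDS {table} REFRESH METADATA",
    ]


if __name__ == "__main__":
    # "hdfs", "delta" and "cards" are placeholder names, not taken from the thread.
    for statement in metadata_reset_statements(["hdfs", "delta", "cards"]):
        print(statement)
```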
Thank you for your answer, @balaji.ramaswamy!
I ran ALTER PDS FORGET METADATA on my Delta table. This turned the table back into a normal folder without the Delta format. When I try formatting it as a Delta table again, it says: Missing version file 1.
All I did was enable column mapping with PySpark and then run ALTER TABLE RENAME COLUMN card_type TO type. This operation created a delta log file (I now have 3 delta log JSON files: 0, 1, 2) and one Parquet file (specific to Delta Lake’s column mapping operation, which only changes the metadata and maps the new column name to the old column).
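Roughly, the two steps were along these lines. This is a sketch rather than the exact session: the table name cards is a placeholder, and the protocol versions are the ones the Delta Lake documentation lists for enabling column mapping. The helper that runs the statements assumes an existing SparkSession with Delta Lake configured.

```python
def column_mapping_statements(table, old_name, new_name):
    """Build the SQL that enables name-based column mapping and renames one column."""
    enable = (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5', "
        "'delta.columnMapping.mode' = 'name')"
    )
    rename = f"ALTER TABLE {table} RENAME COLUMN {old_name} TO {new_name}"
    return [enable, rename]


def apply_statements(spark, statements):
    """Run each statement on an existing SparkSession (assumed to have Delta Lake set up)."""
    for statement in statements:
        spark.sql(statement)


if __name__ == "__main__":
    # "cards" is a placeholder table name; card_type and type are the columns from the post.
    for statement in column_mapping_statements("cards", "card_type", "type"):
        print(statement)
```

Enabling column mapping raises the table protocol to reader version 2, so any engine reading the table afterwards has to support that reader version.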
@RolandR I'm interested to see what the error stack is. Are you able to send the server.log from the coordinator from when this error happened? Also, was a job profile created with this error? If yes, is it possible to attach that one too?
I have just sent the files in private! Thank you again!
@RolandR Did you send them to an email address?
Thanks, @RolandR. This is something we need to repro; we will keep you posted.