Our Spark application writes Parquet files, and because it is a streaming job it also creates a _spark_metadata folder with some JSON files under it. When we try to query the top-level folder, Dremio complains about the contents of the _spark_metadata folder. Any suggestions on how to ignore them?
Here is the file layout:
The _spark_metadata subfolder contains JSON files that are metadata, not data.
Here is the error when creating a dataset:
dataset1/_spark_metadata/0 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [100, 100, 34, 125]
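For what it's worth, the error is Dremio checking the Parquet magic footer: every valid Parquet file ends with the four bytes [80, 65, 82, 49], i.e. b"PAR1", while [100, 100, 34, 125] is b'dd"}', the tail of one of Spark's JSON metadata files. A quick sketch of the same check (the function name is mine, not Dremio's):

```python
def looks_like_parquet(path):
    """Return True if the file ends with the Parquet magic footer b"PAR1"."""
    with open(path, "rb") as f:
        f.seek(0, 2)                # seek to end to get the file size
        if f.tell() < 4:            # too small to hold the footer
            return False
        f.seek(-4, 2)               # read the last four bytes
        return f.read(4) == b"PAR1"
```

Running this over the folder would flag _spark_metadata/0 the same way Dremio does.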
Currently Dremio cannot promote a folder with mixed file types while ignoring certain files. Is there any chance you could move the Parquet files into a folder of their own and then promote that folder, so it contains only Parquet files?
The _spark_metadata folder is required by Spark Structured Streaming, and new files are added to it for each micro-batch, so I don't think we have the option of removing it. We are also looking into whether Spark can write it elsewhere.
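In the meantime, one workaround along the lines of the suggestion above is a small sync job that copies only the *.parquet files out of the streaming output into a separate folder that Dremio promotes, leaving _spark_metadata behind. A minimal sketch, assuming the function name and folder layout are illustrative, not anything Spark or Dremio provides:

```python
import shutil
from pathlib import Path

def sync_parquet_files(stream_out, dremio_dir):
    """Copy *.parquet files from the streaming output folder into a
    clean folder for Dremio; _spark_metadata/ is never matched by the
    glob, so only Parquet files land in dremio_dir."""
    stream_out, dremio_dir = Path(stream_out), Path(dremio_dir)
    dremio_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in stream_out.glob("*.parquet"):
        target = dremio_dir / f.name
        if not target.exists():          # skip files from earlier runs
            shutil.copy2(f, target)
            copied.append(f.name)
    return sorted(copied)
```

Something like this could run on a schedule after each micro-batch; Dremio would then promote only the clean folder. The trade-off is that files appear in Dremio with a small lag, and this copies files Spark may not have committed yet, so in practice you would want to consult _spark_metadata (or only copy files older than a grace period) before syncing.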