Exclude certain subfolders when creating a dataset

We have Spark writing Parquet files. Because the application is a streaming one, it creates a _spark_metadata folder containing JSON files. When we try to query the top-level folder, Dremio complains about the contents of the _spark_metadata folder. Any suggestions on how to ignore them?

Here is the file layout


The _spark_metadata subfolder contains JSON files that are metadata, not data.

Here is the error when creating a dataset

dataset1/_spark_metadata/0 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [100, 100, 34, 125]
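
For anyone reading the error above: the two byte lists are the last four bytes of the file, and decoding them shows what is going on. The sketch below is only an illustration in Python; the helper name has_parquet_magic is hypothetical and not part of Dremio or Spark. It assumes a plain (unencrypted) Parquet file, which ends with the 4-byte marker PAR1.

```python
import os

# Byte values copied from the error message.
PARQUET_MAGIC = bytes([80, 65, 82, 49])   # what a Parquet reader expects at the tail
found_tail = bytes([100, 100, 34, 125])   # what was actually found in _spark_metadata/0

print(PARQUET_MAGIC.decode("ascii"))  # PAR1
print(found_tail.decode("ascii"))     # dd"}  (looks like the end of a JSON document)


def has_parquet_magic(path):
    """Return True if the file at `path` ends with the 4-byte Parquet magic number.

    Hypothetical helper for illustration; it only inspects the file tail.
    """
    if os.path.getsize(path) < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

So the reader is treating the JSON metadata file as if it were a Parquet data file and failing on the footer check.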

Hi @Madhu

Currently we cannot promote a folder with mixed file types while ignoring certain files. Is there any chance you can move the Parquet files into their own folder and then promote that folder, so that it contains only Parquet files?
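
As a rough sketch of that idea, something like the following Python could copy only the Parquet files into a separate folder while skipping metadata directories. Everything here is an assumption for illustration: the function name copy_parquet_only is hypothetical, the files are assumed to be on a local filesystem (not HDFS or S3), and the data files are assumed to end in .parquet. For a streaming job it would have to be re-run as new files arrive.

```python
import shutil
from pathlib import Path


def copy_parquet_only(src_root, dst_root):
    """Copy *.parquet files from src_root to dst_root, preserving the layout.

    Files under any directory whose name starts with "_" or "." (for example
    _spark_metadata) are skipped. Returns the list of copied target paths.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    copied = []
    for path in sorted(src_root.rglob("*.parquet")):
        rel = path.relative_to(src_root)
        # rel.parts[:-1] are the parent directories relative to src_root.
        if any(part.startswith(("_", ".")) for part in rel.parts[:-1]):
            continue
        target = dst_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Usage would be along the lines of copy_parquet_only("dataset1", "dataset1_clean"), where both folder names are placeholders, followed by promoting the clean folder.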


The folder is needed for Spark Structured Streaming, and new files are added for each micro-batch, so I don't think removing them is an option. We are also checking whether Spark can write the metadata elsewhere.