Unexpected Parquet Error


I am consulting with a company that uses Dremio. We have built a few data pipelines using Airflow to load data (CSV files) into Dremio through Azure Blob Storage.

Our next task is to do some transformations on that data using dbt (dbt-dremio). These dbt models run fine when executed manually from the CLI. However, when I run them through Airflow, triggered by the existence of new files, we consistently get the following error:

IOException: pdfs:/var/lib/dremio/pdfs/scratch/dev zais data mart/dbt_mart/application/loanz/markit_mapping_moodys_green/zais-drem000003@1_2_0.parquet is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [39, 12, 0, 0]
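For context on the message itself: the expected bytes [80, 65, 82, 49] are the ASCII codes for "PAR1", the 4-byte magic marker that a Parquet file carries at both its start and its end. A tail of [39, 12, 0, 0] therefore suggests the file was truncated or only partially written, though the error alone does not prove that. Below is a minimal sketch of checking a file for the marker, assuming you have direct filesystem access to the file on the Dremio node; the function name `has_parquet_magic` is made up for illustration.

```python
import os

# The Parquet magic marker: bytes 80, 65, 82, 49 decode to ASCII "PAR1".
PARQUET_MAGIC = bytes([80, 65, 82, 49])


def has_parquet_magic(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    This only inspects the first and last 4 bytes. It does not validate
    the footer metadata or any of the row groups.
    """
    # A valid file needs at least the head magic, a 4-byte footer
    # length, and the tail magic, so anything under 12 bytes is invalid.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

If a check like this returns False for the file named in the error, that points to an incomplete write rather than a problem in the reader.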

I believe the dataset markit_mapping_moodys_green is an internal detail of dbt-dremio, as it uses a blue/green approach to creating datasets. It appears to check for the existence of the ‘_green’ dataset before dropping it in order to recreate it.

The client is using an old version of Dremio (Build: 4.1.3-202001022113020736-53142377).

Anyone come across this before? I can’t see this dataset in the UI and cannot seem to do anything with it as it is in some corrupted state.

Thanks for your help!

@drew I recommend upgrading, as 4.1.3 is really old. Writing to PDFS is also not recommended. To fix this issue, have you tried refreshing metadata on the failing PDS?
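As a rough illustration of the metadata refresh, Dremio SQL provides an `ALTER PDS ... REFRESH METADATA` statement. The sketch below only builds that statement as a string. The helper name `refresh_metadata_sql` and the example path are hypothetical, the identifier quoting (double quotes, with embedded quotes doubled) is an assumption, and it is untested whether this applies to a dataset under the scratch location on 4.1.3.

```python
def refresh_metadata_sql(path_parts):
    """Build an ALTER PDS ... REFRESH METADATA statement.

    path_parts is the dataset path split into components, for example
    ["my_source", "my folder", "my_table"] (a hypothetical path).
    Each component is wrapped in double quotes so that names containing
    spaces survive. Doubling embedded quotes is an assumed escaping rule.
    """
    quoted = ".".join('"' + part.replace('"', '""') + '"' for part in path_parts)
    return f"ALTER PDS {quoted} REFRESH METADATA"


# Hypothetical dataset path, shown only to illustrate the output shape.
print(refresh_metadata_sql(["my_source", "my folder", "my_table"]))
```

The resulting string would then be run through whatever SQL client is already in use against Dremio (the UI's SQL editor, ODBC/JDBC, or the REST API).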

Thank you @balaji.ramaswamy. The error seems to have gone away. I am not sure what was going on. Hopefully we will be upgrading soon!