I am consulting with a company that uses Dremio. We have built a few data pipelines using Airflow to load data (CSV files) into Dremio through Azure Blob Storage.
Our next task is to do some transformations on that data using dbt (dbt-dremio). These dbt models run fine when executed manually from the CLI. However, when I run them through Airflow, triggered by the existence of new files, we consistently get the following error:
IOException: pdfs:/var/lib/dremio/pdfs/scratch/dev zais data mart/dbt_mart/application/loanz/markit_mapping_moodys_green/zais-drem000003@1_2_0.parquet is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [39, 12, 0, 0]
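For anyone who wants to inspect a copy of the file outside Dremio: as far as I understand, the expected bytes [80, 65, 82, 49] are the ASCII string PAR1 that a valid Parquet file ends with. Below is a minimal Python sketch of that tail check, assuming the file has been copied to a local path. The function name is my own and is not part of Dremio or dbt-dremio.

```python
import os

# Parquet files end with this 4-byte magic number: bytes 80, 65, 82, 49.
PARQUET_MAGIC = b"PAR1"


def has_parquet_magic_tail(path):
    """Return True if the file at `path` ends with the Parquet magic bytes.

    Hypothetical helper for illustration only; it checks the last four
    bytes and nothing else, so a True result does not prove the file is
    a fully valid Parquet file.
    """
    if os.path.getsize(path) < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        # Jump to the last four bytes and compare them with the magic number.
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

In my case the tail is [39, 12, 0, 0], so a check like this would return False for the file named in the error.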
I believe the dataset markit_mapping_moodys_green is an internal detail of dbt-dremio, which uses a blue/green approach to creating datasets. It appears to check for the existence of the ‘_green’ dataset before dropping it in order to recreate it.
The client is using an old version of Dremio (Build: 4.1.3-202001022113020736-53142377).
Has anyone come across this before? I can’t see this dataset in the UI and can’t seem to do anything with it, as it appears to be in a corrupted state.
Thanks for your help!