Non-deterministic indexing error during virtual dataset catalog creation

Hi,

I’m encountering a non-deterministic indexing error on my platform related to Dremio. I have a dataset upload workflow that uploads a directory of Parquet files to S3 and immediately indexes it with Dremio. Indexing here means creating a physical dataset via POST api/v3/catalog/{catalog_id}, followed by a virtual dataset (same API, different payload). This works correctly most of the time, but one upload pipeline hits an intermittent failure.


physical_dataset_payload = {
    "entityType": "dataset",
    "path": physical_dataset_catalog_path,
    "type": "PHYSICAL_DATASET",
    "format": {"type": "Parquet"},
}

virtual_dataset_payload = {
    "entityType": "dataset",
    "path": virtual_dataset_catalog_path,
    "type": "VIRTUAL_DATASET",
    "sql": <query>,
    "sqlContext": physical_dataset_path,
}

This pipeline generally produces indexable datasets. However, in about 10-15% of cases, virtual catalog creation throws the following error:

com.dremio.dac.service.errors.NewDatasetQueryException: Unable to create dataset. Selected table has no columns.

And the dataset is not indexed. The bizarre thing is that this error occurs seemingly at random, and re-running the same workflow always produces an indexable dataset. If I open the dataset in the Dremio UI, I can always index it manually, and the data is correct and sane.

I haven’t been able to find any documentation or blog post that addresses this issue. Any assistance would be appreciated. Thank you!

@Sarthak-at-Ikigai When you say “indexing”, do you mean promoting to a PDS? Is there a reason you promote every time? Do you remove the format on the Dremio side before each load and then do your POST again? If you are only adding files, you just have to refresh metadata; but if you are deleting the entire folder and recreating it, then try this:

  • Remove the formatting on the existing folder, either through the API or SQL
  • Once ETL is done, the new folder is created and files are added
  • Add the formatting again
  • Query the dataset
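The steps above could be scripted against the v3 catalog API. This is only a sketch of the call sequence, not a tested client: the dataset/folder ids and the exact endpoints should be verified against your Dremio version, and the ETL step happens outside Dremio entirely.

```python
def reload_cycle_plan(dataset_id, folder_id):
    """Ordered REST calls for the suggested cycle: un-promote
    (remove formatting from) the old physical dataset, then, after
    the ETL has recreated the folder, promote it again. Returns
    (method, path) pairs; a real client would execute each call
    with auth headers against the coordinator."""
    return [
        # 1. remove formatting: un-promote the existing PDS
        ("DELETE", f"/api/v3/catalog/{dataset_id}"),
        # 2. (outside Dremio) ETL recreates the folder and uploads files
        # 3. add formatting again: promote the refreshed folder
        ("POST", f"/api/v3/catalog/{folder_id}"),
    ]
```

After step 3 the dataset can be queried as usual; if you are only appending files rather than replacing the folder, a metadata refresh on the existing PDS should suffice instead of this full cycle.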