I am trying to create a reflection using the REST API, but it fails with the error ‘datasetId must reference an existing dataset’. The dataset already exists in Dremio; I have tried with both a physical and a virtual dataset and get the same error.
Can you please provide any insight into this?
Can you post the way you’ve tried to implement this so we can suggest what should change?
How are you retrieving the ID of the dataset when you are creating the reflection? Most likely what you need to do is use the by-path endpoint to get the id of the dataset you want to reference.
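For reference, here is a minimal Python sketch of the by-path lookup. The `/api/v3/catalog/by-path/` route and the `_dremio` token prefix are assumptions about a typical Dremio setup, so check them against the REST docs for your version. The helper names are made up for illustration.

```python
import json
import urllib.request
from urllib.parse import quote


def by_path_url(base_url, path_parts):
    """Build the catalog by-path URL, percent-encoding each path segment."""
    encoded = "/".join(quote(part, safe="") for part in path_parts)
    return f"{base_url.rstrip('/')}/api/v3/catalog/by-path/{encoded}"


def get_entity_by_path(base_url, token, path_parts):
    """Fetch the catalog entity at the given path (performs a network call)."""
    request = urllib.request.Request(
        by_path_url(base_url, path_parts),
        headers={"Authorization": f"_dremio{token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The returned JSON should carry an `id` and an `entityType`; the `id` is the value to pass as `datasetId` when creating the reflection.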
@doron Thanks for your reply. Yes, I used the catalog by-path endpoint, and the id it returns for the HDFS dataset is ‘dremio:/CMITCluster/dremio/“GE”’. As I understand it, the id should look something like ‘a0c84110-67c2-46ce-8582-d3e079a5e9d8’.
For a PostgreSQL source, the id is returned in the expected format and I am able to create a reflection, but it fails for the HDFS source.
Please suggest how to resolve this error.
For filesystem-based sources such as HDFS, we do not automatically create datasets for each item in the source, as there are potentially millions of them and that would be expensive. Also, we often don’t know how to properly format the data without user input. That is why, in the UI, if you click on a file in HDFS you are asked to identify its type and other formatting options.
In this case you would have to promote the GE entity (it’s either a file or a folder). In fact, when you do the by-path call, the entityType returned won’t be dataset; it will be file or folder. We have docs on how to promote using the API, or you could use the UI. Only once it’s promoted can you add reflections.
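To make the promote-then-reflect order concrete, here is a rough Python sketch that only builds the requests. The endpoint shape (`POST /api/v3/catalog/{id}` with the id URL-encoded), the payload fields, and the `Parquet` format type are assumptions based on my reading of the REST docs, and the function names are hypothetical; verify everything against the docs for your Dremio version.

```python
from urllib.parse import quote


def needs_promotion(entity):
    """An entity returned by by-path must be promoted unless it is already a dataset."""
    return entity.get("entityType") != "dataset"


def promotion_request(base_url, entity, format_type):
    """Return (url, body) for promoting a file or folder to a physical dataset.

    Assumed shape: POST to /api/v3/catalog/{url-encoded id} with the body below.
    """
    url = f"{base_url.rstrip('/')}/api/v3/catalog/{quote(entity['id'], safe='')}"
    body = {
        "entityType": "dataset",
        "id": entity["id"],
        "path": entity["path"],
        "type": "PHYSICAL_DATASET",
        "format": {"type": format_type},  # e.g. "Parquet"; depends on your data
    }
    return url, body


def reflection_payload(dataset_id, name, field_names):
    """Build an assumed body for POST /api/v3/reflection (raw reflection)."""
    return {
        "type": "RAW",
        "name": name,
        "datasetId": dataset_id,  # use the id of the promoted dataset
        "enabled": True,
        "displayFields": [{"name": field} for field in field_names],
    }
```

The response to the promotion call should contain the dataset's new id; that id, rather than the ‘dremio:/...’ one, is what I would expect to work as `datasetId`.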