How to promote a file in S3 to Physical Data Set

Dremio Community,

I would like to create a dataset in Dremio via the REST API by doing the following:

  1. Upload a file (.csv) to an S3-type data source.
  2. Convert the file to a dataset.
    a. Get the id of the physical file via the REST API:
    GET http://:port/api/v3/catalog/by-path/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/uscities-1

Result:
{"entityType":"file","id":"dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/\"uscities-1\"","path":["DQ Metrics","dataqueryplatform-dev-jsondata2","turik","data","\"uscities-1\""]}

b. Promote the file to a Dataset via REST API

POST http://server:port/api/v3/catalog/by-path/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/uscities-1
{
  "entityType": "dataset",
  "id": "dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/\"uscities-1\"",
  "type": "PHYSICAL_DATASET",
  "path": [
    "DQ Metrics",
    "dataqueryplatform-dev-jsondata2",
    "turik",
    "data",
    "\"uscities-1\""
  ],
  "format": {
    "type": "Text",
    "fieldDelimiter": ",",
    "lineDelimiter": "\r\n",
    "quote": "\"",
    "comment": "#",
    "escape": "\"",
    "skipFirstLine": true,
    "extractHeader": true,
    "trimHeader": true,
    "autoGenerateColumnNames": true
  }
}
Result:
{"errorMessage":"Something went wrong","moreInfo":"HTTP 405 Method Not Allowed"}

However, I am not able to complete step 2b. I suspect there is a problem with the
{id}. Is the id OK?

Also, please confirm that promoting a file should use the HTTP method ‘POST’.

Please advise.
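
For anyone following along, the lookup in step 2a can be scripted. The following is a minimal Python sketch, not an official client: `by_path_url` is a hypothetical helper, and the host and port are placeholders because they are omitted in the post above. It only builds the URL; sending the request (with whatever authorization header your deployment requires) is left to the HTTP client of your choice.

```python
from urllib.parse import quote

def by_path_url(base_url, path_parts):
    """Build a /api/v3/catalog/by-path/... URL from path components."""
    # Percent-encode each component on its own so spaces become %20
    # while the "/" separators between components stay literal.
    encoded = "/".join(quote(part, safe="") for part in path_parts)
    return f"{base_url}/api/v3/catalog/by-path/{encoded}"

url = by_path_url(
    "http://localhost:9047",  # placeholder host and port (assumption)
    ["DQ Metrics", "dataqueryplatform-dev-jsondata2", "turik", "data", "uscities-1"],
)
print(url)
```

Encoding per component keeps a space in a source or folder name from producing an invalid URL.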

@netmille

When you upload a file through the upload button, it is already a PDS, and you can create a VDS based on the uploaded file. The second option is to upload the file to S3/HDFS/ADLS, add that storage as a source in Dremio, and then promote the file via the REST API.

Thanks
Bali

Yes, I am attempting to ‘promote’ via the REST API (POST /api/v3/catalog/{id}). As I mentioned, there is a problem when I
initiate a request to promote a file: Dremio does not recognize the ID.

Request:

POST https://: /api/v3/catalog/dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2%2Fturik%2Fdata%2F%5C%22uscities-1%5C%22

POST data:
{
  "entityType": "dataset",
  "id": "dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2%2Fturik%2Fdata%2F%5C%22uscities-1%5C%22",
  "type": "PHYSICAL_DATASET",
  "path": ["DQ Metrics","dataqueryplatform-dev-jsondata2","turik","data","\"uscities-1\""],
  "format": {
    "type": "Text",
    "fieldDelimiter": ",",
    "lineDelimiter": "\r\n",
    "quote": "\"",
    "comment": "#",
    "escape": "\"",
    "skipFirstLine": true,
    "extractHeader": true,
    "trimHeader": true,
    "autoGenerateColumnNames": true
  }
}

Response:

{"errorMessage":"Something went wrong","moreInfo":"Entity id does not match the path specified in the dataset."}

NOTE: Here is the response when I obtain the ID through the request GET /api/v3/catalog/by-path.
Request:

GET http://:/api/v3/catalog/by-path/DQ%20Metrics/dataqueryplatform-dev-jsondata2/turik/data/uscities-1

Response:

{"entityType":"file","id":"dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/\"uscities-1\"","path":["DQ Metrics","dataqueryplatform-dev-jsondata2","turik","data","\"uscities-1\""]}

Would you please advise?
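
One thing that can be checked offline is what the percent-encoded id in the failing request actually decodes to. The sketch below is only a diagnostic, and it assumes the by-path response escapes the inner quotes as `\"` (the usual JSON escape). It compares the id parsed from that response with the decoded id from the POST URL: `%5C%22` decodes to a backslash followed by a quote, so the encoded id carries backslashes that the parsed id does not have.

```python
import json
from urllib.parse import unquote

# by-path response (trimmed), with the JSON escapes assumed to be intact.
response_text = r'{"entityType":"file","id":"dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/\"uscities-1\""}'
real_id = json.loads(response_text)["id"]

# The id segment used in the failing POST URL and request body.
encoded_id = (
    "dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2"
    "%2Fturik%2Fdata%2F%5C%22uscities-1%5C%22"
)
decoded_id = unquote(encoded_id)

print(real_id)                # file name is wrapped in plain quotes
print(decoded_id)             # file name is wrapped in backslash + quote
print(real_id == decoded_id)  # the two ids differ
```

Whether this mismatch is what the server objects to is not confirmed here; it only shows that the two strings are not the same id.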

Hi @netmille,

There is either an error in our docs or a bug in API v3. I am encountering similar issues when using this endpoint in the manner our docs suggest. Looking at the code, it looks like the problem is related to how quotes are (or are not) handled for either the id field or the path elements. I will get back to you shortly with my findings.

In the meantime, you could try using the API v2 to promote the datasets and infer the request syntax by using the developer tools in your browser.

I was finally able to get it to work once I removed the encoded quotes (%5C%22) around ‘uscities-1’ in the URL:

https://: /api/v3/catalog/dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2%2Fturik%2Fdata%2Fuscities-1

POST data:
{
  "entityType": "dataset",
  "id": "dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/\"uscities-1\"",
  "type": "PHYSICAL_DATASET",
  "path": ["DQ Metrics","dataqueryplatform-dev-jsondata2","turik","data","\"uscities-1\""],
  "format": {
    "type": "Text",
    "fieldDelimiter": ",",
    "lineDelimiter": "\r\n",
    "quote": "\"",
    "comment": "#",
    "escape": "\"",
    "skipFirstLine": true,
    "extractHeader": true,
    "trimHeader": true,
    "autoGenerateColumnNames": true
  }
}

It may be a good idea to review how the API handles encoding of IDs. Thanks.
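
The combination that worked here can be sketched in Python as follows. This is a rough illustration of what is described in this post, not documented Dremio behavior: `build_promote_request` is a hypothetical helper, the host and port are placeholders, and stripping every double quote from the id before encoding is a simplification that assumes quotes only ever wrap the file name. The trimmed `format` object is for brevity only.

```python
import json
from urllib.parse import quote

def build_promote_request(base_url, file_entity, fmt):
    """Return (url, body) for POST /api/v3/catalog/{id} (hypothetical helper)."""
    entity_id = file_entity["id"]
    # Assumption drawn from this thread: the id in the URL carries no quotes
    # around the file name, so strip them before percent-encoding.
    url_id = quote(entity_id.replace('"', ""), safe="")
    body = {
        "entityType": "dataset",
        "id": entity_id,  # unencoded, exactly as returned by by-path
        "type": "PHYSICAL_DATASET",
        "path": file_entity["path"],
        "format": fmt,
    }
    # json.dumps escapes the embedded quotes as \" automatically.
    return f"{base_url}/api/v3/catalog/{url_id}", json.dumps(body)

file_entity = {
    "entityType": "file",
    "id": 'dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/"uscities-1"',
    "path": ["DQ Metrics", "dataqueryplatform-dev-jsondata2", "turik", "data", '"uscities-1"'],
}
fmt = {"type": "Text", "fieldDelimiter": ",", "extractHeader": True}

url, body = build_promote_request("https://localhost:9047", file_entity, fmt)
print(url)
print(body)
```

Building the body from a parsed by-path response and letting the JSON serializer handle escaping avoids hand-editing quotes in the id and path.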

@netmille,

So you had this URL:

/api/v3/catalog/dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2%2Fturik%2Fdata%2F%5C%22uscities-1%5C%22

with this “id” in the request body:

"id": "dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2%2Fturik%2Fdata%2F%5C%22uscities-1%5C%22"

and you changed these to:

/api/v3/catalog/dremio%3A%2FDQ%20Metrics%2Fdataqueryplatform-dev-jsondata2%2Fturik%2Fdata%2Fuscities-1

and

"id": "dremio:/DQ Metrics/dataqueryplatform-dev-jsondata2/turik/data/\"uscities-1\""

Correct?

@netmille
Even after removing the encoding within the URL, I still get a 404.

For anyone still struggling with this: I also struggled on 4.9.3 (Enterprise Edition), but the symptoms didn’t appear until I had spaces in the file name. What seems to work for me:

When generating the URL-encoded ID for the POST call, URL-encode everything except the file name (the last node of the path).
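
One possible reading of that workaround, as a Python sketch. `encode_id_except_file_name` is a hypothetical helper and the id below is made up for illustration; it assumes the separator before the file name is still encoded and that the file name is passed through untouched. Note that many HTTP clients will re-encode or reject a raw space in a URL, so the result may need to be sent with a client that leaves the path as given.

```python
from urllib.parse import quote

def encode_id_except_file_name(entity_id):
    """Percent-encode a catalog id but leave the last path node as-is."""
    # Split at the last "/": everything up to and including it is encoded,
    # the file name after it is appended unchanged.
    parent, sep, file_name = entity_id.rpartition("/")
    return quote(parent + sep, safe="") + file_name

# Illustrative id with a space in the file name (not from the thread).
print(encode_id_except_file_name("dremio:/My Source/folder/my file.csv"))
```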