Promoting file to dataset using REST API

Hi,
I am trying to use the Dremio REST API (through the Python package), and I am having a hard time getting my API call to work. I suspect the problem is in how I am quoting the name/path of the dataset, but I can't figure out exactly what is wrong:

Following the last comment in this post:

I’m using the following call:

```
curl --location --request POST 'http://eellsworthdev:9047/api/v3/catalog/dremio%3A%2F"Checkbook_cataloged_data"%2F"RxTerms"%2F"run_5930_2019-10-31-144718"%2Ffile_88163_RxTermsArchive201910.txt' \
--header 'Accept: */*' \
--header 'Accept-Encoding: gzip, deflate' \
--header 'Connection: keep-alive' \
--header 'Content-Length: 517' \
--header 'User-Agent: python-requests/2.25.1' \
--header 'content-type: application/json' \
--data-raw '{"entityType": "dataset",
"id": "dremio:/Checkbook_cataloged_data/RxTerms/\"run_5930_2019-10-31-144718\"/file_88163_RxTermsArchive201910.txt",
"path": ["\"Checkbook_cataloged_data\"",
"\"RxTerms\"",
"\"run_5930_2019-10-31-144718\"",
"file_88163_RxTermsArchive201910.txt"]
,
"type": "PHYSICAL_DATASET",
"format": {
"type": "Text",
"fieldDelimiter": "|",
"lineDelimiter": "\r\n",
"quote": "\"",
"comment": "#",
"escape": "\"",
"skipFirstLine": false,
"extractHeader": true,
"trimHeader": true,
"autoGenerateColumnNames": true
}
}'
```
This produces the following error:

```
Unrecognized token 'tru': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
 at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 17, column: 25] (through reference chain: com.dremio.dac.api.Dataset["format"])
```

I have tried every combination of URL encoding and quoting I can think of, and I get similar errors.
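A minimal sketch of building the catalog URL in Python. It assumes, as in the working CSV example later in this thread, that the id is dremio:/ followed by the path components joined with /, and that the components are plain folder and file names with no SQL-style double quotes around them. quote() with safe="" percent-encodes ':' and '/' as well; the function name catalog_url is only for illustration.

```python
from urllib.parse import quote


def catalog_url(base_url: str, path: list[str]) -> str:
    """Hypothetical helper: URL for POST /api/v3/catalog/{id} on a file path."""
    # The id is "dremio:/" plus the path components joined with "/".
    dataset_id = "dremio:/" + "/".join(path)
    # safe="" makes quote() encode ':' and '/' too, giving dremio%3A%2F...
    return f"{base_url}/api/v3/catalog/{quote(dataset_id, safe='')}"


# Assumption: path components are bare names, without embedded double quotes.
path = ["Checkbook_cataloged_data", "RxTerms",
        "run_5930_2019-10-31-144718", "file_88163_RxTermsArchive201910.txt"]
print(catalog_url("http://eellsworthdev:9047", path))
```

For Doron's example path (Samples, samples.dremio.com, zip_lookup.csv) the same function reproduces the URL used there, ending in dremio%3A%2FSamples%2Fsamples.dremio.com%2Fzip_lookup.csv.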

Does anyone have any suggestions on how to resolve this?

Thanks!
Eric

Here is an example of promoting a CSV file, generated by a REST API client tool. Note that it sends the body with --data rather than --data-raw:

```
curl --request POST \
  --url http://autorelease:9047/api/v3/catalog/dremio%3A%2FSamples%2Fsamples.dremio.com%2Fzip_lookup.csv \
  --header 'Authorization: _dremio613ak71ujlg94pgq8748o8m9t5' \
  --header 'Content-Type: application/json' \
  --data '{
  "entityType": "dataset",
  "type": "PHYSICAL_DATASET",
  "path": [
    "Samples",
    "samples.dremio.com",
    "zip_lookup.csv"
  ],
  "format": {
    "type": "Text",
    "fullPath": [
      "Samples",
      "samples.dremio.com",
      "zip_lookup.csv"
    ],
    "ctime": 0,
    "isFolder": false,
    "location": "/samples.dremio.com/zip_lookup.csv",
    "fieldDelimiter": ",",
    "skipFirstLine": false,
    "extractHeader": false,
    "quote": "\"",
    "comment": "#",
    "escape": "\"",
    "lineDelimiter": "\r\n",
    "autoGenerateColumnNames": true,
    "trimHeader": true
  }
}'
```
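The same request can be sketched in Python, which is how the original question drives the API. This version uses the standard library's urllib.request instead of the requests package so it has no dependencies; the host and token are the placeholders from the curl example, so substitute your own. Because the body comes from json.dumps, the escaped quote characters, the \r\n delimiter, and the bare true/false literals come out correctly without any shell quoting, and Content-Length is computed from the encoded body when the request is sent.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://autorelease:9047"         # host from the curl example; replace with yours
TOKEN = "_dremio613ak71ujlg94pgq8748o8m9t5"  # Authorization value from the example; replace with yours

path = ["Samples", "samples.dremio.com", "zip_lookup.csv"]
payload = {
    "entityType": "dataset",
    "type": "PHYSICAL_DATASET",
    "path": path,
    "format": {
        "type": "Text",
        "fullPath": path,
        "ctime": 0,
        "isFolder": False,  # json.dumps writes Python False as the JSON literal false
        "location": "/samples.dremio.com/zip_lookup.csv",
        "fieldDelimiter": ",",
        "skipFirstLine": False,
        "extractHeader": False,
        "quote": '"',       # serialized as "\"" in the JSON body
        "comment": "#",
        "escape": '"',
        "lineDelimiter": "\r\n",
        "autoGenerateColumnNames": True,
        "trimHeader": True,
    },
}

# The catalog id is "dremio:/" plus the path, percent-encoded including ':' and '/'.
dataset_id = quote("dremio:/" + "/".join(path), safe="")
request = urllib.request.Request(
    f"{BASE_URL}/api/v3/catalog/{dataset_id}",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": TOKEN, "Content-Type": "application/json"},
    method="POST",
)

# urllib adds Content-Length from len(request.data) when the request is opened.
# Uncomment to send it to a running Dremio server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

With the requests package, passing json=payload to requests.post has the same effect: the library serializes the body and sets Content-Length itself.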

It might be that --data-raw requires true to be written as "true" in the JSON?
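JSON booleans are the bare literals true and false, so quoting them would turn them into strings rather than fix the parse. The 'tru' in the error hints at a different cause. One plausible explanation, not confirmed in this thread, is the hard-coded Content-Length: 517 header in the first curl call: curl sends a user-supplied Content-Length instead of computing its own, so if the body is longer than 517 bytes the server reads only a prefix of it, and line 17 of that body is "extractHeader": true. The snippet below illustrates both points with Python's json module; the sample body is made up for the demonstration.

```python
import json

# Python True/False serialize to the bare JSON literals; a quoted "true" is a string.
print(json.dumps({"extractHeader": True}))    # {"extractHeader": true}
print(json.dumps({"extractHeader": "true"}))  # {"extractHeader": "true"}

# Simulate a body cut off part-way through a literal, as a too-small
# Content-Length would do on the server side.
body = '{\n"skipFirstLine": false,\n"extractHeader": true\n}'
truncated = body[: body.index("true") + 3]  # ends with the partial token "tru"

try:
    json.loads(truncated)
    parse_error = None
except json.JSONDecodeError as err:
    parse_error = err.msg  # wording differs from Jackson's, but parsing fails the same way
print("parse failed:", parse_error)
```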

Hi @doron,
Thanks for the reply. I was comparing your request with the documentation at:
https://docs.dremio.com/rest-api/catalog/post-catalog-id/
and I noticed some differences:

  • “fullPath” appears as an element of format but is a top-level element in the docs
  • “ctime” is not mentioned in the docs
  • the docs mention “id” but not “location” (this seems to work either way).

Once I added fullPath and ctime, things seemed to work.
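The thread doesn't say exactly where fullPath and ctime were added, so the following is only a sketch that follows the shape of Doron's working example, with both fields inside format, applied to the pipe-delimited RxTerms file from the question. text_promotion_payload is a hypothetical helper, and writing the path components without embedded double quotes is an assumption based on the CSV example.

```python
import json


def text_promotion_payload(path, field_delimiter=",", extract_header=False):
    """Hypothetical helper: body for POST /api/v3/catalog/{id} on a text file,
    shaped like the working CSV example (fullPath and ctime inside format)."""
    return {
        "entityType": "dataset",
        "type": "PHYSICAL_DATASET",
        "path": list(path),
        "format": {
            "type": "Text",
            "fullPath": list(path),
            "ctime": 0,
            "fieldDelimiter": field_delimiter,
            "lineDelimiter": "\r\n",
            "quote": '"',
            "comment": "#",
            "escape": '"',
            "skipFirstLine": False,
            "extractHeader": extract_header,
            "trimHeader": True,
            "autoGenerateColumnNames": True,
        },
    }


# Assumption: bare component names, no SQL-style quoting inside the strings.
path = ["Checkbook_cataloged_data", "RxTerms",
        "run_5930_2019-10-31-144718", "file_88163_RxTermsArchive201910.txt"]
payload = text_promotion_payload(path, field_delimiter="|", extract_header=True)
print(json.dumps(payload, indent=2))
```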

I appreciate the help.

Eric