Struggling to update Dataset via API

Hi All

I am trying to set the accelerationRefreshPolicy method of a promoted dataset to INCREMENTAL.

However, I am getting the following error: "Missing type id when trying to resolve subtype of [simple type, class com.dremio.dac.api.CatalogEntity]: missing type id property 'entityType'\n at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 1]"

I realise the fix seems simple enough: specify the entityType for the dataset. But when I check the body, the entityType is already set.

My steps so far have been:

  1. Send a GET to the "catalog/by-path/" endpoint to retrieve the dataset profile.
  2. Check the accelerationRefreshPolicy method in the retrieved profile (NULL in this case, for some reason).
  3. Add the accelerationRefreshPolicy structure (refreshPeriodMs, gracePeriodMs and method) to the dataset profile.
  4. Send the update with a PUT to api/v3/catalog/{id}.
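The four steps above can be sketched in Python with only the standard library. This is a minimal sketch, assuming Dremio at http://localhost:9047 (as used later in this thread) and an auth token you already hold. The helper names (`by_path_url`, `with_refresh_policy`, `get_entity`, `put_entity`) are hypothetical, and the `_dremio` token prefix in the Authorization header is an assumption about your auth setup.

```python
import copy
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:9047/api/v3"  # assumed local Dremio instance


def by_path_url(path):
    """Step 1: build the catalog/by-path URL, percent-encoding each path segment."""
    return BASE + "/catalog/by-path/" + "/".join(
        urllib.parse.quote(segment, safe="") for segment in path
    )


def with_refresh_policy(entity, refresh_ms, grace_ms, method="INCREMENTAL"):
    """Steps 2-3: return a copy of the entity with accelerationRefreshPolicy filled in."""
    updated = copy.deepcopy(entity)
    policy = updated.get("accelerationRefreshPolicy") or {}  # may come back as null (None)
    policy.update(
        {
            "refreshPeriodMs": int(refresh_ms),  # millisecond counts sent as integers
            "gracePeriodMs": int(grace_ms),
            "method": method,
        }
    )
    updated["accelerationRefreshPolicy"] = policy
    return updated


def _call(method, url, token, body=None):
    """Send one request; the body, if any, is serialized to JSON exactly once."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", "_dremio" + token)  # assumed token format
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_entity(path, token):
    return _call("GET", by_path_url(path), token)


def put_entity(entity, token):
    """Step 4: PUT the full, modified entity back to catalog/{id}."""
    url = BASE + "/catalog/" + urllib.parse.quote(entity["id"], safe="")
    return _call("PUT", url, token, entity)
```

Usage would look like `put_entity(with_refresh_policy(get_entity(["nas", "folder"], token), 3600000, 10800000), token)`. The whole entity is sent back unchanged apart from the policy, so fields such as `tag` and `entityType` come straight from the GET response.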

When I inspect the dataset profile object, I do see that entityType is set to dataset. Am I missing something obvious?

Thanks for any help as always.

Can you provide the PUT body?

Hi doron

Attached is the body:

{
  "entityType": ["dataset"],
  "id": ["a68113f1-49e4-496d-90a6-86510817454f"],
  "type": ["PHYSICAL_DATASET"],
  "path": ["dmpsto", "rawdata", "instrumentstatic", "instrumentstatic-asisa", "v2.6"],
  "createdAt": ["2020-03-10T06:19:08.045Z"],
  "tag": ["0"],
  "format": {
    "type": ["Text"],
    "ctime": [0],
    "isFolder": [true],
    "location": ["/invaccdata/raw/instrumentstatic/instrumentstatic-asisa/v2.6"],
    "fieldDelimiter": [","],
    "skipFirstLine": [false],
    "extractHeader": [true],
    "quote": ["\""],
    "comment": ["#"],
    "escape": ["\""],
    "lineDelimiter": ["\n"],
    "autoGenerateColumnNames": [true],
    "trimHeader": [true]
  },
  "approximateStatisticsAllowed": [false],
  "fields": [
    {"name": ["AssetManagerCode"], "type": {"name": ["VARCHAR"]}},
    {"name": ["AssetManagerName"], "type": {"name": ["VARCHAR"]}},
    {"name": ["ReportStartDate"], "type": {"name": ["VARCHAR"]}},
    {"name": ["ReportEndDate"], "type": {"name": ["VARCHAR"]}},
    {"name": ["ValuationDate"], "type": {"name": ["VARCHAR"]}}
  ],
  "accelerationRefreshPolicy": {
    "refreshPeriodMs": [86400000],
    "gracePeriodMs": [86400000],
    "method": ["INCREMENTAL"],
    "refreshField": [""]
  }
}

I just noticed that when I retrieve the dataset details via catalog/by-path/, I get the folder profile back instead of the promoted dataset profile:

{
  "entityType": ["folder"],
  "id": ["dremio:/dmpsto/rawdata/instrumentstatic/instrumentstatic-asisa/\"v2.6\""],
  "path": ["dmpsto", "rawdata", "instrumentstatic", "instrumentstatic-asisa", "\"v2.6\""],
  "children": []
}

Checking the catalog/by-path/ results one level up, I can see this folder's promoted status and its ID.

Is this the expected result? I am just wondering how to use this to check if the folder was promoted previously…
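One way to answer the "was this promoted?" question from a single by-path response is to look at `entityType` and `type`: the promoted folder comes back as `"dataset"` / `"PHYSICAL_DATASET"`, while an unpromoted one comes back as `"folder"`. Note that the folder response above has literal quotes around v2.6 in its path, so one possibility worth checking is whether the last segment was sent wrapped in SQL-style quotes. The sketch below is hypothetical (`is_promoted_dataset` and `strip_sql_quotes` are not Dremio functions) and assumes the field values seen in this thread.

```python
def is_promoted_dataset(entity):
    """True if a catalog/by-path response describes a promoted (physical) dataset."""
    return entity.get("entityType") == "dataset" and entity.get("type") == "PHYSICAL_DATASET"


def strip_sql_quotes(segment):
    """Remove one pair of surrounding double quotes, e.g. '"v2.6"' -> 'v2.6'.

    Assumption: by-path takes each path element as its own URL segment, so a
    dotted name like v2.6 should not need SQL-style quoting there.
    """
    if len(segment) >= 2 and segment[0] == segment[-1] == '"':
        return segment[1:-1]
    return segment
```

For example, `[strip_sql_quotes(s) for s in path]` could be applied before building the by-path URL, and the result passed to `is_promoted_dataset` to distinguish a promoted folder from a plain one.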

If I do a GET for http://localhost:9047/api/v3/catalog/by-path/nas/folder, I see it as a dataset.

Regarding your body, the problem seems to be that every value is wrapped in an array ("entityType":["dataset"]). It should be "entityType":"dataset". An example request:

{
  "entityType": "dataset",
  "id": "4fe86a74-7ca0-4a83-92c9-615baa4bcc71",
  "type": "PHYSICAL_DATASET",
  "path": [
    "nas",
    "folder"
  ],
  "tag": "KBE+hBbcQMA=",
  "accelerationRefreshPolicy": {
    "refreshPeriodMs": 3600000,
    "gracePeriodMs": 10800000,
    "method": "INCREMENTAL"
  },
  "format": {
    "type": "Text",
    "ctime": 0,
    "isFolder": true,
    "location": "/Users/doron/mystuff/data/folder",
    "fieldDelimiter": ",",
    "skipFirstLine": false,
    "extractHeader": false,
    "quote": "\"",
    "comment": "#",
    "escape": "\"",
    "lineDelimiter": "\r\n",
    "autoGenerateColumnNames": true,
    "trimHeader": true
  },
  "accessControlList": {}
}
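To catch array-wrapped values like "entityType":["dataset"] before sending, a small Python check can flag one-element arrays where the API expects a scalar. This is a sketch: the set of keys allowed to hold arrays (`path`, `fields`) is an assumption drawn from the example bodies in this thread, not a complete list from the Dremio docs. If the body is built in R with jsonlite, `toJSON(..., auto_unbox = TRUE)` is the usual way to avoid this boxing.

```python
LIST_KEYS = {"path", "fields"}  # assumption: keys whose values really are arrays


def find_boxed_scalars(obj, prefix=""):
    """Return key paths whose value is a one-element list that should be a scalar."""
    problems = []
    for key, value in obj.items():
        where = prefix + key
        if isinstance(value, dict):
            problems.extend(find_boxed_scalars(value, where + "."))
        elif isinstance(value, list):
            if key not in LIST_KEYS and len(value) == 1:
                problems.append(where)
            # Recurse into objects inside arrays, e.g. each entry of "fields".
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    problems.extend(find_boxed_scalars(item, f"{where}[{i}]."))
    return problems
```

Run against the first body in this thread it would report `entityType`, `id`, `format.type` and so on, while the example request above returns an empty list.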

Hi doron

Apologies, I missed that the R JSON parser I used to print the object showed the values as arrays. Below is the body that the Python script generates, which is what is sent in the PUT:

{
  "entityType": "dataset",
  "id": "a68113f1-49e4-496d-90a6-86510817454f",
  "type": "PHYSICAL_DATASET",
  "path": ["instrumentstatic", "v2.6"],
  "createdAt": "2020-03-10T06:19:08.045Z",
  "tag": "54",
  "format": {
    "type": "Text",
    "ctime": 0,
    "isFolder": true,
    "location": "/instrumentstatic/v2.6",
    "fieldDelimiter": ",",
    "skipFirstLine": false,
    "extractHeader": true,
    "quote": "\"",
    "comment": "#",
    "escape": "\"",
    "lineDelimiter": "\n",
    "autoGenerateColumnNames": true,
    "trimHeader": true
  },
  "approximateStatisticsAllowed": false,
  "fields": [{"name": "AssetManagerCode", "type": {"name": "VARCHAR"}}],
  "accelerationRefreshPolicy": {
    "refreshPeriodMs": 86400000.0,
    "gracePeriodMs": 86400000.0,
    "method": "INCREMENTAL"
  }
}
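If the printed body looks right but the server still reports a missing entityType at line 1, column 1, one possibility is that the bytes reaching Dremio are not this JSON object, for example because the dict was serialized twice (with the `requests` library, passing `json=json.dumps(body)` instead of `json=body` sends a JSON string). The sketch below builds the PUT with the standard library so the request can be inspected before sending. The helper name `build_put_request` and the `_dremio` token prefix are assumptions; coercing 86400000.0 to 86400000 is a precaution, since the periods are millisecond counts.

```python
import json
import urllib.request


def build_put_request(base_url, entity, token):
    """Build (but do not send) a PUT to {base_url}/catalog/{id} with a JSON body."""
    body = dict(entity)  # shallow copy so the caller's dict is left untouched
    policy = body.get("accelerationRefreshPolicy")
    if policy:
        # Replace the nested dict with a new one where floats become ints.
        body["accelerationRefreshPolicy"] = {
            k: int(v) if isinstance(v, float) else v for k, v in policy.items()
        }
    data = json.dumps(body).encode("utf-8")  # serialize exactly once
    req = urllib.request.Request(
        f"{base_url}/catalog/{entity['id']}", data=data, method="PUT"
    )
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", "_dremio" + token)  # assumed token format
    return req
```

Printing `req.data` shows exactly what will be sent; `urllib.request.urlopen(req)` then performs the PUT.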

And you are sending the PUT to api/v3/catalog/a68113f1-49e4-496d-90a6-86510817454f?

Hi doron

Apologies for the late response.

The put is to:

http://localhost:9047/api/v3/catalog/a68113f1-49e4-496d-90a6-86510817454f

I suspect that your original note regarding the JSON format might have been on to something. I am checking the underlying function that does the PUT on my side and will let you know if the problem persists.

Thanks for all the help.