How to create a VDS on top of a Parquet folder (with partition columns)

I want to run a curl command to create a VDS on top of an HDFS Parquet folder path. Can you please let me know the curl command?
Inputs that might be needed:
HDFS Parquet folder path: /my/src/path
VDS path and view name: DEV/FRM/TEST_VDS
Dremio UI username/password: username/pwd
Dremio UI link: http://mcdevdremio:9766/

Can you please provide the command that creates the VDS DEV.FRM.TEST_VDS?
The Parquet folder has partition columns (date_yyyymmdd and catg), so the folder structure looks like this:
├── date_yyyymmdd=20120710
│   ├── catg=DIV
│   │   └── part-00108-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   ├── catg=IP
│   │   └── part-00191-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   └── catg=XT
│       └── part-00081-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
├── date_yyyymmdd=20120711
│   ├── catg=DIV
│   │   └── part-00091-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   ├── catg=IP
│   │   └── part-00150-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   └── catg=XT
│       └── part-00064-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet

Should the steps be:
1. Get a token for auth,
2. Create a PDS on top of the Parquet path with a curl command,
3. Create a VDS on top of the PDS with a curl command?
If yes, can you please send the command for each? Thanks.

@dolphinlei Yes, you need a token.

  • First, promote the folder to a PDS using the API, or enable “Automatically format files into physical datasets when users issue queries.” in the source metadata properties. The API is documented at Table | Dremio Documentation.

  • Then you can use the POST SQL API to create the VDS
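
The second bullet could look roughly like the sketch below. It is untested against a live server and rests on several assumptions: that the SQL endpoint is POST /api/v3/sql with a JSON body containing a "sql" field, that the promoted PDS is addressable as HDFS_SRC.my.src.path (a made-up source name and path), and that DREMIO_TOKEN already holds a valid token. The helper name build_vds_body is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: wrap a CREATE VDS statement in a JSON body for
# the SQL API. No JSON escaping is done, so this assumes neither name
# contains a double quote or a backslash.
build_vds_body() {
  # $1 = dotted VDS name, $2 = dotted PDS name
  printf '{"sql": "CREATE VDS %s AS SELECT * FROM %s"}\n' "$1" "$2"
}

# Assumed values; replace with your own.
DREMIO_URL="http://mcdevdremio:9766"
BODY=$(build_vds_body "DEV.FRM.TEST_VDS" "HDFS_SRC.my.src.path")
echo "$BODY"

# Uncomment to submit the statement (needs a reachable server and token):
# curl -X POST "$DREMIO_URL/api/v3/sql" \
#   -H 'Content-Type: application/json' \
#   -H "Authorization: $DREMIO_TOKEN" \
#   -d "$BODY"
```

Building the body in a function keeps the SQL text in one place, so the same helper can be reused for other views by changing the two names.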

Hi Balaji,
Can you please share the steps to generate a token for running those commands?

To generate a token, you need to use the Login API:

So, with curl, that would be:
curl http://mcdevdremio:9766/apiv2/login -X POST -H 'Content-Type: application/json' -d '{"userName":"username","password":"pwd"}'

You then extract the value of the token field from the response and prefix it with _dremio.

So your token ends up looking something like:
_dremiogrd44us5gkbuo01m2s61i80b7o
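
The extract-and-prefix step can be scripted. The following is a minimal sketch, not an official recipe: the helper name dremio_token_from_response is hypothetical, the sample response shape is assumed (only the "token" field matters), and sed is used in place of a real JSON parser, so it only works when the token value contains no escaped quotes.

```shell
#!/bin/sh
# Hypothetical helper: pull the "token" field out of a login response
# and prepend the _dremio prefix.
dremio_token_from_response() {
  printf '%s' "$1" |
    sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/_dremio\1/p'
}

# Assumed sample response; a real one carries more fields.
SAMPLE='{"token":"grd44us5gkbuo01m2s61i80b7o","userName":"username"}'
dremio_token_from_response "$SAMPLE"

# Against a live server this could be combined with the login call:
# RESPONSE=$(curl -s http://mcdevdremio:9766/apiv2/login -X POST \
#   -H 'Content-Type: application/json' \
#   -d '{"userName":"username","password":"pwd"}')
# DREMIO_TOKEN=$(dremio_token_from_response "$RESPONSE")
```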

@AndyH thanks for the response. I was able to generate a token: I got a value like "psk4dfrghtyu7f6n2tqfgctj", which gives my Dremio token "_dremiopsk4dfrghtyu7f6n2tqfgctj".

Is the command below the right way to create a PDS in Dremio?
curl -X POST http://mydremiohost:9700/api/v3/catalog/DEV/dl/STORE/PARQUET/SAMPLE_TEMP -H 'Content-Type: application/json' -H 'Authorization: Bearer _dremiopsk4dfrghtyu7f6n2tqfgctj' -d '
{
  "entityType": "dataset",
  "id": "dremio%3A%2FDEV%2Fdl%2FSTORE%2FPARQUET%2FSAMPLE_TEMP",
  "path": [
    "DEV",
    "dl",
    "STORE",
    "PARQUET",
    "SAMPLE_TEMP"
  ],
  "type": "PHYSICAL_DATASET",
  "format": {
    "type": "Parquet"
  }
}'

I ran it and got an “HTTP 404 Not Found” error. I am sure the Parquet path is correct and that there is a Parquet file there. Please advise.

Hi team, can you please advise?

Hi,

The 404 means that the URL you are POSTing to is invalid. When you POST a dataset to /api/v3/catalog the rest of the URL needs to be the ID of the dataset. You are currently putting in the path elements as the URL.

If you look at the API docs here: Table | Dremio Documentation
and scroll down to the “Example Request for Excel format type” section, you can see that to specify a dataset you need to turn its path into a URL-encoded string. The example in the docs is: dremio%3A%2FSamples%2Fsamples.dremio.com%2FDremio%20University%2Foracle-departments.xlsx

The value you currently have as “id” in the body should go in the URL instead.
E.g.: curl -X POST http://mydremiohost:9700/api/v3/catalog/dremio%3A%2FDEV%2Fdl%2FSTORE%2FPARQUET%2FSAMPLE_TEMP

Don’t include the id in the body. Following the body example in the docs, your body should be:

{
  "entityType": "dataset",
  "path": [
    "DEV",
    "dl",
    "STORE",
    "PARQUET",
    "SAMPLE_TEMP"
  ],
  "type": "PHYSICAL_DATASET",
  "format": {
    "type": "Parquet"
  }
}
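
To avoid hand-encoding the ID, the path elements can be turned into the encoded form with a small helper. This is a sketch under assumptions: the name dremio_dataset_id is hypothetical, and it only encodes '%', ':', '/' and spaces, which is enough for the paths in this thread but is not a general URL encoder.

```shell
#!/bin/sh
# Hypothetical helper: build the URL-encoded dataset ID from path parts.
# Only '%', ':', '/' and space are encoded; other reserved characters
# would need a full URL encoder.
dremio_dataset_id() {
  id="dremio:"
  for part in "$@"; do
    id="$id/$part"
  done
  # Encode '%' first so the escapes added afterwards are not re-encoded.
  printf '%s\n' "$id" |
    sed -e 's/%/%25/g' -e 's/:/%3A/g' -e 's|/|%2F|g' -e 's/ /%20/g'
}

dremio_dataset_id DEV dl STORE PARQUET SAMPLE_TEMP
# prints dremio%3A%2FDEV%2Fdl%2FSTORE%2FPARQUET%2FSAMPLE_TEMP
```

The printed value is what goes after /api/v3/catalog/ in the POST URL, while the unencoded elements stay in the "path" array of the body.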

Obviously, I’ve not tested the above, as I don’t have the same data structure, but let me know if that works.

Thanks, I have tried it and it works. One point: when running the curl command, I needed to remove “Bearer” from the Authorization header, so the command is:
curl -X POST 'http://mydremio:96600/api/v3/catalog/dremio%3A%2FDEV%2Fdl%2Ffin%2FSTORE%2FPARQUET%2FSAMPLE_TEST_DREMIO_CREATE33' -H 'Content-Type: application/json' -H 'Authorization: _dremio5ret45peugqwerftgtyuuh6788' -d '
{
  "entityType": "dataset",
  "id": "dremio%3A%2FDEV%2Fdl%2Ffin%2FSTORE%2FPARQUET%2FSAMPLE_TEST_DREMIO_CREATE33",
  "path": [
    "DEV",
    "dl",
    "fin",
    "STORE",
    "PARQUET",
    "SAMPLE_TEST_DREMIO_CREATE33"
  ],
  "type": "PHYSICAL_DATASET",
  "format": {
    "type": "Parquet"
  }
}' -vvv