I want to use curl to create a VDS on top of an HDFS Parquet folder. Could you let me know the curl commands?
Inputs that may be needed:
HDFS Parquet folder path: /my/src/path
VDS path and view name: DEV/FRM/TEST_VDS
Dremio UI username/password: username/pwd
Dremio UI link: http://mcdevdremio:9766/
Can you please provide the commands that create the VDS DEV.FRM.TEST_VDS?
The Parquet folder has partition columns (date_yyyymmdd and catg), so the folder structure looks like this:
├── date_yyyymmdd=20120710
│   ├── catg=DIV
│   │   └── part-00108-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   ├── catg=IP
│   │   └── part-00191-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   └── catg=XT
│       └── part-00081-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
├── date_yyyymmdd=20120711
│   ├── catg=DIV
│   │   └── part-00091-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   ├── catg=IP
│   │   └── part-00150-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
│   └── catg=XT
│       └── part-00064-7340bbf6-43a4-4574-a9bd-f2eb8c0c929c.c000.snappy.parquet
Should the steps be:
1. Get a token for authentication.
2. Create a PDS on top of the Parquet path with a curl command.
3. Create a VDS on top of the PDS with a curl command.
If so, could you please send the command for each? Thanks.
First, promote the folder to a PDS, either with the API or by enabling “Automatically format files into physical datasets when users issue queries.” in the source's metadata properties. The API is documented here: Table | Dremio Documentation
Then you can use the POST SQL API (/api/v3/sql) to run a statement that creates the VDS.
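A minimal sketch of that second step with curl, assuming a POSIX shell. It assumes the HDFS source is registered in Dremio under the hypothetical name `myhdfs`, so the promoted folder /my/src/path would be addressed as `myhdfs."my"."src"."path"`. The `CREATE VDS` keyword is what older Dremio releases document (newer ones use `CREATE VIEW`), and the DEV space and FRM folder are assumed to exist already. All helper names are illustrative, not part of Dremio.

```shell
#!/bin/sh
# Sketch: create a VDS through Dremio's SQL REST endpoint (POST /api/v3/sql).
# Helper names are hypothetical; switch CREATE VDS to CREATE VIEW if your
# Dremio version expects that keyword.

# Escape backslashes and double quotes so a single-line SQL string fits in JSON.
# Multi-line SQL is not handled (raw newlines would make the JSON invalid).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for /api/v3/sql: {"sql":"..."}
sql_body() {
  printf '{"sql":"%s"}' "$(json_escape "$1")"
}

# $1 = VDS path in dot notation, $2 = source dataset in dot notation
create_vds_sql() {
  printf 'CREATE VDS %s AS SELECT * FROM %s' "$1" "$2"
}

# $1 = Dremio base URL, $2 = Authorization value (_dremio<token>), $3 = SQL text
# Prints the JSON response, which should contain the id of the submitted job.
run_sql() {
  curl -sS -X POST "$1/api/v3/sql" \
    -H 'Content-Type: application/json' \
    -H "Authorization: $2" \
    -d "$(sql_body "$3")"
}
```

For example, with TOKEN holding the _dremio-prefixed token: `run_sql http://mcdevdremio:9766 "$TOKEN" "$(create_vds_sql DEV.FRM.TEST_VDS 'myhdfs."my"."src"."path"')"`. The statement runs as an asynchronous job, so the returned job id can be checked with GET /api/v3/job/{id}.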
To generate a token, you need to use the Login API:
With curl, that would be: curl http://mcdevdremio:9766/apiv2/login -X POST -H 'Content-Type: application/json' -d '{"userName":"username","password":"pwd"}'
Then extract the value of the token field from the response and prefix it with _dremio,
so your token ends up looking something like: _dremiogrd44us5gkbuo01m2s61i80b7o
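A small sketch of the login-and-prefix step, assuming a POSIX shell with sed (so jq is not required). As I understand Dremio's docs, this login token is sent as `Authorization: _dremio<token>`; treat that header form as something to confirm against your version. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: log in via /apiv2/login and build the _dremio-prefixed token.
# Helper names are hypothetical.

# Read a login response on stdin and print the value of its "token" field.
# Assumes the token itself contains no double quotes.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Prefix a raw token with _dremio for use in the Authorization header.
auth_header_value() {
  printf '_dremio%s' "$1"
}

# $1 = Dremio base URL, $2 = user name, $3 = password; prints the raw token.
# Credentials are inserted into the JSON unescaped, so a password containing
# a double quote or backslash would need escaping first.
dremio_login() {
  curl -sS -X POST "$1/apiv2/login" \
    -H 'Content-Type: application/json' \
    -d "{\"userName\":\"$2\",\"password\":\"$3\"}" | extract_token
}
```

Usage: `TOKEN=$(auth_header_value "$(dremio_login http://mcdevdremio:9766 username pwd)")`, then pass `-H "Authorization: $TOKEN"` on later requests.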
@AndyH Thanks for the response. I can now generate a token:
I get a token like “psk4dfrghtyu7f6n2tqfgctj”, so my Dremio token is “_dremiopsk4dfrghtyu7f6n2tqfgctj”.
Is the command below the right way to create a PDS in Dremio?
curl -X POST http://mydremiohost:9700/api/v3/catalog/DEV/dl/STORE/PARQUET/SAMPLE_TEMP -H 'Content-Type: application/json' -H 'Authorization: Bearer _dremiopsk4dfrghtyu7f6n2tqfgctj' -d '
{
  "entityType": "dataset",
  "id": "dremio%3A%2FDEV%2Fdl%2FSTORE%2FPARQUET%2FSAMPLE_TEMP",
  "path": [
    "DEV",
    "dl",
    "STORE",
    "PARQUET",
    "SAMPLE_TEMP"
  ],
  "type": "PHYSICAL_DATASET",
  "format": {
    "type": "Parquet"
  }
}'
I ran it and got an “HTTP 404 Not Found” error. I am sure the Parquet path is correct and that there are Parquet files there. Please advise.
The 404 means that the URL you are POSTing to is invalid. When you POST a dataset to /api/v3/catalog, the rest of the URL must be the ID of the dataset; you are currently using the path elements as the URL instead.
If you look at the API docs here: Table | Dremio Documentation
and scroll down to the “Example Request for Excel format type” section, you can see that to specify a dataset you need to transform its path into a URL-encoded string. The example in the docs is: dremio%3A%2FSamples%2Fsamples.dremio.com%2FDremio%20University%2Foracle-departments.xlsx
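A short sketch of that path-to-id transformation, assuming a POSIX shell and ASCII path elements. It follows the format visible in the docs example (`dremio:/` followed by the path elements joined with `/`, then percent-encoded); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: turn dataset path elements into the URL-encoded catalog id used in
# POST /api/v3/catalog/{id}. Assumes ASCII path elements.

# Percent-encode a string, keeping only RFC 3986 unreserved characters as-is.
urlencode() {
  _s=$1 _out=
  while [ -n "$_s" ]; do
    _c=${_s%"${_s#?}"}   # first character
    _s=${_s#?}           # remaining characters
    case $_c in
      [A-Za-z0-9._~-]) _out=$_out$_c ;;
      *) _out=$_out$(printf '%%%02X' "'$_c") ;;
    esac
  done
  printf '%s' "$_out"
}

# Join the path elements as dremio:/elem1/elem2/... and URL-encode the result.
catalog_id() {
  _joined=
  for _p in "$@"; do
    _joined="$_joined/$_p"
  done
  urlencode "dremio:$_joined"
}
```

For instance, `catalog_id DEV dl STORE PARQUET SAMPLE_TEMP` yields the same id that appears in the question's request body, and `catalog_id Samples samples.dremio.com 'Dremio University' oracle-departments.xlsx` reproduces the docs example.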
The value you currently have as “id” in the body is what should go in the URL.
E.g.: curl -X POST http://mydremiohost:9700/api/v3/catalog/dremio%3A%2FDEV%2Fdl%2FSTORE%2FPARQUET%2FSAMPLE_TEMP
Don’t include the id in the body; follow the body example in the docs, so your body should be: