Dremio Table Format as Parquet Dynamically

Dear Team,

I’m new bee to the Dremio and in exist system, a normal table will be created on daily basis. Once this process is done, manually we will change the table in Parquet format, set auto reflection of data on every 1 hour in settings and save it in our project folder with Custom name.

Hence we do the said process manually, we are facing some issues in automate the reports .

Kindly give us a solution to do this manual work by scripting or any other best methods.

Thank You,

WR,
Sathish R

@sathhishcse

Once the data is available in the datalake, you can do the below

Let Dremio discover the data, this can automatically done by the background refresh job, see the source settings, under the metadata tab (or) you can run a command “alter PDS refresh metadata”, documentation for both below


Next, you would want to refresh the reflection, for this you can use the API provided

Then you are ready to run your dashboards

Thanks
Bali

Hi Balaji,

We have a mandatory manual daily exercise in Dremio which is used to enable the reports. I have documented our activity and request you to guide us to automate the same using any scripts.

WR,

Sathish R

(Attachment Daily Dremio Activity.docx is missing)

Attachment as Zip file.

Daily Dremio Activity.zip (117 KB)

@sathhishcse

Here are the automation steps

  • To automatically promote a folder, you can check “Automatically format files into physical datasets when users issue queries” under the source metadata tab
  • Create reflection can be done via SQL (This will also trigger a refresh)

alter dataset BI.“p1_vds” CREATE AGGREGATE REFLECTION p1_vds_agg
using dimensions (c1,c2) measures (p1(count, min),
p2(sum, max,approximate count distinct))
partition by (c1) localsort by (c2)

http://docs.dremio.com/sql-reference/sql-commands/acceleration.html

Future refreshes can be done via the REST API (Note the id is the PDS ID and not the VDS or reflection ID, even if the reflection is on a VDS, the ID needs to be the ID of the PDS that the VDS is dependent of)

http://docs.dremio.com/rest-api/catalog/post-catalog-id-refresh.html

  • Saving the VDS can be done via SQL,

http://docs.dremio.com/sql-reference/sql-commands/datasets.html

Also it is better we first create the VDS first and turn on reflection on the VDS so you can take advantage of default reflections

http://docs.dremio.com/release-notes/48-release-notes.html#default-reflections

Hi Balaji,

Thanks for your guidance.

We have successfully automated the folder format option and due to some performance issue, handling reflection has been dropped off in our case.

Kindly help us to automate the below process

As discussed earlier, once the folder format is done, our next process is to save the data from Data lake to available spaces.

For Ex. Our Space structure is as below

Master

Application

Staging

As of now, to move the data from Data lake to spaces, manually we will select the save as option à Giving name for the table à select the Space path( for ex. Master à Staging) à Save.

We need to automate this process so that the full Dremio process will free from manual process.

While trying to automate with Python, due to version problem or don’t know the reason exactly can’t able to proceed with examples given in Dremio forum.

As we are very new to Dremio/API, I request you to give us the same code or sample steps to automate the same.

Kindly find our Dremio version detail as below.

2.1.1-dremio-202007281636460620-4925919 r4925919011d4a13cb3d9dc3c930f816300a6070e

Thanks in advance and kindly revert at the earliest.

WR,

Sathish R

@sathhishcse

Dremio 2.1 > 3 years old, the API cals have completely changed. Any reason you are on Dremio 2.1.1?

Thanks
Bali

Hi Balaji,

Thanks for your feedback.

Regarding Dremio 2.1.1 - Hence its handled by another team, we will let you know the reason.

Is the said process can be automated using latest Dremio Version ? If Yes, we will ask concern team to update the same.

Kindly suggest the compatible Dremio version and process.

WR,
Sathish R

@sathhishcse

We can automate in 2.1 too but it is a really old version and some of the API calls have changed

But to automate all this, you need to go through these

Hi Balaji,

We gone through the documentation available on Dremio forum and found by REST API mentod in Python code to achieve the VDS creation.

But as said earlier we cant find any examples with API Ver 2 and got 404 error when tried with apiv2 and got 401 error with api/v3(even authenticated).

Our requirement is as highlighted in below image.

Kindly find the attached Python script for your reference.

Dremio.zip (891 Bytes)

Got 404 error message
{“errorMessage”:“Something went wrong. Please check the log file for details, see https://docs.dremio.com/advanced-administration/log-files.html",“moreInfo”:"HTTP 404 Not Found”}

Kindly help us to meet our requirement and kindly ignore the mistakes in above script.

We found that the installed Dremio Version is “4.9.1-202010230218060541-2e764ed0” but while quering select version() in Dremio API, its showing V2.2. Whether we need to upgrade the API version to 3? Kindly feedback.

WR,
Sathish R

@sathhishcse

Not able to follow why you have use V2 API to create a virtual dataset, have you tried to us the below?

http://docs.dremio.com/rest-api/catalog/post-catalog.html

Hi Balaji,

This is regarding automation of creating a VDS in space from available dataset.

I have tried the below Python script for the automation but getting 401 error. Can you please help me out .
Script:

import json
import requests
import dremio_client

url = "http://localhost:9047/api/v3/catalog"
payload = "{\n  \"entityType\": \"dataset\",\n  \"path\": [\n    \"Wisely\",\n    \"Staging_C\",\n\t\"CDRTest-202102022\"\n  ],\n\t\"type\": \"VIRTUAL_DATASET\",\n\t\"sql\": \"select * from 19\",\n\t\"sqlContext\": [\"tbscitannldproddl01\", \"datalate\prod-compact\cdrin\2021\07\"]\n}"
headers = {
    'Authorization': "_dremioo8opojj6vn4ughkvcpalpr46d6",
    'Content-Type': "application/json",
    'cache-control': "no-cache",
    'Postman-Token': "04e10d12-e287-4052-8c0a-81108d6409a5"
    }

response = requests.request("POST", url, data=payload, headers=headers)
print(response)

Response is <Response [401]>

Thank You

WR,
Sathish R

@sathhishcse If you run the same script from a JDBC tool or the Dremio UI, does the VDS gets created?

Hi Balaji,

Sorry for the delayed response.

Can you please help me to run the above script in SQL Query panel.

WR,
Sathish R

@sathhishcse Open the failed profile and see what query is getting fired

Hi Balaji,

After running the above script, there is no entries found in Dremio jobs. Kindly do the needful.

WR,
Sathish R

How are you generating the token? 401 means the token is invalid or the expired.

Hi Doron,

Greetings.

Thanks for your response. I have copied the token and can you please share the steps to generate the Tokens for this requirement.

WR,
Sathish R

This is covered in our documentation under REST API Overview.

Hi Doron,

Regarding the last update, We have generated the Authorization key dynamically and tried. Now it shows Response:400. In headers, whether Postman-Token to be generated using Postman application? Kindly share any sample script to achieve the same.

Kindly find the attached snip for your reference.

WR,
Sathish R