A way to replace the definition of a VDS?

Hi,

there is currently a way to create a VDS using command CREATE VDS,
and there is a way to create a table through CREATE TABLE AS SELECT (CTAS).

I would like to be able to replace a VDS and also replace a physical table ? Is there a command for that ?

My use case is the following : I create a physical dataset using CTAS. I would like to periodically update this physical dataset (eg with new daily data) without loosing the accelerations already created. The data fields would remain the same.

Thanks
David

What you might be able to do is to partition your data by time using CTAS:

/mytable/day1/files.parquet
/mytable/day2/files.parquet
etc

Then have a VDS that sits on top of /mytable.

This would only work in an append-only model, no ability to update individual rows, but you could re-run the CTAS for a given day.

Does that help?

Thanks @kelly.
I’ve already implemented mechanisms such as the one you describe.

From your answer, I understand there is currently no command to change the definition of a VDS, without removing it. I was hoping there was a command to mimic the “Save” button that we use when updating a dataset.

Hi, @dfleckinger

Did you find any solution to refresh without recreate your dataset based on CTAS? I have something like that. I saved the results in a parquet file in S3, but I need to increment it.

hi @pbofonseca, not really.
I could not use Dremio for that, as during my tests CTAS was not able to create parquet files without also creating a Physical dataset at the same time. Also at the time, parquet files created by Dremio were not readable by my other python tools (pandas, pyarrow). I think that since a few versions, pyarrow pandas should be able to read Dremio CTAS files so may be there’s a way to use CTAS.

What I currently do :

  • I have created a Dremio Physical Dataset (PDS) at the root.
  • For new data to be appended to this dataset, I extract data to a pandas DataFrame with Dremio, and write it as a parquet partitioned dataset on a daily basis within the PDS
  • After parquet writing, I execute the Dremio command “ALTER PDS {my_pds} REFRESH METADATA”, so that Dremio is able to see the files that have just been created.
1 Like

Also I recently saw that there is now a way ro replace a VDS :

https://docs.dremio.com/sql-reference/sql-commands/datasets.html

> CREATE OR REPLACE VDS <VIRTUAL-DATASET-PATH> AS <SQL-QUERY>

For example
> CREATE OR REPLACE VDS demo.jobs_vds as SELECT * FROM "oracle_e2e".DREMIO.JOBS

I don’t know which version introduced that but that’s a good new !

1 Like

I’ll try this!

I think that is my last step. Thanks a lot!