There is currently a way to create a VDS using the CREATE VDS command,
and a way to create a table through CREATE TABLE AS SELECT (CTAS).
I would like to be able to replace a VDS, and also replace a physical table. Is there a command for that?
My use case is the following: I create a physical dataset using CTAS. I would like to periodically update this physical dataset (e.g. with new daily data) without losing the accelerations already created. The data fields would remain the same.
Thanks @kelly.
I’ve already implemented mechanisms such as the one you describe.
From your answer, I understand there is currently no command to change the definition of a VDS without removing it. I was hoping there was a command that mimics the “Save” button we use when updating a dataset.
Did you find any solution to refresh your CTAS-based dataset without recreating it? I have something similar: I saved the results in a parquet file in S3, but I need to append new data to it incrementally.
Hi @pbofonseca, not really.
I could not use Dremio for that: during my tests, CTAS was not able to create parquet files without also creating a Physical Dataset at the same time. Also, at the time, parquet files created by Dremio were not readable by my other Python tools (pandas, pyarrow). I think that for the last few versions, pyarrow and pandas should be able to read Dremio CTAS files, so maybe there’s a way to use CTAS.
What I currently do:
I have created a Dremio Physical Dataset (PDS) at the root.
To append new data to this dataset, I extract the data with Dremio into a pandas DataFrame and write it, on a daily basis, as a partitioned parquet dataset within the PDS.
After writing the parquet files, I execute the Dremio command “ALTER PDS {my_pds} REFRESH METADATA”, so that Dremio is able to see the files that have just been created.
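
For anyone who wants to try the same thing, here is a rough Python sketch of those steps. It is not my exact code, and several things are assumptions made for illustration: the `day=YYYY-MM-DD` folder naming, the `data.parquet` file name, the helper names, and the example PDS path. How you send the SQL statement to Dremio (ODBC, REST, etc.) is left out, and `append_daily_data` needs pandas with pyarrow installed.

```python
from datetime import date
from pathlib import Path


def daily_partition_dir(pds_root, day):
    # Hypothetical layout: one sub-folder per day inside the PDS folder.
    return Path(pds_root) / f"day={day.isoformat()}"


def refresh_statement(pds_path):
    # Builds the statement quoted above; it still has to be sent to Dremio.
    return f"ALTER PDS {pds_path} REFRESH METADATA"


def append_daily_data(df, pds_root, day):
    # df is a pandas DataFrame holding the new rows for that day.
    target = daily_partition_dir(pds_root, day)
    target.mkdir(parents=True, exist_ok=True)
    out_file = target / "data.parquet"  # assumed file name
    df.to_parquet(out_file, index=False)  # requires pyarrow (or fastparquet)
    return out_file


if __name__ == "__main__":
    # Example values only; replace with your own PDS folder and Dremio path.
    print(daily_partition_dir("/data/my_pds", date(2020, 1, 15)))
    print(refresh_statement('"my_source"."my_pds"'))
```

After `append_daily_data` has written the day's file, run the string returned by `refresh_statement` against Dremio so the new partition becomes visible.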