Creating / altering datasets with SQL

There is some documentation here for working with datasets via SQL. Is this documentation complete? There is no documented way to format a PDS via SQL.

@Joe

There are two options:

  • For file system sources, edit the source, open the Metadata tab, and check “Automatically format files into physical datasets when users issue queries”. When a user then issues a query, Dremio should format the dataset automatically. If the files are CSV, keep an eye on the default CSV options: a default such as comma as the delimiter may not apply to your files, and the promotion would then produce wrong data. With JSON or Parquet you should be fine.
  • Use the REST API (Promote Dataset).
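
For the REST option, a minimal Python sketch of what the promote call might look like. It only builds the request and does not send it. The catalog endpoint shape, the "dremio:/..." id convention, and the body fields reflect my understanding of the v3 catalog API and should be checked against the REST reference for your Dremio version. The helper name build_promote_request, the host, the source path, and the auth header value are all hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_promote_request(base_url, auth_header, path, fmt):
    """Build (but do not send) a request promoting a file/folder to a physical dataset.

    base_url    -- e.g. "http://localhost:9047" (hypothetical host)
    auth_header -- full Authorization header value (scheme depends on your Dremio version)
    path        -- list of path components, starting with the source name
    fmt         -- format object, e.g. {"type": "Parquet"}
    """
    # Assumption: an unpromoted file is addressed by its URL-encoded
    # "dremio:/source/folder/file" path.
    entity_id = quote("dremio:/" + "/".join(path), safe="")
    body = {
        "entityType": "dataset",
        "id": entity_id,
        "path": path,
        "type": "PHYSICAL_DATASET",
        "format": fmt,
    }
    return Request(
        url=f"{base_url}/api/v3/catalog/{entity_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_promote_request(
        "http://localhost:9047",
        "Bearer <token>",
        ["mysource", "reports", "nightly.parquet"],
        {"type": "Parquet"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. For CSV files, the explicit delimiter settings mentioned above would go into the format object instead of relying on defaults.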

Hi Balaji,

I tried promoting a Parquet file to a CSV file, but it only ran a query. How do I convert Parquet files to CSV files? Or is there an option I can use when creating Parquet (a CTAS option) that would do this?

Thanks
Wayne

@waynekoepcke Using Dremio, you can only convert CSV files to Parquet with CTAS, not the other way around. Is there a reason you want to convert Parquet to CSV? Parquet is more performant.

Hi Balaji,

We are currently using Apache Drill to create our data lake. We use Dremio to view it.
The data lake is composed of Parquet files. However, there are additional reports (CSV files)
that we generate out of that data lake on a nightly basis at the request of our customers.

I am tasked with retiring Drill, replacing it with Dremio for all write operations.
I’m looking for the best way to replace this piece of functionality. Currently we are using the Drill “set store.format = ‘csv’” feature with a CTAS statement.

Do you have any suggestions?
We are running inside a JVM when this operation happens.

Thanks
Wayne

@waynekoepcke Writing CSVs is not performant, so your best bet is to write Parquet in the first place. If that is not possible, you can do a CTAS in Dremio on the CSV file or create reflections on top of the CSV file.

Hi Balaji,

Thanks for the response. Though I am not sure that was my question.

Just to clarify: we have Parquet files, and we are going to start using Dremio to create those files (we are replacing Apache Drill). We also create some CSV files by querying those Parquet files. These are nightly reports we send to our customers.

I am wondering if there is a way to automate the creation of CSV files?

Thanks
Wayne

@waynekoepcke Currently, no.
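
Since Dremio does not write the CSV itself, one possible client-side workaround is to run the report query from a client and serialize the returned rows to CSV there. The Python sketch below covers only the serialization step. The helper name rows_to_csv and the sample rows are hypothetical, and fetching the rows from Dremio (for example over JDBC or ODBC) is assumed and not shown.

```python
import csv
import io


def rows_to_csv(columns, rows, delimiter=","):
    """Serialize query result rows to CSV text, with a header line first."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields that contain the
    # delimiter, a quote character, or a line break.
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical rows standing in for a nightly report query result.
    sample = [(1, "Acme, Inc."), (2, "Globex")]
    text = rows_to_csv(["customer_id", "customer_name"], sample)
    with open("nightly_report.csv", "w", newline="") as fh:
        fh.write(text)
```

A scheduler such as cron could run a script like this nightly. The same approach carries over to a JVM client using a JDBC result set.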

OK - Thanks for the clarification