Regarding join operation from two different sources

akanksha · February 22, 2021, 6:20am

Suppose we take one dataset from MySQL and one dataset from Hive and join them. Can we then save the resulting dataset back to one of the sources, such as MySQL, Hive, or HDFS?

balaji.ramaswamy · February 23, 2021, 4:50am

@akanksha

You simply have to save the SQL as a VDS (virtual dataset) under a space. Whenever you query the VDS, the join will take place automatically.

http://docs.dremio.com/sql-reference/sql-commands/datasets.html#managing-virtual-datasets
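
As a rough sketch of what saving the join as a VDS could look like in SQL: the source names (mysql_src, hive_src), the space (myspace), and all table and column names below are placeholders invented for illustration, so substitute the names from your own environment.

```sql
-- Hypothetical names: mysql_src, hive_src, and myspace are placeholders,
-- as are the tables and columns. Adjust them to your own sources.
CREATE VDS myspace.customer_orders AS
SELECT c.customer_id, c.name, o.order_id, o.amount
FROM mysql_src.sales.customers AS c
JOIN hive_src."default".orders AS o
  ON c.customer_id = o.customer_id
```

Nothing is copied when the VDS is created. The join against MySQL and Hive runs each time the VDS is queried.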

akanksha · February 23, 2021, 5:11am

@balaji.ramaswamy
Can we take a virtual dataset and save it to a data source such as HDFS, Hive, or MySQL?
It should be a two-way process. If it is not, can you please clarify this point?

balaji.ramaswamy · February 23, 2021, 5:24am

@akanksha

What is the reason you want to store the result back on Hive or HDFS? Is it to be consumed by another tool? You can CTAS the query back to HDFS/S3/Azure Storage as Parquet, but not to Hive. The advantage of writing it back with CTAS is that the result is all Parquet, which will be faster to read. A better way of handling this would be to create a reflection on the VDS so that it refreshes automatically.

http://docs.dremio.com/sql-reference/sql-commands/tables.html
http://docs.dremio.com/acceleration/

There is a white paper on Reflections too
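
For illustration, a CTAS that writes the joined result back to HDFS might look roughly like the following. This is only a sketch: hdfs_src, the exports folder, and the VDS name myspace.customer_orders are hypothetical placeholders, and it assumes the VDS already exists and that the HDFS source accepts writes.

```sql
-- Hypothetical names: hdfs_src is a placeholder for an HDFS source,
-- exports for a folder in it, and myspace.customer_orders for an existing VDS.
CREATE TABLE hdfs_src.exports.customer_orders AS
SELECT *
FROM myspace.customer_orders
```

Unlike a reflection, a table written this way is a one-time snapshot, so the CTAS has to be run again to pick up new data from MySQL or Hive.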

akanksha · February 24, 2021, 12:19pm

@balaji.ramaswamy
"What is the reason you want to store back on Hive or HDFS? Is it to be consumed by another tool?" Yes.
Does the Enterprise edition have this feature of storing the resulting dataset back in HDFS, Hive, etc.?

balaji.ramaswamy · February 24, 2021, 2:57pm

@akanksha

Both the Community and Enterprise editions have the ability to write back to HDFS using CTAS.

https://docs.dremio.com/sql-reference/sql-commands/tables.html

Neither the Community nor the Enterprise edition has the ability to write back to Hive.

akanksha · February 25, 2021, 11:42am

@balaji.ramaswamy
thanks
