Can I save a query result into a Hive source?

WileyXu · December 24, 2019, 11:11am

In my case, I want to save the current result dataset into Hive so that I can run more queries on it without having to query it out again.

Thanks

balaji.ramaswamy · December 25, 2019, 6:07pm

@WileyXu

Hive is only a query shell; the actual data is stored in HDFS. Here is what you can do:

#1 Add an HDFS source pointing to the namenode
#2 Enable CTAS on that source by checking “Enable exports into the source (CTAS and DROP)”, see attached screenshot below

#3 Run the query of your choice with "CREATE TABLE hdfs.source_name.full_path AS" placed in front of it. This generates Parquet files containing the query output in the HDFS path you specified

#4 Create a Hive external table (via Beeline or the Hive shell) pointing to the parent folder containing these Parquet files

#5 Use Dremio to query the files back via the Hive source
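
To make steps #3 and #4 a bit more concrete, here is a rough sketch in Python that only builds the two SQL statements as strings. Every name in it is a made-up placeholder (the source `hdfs_src`, the folder `/exports/orders_2019`, the namenode address, the table and column names), so adjust them to your environment. It does not connect to Dremio or Hive; you would paste the first statement into Dremio and the second into Beeline or the Hive shell.

```python
def ctas_statement(target_path, select_sql):
    """Wrap a SELECT in a CTAS statement for Dremio (step #3)."""
    # Drop surrounding whitespace and a trailing semicolon from the query.
    query = select_sql.strip().rstrip(";")
    return f"CREATE TABLE {target_path} AS\n{query}"


def hive_external_table(table, columns, hdfs_dir):
    """Build Hive DDL for an external Parquet table over a folder (step #4)."""
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{hdfs_dir}'"
    )


# All identifiers below are hypothetical examples, not values from this thread.
ctas = ctas_statement(
    "hdfs_src.exports.orders_2019",
    "SELECT order_id, amount FROM hive_src.sales.orders WHERE order_year = 2019;",
)
ddl = hive_external_table(
    "sales.orders_2019",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
    "hdfs://namenode:8020/exports/orders_2019",
)

print(ctas)  # run this in Dremio
print(ddl)   # run this in Beeline or the Hive shell
```

The column types in the Hive DDL have to match what Dremio actually wrote to the Parquet files, so it is worth checking the output schema before creating the external table.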

Thanks
@balaji.ramaswamy

WileyXu · December 27, 2019, 1:52am

Thanks a lot. Very helpful
