Stacking up tables

gray_log · June 2, 2018, 11:43am

Hello

I would like to know if there is an easy way to stack up tables in a single long table
In short, I have a greenplum installation with many schemas, the schemas have identical structure, tables etc, only the data is different, I would like to be able to stack table1 in schema1 with table1 in schema2 etc
so that I can use a simple select to get the data from all those tables in one go, I could use a union but it adds complications and usually are a performance killers, so I would like to fine a better way

A bit like in the discussion Stacking files? but with tables rather than files

Thanks!

Best regards
Mark

dealercrm · June 2, 2018, 12:29pm

Are you trying to avoid the union on the source database tables? Assuming due to performance?

Then my only thought is to perform a union on the many raw reflections of the physical source queries that you setup. Then configure a reflection on that to increase performance of any queries against the combined records.

kelly · June 2, 2018, 6:28pm

You can do a UNION ALL of the tables when creating a virtual dataset:

https://docs.dremio.com/working-with-datasets/virtual-datasets.html

If you’d like, you can create a reflection on this virtual dataset, and queries will then be rewritten to
use the reflection instead of pushing down into Greenplum.

If your query patterns frequently make use of subsets of the underlying partitions, you could experiment with creating a reflection on the physical datasets, and then a union as described above, and Dremio will decide which combination of reflections to use for a given query.

gray_log · June 3, 2018, 12:07am

Hello

Yes I’m trying to avoid unions on the source database tables, it impacts performances, I did not think about creating cascade reflections

My concern is that the original tables are large so the reflections would be large as well, but I will try your suggestions and I will let you know

Thanks heaps!
Mark

kelly · July 10, 2018, 3:55am

Mark - reflections are highly compressed, so they should be relatively small compared to the source. In addition, they don’t have to include all rows/columns, so you can pick only what is needed. Finally, you can use aggregation reflections to keep only aggregated reflections of the source. This works when your queries are only aggregation queries, which is the case for much of BI.

Just some ideas, hope this helps.

gray_log · August 30, 2018, 12:49am

Thanks!
I have been doing some testing and we can get some good compression

Thanks for the help
Mark

Topic		Replies	Views
Stacking files?	7	2210	December 10, 2019
How to combine multiple tables (same schema for all) into a single table with a single query?	5	7162	November 13, 2020
Incremental reflections with union	9	2694	December 4, 2019
UNION ALL not using aggregate reflections of underlying datasets	8	1225	November 15, 2021
Clarifications on Sorting and Partioning	11	3515	April 16, 2019

Stacking up tables

Related topics