Storage required for reflection

srini8881 · August 18, 2020, 8:01am

Hi, I am pretty new to Dremio and have questions that might be basic. My apologies for the same.

The documentation on reflections states:

“Dremio maintains the data in a highly compressed, columnar form based on Apache Parquet. Dremio stores this data near Dremio’s query engine which can be scaled out with additional nodes to support larger data volumes, greater concurrency, and lower latency”

My questions are:

If I create a reflection on a dataset that fetches 1 TB of records, does Dremio store the entire 1 TB in an optimized format after compressing it? (I understand that Parquet can compress the data by 95%.)
For infrastructure planning, how much storage should we allocate in such a scenario? Is there a best-practice document we can refer to when planning storage in general?

Thank you,
Srini.

datocrats-org · August 19, 2020, 4:55pm

Yes, it does compress the data in a reflection.

To estimate the size needed, run a test query on a subset of the data and compare the size of that partial result in the physical dataset with its size in Dremio. If the records live in a database, run the same query there to see the compressed size; if the source is raw files, take the file size of the matching subset. The storage needed should be roughly proportional to what your test query gave, though the full reflection may do even better if your data repeats in whole or in part, or has a sequential date or ID column.
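As a rough illustration of the proportional estimate described above, here is a minimal Python sketch. The function name and the sample sizes are hypothetical, and the linear scaling is an assumption: the real compression ratio depends on your data, so treat the result as a planning figure rather than a guarantee.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def estimate_reflection_bytes(sample_source_bytes, sample_reflection_bytes, full_source_bytes):
    """Scale the footprint observed on a sample linearly to the full dataset.

    All three inputs are sizes in bytes. The sample sizes come from your own
    test query; nothing here is read from Dremio itself.
    """
    if sample_source_bytes <= 0:
        raise ValueError("sample_source_bytes must be positive")
    ratio = sample_reflection_bytes / sample_source_bytes
    return full_source_bytes * ratio


# Hypothetical measurement: a 10 GiB sample produced a 2 GiB reflection.
estimate = estimate_reflection_bytes(10 * GIB, 2 * GIB, 1 * TIB)
print(f"Estimated reflection size: {estimate / GIB:.1f} GiB")
```

If the full dataset has more repetition or sequential columns than the sample, the actual footprint may come in below this figure, so it is reasonable to re-check the size in the UI after the first full build.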

The Reflections tab shows the size of the reflection. In the admin Jobs view, selecting the job also shows the size, and the admin reflections list shows the footprint of the reflection itself.

balaji.ramaswamy · August 23, 2020, 6:33am

@Srini

It is very important to create reflections only on the columns that are needed. It is also best to create reflections at the semantic layer. Here are 2 white papers
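To show why restricting a reflection to the needed columns matters for storage, here is a small Python sketch. Because the data is stored in columnar form, the footprint is roughly the sum of the per-column sizes. The column names and sizes are hypothetical (for example, measured from a sample reflection), and the sketch ignores file metadata and any effect of sorting or partitioning.

```python
def footprint_for_columns(column_bytes, needed):
    """Sum the compressed sizes of only the columns a reflection would include."""
    missing = set(needed) - set(column_bytes)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    return sum(column_bytes[name] for name in needed)


# Hypothetical per-column compressed sizes, in MiB.
column_mib = {
    "order_id": 40,
    "customer_id": 25,
    "order_date": 10,
    "notes": 120,  # wide free-text columns often dominate the footprint
    "amount": 30,
}

all_columns = sum(column_mib.values())
needed_only = footprint_for_columns(column_mib, ["order_id", "order_date", "amount"])
print(f"all columns: {all_columns} MiB, needed columns only: {needed_only} MiB")
```

In this made-up example, dropping the two columns that queries do not use cuts the estimated footprint by well over half.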

srini8881 · September 3, 2020, 1:40pm

Thanks a lot, Balaji. Will explore.
