Why use reflection on reading data from S3?

kelly · September 15, 2018, 8:51pm

Hello Akshat, a few questions:

Where is Dremio deployed?
How many nodes are in your Dremio cluster, and how much RAM and how many CPU cores does each node have?
Are you only running queries through the SQL console in Dremio, or have you also tried via ODBC/JDBC?

Things that may help:

If you are creating Parquet files for Dremio, please see these recommendations on configurations for Parquet: https://docs.dremio.com/advanced-administration/parquet-files.html
If your raw data is already in Parquet, then a Raw Reflection may not provide any benefit, since the reflection is also stored as Parquet. It can still be helpful in some cases: a) the Raw Reflection may be sorted or partitioned differently from the raw data, which can accelerate some queries; b) the Raw Reflection may be closer to your Dremio cluster or on a faster storage sub-system; c) it may contain only a subset of the columns/rows of the source data; d) it may perform joins ahead of time, removing the need to perform the join at query time (denormalized). There are other examples, but hopefully you get the idea.
Aggregation Reflections can provide a very significant performance improvement. It sounds like your particular reflection isn’t configured to cover the queries you are issuing. Can you describe how you have it configured and provide a sample query that isn’t being accelerated? Normally, if the query profile says the query wasn’t covered by the reflection, that means the reflection is missing columns, or there is a join condition in the virtual dataset that prevents it from covering your query. Another possibility is that you don’t have the correct aggregation operators enabled for a specific measure (e.g., MAX, MIN).
Also, if you haven’t seen this tutorial it may be helpful: https://www.dremio.com/tutorials/getting-started-with-data-reflections/
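
To make the coverage point above a bit more concrete, here is a rough Python sketch of the kind of check you can do by hand when comparing a query against an Aggregation Reflection definition. This is not Dremio's actual matching logic and it ignores joins entirely; the class names, the `coverage_gaps` helper, and the column names (`region`, `order_date`, `amount`) are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AggReflection:
    """Hypothetical, simplified model of an Aggregation Reflection."""
    dimensions: set[str]
    # measure column -> aggregation operators enabled for it
    measures: dict[str, set[str]]


@dataclass
class AggQuery:
    """Hypothetical, simplified model of an aggregate query."""
    group_by: set[str]
    # (operator, column) pairs, e.g. ("MAX", "amount")
    aggregates: list[tuple[str, str]]


def coverage_gaps(reflection: AggReflection, query: AggQuery) -> list[str]:
    """List the reasons this simplified model would not cover the query."""
    gaps = []
    # Every GROUP BY column must be a dimension of the reflection.
    for col in sorted(query.group_by - reflection.dimensions):
        gaps.append(f"missing dimension: {col}")
    # Every aggregate needs its column as a measure with that operator enabled.
    for op, col in query.aggregates:
        enabled = reflection.measures.get(col)
        if enabled is None:
            gaps.append(f"missing measure: {col}")
        elif op.upper() not in enabled:
            gaps.append(f"{op.upper()} not enabled for measure: {col}")
    return gaps


# Assumed example: reflection with two dimensions and SUM/COUNT on "amount".
reflection = AggReflection(
    dimensions={"region", "order_date"},
    measures={"amount": {"SUM", "COUNT"}},
)
query = AggQuery(group_by={"region"}, aggregates=[("MAX", "amount")])
print(coverage_gaps(reflection, query))
```

An empty list means the query passes this simplified check. In practice the query profile is the authoritative place to see why a reflection was or wasn't used.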
