Hello Akshat, a few questions:
- where is Dremio deployed?
- how many nodes are in your Dremio cluster, and how much RAM and how many CPU cores does each node have?
- are you only running queries through the SQL console in Dremio, or have you tried via ODBC/JDBC?
Things that may help:
- If you are creating Parquet files for Dremio, please see these recommendations on configurations for Parquet: https://docs.dremio.com/advanced-administration/parquet-files.html
- If your raw data is already in Parquet, then a Raw Reflection may not provide any benefit, since it is also stored as Parquet. It can still be helpful in some cases: a) the Raw Reflection may be sorted or partitioned differently from the raw data, which can accelerate some queries; b) the Raw Reflection may be closer to your Dremio cluster or on a faster storage sub-system; c) it may contain only a subset of the columns/rows of the source data; d) it may perform joins ahead of time (denormalization), removing the need to perform the join at query time. There are other examples, but hopefully you get the idea.
- Aggregation Reflections can provide a very significant performance improvement. It sounds like your particular reflection isn't configured to cover the queries you are issuing. Can you describe how you have it configured and provide a sample query that isn't being accelerated? Normally, if the query profile says the query wasn't covered by the reflection, that means the reflection is missing columns the query uses, or there is a join condition in the virtual dataset that keeps it from covering your query. Another possibility is that you don't have the correct aggregation operators enabled for a specific measure (e.g., MAX, MIN).
- Also, if you haven't seen this tutorial, it may be helpful: https://www.dremio.com/tutorials/getting-started-with-data-reflections/
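
To make the coverage point about Aggregation Reflections a bit more concrete, here is a rough Python sketch of the mental model I use when checking a reflection against a query. This is not Dremio's actual matching logic, and it ignores joins entirely; the class names, the `coverage_gaps` helper, and the `region`/`amount` columns are all made up for illustration. The idea is simply that every column you group by has to be a dimension, and every aggregate has to be an enabled operator on a measure.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class AggReflection:
    """Hypothetical, simplified description of an Aggregation Reflection."""
    dimensions: Set[str]            # columns available for GROUP BY / filters
    measures: Dict[str, Set[str]]   # measure column -> enabled operators


@dataclass
class Query:
    """Hypothetical, simplified description of an aggregate query."""
    group_by: Set[str]
    aggregates: List[Tuple[str, str]]   # (operator, column), e.g. ("MAX", "amount")


def coverage_gaps(reflection: AggReflection, query: Query) -> List[str]:
    """Return the reasons this simplified model says the query is not covered."""
    gaps = []
    # Every grouped column must be a dimension of the reflection.
    for col in sorted(query.group_by - reflection.dimensions):
        gaps.append(f"column {col} is not a dimension")
    # Every aggregate must use an operator enabled on that measure.
    for op, col in query.aggregates:
        enabled = reflection.measures.get(col)
        if enabled is None:
            gaps.append(f"column {col} is not a measure")
        elif op.upper() not in enabled:
            gaps.append(f"{op.upper()} is not enabled for measure {col}")
    return gaps


# Example: the reflection only has SUM and COUNT enabled on "amount".
reflection = AggReflection(
    dimensions={"region"},
    measures={"amount": {"SUM", "COUNT"}},
)
query = Query(group_by={"region"}, aggregates=[("MAX", "amount")])
print(coverage_gaps(reflection, query))
```

If the list comes back empty in this model, the columns and operators line up, and the next thing to look at would be join conditions in the virtual dataset.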