Dremio reflection creation performance tuning

Is there currently any guide on performance tuning Dremio reflections, specifically the creation of reflections when the underlying PDFS reflection storage is HDFS?

We have a use case where we create a reflection on a physical dataset (an HDFS directory) with approx 16M records, and it takes ~2.5 minutes, which is good. However, when we create a subsequent reflection on the same dataset that includes a partition column, the reflection takes 1.5 hours, even though it's using the first reflection to accelerate.

We noticed that when we add two partition columns instead of one, the time goes down to 35 minutes.
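To make the comparison concrete, the two reflections can be sketched in Dremio's reflection-management SQL roughly as below. The source path and column names here are hypothetical stand-ins for our actual dataset, so treat this as illustrative only:

```sql
-- Fast (~2.5 min): raw reflection with no partitioning
ALTER TABLE hdfs_source."events_dir" CREATE RAW REFLECTION raw_all
  USING DISPLAY (event_id, payload, event_date);

-- Slow (~1.5 h): same dataset, partitioned on the date column
ALTER TABLE hdfs_source."events_dir" CREATE RAW REFLECTION raw_by_date
  USING DISPLAY (event_id, payload, event_date)
  PARTITION BY (event_date);
```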

Our setup uses HDFS for the underlying PDFS reflection storage. We don’t see CPU or memory being the bottleneck.

Is there any further tuning we can do to improve the performance of creating reflections specifically for HDFS? I see many parameters available in Dremio options.
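For anyone wanting to browse those parameters, the support keys and their current values can be listed from the `sys.options` system table. A sketch (the `LIKE` filter and exact column set may need adjusting to what your Dremio version exposes):

```sql
-- List planner-related support keys and their current values
SELECT name, type, num_val, string_val, bool_val
FROM sys.options
WHERE name LIKE 'planner%';
```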

Hi @igreg

Is the partition on a very high cardinality column?

The dataset is for a single day, and the partition is on a date column which has cardinality 1 at this time (i.e. just data for a single date). The dataset will have more days added incrementally, hence the date partition column.

Based on the query profile, it appears the reading, sorting and writing steps are done in sequence, whereas the raw reflection (with no partition columns) runs these steps in parallel. In addition, if we add more sub-partition columns to the reflection, Dremio also performs the read, sort and write steps in parallel.

Is there any other way to parallelize the reflection creation with a partition when cardinality is very low?
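One knob that might be worth checking (an assumption on my part, not something I've verified for this case) is `planner.slice_target`, which controls how many records a phase should handle before the planner splits it into parallel fragments. With a cardinality-1 partition, the writer may be collapsing to a single fragment; lowering the target could force more parallelism:

```sql
-- Hypothetical tuning: lower the per-fragment record target so the
-- planner parallelizes the read/sort/write phases sooner.
-- Confirm the key name and default (100000) in sys.options first.
ALTER SYSTEM SET "planner.slice_target" = 10000;
```

This is a cluster-wide setting, so it would affect all queries, not just reflection builds.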