Is there currently a guide to performance tuning Dremio reflections, specifically reflection creation when the underlying PDFS reflection storage is HDFS?
We have a use case where we create a reflection on a physical data source (an HDFS directory) with approximately 16M records. It takes about 2.5 minutes, which is good. However, when we create a subsequent reflection on the same dataset that includes a partition column, creation takes about 1.5 hours, even though it uses the first reflection for acceleration.
We also noticed that when we partition on two columns instead of one, the time drops to about 35 minutes.
Our setup uses HDFS for the underlying PDFS reflection storage. CPU and memory do not appear to be the bottleneck.
Is there any further tuning we can do to speed up reflection creation on HDFS specifically? I see many parameters available in Dremio's options, but I'm not sure which ones apply here.