Suppose I create a reflection with HASH PARTITION BY (A, B, C). Does this mean that each hash bucket is identified by a combination of (A, B, C)? If so, will queries that filter only on A and B be slower than queries that filter on A, B, and C? I imagined that a WHERE clause on all three columns would make it fast to locate a bucket, while giving only two columns could result in a full scan of the reflection.
I probably need to read the source code to understand better…
Filtering with a WHERE clause on any of those columns will have essentially the same cost, because Dremio ignores the partition hierarchy at planning time. Instead, it stores information about the partition structure and builds an index on the partition keys. During planning, it uses this index to “prune” the files that cannot match the filter, so only the remaining files are scanned during execution. A filter on just A and B therefore still narrows the set of files; it does not fall back to a full scan of the reflection.
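To make the idea concrete, here is a toy model of that kind of pruning in Python. It is only a sketch of the general technique, not Dremio's actual code: the file names, the dict layout, and the `build_index` / `prune` helpers are all made up for illustration, and it only handles equality filters.

```python
from collections import defaultdict


def build_index(files):
    """Map each (column, value) pair to the set of files holding that partition value."""
    index = defaultdict(set)
    for path, partition in files.items():
        for column, value in partition.items():
            index[(column, value)].add(path)
    return index


def prune(files, index, filters):
    """Return the files that can still match every equality filter."""
    candidates = set(files)
    for column, value in filters.items():
        # Each filtered column narrows the candidate set independently,
        # so the order or number of partition columns used does not matter.
        candidates &= index.get((column, value), set())
    return sorted(candidates)


# Hypothetical reflection files, each tagged with its partition values.
files = {
    "f0.parquet": {"A": 1, "B": "x", "C": 10},
    "f1.parquet": {"A": 1, "B": "x", "C": 20},
    "f2.parquet": {"A": 1, "B": "y", "C": 10},
    "f3.parquet": {"A": 2, "B": "x", "C": 10},
}
index = build_index(files)

two_cols = prune(files, index, {"A": 1, "B": "x"})
three_cols = prune(files, index, {"A": 1, "B": "x", "C": 10})
only_c = prune(files, index, {"C": 10})

print(two_cols)    # ['f0.parquet', 'f1.parquet']
print(three_cols)  # ['f0.parquet']
print(only_c)      # ['f0.parquet', 'f2.parquet', 'f3.parquet']
```

In this model a filter on A and B alone still drops half the files, and a filter on C alone works just as well as one on A, because each column is looked up independently rather than through an A, then B, then C hierarchy.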