Create raw reflection ... distribute by

What does DISTRIBUTE BY mean in the CREATE RAW REFLECTION statement?

Does it have an analogue in Reflection configuration via GUI?

If you switch the UI to advanced mode in reflections, you’ll find all four categories. In basic mode, raw reflections only let you configure display columns.

Here’s what that looks like:

So what does it mean? Could you please point me to the documentation?
Also, I suspect we don’t have Distribution in our version of Dremio, or maybe it does not apply to the Mongo connector?


Build 2.0.1-201804132205050000-10b1de0


Sorry for the confusion: distribution is only available in the Enterprise Edition of Dremio. We need to improve the documentation to clarify that.

It would still be good to know what it means, particularly since we are actively assessing Dremio and negotiating the Enterprise license.

Hey @muv, distribution is not something the query engine actively takes advantage of today (in either edition) in its query planning, so we don’t recommend using this option for reflections just yet.

In general, distribution is used to co-locate all records with the same distribution fields on the same node. In the future, we expect it to provide performance lift in scenarios where you are doing fact-fact joins. In such cases, typically costs would be dominated by shuffling/remote reads.
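To make the co-location idea concrete, here is a minimal sketch in Python. It is not Dremio's implementation, and per the post above the planner does not use distribution today. The function names (`node_for`, `distribute`, `local_join`), the sample tables, and the choice of a CRC32 hash are all assumptions made for illustration. It shows that when two tables are distributed on the same field, every node can join its own slice without shuffling rows between nodes.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Map a distribution-key value to a node index using a stable hash (hypothetical scheme)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes


def distribute(rows, key_field, num_nodes):
    """Group rows by the node that owns their distribution-key value."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return nodes


def local_join(left_nodes, right_nodes, key_field, num_nodes):
    """Join each node's slice independently; no rows cross node boundaries."""
    joined = []
    for n in range(num_nodes):
        # Build a hash index over this node's slice of the right table.
        index = defaultdict(list)
        for r in right_nodes.get(n, []):
            index[r[key_field]].append(r)
        # Probe it with this node's slice of the left table.
        for l in left_nodes.get(n, []):
            for r in index.get(l[key_field], []):
                joined.append({**l, **r})
    return joined


# Two made-up "fact" tables sharing the distribution field customer_id.
orders = [
    {"customer_id": 1, "order_id": 100},
    {"customer_id": 2, "order_id": 101},
    {"customer_id": 1, "order_id": 102},
    {"customer_id": 3, "order_id": 103},
]
payments = [
    {"customer_id": 1, "payment_id": 900},
    {"customer_id": 2, "payment_id": 901},
    {"customer_id": 4, "payment_id": 902},
]

NUM_NODES = 3
orders_by_node = distribute(orders, "customer_id", NUM_NODES)
payments_by_node = distribute(payments, "customer_id", NUM_NODES)
joined = local_join(orders_by_node, payments_by_node, "customer_id", NUM_NODES)
```

Because both tables use the same hash on `customer_id`, matching rows always land on the same node, so the per-node joins together produce the full join result. If the tables were distributed on different fields, one side would have to be shuffled first, which is the cost the post describes.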


I was just wondering whether there have been any improvements on this front?

IMO this is one of the features that could bring the performance of HDFS-based Dremio deployments closer to that of MPP databases like Redshift, thanks to local joins.

Hey @dorianb, not yet. We’ll reach out as our plans on this solidify. Let us know if this becomes a high-priority item and we can track it on the support portal.

So where are we on this, @can?

Today I realized that I spent hours yesterday trying to determine the impact of distribution strategies and it looks like there’s a really good chance the button was a placebo and I was just a monkey twisting a disconnected switch.

@jdlong Unfortunately, it has not been prioritized yet because other features ranked higher. Do you have a query that is running slow? Would you be able to share a profile?