I was trying to understand why Dremio chose to have query planning in the Dremio VPC (control plane) rather than in the customer VPC (data plane). Does anyone have a historical or architectural perspective on this design choice?
Welcome to the Dremio community.
Regarding your question: in Dremio, query planning is handled by the master/Coordinator. In the case of Dremio Cloud, that role is played by the Control Plane.
Dremio Cloud consists of two major architectural components: (i) an always-on global control plane that receives queries from clients and is responsible for query planning and engine management. This control plane is hosted and monitored by Dremio, in a Dremio-managed cloud account; and (ii) an execution plane comprised of compute engines that are responsible for query execution.
Please refer to the link below for more details on the Dremio Cloud architecture:
Ref: Dremio Cloud: Under the Hood | Dremio
Additional reference - although it is specific to Dremio Software Edition deployment architecture which is different from the cloud edition:
Hope this helps. Let us know if you have further queries.
Thanks for the quick response.
I was trying to understand why Dremio chose this architecture. The 2 options are:
1. Query Planning in Dremio VPC. Execution in customer VPC.
2. Query Planning and Execution in customer VPC.
Option #2 is generally how I have seen most SaaS services operate.
Maybe you can shed some light on why Dremio chose option #1.
Have a wonderful day.
While this may be “an” explanation, it may not be “the” explanation. I’ll let Dremio PMs chime in for an exact response.
There are broadly two modes of deployment of Dremio:
- Dremio Software (self managed) → Install yourself on top of any cloud VMs (AWS, GCP, Azure) or on on-prem hardware. You can also deploy over Kubernetes (including EKS, AKS, GKE) or over YARN.
- Dremio Cloud (managed service) → Available on AWS as of writing.
Both of these have free and enterprise versions. You can see a “Capabilities Overview” table here (scroll down).
Historically, and even today, in Dremio Software, the Dremio Coordinator does the query planning (amongst other things such as hosting the UI). This translated broadly to Dremio Cloud’s Control Plane. The Coordinator was the brains of the system and also, by extension, a single point of failure if high availability was not configured. So it made sense for Dremio Cloud’s Control Plane to be in Dremio’s VPC to allow for high uptime and monitoring, and to provide online feature upgrades & bug fixes.
Dremio Executors, on the other hand, were built to be ephemeral and also work directly with the data by executing the query plan sent by the coordinator. This translated broadly to Dremio Cloud’s Execution Plane. So it naturally made sense that this belonged within the Customer VPC where the data is.
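To make the split a bit more concrete, here is a toy Python sketch of the idea. It is not Dremio code and says nothing about how Dremio actually represents plans: `QueryPlan`, `ControlPlane`, `Executor` and `run_example` are all made-up names for illustration. The only point is that the planner side hands over instructions, while the executor side is the only piece that reads the rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical plan object: it carries instructions, not rows."""
    table: str
    columns: tuple
    min_amount: float


class ControlPlane:
    """Stands in for the coordinator role: it only builds plans."""

    def plan(self, table, columns, min_amount):
        return QueryPlan(table, tuple(columns), min_amount)


class Executor:
    """Stands in for an engine in the customer VPC: it holds the data."""

    def __init__(self, tables):
        self._tables = tables  # data lives only on the executor side

    def execute(self, plan):
        rows = self._tables[plan.table]
        # Filter on the (assumed) "amount" field, then project the columns.
        return [
            {col: row[col] for col in plan.columns}
            for row in rows
            if row["amount"] >= plan.min_amount
        ]


def run_example():
    customer_data = {
        "orders": [
            {"id": 1, "amount": 5.0, "region": "eu"},
            {"id": 2, "amount": 50.0, "region": "us"},
        ]
    }
    plan = ControlPlane().plan("orders", ["id", "region"], 10.0)
    return Executor(customer_data).execute(plan)
```

In this sketch `ControlPlane` never receives `customer_data`; it only produces the plan that the executor then runs next to the data.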
Hope that helps a little!