This is my first post. So greetings to all from Malawi, the warm heart of Africa!
We decided to evaluate Dremio for a project in the education sector, where we are building a data platform. Cost and sustainability are key concerns, as Malawi is one of the poorest countries in the world. Open source is a key principle, and Dremio ticks that box.
We have successfully deployed Dremio, MinIO, and Nessie on a Kubernetes cluster on Azure AKS with the recommended specs (E16s v3: 16 vCPUs, 128 GB RAM). However, based on our projections, a 3-node deployment at this size is rather costly (approx. $2,500 per month). We would like to get this (much) lower.
What is the minimum Kubernetes cluster configuration Dremio can run on? We expect our workloads to be very light.
Dremio can run on lower specs, but whether that works for you depends heavily on the performance SLAs you have set for your production workloads. Dremio typically scales linearly and is usually memory-bound. Feel free to test at lower specs and check whether your query workload, at the concurrency you need, fits in memory and meets your SLAs.
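As a rough way to frame that test, the sketch below compares the memory your concurrent queries need against the memory your executors provide. It is a back-of-envelope check under stated assumptions, not Dremio's sizing formula: it assumes memory use simply adds up across concurrent queries and across executors, the per-query peak memory is a figure you would measure yourself from test runs, and the 80% headroom factor and all numbers in the example are hypothetical.

```python
def fits_in_memory(
    executors: int,
    memory_per_executor_gb: float,
    concurrent_queries: int,
    peak_memory_per_query_gb: float,
    headroom: float = 0.8,  # hypothetical: keep ~20% of memory free as a safety margin
) -> bool:
    """Back-of-envelope check: does the workload's peak memory fit in the cluster?

    Assumes memory use adds up across concurrent queries and that executor
    memory is effectively pooled; real behaviour will differ, so treat this
    only as a first filter before load testing at the smaller size.
    """
    usable_gb = executors * memory_per_executor_gb * headroom
    needed_gb = concurrent_queries * peak_memory_per_query_gb
    return needed_gb <= usable_gb


# Hypothetical example: 2 executors with 32 GB each, 5 concurrent queries
# each peaking at 4 GB -> needs 20 GB of the 51.2 GB usable.
print(fits_in_memory(2, 32, 5, 4))  # True
```

A passing check only says a smaller size is worth trying; running your actual workload at that size is still the real test.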
You can also use Reflections to your advantage. For example: scale up the Dremio executors, build or refresh the Reflections, then scale the executors back down. Your queries are then accelerated by the Reflections, so less hardware is required. Repeat the process to refresh the Reflections according to your data-freshness requirements (daily, weekly, etc.).
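To get a feel for what this pattern could save, here is a rough monthly-cost sketch. All inputs are assumptions to adjust to your own pricing: every node is priced the same per hour, derived from the ~$2,500/month quoted in the question for 3 always-on nodes over about 730 hours per month; only the extra executor nodes scale down between refreshes; and the refresh window length and frequency in the example are made up.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month

# Assumption: hourly price per node derived from the ~$2,500/month quoted for
# 3 always-on nodes; replace with your actual Azure pricing.
NODE_COST_PER_HOUR = 2500 / 3 / HOURS_PER_MONTH


def monthly_cost(
    always_on_nodes: int,
    burst_nodes: int,
    burst_hours_per_refresh: float,
    refreshes_per_month: int,
    cost_per_hour: float = NODE_COST_PER_HOUR,
) -> float:
    """Estimate monthly cost when some nodes run only during Reflection refreshes."""
    # Nodes that stay up all month to serve queries.
    always_on = always_on_nodes * HOURS_PER_MONTH * cost_per_hour
    # Extra executor nodes that exist only while Reflections are built or refreshed.
    burst = burst_nodes * burst_hours_per_refresh * refreshes_per_month * cost_per_hour
    return always_on + burst


# Baseline: 3 nodes running all month.
print(round(monthly_cost(3, 0, 0, 0)))  # 2500
# Hypothetical: 1 always-on node, plus 2 executors for a 2-hour daily refresh.
print(round(monthly_cost(1, 2, 2, 30)))  # 970
```

In this made-up scenario the bill drops to roughly $970 a month. The real saving depends on how small the always-on part can be made (it may not need nodes as large as the recommended spec at all) and on how long your refreshes actually take.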