Hello, I’m trying to determine the performance requirements and the recommended number of nodes for Dremio to efficiently execute queries on massive datasets, such as those in the petabyte range.
For example, how many nodes does Dremio need to execute a query? Any suggestions?
Hi @Hamzabouazza,
For example, how many nodes does Dremio need to execute a query?
The simplest answer is one node. However, depending on how long you want the query to take, things can be tuned according to the resources you are willing to commit in terms of CPU, RAM, storage, and network. All of that applies to running a single query. If you have multiple different queries accessing that dataset, then you should also consider the type of dataset and how the data is partitioned, and configure the workload manager and, possibly, reflections: Managing Job Workloads | Dremio Documentation
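To make the "it depends" a bit more concrete, here is a rough back-of-envelope sizing sketch. This is not a Dremio formula; the per-node scan rate and the amount of data actually scanned after partition pruning or reflection matching are assumptions you would need to measure on your own cluster and data.

```python
# Back-of-envelope sizing: roughly how many executor nodes are needed
# to hit a target query time, assuming the query is scan-bound.
# All numbers are illustrative assumptions, not Dremio benchmarks.

def estimate_nodes(data_scanned_tb: float,
                   target_seconds: float,
                   per_node_scan_gbps: float = 2.0) -> int:
    """Estimate executor count from the data actually scanned
    (after partition pruning / reflection matching), given an
    assumed effective scan throughput per node."""
    data_gb = data_scanned_tb * 1024
    seconds_on_one_node = data_gb / per_node_scan_gbps
    return max(1, round(seconds_on_one_node / target_seconds))

# Example: a query that scans 5 TB after pruning, with a 60 s target
# and an assumed 2 GB/s effective scan rate per executor.
print(estimate_nodes(data_scanned_tb=5, target_seconds=60))  # ~43 nodes
```

In practice you would validate any estimate like this against the actual job profiles in the Dremio UI, since joins, exchanges, and concurrency can dominate long before raw scan throughput does.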
Hope this helps.
Thanks, Bogdan