I’m evaluating deployment options for an on-prem solution and would love some feedback on two scenarios. Which would be more beneficial in terms of performance and the cost of governance/maintenance?
All data are on-prem in S3, MySQL, ELK and SQL Server.
I have access to 3-4 machines that could be deployed in either of the following two ways:
- A YARN deployment with one master node and two to three worker nodes. Most likely, none of the data queried by Dremio will be stored on HDFS.
- A simple Dremio cluster with one Coordinator node and two to three Executor nodes.
My initial question is: do I really need a Hadoop deployment if all data are stored in other repositories? I guess HDFS could be used to store reflections, but would this give a significant performance boost over just going with a simple Dremio cluster?
Any thoughts or feedback would be highly appreciated.
Cheers!