I’m curious: what execution engine does Dremio use when I deploy it on YARN?
Is it something like Spark or MapReduce, or is it something completely different?
And if it is different, what are the performance differences between Spark and the Dremio execution engine?
You can read more about Dremio and YARN here: https://docs.dremio.com/deployment/yarn-hadoop.html
I would suggest reading the Architecture Guide as well: https://www.dremio.com/lp/architecture-guide
In short, Dremio provides its own execution engine based on Apache Arrow. It only executes Dremio jobs, so it isn’t really comparable to Spark or MapReduce, which are general-purpose frameworks. A closer comparison would be Spark SQL or Hive.
Hi Kelly! Thank you. And what about the performance difference between the Dremio engine and Spark SQL? Is it roughly the same, or more like 10 times faster or slower?
I think you should try it yourself, because the answer depends on your data, workloads, and operational environment. In general, Dremio tends to be somewhat faster than other engines (2x-5x), and hundreds of times faster with Data Reflections.
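If you do run your own comparison, a small timing harness helps keep it fair. The sketch below is engine-agnostic and entirely hypothetical: `time_query`, `speedups`, and `summarize` are illustrative names, not part of Dremio or Spark. It assumes you wrap each query in a zero-argument function that runs it to completion using whatever client you already have (for example an ODBC/JDBC/Arrow Flight call to Dremio, or `spark.sql(...).collect()` in PySpark). It discards warm-up runs, takes the median of several timed runs to dampen noise, and summarizes per-query speedups with a geometric mean, which suits ratios better than an arithmetic mean.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], None], warmup: int = 1, repeats: int = 5) -> float:
    """Median wall-clock seconds for run_query, after discarding warm-up runs.

    run_query is a hypothetical zero-argument callable that must execute the
    query fully (e.g. fetch all rows), otherwise only submission is timed.
    """
    for _ in range(warmup):
        run_query()  # warm caches and connections; these timings are ignored
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedups(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Per-query speedup of candidate over baseline (> 1.0 means candidate is faster).

    Only queries timed on both engines are compared.
    """
    return {q: baseline[q] / candidate[q] for q in baseline if q in candidate}


def summarize(ratios: Dict[str, float]) -> float:
    """Geometric mean of the per-query speedup ratios."""
    return statistics.geometric_mean(list(ratios.values()))
```

For example, with median times of 10.0 s and 6.0 s for two queries on one engine and 2.5 s and 3.0 s on the other, `speedups` gives 4.0 and 2.0, and `summarize` gives about 2.83. Timing the same queries with and without a covering Data Reflection would show how much of any gap comes from Reflections rather than from the base engine.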