Hi Folks,
Can anyone describe how Dremio is different from Presto? I have just started following Dremio and have gone through some videos, but I found some similarities with Presto. Any thoughts on this would be appreciated.
Thanks
Presto is a distributed SQL engine. Dremio is a lot more than that. You could think of it as a “Data-as-a-Service Platform” that sits between all your data and the tools that people want to use to analyze it (Tableau, Qlik Sense, Power BI, R, Jupyter, etc.). Traditionally, companies have had to use a combination of 5-10 different tools, plus a lot of custom development, to make data available for analytics. That includes data warehouses, ETL, OLAP cubes, aggregation tables, data extracts, etc. In addition to the obvious cost and complexity, this made self-service impossible. Dremio basically collapses and simplifies the entire analytics stack.
Here are a few examples of features that Dremio has that SQL engines like Presto do not:
I hope that helps. Those are just a few high-level differences.
Regarding the last point, raw execution speed: Presto also does columnar processing, using an in-memory representation that is similar to Arrow.
Without a shared standard like Arrow, that data must be serialized before it is handed off to another process. With Arrow, this step is obviated. That’s a big benefit: in many cases it saves 60-80% of CPU, along with the unnecessary copies.
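To make the serialization point concrete, here is a minimal sketch using only the Python standard library. It is not Arrow and does not use any Dremio or Presto API; `array` and `memoryview` are just stand-ins for "a columnar buffer whose layout both sides agree on", and the function names are made up for illustration.

```python
import pickle
from array import array

# One "column" of 64-bit integers in a single contiguous buffer.
# This stands in for a columnar in-memory format; it is NOT Arrow.
column = array("q", range(1_000_000))

def handoff_with_serialization(col):
    """No shared layout: encode to bytes, then decode on the other side.
    The consumer ends up with its own full copy of the data."""
    payload = pickle.dumps(col)
    return pickle.loads(payload)

def handoff_with_shared_layout(col):
    """Shared layout: the consumer just views the producer's buffer.
    No encoding step and no copy."""
    return memoryview(col)

copied = handoff_with_serialization(column)
shared = handoff_with_shared_layout(column)

# Change the producer's buffer. Only the zero-copy view sees the change,
# which shows that the serialized handoff really did duplicate the data.
column[0] = 42
print(copied[0])  # 0
print(shared[0])  # 42
```

The gap between the two handoffs grows with the size of the data, since the serialized path touches every value twice while the shared-layout path touches none. How much that matters in a real system depends on the workload.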
There’s a lot more to Arrow than being columnar. Now it also means Arrow Kernels, which are highly optimized low-level operators that can be hardware optimized.
Also, see this important announcement about the Gandiva Initiative, which brings LLVM JIT compilation and other benefits to Arrow: https://www.dremio.com/announcing-gandiva-initiative-for-apache-arrow/
@tshiran,
Very well explained.