Performances comparisons

kelly · May 17, 2018, 11:13am

Thanks for the feedback! @can is following this thread as well.

Requests for workflow-style editors come up frequently, and this is something we are looking at. Sometimes the request comes up because people think of Dremio as a kind of ETL tool, so I thought it would make sense to comment on that. Currently we are not targeting large, bulk transformations with Dremio, or what you might call “long haul” ETL. We think ETL tools or long-running engines like Hive or Spark are the right way to do this. Instead, we are targeting “last mile” transformations, filtering, aggregations, and security controls (e.g., masking) that are performed at runtime.

In our Enterprise Edition we include a data provenance and lineage capability. Here’s what that looks like:

This feature helps visualize the relationships between datasets. We track relationships in a dependency graph, which makes it easy to understand data use patterns and “what if” scenarios, among other things.

Another feature relevant to your comments is the transformation history available on every dataset. You can find this on the right-hand side of the dataset viewer:

You can hover over each dot to see the changes applied to the dataset as well as who performed them. This is effectively a versioning mechanism for the dataset: if you click on any of these dots, you can quickly toggle to that version of the dataset.
