Performance comparisons

Thanks for the feedback! @can is following this thread as well.

Requests for workflow-style editors come up frequently, and this is something we are looking at. The request sometimes comes up because people think of Dremio as a kind of ETL tool, so I thought it would make sense to comment on that a bit. Currently we are not targeting large, bulk transformations with Dremio, or what you might call “long-haul” ETL. We think ETL tools or long-running engines like Hive or Spark are the right way to do this. Instead, we are targeting “last-mile” transformations, filtering, aggregations, and security controls (e.g., masking) that are performed at runtime.

In our Enterprise Edition we include a data provenance and lineage capability. Here’s what that looks like:

This feature visualizes the relationships between datasets. We track these relationships in a dependency graph, which makes it easy to understand how data is being used and to explore “what if” scenarios, among other things.
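To make the dependency-graph idea concrete, here is a minimal Python sketch of a lineage graph, assuming edges point from a parent dataset to the datasets derived from it. The `LineageGraph` class, its methods, and the dataset names are hypothetical illustrations, not Dremio's implementation or API. Walking the graph downstream answers a “what if” question (what is affected if this dataset changes?), and walking upstream recovers provenance.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage graph: an edge parent -> child means child is derived from parent."""

    def __init__(self):
        self._children = defaultdict(set)  # dataset -> datasets built from it
        self._parents = defaultdict(set)   # dataset -> datasets it is built from

    def add_dependency(self, parent, child):
        self._children[parent].add(child)
        self._parents[child].add(parent)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal collecting every node reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream(self, dataset):
        """Datasets that would be affected if `dataset` changed ("what if" analysis)."""
        return self._walk(dataset, self._children)

    def upstream(self, dataset):
        """Datasets that `dataset` ultimately derives from (provenance)."""
        return self._walk(dataset, self._parents)


# Hypothetical dataset names, for illustration only.
g = LineageGraph()
g.add_dependency("raw.sales", "staging.sales_clean")
g.add_dependency("staging.sales_clean", "marts.revenue_by_region")
g.add_dependency("raw.customers", "marts.revenue_by_region")

print(sorted(g.downstream("raw.sales")))
# ['marts.revenue_by_region', 'staging.sales_clean']
print(sorted(g.upstream("marts.revenue_by_region")))
# ['raw.customers', 'raw.sales', 'staging.sales_clean']
```

Keeping both child and parent adjacency sets lets the same traversal serve impact analysis and provenance without reversing the graph on each query.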

Another feature relevant to your comments is the transformation history available on every dataset. You can find this on the right-hand side of the dataset viewer:

You can hover over each dot to see the changes applied to the dataset and who made them. This is effectively a versioning mechanism for the dataset: clicking any of these dots switches you to that version of the dataset.
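One way to think about this history is as an ordered log of changes, where each entry records what was done and who did it, and selecting an entry rebuilds the dataset as of that point. The Python sketch below models that idea by replaying recorded steps over the source rows. `VersionedDataset`, `Change`, and the sample data are hypothetical and are not a description of how Dremio actually stores dataset versions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]


@dataclass
class Change:
    description: str  # what was done, e.g. shown when hovering over a dot
    author: str       # who made the change
    apply: Transform  # function that produces the new rows from the previous ones


@dataclass
class VersionedDataset:
    source: List[Row]
    history: List[Change] = field(default_factory=list)

    def record(self, description: str, author: str, apply: Transform) -> None:
        """Append a change; version N is the state after the first N changes."""
        self.history.append(Change(description, author, apply))

    def at_version(self, version: int) -> List[Row]:
        """Rebuild the rows as of `version` (0 = the untouched source) by replaying changes."""
        if not 0 <= version <= len(self.history):
            raise IndexError(f"no version {version}")
        rows = list(self.source)
        for change in self.history[:version]:
            rows = change.apply(rows)
        return rows


ds = VersionedDataset(source=[
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 5},
])
ds.record("Keep rows with amount > 6", "alice",
          lambda rows: [r for r in rows if r["amount"] > 6])
ds.record("Rename amount to revenue", "bob",
          lambda rows: [{"region": r["region"], "revenue": r["amount"]} for r in rows])

# The equivalent of hovering over each dot: list every change and its author.
for number, change in enumerate(ds.history, start=1):
    print(number, change.description, "by", change.author)

# The equivalent of clicking the first dot: view the dataset after one change.
print(ds.at_version(1))
# [{'region': 'EU', 'amount': 10}, {'region': 'US', 'amount': 25}]
```

Replaying steps keeps storage small, since only the transformations are kept, at the cost of recomputing on every switch; storing a snapshot per version would make the opposite trade-off.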