Theory behind Dremio

Hi All,

Pretty new here and new to data processing.

What books can I refer to in order to better understand how a system like Dremio is built? I am working on stream processing technologies nowadays.



The initial inspiration for Dremio was Google’s Dremel paper. Many of the concepts introduced there are still applicable:

Dremio relies heavily on the Apache Parquet file format and the Apache Arrow in-memory representation. Both are columnar, which is central to the performance it can achieve for analytic workloads, so it’s worth checking out the docs associated with these projects:
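
If the columnar idea is new to you, here is a toy sketch in plain Python of why it helps analytic queries. This is not how Arrow or Parquet are actually implemented: the table, the field names, and the helper functions are all made up for illustration, and plain lists stand in for real column buffers.

```python
# Toy illustration only: plain Python lists stand in for Arrow/Parquet columns.
# The table contents, field names, and helper functions are hypothetical.

rows = [
    {"user": "a", "country": "US", "amount": 10.0},
    {"user": "b", "country": "DE", "amount": 25.5},
    {"user": "c", "country": "US", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_wise(records, field):
    """Row layout: every full record is visited although one field is needed."""
    return sum(record[field] for record in records)


def sum_column_wise(columns, field):
    """Column layout: only the single requested column is scanned."""
    return sum(columns[field])


columns = to_columns(rows)
total_rows = sum_row_wise(rows, "amount")
total_cols = sum_column_wise(columns, "amount")
```

Both functions return the same total, but the column version only touches the values of the one field the query asks for. Real columnar formats build on this with typed, contiguous buffers, and Parquet adds per-column encoding and compression on disk, which is where the docs for those projects pick up.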

For distributed systems generally, you can’t beat Martin Kleppmann’s book “Designing Data-Intensive Applications”.

And of course there’s Dremio’s own library, and docs, which will give you architectural overviews and insights into the types of use cases it is good for.

Also, you can of course check out the repository for the open source version of Dremio: