Theory behind Dremio

vgtom · May 1, 2020, 4:16pm

Hi All,

Pretty new here and new to data processing.

What books can I refer to in order to better understand how a system like Dremio is built? I am working on stream processing technologies these days.

Thx
VT

ben · May 1, 2020, 4:47pm

@vgtom,

The initial inspiration for Dremio was Google’s Dremel paper. Many of the concepts introduced in that paper are still applicable.

Dremio relies heavily on the Apache Parquet file format and the Apache Arrow in-memory representation. These are central to the performance it can achieve for analytic workloads, so it’s worth checking out the docs associated with these projects:
https://parquet.apache.org/
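
To give a feel for why a columnar layout helps analytic queries, here is a toy sketch in plain Python. It is not Dremio, Parquet, or Arrow code: the sample rows and the `to_columns` helper are made up for illustration, and plain lists stand in for the typed, compressed buffers the real formats use.

```python
# Toy illustration only: plain Python lists stand in for columnar buffers.
# Real Parquet/Arrow add encoding, compression, and typed contiguous memory.

# Hypothetical sample data in a row-oriented layout (one dict per record).
rows = [
    {"user": "a", "country": "US", "amount": 10.0},
    {"user": "b", "country": "DE", "amount": 25.5},
    {"user": "c", "country": "US", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited in full, even though only one field is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the aggregate reads a single column and ignores the rest.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same. The difference is that the columnar version only touches the `amount` values, which is roughly the property that analytic engines exploit when a query reads a few columns out of many.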

For distributed systems generally, you can’t beat Martin Kleppmann’s book “Designing Data-Intensive Applications”.

And of course there are Dremio’s own library and docs, which will give you architectural overviews and insights into the types of use cases it is good for.

ben · May 1, 2020, 4:49pm

Also, you can of course check out the repository for the open source version of Dremio.
