In the documentation, I read that Dremio keeps data in memory using Apache Arrow in an optimized way.
Can someone tell me what that really means? For example, if I have 2 GB of data in ES and I connect ES to Tableau using Dremio, how much data will I have in memory? Will it be 2 GB or less than that?
And does it only keep the data in memory? Does it never save it to disk?
As Dremio accesses data from different sources, the data is read into in-memory structures using the Apache Arrow columnar format. All processing is performed on the data in this format. Once the query is complete, the memory is released.
The amount of data in memory varies with the query: if you have 2 GB in ES, how much of it is read into Arrow in Dremio depends on what the query asks for. Also, the size reported by ES may not correspond to the size of the JSON that is returned for a query.
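To make the "it depends on the query" point concrete, here is a rough sketch in plain Python. It is not Dremio's or Arrow's actual memory accounting. The three columns, the `bytes_in_memory` helper and the row counts are all made up for illustration, and it assumes a query only needs to hold the columns and rows it selects.

```python
from array import array

# Hypothetical dataset: 1,000 documents with three fields, held column by column
# (a toy stand-in for columnar buffers, not real Arrow structures).
NUM_ROWS = 1_000
columns = {
    "id": array("q", range(NUM_ROWS)),                       # signed 64-bit integers
    "price": array("d", (i * 0.5 for i in range(NUM_ROWS))),  # 64-bit floats
    "qty": array("i", (i % 10 for i in range(NUM_ROWS))),     # C ints
}


def bytes_in_memory(cols, selected, qty_predicate=None):
    """Estimate buffer bytes for the selected columns.

    If qty_predicate is given, only rows whose 'qty' value satisfies it
    are counted, mimicking a filter applied before the data is held.
    """
    if qty_predicate is None:
        kept_rows = len(cols["qty"])
    else:
        kept_rows = sum(1 for value in cols["qty"] if qty_predicate(value))
    return sum(cols[name].itemsize * kept_rows for name in selected)


# All columns, all rows.
full_scan = bytes_in_memory(columns, ["id", "price", "qty"])
# Only the 'price' column, all rows.
projected = bytes_in_memory(columns, ["price"])
# Only the 'price' column, and only rows where qty == 0.
filtered = bytes_in_memory(columns, ["price"], qty_predicate=lambda q: q == 0)

print(full_scan, projected, filtered)
```

The three numbers shrink from left to right: the same source data can occupy very different amounts of memory depending on which columns and rows a query touches.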
The only data Dremio manages itself is in the form of Data Reflections, which are used to accelerate queries. You can read more about them here: https://docs.dremio.com/acceleration/reflections.html
Data Reflections are stored in a columnar, compressed form using Apache Parquet.
You can read more about Dremio’s architecture here: https://www.dremio.com/lp/architecture-guide
Thank you, Kelly. That’s a really helpful and complete reply.