Can't set up a raw reflection on a virtual dataset: out of memory

Data source: SQL Server 2016
Dremio install configuration: standalone

My team members and I are working with a data source that contains tens of billions of records. The data source is used for reporting, and we want to optimize query execution times. We are evaluating Dremio and are running into issues optimizing virtual datasets with reflections.

We set Dremio Max Memory to 64 GB. The virtual dataset in question joins several tables that have raw reflections. Those child reflections appear to be optimized, and we are happy with their query times. The virtual dataset itself is based on a query that returns around 17 billion records from the source database.
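For context, here is roughly how the memory cap is set in conf/dremio-env on our standalone node (the values are illustrative of a 64 GB setup; the explicit heap/direct split shown commented out is optional and these numbers are an assumption, not a tuned recommendation):

```shell
# conf/dremio-env -- illustrative values for a single standalone node
# DREMIO_MAX_MEMORY_SIZE_MB caps total memory for the process; Dremio
# derives the heap/direct split itself unless overridden below.
DREMIO_MAX_MEMORY_SIZE_MB=65536

# Optional explicit split (hypothetical numbers, shown disabled):
#DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
#DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=57344
```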

Dremio fails once the input reaches around 200 million records, throwing an out-of-memory exception.

We could increase the memory, but we're not sure that's the real solution here. We installed Dremio as a standalone instance. Is it preferable to install in a clustered environment with containers? Would that solve the out-of-memory problem with the large datasets we're dealing with?

@raoufosman, if you want to perform large joins, Dremio will require a large amount of direct memory on executor nodes. In production environments, we don't suggest a standalone deployment (coordinator + executor in one JVM) because it's difficult to guarantee memory for planning and metadata operations (coordinator tasks) separately from query execution.

I suggest configuring one master coordinator and one executor, then scaling the executors vertically and horizontally until you can complete the join.
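As a sketch, the coordinator/executor separation is controlled by the services block in dremio.conf on each node (keys shown are the standard role switches; everything else about the nodes is assumed):

```
# dremio.conf on the master coordinator node:
services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}

# dremio.conf on each executor node:
services: {
  coordinator.enabled: false,
  executor.enabled: true
}
```

With the roles split, you can then size executor direct memory independently of the coordinator's heap, which is the guarantee a standalone deployment can't give you.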

https://docs.dremio.com/deployment/system-requirements.html