Data source: SQL Server 2016
Dremio install configuration: standalone
My team members and I are working with a data source that contains tens of billions of records. The source is used for reporting, and we want to optimize query execution times. We are evaluating Dremio and have run into issues optimizing virtual datasets with reflections.
We set Dremio Max Memory to 64 GB. The virtual dataset in question joins several tables that have raw reflections. Those child reflections appear to be well optimized, and we are happy with their query times. The virtual dataset itself is based on a query that returns around 17 billion records from the source database.
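For reference, this is roughly how we configured the memory limit in `conf/dremio-env` on the standalone node (the value shown is our 64 GB setting; we left the heap/direct split at the defaults):

```shell
# conf/dremio-env
# Total memory (heap + direct) available to the Dremio process, in MB.
# 65536 MB = 64 GB, matching the limit described above.
DREMIO_MAX_MEMORY_SIZE_MB=65536
```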
Dremio fails with an out-of-memory exception once the input reaches around 200 million records.
We could increase the memory, but we're not sure that's the real solution here. We installed Dremio as standalone. Is it preferable to install it in a clustered environment with containers? Would that resolve the out-of-memory errors for the large datasets we're dealing with?