HASH_JOIN doesn't spill on disk

ankravch · October 24, 2019, 1:16am

It appears that HASH_JOIN doesn’t spill on disk when it runs out of memory. I have allocated 900G to DIRECT_MEMORY and hash join still fails to complete.

How one would disable hash join in Dremio? And does Dremio support sort-merge join that spill on disk?

I recall for Drill it was “alter session set “_hash_join_enabled” = false;”.

Anton

balaji.ramaswamy · October 24, 2019, 3:54am

@ankravch

Unfortunately we cannot disable HASH_JOIN. Currently we do not support SORT_MERGE join.

Thanks
@balaji.ramaswamy

chafidz · October 24, 2019, 10:00am

From what data source you’ve tried to join.
One of Dremio behaviour I’ve seen so far (3.3.2) is that when joining from datasource which doesn’t support parallel read (e.g. RDS table from MySQL / PostgreSQL) the join steps are imposed to a single executor only and it will strain its configured resources.
One of the solution is enabling reflection on table from RDS sources. Once the reflection is usable, the join steps from query to this source table will be done in parallel, reading from parquet generated by accelerator and spread to any available executors.

yeshreddy · August 26, 2020, 1:16pm

Hi @balaji.ramaswamy Is HASH_JOIN spilling available in dremio 4.5 ?

ben · August 26, 2020, 3:34pm

No, Dremio does not currently spill during Hash Join.

yeshreddy · August 26, 2020, 5:01pm

Hey @ben ,

So is there is no way dremio can support join without sufficient RAM available in the cluster? In our case we are using 4 node cluster ( 3 executors with 24 GB RAM each). We are seeing memory getting exhausted with 89M44 table joining with 89M2 table. Are we missing something here or Dremio requires very high amount to ram in case of joins ?

Topic		Replies	Views
Dremio OutOfMemoryException	4	174	August 12, 2024
Running a query and its work files... where do they go?	1	897	May 28, 2022
This query was spilled	3	3120	March 25, 2019
Dremio Functionality	1	961	April 28, 2019
OOM on SortExternal when spilling to disk	15	704	January 29, 2024

HASH_JOIN doesn't spill on disk

Related topics