It appears that HASH_JOIN doesn’t spill on disk when it runs out of memory. I have allocated 900G to DIRECT_MEMORY and hash join still fails to complete.
How one would disable hash join in Dremio? And does Dremio support sort-merge join that spill on disk?
I recall for Drill it was “alter session set “_hash_join_enabled” = false;”.
From what data source you’ve tried to join.
One of Dremio behaviour I’ve seen so far (3.3.2) is that when joining from datasource which doesn’t support parallel read (e.g. RDS table from MySQL / PostgreSQL) the join steps are imposed to a single executor only and it will strain its configured resources.
One of the solution is enabling reflection on table from RDS sources. Once the reflection is usable, the join steps from query to this source table will be done in parallel, reading from parquet generated by accelerator and spread to any available executors.
So is there is no way dremio can support join without sufficient RAM available in the cluster? In our case we are using 4 node cluster ( 3 executors with 24 GB RAM each). We are seeing memory getting exhausted with 89M44 table joining with 89M2 table. Are we missing something here or Dremio requires very high amount to ram in case of joins ?