Is there any benchmark for the improvement of vectorized Join/Aggregate

chunhui-shi · September 12, 2018, 11:02pm

I noticed that Dremio has ‘vectorized’ implementations of HashJoin and HashAggregate, and I am curious what performance gain they provide. Is there any benchmark or analysis that justifies these implementations?

kelly · September 13, 2018, 12:12am

Have a look here: https://www.dremio.com/java-vector-enhancements-for-apache-arrow-0-8-0/

Also further improvements coming from Gandiva: https://www.dremio.com/gandiva-performance-improvements-production-query/

More to come in Dremio 3.0 later this year.

chunhui-shi · September 13, 2018, 12:31am

Hi Kelly, thanks for your response.
The first blog post mentions only Project and Filter, and Gandiva has only two operators as far as I can see. My question was about VectorizedHashJoinOperator and VectorizedHashAggOperator. Since they are much more complex and time-consuming than Project and Filter, I would expect vectorizing them to benefit performance more. However, I could not see any visible improvement when running TPC-H queries with “exec.operator.join.vectorize” switched between true and false. Do you have any internal measurements for these operators?
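
For reference, the on/off comparison described above can be sketched roughly as follows. This is only a minimal timing harness, not anything from Dremio itself: `set_option` and `run_query` are hypothetical callables that you would supply yourself (one that flips “exec.operator.join.vectorize” through whatever client you use, and one that submits a single TPC-H query and waits for it to finish). It reports the median wall-clock time per setting after a warmup run, which helps separate a real operator difference from caching or startup noise.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], None],
              repeats: int = 5,
              warmup: int = 1) -> List[float]:
    """Return wall-clock seconds for each measured run.

    Warmup runs are executed first and their timings are discarded.
    """
    for _ in range(warmup):
        run_query()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def compare_setting(set_option: Callable[[bool], None],
                    run_query: Callable[[], None],
                    repeats: int = 5) -> Dict[bool, float]:
    """Return the median runtime with the option disabled and enabled.

    set_option and run_query are hypothetical hooks supplied by the caller;
    this function makes no assumption about how they talk to the engine.
    """
    medians = {}
    for enabled in (False, True):
        set_option(enabled)
        medians[enabled] = statistics.median(time_runs(run_query, repeats))
    return medians
```

Comparing `medians[False]` against `medians[True]` per query, rather than one total over the whole suite, makes it easier to see whether join-heavy or aggregate-heavy queries respond to the switch at all.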
