SQL joins performance

HLNA · February 10, 2018, 1:24am

Do we have best practice for using Dremio SQL joins? like LHS with no UDFs ; push the calculations to the RHS in where clause?

can · February 10, 2018, 1:45am

@HLNA we don’t have best practices around how you form your JOIN conditions/where clauses. In most cases, Dremio’s query optimizer is able to optimize the joins regardless of how it is written. It would be good to understand more about your specific concerns if you can share an example.

HLNA · February 13, 2018, 5:30am

Hi @can

For example, following query will be pushed to the source as-is or any optimization happens before it?
SELECT a.*
FROM table a JOIN table b ON UDF_function(a.id) = b.id ;

2nd example: this fails.
WHERE 1=1
AND

balaji.ramaswamy · February 15, 2018, 6:55am

Hi @HLNA

Would it be possible for you to share the profile of the failed job?

How To Share A Query Profile

Thanks,
@balaji.ramaswamy

HLNA · February 15, 2018, 9:27am

hi @balaji.ramaswamy

Attached is one of the profile.
5cff6d55-dddf-4ddd-b507-5512db6dd7f1.zip (12.3 KB)

balaji.ramaswamy · February 15, 2018, 5:09pm

Hi @HLNA

I am little confused. The profile you uploaded points to the same error in this thread and as I had mentioned we have a bug opened for that.

Kindly let me know if you have any other questions

Note: If there are other new questions, kindly open a new thread

Thanks,
@balaji.ramaswamy

HLNA · February 16, 2018, 11:25pm

@balaji.ramaswamy Thanks. Yes, most of the issues were with those error. Meanwhile, when trying to join multiple datasource-tables, (each table with ~40 records), one of the datasource “RedShift” was showing a performance hit; have not yet gotten back to it. As I was using the standalone version (on a mac, not even ec2 server), jsut curious the same query when executed from EC2 instance or Cluster will perform better or still the same performance hit is expected in the datasource? because, my understanding is in all cases the query is pushed to the data source right?

balaji.ramaswamy · February 16, 2018, 11:46pm

Hi @HLNA

I would need the profile for the job from RedShift. Is that something you can share?

share-query-profile

Thanks,
@balaji.ramaswamy

HLNA · February 16, 2018, 11:58pm

@balaji.ramaswamy Sure, let me work on it. I will send it to you. In case if the query did not complete, still I should be able to generate the profile I believe.

Topic		Replies	Views
Dremio SELECT * on joins	2	5237	November 6, 2018
Is there any approach to control the query order for data join	7	1782	March 19, 2021
Dremio Join of large table in S3 and RedShift	1	1040	December 20, 2021
Performance Issue	4	1779	September 18, 2019
How to Optimize Query Performance for Large Datasets in Dremio Dremio Cloud	1	443	June 21, 2024

SQL joins performance

Related topics