Do we have best practice for using Dremio SQL joins? like LHS with no UDFs ; push the calculations to the RHS in where clause?
@HLNA we don’t have best practices around how you form your JOIN conditions/where clauses. In most cases, Dremio’s query optimizer is able to optimize the joins regardless of how it is written. It would be good to understand more about your specific concerns if you can share an example.
Hi @can
For example, following query will be pushed to the source as-is or any optimization happens before it?
SELECT a.*
FROM table a JOIN table b ON UDF_function(a.id) = b.id ;
2nd example: this fails.
WHERE 1=1
AND
Hi @HLNA
Would it be possible for you to share the profile of the failed job?
Thanks,
@balaji.ramaswamy
Attached is one of the profile.
5cff6d55-dddf-4ddd-b507-5512db6dd7f1.zip (12.3 KB)
Hi @HLNA
I am little confused. The profile you uploaded points to the same error in this thread and as I had mentioned we have a bug opened for that.
Kindly let me know if you have any other questions
Note: If there are other new questions, kindly open a new thread
Thanks,
@balaji.ramaswamy
@balaji.ramaswamy Thanks. Yes, most of the issues were with those error. Meanwhile, when trying to join multiple datasource-tables, (each table with ~40 records), one of the datasource “RedShift” was showing a performance hit; have not yet gotten back to it. As I was using the standalone version (on a mac, not even ec2 server), jsut curious the same query when executed from EC2 instance or Cluster will perform better or still the same performance hit is expected in the datasource? because, my understanding is in all cases the query is pushed to the data source right?
Hi @HLNA
I would need the profile for the job from RedShift. Is that something you can share?
Thanks,
@balaji.ramaswamy
@balaji.ramaswamy Sure, let me work on it. I will send it to you. In case if the query did not complete, still I should be able to generate the profile I believe.