I’m trying to optimize the total execution time in the following scenario:
PDSs* -> Base table (VDS) -> Derived table (VDS, more than 20 left joins in base table)
*PDSs are in parquet files located in S3
We’ve tested some combinations:
- Base table with reflection and Derived table with reflection
- Base table materialized with CTAS and Derived table with reflection
The total execution time is in the best combination (base table w/ CTAS and derived table w/ reflection is the following: query plan 17 and execution 5s = total 22s.
For my purpose, the response time goal must be much less than 10s, and I believe the query plan
is making it impossible. On the other hand, use CTAS for derived tables isn’t and option due to managing complexity.
Is there a way of reducing and optimizing the query plan?
Danilo from DataSprints
We haven’t found a solution for this yet.
Any tips, comments, best practices are really welcome.
It looks like we ar spending time on the materialization (reflection replacement plans), to confirm that, kindly turn on verbose planning and send us the profile
admin-support-support key-“planner.verbose_profile”, show - enable - save
Also try to do the below and see if the planning time comes down?
@balaji.ramaswamy we’ve tested with your suggestions.
The verbose profile can be reached here: https://drive.google.com/open?id=1YGUw0P1Ewh-h1mEVtoAQQ5WYOpH-Zwlk
Have you read the files that I’ve shared?
Balaji, do you have any suggestions on that? Things still the same.