Logical planning phase too slow

Hi,
I have a VDS that I expect to query and get results within a second, so I created the appropriate reflection to accelerate it.
But when I execute the query, I get the results in 10 seconds, and most of the time is spent in Logical Planning phase.

Validation (80 ms)
Convert To Rel (598 ms)
Reduce Expressions (53 ms)
JDBC Pushdown (12 ms)
Window Function Rewrites (6 ms)
Pre-Logical Filter Pushdown (22 ms)
Find Materializations (11 ms)
Normalization (2,000 ms)
Substitution (0 ms)
Logical Planning (7,625 ms)
Multi-Join analysis (0 ms)
LOPT Join Planning (0 ms)
Physical Planning (46 ms)
Final Physical Transformation (1 ms)
Fragment Assignment (0 ms)
Plan Generation (3 ms)
Intermediate Fragments Scheduling (0 ms)
Leaf Fragments Scheduling (0 ms)

Why does it take so long to do this planning while this is a simple group by query where the fields are covered by an acceleration ?

NB : in order to get my results in an acceptable duration, it’s a pity I had to create a scratch version of my VDS that I put in an EFS storage ( CREATE TABLE $scratch.MyVDS_SCRATCH AS SELECT * FROM MyVDS), and I create a new VDS referencing this new parquet dataset.
The same query returns incredibly faster (in less than 300 ms total):

Validation (117 ms)
Convert To Rel (50 ms)
Reduce Expressions (18 ms)
JDBC Pushdown (0 ms)
Window Function Rewrites (0 ms)
Pre-Logical Filter Pushdown (1 ms)
Find Materializations (10 ms)
Substitution (0 ms)
Logical Planning (50 ms)
Multi-Join analysis (0 ms)
LOPT Join Planning (0 ms)
Physical Planning (31 ms)
Final Physical Transformation (4 ms)
Fragment Assignment (0 ms)
Plan Generation (15 ms)
Intermediate Fragments Scheduling (0 ms)
Leaf Fragments Scheduling (0 ms)

NB2 : my accelerations are stored on S3
I can provide the query profile in a private message

It looks like you probably have a lot of reflections matching. You can see this by looking at the Acceleration tab, and seeing not just which reflections matched, but how many times each one matched.

There is a known issue where having lots of reflection matches results in high Normalization and Logical Planning times. This is something we are working to improve, but I do not currently have timeline for this.

If you want, you can send me a private message with the profile, and I can take a look. Your planning times are actually higher than I’ve seen before, so there may be something else going on.

Hi, I’m facing the same issue (takes 52seg), but it occurs when I add where to an aggregate query, something like this:
SELECT COUNT(*)
FROM model
WHERE “fecha” >= '2019-02-20 00:00:00’
AND “fecha” <= '2019-02-27 00:00:00’

When I run this without the “Where”, the query runs perfect and takes only some seconds, the logical planning takes 7s in this case.

@Rosmar_Torres, can you share the query profile?

@Rosmar_Torres what is the data source, how do you have Dremio deployed, and what version of Dremio are you using?

If you could attach a query profile that might help us take a closer look.

The source is a S3, the version is 3.0, Standalone with one cordinator and two executor.

I attached the profile. 99b03306-64e3-4db7-a127-0b877e4687d3.zip (54,6 KB)

Can you upgrade to 3.1 and see if you still face this issue? We have fixed a number of issues related to planning since 3.0.

@Rosmar_Torres, can you also include a profile from running the from query without the WHERE clauses? This would be useful for comparison.

@Rosmar_Torres, also, to understand this better, kindly turn on verbose query planning. You can do this by going to Admin–>Support, add the key, planner.verbose_profile, enable it, and save your changes.

When you rerun a query, the profile will contain a much richer set of information now.

Ben,
I attached the profile without the WHERE.
e4a354cd-4efe-4059-a5ae-9ec143296716.zip (51,9 KB)

Hi,
We deactivated most of the reflection and in some ocasions the SQL runs ok, also we reformulated somethings in the model and now runs with the “Where” condition. But We notice that this same problem occurs when It’s executes a DISTINC SQL… the only way that we found to resolve the problem was making a view with “group by” and adding a constant “1” to sum it (to imitated the “count”) and making a reflection upon the view and that make it work now.

About the upgrade we are considering this.

Thanks for your answers.

@Rosmar_Torres, we are currently working on some fixes related to long planning times when many reflections are applicable.

With regards to:

We notice that this same problem occurs when It’s executes a DISTINCT SQL… the only way that we found to resolve the problem was making a view with “group by” and adding a constant “1” to sum it (to imitated the “count”) and making a reflection upon the view and that make it work now.

Are you seeing the same long logical planning phase? Is it considering a large number of reflections?

Kindly attach a query if possible.

I face similar issue.
A suggestion , can dremio skip some part of the planning phase, like checking all the reflectiosn etcc, for second execution of the same query?. Assuming that it already know which is the best plan and cache that information.

1 Like