Planning phase slow compared to execution

unni · July 27, 2020, 5:12pm

The Convert To Rel phase is taking 3,359 ms. The cluster has 10 or more spaces with 3 to 4 VDS per space. A similar query on a different cluster with fewer spaces and VDS completes in 300 ms.

How do we reduce the time taken by the Convert To Rel phase?
In general, what are the best practices for reducing the time taken in the planning phase?

desi · July 27, 2020, 5:24pm

Try this and report back: https://docs.dremio.com/advanced-administration/metadata-cleanup.html

Just a hunch, I don’t work for dremio.

unni · July 30, 2020, 6:23am

@desi
I tried the cleanup, but that did not reduce the planning time.

@balaji.ramaswamy Sorry to tag you directly, but is there a document that explains Convert To Rel and what affects its performance?

The planning time for the same query differs between clusters. The Kubernetes pod sizing is the same. One cluster has more spaces and VDS.
The query only uses two VDS.

desi · July 30, 2020, 8:03pm

Basically, planning reads the db metadata. Look in dremio.conf on the master node to see where it stores the db folder, and ensure that access to it is fast from the cluster where planning is slow.

If you have two clusters, each with its own master node, check that each has its own db folder end-point, not a shared one. If it is shared, one of the coordinators will automatically become a stand-by, as the master will lock the db folder database.

Either way, the problem is in the environment. As a test, stop the cluster where planning is fast, copy its two config files (dremio.conf and dremio-env) to the master node where planning is slow, start that node, and run the query. Only run select-type operations; do not create anything. Also ensure no one else is connecting to your cluster.

Rakesh_Malugu · August 6, 2020, 11:04am

Hello @unni

Can you share the query profile with us?

Thanks,
@Rakesh_Malugu

unni · August 6, 2020, 11:49am

@Rakesh_Malugu
Please find the profiles below.
The cluster with Dremio 4.0.5 takes less than a second to plan a simple select query.
Cluster with Dremio 4.6.1 takes 2.4 seconds for planning.
The dataset is a single file in S3.
dremio-4.6.1_profile.zip (16.3 KB) dremio-4.0.5_profile.zip (11.6 KB)

Both the clusters are running on AKS

Thank you

Rakesh_Malugu · August 6, 2020, 2:44pm

Hello @unni

We have a known issue where planning spends a long time in the ‘Convert To Rel’ phase.

I will update you once the fix is ready.

Thanks,
@Rakesh_Malugu

unni · August 6, 2020, 3:28pm

Thank you @Rakesh_Malugu for the update.

balaji.ramaswamy · August 7, 2020, 7:40am

@unni

What happens if you do

select * from "s3-cuddle-data"."dev-backup".cuddle_bauer."merged-uk"."merged.csv"

unni · August 7, 2020, 10:03am

@balaji.ramaswamy
Now the validation time has increased to more than 3 seconds and Convert To Rel is 0.

balaji.ramaswamy · August 7, 2020, 10:43pm

@unni

Can I have the profile again, please? The one with the high validation time?

unni · August 8, 2020, 5:10am

@balaji.ramaswamy
The profile where validation time is high.
2989df5e-b31e-4a05-94a6-8af5d657a68c.zip (15.9 KB)

balaji.ramaswamy · August 8, 2020, 5:34am

@unni

This looks like metadata retrieval. Can you please run the query again and see if the validation time goes away?

unni · August 8, 2020, 5:57am

@balaji.ramaswamy
For this cluster the validation time has dropped to 1 ms. But the same dataset on another cluster always gives a validation time greater than 3 seconds; I cannot share the profile of that cluster.

We are creating VDS on top of this PDS and querying the VDS after enabling the reflection.

Can you suggest what we can do to get a consistent planning time? Any information about the factors that affect planning time would be helpful.

balaji.ramaswamy · August 8, 2020, 8:28am

@unni

The time you see in validation was metadata retrieval. We usually cache metadata via the source's background refresh job, but if the table is new (queried for the first time) or its metadata has expired, we fetch it at query run time. That is why you did not see the high planning time on the second run. Regarding your second cluster, check whether the time is again spent on metadata; if so, the second run onwards should be fine. It is also a good idea to check the metadata settings on your source to make sure the expiry interval is greater than the refresh interval.
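
As a rough illustration of the last point, the check amounts to a simple comparison. This is only a sketch: the function name and the interval values are hypothetical, and the actual settings have to be read from the source's metadata configuration in your own cluster.

```python
from datetime import timedelta


def metadata_settings_ok(refresh: timedelta, expiry: timedelta) -> bool:
    """Return True when cached metadata outlives the refresh interval.

    If expiry <= refresh, the cached metadata can expire before the
    background job refreshes it, so a query may have to fetch metadata
    at run time and pay for it during planning.
    """
    return expiry > refresh


# Hypothetical values, not taken from either cluster in this thread.
print(metadata_settings_ok(timedelta(hours=1), timedelta(hours=3)))
print(metadata_settings_ok(timedelta(hours=3), timedelta(hours=1)))
```

The first call describes a source whose metadata stays valid across refreshes; the second describes one where metadata can lapse between refreshes.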

unni · August 8, 2020, 12:15pm

@balaji.ramaswamy
And what are the possible reasons for the time taken by Convert To Rel?

balaji.ramaswamy · August 14, 2020, 7:42am

@unni

We have filed an internal ticket to address the issue. Meanwhile, please use the fully qualified name, like "s3-cuddle-data"."dev-backup".cuddle_bauer."merged-uk"."merged.csv"

Thanks
Bali
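
If queries are generated by a script, the fully qualified name suggested above can be built from the path components. A minimal sketch, assuming the standard SQL convention of double-quoting identifiers and doubling any embedded double quote; `qualify` is a hypothetical helper, not part of any Dremio client library.

```python
def qualify(path_parts):
    """Double-quote each path component and join the parts with dots."""
    quoted = ['"' + part.replace('"', '""') + '"' for part in path_parts]
    return ".".join(quoted)


# Path components taken from the query earlier in this thread.
parts = ["s3-cuddle-data", "dev-backup", "cuddle_bauer", "merged-uk", "merged.csv"]
query = "SELECT * FROM " + qualify(parts)
print(query)
```

Quoting every component, including ones like cuddle_bauer that would work unquoted, keeps the helper simple and avoids deciding case by case which names need quotes.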
