I have a performance bottleneck in reflection/query operations during EXTERNAL_SORT steps. It happens on both small datasets (a few GB) and larger ones (>100 GB). The processing rate at this step is extremely low, and SPILL_TIME_NANO accounts for all of the wait time.
My configuration is 3 executors with 48 GB/8 cores each. Different resource configurations (e.g. 4 executors with 32 GB/10 cores) show the same problem.
Cloud volumes are fast (sequential write ~700 MB/sec). I tested the same field sort on both the raw Parquet data and a raw reflection ("display" on all concerned fields), and it is equally slow in both cases.
I changed the value of planner.slice_target (30000, 50000, 100000) without improvement.
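For reference, this is roughly how the support key was changed and checked (a sketch of the Dremio SQL syntax; the specific values are the ones tried above):

```sql
-- Raise the slice target so the planner creates fewer, larger slices
-- (Dremio support key; values tried: 30000, 50000, 100000).
ALTER SYSTEM SET "planner.slice_target" = 100000;

-- Confirm the value currently in effect:
SELECT name, num_val
FROM   sys.options
WHERE  name = 'planner.slice_target';
```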
Using the CTAS method to sort the data the same way as the reflection runs into the same problem.
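The CTAS reproduction looked roughly like this (a sketch only: `myspace.sorted_copy`, the source path, and the field names are placeholders for the real dataset):

```sql
-- Hypothetical CTAS mimicking the reflection: project the 10 "display"
-- fields and sort on the same key. Table and column names are examples.
CREATE TABLE myspace.sorted_copy AS
SELECT field1, field2, field3, /* ... remaining display fields ... */ sort_field
FROM   source.raw_parquet
ORDER BY sort_field;
```

This hits the same EXTERNAL_SORT operator as the reflection build, which is why it reproduces the same spill behavior.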

What can cause such a slow EXTERNAL_SORT?
Which logs can be used to investigate this problem?

E.g. of a raw reflection: raw display on 10 fields, one sort on a field (job killed to get logs):
[Single Parquet file of 2.62 GB, snappy compressed, 1 row group]

  1. With a raw reflection "display" on all 10 fields to accelerate:

  2. Directly on raw data:

After many tests I found out that the poor performance of these reflections is due to a heavy field (a JSON array field) being included in the reflection as "display". The idea behind accelerating this field on the PDS was to take advantage of the PDS's incremental reflection along with some sorting/partitioning. But that is not possible, and these array fields will have to be flattened before ingestion by Dremio…
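The flattening could be sketched with Dremio's FLATTEN function, which unnests one array element per output row (names below are placeholders, not the real schema):

```sql
-- Hypothetical pre-flattening of the heavy JSON array field so the
-- reflection no longer carries the array as a single "display" value.
-- Table and column names are examples only.
CREATE TABLE myspace.flattened_pds AS
SELECT id,
       other_field,
       FLATTEN(json_array_field) AS array_element
FROM   source.raw_pds;
```

The sorted/partitioned reflection would then be defined on `flattened_pds`, where every column is a scalar.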

@allCag I also see you are spilling to one node. If it is a magnetic disk, how many spindles does the spill disk have? Depending on the number of spindles, you can also configure more spill locations. In one of the screenshots, the spill is single-threaded. Is this PARQUET format?
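Adding spill locations is done in `dremio.conf` on each executor; a sketch, assuming one directory per physical spindle (the mount paths are examples):

```
# dremio.conf (executor nodes) -- spread spill I/O across devices by
# listing one spill directory per spindle. Paths are placeholders.
paths: {
  spilling: ["/mnt/disk1/spill", "/mnt/disk2/spill", "/mnt/disk3/spill"]
}
```

Executors need a restart for the change to take effect.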