I have a performance bottleneck in reflection/query operations during EXTERNAL_SORT steps. It happens on both small datasets (a few GB) and larger ones (>100 GB). The processing rate at this step is extremely low, and SPILL_TIME_NANO accounts for all of the wait time.
My configuration is 3 executors with 48 GB/8 cores each. Different resource configurations (e.g. 4 executors with 32 GB/10 cores) show the same problem.
Cloud volumes are fast (sequential write ~700 MB/sec). I tested the same field sort on both the raw Parquet data and a raw reflection ("display" on all concerned fields), and it is equally slow in both cases.
I changed the value of planner.slice_target (30000, 50000, 100000) without improvement.
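For reference, this is roughly how the support key was changed and checked (a sketch of the Dremio SQL syntax; the specific values are the ones tried above):

```sql
-- Raise the slice target so the planner creates fewer, larger slices
-- (Dremio support key; values tried: 30000, 50000, 100000).
ALTER SYSTEM SET "planner.slice_target" = 100000;

-- Confirm the value currently in effect:
SELECT name, num_val
FROM   sys.options
WHERE  name = 'planner.slice_target';
```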
Using the CTAS method to sort the data the same way as the reflection runs into the same problem.
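The CTAS reproduction looked roughly like this (a sketch only: `myspace.sorted_copy`, the source path, and the field names are placeholders for the real dataset):

```sql
-- Hypothetical CTAS mimicking the reflection: project the 10 "display"
-- fields and sort on the same key. Table and column names are examples.
CREATE TABLE myspace.sorted_copy AS
SELECT field1, field2, field3, /* ... remaining display fields ... */ sort_field
FROM   source.raw_parquet
ORDER BY sort_field;
```

This hits the same EXTERNAL_SORT operator as the reflection build, which is why it reproduces the same spill behavior.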

What can cause such a slow EXTERNAL_SORT?
Which logs can be used to investigate this problem?

E.g. of a raw reflection: raw display on 10 fields, one sort on a field (job killed to get logs):
[Single Parquet file of 2.62 GB, snappy compressed, 1 row group]

  1. With a raw reflection "display" on all 10 fields to accelerate:

  2. Directly on raw data:

After many tests I found out that the poor performance of these reflections is due to a heavy field (a JSON array field) being included in the reflection as "display". The idea behind accelerating this field on the PDS was to take advantage of the PDS's incremental reflection along with some sorting/partitioning. But that is not possible, and these array fields will have to be flattened before ingestion by Dremio…
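The flattening could be sketched with Dremio's FLATTEN function, which unnests one array element per output row (names below are placeholders, not the real schema):

```sql
-- Hypothetical pre-flattening of the heavy JSON array field so the
-- reflection no longer carries the array as a single "display" value.
-- Table and column names are examples only.
CREATE TABLE myspace.flattened_pds AS
SELECT id,
       other_field,
       FLATTEN(json_array_field) AS array_element
FROM   source.raw_pds;
```

The sorted/partitioned reflection would then be defined on `flattened_pds`, where every column is a scalar.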

@allCag I also see you are spilling to one node. If it is a magnetic disk, how many spindles does the spill disk have? Depending on the number of spindles, you can also configure more spill locations. In one of the screenshots, the spill is single-threaded. Is this PARQUET format?
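Adding spill locations is done in `dremio.conf` on each executor; a sketch, assuming one directory per physical spindle (the mount paths are examples):

```
# dremio.conf (executor nodes) -- spread spill I/O across devices by
# listing one spill directory per spindle. Paths are placeholders.
paths: {
  spilling: ["/mnt/disk1/spill", "/mnt/disk2/spill", "/mnt/disk3/spill"]
}
```

Executors need a restart for the change to take effect.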