We are testing with about 16 TB of data. The reflection we are attempting to create is on two columns, each of which has only two or three distinct values. The job to create the reflection has been running for about 2 hours now, with only one executor at about 10% CPU utilization. Is this typical?
Would you be able to share the query profile with us?
All of the time is spent spilling. Are you spilling to SSDs or to a different type of disk?