We are testing on our staging cluster, which has 4 machines (1 m5d.4xlarge and 3 m5d.4xlarge) and 3 separate ZooKeeper nodes.
We have a set of queries that just run forever. They seem to be stuck. When we run the same jobs locally, they complete fine.
We can issue other queries on the cluster, and they run fine.
Here’s a profile of one of them (they are all the same type of query, just with different date ranges):
145c7d78-fdb1-4feb-b71c-b685f45c7ed4.zip (50.4 KB)
Also, looking at the main machine and one of the other machines, CPU usage is almost nothing.