However, I’ve noticed that when a large query is running, the small queries that come after it wait for it to finish before they start executing. Look at this example:
Hi @balaji.ramaswamy! These spinning queries are very small, and they usually return the result set in one second or less. However, until the large query finishes, they remain in this status. I’ve seen this behavior with different queries and I wonder if it’s a bug.
Hi @balaji.ramaswamy
Sorry for the late reply. I’m attaching the query profile. It was supposed to be a very small query, one that usually returns its results in 2 seconds or less. At the time I extracted the profile, the query had been running for over 1 hour.
From the profile you last attached, it looks like you lost one of the execution threads on ip-10-12-11-180.ec2.internal. You can see this in the profile under the 01 Phase, where all the other threads have quickly completed their work, while a final one is stuck “SENDING”. This means the coordinator is waiting to hear back from that thread but has not received any update. The query can’t complete until it gets that final piece of work.
If you see this again, you should find these “SENDING” threads in the profile and look at the logs from the executor they are located on. Some query might be generating an exception that kills a thread on that node.
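As a rough illustration of that triage step, here is a minimal Python sketch that groups stuck threads by executor so you know whose logs to pull. It assumes you have already copied the per-thread rows out of the profile into plain dictionaries. The field names (`phase`, `thread`, `host`, `state`), the helper `stuck_threads_by_host`, and the sample rows are all hypothetical; this is not Dremio's actual profile schema or API.

```python
from collections import defaultdict


def stuck_threads_by_host(threads, stuck_state="SENDING"):
    """Group threads still in `stuck_state` by the executor host they run on.

    `threads` is an iterable of dicts with the hypothetical keys
    'phase', 'thread', 'host' and 'state'.
    """
    by_host = defaultdict(list)
    for t in threads:
        if t["state"] == stuck_state:
            by_host[t["host"]].append((t["phase"], t["thread"]))
    return dict(by_host)


# Made-up rows for illustration only; the second host name is invented.
sample_threads = [
    {"phase": "01", "thread": 0, "host": "executor-a.example.internal", "state": "FINISHED"},
    {"phase": "01", "thread": 1, "host": "executor-a.example.internal", "state": "FINISHED"},
    {"phase": "01", "thread": 3, "host": "ip-10-12-11-180.ec2.internal", "state": "SENDING"},
]

for host, stuck in stuck_threads_by_host(sample_threads).items():
    print(f"check the logs on {host}: stuck (phase, thread) = {stuck}")
```

Any host that shows up in the result is the executor whose logs are worth checking for an exception around the time the job started.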
That being said, you are using Dremio 3.1.x and we’ve made a number of improvements in the way Dremio handles these “bad” threads and “stuck” jobs in our 3.2.x release. I highly recommend upgrading if possible.