Query from an external tool has been running for more than 24 hours

Hi!
I’m encountering an issue with a query executed by an external tool that has been running for more than 24 hours.
In my Dremio settings, Query Queue Control is enabled with a “Queue timeout” of 5 minutes.
Why is this property not applied?

Best Regards

Rosario

Hi @0iras0r,

  1. Can you attach a profile for the job?
    https://www.dremio.com/tutorials/share-query-profile-dremio/
  2. Is the job actually listed as running, or is it still enqueued?

Hi @ben and sorry for the delay,
but at that time we didn’t have the information you requested.

This Friday we had the same problem again, so this time I saved the information in time.

Thanks in advance.
Rosario

f535cfae-7f4d-43be-afb8-b07c16ff4b38.zip (51,9 KB)
27036fff-2ad7-46ec-bd68-dc223bc71993.zip (27,1 KB)
c5ee6743-53f1-48f5-b1b3-ec1adbdb09fc.zip (21,0 KB)

Hi @ben Any update regarding this issue?

Regards,
Rosario

Hello @0iras0r, all three of these jobs failed while executing. So, to answer your original question, these jobs did not time out in the queue because they were not enqueued (waiting to run) in the first place; they were already running (executing).

Putting the reason for the failures aside, the question is why they did not get cleared from the list of running jobs. The job/query ids are:

  • 236b4bbb-4a6b-d338-19ee-9518dae70900
  • 236b4bdf-dead-fac9-3efb-303224488a00
  • 2376fe49-5e58-227b-d290-5f3976c9f300

You can check to see if they are actually still executing on nodes in your cluster by running the following query:

SELECT * FROM SYS.FRAGMENTS WHERE queryId IN ( '236b4bbb-4a6b-d338-19ee-9518dae70900', '236b4bdf-dead-fac9-3efb-303224488a00', '2376fe49-5e58-227b-d290-5f3976c9f300' )

If this doesn’t return any records, then that’s a strong indication that these are just stranded jobs that are not really running (and hence not taking up any cluster resources). If they are blocking other jobs from executing (that is, you are seeing jobs time out in their queue), then you can increase the concurrency limit of that queue as a workaround.
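
For anyone repeating this check with a different set of jobs, here is a minimal sketch in Python that builds the same query from a list of job ids. The function name `build_fragments_query` is hypothetical (it is not part of Dremio), and it only produces the SQL text; running it against Dremio is left to whatever client you already use.

```python
import uuid


def build_fragments_query(job_ids):
    """Build the SYS.FRAGMENTS lookup shown above for a list of job ids.

    Hypothetical helper: it only assembles the SQL string and does not
    connect to Dremio.
    """
    if not job_ids:
        raise ValueError("at least one job id is required")

    quoted = []
    for job_id in job_ids:
        # Parse each id as a UUID-shaped string so that nothing unexpected
        # is interpolated into the SQL text; malformed ids raise ValueError.
        canonical = str(uuid.UUID(job_id))
        quoted.append(f"'{canonical}'")

    return "SELECT * FROM SYS.FRAGMENTS WHERE queryId IN ( " + ", ".join(quoted) + " )"


if __name__ == "__main__":
    stranded = [
        "236b4bbb-4a6b-d338-19ee-9518dae70900",
        "236b4bdf-dead-fac9-3efb-303224488a00",
        "2376fe49-5e58-227b-d290-5f3976c9f300",
    ]
    print(build_fragments_query(stranded))
```

An empty result from the generated query is read the same way as above: no fragments are executing for those jobs.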

In our most recent release, we have made code changes that prevent jobs from getting stuck in a state similar to this. If you supply the Dremio server.log from the time these jobs were “running”, we might be able to determine if this is the same problem.

To confirm, you cannot cancel these jobs, correct?

Yes, I’m unable to cancel the jobs.
So are you suggesting that we upgrade to 3.1?

Best regards,
Rosario

@0iras0r, we are continually fixing bugs and adding features, so upgrading is a good idea. As I mentioned, we addressed an issue similar to this in our most recent release.

Can you confirm that these jobs are not executing:

SELECT * FROM SYS.FRAGMENTS WHERE queryId IN ( '236b4bbb-4a6b-d338-19ee-9518dae70900', '236b4bdf-dead-fac9-3efb-303224488a00', '2376fe49-5e58-227b-d290-5f3976c9f300' )

Yes, yesterday the problem showed up again.
After further analysis, including running the query you suggested, we can confirm that the jobs are not executing.
We will try an upgrade, fingers crossed.
I’ll let you know if that solves the problem.

Thank you for your answers and suggestions. =)