Query by external tools is running for more than 24 hours

0iras0r · March 12, 2019, 3:49pm

Hi!
I’m encountering an issue with a query executed by an external tool that has been running for more than 24 hours.
In my Dremio settings, Query Queue Control is enabled with a “Queue timeout” of 5 minutes.
Why is this property not being applied?

Best Regards

Rosario

ben · March 12, 2019, 4:08pm

Hi @0iras0r,

Can you attach a profile for the job?
https://www.dremio.com/tutorials/share-query-profile-dremio/
Is the job actually listed as running, or is it still enqueued?

0iras0r · March 26, 2019, 12:22pm

Hi @ben and sorry for the delay,
but at that time we didn’t have the information you requested.

This Friday we had the same problem again, so this time I saved the information in time.

Thanks in advance.
Rosario

f535cfae-7f4d-43be-afb8-b07c16ff4b38.zip (51,9 KB)
27036fff-2ad7-46ec-bd68-dc223bc71993.zip (27,1 KB)
c5ee6743-53f1-48f5-b1b3-ec1adbdb09fc.zip (21,0 KB)

0iras0r · March 29, 2019, 10:51am

Hi @ben, is there any update regarding this issue?

Regards,
Rosario

ben · March 29, 2019, 5:41pm

Hello @0iras0r, all three of these jobs failed while executing. So, to answer your original question: these jobs did not time out in the queue because they were not enqueued (waiting to run) in the first place; they were running (executing).

Putting the reason for the failures aside, the question is why they did not get cleared from the list of running jobs. The job/query ids are:

236b4bbb-4a6b-d338-19ee-9518dae70900
236b4bdf-dead-fac9-3efb-303224488a00
2376fe49-5e58-227b-d290-5f3976c9f300

You can check to see if they are actually still executing on nodes in your cluster by running the following query:

SELECT * FROM SYS.FRAGMENTS WHERE queryId IN ( '236b4bbb-4a6b-d338-19ee-9518dae70900', '236b4bdf-dead-fac9-3efb-303224488a00', '2376fe49-5e58-227b-d290-5f3976c9f300' )

If this doesn’t return any records, then that’s a strong indication that these are just stranded jobs that are not really running (and hence not taking up any cluster resources). If they are blocking other jobs from executing (that is, you are seeing jobs time out in their queue), then you can increase the concurrency limit of that queue as a workaround.
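If you need to repeat this check for other stuck jobs, the query text can be generated from a list of job ids. The following is only a minimal sketch: the helper name build_fragments_query is hypothetical (it is not part of Dremio), and it assumes job ids are always UUID-shaped strings like the three above. It only builds the SQL string; you would still run the result through your usual SQL client.

```python
import uuid


def build_fragments_query(job_ids):
    """Build the SYS.FRAGMENTS lookup shown above for a list of job ids.

    Hypothetical helper, not part of Dremio. Assumes every job id is a
    UUID-shaped string (8-4-4-4-12 hex digits).
    """
    if not job_ids:
        raise ValueError("at least one job id is required")

    quoted = []
    for job_id in job_ids:
        # uuid.UUID raises ValueError on malformed input, so only
        # well-formed ids are ever placed inside the SQL text.
        canonical = str(uuid.UUID(job_id))
        quoted.append("'" + canonical + "'")

    return (
        "SELECT * FROM SYS.FRAGMENTS WHERE queryId IN ( "
        + ", ".join(quoted)
        + " )"
    )


if __name__ == "__main__":
    print(build_fragments_query([
        "236b4bbb-4a6b-d338-19ee-9518dae70900",
        "236b4bdf-dead-fac9-3efb-303224488a00",
        "2376fe49-5e58-227b-d290-5f3976c9f300",
    ]))
```

Parsing each id before quoting it means a mistyped or malformed id fails with an error instead of producing a query that silently returns no rows, which could otherwise be mistaken for "not running".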

In our most recent release, we have made code changes that prevent jobs from getting stuck in a state that is similar to this. If you supply the Dremio server.log from the time these jobs were “running”, we might be able to determine whether this is the same problem.

To confirm, you cannot cancel these jobs, correct?

0iras0r · April 1, 2019, 8:48am

Yes, I’m unable to cancel the jobs.
So are you suggesting that we upgrade to 3.1?

Best regards,
Rosario

ben · April 1, 2019, 4:35pm

@0iras0r, we are continually fixing bugs and adding features, so upgrading is a good idea. As I mentioned, we addressed an issue similar to this in our most recent release.

Can you confirm that these jobs are not executing:

SELECT * FROM SYS.FRAGMENTS WHERE queryId IN ( '236b4bbb-4a6b-d338-19ee-9518dae70900', '236b4bdf-dead-fac9-3efb-303224488a00', '2376fe49-5e58-227b-d290-5f3976c9f300' )

0iras0r · April 2, 2019, 7:37am

Yes, yesterday the problem showed up again.
After further analysis, including the query you suggested, we can confirm that the jobs are not executing.
We will try an upgrade, fingers crossed.
I’ll let you know whether that solves the problem.

Thank you for your answers and suggestions. =)
