Q: Removing a job that taskspooler thinks is still running but actual process has been killed

Richard Conner

Feb 1, 2017, 5:26:06 PM
to taskspooler
We recently came across an issue where taskspooler shows a job as running, but the referenced PID does not exist in the system.
I think the process may have been killed with SIGKILL (kill -9), so taskspooler never got a child exit signal and was left confused.

The issue becomes more difficult when several very long-running tasks are queued up waiting to be executed.

Is there any way of telling taskspooler that a long gone PID has finished?

The only answer I can envision so far is to increase the taskspooler queue-size by 1 slot to account for the orphan slot and then restart taskspooler once all the tasks have been completed.
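
For readers who want to try the check and the workaround described above, here is a minimal sketch. It assumes the binary is installed as `ts` (some distributions rename it), and that `ts -p <id>` prints the job's PID and `ts -S <num>` sets the number of simultaneous slots, as the ts man page describes. The helper names and the `current_slots` argument are hypothetical. It only confirms that the PID is gone and adds one slot; it does not make taskspooler forget the stale job.

```python
import os
import subprocess


def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def job_pid(job_id, ts_binary="ts"):
    """Ask taskspooler for the PID it has recorded for a job (assumes `ts -p`)."""
    result = subprocess.run(
        [ts_binary, "-p", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def add_one_slot(current_slots, ts_binary="ts"):
    """Workaround from the post: raise the slot count by one (assumes `ts -S`)."""
    subprocess.run([ts_binary, "-S", str(current_slots + 1)], check=True)
```

If `pid_exists(job_pid(job_id))` is false while the job list still shows that job as running, the slot is orphaned in the sense described here.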

Lluís Batlle i Rossell

Feb 3, 2017, 6:01:06 PM
to tasks...@googlegroups.com
I tried to reproduce it, but I failed.

Do you know of any command sequence that may reproduce the problem?

Regards,
Lluís.

--
(Escriu-me xifrat si saps PGP / Write ciphered if you know PGP)
PGP key D4831A8A - https://emailselfdefense.fsf.org/

Richard Conner

Feb 6, 2017, 5:12:18 PM
to taskspooler, vi...@viric.name
Unfortunately, no. :(  Thanks for trying.

We couldn't find any system logs or reason for the process to have failed. I could only guess that a "kill -9" might remove the process without taskspooler's knowledge.
I'll see if we can reproduce it somehow, but at the moment it is one of those WTH mysteries. 

Lluís Batlle i Rossell

Feb 6, 2017, 5:15:09 PM
to tasks...@googlegroups.com
Do you have a /tmp/ts.error file or anything similar?

Richard Conner

Feb 6, 2017, 6:04:41 PM
to taskspooler, vi...@viric.name
No /tmp/ts.error file either. I just tried several other kill signals (SIGSEGV, SIGILL, SIGALRM, SIGHUP) on both the job's PID and the PID of the ts process attached to it. No luck. Very odd.
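
The signal experiment above can be approximated outside taskspooler with a small sketch. It shows general POSIX behaviour only and says nothing about ts internals: a parent that waits on its child still learns of the child's death, even after SIGKILL. That is consistent with the failed attempts to reproduce the problem this way. The function name and the sleeping child are illustrative assumptions.

```python
import signal
import subprocess
import sys


def status_after_signal(sig, timeout=5.0):
    """Start a sleeping child, send it `sig`, and return what the parent sees.

    A return value of -N means the child was terminated by signal N.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        child.send_signal(sig)
        # wait() reaps the child, as a parent reacting to SIGCHLD would.
        return child.wait(timeout=timeout)
    finally:
        # Clean up if the signal did not terminate the child in time.
        if child.poll() is None:
            child.kill()
            child.wait()


if __name__ == "__main__":
    for sig in (signal.SIGKILL, signal.SIGSEGV, signal.SIGILL,
                signal.SIGALRM, signal.SIGHUP):
        print(sig.name, status_after_signal(sig))
```

Because the parent is notified in each of these cases, a plain `kill -9` of the job seems unlikely, on its own, to explain a job stuck in the running state.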