I have a simple repro case with a freestyle project and "Execute shell" build steps. I'm on Jenkins 1.625.3, Linux master, Linux slave.
I hit this because we have freestyle projects where several build processes execute in parallel but use flock to gate access to a shared resource. Sometimes, when a job is aborted, one or more of these child processes persist and prevent future jobs on the slave from acquiring the shared lock. The flock stuff below isn't necessary to repro - a plain sleep gives similar behavior - but it matches my use case and makes it easy to identify affected processes with fuser.
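To illustrate why a leftover lock holder blocks later jobs, here is a minimal sketch and not part of the actual job configuration. It assumes bash and the util-linux flock; the temporary lock path and the variable names are hypothetical. It uses the file-descriptor form of flock so that a single PID holds the lock and cleanup is simple, whereas the repro uses `flock FILE COMMAND`.

```shell
#!/bin/bash
# Hypothetical lock path for illustration; the report uses /var/lock/mylockfile.
lockfile="$(mktemp)"

# Stand-in for a child process that survives an aborted build:
# the subshell takes the lock on fd 9, then becomes "sleep" and keeps fd 9 open.
( flock 9; exec sleep 30 ) 9>"$lockfile" &
holder=$!
sleep 1  # give the holder time to take the lock

# A later job tries to take the lock without blocking (-n).
if flock -n "$lockfile" true; then
  first_attempt="acquired"
else
  first_attempt="blocked"
fi

# Once the leftover holder is gone, the lock becomes available again.
kill "$holder"
wait "$holder" 2>/dev/null
if flock -n "$lockfile" true; then
  second_attempt="acquired"
else
  second_attempt="blocked"
fi

echo "while holder alive: $first_attempt"
echo "after holder killed: $second_attempt"
rm -f "$lockfile"
```

In the real jobs the second attempt never happens on its own, because nothing kills the leftover holder after the abort.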
Create a freestyle project with two build steps:
- Execute shell
  #!/bin/bash -ex
  nohup flock /var/lock/mylockfile sleep 1h &
- Execute shell
  #!/bin/bash -ex
  sleep 1h
Then abort the job (manually or by timeout). The flock process and its child sleep process persist and continue to hold the lock.
This is the simplest project configuration I could construct. In all of these cases, the child processes are killed as expected:
- Omitting the second "Execute shell."
- Combining them into a single "Execute shell."
- Failing by means other than abort, e.g. /bin/false in the second "Execute shell."
Sample results below. While the job is running, the lock is in use as expected:
$ fuser /var/lock/mylockfile
22733 22734
$ ps -p 22733,22734 -o pid,ppid,stat,lstart,args
PID PPID STAT STARTED COMMAND
22733 1 S Wed Feb 24 00:57:51 2016 flock /var/lock/mylockfile sleep 1h
22734 22733 S Wed Feb 24 00:57:51 2016 sleep 1h
Then abort the job:
[experimental_jenkins_26048] $ /bin/bash -ex /tmp/hudson8042917752397215577.sh
+ nohup flock /var/lock/mylockfile sleep 1h
[experimental_jenkins_26048] $ /bin/bash -ex /tmp/hudson4924658810125221857.sh
+ sleep 1h
Build timed out (after 3 minutes). Marking the build as aborted.
Build was aborted
Finished: ABORTED
Afterwards, the processes are still alive:
$ ps -p 22733,22734 -o pid,ppid,stat,lstart,args
PID PPID STAT STARTED COMMAND
22733 1 S Wed Feb 24 00:57:51 2016 flock /var/lock/mylockfile sleep 1h
22734 22733 S Wed Feb 24 00:57:51 2016 sleep 1h
BUILD_ID is unchanged, so ProcessTreeKiller should find them:
$ strings /proc/22733/environ | grep BUILD_ID
BUILD_ID=17
$ strings /proc/22734/environ | grep BUILD_ID
BUILD_ID=17
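To check every process on the slave rather than two known PIDs, the same environment lookup can be generalized. This is a rough sketch assuming Linux /proc and bash; the function name `find_pids_by_env` is made up for illustration, and it is only a stand-in for an environment-based search, not Jenkins's actual ProcessTreeKiller code.

```shell
#!/bin/bash
# List PIDs whose environment contains an exact NAME=value entry.
# Linux-only: relies on /proc/<pid>/environ, which is NUL-separated.
find_pids_by_env() {
  local wanted="$1" environ pid
  for environ in /proc/[0-9]*/environ; do
    # Skip processes we are not allowed to read (e.g. other users').
    [ -r "$environ" ] || continue
    # stderr is silenced first so a process that exits mid-scan is ignored quietly.
    if tr '\0' '\n' 2>/dev/null < "$environ" | grep -qxF -- "$wanted"; then
      pid="${environ#/proc/}"
      echo "${pid%/environ}"
    fi
  done
}
```

For the build above, `find_pids_by_env BUILD_ID=17` would be the call to run on the slave after the abort. It only sees processes owned by the invoking user unless run as root.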