[JIRA] (JENKINS-50379) Jenkins kills long running sh script with no output


evan.ward@nrl.navy.mil (JIRA)

Mar 23, 2018, 3:54:02 PM
to jenkinsc...@googlegroups.com
Evan Ward created an issue
 
Jenkins / Bug JENKINS-50379
Jenkins kills long running sh script with no output
Issue Type: Bug
Assignee: Unassigned
Components: durable-task-plugin
Created: 2018-03-23 19:53
Environment: Jenkins ver. 2.107.1 on CentOS 7
Priority: Minor
Reporter: Evan Ward

I have a Jenkins pipeline that runs a shell script that takes about 5 minutes and generates no output. The job fails and I'm seeing the following in the output:

wrapper script does not seem to be touching the log file in /home/jenkins/workspace/job_Pipeline@2@tmp/durable-595950a5
 (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
 script returned exit code -1

Based on JENKINS-48300 it seems that Jenkins is intentionally killing my script while it is still running. IMHO it is a bug for Jenkins to assume that a shell script will generate output every n seconds for any finite n. As a workaround I've set -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL to one hour. But what happens when I have a script that takes an hour and one minute!?

 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

dbeck@cloudbees.com (JIRA)

Apr 1, 2018, 1:32:02 PM
to jenkinsc...@googlegroups.com
Daniel Beck commented on Bug JENKINS-50379
 
Re: Jenkins kills long running sh script with no output

Does it work when you run `echo whatever ; yourscript.sh` instead of just the latter?

nullify005@gmail.com (JIRA)

Apr 3, 2018, 7:49:02 PM
to jenkinsc...@googlegroups.com
Lee Webb commented on Bug JENKINS-50379

Travis does this sort of thing too: if there's no output for a while, it assumes the process is hung and stops the build.

If you don't want to mess with Jenkins, something like the following shell snippet can help.

It runs your long-running process in the background and echoes dots to the console for as long as it's still running:

# Suppress command output unless there is a failure.
function quiet() {
    local xtrace= errexit= tmp cmd_pid ret ts_start ts_elapsed
    tmp=$(mktemp) || return   # temp file that captures the command's output
    # Temporarily disable tracing and exit-on-error; both are restored before returning.
    if [[ $- == *x* ]]; then set +x; xtrace=1; fi
    if [[ $- == *e* ]]; then set +e; errexit=1; fi
    echo -n "quiet running: $* "
    ts_start=$(date +%s)
    "$@" > "${tmp}" 2>&1 &
    cmd_pid=$!
    # Print a dot every 3 seconds while the background command is alive.
    # kill -0 sends no signal; it only checks that the PID still exists,
    # which avoids the platform-specific ps invocations.
    while kill -0 "${cmd_pid}" 2>/dev/null; do
        echo -n '.'
        sleep 3
    done
    wait "${cmd_pid}"
    ret=$?
    ts_elapsed=$(( $(date +%s) - ts_start ))
    if [ "${ret}" -eq 0 ]; then
        echo " finished with code ${ret} in ${ts_elapsed} secs, last lines were:"
        tail -n 4 "${tmp}"
    else
        cat "${tmp}"
    fi
    rm -f "${tmp}"
    if [ -n "${errexit}" ]; then set -e; fi
    if [ -n "${xtrace}" ]; then set -x; fi
    return "${ret}"   # return the exit status of the command
}

 

evan.ward@nrl.navy.mil (JIRA)

Apr 4, 2018, 9:06:03 AM
to jenkinsc...@googlegroups.com

Daniel Beck the script initially generates some output to show that it started and then generates no output for a long time. I think this has the same effect as your suggestion of using echo.

dbeck@cloudbees.com (JIRA)

Apr 4, 2018, 9:17:05 AM
to jenkinsc...@googlegroups.com

Evan Ward I expect so. Thanks for the clarification.

jacob.keller@gmail.com (JIRA)

May 18, 2018, 12:17:01 PM
to jenkinsc...@googlegroups.com

I see this issue on scripts which do generate some output, but it happens that parts of the script take some time to run: in my case I'm compiling a kernel module and even when the make output is sent to the console, sometimes individual steps take longer than the timeout...

shreedhara.isc@gmail.com (JIRA)

Jun 21, 2018, 3:17:02 AM
to jenkinsc...@googlegroups.com

Hi Evan Ward,
We are also facing the same issue. Could you please tell us how to change the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL interval?

evan.ward@nrl.navy.mil (JIRA)

Jun 21, 2018, 8:00:05 AM
to jenkinsc...@googlegroups.com

Set it in the JVM arg line on master.
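
For readers asking what that looks like in practice, here is a minimal sketch. It assumes Jenkins is started directly from `jenkins.war`; the war path and the 300-second value are illustrative assumptions, not details of the reporter's setup. JVM system properties (`-D...`) have to come before `-jar`, otherwise they are handed to Jenkins as application arguments instead of being read by the JVM.

```shell
#!/usr/bin/env bash
# Hypothetical launch line; adjust the war path for your installation.
JENKINS_WAR=/usr/share/jenkins/jenkins.war   # assumed location
HEARTBEAT_PROP=org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL

# Build the command line; the -D option must precede -jar.
JENKINS_CMD="java -D${HEARTBEAT_PROP}=300 -jar ${JENKINS_WAR}"

# Print it for review rather than starting Jenkins from this sketch.
echo "${JENKINS_CMD}"
```

On installations managed by a service wrapper or package, the same `-D` option goes wherever that wrapper defines the Java arguments.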

max.ivanch@gmail.com (JIRA)

Jun 27, 2018, 4:58:05 AM
to jenkinsc...@googlegroups.com

Hi there,

 

Same issue here: I run jobs with heavy disk load. I set the durable plugin to the "none" option and tried HEARTBEAT_CHECK_INTERVAL, but neither worked for me. As a workaround I created an additional mount point on the Jenkins slave. IMHO, though, I would prefer an option to disable the check altogether.


18332563813@163.com (JIRA)

Jul 3, 2018, 10:16:02 PM
to jenkinsc...@googlegroups.com
lei rou commented on Bug JENKINS-50379

I have the same problem sometimes. How do I change HEARTBEAT_CHECK_INTERVAL?

sr.professional88@gmail.com (JIRA)

Aug 8, 2018, 8:54:02 AM
to jenkinsc...@googlegroups.com

Hi There,

 

I am also facing this issue in our environment. Is there any workaround for this?

 

```
wrapper script does not seem to be touching the log file in /home/****/workspace/demo@tmp/durable-549a8a8c
(JENKINS-48300: if on a laggy filesystem, consider -Dorg.****ci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
```

tamas.gal@me.com (JIRA)

Aug 23, 2018, 2:12:02 PM
to jenkinsc...@googlegroups.com
Tamas Gal edited a comment on Bug JENKINS-50379
Same for us. Out of nowhere jobs are killed on our Jenkins nodes. Manually setting the heartbeat check interval to 300 seems to work for now.

Btw., on Debian-like machines you need to edit `/etc/default/jenkins` and add the above-mentioned variable setting to the line starting with `JAVA_ARGS=`

It should then look something like this:

`JAVA_ARGS="-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300 -Djava.awt.headless=true ..."`

sr.professional88@gmail.com (JIRA)

Aug 29, 2018, 6:20:01 AM
to jenkinsc...@googlegroups.com

Hi Guys,

This issue is caused by the Durable Task Plugin. It has been resolved in the latest release, Durable Task Plugin 1.25.

Reference link: https://issues.jenkins-ci.org/browse/JENKINS-52881

 


jglick@cloudbees.com (JIRA)

Aug 29, 2018, 9:21:04 AM
to jenkinsc...@googlegroups.com

More likely related to JENKINS-48300. Impossible to diagnose merely from this message.

The problem is not that your script stops producing output for a while. That is perfectly normal and supported. The problem is that a side process which is supposed to be detecting this fact and touching the log file every three seconds is either not running, or not producing the right timestamp as observed by the Jenkins agent JVM.
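
The mechanism described here can be sketched roughly as follows. This is an illustration only, not the durable-task plugin's actual wrapper script: the function names `run_with_heartbeat` and `log_is_stale` and the interval parameters are hypothetical, and `stat -c %Y` assumes GNU coreutils (Linux).

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the plugin's real wrapper script.

# Run a command with its output appended to a log file while a side
# process touches that log at a fixed interval, so the log's mtime keeps
# advancing even when the command itself prints nothing.
run_with_heartbeat() {
    local log=$1 interval=$2 cmd_pid beat_pid ret
    shift 2
    "$@" >> "$log" 2>&1 &
    cmd_pid=$!
    # Side process: refresh the log's mtime while the command is alive.
    ( while kill -0 "$cmd_pid" 2>/dev/null; do
          touch "$log"
          sleep "$interval"
      done ) > /dev/null 2>&1 &
    beat_pid=$!
    wait "$cmd_pid"
    ret=$?
    kill "$beat_pid" 2>/dev/null
    wait "$beat_pid" 2>/dev/null
    return "$ret"
}

# Watcher side: succeeds (exit status 0) when the log has not been
# touched for more than max_age seconds, judged by the local clock.
log_is_stale() {
    local log=$1 max_age=$2 now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$log")   # GNU stat; assumes Linux
    (( now - mtime > max_age ))
}
```

In this picture, a silent script is harmless as long as the side process keeps running and the watcher sees the same timestamps. If the side process dies, or the watcher's clock disagrees with the filesystem's timestamps, `log_is_stale` would report a hang even though the command is still working.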

hboeken@me.com (JIRA)

Sep 10, 2019, 8:53:03 AM
to jenkinsc...@googlegroups.com

After having updated Jenkins and its plugins, we're experiencing this issue too now.

We're now using Jenkins 2.193 and the Durable Task Plugin has version 1.30.

wrapper script does not seem to be touching the log file in /local/user_data/s__t/jenkins/workspace/S___K@7@tmp/durable-ad608bf9
(JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)

This 'extremely laggy filesystem' is a local hard disk which isn't laggy whatsoever. 

About 50% of our jobs get aborted due to this. 

Do you have any suggestions for how this can be solved without the workaround of redefining HEARTBEAT_CHECK_INTERVAL?

 


jglick@cloudbees.com (JIRA)

Sep 10, 2019, 10:27:05 AM
to jenkinsc...@googlegroups.com

Only by diagnosing and figuring out how to reproduce, so the issue can be fixed.

jacob.keller@gmail.com (JIRA)

Sep 10, 2019, 3:40:02 PM
to jenkinsc...@googlegroups.com

> The problem is not that your script stops producing output for a while. That is perfectly normal and supported. The problem is that a side process which is supposed to be detecting this fact and touching the log file every three seconds is either not running, or not producing the right timestamp as observed by the Jenkins agent JVM.

Right, so it sounds like we need to investigate why the side process that should be touching the log file isn't working properly.
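
One possible starting point for that investigation, offered as a hypothetical diagnostic rather than anything shipped with Jenkins: touch a probe file in the directory that holds the `durable-*` control files and compare its modification time with the local clock. The function name `mtime_offset` is made up for this sketch, and `stat -c %Y` assumes GNU coreutils (Linux).

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic; not part of Jenkins or the durable-task plugin.

# Print the difference, in seconds, between the local clock and the mtime
# of a freshly touched probe file created in the given directory.
mtime_offset() {
    local dir=$1 probe now mtime
    probe=$(mktemp "${dir}/heartbeat-probe.XXXXXX") || return
    touch "$probe"
    now=$(date +%s)
    mtime=$(stat -c %Y "$probe")   # GNU stat; assumes Linux
    rm -f "$probe"
    echo $(( now - mtime ))
}
```

Run on the agent as the Jenkins user, for example `mtime_offset /path/to/workspace@tmp`, a result close to 0 would suggest that file timestamps and the agent's clock agree. A large positive or negative value would point toward a timestamp mismatch rather than a dead side process.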

jglick@cloudbees.com (JIRA)

Sep 10, 2019, 3:49:02 PM
to jenkinsc...@googlegroups.com

I should have mentioned that JENKINS-25503 would completely reimplement the code involved here, possibly solving this issue (possibly introducing others).
