[JIRA] (JENKINS-42940) Timeout step hangs after restart if timeout occurred, but enclosed block did not exit yet

dnusbaum@cloudbees.com (JIRA)

Apr 17, 2020, 4:07:04 PM
to jenkinsc...@googlegroups.com
Devin Nusbaum assigned an issue to Devin Nusbaum
 
Jenkins / Bug JENKINS-42940
Timeout step hangs after restart if timeout occurred, but enclosed block did not exit yet
Change By: Devin Nusbaum
Assignee: Devin Nusbaum
 
This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)

dnusbaum@cloudbees.com (JIRA)

Apr 17, 2020, 4:07:04 PM
to jenkinsc...@googlegroups.com
Devin Nusbaum updated an issue
Change By: Devin Nusbaum
Summary: [fix included] Timeout step hangs after restart if timeout occurred, but enclosed block did not exit yet

dnusbaum@cloudbees.com (JIRA)

Apr 17, 2020, 4:09:02 PM
to jenkinsc...@googlegroups.com
Devin Nusbaum updated an issue
If the timeout occurs and Jenkins is restarted during the grace period, while it waits for the inner block to terminate, the build hangs forever with this exception in the Jenkins log:

{noformat}
2020-03-13 02:09:40.575+0000 [id=1502]  WARNING o.j.p.w.f.FlowExecutionList$ItemListenerImpl$1#onFailure: Failed to load CpsFlowExecution[Owner[devops-gate/master/blackbox-self-service/25907:devops-gate/master/blackbox-self-service #25907]]
java.lang.NullPointerException
        at org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution.cancel(TimeoutStepExecution.java:151)
        at org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution.setupTimer(TimeoutStepExecution.java:139)
        at org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution.onResume(TimeoutStepExecution.java:90)
        at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl$1.onSuccess(FlowExecutionList.java:185)
        at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl$1.onSuccess(FlowExecutionList.java:180)
        ...
{noformat}

Reproducing this issue requires a block that does not exit immediately. For example:

{code}
node {
  timeout(time: 10, unit: 'SECONDS') {
    build job: 'hang2', parameters: [ new StringParameterValue('A','B') ], quietPeriod: 0
  }
}
{code}

with a second Pipeline Job hang2:

{code}
retry(3) {
    sleep 300
}
{code}

This produces the following console log:

{noformat}
Started by user RK
[Pipeline] node
Running on host in /$JENKINS_HOME/workspace/hang
[Pipeline] {
[Pipeline] timeout
Timeout set to expire in 10 seconds
[Pipeline] {
[Pipeline] build (Building hang2)
Scheduling project: hang2
Starting building: hang2 #1
Cancelling nested steps due to timeout
Resuming build at Fri Mar 10 15:49:00 CET 2017 after Jenkins restart
Waiting to resume hang #1|: ???
Waiting to resume hang #1|: host is offline
Waiting to resume hang #1|: host is offline

Ready to run at Fri Mar 10 15:49:10 CET 2017

Timeout expired 3.7 seconds ago
{noformat}

... and then it hangs forever.

Reason: when onResume() is called, the timer has already expired, so cancel() is called. Because a cancellation was already attempted before the restart, forcible is true, but killer is null at that point, which causes the NPE.

Fix: Check killer for null in cancel() in TimeoutStepExecution (line 94 in the version originally reported, line 151 in the stack trace above).
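
The null check proposed above can be sketched roughly as follows. This is a simplified stand-in, not the plugin's actual code: the class TimeoutSketch, its constructor, and the string return values are hypothetical and exist only for illustration. Only the names cancel(), forcible and killer come from the report, and the sketch assumes that killer is simply missing (null) once the execution has been restored after a restart.

```java
import java.util.concurrent.Future;

/** Simplified model of a timeout execution; NOT the real TimeoutStepExecution. */
class TimeoutSketch {
    // Hypothetical mirror of "killer" from the report: a pending task that is
    // assumed to be null when the object is restored after a Jenkins restart.
    private Future<?> killer;
    // True once a first, polite cancellation has already been attempted.
    private boolean forcible;

    TimeoutSketch(Future<?> killer, boolean forcible) {
        this.killer = killer;
        this.forcible = forcible;
    }

    /** Cancels the body and returns which kind of cancellation was performed. */
    String cancel() {
        if (!forcible) {
            // First attempt: ask politely and remember that we did so.
            forcible = true;
            return "polite";
        }
        // Guard from the proposed fix: without this null check, calling
        // killer.cancel(...) on a restored object throws a NullPointerException.
        if (killer != null) {
            killer.cancel(true);
            killer = null;
        }
        return "forced";
    }
}
```

With this guard, new TimeoutSketch(null, true).cancel() models the state after a restart (cancellation already attempted, no killer) and returns normally instead of throwing.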

Rationale for Major rather than Minor priority: the bug breaks restart resilience.

dnusbaum@cloudbees.com (JIRA)

Apr 17, 2020, 4:14:03 PM
to jenkinsc...@googlegroups.com
Devin Nusbaum commented on Bug JENKINS-42940
 
Re: Timeout step hangs after restart if timeout occurred, but enclosed block did not exit yet

This was also reported as JENKINS-61019. I reproduced it in a test and filed a PR to fix this in a way that still results in the body being cancelled, see jenkinsci/workflow-basic-steps-plugin#112.

dnusbaum@cloudbees.com (JIRA)

Apr 17, 2020, 4:14:03 PM
to jenkinsc...@googlegroups.com
Devin Nusbaum started work on Bug JENKINS-42940
 
Change By: Devin Nusbaum
Status: In Progress (was: Open)

dnusbaum@cloudbees.com (JIRA)

Apr 20, 2020, 9:57:02 AM
to jenkinsc...@googlegroups.com

dnusbaum@cloudbees.com (JIRA)

Apr 20, 2020, 5:06:02 PM
to jenkinsc...@googlegroups.com
 

A fix for this issue was just released in version 2.20 of Pipeline: Basic Steps Plugin.

Change By: Devin Nusbaum
Status: Resolved (was: Fixed but Unreleased)
Released As: workflow-basic-steps 2.20