[JIRA] (JENKINS-59400) Jenkins slave nodes hangs for up to 12+ minutes after build phase completes

Rocha@Stratovan.com (JIRA)

Sep 16, 2019, 8:28:02 PM
to jenkinsc...@googlegroups.com
John Rocha created an issue
 
Jenkins / Bug JENKINS-59400
Jenkins slave nodes hangs for up to 12+ minutes after build phase completes
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2019-09-17 00:27
Environment:
* Jenkins 2.176.2
* Master (x1) and build (x6) nodes are running Windows 10 Enterprise 64-bit
* Java Version 8 Update 201 (build 1.8.0_201-b09) Oracle
* Jenkins runs on the master from a command-line call
* Jenkins runs on the slaves as a service
* Web browser: Chrome Version 76.0.3809.132 (Official Build) (64-bit)
Labels: slave slaves Slave hang hanging delay
Priority: Critical
Reporter: John Rocha

Jenkins slave nodes hang for 12 minutes or more after the build phase completes.

I have a master node and 7 nodes in the build pool. The master is configured to only run jobs labeled master. Builds are always only done from the build nodes.

There are no post-build steps configured.

The hang/delay is observed whether the build step passes or fails.

Example Console Output follows
---------------------------------------------------------------------------
....
2019-09-16 16:51:45 ERROR: last command returned failure: 1
2019-09-16 16:51:45 
2019-09-16 16:51:45 build.bat failed with error code '1'
2019-09-16 16:51:45 
2019-09-16 16:51:45 Build step 'Execute Windows batch command' marked build as failure
2019-09-16 17:03:48 Finished: FAILURE
--------------------------------------------------------------------------- 

Observe that at 16:51:45 the build step for Execute Windows batch command finishes. However, the build continues for another 12 minutes before finally completing with the FAILURE notification at 17:03:48.
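
As a quick check of the delay, the gap between the last two timestamps in the console output above can be computed with a few lines of Python. This is only a sketch: it assumes every console line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt, and the helper names are made up for illustration.

```python
from datetime import datetime

# The last two lines of the console output quoted above.
CONSOLE_LINES = [
    "2019-09-16 16:51:45 Build step 'Execute Windows batch command' marked build as failure",
    "2019-09-16 17:03:48 Finished: FAILURE",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp (19 characters) of a console line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def gap_seconds(earlier_line, later_line):
    """Return the number of seconds between two timestamped console lines."""
    delta = parse_timestamp(later_line) - parse_timestamp(earlier_line)
    return int(delta.total_seconds())


delay = gap_seconds(CONSOLE_LINES[0], CONSOLE_LINES[1])
print(f"{delay // 60} min {delay % 60} s")  # 12 min 3 s
```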

There are 6 nodes in the build pool being triggered for the builds.

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)

Rocha@Stratovan.com (JIRA)

Oct 7, 2019, 2:51:09 PM
to jenkinsc...@googlegroups.com
John Rocha commented on Bug JENKINS-59400
 
Re: Jenkins slave nodes hangs for up to 12+ minutes after build phase completes

Update:

The issue doesn't always happen. It seems to depend on what the job does. For example, if the job updates Perforce, there is no noticeable delay. If it does a simple compile using Visual Studio, there doesn't seem to be a delay either. By simple compile I mean a few objects that don't seem to trigger parallel compilation.

When it does happen, it appears to be with bigger Visual Studio builds that have parallel compilation enabled. Moreover, I've noticed that there may be multiple MSBuild.exe processes still running even after Jenkins reports "build.bat exiting with success".

For example, during my most recent reproduction, there were 5 MSBuild.exe processes lingering after Jenkins reported the script exited with success, but the build didn't return the final result until ~8m later.

The MSBuild.exe processes would slowly go away one by one.

Once all of the MSBuild.exe processes terminated, the Jenkins job reported its final "Finished: SUCCESS" result.
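
One way to watch for the lingering processes described above is to ask the Windows `tasklist` command for MSBuild.exe and count the rows it returns. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes the usual `tasklist /FO CSV /NH` column order (image name, PID, session name, session number, memory usage).

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name="MSBuild.exe"):
    """Return the PIDs of rows in `tasklist /FO CSV /NH` output whose image name matches."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines and the "INFO: No tasks are running..." message,
        # neither of which has a PID column.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def lingering_msbuild_pids():
    """Windows only: list the PIDs of MSBuild.exe processes that are still running."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq MSBuild.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `lingering_msbuild_pids()` on a build node right after the batch step reports completion would show whether any MSBuild.exe processes are still alive while the job waits.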

Rocha@Stratovan.com (JIRA)

Oct 7, 2019, 3:08:01 PM
to jenkinsc...@googlegroups.com

Root cause - User calling error

I ran the script manually from the CLI and observed that the MSBuild.exe processes never went away. Ever.

I Googled for this and found this Stack Overflow description/solution:

If parallel compiles are enabled and used, the default is for the MSBuild.exe processes to stay around so they can be reused by future compiles. This seems to cause a problem with the remote Jenkins build pools.

The MSBuild.exe reuse/linger functionality can be disabled by passing /nr:false to MSBuild for the build.

When I added this flag it resolved my issue.
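
For illustration, here is a small sketch of how a wrapper script might assemble the MSBuild command line with node reuse turned off. The function name and `MySolution.sln` are hypothetical; only the `/m` (parallel build) and `/nr:false` (no node reuse) switches come from MSBuild itself.

```python
def msbuild_command(solution, parallel=True, node_reuse=False):
    """Build the MSBuild argument list for a solution or project file.

    /m enables parallel builds; /nr:false tells MSBuild not to keep
    worker processes alive for reuse once the build has finished.
    """
    cmd = ["MSBuild.exe", solution]
    if parallel:
        cmd.append("/m")
    cmd.append("/nr:true" if node_reuse else "/nr:false")
    return cmd


# MySolution.sln is a placeholder name, not taken from the report.
print(" ".join(msbuild_command("MySolution.sln")))
```

The resulting list can be passed to `subprocess.run(...)` on the build node, or the same switches can simply be added to the MSBuild call inside build.bat.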

This problem doesn't happen if I am building without a build pool (i.e. one Jenkins node that does its own compiles). It only occurs in a master/slave build-pool setup, when building on the slave nodes.
