[JIRA] (JENKINS-47821) vsphere plugin 2.16 not respecting slave disconnect settings


pjdarton@gmail.com (JIRA)

Mar 23, 2018, 9:30:01 AM
to jenkinsc...@googlegroups.com
pjdarton assigned an issue to pjdarton
 
Jenkins / Improvement JENKINS-47821
vsphere plugin 2.16 not respecting slave disconnect settings
Change By: pjdarton
Assignee: pjdarton
 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

pjdarton@gmail.com (JIRA)

Mar 23, 2018, 9:34:02 AM
pjdarton commented on Improvement JENKINS-47821
 
Re: vsphere plugin 2.16 not respecting slave disconnect settings

You're saying it's a regression since 2.15?  Hmm, ok...  I certainly hadn't intended to cause this behavior but I'll see if I can find the cause and fix it...

If you can provide any further information then that'd greatly simplify the debugging process.

e.g. what do you mean by "opportunistically"?  What's the scenario in which the VM gets re-used (when it shouldn't) vs being disposed of correctly?

John.Mellor@esentire.com (JIRA)

Mar 23, 2018, 9:54:02 AM

Yes, exactly. I never do incremental builds, as they are a severely broken dev practice. I typically set up a node to disconnect after one build and reset back to a VMware snapshot upon restart. That way I can easily debug a build problem, because the machine is left in the state where the build failed, and the next job does not have artifacts left over from the previous build, such as dependency packages, config files or Docker images.

However I am now in a situation where sometimes a queued build runs on the node without going through the reset-back-to-snapshot step, breaking it.

I have a crude workaround of powering the node down after every build, forcing it to go through the power-up steps which will then revert back to snapshot. However, this maximizes the downtime for the node between builds, and prevents some debugging actions because you lose the in-memory structures this way.

pjdarton@gmail.com (JIRA)

Mar 23, 2018, 10:43:01 AM
pjdarton commented on Improvement JENKINS-47821

I share your opinions; I never (intentionally) do incremental builds either.

If you can figure out a reproducible test case that I can follow here to reproduce the issue (i.e. see it reuse a node using plugin version 2.16 where it didn't on 2.15), then that'll greatly assist (and hence speed up) the diagnostic process and dramatically reduce the time-to-fix.  Debugging something that happens "sometimes" is way more difficult than debugging something that happens "every time you do X".

i.e. If you help me to help you, you'll get a solution a lot quicker

pjdarton@gmail.com (JIRA)

Mar 27, 2018, 12:02:02 PM
pjdarton updated an issue
 
Jenkins / Bug JENKINS-47821
Change By: pjdarton
Issue Type: Improvement → Bug

pjdarton@gmail.com (JIRA)

Mar 27, 2018, 12:03:03 PM
pjdarton started work on Bug JENKINS-47821
 
Change By: pjdarton
Status: Open → In Progress

pjdarton@gmail.com (JIRA)

Mar 27, 2018, 12:25:02 PM
pjdarton commented on Bug JENKINS-47821
 
Re: vsphere plugin 2.16 not respecting slave disconnect settings

John Mellor  I've spotted what might have been a race condition in the code, giving it an opportunity to go wrong where it didn't before, but without further information regarding your configuration, I have no means to test whether or not it's fixed the issue.

I've made some changes in vsphere-cloud PR#91 and you can download a built plugin from the ci.jenkins.io Jenkins server vsphere-cloud PR-91 CI build job (see "Last Successful Artifacts" - "vsphere-cloud.hpi").
If you download that file you can then install it using "Manage Jenkins" -> "Manage Plugins" -> "Advanced" -> "Upload Plugin".

Give that version of the plugin a try and see if it makes a difference. If it doesn't help, you'll have to go into way more detail about how you've got things set up, so that I can reproduce the issue locally. If it does help then please let me know.

pjdarton@gmail.com (JIRA)

Mar 27, 2018, 12:26:02 PM
pjdarton edited a comment on Bug JENKINS-47821
[~alt_jmellor]  I've spotted what might have been a race condition in the code, giving it an opportunity to go wrong where it didn't before, but without further information regarding your configuration, I have no means to test whether or not it's fixed the issue.

I've made some changes in [vsphere-cloud PR#91|https://github.com/jenkinsci/vsphere-cloud-plugin/pull/91] and you can download a built plugin from the ci.jenkins.io Jenkins server [vsphere-cloud PR-91|https://ci.jenkins.io/job/Plugins/job/vsphere-cloud-plugin/job/PR-91/] CI build job (see "Last Successful Artifacts" - "[vsphere-cloud.hpi|https://ci.jenkins.io/job/Plugins/job/vsphere-cloud-plugin/job/PR-91/lastSuccessfulBuild/artifact/target/vsphere-cloud.hpi]").

If you download that file you can then install it using "Manage Jenkins" -> "Manage Plugins" -> "Advanced" -> "Upload Plugin".

Give that version of the plugin a try and see if it makes a difference. If it doesn't help, you'll have to go into _way_ more detail about how you've got things set up, so that I can reproduce the issue locally. If it does help then please let me know.

scm_issue_link@java.net (JIRA)

Apr 4, 2018, 12:15:03 PM

Code changed in jenkins
User: Peter Darton
Path:
src/main/java/org/jenkinsci/plugins/vSphereCloudLauncher.java
src/main/java/org/jenkinsci/plugins/vSphereCloudSlave.java
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
src/main/java/org/jenkinsci/plugins/vsphere/RunOnceCloudRetentionStrategy.java
src/main/java/org/jenkinsci/plugins/vsphere/VSphereOfflineCause.java
src/main/resources/org/jenkinsci/plugins/Messages.properties
src/main/resources/org/jenkinsci/plugins/vsphere/Messages.properties
http://jenkins-ci.org/commit/vsphere-cloud-plugin/620868e4808f0df6772c11331dc86bd3ea8413eb
Log:
Merge pull request #91 from pjdarton/prevent-reuse-of-single-use-slaves

JENKINS-47821 Prevent run-once slave from accepting more jobs.

Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/6f78bb0aa164...620868e4808f

John.Mellor@esentire.com (JIRA)

Apr 5, 2018, 12:28:02 PM

With the limited testing that I've been able to perform, this change provisionally appears not to be working.
If I queue up multiple jobs for a single node, and configure the node to do nothing upon end-of-job and to reset back to snapshot upon startup, then I do not see the expected revert-to-snapshot between jobs. It looks like it just starts the next job on the already-polluted machine and skips the revert-to-snapshot for some reason.

pjdarton@gmail.com (JIRA)

Apr 6, 2018, 5:37:02 AM
pjdarton commented on Bug JENKINS-47821

In that case then I'm going to need you to describe your setup, as that's not what I see here (but then, I mostly use the plugin's "Cloud" functionality and am unfamiliar with its other functionality, which I'm guessing is what you're using).

If you can provide a description of how to set up a Jenkins server (that has the vSphere plugin installed) to reproduce this issue, I'll see if I can reproduce it. If I can reproduce it, there's a chance I might be able to fix it.
(FYI fixing bugs in this plugin is not my official day job, so the easier you can make it for me to see the issue for myself, the better the chances are that I can come up with a fix before my boss tells me to do something that is my official day job)

John.Mellor@esentire.com (JIRA)

Apr 6, 2018, 11:34:02 AM

For some reason, I am unable to screenshot a typical config into this ticket.
When I configure a high-use build node, I generally set it up for:

Availability: Take this agent online when in demand, and offline when idle
Disconnect after limited builds: 1
What to do when the slave is disconnected: Revert and Restart

If it is a low-use node, then I instead configure for:

What to do when the slave is disconnected: Shutdown

pjdarton@gmail.com (JIRA)

Apr 6, 2018, 1:55:03 PM
pjdarton commented on Bug JENKINS-47821

Can we start with where you define the node?
FYI there are multiple ways the plugin can define a slave node, so how you get to the point where you make the choices you've described can make a difference.
I need instructions that start from "I've installed Jenkins and I've installed the plugin". I'm guessing that the next step would be to define a vSphere cloud and tell Jenkins the URL of vSphere and login details, and I presume that there will have to be some stuff in that vSphere server too, but I need to know what it consists of.

John.Mellor@esentire.com (JIRA)

Apr 6, 2018, 2:06:02 PM
Name of this Cloud: QA Cluster
vSphere Host: https://vsphere.internal
Disable Certificate Verification: checked
Credentials: <valid non-interactive user/password in credentials>
Templates: <none>

FYI, there are several types of clouds configured at this site: Google, VMware, k8s, etc.

John.Mellor@esentire.com (JIRA)

Apr 6, 2018, 2:09:02 PM

The target vsphere cloud is running an esxi-5.5 cluster managed by vcenter-5.5, and using unshared local disks in RAID-6 as the VMFS volumes. Not sure what else I can give you.

pjdarton@gmail.com (JIRA)

Apr 6, 2018, 5:31:02 PM
pjdarton commented on Bug JENKINS-47821

How (by which method?) did you define the slave nodes in Jenkins?
(All my vSphere slaves are created from templates defined in the cloud section; I'm aware it's possible to define non-cloud ones by a couple of routes, but I've never done that myself.)

sqa.valentinmarin@gmail.com (JIRA)

May 21, 2018, 1:17:03 PM

Got the same issue here, as in slaves not respecting the disconnect-after-limited-builds setting (Jenkins 2.107.2, vSphere plugin 2.17). Nodes have been defined via Jenkins -> Nodes -> "Slave virtual computer running under vSphere Cloud".

sqa.valentinmarin@gmail.com (JIRA)

May 22, 2018, 5:12:01 AM
Valentin Marin edited a comment on Bug JENKINS-47821
Got the same issue here, as in slaves not respecting the disconnect-after-limited-builds setting (Jenkins 2.107.2, vSphere 2.17). Nodes have been defined via Jenkins -> Nodes -> "Slave virtual computer running under vSphere Cloud".


To add a bit of context, I'm running pipeline projects on those nodes and they do not seem to be treated as 'builds' per se, as no executed instances of those are being displayed in the node's "Build History" section.

pjdarton@gmail.com (JIRA)

May 22, 2018, 6:08:02 AM
pjdarton commented on Bug JENKINS-47821

Valentin Marin So you've got statically-defined slaves...  How are they connecting to Jenkins?  SSH?  JNLP?  If JNLP, which protocol version?  And are you passing in a JNLP_SECRET, or are they allowed in unauthenticated?  Also, what version of slave.jar are you using on the slave VMs?

I've been tracing oddities in my own Jenkins build environment where slaves that start and then connect via JNLP often "stay online" (briefly) after they've gone offline due to a reboot-induced disconnection (long enough to start a new build job, which then fails because the slave had disconnected), but I've yet to get to the bottom of it (race-conditions are always difficult to debug).  It may be that the issue I'm trying to track down and this issue are all related...

 

FYI I don't think that the lack of pipeline history is a vSphere plugin issue.  I've got a pipeline job that reboots my static (non-VM) Windows slaves and that doesn't show up on their build history, so if a pipeline segment doesn't show up on a normal Jenkins slave's build history, I don't think we can expect it to show up on a vSphere slave's history either, as that'd be common code (the vSphere slave code "extends" the Jenkins core Slave code).

sqa.valentinmarin@gmail.com (JIRA)

May 22, 2018, 6:21:03 AM
Valentin Marin edited a comment on Bug JENKINS-47821
Slaves are connected via JNLP (Windows service, while passing the JNLP secret), remoting version 3.17.

sqa.valentinmarin@gmail.com (JIRA)

May 22, 2018, 6:21:03 AM

Slaves are connected via JNLP (windows service with secret), remoting version 3.17.

eub.kansas19@gmail.com (JIRA)

Jul 25, 2018, 2:49:03 PM

Found a ticket regarding build history and pipelines


eub.kansas19@gmail.com (JIRA)

Jul 25, 2018, 3:46:02 PM
Josiah Eubank edited a comment on Bug JENKINS-47821
Found a ticket regarding build history and pipelines JENKINS-38877

eub.kansas19@gmail.com (JIRA)

Jul 25, 2018, 4:43:02 PM
Josiah Eubank edited a comment on Bug JENKINS-47821
Found a ticket regarding build history and pipelines JENKINS-38877


Experiencing this still on 2.18, even though the text "Limited Builds is not currently used" no longer appears in the config help

eub.kansas19@gmail.com (JIRA)

Jul 25, 2018, 5:14:02 PM
Josiah Eubank edited a comment on Bug JENKINS-47821
Found a ticket regarding build history and pipelines JENKINS-38877

Experiencing this still on 2.18, even though the text "Limited Builds is not currently used" no longer appears in the config help. Note this is combined with "Take this agent offline when not in demand...."

oren@chapo.co.il (JIRA)

Jul 29, 2018, 11:35:02 AM

I've seen this issue also with versions 2.16 and 2.18 of the vSphere Cloud plugin; however, it seems like it's not a problem in the plugin, but a limitation of the "cloud" Jenkins interface that the plugin implements.

If you're trying to ensure a slave is always in a "clean" state when allocated, here's my workaround, after hours of painful google-search, trial and error:
1. Node configuration: fill the "Snapshot Name" field (eg "Clean")
2. Node configuration: Availability: "Take this agent online when in demand, and offline when idle"
3. Node configuration: What to do when the slave is disconnected: "Shutdown"
4. Pipeline job configuration: include the following code:

	import jenkins.slaves.*
	import jenkins.model.*
	import hudson.slaves.*
	import hudson.model.*
	
	def SafelyDisposeNode() {
		print "Safely disposing node..."
		def slave = Jenkins.instance.getNode(env.NODE_NAME) as Slave
		if (slave == null) {
			error "ERROR: Could not get slave object for node!"
		}
		try
		{
			slave.getComputer().setTemporarilyOffline(true, null)
			if(isUnix()) {
				sh "(sleep 2; poweroff)&"
			} else {
				bat "shutdown -t 2 -s"
			}
			slave.getComputer().disconnect(null)
			sleep 10
		} catch (err) {
			print "ERROR: could not safely dispose node!"
		} finally {
			slave.getComputer().setTemporarilyOffline(false, null)
		}
		print "...node safely disposed."
		slave = null
	}
	
	def DisposableNode(String nodeLabel, Closure body) {
		node(nodeLabel) {
			try {
				body()
			} catch (err) {
				throw err
			} finally {
				SafelyDisposeNode()
			}
		}
	}

5. When you want to ensure the node will NOT be used by another job (or another run of the same job), use a "DisposableNode" block instead of "node" block:

	DisposableNode('MyNodeLabel') {
		// run your pipeline code here.
		// it will make sure the node is shutdown at the end of the block, even if it fails.
		// no other job or build will be able to use the node in its "dirty" state,
		// and vSphere plugin will revert to "clean" snapshot before starting the node again.
	}

6. If other Jobs are using this node (or node label), they all must use the above workaround, to avoid leaving a "dirty" machine for each other.
7. As for the "why is it so important to have the node in a clean state?" question, my use case is integration tests of kernel-mode drivers (both Windows and Linux O/S) that typically "break" the O/S and leave it in an unstable state (BSODs and kernel panics are common).
8. If your pipeline job is running under a Groovy sandbox, you will need to permit some classes (the job will fail and offer to whitelist a signature; repeat carefully several times).
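Regarding step 8: rather than letting the job fail once per blocked signature, a Jenkins administrator can pre-approve the required signatures from the Script Console. The sketch below is an illustration only, assuming the script-security plugin's ScriptApproval API; the exact signature strings vary by Jenkins/plugin version, so the safest route is still to copy the strings from the "Scripts not permitted" failures as described above:

```groovy
// Run from Manage Jenkins -> Script Console (administrator only).
// Pre-approves the sandbox signatures used by the SafelyDisposeNode() workaround.
// NOTE: the signature strings below are examples and may not exactly match your
// Jenkins version -- copy the real ones from the sandbox failure messages.
import org.jenkinsci.plugins.scriptsecurity.scripts.ScriptApproval

def approval = ScriptApproval.get()
[
    'staticMethod jenkins.model.Jenkins getInstance',
    'method jenkins.model.Jenkins getNode java.lang.String',
    'method hudson.model.Slave getComputer',
    'method hudson.model.Computer setTemporarilyOffline boolean hudson.slaves.OfflineCause',
    'method hudson.model.Computer disconnect hudson.slaves.OfflineCause',
].each { signature ->
    // approveSignature() adds to a set, so re-approving an already-approved
    // signature is harmless
    approval.approveSignature(signature)
}
```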

james.telfer@horiba.com (JIRA)

Apr 24, 2019, 3:53:02 AM

Any progress on this?  I have just come up against what looks like the same issue.  Statically defined Windows slaves connecting via JNLPv4.

They seem to completely ignore the 'Disconnect After Limited Builds' option, which, re-reading the Wiki, seems to be the expected behaviour?

Oren Chapo your work-around doesn't seem to work for me, at least not when using it within declarative pipeline.


werner.mueller8@boschrexroth.de (JIRA)

Jan 2, 2020, 7:23:02 AM

I modified the workaround to reset the vm in the pipeline itself.

Advantages:

  • Shutdown activities are not required in the node configuration.
  • The node is reset to the given snapshot before the pipeline executes.

 

import jenkins.model.Jenkins
import hudson.model.Slave

def ResettedNode(String vm, String serverName, String snapshotName, Closure body) {
    node(vm) {
        // Reset the computer within the context of the node, to avoid other jobs
        // running on this node in the meanwhile
        stage('Reset node') {
            def slave = Jenkins.instance.getNode(env.NODE_NAME) as Slave
            if (slave == null) {
                error "ERROR: Could not get slave object for node!"
            }
            try {
                slave.getComputer().setTemporarilyOffline(true, null)
                vSphere buildStep: [$class: 'PowerOff', vm: vm, evenIfSuspended: true, shutdownGracefully: false, ignoreIfNotExists: false], serverName: serverName
                vSphere buildStep: [$class: 'RevertToSnapshot', vm: vm, snapshotName: snapshotName], serverName: serverName
                vSphere buildStep: [$class: 'PowerOn', timeoutInSeconds: 240, vm: vm], serverName: serverName
                slave.getComputer().disconnect(null)
                sleep 10 // wait while the agent on the slave is starting up
            } catch (err) {
                print "ERROR: could not reset node!"
            } finally {
                slave.getComputer().setTemporarilyOffline(false, null)
            }
            slave = null
        }
    }
    // Wait for the node to come online again
    node(vm) {
        body()
    }
}

ResettedNode('vm', 'vCloud', 'clean') {
    // pipeline code to run on the freshly-reverted node goes here
}
