I’m using the vSphere plugin to manage a series of Windows Server 2012 R2 agents. The basic usage scenario is to have the agents normally powered down. When a job starts, revert the VM to a base snapshot, power on the VM, run the job and then shut down the VM.
Versions in use:
Jenkins 2.187
vSphere 2.20
Occasionally, an agent will start up, the job will start running and then a few seconds later the agent will shut down. I’m having some trouble figuring out why this is happening. Below is the console output from the job:
23:20:44 Started by upstream project "Regression - NIA 12.3" build number 16
23:20:44 originally caused by:
23:20:44 Started by timer
23:20:44 Running as SYSTEM
23:20:44 [NIABuild1253] Starting limited count build: 1 of 1
23:20:44 [EnvInject] - Loading node environment variables.
23:20:44 Building remotely on NIABuild1253 (WIN2012-SQL2012-MS2020-REGRESSION) in workspace c:\j\esi-full\dev
23:20:44 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
+
23:20:44 ... p4 info
+
23:20:44 ... p4 info
+
23:20:44 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
+
23:20:44 ... p4 counter change
+
23:20:45 ... p4 changes -m1 -ssubmitted //jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2____
+
23:20:45 ... p4 counter change
+
23:20:45 ... p4 changes -m1 -ssubmitted //jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2____
+
23:20:46 ... p4 repos -C
+
23:20:46 P4: builds: 277937
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
+
23:20:48 ... p4 info
+
23:20:48 ... p4 info
+
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
+
23:20:48
23:20:48 P4 Task: establishing connection.
23:20:48 ... server: esxa-perforce:1666
23:20:48 ... node: NIABuild1253
23:20:48 Building on Node: NIABuild1253
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
+
23:20:48 ... p4 info
+
23:20:48 ... p4 info
+
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
+
23:20:49
23:20:49 P4 Task: establishing connection.
23:20:49 ... server: esxa-perforce:1666
23:20:49 ... node: NIABuild1253
23:20:49
23:20:49 P4 Task: reverting all pending and shelved revisions.
23:20:49 ... p4 revert c:\j\esi-full\dev/...
+
23:20:49 ... rm [abandoned files]
23:20:49 duration: (184ms)
23:20:49
23:20:49 ... p4 sync -q c:\j\esi-full\dev/...#none
+
Looking in vSphere I can see the tasks that revert the snapshot, power on the VM and then shut it down within a few seconds.
Task Name Target Details Initiator Start Time Completion Time
Initiate guest OS shutdown NIABuild1253 xxx 5/28/2020 11:20:52 PM 5/28/2020 11:20:52 PM
Power On virtual machine NIABuild1253 xxx 5/28/2020 11:20:06 PM 5/28/2020 11:20:11 PM
Revert snapshot NIABuild1253 xxx 5/28/2020 11:20:03 PM 5/28/2020 11:20:05 PM
You can see that 11:20:03 PM it reverted the snapshot and then about 50 seconds later, after powering on the agent, it turned around and shut it down – the timestamps in the console output fall within that range.
My agents are configured as follows:
Force VM Launch checked
Wait for VMTools checked
Delay between launch and boot complete 300
Secondary launch method Launch agent by connecting it to the master
Availability Bring this agent online when in demand, and take offline when idle
In demand delay 0
Idle delay 1
Disconnect After Limited Builds 1
What to do when the slave is disconnected Shutdown
Jenkins-slave-err.log shows the following: createEngine at 11:20:42 PM and then again at 11:20:51 PM, less than 10 seconds later. There’s no indication as to why the engine was created a 2nd time.
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: NIABuild1253
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
May 28, 2020 11:20:42 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.33
May 28, 2020 11:20:42 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using c:\j\remoting as a remoting work directory
May 28, 2020 11:20:42 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to c:\j\remoting
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://nxjenkinsmaster:80/, http://nxjenkinsmaster/]
May 28, 2020 11:20:42 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: nxjenkinsmaster
Agent port: 49210
Identity: 39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to nxjenkinsmaster:49210
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0
May 28, 2020 11:20:43 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
May 28, 2020 11:20:44 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: NIABuild1253
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
May 28, 2020 11:20:51 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.33
May 28, 2020 11:20:51 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using c:\j\remoting as a remoting work directory
May 28, 2020 11:20:51 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to c:\j\remoting
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://nxjenkinsmaster:80/, http://nxjenkinsmaster/]
May 28, 2020 11:20:51 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: nxjenkinsmaster
Agent port: 49210
Identity: 39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to nxjenkinsmaster:49210
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0
May 28, 2020 11:20:52 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
May 28, 2020 11:20:53 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
On Jenkins master there are 3 relevant logs for this agent:
Slave.log.4 from 11:20:42 PM
[NIABuild1253] Starting Virtual Machine...
[NIABuild1253] Reverting to snapshot:Base Snapshot
[NIABuild1253] Powering on VM
[NIABuild1253] Waiting for VMTools
[NIABuild1253] VM Tools are running
[NIABuild1253] Finished wait for VMTools
[NIABuild1253] Waiting for up to 300 seconds for slave to come online.
Slave.log.3 from 11:20:50 PM
Inbound agent connected from 10.110.4.17/10.110.4.17
Remoting version: 3.33
This is a Windows agent
[NIABuild1253] Slave has come online.
[NIABuild1253] Slave online
Agent successfully connected and online
ERROR: Connection terminated
java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[NIABuild1253] Running disconnect procedure...
[NIABuild1253] Disconnect done. Performing idle action SHUTDOWN...
Slave.log.2 from 11:21:23 PM
Inbound agent connected from 10.110.4.17/10.110.4.17
Remoting version: 3.33
This is a Windows agent
[NIABuild1253] Attempting a graceful shutdown
Agent successfully connected and online
ERROR: Connection terminated
java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[NIABuild1253] Guest shutdown succeeded
[NIABuild1253] Idle action SHUTDOWN complete.
[NIABuild1253] Running disconnect procedure...
[NIABuild1253] Disconnect done. Performing idle action SHUTDOWN...
[NIABuild1253] Idle action SHUTDOWN complete.
[NIABuild1253] Running disconnect procedure...
[NIABuild1253] Disconnect done. Performing idle action SHUTDOWN...
[NIABuild1253] Idle action SHUTDOWN complete.
Thanks for any assistance,
Randy Beckwith