Agents with vSphere plugin shutting down unexpectedly

23 views
Skip to first unread message

Randy Beckwith

unread,
May 29, 2020, 10:12:05 AM5/29/20
to jenkins...@googlegroups.com

I’m using the vSphere plugin to manage a series of Windows Server 2012 R2 agents. The basic usage scenario is to have the agents normally powered down. When a job starts, revert the VM to a base snapshot, power on the VM, run the job and then shut down the VM.

 

Versions in use:

Jenkins 2.187

vSphere 2.20

 

Occasionally, an agent will start up, the job will start running and then a few seconds later the agent will shut down. I’m having some trouble figuring out why this is happening. Below is the console output from the job:

 

23:20:44 Started by upstream project "Regression - NIA 12.3" build number 16
23:20:44 originally caused by:
23:20:44  Started by timer
23:20:44 Running as SYSTEM
23:20:44 [NIABuild1253] Starting limited count build: 1 of 1
23:20:44 [EnvInject] - Loading node environment variables.
23:20:44 Building remotely on NIABuild1253 (WIN2012-SQL2012-MS2020-REGRESSION) in workspace c:\j\esi-full\dev
23:20:44 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
 +
23:20:44 ... p4 info
 +
23:20:44 ... p4 info
 +
23:20:44 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
 +
23:20:44 ... p4 counter change
 +
23:20:45 ... p4 changes -m1 -ssubmitted //jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2____
 +
23:20:45 ... p4 counter change
 +
23:20:45 ... p4 changes -m1 -ssubmitted //jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2____
 +
23:20:46 ... p4 repos -C
 +
23:20:46 P4: builds: 277937 
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
 +
23:20:48 ... p4 info
 +
23:20:48 ... p4 info
 +
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
 +
23:20:48 
23:20:48 P4 Task: establishing connection.
23:20:48 ... server: esxa-perforce:1666
23:20:48 ... node: NIABuild1253
23:20:48 Building on Node: NIABuild1253
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
 +
23:20:48 ... p4 info
 +
23:20:48 ... p4 info
 +
23:20:48 ... p4 client -o jenkins-NIABuild1253-NIA_12.3_Full_Build_-_Win_2012R2_-_SQL_2012_-_MS____
 +
23:20:49 
23:20:49 P4 Task: establishing connection.
23:20:49 ... server: esxa-perforce:1666
23:20:49 ... node: NIABuild1253
23:20:49 
23:20:49 P4 Task: reverting all pending and shelved revisions.
23:20:49 ... p4 revert c:\j\esi-full\dev/...
 +
23:20:49 ... rm [abandoned files]
23:20:49 duration: (184ms)
23:20:49 
23:20:49 ... p4 sync -q c:\j\esi-full\dev/...#none
 +

 

 

Looking in vSphere I can see the tasks that revert the snapshot, power on the VM and then shut it down within a few seconds.

 

Task Name                                          Target                   Details  Initiator                Start Time                                           Completion Time

Initiate guest OS shutdown         NIABuild1253                     xxx                         5/28/2020 11:20:52 PM                  5/28/2020 11:20:52 PM

Power On virtual machine            NIABuild1253                     xxx                         5/28/2020 11:20:06 PM                  5/28/2020 11:20:11 PM

Revert snapshot                               NIABuild1253                     xxx                         5/28/2020 11:20:03 PM                  5/28/2020 11:20:05 PM

 

You can see that 11:20:03 PM it reverted the snapshot and then about 50 seconds later, after powering on the agent, it turned around and shut it down – the timestamps in the console output fall within that range.

 

My agents are configured as follows:

                Force VM Launch                                                             checked

                Wait for VMTools                                                             checked

                Delay between launch and boot complete           300

                Secondary launch method                                           Launch agent by connecting it to the master

                Availability                                                                          Bring this agent online when in demand, and take offline when idle

                In demand delay                                                              0

                Idle delay                                                                            1

                Disconnect After Limited Builds                                 1

                What to do when the slave is disconnected         Shutdown

 

 

Jenkins-slave-err.log shows the following: createEngine at 11:20:42 PM and then again at 11:20:51 PM, less than 10 seconds later. There’s no indication as to why the engine was created a 2nd time.

 

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main createEngine

INFO: Setting up agent: NIABuild1253

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener <init>

INFO: Jenkins agent is running in headless mode.

May 28, 2020 11:20:42 PM hudson.remoting.Engine startEngine

INFO: Using Remoting version: 3.33

May 28, 2020 11:20:42 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir

INFO: Using c:\j\remoting as a remoting work directory

May 28, 2020 11:20:42 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging

INFO: Both error and output logs will be printed to c:\j\remoting

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Locating server among [http://nxjenkinsmaster:80/, http://nxjenkinsmaster/]

May 28, 2020 11:20:42 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve

INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Agent discovery successful

  Agent address: nxjenkinsmaster

  Agent port:    49210

  Identity:      39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Handshaking

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Connecting to nxjenkinsmaster:49210

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Trying protocol: JNLP4-connect

May 28, 2020 11:20:42 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Remote identity confirmed: 39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0

May 28, 2020 11:20:43 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Connected

May 28, 2020 11:20:44 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn

WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main createEngine

INFO: Setting up agent: NIABuild1253

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener <init>

INFO: Jenkins agent is running in headless mode.

May 28, 2020 11:20:51 PM hudson.remoting.Engine startEngine

INFO: Using Remoting version: 3.33

May 28, 2020 11:20:51 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir

INFO: Using c:\j\remoting as a remoting work directory

May 28, 2020 11:20:51 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging

INFO: Both error and output logs will be printed to c:\j\remoting

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Locating server among [http://nxjenkinsmaster:80/, http://nxjenkinsmaster/]

May 28, 2020 11:20:51 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve

INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Agent discovery successful

  Agent address: nxjenkinsmaster

  Agent port:    49210

  Identity:      39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Handshaking

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Connecting to nxjenkinsmaster:49210

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Trying protocol: JNLP4-connect

May 28, 2020 11:20:51 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Remote identity confirmed: 39:6a:41:36:af:f4:22:32:e5:3c:bd:0d:4e:f5:76:f0

May 28, 2020 11:20:52 PM hudson.remoting.jnlp.Main$CuiListener status

INFO: Connected

May 28, 2020 11:20:53 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn

WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/

 

 

On Jenkins master there are 3 relevant logs for this agent:

Slave.log.4 from 11:20:42 PM

 

[NIABuild1253] Starting Virtual Machine...

[NIABuild1253] Reverting to snapshot:Base Snapshot

[NIABuild1253] Powering on VM

[NIABuild1253] Waiting for VMTools

[NIABuild1253] VM Tools are running

[NIABuild1253] Finished wait for VMTools

[NIABuild1253] Waiting for up to 300 seconds for slave to come online.

 

Slave.log.3 from 11:20:50 PM

 

Inbound agent connected from 10.110.4.17/10.110.4.17

Remoting version: 3.33

This is a Windows agent

[NIABuild1253] Slave has come online.

[NIABuild1253] Slave online

Agent successfully connected and online

ERROR: Connection terminated

java.nio.channels.ClosedChannelException

                at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)

                at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)

                at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)

                at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)

                at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

                at java.lang.Thread.run(Unknown Source)

[NIABuild1253] Running disconnect procedure...

[NIABuild1253] Disconnect done.  Performing idle action SHUTDOWN...

 

Slave.log.2 from 11:21:23 PM

 

Inbound agent connected from 10.110.4.17/10.110.4.17

Remoting version: 3.33

This is a Windows agent

[NIABuild1253] Attempting a graceful shutdown

Agent successfully connected and online

ERROR: Connection terminated

java.nio.channels.ClosedChannelException

                at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)

                at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)

                at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)

                at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)

                at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

                at java.lang.Thread.run(Unknown Source)

[NIABuild1253] Guest shutdown succeeded

[NIABuild1253] Idle action SHUTDOWN complete.

[NIABuild1253] Running disconnect procedure...

[NIABuild1253] Disconnect done.  Performing idle action SHUTDOWN...

[NIABuild1253] Idle action SHUTDOWN complete.

[NIABuild1253] Running disconnect procedure...

[NIABuild1253] Disconnect done.  Performing idle action SHUTDOWN...

[NIABuild1253] Idle action SHUTDOWN complete.

 

Thanks for any assistance,

Randy Beckwith

 

 

Reply all
Reply to author
Forward
0 new messages