Vsphere Cloud plugin stops working after a number of builds

33 views
Skip to first unread message

Jorge Peña Cotarelo

unread,
Mar 15, 2017, 12:41:43 PM3/15/17
to Jenkins Users
Hello everyone,

I am using the Vsphere Cloud plugin to spin Jenkins agents (slaves) on demand in new virtual machines using linked clones and keep-until-idle retention strategy. In the Jenkins instance there are a mix of "static virtual machines" Jenkins agents and Jenkins agents on demand

The plugin works correctly until a certain time or maybe limited number of agents created. Approximately after +-70 builds or 12 hours the virtual machines are created correctly, connected to Jenkins but they are shutdown before the build is allocated in the agent or virtual machine.

The following is a piece of log of the main Jenkins.log file when this situation occurs, this is happening over and over.
  
INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Starting Virtual Machine...
Feb 21, 2017 3:49:19 PM org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1 call
INFO
: Provisioned new slave ess-lin_baulim19krk8qcnwn1k5yloml
Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] VM already powered on
Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for VMTools
Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] VM Tools are running
Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Finished wait for VMTools
Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@8c91844 to launch slave.
Feb 21, 2017 3:49:29 PM hudson.slaves.NodeProvisioner$2 run
INFO
: ess-lin_baulim19krk8qcnwn1k5yloml provisioning successfully completed. We have now 5 computer(s)
[02/21/17 15:49:58] SSH Launch of unknown on myip.mydns failed in 63,139 ms
Feb 21, 2017 3:49:58 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_6c1eramupwzkjyu9n3t4cdl5] Slave online
Feb 21, 2017 3:50:05 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
[ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer
Feb 21, 2017 3:50:15 PM hudson.slaves.CloudRetentionStrategy check
INFO
: Disconnecting ess-lin_baulim19krk8qcnwn1k5yloml
Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer
Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress.
Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Asking SSHLauncher to launch slave.
Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress.
Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted
INFO
: There are 0 VMs in this cloud. The instance cap for the cloud is 300, so we have room for more
Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud provision
INFO
: provision(sto-lin-tech,1): 0 existing slaves (=0 executors), templates available are [Template[prefix=ess-lin, provisioned=[], planned=[], max=2147483647, fullness=0.000%]]

If we take a look to the log of the agent in the Jenkins UI the following can be seen

[ess-lin_8hbmdsbdw82ekvakhf1667bk] Starting Virtual Machine...
[ess-lin_8hbmdsbdw82ekvakhf1667bk] VM already powered on
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for VMTools
[ess-lin_8hbmdsbdw82ekvakhf1667bk] VM Tools are running
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Finished wait for VMTools
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@c8db95d to launch slave.
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Ignoring disconnect attempt because a connect attempt is in progress.
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Asking SSHLauncher to launch slave.
[02/21/17 17:55:45] [SSH] Opening SSH connection to [myip.mydns]:22.
[02/21/17 17:55:46] [SSH] Authentication successful.
[02/21/17 17:55:46] [SSH] The remote users environment is:
BASH
=/bin/bash
BASHOPTS
=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES
=()
BASH_ARGC
=()
BASH_ARGV
=()
BASH_CMDS
=()
BASH_EXECUTION_STRING
=set
BASH_LINENO
=()
BASH_SOURCE
=()
BASH_VERSINFO
=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION
='4.3.11(1)-release'
DIRSTACK
=()
EUID
=1000
GROUPS
=()
HOME
=/home/jenkins
HOSTNAME
=sto-lin-ondemand01
HOSTTYPE
=x86_64
IFS
=$' \t\n'
LANG
=en_US.UTF-8
LD_LIBRARY_PATH
=/usr/local/lib
LOGNAME
=jenkins
MACHTYPE
=x86_64-pc-linux-gnu
MAIL
=/var/mail/jenkins
MAKEFLAGS
='-j 8'
OPTERR
=1
OPTIND
=1
OSTYPE
=linux-gnu
PATH
=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
PIPESTATUS
=([0]="0")
PPID
=1916
PS4
='+ '
PWD
=/home/jenkins
SHELL
=/bin/bash
SHELLOPTS
=braceexpand:hashall:interactive-comments
SHLVL
=1
SSH_CLIENT
='myip.mydns 36503 22'
SSH_CONNECTION
='myip.mydns 36503 myip.mydns 22'
TERM
=dumb
UID
=1000
USER
=jenkins
XDG_RUNTIME_DIR
=/run/user/1000
XDG_SESSION_ID
=1
_
=MAKEFLAGS
[02/21/17 17:55:46] [SSH] Checking java version of java
[02/21/17 17:55:46] [SSH] java -version returned 1.7.0_79.
[02/21/17 17:55:46] [SSH] Starting sftp client.
[02/21/17 17:55:47] [SSH] Copying latest slave.jar...
[02/21/17 17:55:48] [SSH] Copied 506,667 bytes.
Expanded the channel window size to 4MB
[02/21/17 17:55:48] [SSH] Starting slave process: cd "/home/jenkins" && java  -jar slave.jar
<===[JENKINS REMOTING CAPACITY]===>channel started
Slave.jar version: 2.53.3
This is a Unix slave
Evacuated stdout
Slave successfully connected and online
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Slave online










HTTP ERROR
404




Problem accessing /computer/ess-lin_8hbmdsbdw82ekvakhf1667bk/logText/progressiveHtml. Reason:


   
Not Found
Powered by Jetty://


This situation is fixed when I set back to 0 the instance cap limit (for some reason instead of 0 the Jenkins UI shows a number of 2147483647 instead of 0 and the plugin starts immediately working fine again.

When I set it to 0, the following error appears in the main Jenkins.log file:


Mar 15, 2017 5:14:06 PM hudson.model.Run execute
INFO
: ess/ess_monitorSlaves #2662 main build action completed: SUCCESS
Mar 15, 2017 5:14:09 PM hudson.ivy.IvyBuildTrigger$DescriptorImpl configure
INFO
: IvyConfigurations: 0
Mar 15, 2017 5:14:09 PM hudson.plugins.jabber.im.transport.JabberPublisherDescriptor configure
INFO
: No hostname specified.
Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
WARNING
: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
WARNING
: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
WARNING
: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
WARNING
: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
Mar 15, 2017 5:14:09 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO
: STARTING VSPHERE CLOUD


Is there something I can do to fix this behaviour ? I opened a Jira case just in case : https://issues.jenkins-ci.org/browse/JENKINS-42811
Reply all
Reply to author
Forward
0 new messages