[JIRA] (JENKINS-57447) Agents sometimes left as suspended after starting from shutdown state

31 views
Skip to first unread message

jieshe@microsoft.com (JIRA)

unread,
May 14, 2019, 4:18:02 AM5/14/19
to jenkinsc...@googlegroups.com
Jie Shen created an issue
 
Jenkins / Bug JENKINS-57447
Agents sometimes left as suspended after starting from shutdown state
Issue Type: Bug Bug
Assignee: Azure DevOps
Components: azure-vm-agents-plugin
Created: 2019-05-14 08:17
Priority: Minor Minor
Reporter: Jie Shen

When using idle retention strategy and setting shutting down only option, sometimes the agents may not be able to connect successfully. And it will stay as suspended all the time.

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

jieshe@microsoft.com (JIRA)

unread,
May 14, 2019, 4:19:02 AM5/14/19
to jenkinsc...@googlegroups.com

jieshe@microsoft.com (JIRA)

unread,
May 14, 2019, 4:22:03 AM5/14/19
to jenkinsc...@googlegroups.com
Jie Shen updated an issue
When using idle retention strategy and setting shutting down only option, sometimes the agents may not be able to connect successfully. And it will stay as suspended all the time.


 

Some useful log from @Sander Bel  ,

 
{code:java}
2019-05-09 09:20:05.878+0000 [id=1060]  FINE    hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Maintainer Pool Size
2019-05-09 09:20:05.879+0000 [id=1060]  FINE    hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Maintainer Pool Size. 1 ms
2019-05-09 09:22:05.560+0000 [id=1165]  FINE    hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Agents Clean Task
2019-05-09 09:22:05.560+0000 [id=1165]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: start
2019-05-09 09:22:05.561+0000 [id=1165]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: Running clean with 5 minute timeout
2019-05-09 09:22:05.561+0000 [id=1166]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanVMs: AzureVMAgentCleanUpTask: cleanVMs: node ubuntu-test3c8d10 blocked to cleanup
2019-05-09 09:22:05.561+0000 [id=1166]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Cleaning deployments
2019-05-09 09:22:05.561+0000 [id=1166]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Checking deployment ubuntu-test0509110640988
2019-05-09 09:22:07.340+0000 [id=1166]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment created on 5/9/19 11:08 AM
2019-05-09 09:22:07.340+0000 [id=1166]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment newer than timeout, keeping
2019-05-09 09:22:07.340+0000 [id=1166]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Done cleaning deployments
2019-05-09 09:22:07.439+0000 [id=1165]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: end
2019-05-09 09:22:07.439+0000 [id=1165]  FINE    hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Agents Clean Task. 1,879 ms
2019-05-09 09:22:54.407+0000 [id=104]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label mdb-sandbox-acs
2019-05-09 09:22:54.407+0000 [id=104]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:22:54.407+0000 [id=104]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu
2019-05-09 09:22:54.407+0000 [id=104]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest
2019-05-09 09:23:13.239+0000 [id=103]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label chef
2019-05-09 09:23:13.239+0000 [id=103]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:23:13.239+0000 [id=103]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: ubuntu-test matches!
2019-05-09 09:23:13.239+0000 [id=103]   INFO    c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: start for label chef workLoad 1
2019-05-09 09:23:13.240+0000 [id=103]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label chef
2019-05-09 09:23:13.240+0000 [id=103]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:23:13.240+0000 [id=103]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: ubuntu-test matches!
2019-05-09 09:23:13.240+0000 [id=103]   INFO    c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: checking for node reuse options
2019-05-09 09:23:13.240+0000 [id=103]   INFO    c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: agent computer eligible for reuse ubuntu-test3c8d10
2019-05-09 09:23:13.240+0000 [id=103]   INFO    c.m.a.v.AzureVMManagementServiceDelegate#virtualMachineExists: AzureVMManagementServiceDelegate: virtualMachineExists: check for ubuntu-test3c8d10
2019-05-09 09:23:14.168+0000 [id=103]   INFO    c.m.a.v.AzureVMManagementServiceDelegate#virtualMachineExists: AzureVMManagementServiceDelegate: virtualMachineExists: ubuntu-test3c8d10 exists
2019-05-09 09:23:14.169+0000 [id=103]   INFO    c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: asynchronous provision finished, returning 1 planned node(s)
2019-05-09 09:23:14.169+0000 [id=930]   INFO    c.m.azure.vmagent.AzureVMCloud#lambda$provision$0: Found existing node, starting VM ubuntu-test3c8d10
2019-05-09 09:23:14.170+0000 [id=930]   INFO    c.m.a.v.AzureVMManagementServiceDelegate#startVirtualMachine: AzureVMManagementServiceDelegate: startVirtualMachine: ubuntu-test3c8d10
2019-05-09 09:24:03.370+0000 [id=930]   INFO    c.m.a.v.AzureVMManagementServiceDelegate#setVirtualMachineDetails: The Azure agent doesn't have a public IP. Will use the private IP
2019-05-09 09:24:03.370+0000 [id=930]   INFO    c.m.a.v.AzureVMManagementServiceDelegate#setVirtualMachineDetails: Azure agent details:
nodeNameubuntu-test3c8d10
adminUserName=jenkins_node
shutdownOnIdle=true
retentionTimeInMin=0
labels=chef
2019-05-09 09:24:03.370+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: launch method called for agent ubuntu-test3c8d10
2019-05-09 09:24:05.920+0000 [id=1348]  INFO    c.m.a.v.AzureVMManagementServiceDelegate#isVMAliveOrHealthy: AzureVMManagementServiceDelegate: isVMAliveOrHealthy: status PowerState/running
2019-05-09 09:24:05.921+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#connectToSsh: AzureVMAgentSSHLauncher: connectToSsh: start
2019-05-09 09:24:05.921+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#getRemoteSession: AzureVMAgentSSHLauncher: getRemoteSession: getting remote session for user jenkins to host 10.76.96.77:22
2019-05-09 09:24:37.558+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#getRemoteSession: AzureVMAgentSSHLauncher: getRemoteSession: Got remote session for user jenkins to host 10.76.96.77:22
2019-05-09 09:24:37.559+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#connectToSsh: AzureVMAgentSSHLauncher: connectToSsh: Got remote connection
2019-05-09 09:24:37.559+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: starting test -e ~/.azure-agent-init
2019-05-09 09:24:38.811+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: executed successfully
2019-05-09 09:24:38.811+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: checking for java runtime
2019-05-09 09:24:38.811+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: starting java -fullversion
2019-05-09 09:24:40.152+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: executed successfully
2019-05-09 09:24:40.152+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: java runtime present, copying slave.jar to remote
2019-05-09 09:24:40.154+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#copyFileToRemote: AzureVMAgentSSHLauncher: copyFileToRemote: Initiating file transfer to slave.jar
2019-05-09 09:24:51.720+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#copyFileToRemote: AzureVMAgentSSHLauncher: copyFileToRemote: copied file Successfully to slave.jar
2019-05-09 09:24:51.721+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: launching agent: java  -jar slave.jar
2019-05-09 09:24:51.742+0000 [id=1348]  INFO    c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: Connected successfully
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label chef2
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: test-jenkins-ubuntu matches!
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label azure-acos-atlasco
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: acostest matches!
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label mdb-sandbox-acs
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu
2019-05-09 09:24:56.983+0000 [id=97]    FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest
2019-05-09 09:25:05.878+0000 [id=1474]  FINE    hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Maintainer Pool Size
2019-05-09 09:25:05.879+0000 [id=1474]  FINE    hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Maintainer Pool Size. 1 ms
2019-05-09 09:25:13.239+0000 [id=101]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label mdb-sandbox-acs
2019-05-09 09:25:13.239+0000 [id=101]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test
2019-05-09 09:25:13.239+0000 [id=101]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu
2019-05-09 09:25:13.239+0000 [id=101]   FINE    c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest
2019-05-09 09:27:05.560+0000 [id=1812]  FINE    hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Agents Clean Task
2019-05-09 09:27:05.561+0000 [id=1812]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: start
2019-05-09 09:27:05.561+0000 [id=1812]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: Running clean with 5 minute timeout
2019-05-09 09:27:05.561+0000 [id=1813]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Cleaning deployments
2019-05-09 09:27:05.561+0000 [id=1813]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Checking deployment ubuntu-test0509110640988
2019-05-09 09:27:07.711+0000 [id=1813]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment created on 5/9/19 11:08 AM
2019-05-09 09:27:07.711+0000 [id=1813]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment newer than timeout, keeping
2019-05-09 09:27:07.712+0000 [id=1813]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Done cleaning deployments
2019-05-09 09:27:07.800+0000 [id=1812]  FINE    c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: end
2019-05-09 09:27:07.800+0000 [id=1812]  FINE    hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Agents Clean Task. 2,240 ms
2019-05-09 09:28:46.078+0000 [id=101]   INFO    c.m.a.v.AzureVMCloudRetensionStrategy#check: AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: ubuntu-test3c8d10, action: shutdown
2019-05-09 09:28:46.079+0000 [id=1967]  INFO    c.m.a.v.AzureVMCloudRetensionStrategy$1#call: AzureVMCloudRetensionStrategy: going to idleTimeout
{code}
 

jieshe@microsoft.com (JIRA)

unread,
May 14, 2019, 4:24:01 AM5/14/19
to jenkinsc...@googlegroups.com
Jie Shen commented on Bug JENKINS-57447
 
Re: Agents sometimes left as suspended after starting from shutdown state

This issue seems to be caused by https://github.com/jenkinsci/azure-vm-agents-plugin/blob/d83f4153627fc6f64c166217bac59e27876ee984/src/main/java/com/microsoft/azure/vmagent/remote/AzureVMAgentSSHLauncher.java#L250 . This method seems to be hanged forever and the below logs do not be triggered. Need more investigation on this because this method will call Jenkins core code.

sander.bel@external.atlascopco.com (JIRA)

unread,
May 28, 2019, 10:25:01 AM5/28/19
to jenkinsc...@googlegroups.com

jieshe@microsoft.com (JIRA)

unread,
Jun 17, 2019, 11:34:02 PM6/17/19
to jenkinsc...@googlegroups.com
Jie Shen commented on Bug JENKINS-57447

My assumption here seems to be wrong. This issue seems not be hung on the setChannel method. Any help or suggestion here will be welcome.

michael.burtin@netgem.com (JIRA)

unread,
Sep 16, 2019, 5:20:03 AM9/16/19
to jenkinsc...@googlegroups.com

Jie Shen I have the issue you describe.

We have agents configured to shutdown after some idle time. When agent need to be reused, it come back online but never goes out suspended state. It appears from thread dump that it's blocked in https://github.com/jenkinsci/azure-vm-agents-plugin/blob/d83f4153627fc6f64c166217bac59e27876ee984/src/main/java/com/microsoft/azure/vmagent/remote/AzureVMAgentSSHLauncher.java#L250 calling https://github.com/jenkinsci/branch-api-plugin/blob/branch-api-2.5.4/src/main/java/jenkins/branch/WorkspaceLocatorImpl.java#L534:

 Thread 31073: (state = BLOCKED) 
 - jenkins.branch.WorkspaceLocatorImpl$Collector.onOnline(hudson.model.Computer, hudson.model.TaskListener) @bci=27, line=534 (Interpreted frame) 
 - hudson.slaves.SlaveComputer.setChannel(hudson.remoting.Channel, java.io.OutputStream, hudson.remoting.Channel$Listener) @bci=662, line=698 (Interpreted frame) 
 - hudson.slaves.SlaveComputer.setChannel(java.io.InputStream, java.io.OutputStream, java.io.OutputStream, hudson.remoting.Channel$Listener) @bci=82, line=432 (Interpreted frame) 
 - com.microsoft.azure.vmagent.remote.AzureVMAgentSSHLauncher.launch(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=977, line=250 (Interpreted frame) 
 - hudson.slaves.SlaveComputer$1.call() @bci=88, line=294 (Interpreted frame) 
 - jenkins.util.ContextResettingExecutorService$2.call() @bci=18, line=46 (Compiled frame) 
 - jenkins.security.ImpersonatingExecutorService$2.call() @bci=17, line=71 (Interpreted frame) 
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame) 
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Interpreted frame) 
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame) 
 - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame) 

 

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
Atlassian logo
Reply all
Reply to author
Forward
0 new messages