[JIRA] (JENKINS-61033) Provisioning thread hangs

4 views
Skip to first unread message

pibeitz@gmail.com (JIRA)

unread,
Feb 10, 2020, 8:08:03 AM2/10/20
to jenkinsc...@googlegroups.com
Pierre Beitz created an issue
 
Jenkins / Bug JENKINS-61033
Provisioning thread hangs
Issue Type: Bug Bug
Assignee: Nicolas De Loof
Components: docker-plugin
Created: 2020-02-10 13:07
Environment: * Jenkins 2.190.3
* docker-plugin:1.1.9
Priority: Major Major
Reporter: Pierre Beitz

Symptom

  • At some point in time, the docker plugin stops to provision any new docker agent.

Evidence

  • I found out that at some point, the docker plugin tries to provision new containers and hangs:

 

{{

2020-02-05 23:55:49.244+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
2020-02-05 23:55:49.467+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#canAddProvisionedSlave: Not Provisioning '**/jenkins-agent:2.190.3.2'. Template instance limit of '8' reached on cloud '}}}}{{**{{' }}

2020-02-05 23:55:49.467+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null 2020-02-05 23:55:49.583+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#canAddProvisionedSlave: Not Provisioning '***/jenkins-agent:2.190.3.2'. Template instance limit of '8' reached on cloud '}}{{*{{' 2020-02-05 }}

{{23:55:59.244+0000 [id=71] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null // NO MORE ENTRIES

}}

  • After this last call, there is no more trace of any attempt from the docker plugin. Looking into the threaddump, I found out that the thread running the DockerCloud code is waiting on an external output:

 

{{ "jenkins.util.Timer 6" id=71 (0x47) state=WAITING cpu=95%

  • waiting on <0x7891a9a1> (a java.util.concurrent.CountDownLatch$Sync)
  • locked <0x7891a9a1> (a java.util.concurrent.CountDownLatch$Sync)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at com.github.dockerjava.core.async.ResultCallbackTemplate.awaitCompletion(ResultCallbackTemplate.java:92)
    at com.github.dockerjava.netty.InvocationBuilder$ResponseCallback.awaitResult(InvocationBuilder.java:60)
    at com.github.dockerjava.netty.InvocationBuilder.get(InvocationBuilder.java:189)
    at io.jenkins.docker.client.ListContainersCmdExec.execute(ListContainersCmdExec.java:60)
    at io.jenkins.docker.client.ListContainersCmdExec.execute(ListContainersCmdExec.java:24)
    at com.github.dockerjava.netty.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
    at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
    at com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker(DockerCloud.java:614)
    at com.nirima.jenkins.plugins.docker.DockerCloud.canAddProvisionedSlave(DockerCloud.java:632)
    at com.nirima.jenkins.plugins.docker.DockerCloud.provision(DockerCloud.java:352)}}

    Hypothesis

  • It appears the wait happens in this code and I'm under the impression that there is not timeout so in case the call hangs, the thread will hang and the plugin won't provision anymore.
Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
Atlassian logo

pibeitz@gmail.com (JIRA)

unread,
Feb 10, 2020, 8:11:02 AM2/10/20
to jenkinsc...@googlegroups.com
Pierre Beitz updated an issue
Change By: Pierre Beitz
h3. Symptom
* At some point in time, the docker plugin stops to provision any new docker agent.

h3. Evidence
* I found out that at some point, the docker plugin tries to provision new containers and hangs:

 

{ { code:java}

2020-02-05 23:55:49.244+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
2020-02-05 23:55:49.467+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#canAddProvisionedSlave: Not Provisioning '***/jenkins-agent:2.190.3.2'. Template instance limit of '8' reached on cloud ' }}{{}} * {{ ** }}{\{ ' }}

{{
2020-02-05 23:55:49.467+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
2020-02-05 23:55:49.583+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#canAddProvisionedSlave: Not Provisioning ' }}{{ *** *}}*{{ /jenkins-agent:2.190.3.2'. Template instance limit of '8' reached on cloud ' }}{{}} * {{ ** }}{\{ '
2020-02-05 }}

{{
23:55:59.244+0000 [id=71] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
// NO MORE ENTRIES

{code } }
 
* After this last call, there is no more trace of any attempt from the docker plugin. Looking into the threaddump, I found out that the thread running the {{DockerCloud}} code is waiting on an external output:  

 

{ { code:java}
"jenkins.util.Timer [#6]" id=71 (0x47) state=WAITING cpu=95%
- waiting on <0x7891a9a1> (a java.util.concurrent.CountDownLatch$Sync)
- locked <0x7891a9a1> (a java.util.concurrent.CountDownLatch$Sync)

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.github.dockerjava.core.async.ResultCallbackTemplate.awaitCompletion(ResultCallbackTemplate.java:92)
at com.github.dockerjava.netty.InvocationBuilder$ResponseCallback.awaitResult(InvocationBuilder.java:60)
at com.github.dockerjava.netty.InvocationBuilder.get(InvocationBuilder.java:189)
at io.jenkins.docker.client.ListContainersCmdExec.execute(ListContainersCmdExec.java:60)
at io.jenkins.docker.client.ListContainersCmdExec.execute(ListContainersCmdExec.java:24)
at com.github.dockerjava.netty.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
at com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker(DockerCloud.java:614)
at com.nirima.jenkins.plugins.docker.DockerCloud.canAddProvisionedSlave(DockerCloud.java:632)
at com.nirima.jenkins.plugins.docker.DockerCloud.provision(DockerCloud.java:352)

{code
} }
 
-
h3. Hypothesis

* It appears the wait happens in [this code|https://github.com/jenkinsci/docker-plugin/blob/02397d4faf0cbf729d4d25867768e7c9052aec18/src/main/java/io/jenkins/docker/client/ListContainersCmdExec.java#L59-L61] and I'm under the impression that there is not timeout so in case the call hangs, the thread will hang and the plugin won't provision anymore.
Reply all
Reply to author
Forward
0 new messages