[JIRA] (JENKINS-59403) agent fails immediately when started in a docker-compose cluster

thunderaxiom@hotmail.com (JIRA)

Sep 17, 2019, 3:37:03 AM
to jenkinsc...@googlegroups.com
ravn created an issue
 
Jenkins / Bug JENKINS-59403
agent fails immediately when started in a docker-compose cluster
Issue Type: Bug
Assignee: Jeff Thompson
Components: remoting
Created: 2019-09-17 07:36
Priority: Minor
Reporter: ravn

I am trying to run a master plus some agents in a single Docker cluster orchestrated by docker-compose, using Jenkins ver. 2.176.3 under Windows 10, to experiment locally.

Unfortunately the agents spin up much faster than the master, so they try to connect before the master is ready, which in a Docker cluster results in a ConnectException (the Docker daemon handles the connect to the socket, but the instance is not ready yet). From my reading of the source, the initial connect is not protected by a try-catch for this case, so the retry mechanism does not come into play.

As a result, all the agents fail at startup and have to be started manually afterwards. For my immediate purposes, a "--wait" flag (or similar) that waits X seconds when starting the agent would be fine, but perhaps the resilience mechanism needs to incorporate this use case too?
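Until something like that exists in the agent itself, one possible workaround is a small wrapper that polls the master's port and only then launches the agent. The following is a minimal sketch, not part of Jenkins or remoting: the function name `wait_for_port` and all of its parameters are hypothetical, and it only checks that a TCP connection is accepted, which does not guarantee that Jenkins has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, interval_s=2.0, connect_timeout_s=5.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Hypothetical helper for illustration. Returns True as soon as a
    connection is accepted, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError on refusal, DNS failure or timeout.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            pass  # master not reachable yet, retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In the setup above, the agent container's entrypoint could call `wait_for_port("docker-images_csr-jenkins_1.local", 8080)` (host and port taken from the log below) and start the agent only when it returns True. Unlike a fixed sleep of X seconds, this proceeds as soon as the master answers and still gives up after a bounded time.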

 

 

```
Sep 17, 2019 9:32:17 AM hudson.remoting.jnlp.Main$CuiListener status
csr-jenkins-agent-base3_1 | INFO: Locating server among http://docker-images_csr-jenkins_1.local:8080
csr-jenkins-agent-base3_1 | Sep 17, 2019 9:32:17 AM hudson.remoting.jnlp.Main$CuiListener error
csr-jenkins-agent-base3_1 | SEVERE: Failed to connect to http://docker-images_csr-jenkins_1.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
csr-jenkins-agent-base3_1 | java.io.IOException: Failed to connect to http://docker-images_csr-jenkins_1.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
csr-jenkins-agent-base3_1 |     at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:197)
csr-jenkins-agent-base3_1 |     at hudson.remoting.Engine.innerRun(Engine.java:523)
csr-jenkins-agent-base3_1 |     at hudson.remoting.Engine.run(Engine.java:474)
csr-jenkins-agent-base3_1 | Caused by: java.net.ConnectException: Connection refused (Connection refused)
csr-jenkins-agent-base3_1 |     at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
csr-jenkins-agent-base3_1 |     at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
csr-jenkins-agent-base3_1 |     at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
csr-jenkins-agent-base3_1 |     at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
csr-jenkins-agent-base3_1 |     at java.base/java.net.Socket.connect(Socket.java:591)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1242)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1181)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1075)
csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1009)
csr-jenkins-agent-base3_1 |     at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:194)
csr-jenkins-agent-base3_1 |     ... 2 more
```

 

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)

jthompson@cloudbees.com (JIRA)

Sep 25, 2019, 1:01:02 PM
to jenkinsc...@googlegroups.com
Jeff Thompson commented on Bug JENKINS-59403
 
Re: agent fails immediately when started in a docker-compose cluster

We've had a couple of attempts to augment or change the startup process lately, some of which were eventually successful and some of which failed. The key to success has been a clearly defined use case and flow. If we can figure that out for this scenario, this could be a nice enhancement.

I'm always a little concerned about approaches that just wait for some arbitrary amount of time, which may work, but not always. Still, if the alternative is to just wait forever, a timeout may not be a bad idea.

msicker@cloudbees.com (JIRA)

Nov 5, 2019, 4:03:03 PM
to jenkinsc...@googlegroups.com