Agent "Lost Contact"


Prabha

Feb 6, 2021, 7:59:23 PM
to go-cd
Hi All,

We have been using GoCD 20.4 for a while now, with around 20 agents reporting directly to the go-server (no proxy or load balancers in between). All of these agents were working well until yesterday, when they all suddenly went to "Lost Contact" for no apparent reason.

I have been digging into this issue for hours, but it's just dead ends everywhere.

I have seen a few threads about similar issues, but they were not much help. Any suggestions would be greatly appreciated. This is the error from the agent log:

2021-02-06 23:06:00,105 ERROR [scheduler-1] AgentHTTPClientController:103 - Error occurred when agent tried to ping server:
org.springframework.remoting.RemoteAccessException: Could not access HTTP invoker remote service at [https://myserver.com:8154/go/remoting/remoteBuildRepository]; nested exception is org.apache.http.NoHttpResponseException: myserver.com:8154 failed to respond
        at org.springframework.remoting.httpinvoker.HttpInvokerClientInterceptor.convertHttpInvokerAccessException(HttpInvokerClientInterceptor.java:226)
        at org.springframework.remoting.httpinvoker.HttpInvokerClientInterceptor.invoke(HttpInvokerClientInterceptor.java:153)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy10.ping(Unknown Source)
        at com.thoughtworks.go.agent.AgentHTTPClientController.ping(AgentHTTPClientController.java:98)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:567)
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:835)
Caused by: org.apache.http.NoHttpResponseException: myserver.com:8154 failed to respond
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at com.thoughtworks.go.agent.common.ssl.GoAgentServerHttpClient.execute(GoAgentServerHttpClient.java:50)
        at com.thoughtworks.go.agent.GoHttpClientHttpInvokerRequestExecutor.doExecuteRequest(GoHttpClientHttpInvokerRequestExecutor.java:65)
        at org.springframework.remoting.httpinvoker.AbstractHttpInvokerRequestExecutor.executeRequest(AbstractHttpInvokerRequestExecutor.java:137)
        at org.springframework.remoting.httpinvoker.HttpInvokerClientInterceptor.executeRequest(HttpInvokerClientInterceptor.java:202)
        at org.springframework.remoting.httpinvoker.HttpInvokerClientInterceptor.executeRequest(HttpInvokerClientInterceptor.java:184)
        at org.springframework.remoting.httpinvoker.HttpInvokerClientInterceptor.invoke(HttpInvokerClientInterceptor.java:150)
        ... 16 common frames omitted
 
Thanks!
Prabha

Sriram Narayanan

Feb 6, 2021, 8:46:10 PM
to go...@googlegroups.com
If all of the agents lost contact, this looks like a network-level drop. 



Are you able to reach that GoCD service from your laptop and from the server itself (via curl and via a temporary GoCD agent)?

Do you know what changed (Network maintenance, OS hardening, patches applied)?
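
As a rough way to run the reachability check above from several machines, here is a minimal Python sketch. It separates "the TCP port accepts connections" from "the server actually answers an HTTP request", which matters because the NoHttpResponseException in the agent log suggests the connection opens but no response comes back. The function names and the `/go/` path are assumptions for illustration, and `check_http` only speaks plain HTTP (for example on port 8153); a TLS port would need `http.client.HTTPSConnection` instead.

```python
import http.client
import socket


def check_tcp(host, port, timeout=5.0):
    """Return (True, message) if a TCP connection can be opened, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "TCP connect ok"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


def check_http(host, port, path="/go/", timeout=5.0):
    """Return (True, 'HTTP <status>') if any HTTP response arrives, else (False, reason).

    The default path is an assumption; any path the server answers will do.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return True, f"HTTP {resp.status}"
    except (OSError, http.client.HTTPException) as exc:
        # Covers timeouts, refused/reset connections, and empty responses.
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        conn.close()


# Example (hostname taken from the log above, port from the thread):
# print(check_tcp("myserver.com", 8153))
# print(check_http("myserver.com", 8153))
```

If `check_tcp` succeeds but `check_http` fails from the agents and succeeds from the server itself, that would point towards something in between dropping or stalling traffic rather than the GoCD server being down.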

-- Ram


Prabha

Feb 7, 2021, 3:56:10 AM
to go-cd
Hi Ram,

Thanks for your response. I verified the firewall rules and checked for other operating-system-related changes. This server is barely touched; it was last rebooted 108 days ago, until I restarted it yesterday.

Also, we are the ones who do the patching, so I am pretty sure this server has been untouched for a long time.

I have also observed one more thing. I recently deleted the agents from the go-server console, along with the guid and token in each agent's config directory, before restarting the agents. When I restart an agent, it reports back to the server with a pending status, awaiting approval. As soon as I approve the new agents, they report properly for a few minutes and then lose contact again.

Initially I thought this could have something to do with SSL, so I even tried connecting the agents on port 8153, but I am getting the same error.
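
To put numbers on "a few minutes", one option is to pull the timestamps of the ping errors out of the agent log and look at when they start and how far apart they are. The following is only a sketch: it assumes the log lines look like the ERROR line quoted earlier in this thread, and the function names and the log file path in the usage comment are hypothetical.

```python
import re
from datetime import datetime

# Matches the first line of the ping error as it appears in the agent log,
# e.g. "2021-02-06 23:06:00,105 ERROR [scheduler-1] AgentHTTPClientController:103 - Error occurred ..."
PING_ERROR = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ERROR .*"
    r"Error occurred when agent tried to ping server"
)


def ping_error_times(lines):
    """Return the timestamps (to the second) of every ping error line."""
    times = []
    for line in lines:
        match = PING_ERROR.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the number of seconds between consecutive ping errors."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Example (the file name is an assumption; use the agent's actual log path):
# with open("go-agent.log", encoding="utf-8", errors="replace") as fh:
#     times = ping_error_times(fh)
# print(times[:1], gaps_in_seconds(times)[:10])
```

Comparing the first error timestamp with the time the agent was approved would show whether contact is lost after a consistent interval, which could hint at an idle timeout somewhere on the path.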

Thanks!
Prabha.

Sriram Narayanan

Feb 7, 2021, 10:14:06 AM
to go...@googlegroups.com
Are there any stack traces or messages in the GoCD server logs when you approve the agents?

What if you place an agent temporarily on the same OS as the GoCD server and try registering it?

Could you consider upgrading to the latest GoCD Server (stop the earlier instance, start the new instance)?

Something has changed and has taken effect with the reboot. I can think of firewall rule files being loaded or packages (JVM?) being upgraded.

-- Ram
