master to agent connection keeps breaking every 3-4 hours


Ashish Sharma

Sep 29, 2020, 4:47:22 AM
to Jenkins Users

Hi Team, we are using JNLP to connect a Mac agent to a Linux master node.

The Jenkins agent keeps disconnecting frequently, and we are getting the logs below on the master.

Can you please suggest how to resolve this, and what steps we can take to triage it further?

Some of the questions we are trying to answer are:

  • What is EOFException?
  • Why does the agent try to connect to the master when it's already connected?
  • Why does the ping / connection eventually fail?

We keep seeing this pattern in the logs far too often. Any help would be appreciated.

The results are the same with any of the options below:

  • Connecting using "Launch agent from browser"
  • Connecting by starting an Automator workflow on the Mac that runs a shell/zsh script to run agent.jar
  • Connecting by running a plist on the Mac

 Jenkins environment:

  • Jenkins: 2.249.1
  • Master Node: Linux RHEL 8.1
  • Master Java Version: 1.8.0_242
  • Slave System: macOS Catalina, Version 10.15.6
  • Slave Java Version: 1.8.0_261

Connection #xxx failed: java.io.EOFException
Sep 29, 2020 2:45:21 AM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Accepted JNLP4-connect connection #xxx from x.x.x.x/x.x.x.x:57215
Sep 29, 2020 2:45:21 AM INFO org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
[JNLP4-connect connection from x.x.x.x/x.x.x.x:57215] Refusing headers from remote: <agent_name> is already connected to this master. Rejecting this connection.
Sep 29, 2020 2:45:31 AM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Connection #xxx failed: java.io.EOFException
Sep 29, 2020 2:45:31 AM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Accepted JNLP4-connect connection #xxx from x.x.x.x/x.x.x.x:57218
Sep 29, 2020 2:45:32 AM INFO org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
[JNLP4-connect connection from x.x.x.x/x.x.x.x] Refusing headers from remote: <agent_name> is already connected to this master. Rejecting this connection.
Sep 29, 2020 2:45:32 AM INFO hudson.slaves.ChannelPinger$1 onDead
Ping failed. Terminating the channel JNLP4-connect connection from x.x.x.x/x.x.x.x:57015.
java.util.concurrent.TimeoutException: Ping started at 1601318492966 hasn't completed by 1601318732966
        at hudson.remoting.PingThread.ping(PingThread.java:134)
        at hudson.remoting.PingThread.run(PingThread.java:90)

 

TIA
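
A note on the TimeoutException in the log above: the two numbers in it are epoch timestamps in milliseconds, so subtracting them shows how long the master waited for the ping before terminating the channel. A quick check in Python (the variable names are only illustrative):

```python
from datetime import timedelta

# Epoch-millisecond values copied from the TimeoutException line in the log.
ping_started_ms = 1601318492966
ping_deadline_ms = 1601318732966

# Gap between "Ping started at" and "hasn't completed by".
elapsed = timedelta(milliseconds=ping_deadline_ms - ping_started_ms)
print(elapsed.total_seconds())  # 240.0
```

So the ping went unanswered for 4 minutes before the channel was dropped, which suggests the connection had already been dead for a while by the time the master noticed.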

 

jeremy mordkoff

Sep 29, 2020, 12:03:22 PM
to Jenkins Users
We have a similar issue that only seems to occur during long-running jobs (over 5 hours). The traceback is different, but we also see the EOFException. My client is Ubuntu Linux.

I tried to trace the issue by running tcpdump at both ends of the ssh session from the master to the slave, but I saw nothing amiss. I suspect that there is a connection inside the ssh session, but that will be hard to catch using tcpdump.

I wonder if I need to enable some kind of keepalives...

Ivan Fernandez Calvo

Sep 29, 2020, 12:15:23 PM
to Jenkins Users
  • Why does the agent try to connect to the master when it's already connected?
That suggests half-open connections: the agent loses its connection to the Jenkins instance, but the FIN notification never reaches the Jenkins instance, so the connection stays open on the Jenkins side. It could be related to network equipment and the policies it applies to open connections. The recommendation is to tune the TCP stack so that those connections are kept open with traffic; see https://support.cloudbees.com/hc/en-us/articles/115001416548#7tcpretransmissiontimeoutossperhapsincrease
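
To illustrate what "keeping connections open with traffic" means at the TCP level, here is a minimal Python sketch of per-socket keepalive. This is not Jenkins configuration, and it is not what the linked article prescribes (the article covers OS-level settings). The function names and the 60/10/6 values are assumptions made up for the example, and the option names differ between Linux and macOS.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=6):
    """Turn on TCP keepalive for one socket (illustrative values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):       # Linux name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):    # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):      # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):        # unanswered probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def detection_seconds(idle, interval, probes):
    """Approximate time for keepalive to declare a silent peer dead."""
    return idle + interval * probes


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
s.close()

# Usual Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
print(detection_seconds(7200, 75, 9))  # 7875
```

With the usual Linux defaults, the first probe is only sent after two hours of silence, so a firewall or NAT device with a shorter idle timeout may drop the connection state before any keepalive traffic is ever sent. Lowering the idle time is one way to avoid that.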

Ashish Sharma

Sep 29, 2020, 9:22:00 PM
to Jenkins Users
Thanks, are these settings to be applied on the master side, the slave side, or both?

Ivan Fernandez Calvo

Sep 30, 2020, 1:33:57 PM
to Jenkins Users
Ideally on both sides, but applying them on one side is usually enough.

Ashish Sharma

Nov 3, 2020, 10:03:34 PM
to Jenkins Users
We have tried applying these on both sides, but we are still facing the same issue :(

kuisathaverat

Nov 4, 2020, 5:06:09 AM
to jenkins...@googlegroups.com
Then the only thing you can do is enable debug logging on the sshd server to see what happens with the connection (see https://en.wikibooks.org/wiki/OpenSSH/Logging_and_Troubleshooting), then open a regular ssh connection from the Jenkins instance to the agent from the command line and see what happens after 4 hours.
