[JIRA] (JENKINS-58573) 100% CPU remoting.jar or slave.jar on EC2 (connection refused)


gunter@grodotzki.co.za (JIRA)

Jul 19, 2019, 11:13:02 AM
to jenkinsc...@googlegroups.com
Gunter Grodotzki created an issue
 
Jenkins / Bug JENKINS-58573
100% CPU remoting.jar or slave.jar on EC2 (connection refused)
Issue Type: Bug
Assignee: FABRIZIO MANFREDI
Components: ec2-plugin, remoting
Created: 2019-07-19 15:12
Environment: Jenkins 2.176.2
ec2 1.44.1
Priority: Blocker
Reporter: Gunter Grodotzki

Jenkins EC2 nodes are constantly crashing with 100% CPU usage:

 

java.util.concurrent.TimeoutException: Ping started at 1563548484233 hasn't completed by 1563548724233
    at hudson.remoting.PingThread.ping(PingThread.java:134)
    at hudson.remoting.PingThread.run(PingThread.java:90)
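For reference, the two epoch-millisecond values in the exception can be decoded to see when the ping started and how long it waited. A minimal sketch; the variable names are illustrative and not taken from Jenkins:

```python
from datetime import datetime, timezone

# Epoch-millisecond values copied from the TimeoutException message.
started_ms = 1563548484233
deadline_ms = 1563548724233

# Time between the ping start and the point where it was declared overdue.
elapsed_ms = deadline_ms - started_ms

# Integer division keeps the conversion to whole seconds exact.
started_utc = datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc)

print("ping started (UTC):", started_utc.isoformat())
print("waited:", elapsed_ms // 1000, "seconds")
```

The difference is 240000 ms, so the agent had not answered the ping for four minutes when the exception was thrown.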

 

I tried both "native ssh" and jenkins-ssh, and both have the same issue. It looks like remoting.jar is hung:

 

JvmTop 0.8.0 alpha - 14:58:59, amd64, 4 cpus, Linux 4.9.0-9-a, load avg 7.89
http://code.google.com/p/jvmtop

 PID MAIN-CLASS      HPCUR HPMAX NHCUR NHMAX   CPU    GC    VM USERNAME  #T DL
4093 m.jvmtop.JvmTop   21m 1698m   18m   n/a 0.50% 0.00% O8U21 root      12
3973 remoting.jar    [ERROR: Connection refused/access denied]
6406 remoting.jar    [ERROR: Connection refused/access denied]

 

Not sure how to further debug this.
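One common way to dig further on Linux is to find the busy thread with `top -H -p <pid>` and then look up that thread id in a `jstack <pid>` dump, where it appears as a hexadecimal `nid`. The "Connection refused/access denied" error from jvmtop can happen when the tool runs as a different user than the target JVM, so this sketch assumes `jstack` is run as the user that owns remoting.jar. The sample dump text and thread ids below are made up for illustration, and `find_thread_by_tid` is a hypothetical helper, not part of any Jenkins tooling.

```python
import re

# Made-up excerpt in the layout printed by jstack on JDK 8 and 11 (assumption).
SAMPLE_DUMP = '''"example-busy-thread" #14 daemon prio=5 os_prio=0 tid=0x00007f3c2c001000 nid=0xf96 runnable [0x00007f3c1d2f1000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f3c2c0c2000 nid=0xf8a in Object.wait() [0x00007f3c1e4f2000]
   java.lang.Thread.State: WAITING (on object monitor)
'''


def find_thread_by_tid(jstack_output, tid):
    """Return the dump stanza whose nid equals the decimal Linux thread id."""
    wanted = "nid=" + hex(tid)  # top -H shows decimal ids, jstack shows hex
    for stanza in jstack_output.split("\n\n"):
        lines = stanza.strip().splitlines()
        # The lookahead stops nid=0xf9 from matching inside nid=0xf96.
        if lines and re.search(re.escape(wanted) + r"(?![0-9a-f])", lines[0]):
            return stanza.strip()
    return None


if __name__ == "__main__":
    # 3990 is a hypothetical thread id as it would be listed by `top -H`.
    print(find_thread_by_tid(SAMPLE_DUMP, 3990))
```

The stack trace under the matching stanza shows what the hot thread is executing, which is usually more informative than the process-level CPU figure.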


gunter@grodotzki.co.za (JIRA)

Jul 19, 2019, 11:14:02 AM
to jenkinsc...@googlegroups.com
Gunter Grodotzki updated an issue
Change By: Gunter Grodotzki

raihaan.shouhell@autodesk.com (JIRA)

Jul 22, 2019, 12:00:02 PM
to jenkinsc...@googlegroups.com
Raihaan Shouhell commented on Bug JENKINS-58573
 
Re: 100% CPU remoting.jar or slave.jar on EC2 (connection refused)

Which version of Java are you running? I'm not sure whether anyone from remoting can chime in on what makes the process spike to 100%.

gunter@grodotzki.co.za (JIRA)

Jul 22, 2019, 2:47:04 PM
to jenkinsc...@googlegroups.com
$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1~deb9u1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

It happens even on a relatively strong machine (EC2 c5.2xlarge). I am only connecting via internal IPs, so there is no firewall in between.

gunter@grodotzki.co.za (JIRA)

Jul 22, 2019, 2:47:04 PM
to jenkinsc...@googlegroups.com
Gunter Grodotzki updated an issue
Change By: Gunter Grodotzki
Attachment: Screenshot from 2019-07-22 20-43-32.png

gunter@grodotzki.co.za (JIRA)

Jul 23, 2019, 12:07:02 PM
to jenkinsc...@googlegroups.com
 
Re: 100% CPU remoting.jar or slave.jar on EC2 (connection refused)

Removing `ec2-plugin` as a possible culprit. I launched an EC2 instance (same config) and manually attached it as a permanent node. Eventually it started crashing again.

 

I am seeing the following in the logs:

 

jenkins-slave.2.log:2019-07-23T13:16:10.220+0000 WARNING hudson.Proc$LocalProc join: Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information
jenkins-slave.2.log:2019-07-23T13:28:54.652+0000 WARNING hudson.Proc$LocalProc join: Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information

While that is not ideal, it should not cause remoting.jar to crash completely, should it?
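To check whether descriptors actually accumulate in the agent process over time, one option on Linux is to count the entries under `/proc/<pid>/fd` at intervals. A minimal sketch, assuming a Linux host and permission to read that directory; `count_open_fds` is a hypothetical helper, and the PID of the remoting.jar process would have to be filled in:

```python
import os


def count_open_fds(pid):
    """Count open file descriptors of a process via /proc (Linux only).

    Returns None when /proc/<pid>/fd is missing or unreadable.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    # Demonstrates the call on the current process; substitute the PID
    # of the remoting.jar process to watch the agent instead.
    print(count_open_fds(os.getpid()))
```

A count that keeps growing between samples would suggest a real leak in the agent; a stable count would point elsewhere.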

 

I will try a slightly different setup with Debian 10 and OpenJDK 11 to rule out OS issues.

 

gunter@grodotzki.co.za (JIRA)

Jul 26, 2019, 5:45:02 AM
to jenkinsc...@googlegroups.com

Switched over to Ubuntu 18.04 with OpenJDK 11. I still need to run longer tests. It seems a bit better, but every now and then remoting.jar spikes to 100% CPU and drops again, without any builds running. So the load never goes down to 0, even though nothing is building.

I am curious whether this has something to do with the Jenkins master running behind Cloudflare. But the nodes connect to the master via its internal IP, so this should not be an issue?

gunter@grodotzki.co.za (JIRA)

Jul 26, 2019, 11:20:02 AM
to jenkinsc...@googlegroups.com
Gunter Grodotzki edited a comment on Bug JENKINS-58573

Update: after some time the load still gradually increases. It also seems to affect the master in the same way.

fabrizio.manfredi@gmail.com (JIRA)

Aug 10, 2019, 3:19:03 PM
to jenkinsc...@googlegroups.com

Can you make a flight recording, or a memory dump of the master, so we can check the state of the master?

Are the master and the slave running the same JDK?

Does it happen only with EC2?
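One possible way to collect this data is the JDK's `jcmd` tool, run as the same user as the target JVM. The sketch below only builds the command lines; the PID and output paths are placeholders. `JFR.start` assumes a JDK that ships Flight Recorder (OpenJDK 11 does; the OpenJDK 8u222 build mentioned earlier in this thread may not), so treat that as an assumption to verify on the actual host. Both function names are hypothetical.

```python
import shlex


def flight_recording_cmd(pid, out_path, seconds=60):
    """Build a jcmd call that starts a time-limited flight recording."""
    return ["jcmd", str(pid), "JFR.start",
            f"duration={seconds}s", f"filename={out_path}"]


def heap_dump_cmd(pid, out_path):
    """Build a jcmd call that writes a heap dump to out_path."""
    return ["jcmd", str(pid), "GC.heap_dump", out_path]


if __name__ == "__main__":
    # 12345 and the /tmp paths are placeholders, not values from this issue.
    for cmd in (flight_recording_cmd(12345, "/tmp/master.jfr"),
                heap_dump_cmd(12345, "/tmp/master.hprof")):
        print(shlex.join(cmd))
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` on the machine (or inside the container) where the JVM runs.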

gunter@grodotzki.co.za (JIRA)

Aug 10, 2019, 3:53:02 PM
to jenkinsc...@googlegroups.com

If you can point me to documentation on how to do this, that would be great.

I am running the master from the Docker image jenkins/jenkins:lts-slim, but I am guessing the CPU issues on the master are only a symptom. I haven't tried it with anything other than EC2. But given that the issue occurs on both Ubuntu and Debian AMIs, it's probably not EC2.

It would be really great if remoting.jar had better support for New Relic, so I could see exactly what is consuming 100% CPU. That would make it easy to figure out what's going on. Right now there is no helpful data.
