[JIRA] (JENKINS-56581) org.jenkinsci.remoting.protocol.IOHub takes 100% CPU

joerg.rene.sautter@gmail.com (JIRA)

Mar 15, 2019, 6:57:06 AM
to jenkinsc...@googlegroups.com
Jörg Sautter created an issue
 
Jenkins / Bug JENKINS-56581
org.jenkinsci.remoting.protocol.IOHub takes 100% CPU
Issue Type: Bug
Assignee: Unassigned
Attachments: cpu_time_per_thread.png, thread_cpu_samples.png
Components: _unsorted
Created: 2019-03-15 10:56
Environment: v2.150.2, Windows Server 2012 R2, x64
Labels: performance
Priority: Major
Reporter: Jörg Sautter

Our 4 Jenkins installations (all on the same hardware) run at 100% CPU after some time, until they get restarted. Sometimes all of them are affected, sometimes only some.

I had a closer look at one installation and found that IOHub is running out of control. I added a logger to the class, and it prints dozens of "0 scheduled tasks to process" messages per second, nothing else.
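For reference, per-class log output like the above can also be captured without modifying the class, via java.util.logging (the framework Jenkins uses). This is an illustrative helper, not part of remoting; the logger name follows the JUL convention of matching the fully-qualified class name:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class IOHubLogSetup {
    // Enable FINE records for a single class logger. Both the logger and an
    // attached handler must allow FINE, or the records are dropped silently.
    static Logger enableFineLogging(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE);          // let FINE records through
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);         // and actually print them
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = enableFineLogging("org.jenkinsci.remoting.protocol.IOHub");
        System.out.println(logger.getName() + " " + logger.getLevel());
    }
}
```

The same effect can be had from the Jenkins UI under Manage Jenkins » System Log by adding a logger for that class name.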

See the attached VisualVM screenshots for details; the Jenkins instance was idle at the time of the recording.
I can provide more information if someone tells me what is needed.

 

Kind regards

Jörg

 

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)


o.v.nenashev@gmail.com (JIRA)

Mar 17, 2019, 1:16:04 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
Change By: Oleg Nenashev
Component/s: remoting
Component/s: _unsorted

jthompson@cloudbees.com (JIRA)

Apr 25, 2019, 2:05:03 PM
to jenkinsc...@googlegroups.com
Jeff Thompson commented on Bug JENKINS-56581
 
Re: org.jenkinsci.remoting.protocol.IOHub takes 100% CPU

That's an odd one. Nothing obvious comes to mind.

On my system, I see one instance of "0 scheduled tasks to process" occur per second. That seems like a reasonable rate and shouldn't impact system load significantly. Can you check your log messages to verify the rate and provide more information about what the log messages are showing? How fast is that check occurring?

I'm not entirely sure what your VisualVM images are showing. They seem to be showing that the thread is spending a fair bit of time waiting on the `select()` but not that it's spending a lot of CPU time.

More information would certainly be needed to diagnose any issues here. Unfortunately, I don't have any additional specific requests for info or ideas what would be useful. Maybe you could collect more information about what the system and Jenkins are doing and try to correlate that with CPU usage.

joerg.rene.sautter@gmail.com (JIRA)

May 21, 2019, 3:40:02 AM
to jenkinsc...@googlegroups.com
Jörg Sautter updated an issue
Change By: Jörg Sautter
Attachment: jenkins_logs.png

joerg.rene.sautter@gmail.com (JIRA)

May 21, 2019, 3:41:01 AM
to jenkinsc...@googlegroups.com

Today the issue occurred again, and I could confirm the 100+ "0 scheduled tasks to process" messages per second.

See the attached jenkins_logs.png screenshot and the copied text in jenkins_logs.txt.

I will try to gather more information about what is going on on the system.

joerg.rene.sautter@gmail.com (JIRA)

May 21, 2019, 7:59:02 AM
to jenkinsc...@googlegroups.com

I did not find any unusual I/O going on while the CPU was at 100%, so we have now updated Jenkins and the JDK.

Jenkins from v2.150.2 to v2.164.3, the JDK from Oracle jdk1.8.0_152 to openjdk-11.0.1.

If this issue still occurs, I will write a comment in this ticket.

jthompson@cloudbees.com (JIRA)

Jun 12, 2019, 12:31:02 PM
to jenkinsc...@googlegroups.com

That is a surprising number of messages, all with the same timestamp. I still have no idea what's going on with it, though. If you can gather any further information and diagnosis, please share it.

joerg.rene.sautter@gmail.com (JIRA)

Jun 13, 2019, 2:56:01 AM
to jenkinsc...@googlegroups.com

We had this issue again, and after some research in the Windows event log, I think this might be the result of:

https://kb.vmware.com/s/article/2006849

The described events are shown in the system log at the time when the CPU goes up to 100%.

We will try to get this fixed, and I will post an update to this ticket later on.

jthompson@cloudbees.com (JIRA)

Jun 13, 2019, 12:43:02 PM
to jenkinsc...@googlegroups.com

Interesting. I'm glad you were able to track down some connected information. Please share when you have further information.

joerg.rene.sautter@gmail.com (JIRA)

Jul 24, 2019, 11:37:03 AM
to jenkinsc...@googlegroups.com

Hi,

a small update on this issue: we changed the way the disk is mounted in the VM. This made everything even worse, so we decided to build the remoting jar of Jenkins ourselves and modify the logging.

It seems the selector.select() method doesn't block; adding a timeout did not change the behaviour. The call returns after ~3000 ns with 0 items, no call to selector.wakeup() is made, and the selector thread is not interrupted at this point. I don't understand why the select call returns here ...

We now set the priority of the selectorThread to Thread.MIN_PRIORITY and call Thread.yield() if wantSelectNow is false and selected is 0.
Let's see if this helps ...
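A rough sketch of that workaround, assuming a loop shape and a wantSelectNow flag as described in the comment (not remoting's actual code); selectNow() stands in here for the select() call that returned immediately in the bug:

```java
import java.nio.channels.Selector;

public class SelectSpinGuard {
    // Sketch of the workaround: when a select that was expected to block
    // returns 0 ready channels, run at minimum priority and yield so a
    // spinning selector thread cannot monopolize a core. wantSelectNow and
    // the loop shape are assumptions based on the description above.
    static int guardedSelectLoop(Selector selector, boolean wantSelectNow,
                                 int iterations) throws Exception {
        Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
        int yields = 0;
        for (int i = 0; i < iterations; i++) {
            // With no ready channels, selectNow() returns 0 immediately,
            // just like the misbehaving select() in the bug report.
            int selected = selector.selectNow();
            if (!wantSelectNow && selected == 0) {
                yields++;
                Thread.yield(); // back off instead of spinning at 100% CPU
            }
        }
        return yields;
    }

    public static void main(String[] args) throws Exception {
        try (Selector selector = Selector.open()) {
            // No channels registered, so every select returns 0 and we yield.
            System.out.println(guardedSelectLoop(selector, false, 5));
        }
    }
}
```

This only mitigates the spin; it does not explain why select() stopped blocking in the first place.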

 

joerg.rene.sautter@gmail.com (JIRA)

Jul 25, 2019, 2:41:03 AM
to jenkinsc...@googlegroups.com

I think this https://stackoverflow.com/questions/9939989/java-nio-selector-select-returns-0-although-channels-are-ready is the reason why the IOHub is taking all of our CPU power.

The current implementation relies on the number of ready channels returned by select(), but this is not the total number of ready channels; it is only the number of channels that became ready while the select was performed.
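That behavior can be reproduced in isolation with a local Pipe (a stand-in for the channels IOHub manages, not remoting code): select() only counts keys whose ready-operation sets were updated during that call, so a key left in selectedKeys() makes a later select return 0 even though a channel is still ready:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectReturnValueDemo {
    // Returns "<first select count> <second select count> <selectedKeys empty?>".
    static String demo() throws Exception {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        Selector selector = Selector.open();
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Make the source channel readable.
        pipe.sink().write(ByteBuffer.wrap(new byte[]{1}));

        // First select: the key's ready set is updated, so the count is 1.
        int first = selector.select();
        // Second select WITHOUT clearing selectedKeys(): the channel is still
        // readable, but no ready set is updated again, so the count is 0 even
        // though the selected-key set is non-empty.
        int second = selector.selectNow();

        String result = first + " " + second + " "
                + selector.selectedKeys().isEmpty();
        selector.close();
        pipe.sink().close();
        pipe.source().close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "1 0 false"
    }
}
```

A loop that only trusts the return value can therefore spin on "0 ready" forever, which is why testing selectedKeys().isEmpty() is the more robust check.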

We now changed the implementation from checking the returned number to checking keys.isEmpty() and are evaluating this on our Jenkins installations.

I will keep you updated and will open a PR in some days if the issue is fixed by this change.


jthompson@cloudbees.com (JIRA)

Jul 25, 2019, 4:35:01 PM
to jenkinsc...@googlegroups.com

Interesting. It would be great if this resolves the issues you are seeing.

joerg.rene.sautter@gmail.com (JIRA)

Jul 30, 2019, 2:39:03 AM
to jenkinsc...@googlegroups.com

Hopefully the last update on this issue:

After implementing a workaround for the non-blocking selector (basically adding yield and sleep statements), another thread started consuming 100% CPU.
It was a Jetty thread with exactly the same issue as the Jenkins thread before.

There is a Jetty issue about this, with a comment suggesting an update of the JDK to 11.0.2. Since that update, none of our Jenkins installations has had the issue again.

Thanks for your help!

Kind regards

Jörg

jthompson@cloudbees.com (JIRA)

Jul 30, 2019, 2:01:03 PM
to jenkinsc...@googlegroups.com

Jörg Sautter, so updating to Java 11 resolved this for you? That's great information. Any reason not to close this issue, then?
