Vertx App High CPU


Jody Schering

Aug 9, 2016, 2:41:49 PM
to vert.x
Hi everyone, we recently started using Vert.x and built a small web application. The application uses the vertx-web module and the asynchronous JDBC module. We're on Vert.x 3.3.2 with two verticles: one accepts REST web service calls and executes database requests, and the other performs remote commands (it executes scripts on a server to perform actions on different servers). The verticles communicate via the event bus, and each is deployed with an instance count of 4.

After we released, we started to see CPU usage on the servers climb continually after about a week. Eventually, if you don't stop and restart the application, it locks up using all available CPU. Unfortunately I'm not able to reproduce this issue in development, so I'm trying to capture what's causing it using jstack and SJK (Swiss Java Knife, https://github.com/aragozin/jvm-tools) to take thread dumps and analyze the data. So far I don't see what's wrong, so I'm posting some of the findings in the hope that someone with more Vert.x experience can point me in the right direction. The only thing in the logs that may be related to the high CPU is the following message, with no stack trace:

Aug 09, 2016 7:53:27 AM [SEVERE] io.vertx.core.net.impl.ConnectionBase: java.io.IOException: Connection reset by peer 

I'm seeing this message frequently, and the more often it shows up, the higher the CPU usage gets. I'm not sure where it's coming from or what's causing it.


I've attached a zip file containing top output, flame svg graph, histogram output, thread info and the thread dump from 2 vertx processes.
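For anyone who wants to repeat the capture, this is roughly the sequence for tying a hot OS thread back to a Java stack (the PID and TID are illustrative values taken from the output in this thread; adjust for your process):

```shell
# Per-thread CPU view for the Java process (PID taken from top):
#   top -H -p 52942          # note the TID of the busy thread

# jstack prints native thread ids in hex (nid=0x...), so convert
# the decimal TID from top before searching the dump:
printf 'nid=0x%x\n' 2050     # hex form of an example TID

# Then locate that thread's stack in the dump:
#   jstack 52942 > dump.txt
#   grep -A 15 'nid=0x802' dump.txt
```

SJK's ttop does the per-thread CPU attribution in one step, which is what the output below came from.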

The top output below shows the Vert.x process using 100% of 2 CPUs. If we don't restart it, it will eventually use all CPU on the server and stop responding.

top - 11:02:04 up 62 days, 23:15, 3 users, load average: 2.10, 2.07, 2.01
Tasks: 306 total, 1 running, 305 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 37.9%us, 62.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 37.7%us, 62.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8185644k total, 4204808k used, 3980836k free, 477508k buffers
Swap: 2097148k total, 0k used, 2097148k free, 1479352k cached


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52942 webp 20 0 5761m 977m 15m S 199.7 12.2 3837:51 java                                        


Any help would be greatly appreciated.

Thank you.
vertx_jvm_trace.zip

Jody Schering

Aug 9, 2016, 6:31:52 PM
to vert.x
After running some more diagnostics, I was able to determine which thread in the dump is causing the high CPU usage. Running SJK's ttop lists the threads:

2016-08-09T14:40:08.820-0800 Process summary 
  process cpu=104.76%
  application cpu=101.47% (user=38.49% sys=62.99%)
  other: cpu=3.28% 
  GC cpu=0.00% (young=0.00%, old=0.00%)
  heap allocation rate 43mb/s
  safe point rate: 0.4 (events/s) avg. safe point pause: 0.59ms
  safe point sync time: 0.00% processing time: 0.02% (wallclock time)
[002050] user=37.36% sys=62.25% alloc=   42mb/s - NioProcessor-3
[002605] user= 1.13% sys= 0.47% alloc= 1228kb/s - RMI TCP Connection(2)-xxx.xx.xxx.xx
[000016] user= 0.00% sys= 0.07% alloc=  1093b/s - scheduled
[002604] user= 0.00% sys= 0.06% alloc=   14kb/s - cached1
[002606] user= 0.00% sys= 0.04% alloc=  4212b/s - JMX server connection timeout 2606
[000042] user= 0.00% sys= 0.01% alloc=   488b/s - hz._hzInstance_1_dev.MulticastThread
[000014] user= 0.00% sys= 0.01% alloc=    99b/s - pool-1-thread-1
[000049] user= 0.00% sys= 0.01% alloc=   318b/s - vert.x-eventloop-thread-0
[000032] user= 0.00% sys= 0.01% alloc=    30b/s - hz._hzInstance_1_dev.migration
[000037] user= 0.00% sys= 0.00% alloc=     0b/s - hz._hzInstance_1_dev.IO.thread-out-0
[000010] user= 0.00% sys= 0.00% alloc=    45b/s - vertx-blocked-thread-checker


Thread [2050] RUNNABLE at 2016-08-09T17:16:20.879 - NioProcessor-3
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:97)
org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1080)
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

The same result shows up on the other server, but since the CPU usage there is 200%, two NioProcessor threads are consuming all the CPU.

2016-08-09T14:51:01.871-0800 Process summary 
  process cpu=203.83%
  application cpu=199.96% (user=77.40% sys=122.56%)
  other: cpu=3.87% 
  GC cpu=0.00% (young=0.00%, old=0.00%)
  heap allocation rate 83mb/s
  safe point rate: 0.2 (events/s) avg. safe point pause: 1.83ms
  safe point sync time: 0.01% processing time: 0.02% (wallclock time)
[000296] user=38.89% sys=60.55% alloc=   41mb/s - NioProcessor-2
[000306] user=37.38% sys=61.60% alloc=   41mb/s - NioProcessor-3
[000390] user= 0.94% sys= 0.43% alloc= 1169kb/s - RMI TCP Connection(2)-xxx.xx.xxx.xx
[000016] user= 0.00% sys= 0.05% alloc=  1133b/s - scheduled
[000388] user= 0.00% sys= 0.03% alloc=   516b/s - cached3
[000391] user= 0.00% sys= 0.02% alloc=  3787b/s - JMX server connection timeout 391
[000389] user= 0.00% sys= 0.02% alloc=   255b/s - cached1
[000042] user= 0.19% sys=-0.18% alloc=   488b/s - hz._hzInstance_1_dev.MulticastThread
[000025] user= 0.00% sys= 0.00% alloc=     0b/s - hz._hzInstance_1_dev.SlowOperationDetectorThread
[000014] user= 0.00% sys= 0.00% alloc=    82b/s - pool-1-thread-1
[000031] user= 0.00% sys= 0.00% alloc=    36b/s - hz._hzInstance_1_dev.wait-notify
[000017] user= 0.00% sys= 0.00% alloc=     0b/s - hz._hzInstance_1_dev.InvocationMonitorThread
[000092] user= 0.00% sys= 0.00% alloc=     0b/s - vert.x-eventloop-thread-6
[000034] user= 0.00% sys= 0.00% alloc=    87b/s - hz._hzInstance_1_dev.IO.thread-in-0
[000074] user= 0.00% sys= 0.00% alloc=     0b/s - vert.x-eventloop-thread-3
[000010] user= 0.00% sys= 0.00% alloc=     0b/s - vertx-blocked-thread-checker
[000094] user= 0.00% sys= 0.00% alloc=     0b/s - vert.x-eventloop-thread-7
[000060] user= 0.00% sys= 0.00% alloc=     0b/s - vert.x-eventloop-thread-2


Thread [296] RUNNABLE at 2016-08-09T17:20:07.130 - NioProcessor-2
sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:193)
org.apache.mina.transport.socket.nio.NioProcessor.wakeup(NioProcessor.java:113)
org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1172)
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

Thread [306] RUNNABLE at 2016-08-09T17:20:07.130 - NioProcessor-3
sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:193)
org.apache.mina.transport.socket.nio.NioProcessor.wakeup(NioProcessor.java:113)
org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1172)
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

This appears to be caused by https://issues.apache.org/jira/browse/DIRMINA-1006, a bug introduced in MINA 2.0.9 and not fixed until 2.0.12; my version is 2.0.10.
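In case it helps anyone who hits the same thing: since mina-core arrives transitively (via the LDAP API in my case), one workaround is to pin a fixed version directly. A Maven sketch, relying on Maven's nearest-wins dependency mediation (adjust the version and build tool to your setup):

```xml
<!-- Pin mina-core to a release containing the DIRMINA-1006 fix; a direct
     dependency takes precedence over the transitive 2.0.10 under Maven's
     nearest-wins resolution. -->
<dependency>
  <groupId>org.apache.mina</groupId>
  <artifactId>mina-core</artifactId>
  <version>2.0.12</version>
</dependency>
```

You can verify which version actually lands on the classpath with `mvn dependency:tree -Dincludes=org.apache.mina`.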

Nat

Aug 9, 2016, 6:55:09 PM
to vert.x
That's a MINA problem. Vert.x does not use MINA underneath.

Jody Schering

Aug 10, 2016, 12:02:54 PM
to vert.x
Yeah, I know; that's why I posted the link to the MINA JIRA issue. I originally didn't know where the issue was and was looking for suggestions. Thanks for the reply.

nickscodes

Aug 10, 2016, 5:16:49 PM
to vert.x
Do you have an app with both MINA and Netty (in Vert.x core) doing the same thing? That seems unusual :)

Jody Schering

Aug 10, 2016, 6:13:02 PM
to vert.x
Well, I don't include MINA directly, but it's pulled in by the Apache Directory LDAP API (http://directory.apache.org/api/), which I'm using to connect to LDAP. If you have an alternative, I'd consider replacing it with something better. I originally tried the vertx-auth-shiro module (http://vertx.io/docs/vertx-auth-shiro/java/), but its documentation is lacking and I wasn't able to get it working in a reasonable amount of time.