High CPU Usage + Deadlocks: resolved only by restart


ramsan...@gmail.com

Mar 23, 2018, 2:16:07 AM
to vert.x
Hi,

We are using Vert.x to receive HTTP requests, process them, and send responses to the clients. As part of the processing we hit MongoDB. We also use JCache on top of Hazelcast to cache some entries, with LRU eviction and a 5-minute TTL.
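For clarity, the cache semantics we rely on (bounded size with LRU eviction plus a per-entry TTL) are roughly equivalent to the following plain-JDK sketch. This is an illustration only, not our actual JCache/Hazelcast configuration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of LRU + TTL cache semantics (illustration only, not the real JCache/Hazelcast setup).
class LruTtlCache<K, V> {
    private final long ttlMillis;
    private final Map<K, Entry<V>> map;

    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    LruTtlCache(int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true: iteration order is least-recently-accessed first
        this.map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxEntries; // LRU eviction once capacity is exceeded
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAt) { // expired: evict lazily on read
            map.remove(key);
            return null;
        }
        return e.value;
    }
}
```

In the real deployment these semantics come from the JCache configuration backed by Hazelcast, not from code like this.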

On one of the PROD deployments, we have seen the following on three separate occasions. It is worth noting that there are two instances of the app behind a load balancer, and on each occasion both instances showed the same symptoms at the same time.
  • High CPU usage
  • Very high response times, though the downstream systems are in good health
  • Thread dump shows a deadlock scenario involving the event loops, worker threads, Hazelcast threads, JCache threads and MongoDB threads.
Below is an extract from the thread dump showing the event loop/Netty part:
"vert.x-eventloop-thread-3" #15 prio=5 os_prio=0 tid=0x00007f53d042f800 nid=0x53ad runnable [0x00007f540cad4000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000e06118b0> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x00000000e0613930> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000e0611818> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

Basic analysis suggests that the Netty event loop threads are where the high CPU usage comes from.

Any pointers on what might cause this issue, and how to avoid or recover from the deadlock without a restart would be helpful. Thanks.

Best Regards,
Ram

Julien Viet

Mar 23, 2018, 3:19:50 AM
to ve...@googlegroups.com
can you provide all the traces that form the deadlock?



--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/953d5030-506a-44f3-85c9-0094474a13a8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Message has been deleted

Julien Viet

Mar 27, 2018, 12:32:38 PM
to ve...@googlegroups.com
Hi,

I can't see any deadlock here.

Julien

On 26 Mar 2018, at 08:46, Mohamed Rilwan <rilw...@gmail.com> wrote:

Hi Julien,

Please find below extracts from the thread dumps taken during the deadlock situations.

"cached4" #6002852 prio=5 os_prio=0 tid=0x00007f540810b000 nid=0x15f waiting on condition [0x00007f53829b1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000e0a083e0> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
    at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
    at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)

"vert.x-eventloop-thread-3" #15 prio=5 os_prio=0 tid=0x00007f53d042f800 nid=0x53ad runnable [0x00007f540cad4000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000e06118b0> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x00000000e0613930> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000e0611818> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

"cluster-ClusterId{value='5a6ff00b18a3ce5338c9fa58', description='null'}-eqvsdbprod01.amd.three.ie:27017" #125 daemon prio=5 os_prio=0 tid=0x00007f5390004800 nid=0x5394 waiting on condition [0x00007f5387efd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000e0dd4258> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:237)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:218)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:167)
    - locked <0x00000000e0dceca8> (a com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable)
    at java.lang.Thread.run(Thread.java:745)

Thanks,
Mohamed Rilwan
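None of the stacks above is BLOCKED waiting for a monitor held by another thread: the event loop is idling in epollWait, and the Hazelcast and MongoDB threads are parked in timed waits, which is their normal idle state. A true Java-level monitor deadlock can be confirmed programmatically with the JDK's ThreadMXBean (the same condition jstack reports as "Found one Java-level deadlock"). A self-contained sketch, plain JDK with no Vert.x or Hazelcast involved, that manufactures a deadlock and then detects it:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CyclicBarrier;

public class DeadlockCheck {

    // Returns the IDs of threads caught in a lock cycle, or null if there is none.
    static long[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return mx.findDeadlockedThreads();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("before: " + (findDeadlocks() == null ? "no deadlock" : "deadlock"));

        Object a = new Object(), b = new Object();
        CyclicBarrier bothHoldFirstLock = new CyclicBarrier(2);

        // Two daemon threads acquire the same two locks in opposite order.
        spin(() -> { synchronized (a) { await(bothHoldFirstLock); synchronized (b) { } } });
        spin(() -> { synchronized (b) { await(bothHoldFirstLock); synchronized (a) { } } });

        // Poll until the JVM observes the cycle.
        long[] ids = null;
        for (int i = 0; i < 100 && ids == null; i++) {
            Thread.sleep(50);
            ids = findDeadlocks();
        }
        System.out.println("after: " + (ids == null ? "no deadlock" : ids.length + " threads deadlocked"));
    }

    static void spin(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true); // let the JVM exit even though these threads never finish
        t.start();
    }

    static void await(CyclicBarrier barrier) {
        try { barrier.await(); } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

If findDeadlockedThreads() returns null during an incident, the threads are busy or waiting rather than mutually blocked, and a profiler or GC logs are the better tool.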

Manish Kumar

Mar 28, 2018, 10:54:41 AM
to vert.x
Ram, what version of Vert.x are you using?

Mohamed Rilwan

Mar 29, 2018, 6:13:35 AM
to vert.x
Hi Manish,

We are using 3.2.1

Thanks,
Rilwan

Manish Kumar

Mar 29, 2018, 2:59:56 PM
to vert.x
Rilwan - As Julien mentioned, there is no visible deadlock in the stack traces you posted. You can use JMC (which requires a license) to determine where your threads are spending most of their time.

I never found thread dumps too useful for determining this kind of problem until I used a good JVM profiler. We are also seeing high CPU usage in Vert.x 3.5.1, but it seems to be an issue around Netty - I am going to post a detailed question.

Manish Kumar

Mar 29, 2018, 3:01:19 PM
to vert.x
I meant - JMC requires a license only if you use it in prod. For development/QA/evaluation purposes it's free.

Daniel Campos

May 10, 2018, 12:21:31 PM
to vert.x
Hi guys,

I'm using Vert.x 3.5.1 and I'm also facing high CPU usage. It seems to be related to a "Connection reset by peer" IOException.

java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1108)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:345)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:745)

The problem is that the exception handler is called hundreds of times. I can catch the exception via rc.request().connection().exceptionHandler(), but I can't do anything about the fact that it is called that many times. Once that happens, the CPU goes wild and I also need to restart.

Should I deal with this problem in another thread, or is it okay to discuss it here?

Julien Viet

May 10, 2018, 2:53:26 PM
to ve...@googlegroups.com
yes, that should be another thread for the sake of clarity.

thanks



ramsan...@gmail.com

Jul 27, 2018, 2:37:57 AM
to vert.x
Hi All,

We found the root cause of the high CPU utilization: it was triggered by the default GC algorithm used by the JVM. Reconfiguring the JVM to use G1GC resolved the issue.

Thanks,
Ram
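For readers who hit the same symptom: the fix is a JVM launch option, not an application change. The flags below are illustrative, not the exact settings used here, and app.jar is a placeholder name; on JDK 8 (which the Thread.java:745 frames above suggest) the default collector is the Parallel collector, so G1 must be enabled explicitly:

```shell
# Illustrative flags only, not the exact settings from this thread; app.jar is a placeholder.
# On JDK 8 the default collector is Parallel GC, so G1 has to be requested explicitly.
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -verbose:gc \
     -jar app.jar
```

Enabling GC logging when making the switch (-verbose:gc here; -XX:+PrintGCDetails gives more detail on JDK 8) lets you compare collector behaviour before and after.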

Julien Viet

Jul 27, 2018, 7:26:56 AM
to ve...@googlegroups.com
hi,

thanks for letting us know.

Julien


Pratik Parikh

Jul 27, 2018, 7:44:15 AM
to ve...@googlegroups.com
Could you share your configuration change at GC level?



--
Pratik Parikh
- Mantra - Keep It Simple and Straightforward