Connection exceptionHandler called multiple times, CPU hits 100% shortly after that


Daniel Campos

May 11, 2018, 10:30:03 AM
to vert.x
Hi guys,

I'm using Vert.x 3.5.1. I've built a streaming API in which I get events from a third-party lib and, for each request, write them as strings to the response. This is part of my final handler:

rc.response().setChunked(true);
rc.response().write("{json of each event}");
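
Roughly, the whole handler looks like this (a sketch only; eventSource, onEvent and toJson stand in for the third-party lib and are hypothetical names, not real APIs):

Router router = Router.router(vertx);

router.get("/stream").handler(rc -> {
  HttpServerResponse response = rc.response();
  response.setChunked(true);

  // For every event pushed by the lib, write its JSON to the response.
  // ("eventSource" / "onEvent" / "toJson" are illustrative placeholders.)
  eventSource.onEvent(event -> response.write(event.toJson() + "\n"));

  // The response is intentionally never ended; the client keeps the connection open.
});

vertx.createHttpServer().requestHandler(router::accept).listen(8080);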

We don't end the response at any time since it's a streaming API with keep-alive. When something goes wrong with the client connection, we often get a Connection reset by peer:

2018-04-10 11:23:15 SEVERE [io.vertx.core.net.impl.ConnectionBase] Unhandled exception
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1108)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:345)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:745)

That's fine since I managed to catch that via the connection exceptionHandler:

rc.request().connection().exceptionHandler()

I'm probably writing more than my client can consume, so I suppose the buffers get queued, but I can't pause or degrade the stream like this doc suggests: http://vertx.io/docs/vertx-core/java/#streams

The real problem comes next: that exceptionHandler (or just the log.error("Unhandled exception", t) from ConnectionBase) is called thousands of times, the CPU spikes, threads start to block, and eventually I get an OutOfMemoryError. Is there a way I can stop that behaviour as soon as I catch the first exception? Maybe by closing the response or cleaning up its buffer queue. I've tried a couple of calls, but none seemed to work.
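
The kind of thing I mean is roughly this (a sketch only; subscription.cancel() stands in for whatever call stops the third-party event source):

HttpConnection connection = rc.request().connection();
connection.exceptionHandler(t -> {
  // First error on the connection, e.g. java.io.IOException: Connection reset by peer.
  subscription.cancel();  // hypothetical: stop the third-party lib so nothing more is written
  connection.close();     // drop the connection instead of queuing further writes
});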

I appreciate any help on this.



Julien Viet

May 11, 2018, 11:45:49 AM
to ve...@googlegroups.com
Hi,

what do you mean by degrade?

> The real problem comes next: that exceptionHandler (or just the log.error("Unhandled exception", t) from ConnectionBase) is called thousands of times, the CPU spikes, threads start to block, and eventually I get an OutOfMemoryError.

you should only get a single call to the handler per exception; are you saying that for a single exception this handler is called multiple times?

it is not clear what is happening
> Is there a way I can stop that behaviour as soon as I catch the first exception? Maybe by closing the response or cleaning up its buffer queue. I've tried a couple of calls, but none seemed to work.
>
> I appreciate any help on this.

we have released 3.5.2.CR1 today (it should be on Central), can you give it a try?






Daniel Campos

May 11, 2018, 11:59:41 AM
to vert.x
By degrade I mean I can't pause the stream (e.g. a ReadStream -> MessageConsumer) until the write queue is no longer full (e.g. by checking writeQueueFull()). I need to write to the response even if the client can't consume in time.

I'll try to be clearer. For each request, I start a third-party streaming lib and send the result of each event back via response().write(). I guess I write more than the client can consume, so those writes get queued. When the client loses the connection for any reason, I get a Connection reset by peer that can be caught by connection.exceptionHandler(). However, after some time streaming, when that happens, the exception handler is called thousands of times, maybe once for each queued write that couldn't be delivered. When that happens, the CPU spikes under so many calls to that exception handler, degrading overall performance and blocking the event loop.
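
For reference, the flow-control pattern from the docs looks roughly like this; it assumes the producer is a ReadStream that can be paused, which my third-party lib is not (readStream here is illustrative):

readStream.handler(buffer -> {
  response.write(buffer);
  // Pause the source while the response's write queue is full,
  // resume it once the response has drained.
  if (response.writeQueueFull()) {
    readStream.pause();
    response.drainHandler(v -> readStream.resume());
  }
});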

I can try that newer version, but have you seen that behavior before?

Julien Viet

May 11, 2018, 12:24:26 PM
to ve...@googlegroups.com
are you setting compression on your HTTP server?

Daniel Campos

May 11, 2018, 12:26:17 PM
to ve...@googlegroups.com
I'm not. Just a plain vertx.createHttpServer().
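
For reference, enabling it would look something like this, which I'm not doing:

// Compression would be enabled via the server options; my server is created without any options.
HttpServer server = vertx.createHttpServer(
    new HttpServerOptions().setCompressionSupported(true));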

Julien Viet

May 11, 2018, 12:28:39 PM
to ve...@googlegroups.com
I'm suspecting you are experiencing this issue

it has been reported with server compression support enabled, but it might happen under other conditions.

do you mind trying with 3.5.2.CR1?

Daniel Campos

May 11, 2018, 12:30:00 PM
to ve...@googlegroups.com
I'll try 3.5.2.CR1 and I'll let you know. Thanks!

Julien Viet

May 11, 2018, 12:32:13 PM
to ve...@googlegroups.com
thanks

let's see how it goes.

Julien

Julien Viet

May 11, 2018, 12:42:01 PM
to ve...@googlegroups.com
I just tried with a plain server and something similar to your case, and I saw many exceptions reported like you describe.

with 3.5.2.CR1 I only see two exceptions (the close exceptions).


so I have good hope it will solve your issue.

Julien

Daniel Campos

May 11, 2018, 12:50:13 PM
to ve...@googlegroups.com
Yeah, that seems promising, thanks! I'll deploy in prod later today and I'll let you know.

Daniel Campos

Jun 6, 2018, 3:06:22 PM
to ve...@googlegroups.com
As a final report, this seems to be working fine with 3.5.2.CR3. The final 3.5.2 version should be released this week, correct? Just waiting for that to put my app in production.

Thanks Julien! 

Julien Viet

Jun 6, 2018, 3:25:48 PM
to ve...@googlegroups.com
Hi Daniel,

it's actually in Central but hasn't been announced yet because Homebrew is dragging its feet.

Julien

Manish Kumar

Jun 8, 2018, 10:46:17 AM
to vert.x
@daniel - Whenever you are satisfied with your prod results, can you please post back here? We had seen high CPU with 3.5.1 (https://groups.google.com/forum/#!searchin/vertx/CPU%7Csort:date/vertx/FOFCDs4EZDc/dYnXtILJCQAJ) and had to roll back to 3.5.0.

Our product is deployed in a customer environment, so it's difficult for us to upgrade and roll back if any issue is seen - I assume you own your prod environment. Thanks a lot for your feedback!