Issues concerning "connection reset by peers" and "Too many open files"


赵普明

Nov 29, 2012, 8:26:26 AM
to ve...@googlegroups.com
Hi :

We have a web service written in vert.x, and it is now in its beta test phase. Today we got our first real-world data flowing into the server, but unfortunately it began to show a lot of "Connection reset by peer" exceptions
and later bogged down into a nearly frozen state, complaining about "Too many open files". After checking netstat, we found that the server seems to be holding a great many connections like this (the IP of the server here is fake):

>   tcp        0      0 ::ffff:118.78.182.38:80     ::ffff:171.120.97.95:35669  ESTABLISHED

and the number of open connections has reached the max-file limit.

I don't understand where those "Connection reset by peer" exceptions come from.

Our server is serving ads on a website, and all ad requests come from browsers over the Internet. Today we see roughly 10000 page views per minute, which should be reasonably low. But there still seem to be a LOT of "Connection reset by peer" exceptions.
I was under the impression that vert.x/NIO can serve 10000 requests per SECOND without any problem. Maybe the problem here is that every request coming from a browser needs a different connection? Is NIO OK with that?

Our typical response time is 100~200ms

So here are three questions I'd like answered:


1. Does the server automatically close a connection when it hits a "Connection reset by peer"? If not, how do we close it in the code?

2. Can anyone shed some light on why we've got so many peer resets? How can we avoid that? How can we scale up?

3. Does this have anything to do with keep-alive? Is a connection in vert.x keep-alive by default? If it is, does that mean we need to manually close a connection after sending a response? (All our requests are short connections coming from browsers, since they are ads, so keep-alive is meaningless to us.)

I really hope this problem can be solved, or else our service cannot go online and the whole project would fail ...

Thank you very much.

Best Regards

Puming.
Or can we configure the server not to use keep-alive?

赵普明

Nov 29, 2012, 9:47:03 AM
to ve...@googlegroups.com
Update: I've set server.setTCPKeepAlive(false), and the results are similar. There are still lots of ESTABLISHED connections that won't go away. Only a few of them (a few hundred) have changed to LAST_ACK, and none are closed.



On Thursday, November 29, 2012 at 9:26:26 PM UTC+8, 赵普明 wrote:

Tim Fox

Nov 29, 2012, 11:19:05 AM
to ve...@googlegroups.com
What version of Vert.x are you running?

Also, can you please post a log of the exceptions you're getting?

Other things to look at:

1) What have you set max file handles on the server to?

2) Take a look at http://vertx.io/manual.html#performance-tuning

3) What OS are you running?
> --
> You received this message because you are subscribed to the Google
> Groups "vert.x" group.
> To view this discussion on the web, visit
> https://groups.google.com/d/msg/vertx/-/MdWN7qD1CLMJ.
> To post to this group, send an email to ve...@googlegroups.com.
> To unsubscribe from this group, send email to
> vertx+un...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/vertx?hl=en-GB.


--
Tim Fox

Vert.x - effortless polyglot asynchronous application development
http://vertx.io
twitter:@timfox

Tim Fox

Nov 29, 2012, 11:22:31 AM
to ve...@googlegroups.com
On 29/11/12 14:47, 赵普明 wrote:
> Update: I've configured to server.setTCPKeepAlive(false),

TCP keep alive is not related to HTTP keep alive
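
To make the distinction concrete, here is a minimal sketch in plain Java (not Vert.x code; the class and method names are made up for illustration). TCP keep-alive is the SO_KEEPALIVE socket option, which only asks the OS to probe idle connections. HTTP keep-alive is decided per request from the HTTP version and the Connection header, and it is the one that determines whether a connection stays open after a response. The header check below is simplified and assumes a single token in the Connection header.

```java
import java.io.IOException;
import java.net.Socket;

class KeepAliveDemo {
    /**
     * HTTP-level persistence: HTTP/1.1 connections stay open unless
     * "Connection: close" is sent; HTTP/1.0 connections close unless
     * "Connection: keep-alive" is sent. (Simplified: single-token header.)
     */
    static boolean httpKeepAlive(String httpVersion, String connectionHeader) {
        String value = connectionHeader == null ? "" : connectionHeader.trim();
        if ("HTTP/1.1".equals(httpVersion)) {
            return !value.equalsIgnoreCase("close");
        }
        return value.equalsIgnoreCase("keep-alive");
    }

    public static void main(String[] args) throws IOException {
        // TCP-level keep-alive: a socket option, unrelated to HTTP headers.
        try (Socket socket = new Socket()) {
            socket.setKeepAlive(false);
            System.out.println("SO_KEEPALIVE = " + socket.getKeepAlive());
        }
        // Even with SO_KEEPALIVE off, an HTTP/1.1 request without
        // "Connection: close" still asks for a persistent connection.
        System.out.println("HTTP/1.1, no header  -> keep open? " + httpKeepAlive("HTTP/1.1", null));
        System.out.println("HTTP/1.1, close      -> keep open? " + httpKeepAlive("HTTP/1.1", "close"));
    }
}
```

So turning SO_KEEPALIVE off would not be expected to make browsers' HTTP/1.1 connections go away sooner.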

> and the results are similar. Still lots of ESTABLISHED connections
> that won't go away. Only very few of them (a few hundred) are changed to
> LAST_ACK and no closed ones

赵普明

Nov 29, 2012, 8:34:40 PM
to ve...@googlegroups.com
I've changed the server settings to:

server.setTCPKeepAlive(true);
server.setReuseAddress(true);

And now it seems good. After a night, the number of ESTABLISHED connections (file descriptors) is holding steady at around 55000.

But I don't know what this number will become when the traffic goes up (we are currently running at about 8000 requests/minute, with 2 ad placements).

As I suppressed the "Connection reset by peer" exception by modifying the vert.x-core code, we no longer know whether those exceptions are still occurring. I'll report on that later.

But I still don't know what is actually going on. My guess is that there should only be several hundred connections actually active, but the number is more than 50000.

Do you have any suggestions?

Best Regards

Puming

On Friday, November 30, 2012 at 12:19:05 AM UTC+8, Tim Fox wrote:
> What version of Vert.x are you running?

1.3.0-final

> Also, can you please post a log of the exceptions you're getting?

All of the exceptions look like this:

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:186)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:59)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)



 

> 1) What have you set max file handles on the server to?

800000

Thanks

> 3) What OS are you running?

CentOS 5.3
 

stream

Nov 29, 2012, 9:55:41 PM
to ve...@googlegroups.com

Check your ulimit settings if your project runs on Linux.
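
The limit that matters is the one applied to the running JVM process, which can differ from what an interactive shell reports. As a rough sketch, the process can report its own file descriptor usage, assuming an OpenJDK/HotSpot-style JVM on a Unix-like OS where com.sun.management.UnixOperatingSystemMXBean is available (the FdUsage class name is made up for this example):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

class FdUsage {
    /**
     * Returns {open descriptors, max descriptors} for this JVM process,
     * or null when the platform bean does not expose these counters.
     */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = openAndMax();
        if (usage == null) {
            System.out.println("File descriptor counters are not available on this JVM/OS");
        } else {
            System.out.println("open fds = " + usage[0] + ", limit = " + usage[1]);
        }
    }
}
```

Logging these two numbers periodically from inside the server shows how close it is to "Too many open files" without relying on netstat.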




赵普明

Nov 29, 2012, 10:11:30 PM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 10:55:41 AM UTC+8, Stream.liu wrote:

> Check your ulimit settings if your project runs on Linux.


core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1589248
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1589248
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


The open-file limit is already 800000. The problem is that our server keeps too many connections in the ESTABLISHED state, and that number has been increasing steadily this morning as the traffic goes up (we are in China and it's now 11 am).

I guess we somehow fail to close the connection after the response is sent back to the browser.

I don't know how to do that. There is a request.response.close() method, but I can't call it because I don't know when the response has been sent. According to the Javadoc, response.closeHandler() is called when the connection is closed BEFORE the response is sent, so that would not help. I really wish there were a response.endHandler() so that I could call response.close() at the right time. Or it would be even better if vert.x handled this automatically.

I think the problem is related to the "connection reset by peer" exceptions we've been receiving, which is weird.


赵普明

Nov 29, 2012, 10:21:57 PM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 12:22:31 AM UTC+8, Tim Fox wrote:
> On 29/11/12 14:47, 赵普明 wrote:
>> Update: I've configured to server.setTCPKeepAlive(false),
>
> TCP keep alive is not related to HTTP keep alive

Thanks for the tip. I was confusing the two. I'll do a little digging to learn about them.

For our scenario (typically a massive number of small, short-lived connections), what we want is this:

1. A browser request comes in and establishes a connection
2. About 100ms later the response is sent back
3. We want this connection to be released immediately, or reused (possible?) by other browser requests

If this is done, and we have 10000 requests per minute, we expect several hundred to a thousand connections alive (ESTABLISHED) at any given instant.

Can we configure vert.x to do that? Or can we close a connection manually after a response is complete? response.close() is not enough here because I can't tell WHEN the response is complete. There is no endHandler() in HttpServerResponse.




stream

Nov 29, 2012, 10:26:49 PM
to ve...@googlegroups.com

The close method in HttpServerResponse closes the TCP connection.
So I suggest you set KeepAlive to false.

I have looked into the code of DefaultHttpServerResponse, where line 193 closes the channel:


public void end() {

    checkWritten();
    writeHead();
    if (chunked) {
      if (trailers == null) {
        HttpChunk nettyChunk = new DefaultHttpChunk(ChannelBuffers.EMPTY_BUFFER);
        channelFuture = conn.write(nettyChunk);
      } else {
        DefaultHttpChunkTrailer trlrs = new DefaultHttpChunkTrailer();
        for (Map.Entry<String, Object> trailer: trailers.entrySet()) {
          trlrs.addHeader(trailer.getKey(), trailer.getValue());
        }
        channelFuture = conn.write(trlrs);
      }
    }

    if (!keepAlive) {
      closeConnAfterWrite();
    }

    written = true;
    conn.responseComplete();
  }



You can write a small demo to test it.




stream

Nov 29, 2012, 10:47:26 PM
to ve...@googlegroups.com

Hi, your scenario is like mine.
I have not tested our project yet.
I don't set TCP KeepAlive; instead I set the HTTP header [Connection: close], which you can use if you want the connection closed after responding to browsers.





赵普明

Nov 29, 2012, 11:25:56 PM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 11:26:49 AM UTC+8, Stream.liu wrote:
Thanks for the tip, bro. This is what we expected from end().
It means that closeConnAfterWrite() was somehow not (successfully) called, since we saw a worse outcome with server.setTCPKeepAlive(false). I'll check that later.


Tim Fox

Nov 30, 2012, 3:50:57 AM
to ve...@googlegroups.com
Are most of the connections in TIME_WAIT state?

If so, it would be normal to see a lot of them.

The way TCP works is that even after you've closed TCP connections the OS will keep them open a while longer (default is 2 minutes) to catch any stray packets that might arrive.

Setting TCP reuse address allows the server to reuse one of these addresses - which is why it probably helps you.

Also you can reduce the timeout at the OS level.

So... maths time. If you are getting 8000 connections / minute, then you should expect to see an average of 16000 connections at steady state.
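
The arithmetic here is Little's law: connections held at steady state = arrival rate × time each connection is held. A small sketch using the figures from this thread (8000 connections/minute, a 2 minute TIME_WAIT, and the roughly 55000 ESTABLISHED connections reported earlier); the class and method names are only illustrative:

```java
class SteadyState {
    /** Little's law: average number held = arrival rate * average hold time. */
    static double expectedConnections(double arrivalsPerMinute, double holdMinutes) {
        return arrivalsPerMinute * holdMinutes;
    }

    /** The same law solved for hold time, given an observed connection count. */
    static double impliedHoldMinutes(double observedConnections, double arrivalsPerMinute) {
        return observedConnections / arrivalsPerMinute;
    }

    public static void main(String[] args) {
        // 8000 connections/minute, each lingering 2 minutes in TIME_WAIT.
        System.out.println(expectedConnections(8000, 2));      // prints 16000.0
        // If 55000 connections are held at 8000/minute, each one is being
        // held for almost 7 minutes on average.
        System.out.println(impliedHoldMinutes(55000, 8000));   // prints 6.875
    }
}
```

Read the other way round, a count of 55000 at this request rate would mean connections are being held for minutes rather than the ~100 ms a response takes.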

The first thing I would do is add some logging to make sure that Vert.x is _actually_ closing the connection - just log out the call to channel.close in Vert.x and keep a count of connections in an AtomicLong (or whatever).

Once you've verified that Vert.x is actually closing connections properly, then it's probably just a matter of configuring your OS appropriately.

Also... as I mentioned before, you should also increase your accept backlog syn queue as specified in the performance chapter, if you haven't done so already or you might get refused connections at peak.

Regarding the "connection reset by peer" exceptions on the server - this is normal - you get this when the other side of the connection (in this case the browser) closes it.



赵普明

Nov 30, 2012, 4:47:39 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 4:50:57 PM UTC+8, Tim Fox wrote:
> Are most of the connections in TIME_WAIT state?

No, most of them are ESTABLISHED. That leads me to conclude that somehow some connections are not being closed properly.

> If so, it would be normal to see a lot of them.
>
> The way TCP works is that even after you've closed TCP connections the OS will keep them open a while longer (default is 2 minutes) to catch any stray packets that might arrive.
>
> Setting TCP reuse address allows the server to reuse one of these addresses - which is why it probably helps you.
>
> Also you can reduce the timeout at the OS level.
>
> So... maths time. If you are getting 8000 connections / minute, then you should expect to see an average of 16000 connections at steady state.
>
> The first thing I would do is add some logging to make sure that Vert.x is _actually_ closing the connection - just log out the call to channel.close in Vert.x and keep a count of connections in an AtomicLong (or whatever).

Thanks for the tip. I'll check this number. Because calls to response.end() are scattered around the code, I can't be sure that EVERY request is eventually paired with a req.response.end(). I think I need an additional timer to check this...

> Once you've verified that Vert.x is actually closing connections properly, then it's probably just a matter of configuring your OS appropriately.
>
> Also... as I mentioned before, you should also increase your accept backlog syn queue as specified in the performance chapter, if you haven't done so already or you might get refused connections at peak.

Yes, we have this number set to 100000.

> Regarding the "connection reset by peer" exceptions on the server - this is normal - you get this when the other side of the connection (in this case the browser) closes it.

In our simulated test there were no such exceptions. Now it seems to me that they may occur because not all of our requests get a response, and a browser that receives no response would then reset the connection.
 

Tim Fox

Nov 30, 2012, 4:50:35 AM
to ve...@googlegroups.com
On 30/11/12 09:47, 赵普明 wrote:
> On Friday, November 30, 2012 at 4:50:57 PM UTC+8, Tim Fox wrote:
>
> Are most of the connections in TIME_WAIT state?
>
>
> No, most of them are ESTABLISHED. And that leads me to conclude
> somehow there are connections not properly closed.

Ok that would be easy to verify - just count how many connections are
being closed in the code.

Tim Fox

Nov 30, 2012, 4:51:20 AM
to ve...@googlegroups.com
On 30/11/12 09:47, 赵普明 wrote:
> On Friday, November 30, 2012 at 4:50:57 PM UTC+8, Tim Fox wrote:
>
> Are most of the connections in TIME_WAIT state?
>
>
> No, most of them are ESTABLISHED. And that leads me to conclude
> somehow there are connections not properly closed.

Also I'd like to see your server side code - hard to tell if there is a
code issue or not without looking at it.

Tim Fox

Nov 30, 2012, 4:51:54 AM
to ve...@googlegroups.com
On 30/11/12 09:47, 赵普明 wrote:
> On Friday, November 30, 2012 at 4:50:57 PM UTC+8, Tim Fox wrote:
>
> Are most of the connections in TIME_WAIT state?
>
>
> No, most of them are ESTABLISHED.

That's odd, if they are ESTABLISHED then setting reuse address shouldn't
help you at all.

赵普明

Nov 30, 2012, 5:18:49 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 4:50:57 PM UTC+8, Tim Fox wrote:
> Are most of the connections in TIME_WAIT state?
>
> The first thing I would do is add some logging to make sure that Vert.x is _actually_ closing the connection - just log out the call to channel.close in Vert.x and keep a count of connections in an AtomicLong (or whatever).

As this counting would involve modifying the Vert.x code, I wondered: why don't we have a way to make sure that a response is actually written?

My code seems to pass the request object everywhere, deep into the callback chains, and after a little re-reading I found that I can't be sure that req.response.end() is called in ALL possible code branches.

Now I really want a way to check that. For example, it would be great if I could set a timer on the request and check the written status of the response. That could become a request.setTimeout() method.

This would be easier to track than logging and counting, IMHO. Do you think it is OK to add this functionality to vert.x?


Tim Fox

Nov 30, 2012, 5:25:18 AM
to ve...@googlegroups.com
We could consider adding a timeout to the Vert.x API, but this kind of
thing would be easy to add in your own code.

I suggest putting the checks in your own code for now, then open a
github issue to have a feature added once you've fixed your issue.

For now, I recommend keeping a counter in your code that gets
incremented when a request arrives in the request handler, and gets
decremented every time response.end gets called (search in your code for
this).

Then have a vert.x periodic timer that logs out the number of
connections every few seconds.

You can wrap this up easily in a small utility class.
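
A minimal sketch of what such a utility class might look like. It is plain Java rather than Vert.x code: the InFlightCounter name and its methods are made up for illustration, and a ScheduledExecutorService stands in for the Vert.x periodic timer so that the example is self-contained.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class InFlightCounter {
    // Requests that have arrived but whose response has not been ended yet.
    private final AtomicLong inFlight = new AtomicLong();

    /** Call at the top of the request handler. */
    void requestArrived() {
        inFlight.incrementAndGet();
    }

    /** Call wherever the code ends a response. */
    void responseEnded() {
        inFlight.decrementAndGet();
    }

    long inFlight() {
        return inFlight.get();
    }

    /** Logs the current count every periodSeconds; shut the returned timer down on exit. */
    ScheduledExecutorService startLogging(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
            () -> System.out.println("requests awaiting response.end: " + inFlight()),
            periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

If the logged number keeps growing while traffic is flat, some code path is not ending its responses; if it stays near zero while ESTABLISHED connections pile up, the leak is somewhere else.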

赵普明

Nov 30, 2012, 5:25:21 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 5:51:54 PM UTC+8, Tim Fox wrote:
> On 30/11/12 09:47, 赵普明 wrote:
>> On Friday, November 30, 2012 at 4:50:57 PM UTC+8, Tim Fox wrote:
>>> Are most of the connections in TIME_WAIT state?
>>
>> No, most of them are ESTABLISHED.
>
> That's odd, if they are ESTABLISHED then setting reuse address shouldn't
> help you at all.

You're right, reuse address didn't help. The situation has not actually changed. When traffic goes up, the connections still pile up.

P.S. Unfortunately I can't show you the code because that might violate our company's policy :-(

Now I'm guessing it's highly probable that I missed a req.response.end() on some logic branch.

I'll check that and report back later :-)

赵普明

Nov 30, 2012, 6:17:18 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 6:25:18 PM UTC+8, Tim Fox wrote:
> We could consider adding a timeout to the Vert.x API, but this kind of
> thing would be easy to add in your own code.

req.response.written and req.response.closed are not public.

> I suggest putting the checks in your own code for now, then open a
> github issue to have a feature added once you've fixed your issue.
>
> For now, I recommend keeping a counter in your code that gets
> incremented when a request arrives in the request handler, and gets
> decremented every time response.end gets called (search in your code for
> this).
>
> Then have a vert.x periodic timer that logs out the number of
> connections every few seconds.

I've just done that, but it seems my code covers nearly all requests. The difference between the counts is almost nil (-10, and constant).

I also counted the number of "Reset by peer" exceptions, and that number seems to match the number of excess ESTABLISHED connections.

I read the Vert.x code, and it seems to call conn.close() when that exception is raised.

There are roughly 25 exceptions per 1000 requests, and that is close to the rate at which we pile up several hundred connections per minute.

But I'm not sure. Do you have any idea about this?

Can you confirm that response.end() will move a connection from ESTABLISHED to another state (i.e. close it)?

> You can wrap this up easily in a small utility class.

赵普明

Nov 30, 2012, 6:34:06 AM
to ve...@googlegroups.com



Update: after more tests:

"Reset by peer" rate is around 3%  of request numbers

Excess ESTABLISHED connections are roughly 1%~1.5%

Request - Response count difference is nearly none: about  10 in 100000


Tim Fox

Nov 30, 2012, 7:24:56 AM
to ve...@googlegroups.com
Can you also check with master?

I just want to be sure you're not running into something that's already been fixed.


Tim Fox

Nov 30, 2012, 7:27:30 AM
to ve...@googlegroups.com
Also can I request again an example of the logs containing the "connection reset by peer" exceptions?



赵普明

Nov 30, 2012, 7:32:58 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 8:24:56 PM UTC+8, Tim Fox wrote:
> Can you also check with master?
>
> I just want to be sure you're not running into something that's already been fixed.

I'm at home now and don't have an environment for building master here. I'll do it later :-)

Update: I added the following code to the start of my handle() method:

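vertx.setTimer(2000L, new Handler<Long>() {
    @Override
    public void handle(Long event) {
        try {
            // After 2 seconds, end the response in case no code path has done so.
            req.response.end();
        } catch (Exception e) {
            // Ignored: presumably the response had already been ended.
        }
    }
});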


That should make sure every request gets a response.

But the situation is still the same.
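
The timer above relies on end() throwing when the response has already been ended. A variant that tracks this explicitly might look like the sketch below. It is plain Java, not Vert.x API: EndOnceGuard is a made-up name, the Runnable stands in for req.response.end(), and a ScheduledExecutorService stands in for the Vert.x timer.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class EndOnceGuard {
    private final AtomicBoolean ended = new AtomicBoolean(false);
    private final Runnable endAction; // stands in for req.response.end()

    EndOnceGuard(Runnable endAction) {
        this.endAction = endAction;
    }

    /** Runs the end action on the first call only; returns false on later calls. */
    boolean end() {
        if (ended.compareAndSet(false, true)) {
            endAction.run();
            return true;
        }
        return false;
    }

    /** Ends the response after delayMillis unless something else already has. */
    void armTimeout(ScheduledExecutorService timer, long delayMillis) {
        timer.schedule(() -> {
            if (end()) {
                System.err.println("response ended by timeout, not by handler code");
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        EndOnceGuard guard = new EndOnceGuard(() -> System.out.println("end() called"));
        guard.armTimeout(timer, 100);
        timer.shutdown(); // the already scheduled task still runs
        timer.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Compared with swallowing the exception, this also logs which requests were only ended by the timeout, which helps find the code branch that forgot to end the response.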


 

赵普明

Nov 30, 2012, 7:49:57 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 8:27:30 PM UTC+8, Tim Fox wrote:
> Also can I request again an example of the logs containing the "connection reset by peer" exceptions?


java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
    at sun.nio.ch.IOUtil.read(IOUtil.java:186)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:59)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)


 

Tim Fox

Nov 30, 2012, 8:01:35 AM
to ve...@googlegroups.com
I've finally managed to reproduce the "connection reset by peer" exception.

To do this the browser needs to be Internet Explorer - it seems if IE
creates a connection to the server (keep alive), then the user shuts
down IE without closing tabs first then IE does not properly terminate
the connections - resulting in the exceptions on the server.

I will look into this more later on today.



Tim Fox

Nov 30, 2012, 8:10:56 AM
to ve...@googlegroups.com
What's very odd is you appear to have connections still established
*after* the client has closed them abruptly. This doesn't make a lot of
sense to me - if the connection is closed then you won't see it on the
list of established connections.

I've verified this locally - if I cause connection reset by peer on a
simple Vert.x Http Server example by browsing to it using IE then
closing IE with a tab open, I do get "connection reset by peer", but the
connection is closed (as expected).

Can you verify on netstat that the established connections are really to
Vert.x, not to some other server?

赵普明

Nov 30, 2012, 8:16:27 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 9:01:35 PM UTC+8, Tim Fox wrote:
I've finally managed to reproduce the "connection reset by peer" exception.

To do this the browser needs to be Internet Explorer - it seems if IE
creates a connection to the server (keep alive), then the user shuts
down IE without closing tabs first then IE does not properly terminate
the connections - resulting in the exceptions on the server.

I will look into this more later on today.


Thanks very much :-)

I've reduced my code to a very basic form:

    private static AtomicLong reqCount = new AtomicLong();
    private static AtomicLong s24Count = new AtomicLong();
    private static AtomicLong s25Count = new AtomicLong();
    private static AtomicLong badCount = new AtomicLong();

    @Override
    public void handle(final HttpServerRequest req) {

        if (reqCount.incrementAndGet() % 100 == 0) {
            logger.info("ReqCount:" + reqCount.get());
        }

        String s24 = "...s24code....";
        String s25 = "...s25code....";

        String pid = req.params().get("pid");
        req.response.headers().put("Connection", "close");

        if ("24".equals(pid)) {
            if (s24Count.incrementAndGet() % 100 == 0) {
                logger.info("S24Count:" + s24Count.get());
            }
            req.response.end(s24);
        } else if ("25".equals(pid)) {
            if (s25Count.incrementAndGet() % 100 == 0) {
                logger.info("S25Count:" + s25Count.get());
            }
            req.response.end(s25);
        } else {
            if (badCount.incrementAndGet() % 100 == 0) {
                logger.info("badCount:" + badCount.get());
            }
            req.response.statusCode = 400;
            req.response.end();
        }
    }

We serve one of two strings for pid=24 or pid=25 and return status code 400 otherwise.

Now the ESTABLISHED connections are increasing very fast, much faster than with the old code (which read the pid, did a lot of work, and then returned those strings).

I have no idea why ...



赵普明

Nov 30, 2012, 8:19:19 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 9:10:56 PM UTC+8, Tim Fox wrote:
What's very odd is you appear to have connections still established
*after* the client has closed them abruptly. This doesn't make a lot of
sense to me - if the connection is closed then you won't see it on the
list of established connections.

I've verified this locally - if I cause connection reset by peer on a
simple Vert.x Http Server example by browsing to it using IE then
closing IE with a tab open, I do get "connection reset by peer", but the
connection is closed (as expected).

Can you verify on netstat that the established connections are really to
Vert.x, not to some other server?

Yes, they are all on port 80, which is served by Vert.x, and they only increase when I start the Vert.x server.

So you're suggesting it may be due to our server's network settings? I'll run some other tests on that server to check.
 

Tim Fox

Nov 30, 2012, 8:30:48 AM
to ve...@googlegroups.com
Can you try a simple test

1. Run your server
2. Start IE, and point it at your advert page
3. Use netstat to verify you have an established connection
4. Close IE (or terminate it using task manager)
5. Verify you get a "connection reset by peer" on the server.
6. Use netstat again to see if you still have an established connection

赵普明

Nov 30, 2012, 8:39:05 AM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 9:30:48 PM UTC+8, Tim Fox wrote:
Can you try a simple test

1. Run your server
2. Start IE, and point it at your advert page
3. Use netstat to verify you have an established connection
4. Close IE (or terminate it using task manager)
5. Verify you get a "connection reset by peer" on the server.
6. Use netstat again to see if you still have an established connection

I don't have IE with me right now as I'm on Linux... I can run that test tomorrow.
 

Tim Fox

Nov 30, 2012, 10:47:45 AM
to ve...@googlegroups.com


On Thursday, November 29, 2012 1:26:26 PM UTC, 赵普明 wrote:
3. Does this have anything to do with keep-alive? Is a connection in vert.x Keep-alive by default?

Whether a connection is keep alive or not is determined by what the browser sends.

If the browser sends an HTTP 1.0 request with the header Connection:Keep-Alive then it's keep alive; if it's an HTTP 1.1 request it's keep alive by default.
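
As a rough illustration of that rule, here is a minimal sketch in plain Java. It is not Vert.x code: the class and method names are made up for this example, and the "Connection: close" opt-out for HTTP 1.1 is taken from the HTTP specification rather than from this thread.

```java
final class KeepAliveRule {

    /**
     * Decides whether a request asks for a persistent connection.
     * httpVersion is e.g. "HTTP/1.0" or "HTTP/1.1"; connectionHeader is the
     * value of the request's Connection header, or null if it is absent.
     */
    static boolean isKeepAlive(String httpVersion, String connectionHeader) {
        String token = connectionHeader == null ? "" : connectionHeader.trim();
        if ("HTTP/1.0".equals(httpVersion)) {
            // HTTP 1.0: keep alive only when the client asks for it.
            return token.equalsIgnoreCase("keep-alive");
        }
        // HTTP 1.1: keep alive by default, unless the client sends "close".
        return !token.equalsIgnoreCase("close");
    }

    public static void main(String[] args) {
        System.out.println(isKeepAlive("HTTP/1.0", null));          // false
        System.out.println(isKeepAlive("HTTP/1.0", "Keep-Alive"));  // true
        System.out.println(isKeepAlive("HTTP/1.1", null));          // true
        System.out.println(isKeepAlive("HTTP/1.1", "close"));       // false
    }
}
```

Most current browsers speak HTTP 1.1, so under this rule nearly every ad request arrives on a connection that stays open unless one side closes it.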
 
if it is, does that mean we need to manually close a connection after sending a response?

A keep alive connection doesn't automatically get closed after sending a response - otherwise it would negate the point of keep alive (which is browsers reusing the connection to send further requests)!

The connection will remain open until the client closes it (e.g. they close their tab or browser, or maybe the browser decides to close it for some other reason), or you close it.

If you want the connection to remain open you should close it immediately after sending your response, or perhaps set a timer to close it after a timeout.

Tim Fox

Nov 30, 2012, 10:49:08 AM
to ve...@googlegroups.com
Meant to say: "If you DON'T want the connection to remain open you
should close it immediately after sending your response, or perhaps set
a timer to close it after a timeout."
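
To show what "close it immediately after sending your response" means at the socket level, here is a small self-contained sketch that uses only java.net. It is not the Vert.x API, and the class and method names are hypothetical. The server reads one request, writes one response, and closes the socket; the client then sees end-of-stream instead of an idle open connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class CloseAfterResponse {

    /** Accepts one connection, answers one request, then closes the socket. */
    static void serveOnce(ServerSocket server, String body) throws IOException {
        try (Socket client = server.accept()) {
            // Consume the request headers up to the blank line (\r\n\r\n).
            InputStream in = client.getInputStream();
            int matched = 0;
            int b;
            while (matched < 4 && (b = in.read()) != -1) {
                char expected = (matched % 2 == 0) ? '\r' : '\n';
                matched = (b == expected) ? matched + 1 : (b == '\r' ? 1 : 0);
            }
            byte[] payload = body.getBytes(StandardCharsets.US_ASCII);
            String head = "HTTP/1.1 200 OK\r\n"
                    + "Content-Length: " + payload.length + "\r\n"
                    + "Connection: close\r\n\r\n";
            OutputStream out = client.getOutputStream();
            out.write(head.getBytes(StandardCharsets.US_ASCII));
            out.write(payload);
            out.flush();
        } // try-with-resources closes the socket right after the response
    }

    /** Sends one GET and reads until the server closes the connection. */
    static String fetch(int port) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            OutputStream out = socket.getOutputStream();
            out.write("GET /?pid=24 HTTP/1.1\r\nHost: localhost\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            InputStream in = socket.getInputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) { // -1 means the server closed
                received.write(chunk, 0, n);
            }
            return new String(received.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** Runs server and client on loopback and returns what the client read. */
    static String demo(String body) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try {
                    serveOnce(server, body);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();
            String response = fetch(server.getLocalPort());
            serverThread.join();
            return response;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("...s24code...."));
    }
}
```

The client's read loop only terminates because the server closes; with a keep-alive connection that neither side closes, the same loop would block and the socket would stay ESTABLISHED.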

赵普明

Nov 30, 2012, 11:39:13 PM
to ve...@googlegroups.com


On Friday, November 30, 2012 at 11:47:45 PM UTC+8, Tim Fox wrote:



The connection will remain open until the client closes it (e.g. they close their tab or browser, or maybe the browser decides to close it for some other reason), or you close it.

Now I understand what you mean. Here is a recap of our situation:

First scenario:

  We don't send Connection:close, and we don't manually close the connection.

This is what we did at first, but it leaves many established connections.


Second scenario:

  We send Connection:close, and we don't manually close the connection.

I thought that telling the browser it's time to close would be enough. I found that the browsers do send a FIN, but the server doesn't respond well:

From tcpdump I found that the browser seems to send [F] before the server has fully sent the response, so the connection is not actually closed.



Third scenario:

  We send Connection:close, and we manually close the connection with a timer:

    vertx.setTimer(2000L, new Handler<Long>() {
        @Override
        public void handle(Long event) {
            req.response.close();
        }
    });

Because the second scenario didn't work, we added a timer to manually close the connection.

But it seems that because the browser had already sent a FIN (maybe that changed the state of the response object), the close() in the timer didn't do anything. It didn't send a FIN 2000 ms later.

So the situation is still the same: calling req.response.close() in this scenario does nothing.


Fourth scenario:

We don't send Connection:close, but we set a timeout and manually close the connection.

This time the close() in the timer works and sends a FIN, and in my local test it works fine.

But when I uploaded it to the server, two things happened:

1. a lot of FIN_WAIT connections, which seems legitimate
2. still some ESTABLISHED connections piling up......

I can't explain the second one....


Which scenario do you think is right for our situation?



Tim Fox

Dec 1, 2012, 4:26:42 AM
to ve...@googlegroups.com
I think if you don't want the connection to remain open, you should close it (response.close) immediately after sending the response.


Tim Fox

Dec 1, 2012, 4:48:33 AM
to ve...@googlegroups.com
On 01/12/2012 04:39, 赵普明 wrote:
> First scenario:
>   We don't send Connection:close,

What do you mean by 'send Connection:close'? You mean set the response header?

I wouldn't bother with that. Just make sure you call response.close() everywhere in the code right after you call response.end().

Tim Fox

Dec 1, 2012, 4:58:32 AM
to ve...@googlegroups.com
On 01/12/2012 04:39, 赵普明 wrote:
> 1. a lot of FIN_WAIT connections, which seems legit
> 2. still some ESTABLISHED connections piling up......

This makes sense since you're not closing the connection immediately, you're closing it on a timeout.



赵普明

Dec 1, 2012, 5:46:04 AM
to ve...@googlegroups.com


On Saturday, December 1, 2012 at 5:48:33 PM UTC+8, Tim Fox wrote:

Yes, the header "Connection:close"; that would tell the browser to send a FIN to the server.

 
I wouldn't bother with that. Just make sure you call response.close() after everywhere in the code where you call response.end()

Yes, I did that. Actually, what I'm doing right now is calling response.end() immediately AND calling it in a timer to make sure.

But after some hours of running, the number of hanging ESTABLISHED connections is still increasing, at a slower pace (5000 ESTABLISHED connections for 2,000,000 requests) but steadily.

I've looked at the Vert.x and Netty code a bit and found that AbstractChannel.close() is implemented by sending an event down the pipeline:

    public static ChannelFuture close(Channel channel) {
        ChannelFuture future = channel.getCloseFuture();
        channel.getPipeline().sendDownstream(new DownstreamChannelStateEvent(
                channel, future, ChannelState.OPEN, Boolean.FALSE));
        return future;
    }

and letting the last handler deal with it.

Do you know of a direct way to close the connection behind a response? I just want to make sure that those connections are closed.
How can we make sure that every call to response.close() actually closes the connection?

In lighttpd one can configure the maximum number of live connections; it seems lighttpd tracks the age of each connection and periodically closes the older ones.

Does Vert.x or Netty have this facility?



 

Tim Fox

Dec 1, 2012, 7:40:15 AM
to ve...@googlegroups.com
Can you try one thing? Please hack the source of
DefaultHttpServerResponse.java in master so that the close() method
looks like this:

    public void close() {
        if (!closed) {
            conn.close();
            closed = true;
        }
    }

And tell me if it makes a difference.


赵普明

Dec 1, 2012, 10:51:44 AM
to ve...@googlegroups.com


The connections are still increasing, but apparently at a slower pace (we now have ~3000 connections for 8,000,000 requests).

Tim Fox

Dec 1, 2012, 3:35:35 PM
to ve...@googlegroups.com

Well, I have no idea then.

conn.close() just closes the Netty channel. I would be _extremely_ surprised if that didn't cause the connection to close, if it didn't it would be a major bug in Netty, and I have never heard of anything like this before.

I suspect there is something else going on in your code or environment, but without seeing your code it's very hard to tell.

Have you considered the possibility that 3000 connections might be normal? Even if you are quickly closing each connection you're always going to have a certain number open at any one time.
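
One way to sanity-check what "normal" would look like is Little's law: average open connections ≈ arrival rate × time each connection stays open. Below is a small sketch, assuming one connection per request and using figures mentioned earlier in this thread (about 10,000 page views per minute, 100~200 ms response time, a 2000 ms close timer). The class and method names are only for illustration.

```java
final class OpenConnectionEstimate {

    /** Little's law: average open connections = arrival rate * time open. */
    static double expectedOpen(double requestsPerMinute, double secondsOpen) {
        return requestsPerMinute / 60.0 * secondsOpen;
    }

    public static void main(String[] args) {
        // Connection closed right after a 200 ms response.
        System.out.printf("closed after response: ~%.0f%n", expectedOpen(10_000, 0.2));
        // Connection closed by a 2 s timer.
        System.out.printf("closed by 2 s timer:   ~%.0f%n", expectedOpen(10_000, 2.0));
    }
}
```

Under these assumptions the steady state is roughly 33 and 333 open connections respectively. Real traffic can be higher (several requests per page view, slow clients), so this is only an order-of-magnitude check, but a count that keeps growing would not be explained by load alone.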


Tim Fox

Dec 2, 2012, 5:02:49 AM
to ve...@googlegroups.com
There's no need to call it in a timer if you are closing it (with the
hack I suggested).

If you call it in a timer too, then your timer handler will prevent
connection / response related resources from being GC'd until the timer
fires - this will increase the memory required for your application.

赵普明

Dec 2, 2012, 10:12:15 PM
to ve...@googlegroups.com


On Sunday, December 2, 2012 at 4:35:35 AM UTC+8, Tim Fox wrote:
Thanks for the clarification. I'll make a test with a simple Netty application to confirm this.


I suspect there is something else going on in your code or environment, but without seeing your code it's very hard to tell.

Have you considered the possibility that 3000 connections might be normal? Even if you are quickly closing each connection you're always going to have a certain number open at any one time.


Yes, we expect a stable number of open connections, around a few thousand.

But after two days of running, the number of ESTABLISHED connections has reached 60000 and is still increasing. So there must be some connections that are not properly closed...

So the last resort would be a pool that manages ALL connections and periodically clears out older ones to keep the count bounded.

This would also be useful for keep-alive connections: in many situations the number of keep-alive connections becomes too high when you have millions or even billions of requests coming in per day.

That's why lighttpd has max-keep-alive-connections and max-keep-alive-idle settings and manages all connections in a pool.

Does Netty have this facility? If not, should it be added to Vert.x? I may have to add it to our server in order to survive.

stream

unread,
Dec 2, 2012, 11:08:14 PM12/2/12
to ve...@googlegroups.com
I don't recommend using the Vert.x HttpServer directly as your web server.
We use nginx as the web server and load balancer.
nginx connects to several Java web servers such as Jetty or Tomcat; here you could use the Vert.x HttpServer.

Nginx will make an HTTP connection to your backend server over HTTP 1.0 or HTTP 1.1.

I hope this option helps you.



赵普明

unread,
Dec 2, 2012, 11:35:16 PM12/2/12
to ve...@googlegroups.com


On Monday, December 3, 2012 12:08:14 PM UTC+8, Stream.liu wrote:
I don't recommend using the Vert.x HttpServer directly as your web server.
We use nginx as the web server and load balancer.
nginx connects to several Java web servers such as Jetty or Tomcat; here you could use the Vert.x HttpServer.


Yes, we have this as a backup plan. 

But we thought vert.x could handle these situations well even without an nginx in front. Theoretically, connections should not be a hard problem to solve.

Otherwise we might not have chosen vert.x/netty in the first place. Our company has been using lighttpd as its server, and earlier this year it was decided that we should invest more in JVM technologies; using vert.x is one of those moves. Unfortunately I was not very experienced with highly concurrent servers (I was a website developer). I assumed that vert.x/netty would handle everything as well as lighttpd does, and did not foresee this problem. As you may have noticed, I didn't know many details of the HTTP/TCP connection process.

stream

unread,
Dec 3, 2012, 1:02:56 AM12/3/12
to ve...@googlegroups.com

On 2012-12-3, at 12:35 PM, 赵普明 <zhaop...@gmail.com> wrote:

Yes, we have this as a backup plan. 

But we thought vert.x could handle these situations well even without an nginx in front. Theoretically, connections should not be a hard problem to solve.

Otherwise we might not have chosen vert.x/netty in the first place. Our company has been using lighttpd as its server, and earlier this year it was decided that we should invest more in JVM technologies; using vert.x is one of those moves. Unfortunately I was not very experienced with highly concurrent servers (I was a website developer). I assumed that vert.x/netty would handle everything as well as lighttpd does, and did not foresee this problem. As you may have noticed, I didn't know many details of the HTTP/TCP connection process.

We need to find out whether the problem is caused by Netty or by your environment,
so I think you could write a small HTTP demo built on plain Netty rather than vert.x.
Then run your client test against another JVM HTTP server such as Jetty.

I'd really like to know the result ^ ^



Tim Fox

unread,
Dec 3, 2012, 3:15:49 AM12/3/12
to ve...@googlegroups.com
On 03/12/12 03:12, 赵普明 wrote:
Looks like it.

>
> So the final refugee would be a pool that manage ALL connections, and
> periodical clear older connections in order to stay fit.

I don't see how this would help you. You've already said you're calling
response.close() after every response.end(). response.close() simply
closes the Netty channel.

So, if you really are 100% sure you are calling response.close for every
response.end, the logical implication is that Netty isn't always closing
connections.

If you implemented a pool with a periodical closer how else are you
going to close connections than closing the Netty channel? If so, you
will have the same issue.

If the issue disappears after implementing a pool it implies there was
nothing wrong with Netty but there must have been somewhere in your code
where you weren't calling response.close right after response.end.

Let's look at the logical possibilities here:

1) Problem with Netty. channel.close() isn't always closing the channel.
Personally I would be surprised if this is the problem. But it's a
possibility
2) Problem in Vert.x code. Well - you've already told me that you are
100% sure you are calling response.close() immediately after calling
response.end() for every response. If you look at the code path between
response.close() and the netty channel.close() there is nothing in
between (after the hack I suggested). Effectively response.close()
directly calls channel.close(). So no Vert.x code to go wrong. So we
must discount this possibility.
3) There is some code path in your application where you haven't added a
call to response.close() immediately after response.end()
4) response.end() is throwing an exception sometimes for some reason,
and you're not calling response.close() in a finally block so it doesn't
get called.

I would concentrate your efforts on 3) and 4)
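
Possibility 4) can be sketched roughly as follows. This is only an illustration and not Vert.x code: `Response` is a hypothetical stand-in interface exposing just `end()` and `close()`, mirroring the two calls discussed in this thread, and `RecordingResponse` is a made-up test double. The point is simply that `close()` sits in a `finally` block, so it still runs when `end()` throws.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseInFinally {
    /** Hypothetical stand-in for the response object; not the Vert.x type. */
    interface Response {
        void end();
        void close();
    }

    /** Ends the response and always closes the connection, even if end() throws. */
    static void endAndClose(Response response) {
        try {
            response.end();
        } finally {
            response.close();
        }
    }

    /** Test double that records calls and can be told to fail in end(). */
    static class RecordingResponse implements Response {
        final List<String> calls = new ArrayList<>();
        final boolean failOnEnd;

        RecordingResponse(boolean failOnEnd) {
            this.failOnEnd = failOnEnd;
        }

        @Override
        public void end() {
            calls.add("end");
            if (failOnEnd) {
                throw new IllegalStateException("simulated failure in end()");
            }
        }

        @Override
        public void close() {
            calls.add("close");
        }
    }

    public static void main(String[] args) {
        RecordingResponse ok = new RecordingResponse(false);
        endAndClose(ok);
        System.out.println(ok.calls);

        RecordingResponse failing = new RecordingResponse(true);
        try {
            endAndClose(failing);
        } catch (IllegalStateException e) {
            // The exception still propagates, but close() has already run.
        }
        System.out.println(failing.calls);
    }
}
```

In both cases the recorded calls are `end` followed by `close`. Without the `finally`, the failing case would record only `end` and the connection would stay open.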


>
> This would also be useful for keep-alive connections, as in many
> situations, keep-alive connections would turn up to be too high when
> you have multi-millions or even billions of requests per day coming in.
>
> That's why lighttpd has max-keep-alive-connections, and
> max-keep-alive-idle settings and manages all connections in a pool.
>
> Do netty has this facility? If not can do we need to have that in
> vert.x? Maybe I'll have to add that to our server in order to survive.

Again, how is that going to help you? If you're leaking connections then
they're just going to build up to a point where your server won't accept
any more and then its

Tim Fox

unread,
Dec 3, 2012, 3:20:38 AM12/3/12
to ve...@googlegroups.com
On 03/12/12 04:35, 赵普明 wrote:
>
>
> On Monday, December 3, 2012 12:08:14 PM UTC+8, Stream.liu wrote:
>
> i don't suggest that take vertx webHttpServer as a WebServer.
> we get a nginx as WebServer and LoadBalance.
> nginx connect to the several WebServer of Java such as Jetty
> Tomcat, here you can use VertxHttpServer.
>
>
> Yes, we have this as a backup plan.
>
> But we thought vert.x could handle with these situations well even
> without an nginx in front. Theoretically connections should not be a
> hard problem to solve.
>
> Otherwise we might not have chosen to use vert.x/netty in the first
> place. Our company have been using lighttpd as server, and earlier
> this year it was decided that we should invest more in JVM
> technologies, and using vert.x is one of the moves. Unfortunately I
> was not quite experienced with highly concurrent servers (I was a
> Website dev), thinking that vert.x/netty should handle everything as
> well as ligttpd, did not foresee this problem. As you may have
> noticed, I didn't know much details of the HTTP/TCP connection process.

With any project: always test in a staging environment before going live!

Tim Fox

unread,
Dec 3, 2012, 3:21:26 AM12/3/12
to ve...@googlegroups.com
On 03/12/12 04:08, stream wrote:
> i don't suggest that take vertx webHttpServer as a WebServer.
> we get a nginx as WebServer and LoadBalance.
> nginx connect to the several WebServer of Java such as Jetty Tomcat,
> here you can use VertxHttpServer.
>
> Nginx will make a Httpconnection to your Backend Server over protocol
> of Http 1.0 or Http 1.1

How is this going to help? Nginx will make just as many connections to
your backend server as if nginx wasn't there.

Tim Fox

unread,
Dec 3, 2012, 3:25:58 AM12/3/12
to ve...@googlegroups.com
I'm going to provide you with a patch to Vert.x that I'd like you to run
on your server and report back with the results. It will contain extra
debug information to try to determine whether connections are being
closed or not.


Tim Fox

unread,
Dec 3, 2012, 4:08:58 AM12/3/12
to ve...@googlegroups.com
Ok, please apply the attached patch to master with:

git apply connection_leak_debug.patch

Run it in production and send me the output from your console (you can send this to me privately if you prefer)
connection_leak_debug.patch

Tim Fox

unread,
Dec 3, 2012, 4:15:45 AM12/3/12
to ve...@googlegroups.com
Also, when you run it, please make sure you run with only one instance
of the server.

I.e. don't do vertx run <foo> -instances 16 (or whatever) as you would
normally do on a server machine

Tim Fox

unread,
Dec 3, 2012, 6:05:06 AM12/3/12
to ve...@googlegroups.com
One other thing that springs to mind:

Are you using a RouteMatcher in your code?

If so, can you check that you are setting the noMatch handler and ending + closing the response in there too?

If not, then any keep alive connections to urls that you don't handle will leave their connection open.

This might be caused by users manually keying in the wrong url; also, browsers commonly send requests to /favicon.ico


Tim Fox

unread,
Dec 4, 2012, 2:16:53 AM12/4/12
to ve...@googlegroups.com
Any feedback?


赵普明

unread,
Dec 4, 2012, 3:14:59 AM12/4/12
to ve...@googlegroups.com
Hi Tim:

Sorry, for the last two days I was not able to connect to the proxy server I use, so I could not log into Google Groups, which is blocked by the Chinese government :-(.

We've found a workaround for this problem: using netty's IdleStateHandler to close any connection that has not read or written anything for 30 seconds. Combined with
response.close(), the connections are no longer leaking.
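
The idea behind this workaround can be sketched with the standard library alone. The sketch below is not Netty's IdleStateHandler and does not use its API; `IdleReaper` and `Conn` are hypothetical names invented for illustration. It only shows the bookkeeping: remember the last activity time per connection and, on a periodic sweep, close anything that has been idle for the timeout (30 seconds here, as in the post).

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdleReaper {
    /** Hypothetical stand-in for a connection; only close() matters here. */
    interface Conn {
        void close();
    }

    private final Map<Conn, Long> lastActivityMillis = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    IdleReaper(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Call on every read or write to mark the connection as active. */
    void touch(Conn conn, long nowMillis) {
        lastActivityMillis.put(conn, nowMillis);
    }

    /** Call when a connection is closed normally so it is no longer tracked. */
    void forget(Conn conn) {
        lastActivityMillis.remove(conn);
    }

    /** Closes every tracked connection idle for at least the timeout; returns how many were closed. */
    int sweep(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<Conn, Long>> it = lastActivityMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Conn, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
                entry.getKey().close();
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    /** Number of connections currently tracked. */
    int tracked() {
        return lastActivityMillis.size();
    }

    public static void main(String[] args) {
        IdleReaper reaper = new IdleReaper(30_000);
        Conn a = () -> System.out.println("closing a");
        Conn b = () -> System.out.println("closing b");
        reaper.touch(a, 0);        // last active at t = 0 s
        reaper.touch(b, 20_000);   // last active at t = 20 s
        // At t = 35 s only "a" has been idle for 30 s or more.
        System.out.println("closed: " + reaper.sweep(35_000));
    }
}
```

In a real server the sweep would be driven by a periodic timer and the timestamps by the actual read/write events. A reaper like this bounds the damage from a leak but does not explain where the leak comes from.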

I've tested with a simple netty program and it seems to leak connections as well. I'd like to test with your patch, but
unfortunately, with the deadline coming up quickly, we have to deal with other problems at this time.
I'll come back and test your patch once we have solved the other problems and the system is online :-)

Thank you very much for your efforts and expertise in helping us. Without your guidance I would not have been able to solve this problem :-)

On Monday, December 3, 2012 4:25:58 PM UTC+8, Tim Fox wrote:

赵普明

unread,
Dec 4, 2012, 3:20:58 AM12/4/12
to ve...@googlegroups.com


On Monday, December 3, 2012 7:05:06 PM UTC+8, Tim Fox wrote:
One other thing that springs to mind:

Are you using a RouteMatcher in your code?

If so, can you check that you are setting the noMatch handler and ending + closing the response in there too?

I'm using a RouteMatcher, and noMatch is not set. We have a sendFile handler that matches "/.*" at the end of the routes. The code looks like this:

    @Override
    public void handle(HttpServerRequest req) {
        String path = req.path;
        if ('/' == path.charAt(0)) {
            path = path.substring(1);
        }
        String allpath = Paths.concat(this.rootDir, path);
        HttpServerResponse response = req.response;
        response.putHeader("Expires", "Thu Jan 01 2099 00:00:00 GMT");
        response.sendFile(allpath);
        response.close();
    }

where Paths.concat is a utility function that concatenates paths.

I don't know how response.sendFile() treats paths that have no matching file.


That might really be the cause of our problem. I'll test that later.

 

If not, then any keep alive connections to urls that you don't handle will leave their connection open.

So if there is no matching route for a request, wouldn't it be better to just close the connection? Keep-alive does not seem useful here.
 

Tim Fox

unread,
Dec 4, 2012, 3:48:27 AM12/4/12
to ve...@googlegroups.com
On 04/12/2012 08:20, 赵普明 wrote:


On Monday, December 3, 2012 7:05:06 PM UTC+8, Tim Fox wrote:
One other thing that springs to mind:

Are you using a RouteMatcher in your code?

If so, can you check that you are setting the noMatch handler and ending + closing the response in there too?

I'm using a RouteMatcher, and noMatch is not set. We have a sendFile handler that matches "/.*" at the end of routes.

I'm not sure what you mean by this. Can you elaborate?


Tim Fox

unread,
Dec 4, 2012, 3:50:32 AM12/4/12
to ve...@googlegroups.com


On Tuesday, December 4, 2012 8:14:59 AM UTC, 赵普明 wrote:
Hi Tim:

Sorry, for the last two days I was not able to connect to the proxy server I use, so I could not log into Google Groups, which is blocked by the Chinese government :-(.

We've found a workaround for this problem: using netty's IdleStateHandler to close any connection that has not read or written anything for 30 seconds. Combined with
response.close(), the connections are no longer leaking.

I've tested with a simple netty program and it seems to leak connections as well. I'd like to test with your patch, but
unfortunately, with the deadline coming up quickly, we have to deal with other problems at this time.

I strongly recommend applying the patch anyway, along with your workarounds. The patch won't have any performance penalty, it just logs to stdout and it will provide invaluable information for diagnosing the real problem.

Tim Fox

unread,
Dec 4, 2012, 6:06:54 AM12/4/12
to ve...@googlegroups.com


On Tuesday, December 4, 2012 8:20:58 AM UTC, 赵普明 wrote:


On Monday, December 3, 2012 7:05:06 PM UTC+8, Tim Fox wrote:
One other thing that springs to mind:

Are you using a RouteMatcher in your code?

If so, can you check that you are setting the noMatch handler and ending + closing the response in there too?

I'm using a RouteMatcher, and noMatch is not set. We have a sendFile handler that matches "/.*" at the end of routes.

Reading between the lines again. A couple more thoughts.

Is your "/.*" handler matching all request types (GET, POST, DELETE etc.)? I.e. are you using the method allWithRegEx, like this?

rm.allWithRegEx("/.*", new Handler<HttpServerRequest>() {
      public void handle(HttpServerRequest req) {
      }
});

If you don't use allWithRegEx but just use getWithRegEx then any POST/PUTs etc will cause a connection leak.

Also... the regex '/.*' will only match uris which start with '/'. If a browser/proxy/malicious client sends you an absolute url or some other crafted url you also have a connection leak (and potential DoS vector)
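
The claim about '/.*' is easy to check with plain java.util.regex. The sketch below is not RouteMatcher code; it assumes the router applies the pattern to the whole request string, and the class name, helper, and sample host are made up for illustration.

```java
import java.util.regex.Pattern;

public class RouteRegexCheck {
    // The catch-all pattern quoted in the thread.
    static final Pattern CATCH_ALL = Pattern.compile("/.*");

    /** True if the whole input matches the pattern (assumption: the router does full-string matching). */
    static boolean matches(String uri) {
        return CATCH_ALL.matcher(uri).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/index.html",                    // ordinary path
            "/favicon.ico",                   // common browser request
            "http://example.com/index.html",  // absolute URL (hypothetical host)
            "*"                               // asterisk-form target, as in OPTIONS *
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

The first two samples match and the last two do not, so requests of those shapes would fall through to whatever the router does when nothing matches.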

The only safe way to avoid this is to provide a noMatch handler. I recommend the following:

rm.noMatch(new Handler<HttpServerRequest>() {
      @Override
      public void handle(HttpServerRequest req) {
        System.out.println("No match called with: " + req.uri);
        req.response.statusCode = 404; // Or could use a 403
        req.response.end();
        req.response.close();
      }
    });

Without a noMatch handler there is certainly potential for leaks and DoS attacks.

Tim Fox

unread,
Dec 4, 2012, 6:08:30 AM12/4/12
to ve...@googlegroups.com
On 04/12/12 08:14, 赵普明 wrote:
> Hi Tim:
>
> Sorry I was not able to connect to the proxy server I was using in the
> last two days and was not able to log into google groups, which was
> blocked by Chinese government :-(.
>
> We've found a work-around to get rid of this problem by using netty's
> IdleStateHandler, closing any connections that is not reading/writing
> for 30 seconds. Combined with
> response.close(), now the connections are not leaking anymore.
>
> I've tested with a simple netty program and it seems to be leaking
> connections as well. I'd like with your patch, but
> unfortunately, with the deadline coming quick, we have to deal with
> other problems at this time.
> I'll comeback and test your patch when we solved other problems and
> system online, :-)
>
> Thank you very much for your efforts and expertise helping us. Without
> your guidance I would not be able to solve this problem :-)

No problem! Happy to serve.

Tim Fox

unread,
Dec 4, 2012, 8:14:12 AM12/4/12
to ve...@googlegroups.com


On Tuesday, December 4, 2012 8:20:58 AM UTC, 赵普明 wrote:


On Monday, December 3, 2012 7:05:06 PM UTC+8, Tim Fox wrote:
One other thing that springs to mind:

Are you using a RouteMatcher in your code?

If so, can you check that you are setting the noMatch handler and ending + closing the response in there too?

I'm using a RouteMatcher, and noMatch is not set. We have a sendFile handler that matches "/.*" at the end of the routes. The code looks like this:

    @Override
    public void handle(HttpServerRequest req) {
        String path = req.path;
        if ('/' == path.charAt(0)) {
            path = path.substring(1);
        }
        String allpath = Paths.concat(this.rootDir, path);
        HttpServerResponse response = req.response;
        response.putHeader("Expires", "Thu Jan 01 2099 00:00:00 GMT");
        response.sendFile(allpath);
        response.close();
    }

where Paths.concat is a utility function that concatenates paths.

I don't know how response.sendFile() treats paths that have no matching file.


That might really be the cause of our problem. I'll test that later.

 

If not, then any keep alive connections to urls that you don't handle will leave their connection open.

So if there is no matching route for a request, wouldn't it be better to just close the connection? Keep-alive does not seem useful here.

Not necessarily. For example, browsers often request /favicon.ico when making their first request to a domain. If the connection is keep-alive, it makes sense not to close it after the favicon request, so the browser can reuse it for the main request. If the connection is closed, browsing becomes slower for the user.

Gavin Alves

unread,
Jul 31, 2013, 4:43:35 AM7/31/13
to ve...@googlegroups.com
Hi Puming,

Did you make any progress on this? I'm going through the same heartache now. By any chance are you using HAProxy/Loadbalancer.org etc. in front of your HTTP server?

My current theory is that because HAProxy does not support Keep Alive on the back end, it tells the client to close the connection (which it does), however it does not send a FIN back to the server and the socket remains in an ESTABLISHED state.  I will try to do a packet capture and observe exactly what is happening.

Cheers,

Gavin

Todd Rader

unread,
Aug 20, 2014, 2:23:50 PM8/20/14
to ve...@googlegroups.com
I ran into an issue like this, and found that it was by blindly copying the example in the Vert.x docs that I caused this.

My app made HTTP calls from one process to another, making about 100 or more per minute.  For each call, I originally coded it like this:

vertx.createHttpClient().setHost(...).setPort(...).getNow(...)

I did that because that's how the example in the docs looks....but of course that creates hundreds of HTTP clients, all making connections!  I really only needed one client.  As soon as I created the client in my start() method of my Verticle and started re-using it, the problem went away.
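
The difference can be illustrated without Vert.x at all. In the sketch below, `FakeHttpClient` is a hypothetical stand-in that merely counts how many client objects get created; it is not the Vert.x HttpClient, and the method names are made up.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReuseClient {
    /** Hypothetical stand-in for an HTTP client that owns its own connections. */
    static class FakeHttpClient {
        static final AtomicInteger created = new AtomicInteger();

        FakeHttpClient() {
            created.incrementAndGet();
        }

        String get(String path) {
            return "response for " + path;
        }
    }

    /** Anti-pattern: a new client (and so new connections) for every call. */
    static void clientPerCall(int calls) {
        for (int i = 0; i < calls; i++) {
            new FakeHttpClient().get("/item/" + i);
        }
    }

    /** Preferred: create the client once (for example in start()) and reuse it. */
    static void sharedClient(int calls) {
        FakeHttpClient client = new FakeHttpClient();
        for (int i = 0; i < calls; i++) {
            client.get("/item/" + i);
        }
    }

    public static void main(String[] args) {
        FakeHttpClient.created.set(0);
        clientPerCall(100);
        System.out.println("client per call: " + FakeHttpClient.created.get() + " clients");

        FakeHttpClient.created.set(0);
        sharedClient(100);
        System.out.println("shared client: " + FakeHttpClient.created.get() + " client");
    }
}
```

With 100 calls, the first variant creates 100 clients and the second creates one, which matches the behaviour described above.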