Breaking the 64511 concurrent tcp sessions barrier


Paul van Brouwershaven

Aug 16, 2013, 6:45:40 AM
to golan...@googlegroups.com
I'm currently trying to break the 64511 concurrent TCP sessions barrier in my Go application.

I have optimized my kernel for high TCP concurrency as suggested by Richard Jones on his blog back in 2008 (see the metabrew.com blog below). Unfortunately it doesn't have any effect on the number of concurrent sessions.

As you have only 64511 unprivileged ports available per IP address, I have added several additional IP addresses to my system and made Go loop through these addresses. When I use a LocalAddr with port 0, Go immediately starts complaining with "address already in use" as soon as I pass the +/- 64511 connections. So I tried to loop over my IP range in combination with a specified port for the LocalAddr. Unfortunately this doesn't improve the number of concurrent connections.

1 Million TCP connections (Linux):
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
2 Million TCP connections (FreeBSD):
http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/

I'm running an Ubuntu Server with a 3.2 kernel.

My connection handling code:

RETRY:

// get the first source ip:port address in the queue
ipNext = <-iprange

// manage our own port numbers per IP to overcome the 64511 limit
host, port, _ := net.SplitHostPort(ipNext)
nextPort, _ := strconv.Atoi(port)
nextPort++
if nextPort > 65535 {
	nextPort = 1024
}
// add this address with the next port number back to the end of the queue
iprange <- host + ":" + strconv.Itoa(nextPort)

d := net.Dialer{Timeout: 1 * time.Second, Deadline: time.Now().Add(2 * time.Second)}
d.LocalAddr, err = net.ResolveTCPAddr("tcp4", ipNext)
if err != nil {
	log.Fatal(err)
}
dc, err := d.Dial("tcp", dst)

if err != nil && strings.Contains(err.Error(), "address already in use") {
	goto RETRY
}

Any suggestions on how to pass this barrier?

Thanks,

Paul

Kyle Lemons

Aug 16, 2013, 2:02:50 PM
to Paul van Brouwershaven, golang-nuts
On Fri, Aug 16, 2013 at 3:45 AM, Paul van Brouwershaven <pa...@vanbrouwershaven.com> wrote:
> I'm currently trying to break the 64511 concurrent TCP sessions barrier in my Go application.
>
> I have optimized my kernel for high TCP concurrency as suggested by Richard Jones on his blog back in 2008 (see the metabrew.com blog below). Unfortunately it doesn't have any effect on the number of concurrent sessions.
>
> As you have only 64511 unprivileged ports available per IP address, I have added several additional IP addresses to my system and made Go loop through these addresses. When I use a LocalAddr with port 0, Go immediately starts complaining with "address already in use" as soon as I pass the +/- 64511 connections.

This is probably something that can be fixed; I'd file an issue.

> So I tried to loop over my IP range in combination with a specified port for the LocalAddr. Unfortunately this doesn't improve the number of concurrent connections.

What is the failure mode when you do this?  Do you still get "address already in use" after 64511?  Do you know that your router can handle NAT with one (physical) port having more than 65536 source ports?  Does the tcpdump have any hints?
 


Paul van Brouwershaven

Aug 17, 2013, 3:43:44 AM
to golan...@googlegroups.com, Paul van Brouwershaven, Kyle Lemons
On Friday, 16 August 2013 20:02:50 UTC+2, Kyle Lemons wrote:
> What is the failure mode when you do this?  Do you still get "address already in use" after 64511?  Do you know that your router can handle NAT with one (physical) port having more than 65536 source ports?  Does the tcpdump have any hints?

Basically you can't have more than 65535 ports, so that's the reason I'm trying to use 64511 (65535-1024) ports per IP address. This would mean that you have 1.0.0.1:65000 and 1.0.0.2:65000 having a connection at the same time (as advised by Richard Jones). Unfortunately this gives me the same "address already in use" error message as when I specify port 0 (auto).

I'm counting the number of TCP sessions with:
netstat -n | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'
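
The same tally can be done in Go. This is only a minimal sketch, assuming the text of `netstat -n` has already been captured into a string; `countStates` is a hypothetical helper and the sample lines in `main` are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countStates tallies the last column (the TCP state) of every line that
// starts with "tcp", mirroring the awk one-liner.
func countStates(netstatOutput string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(netstatOutput))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "tcp") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		counts[fields[len(fields)-1]]++
	}
	return counts
}

func main() {
	// Made-up sample input, not output from the original system.
	sample := `Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1 10.0.0.1:40000 10.2.149.179:80 SYN_SENT
tcp 0 1 10.0.0.1:40001 10.2.149.180:80 SYN_SENT
tcp 0 0 10.0.0.1:40002 10.2.149.181:80 ESTABLISHED
udp 0 0 0.0.0.0:68 0.0.0.0:*`
	for state, n := range countStates(sample) {
		fmt.Println(state, n)
	}
}
```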

tcpdump is not giving me any hints, I will send you a small dump off list.

Oleku Konko

Aug 17, 2013, 4:54:01 AM
to golan...@googlegroups.com
This is very interesting, and I also think it's an issue that needs to be fixed. Better still, add an IP pool as a default feature of the net and http packages.

+1 nice links  


Kyle Lemons

Aug 17, 2013, 2:33:36 PM
to Paul van Brouwershaven, golang-nuts
On Sat, Aug 17, 2013 at 12:43 AM, Paul van Brouwershaven <pa...@vanbrouwershaven.com> wrote:
> Basically you can't have more than 65535 ports, so that's the reason I'm trying to use 64511 (65535-1024) ports per IP address. This would mean that you have 1.0.0.1:65000 and 1.0.0.2:65000 having a connection at the same time (as advised by Richard Jones). Unfortunately this gives me the same "address already in use" error message as when I specify port 0 (auto).

Right, I understand the approach.  However, it seems likely that a (commodity) router might make the inappropriate assumption that one single (physical) port will have only one IP address, and only build in a NAT table that has enough rows to accommodate the 65535 source ports per (physical) port.  If this hypothesis were true, adding more IPs wouldn't improve the total number of TCP flows you can maintain, but adding a second IP on a second interface connected to a second port on the router would double the effective number of flows you could maintain.

Benjamin Measures

Aug 17, 2013, 6:16:31 PM
to golan...@googlegroups.com
On Friday, 16 August 2013 11:45:40 UTC+1, Paul van Brouwershaven wrote:
> I'm currently trying to break the 64511 concurrent TCP sessions barrier in my Go application.
> [...]

Note those two articles are not about the same thing. Make sure you're not confusing the difference between outgoing tcp connections and incoming tcp connections.

If you just want a 1M tcp connections badge, do it with incoming connections.

> [...] I have added several additional IP addresses to my system and made Go loop through these addresses. When I use a LocalAddr with port 0, Go immediately starts complaining with "address already in use" as soon as I pass the +/- 64511 connections. So I tried to loop over my IP range in combination with a specified port for the LocalAddr. Unfortunately this doesn't improve the number of concurrent connections.

Does "doesn't improve" mean you get the same error, or something else?
 
> My connection handling code:
> [...]

I don't see where the host (for ipNext, for LocalAddr) gets incremented. Based on the above, I can only presume it doesn't and this would explain the behaviour you're seeing.

Personally, I'd go for a simpler approach. For each local IP address launch a goroutine that dials out repeatedly (with exponential backoff, if you will). I wouldn't bother with trying to manage ephemeral ports either - just keep dialing independently on each IP and whatever limit you hit will probably be the ephemeral limit anyways.

Paul van Brouwershaven

Aug 18, 2013, 3:42:18 AM
to golan...@googlegroups.com, Paul van Brouwershaven
On Saturday, 17 August 2013 20:33:36 UTC+2, Kyle Lemons wrote:
> Right, I understand the approach.  However, it seems likely that a (commodity) router might make the inappropriate assumption that one single (physical) port will have only one IP address, and only build in a NAT table that has enough rows to accommodate the 65535 source ports per (physical) port.  If this hypothesis were true, adding more IPs wouldn't improve the total number of TCP flows you can maintain, but adding a second IP on a second interface connected to a second port on the router would double the effective number of flows you could maintain.

The server is located in a data center behind a first-class router. In the past we were located in a different data center that indeed couldn't handle the traffic. To verify, I contacted the network operations team, and they confirmed that the router is not even seeing an increase in memory or CPU usage.

Paul van Brouwershaven

Aug 18, 2013, 3:51:12 AM
to golan...@googlegroups.com
On Sunday, 18 August 2013 00:16:31 UTC+2, Benjamin Measures wrote:
> On Friday, 16 August 2013 11:45:40 UTC+1, Paul van Brouwershaven wrote:
> > I'm currently trying to break the 64511 concurrent TCP sessions barrier in my Go application.
> > [...]
> > 1 Million TCP connections (Linux):
> > http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
> > 2 Million TCP connections (FreeBSD):
> > http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/
>
> Note those two articles are not about the same thing. Make sure you're not confusing the difference between outgoing tcp connections and incoming tcp connections.

That's a really good point!

> If you just want a 1M tcp connections badge, do it with incoming connections.

Unfortunately I need to make outgoing connections.

> > [...] I have added several additional IP addresses to my system and made Go loop through these addresses. When I use a LocalAddr with port 0, Go immediately starts complaining with "address already in use" as soon as I pass the +/- 64511 connections. So I tried to loop over my IP range in combination with a specified port for the LocalAddr. Unfortunately this doesn't improve the number of concurrent connections.
>
> Does "doesn't improve" mean you get the same error, or something else?

I get the same "address already in use" error. I will try to make a small test program that I can share as soon as I have a few minutes.

> I don't see where the host (for ipNext, for LocalAddr) gets incremented. Based on the above, I can only presume it doesn't and this would explain the behaviour you're seeing.
>
> Personally, I'd go for a simpler approach. For each local IP address launch a goroutine that dials out repeatedly (with exponential backoff, if you will). I wouldn't bother with trying to manage ephemeral ports either - just keep dialing independently on each IP and whatever limit you hit will probably be the ephemeral limit anyways.

iprange is a Go channel: "ipNext = <-iprange" reads one IP address off the channel, and "iprange <- host +":"+ strconv.Itoa(nextPort)" adds that same IP address back to the channel, but with the next port in the range.

Dave Cheney

Aug 18, 2013, 4:05:25 AM
to Paul van Brouwershaven, golang-nuts
> Unfortunately I need to make outgoing connections.

Please try using net.DialTCP to control the source address you connect
from. You can get the list of available addresses from the net package
using InterfaceAddrs(). You should construct a *net.TCPAddr for the
source address with one of the IP addresses assigned to your outgoing
interface and the port set to 0 to allow the operating system to
choose an ephemeral outgoing port.

Paul van Brouwershaven

Aug 18, 2013, 4:35:55 AM
to Dave Cheney, golang-nuts
On Sun, Aug 18, 2013 at 10:05 AM, Dave Cheney <da...@cheney.net> wrote:
> Please try using net.DialTCP to control the source address you connect
> from. You can get the list of available addresses from the net package
> using InterfaceAddrs(). You should construct a *net.TCPAddr for the
> source address with one of the IP addresses assigned to your outgoing
> interface and the port set to 0 to allow the operating system to
> choose an ephemeral outgoing port.

I'm using the new net.Dialer which is much better than net.DialTCP as you can also specify a timeout and a local address. In the past I used a customized net.DialTCP to get to the same functionality.

Dave Cheney

Aug 18, 2013, 4:46:53 AM
to Paul van Brouwershaven, golang-nuts
Can you please post executable sample code.

Paul van Brouwershaven

Aug 19, 2013, 7:00:28 AM
to golan...@googlegroups.com, Paul van Brouwershaven
On Sunday, 18 August 2013 10:46:53 UTC+2, Dave Cheney wrote:
> Can you please post executable sample code.

Here is demo code that runs 60k concurrent connections with a timeout of 2 seconds.


Please note that you need to increase your file descriptor limit as below, and that you need a multicore system to properly run this code!

sysctl -w fs.file-max=999999
ulimit -n `cat /proc/sys/fs/file-max`
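
Whether the raised limit actually applies to the process can be checked from Go itself. A minimal sketch, assuming a Unix-like system where `syscall.Getrlimit` is available; `openFileLimit` and `enoughFor` are hypothetical helpers, and the reserve of 64 descriptors is an arbitrary allowance for illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard limits on open file descriptors
// (RLIMIT_NOFILE) for the current process.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

// enoughFor reports whether soft leaves room for wanted sockets plus a
// reserve for stdin/stdout/stderr and other descriptors.
func enoughFor(soft uint64, wanted, reserve int) bool {
	return soft >= uint64(wanted+reserve)
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
	if !enoughFor(soft, 60000, 64) {
		fmt.Println("limit too low for 60000 sockets; raise it with ulimit -n")
	}
}
```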

The code will print a status output like below every 5 seconds:

2013/08/19 10:51:33 GO: 60007 || TCP: ESTABLISHED 4 SYN_SENT 28232   || ERROR: dial tcp 10.2.149.179:80: address already in use

/tmp/count.sh contains a count on current open tcp sessions:

#!/bin/sh

Dave Cheney

Aug 19, 2013, 7:35:11 AM
to Paul van Brouwershaven, golan...@googlegroups.com, Paul van Brouwershaven
Please try running your program under the race detector. I can see at least one data race, on lasterror which may be confusing your results. 


Paul van Brouwershaven

Aug 19, 2013, 9:06:04 AM
to Dave Cheney, golan...@googlegroups.com
On Mon, Aug 19, 2013 at 1:35 PM, Dave Cheney <da...@cheney.net> wrote:
> Please try running your program under the race detector. I can see at least one data race, on lasterror which may be confusing your results.

The race detector won't run with this number of routines, and with a low number of routines it doesn't report any errors.

"lasterror" was the only variable accessed across routines, so I simply removed it as this is only a program that demonstrates the error.
 
http://play.golang.org/p/wXxX_nIOk-


Dave Cheney

Aug 19, 2013, 9:38:44 AM
to Paul van Brouwershaven, golan...@googlegroups.com
I don't understand why you need to use so many goroutines, why not just have one per source IP address? You will also need one goroutine per accepting IP address. 


Paul van Brouwershaven

Aug 19, 2013, 9:51:28 AM
to Dave Cheney, golan...@googlegroups.com
That would slow down the program, as I would only have a few concurrent requests (as many as I have source IP addresses).

Benjamin Measures

Aug 19, 2013, 8:24:36 PM
to golan...@googlegroups.com, Paul van Brouwershaven
On Monday, 19 August 2013 12:00:28 UTC+1, Paul van Brouwershaven wrote:
> Here is demo code that runs 60k concurrent connections with a timeout of 2 seconds.
>
> The code will print a status output like below every 5 seconds:
> 2013/08/19 10:51:33 GO: 60007 || TCP: ESTABLISHED 4 SYN_SENT 28232   || ERROR: dial tcp 10.2.149.179:80: address already in use
 
Your goroutines are closing the connection almost immediately after dialing:
dconn, err := d.Dial("tcp", ipaddr+":80")
if err != nil {
  [...]
}
defer dconn.Close()

The closing of the connections would explain why you can never get >60k connections ESTABLISHED (or SYN_SENT).

Furthermore, closing the connections will cause the tcp socket to go into TIME_WAIT state (for up to 60s by default on Linux), during which time you cannot rebind the socket. Eventually, you'll just run out of ports and this will happen far sooner than you'd expect from looking at the ESTABLISHED connections alone.


BTW, it seems you're dialing out to a different ip:port each time. Technically, connections are identified by the tuple proto,laddr,lport,daddr,dport. So, theoretically, you could create multiple connections (think >1M) bound to the same laddr,lport. SO_REUSEPORT was only recently added to Linux though, so it may not be in Go yet: http://grokbase.com/t/gg/golang-dev/13373v3nfm/net-so-reuseport-in-linux-3-9
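
To make the tuple argument concrete, here is a small illustrative sketch (not kernel code): a `flow` struct with the five fields named above is used as a map key, so two connections that share laddr and lport but differ in daddr count as distinct entries. The type, the helper `distinct` and the addresses are assumptions for illustration only.

```go
package main

import "fmt"

// flow models the tuple that identifies a connection:
// proto, laddr, lport, daddr, dport.
type flow struct {
	proto string
	laddr string
	lport int
	daddr string
	dport int
}

// distinct counts how many different tuples appear in flows.
func distinct(flows []flow) int {
	seen := make(map[flow]bool)
	for _, f := range flows {
		seen[f] = true
	}
	return len(seen)
}

func main() {
	// Same local address and port, two different destinations.
	a := flow{"tcp", "10.0.0.1", 40000, "10.5.175.151", 80}
	b := flow{"tcp", "10.0.0.1", 40000, "10.5.175.134", 80}
	fmt.Println("same tuple:", a == b)
	fmt.Println("distinct connections:", distinct([]flow{a, b, a}))
}
```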

Dave Cheney

Aug 19, 2013, 8:42:39 PM
to Paul van Brouwershaven, golan...@googlegroups.com
> That would slowdown the program as I would only have a few concurrent
> requests (as many as source IP addresses I have).

Well, it's just a test, right? I think it's more important to get
your test right than to complete fast and show the wrong result.
Also, you will need multiple listeners bound to separate IP addresses to
get more than 65k possible sockets.

Paul van Brouwershaven

Aug 20, 2013, 2:29:26 AM
to Benjamin Measures, golang-nuts
On Tue, Aug 20, 2013 at 2:24 AM, Benjamin Measures <saint....@gmail.com> wrote:
> Your goroutines are closing the connection almost immediately after dialing:
> dconn, err := d.Dial("tcp", ipaddr+":80")
> if err != nil {
>   [...]
> }
> defer dconn.Close()

They are very short connections, but they don't close immediately. The "defer dconn.Close()" closes the connection when the function returns. I'm not downloading anything in the example program, but the timing is the same.

> The closing of the connections would explain why you can never get >60k connections ESTABLISHED (or SYN_SENT).
> Furthermore, closing the connections will cause the tcp socket to go into TIME_WAIT state (for up to 60s by default on Linux), during which time you cannot rebind the socket. Eventually, you'll just run out of ports and this will happen far sooner than you'd expect from looking at the ESTABLISHED connections alone.

Most of the connections are in SYN_SENT status (waiting on connection); I don't have issues with TIME_WAIT. I'm running with high concurrency and it's no problem to start more than 60k connections per second, except that I'm getting "address already in use" errors as soon as I go over the +/- 64511 limit.

TIME_WAIT 2917
CLOSE_WAIT 5
FIN_WAIT1 2049
SYN_SENT 54926
ESTABLISHED 836
FIN_WAIT2 1063
CLOSING 13
LAST_ACK 61 

> BTW, it seems you're dialing out to a different ip:port each time - technically, connections are identified by the tuple proto,laddr,lport,daddr,dport. So, theoretically, you could create multiple connections (think >1M) bound to the same laddr,lport. SO_REUSEPORT was not long added to Linux though, so it may not be in Go yet: http://grokbase.com/t/gg/golang-dev/13373v3nfm/net-so-reuseport-in-linux-3-9

Sounds interesting but is this not for incoming connections only?

"For TCP, so_reuseport allows multiple listener sockets to be bound to the same port."

Paul van Brouwershaven

Aug 20, 2013, 2:36:10 AM
to Dave Cheney, golan...@googlegroups.com
I'm running with high concurrency to get to the 60k+ connections; running with less concurrency would cause the program to make fewer connections or keep the connections open longer than necessary.

The race detector is not reporting any issues at a low concurrency, but I removed the lasterror that could likely cause an issue anyway.

Is there anything else that you would like me to do?

Dave Cheney

Aug 20, 2013, 3:04:13 AM
to Paul van Brouwershaven, golan...@googlegroups.com
> I'm running with a high concurrency to get to the 60k+ connections, running
> with less concurrency would cause the program to make less connections or
> keep the connection open longer than necessary.

Why do you need to do that ? If you want to test how many connections
you can have open in total, then open the connection, stick it in a
map (so it doesn't get garbage collected), and do it again. When you
start getting errors, move on to the next source IP.
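
The procedure described here could be sketched as follows. To keep the example runnable without a network, the dialing step is passed in as a function; `fillSources`, `dialFunc`, `fakeDial` and the addresses are hypothetical names for illustration. With a real `net.Dialer`, the function passed in would call `Dial` with `LocalAddr` set to the given source IP.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// dialFunc opens one connection from the given source IP.
type dialFunc func(sourceIP string) (io.Closer, error)

// fillSources dials from each source IP until the first error, storing
// every connection in a map so it stays referenced (and therefore open),
// then moves on to the next source IP.
func fillSources(sources []string, dial dialFunc) (map[int]io.Closer, map[string]int) {
	held := make(map[int]io.Closer)
	perSource := make(map[string]int)
	for _, src := range sources {
		for {
			c, err := dial(src)
			if err != nil {
				break // e.g. "address already in use": try the next source IP
			}
			held[len(held)] = c
			perSource[src]++
		}
	}
	return held, perSource
}

// nopCloser stands in for a real connection in this sketch.
type nopCloser struct{}

func (nopCloser) Close() error { return nil }

// fakeDial simulates a source IP that runs out of ports after capacity dials.
func fakeDial(capacity int) dialFunc {
	used := make(map[string]int)
	return func(sourceIP string) (io.Closer, error) {
		if used[sourceIP] >= capacity {
			return nil, errors.New("address already in use")
		}
		used[sourceIP]++
		return nopCloser{}, nil
	}
}

func main() {
	held, perSource := fillSources([]string{"10.0.0.1", "10.0.0.2"}, fakeDial(3))
	fmt.Println("total held:", len(held))
	fmt.Println("per source:", perSource)
	for _, c := range held {
		c.Close()
	}
}
```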

Kyle Lemons

Aug 20, 2013, 3:18:30 AM
to Paul van Brouwershaven, Benjamin Measures, golang-nuts
On Mon, Aug 19, 2013 at 11:29 PM, Paul van Brouwershaven <pa...@vanbrouwershaven.com> wrote:
> Most of the connections are in SYN_SENT status (waiting on connection); I don't have issues with TIME_WAIT. I'm running with high concurrency and it's no problem to start more than 60k connections per second, except that I'm getting "address already in use" errors as soon as I go over the +/- 64511 limit.
>
> TIME_WAIT 2917
> CLOSE_WAIT 5
> FIN_WAIT1 2049
> SYN_SENT 54926
> ESTABLISHED 836
> FIN_WAIT2 1063
> CLOSING 13
> LAST_ACK 61

I don't think you can really say you have 1M TCP connections unless you have 1M that say ESTABLISHED there.  I'd work on getting to 64511 ESTABLISHED first, and then move on to multiple IPs.  It still looks to me like some router somewhere isn't able to actually route enough connections for you, thus tons of SYN_SENT -- no SYN/ACK, because the ACK can't be routed back to you.
 

Dave Cheney

Aug 20, 2013, 3:26:03 AM
to Kyle Lemons, Paul van Brouwershaven, Benjamin Measures, golang-nuts
> I don't think you can really say you have 1M TCP connections unless you have
> 1M that say ESTABLISHED there. I'd work on getting to 64511 ESTABLISHED
> first, and then move on to multiple IPs. It still looks to me like some
> router somewhere isn't able to actually route enough connections for you,
> thus tons of SYN_SENT -- no SYN/ACK, because the ACK can't be routed back to
> you.

Also, check dmesg on the server side. If you see things like SYN
cookies enabled, it means your OS thinks it is under attack and will
be slowing down accepting connections from that source IP.

Paul van Brouwershaven

Aug 20, 2013, 4:08:39 AM
to Kyle Lemons, Benjamin Measures, golang-nuts
On Tue, Aug 20, 2013 at 9:18 AM, Kyle Lemons <kev...@google.com> wrote:
> I don't think you can really say you have 1M TCP connections unless you have 1M that say ESTABLISHED there.  I'd work on getting to 64511 ESTABLISHED first, and then move on to multiple IPs.  It still looks to me like some router somewhere isn't able to actually route enough connections for you, thus tons of SYN_SENT -- no SYN/ACK, because the ACK can't be routed back to you.

That would be an option, although I'm connecting to 'random' addresses in the private IP space. I know that most of these connections will never get established.

Paul van Brouwershaven

Aug 20, 2013, 5:19:44 AM
to Dave Cheney, golan...@googlegroups.com
I updated the program to keep the connections in a map instead of using a goroutine per connection, and I changed it so that I only move to a new IP address when I receive an error message.

The problem is that I'm not getting more concurrent TCP sessions than the number of routines I use. I tested for race conditions again and did not find any issues.

You can find the updated test program here:

When running with 64000 routines, I get slightly over 64511 (64542), but this stays under the 65535 limit.

2013/08/20 08:55:54 sockets: used 683 TCP: inuse 590 orphan 31 tw 2 alloc 592 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 08:56:17 sockets: used 64677 TCP: inuse 64503 orphan 31 tw 2 alloc 64586 mem 772 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 08:56:37 sockets: used 64680 TCP: inuse 64484 orphan 31 tw 0 alloc 64589 mem 772 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 1 memory 1480
2013/08/20 08:57:17 sockets: used 64715 TCP: inuse 64521 orphan 31 tw 0 alloc 64624 mem 772 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 08:57:46 sockets: used 64723 TCP: inuse 64542 orphan 31 tw 0 alloc 64632 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 1 memory 1480
2013/08/20 08:58:30 sockets: used 64407 TCP: inuse 62746 orphan 31 tw 0 alloc 64316 mem 772 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
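
For following the trend across such log lines, the TCP "inuse" figure can be pulled out programmatically. A minimal sketch, assuming the line layout shown in the output above; `tcpInUse` is a hypothetical helper and not part of the test program.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tcpInUse extracts the number that follows "TCP: inuse" in a status line.
// The second result is false when the line does not contain that field.
func tcpInUse(line string) (int, bool) {
	f := strings.Fields(line)
	for i := 0; i+2 < len(f); i++ {
		if f[i] == "TCP:" && f[i+1] == "inuse" {
			n, err := strconv.Atoi(f[i+2])
			if err != nil {
				return 0, false
			}
			return n, true
		}
	}
	return 0, false
}

func main() {
	// Two of the lines from the output above, shortened after the TCP fields.
	lines := []string{
		"2013/08/20 08:57:46 sockets: used 64723 TCP: inuse 64542 orphan 31 tw 0 alloc 64632 mem 771",
		"2013/08/20 08:58:30 sockets: used 64407 TCP: inuse 62746 orphan 31 tw 0 alloc 64316 mem 772",
	}
	for _, l := range lines {
		if n, ok := tcpInUse(l); ok {
			fmt.Println("TCP inuse:", n)
		}
	}
}
```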

When I run the test program with 64100 routines I can even get to 64632, but I soon run into the "address already in use" error.

2013/08/20 09:07:33 sockets: used 708 TCP: inuse 614 orphan 30 tw 0 alloc 616 mem 770 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 09:08:04 sockets: used 64807 TCP: inuse 64632 orphan 30 tw 4 alloc 64715 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 09:08:23 *.*.*.1 -> 10.5.175.151 || dial tcp 10.5.175.151:80: address already in use
2013/08/20 09:08:32 *.*.*.2 -> 10.5.175.134 || dial tcp 10.5.175.134:80: address already in use
2013/08/20 09:08:32 *.*.*.3 -> 10.5.175.157 || dial tcp 10.5.175.157:80: address already in use
2013/08/20 09:08:32 *.*.*.4 -> 10.5.175.161 || dial tcp 10.5.175.161:80: address already in use
2013/08/20 09:08:32 *.*.*.5 -> 10.5.175.149 || dial tcp 10.5.175.149:80: address already in use
2013/08/20 09:08:32 *.*.*.6 -> 10.5.175.166 || dial tcp 10.5.175.166:80: address already in use
2013/08/20 09:08:32 *.*.*.7 -> 10.5.175.146 || dial tcp 10.5.175.146:80: address already in use




Kyle Lemons

Aug 20, 2013, 12:49:24 PM
to Paul van Brouwershaven, Dave Cheney, golan...@googlegroups.com
Are you getting these "address already in use" errors when coming up with the ports yourself or when letting the OS choose them?

Paul van Brouwershaven

Aug 20, 2013, 1:22:30 PM
to Kyle Lemons, golang-nuts, Dave Cheney


> Are you getting these address already in use by coming up with the ports yourself or by letting the OS choose them? 

I'm letting the OS select the ports now. But I get the same error when I make the selection.

Benjamin Measures

Aug 20, 2013, 7:13:38 PM
to golan...@googlegroups.com, Dave Cheney
On Tuesday, 20 August 2013 10:19:44 UTC+1, Paul van Brouwershaven wrote:
> 2013/08/20 08:57:46 sockets: used 64723 TCP: inuse 64542 orphan 31 tw 0 alloc 64632 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 1 memory 1480
> 2013/08/20 08:58:30 sockets: used 64407 TCP: inuse 62746 orphan 31 tw 0 alloc 64316 mem 772 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
 
The TCP sockets inuse has decreased. This would suggest sockets are being closed.

> When I run the test program with 64100 routines I can even get to 64632, but I soon run into the "address already in use" error.
> 2013/08/20 09:07:33 sockets: used 708 TCP: inuse 614 orphan 30 tw 0 alloc 616 mem 770 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
> 2013/08/20 09:08:04 sockets: used 64807 TCP: inuse 64632 orphan 30 tw 4 alloc 64715 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
> 2013/08/20 09:08:23 *.*.*.1 -> 10.5.175.151 || dial tcp 10.5.175.151:80: address already in use

Using more goroutines will cause connections to be opened (and closed) at a greater rate. That the error doesn't appear until ~20s after peak/high inuse would support the TIME_WAIT issue as outlined earlier in this thread.

Dave Cheney

Aug 21, 2013, 1:40:01 AM
to Benjamin Measures, golang-nuts
Please try this sample code

http://play.golang.org/p/y2319158oP

I get to 114688 connections without any serious tcp tuning.

Paul van Brouwershaven

Aug 21, 2013, 2:09:39 AM
to Benjamin Measures, golang-nuts, Dave Cheney
On Wed, Aug 21, 2013 at 1:13 AM, Benjamin Measures <saint....@gmail.com> wrote:
On Tuesday, 20 August 2013 10:19:44 UTC+1, Paul van Brouwershaven wrote:
2013/08/20 08:57:46 sockets: used 64723 TCP: inuse 64542 orphan 31 tw 0 alloc 64632 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 1 memory 1480
2013/08/20 08:58:30 sockets: used 64407 TCP: inuse 62746 orphan 31 tw 0 alloc 64316 mem 772 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
 
The TCP sockets inuse has decreased. This would suggest sockets are being closed.

The sockets should be closed, in 1 or 2 seconds as requested in Dailer Timeout and Deadline.
 
When I run the test program with 64100 routines I can even get to 64632 but soon run into the error "address already in use" error.
2013/08/20 09:07:33 sockets: used 708 TCP: inuse 614 orphan 30 tw 0 alloc 616 mem 770 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 09:08:04 sockets: used 64807 TCP: inuse 64632 orphan 30 tw 4 alloc 64715 mem 771 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
2013/08/20 09:08:23 *.*.*.1 -> 10.5.175.151 || dial tcp 10.5.175.151:80: address already in use

Using more goroutines will cause connections to be opened (and closed) at a greater rate. That the error doesn't appear until ~20s after peak inuse would support the CLOSE_WAIT issue outlined earlier in this thread.

The error doesn't appear immediately with +/- 64000 goroutines because we are still running under the TCP limit that I'm trying to pass. Each IP address is limited to 65535 ports (http://en.wikipedia.org/wiki/Ephemeral_port), and my ephemeral port range is set from 1024 to 65535. To use more ephemeral ports than a single address allows, you need to add additional IP addresses, because each port is bound to a specific IP address.

While you can tune a kernel to shorten the CLOSE_WAIT time and reuse ports in CLOSE_WAIT, it should not matter in my case (for now).

By using two source IP addresses I should be able to make 129022 (2 x 64511) connections. The state of a TCP session should have no effect on the total number of sessions; it would only matter if you were counting established sessions, which I'm not.
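The reasoning above (one pool of source ports per source IP) can be sketched in Go roughly as follows. This is only an illustration under assumptions: the helper names maxOutgoing and dialerFor and the two loopback addresses are made up for the example, and 64511 is simply the per-address figure used in this thread.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// portsPerIP is the per-address figure used in this thread
// (an assumption taken from the discussion, not a kernel constant).
const portsPerIP = 64511

// maxOutgoing returns the theoretical upper bound on concurrent outgoing
// connections to one destination when dialing from n source addresses.
func maxOutgoing(n int) int {
	return n * portsPerIP
}

// dialerFor returns a Dialer bound to the given source IP.
// Port 0 leaves the choice of source port to the kernel.
func dialerFor(ip string) (*net.Dialer, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return nil, fmt.Errorf("invalid source IP %q", ip)
	}
	return &net.Dialer{
		Timeout:   2 * time.Second,
		LocalAddr: &net.TCPAddr{IP: parsed, Port: 0},
	}, nil
}

func main() {
	// Hypothetical source addresses; replace with addresses configured on the host.
	sources := []string{"127.0.0.100", "127.0.0.101"}
	for _, s := range sources {
		d, err := dialerFor(s)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("dialer bound to", d.LocalAddr)
	}
	fmt.Println("theoretical maximum:", maxOutgoing(len(sources)))
}
```

Each dialer would then be used with d.Dial("tcp", dst); whether the kernel actually hands out the full range per address is exactly what the rest of this thread is trying to establish.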

Paul van Brouwershaven

Aug 21, 2013, 3:34:39 AM8/21/13
to Dave Cheney, Benjamin Measures, golang-nuts
I tested your sample and could get 128997 established connections, but only 63964 of them are outgoing. This is about the same result as with my test; you have doubled the number by counting both incoming and outgoing connections.

2013/08/21 06:54:20 starting listeners
2013/08/21 06:54:20 listeners started, starting clients
2013/08/21 06:54:20 clients started, waiting for completion
2013/08/21 06:54:23 client 127.0.0.106:0: connection 7993: dial tcp 127.0.0.206:8000: address already in use
2013/08/21 06:54:23 client 127.0.0.107:0: connection 7993: dial tcp 127.0.0.207:8000: address already in use
2013/08/21 06:54:24 client 127.0.0.102:0: connection 7993: dial tcp 127.0.0.202:8000: address already in use
2013/08/21 06:54:24 client 127.0.0.103:0: connection 7996: dial tcp 127.0.0.203:8000: address already in use
2013/08/21 06:54:24 client 127.0.0.100:0: connection 7995: dial tcp 127.0.0.200:8000: address already in use
2013/08/21 06:54:24 client 127.0.0.104:0: connection 7997: dial tcp 127.0.0.204:8000: address already in use
2013/08/21 06:54:24 client 127.0.0.101:0: connection 7999: dial tcp 127.0.0.201:8000: address already in use
2013/08/21 06:54:24 client 127.0.0.105:0: connection 7998: dial tcp 127.0.0.205:8000: address already in use
                                                                             --------- +
                                                                             63964

2013/08/21 06:54:24 All clients connected, pausing now to let you investigate the process

2013/08/21 06:54:24 sockets: used 129118 TCP: inuse 129023 orphan 11 tw 2 alloc 129025 mem 1774 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0

ESTABLISHED 128997

As soon as I extend the number of source and target addresses I get the following:

2013/08/21 07:30:19 starting listeners
2013/08/21 07:30:19 listeners started, starting clients
2013/08/21 07:30:19 clients started, waiting for completion
2013/08/21 07:30:23 client 127.0.0.106:0: connection 6395: dial tcp 127.0.0.206:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.101:0: connection 6395: dial tcp 127.0.0.201:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.103:0: connection 6397: dial tcp 127.0.0.203:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.102:0: connection 6395: dial tcp 127.0.0.202:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.105:0: connection 6396: dial tcp 127.0.0.205:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.104:0: connection 6394: dial tcp 127.0.0.204:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.109:0: connection 6391: dial tcp 127.0.0.109:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.100:0: connection 6394: dial tcp 127.0.0.200:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.108:0: connection 6392: dial tcp 127.0.0.108:8000: address already in use
2013/08/21 07:30:23 client 127.0.0.107:0: connection 6391: dial tcp 127.0.0.207:8000: address already in use
                                                                             --------- +
                                                                             63940

2013/08/21 07:30:23 All clients connected, pausing now to let you investigate the process

2013/08/21 07:30:23 sockets: used 129137 TCP: inuse 129036 orphan 11 tw 0 alloc 129038 mem 1778 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0





Dave Cheney

Aug 21, 2013, 3:52:37 AM8/21/13
to Paul van Brouwershaven, Benjamin Measures, golang-nuts
OK, I don't really understand what is required now. I've demonstrated that you can hold open 120k sockets from a single process. I think this can easily be extended to a client and a server process; the rest is TCP tuning. What am I missing?

Kyle Lemons

Aug 21, 2013, 4:06:11 AM8/21/13
to Dave Cheney, Paul van Brouwershaven, Benjamin Measures, golang-nuts
Interestingly:

2013/08/21 08:00:46 starting listeners
2013/08/21 08:00:46 listeners started, starting clients
2013/08/21 08:00:46 clients started, waiting for completion
2013/08/21 08:01:52 client 127.0.0.101:0: connection 8189: dial tcp 127.0.0.201:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.100:0: connection 8189: dial tcp 127.0.0.200:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.105:0: connection 8191: dial tcp 127.0.0.205:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.102:0: connection 8189: dial tcp 127.0.0.202:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.107:0: connection 8189: dial tcp 127.0.0.207:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.106:0: connection 8189: dial tcp 127.0.0.206:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.103:0: connection 8188: dial tcp 127.0.0.203:8000: connection timed out
2013/08/21 08:01:52 client 127.0.0.104:0: connection 8190: dial tcp 127.0.0.204:8000: connection timed out
2013/08/21 08:01:52 All clients connected, pausing now to let you investigate the process

for a total of 65514.  No tuning other than what you posted in https://code.google.com/p/go/issues/detail?id=6176#c4



Paul van Brouwershaven

Aug 21, 2013, 4:14:36 AM8/21/13
to Dave Cheney, Benjamin Measures, golang-nuts
You demonstrated that you can make 120k TCP sessions: 63964 outgoing and 63964 incoming.

This is not 120k outgoing. Handling incoming connections is not a big deal, as they can all use the same port.
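To illustrate why incoming connections are cheap in terms of ports, here is a small sketch that accepts several loopback connections and reports the local port of each accepted socket. The function name acceptedLocalPorts is made up for this example; it assumes a working loopback interface.

```go
package main

import (
	"fmt"
	"net"
)

// acceptedLocalPorts opens a listener on a kernel-chosen loopback port,
// connects to it n times, and returns the listener port together with the
// local port of every accepted connection.
func acceptedLocalPorts(n int) (int, []int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, nil, err
	}
	defer ln.Close()
	listenPort := ln.Addr().(*net.TCPAddr).Port

	var ports []int
	for i := 0; i < n; i++ {
		// The client side consumes one source port per connection.
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, nil, err
		}
		defer client.Close()

		// The server side reuses the listening port for every connection.
		server, err := ln.Accept()
		if err != nil {
			return 0, nil, err
		}
		defer server.Close()
		ports = append(ports, server.LocalAddr().(*net.TCPAddr).Port)
	}
	return listenPort, ports, nil
}

func main() {
	listenPort, ports, err := acceptedLocalPorts(5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("listener port:", listenPort)
	fmt.Println("local ports of accepted connections:", ports)
}
```

Every accepted connection reports the listener's port as its local port, so only the dialing side is constrained by the number of available source ports.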

Paul van Brouwershaven

Aug 21, 2013, 4:18:50 AM8/21/13
to Kyle Lemons, Dave Cheney, Benjamin Measures, golang-nuts
On Wed, Aug 21, 2013 at 10:06 AM, Kyle Lemons <kev...@google.com> wrote:
for a total of 65514.  No tuning other than what you posted in https://code.google.com/p/go/issues/detail?id=6176#c4

This is still lower than 65535; the slight difference from exactly 65535 most likely depends on other connections running on your test machine.

But the intention is to pass this 65535, which in this case should go up to 7 (IP addresses) * 65535 (ports) = 458745 outgoing connections.

Kyle Lemons

Aug 21, 2013, 5:30:46 AM8/21/13
to Paul van Brouwershaven, Dave Cheney, Benjamin Measures, golang-nuts
Do you have a C program that can do this?  I have been playing around and still can't exceed 65518 outgoing connections even across 64 processes, regardless of how many destination IPs I use.

I'm more and more convinced that this is a linux issue, not a Go issue.


Kyle Lemons

Aug 21, 2013, 5:41:03 AM8/21/13
to Paul van Brouwershaven, Dave Cheney, Benjamin Measures, golang-nuts
Aha, here's a fun clue:

Aug 21 09:32:01 localhost kernel: [18603209.749683] nf_conntrack: table full, dropping packet.

Quick google search leads me to /proc/sys/net/ipv4/netfilter/ip_conntrack_max

which I doubled (to 131072), and now I can get 131056 consistently.
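For anyone who wants to check this limit from a program rather than by hand, a minimal sketch follows. It reads the path quoted above; that path depends on the kernel version and on the conntrack module being loaded, so treat it as an assumption. The helper name parseLimit is made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// conntrackMaxPath is the path mentioned in this thread. It may differ on
// other kernels, and it is absent when connection tracking is not loaded.
const conntrackMaxPath = "/proc/sys/net/ipv4/netfilter/ip_conntrack_max"

// parseLimit converts the raw contents of the proc file into an integer.
func parseLimit(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

func main() {
	data, err := os.ReadFile(conntrackMaxPath)
	if err != nil {
		fmt.Println("conntrack limit not readable:", err)
		return
	}
	limit, err := parseLimit(string(data))
	if err != nil {
		fmt.Println("unexpected file contents:", err)
		return
	}
	fmt.Printf("current limit: %d, doubled: %d\n", limit, limit*2)
}
```

Raising the limit itself requires root and a write to the same file (or sysctl); the sketch only reads it.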

Paul van Brouwershaven

Aug 21, 2013, 5:45:54 AM8/21/13
to Kyle Lemons, Dave Cheney, Benjamin Measures, golang-nuts
It looks like I found the problem. I went back to handling my ports manually, but now I exclude all ports that are in use by listeners on my local system, for example 27017 for MongoDB.

Then I moved the ipNext selection into the retry, as I want to try a new port and not the same port again.

Now I have 151289 connections :-)

2013/08/21 09:40:24 sockets: used 151289 TCP: inuse 151188 orphan 11 tw 4 alloc 151190 mem 1775 UDP: inuse 0 mem 0 UDPLITE: inuse 0 RAW: inuse 0 FRAG: inuse 0 memory 0
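The approach described in this message could look roughly like the sketch below. It is not the actual code from my application: the addrPool type, its methods, and the sample addresses are made up for illustration, and the port range 1024 to 65535 is the one mentioned earlier in the thread.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"syscall"
	"time"
)

const (
	minPort = 1024
	maxPort = 65535
)

// addrPool hands out source "ip:port" pairs, rotating over the IPs and
// skipping ports that local listeners already occupy.
type addrPool struct {
	ips      []string
	excluded map[int]bool
	ipIdx    int
	port     int
}

func newAddrPool(ips []string, excluded []int) *addrPool {
	p := &addrPool{ips: ips, excluded: map[int]bool{}, port: minPort}
	for _, e := range excluded {
		p.excluded[e] = true
	}
	return p
}

// next returns the next candidate source address. It loops forever if
// every port in the range is excluded, which this sketch does not guard against.
func (p *addrPool) next() string {
	for {
		ip, port := p.ips[p.ipIdx], p.port
		// Rotate the IP first; move to the next port after a full pass.
		p.ipIdx++
		if p.ipIdx == len(p.ips) {
			p.ipIdx = 0
			p.port++
			if p.port > maxPort {
				p.port = minPort
			}
		}
		if !p.excluded[port] {
			return net.JoinHostPort(ip, strconv.Itoa(port))
		}
	}
}

// dialFrom picks a fresh source address on every attempt, so a retry never
// reuses the address that just failed with "address already in use".
func dialFrom(p *addrPool, dst string, attempts int) (net.Conn, error) {
	lastErr := errors.New("no dial attempts made")
	for i := 0; i < attempts; i++ {
		laddr, err := net.ResolveTCPAddr("tcp4", p.next())
		if err != nil {
			return nil, err
		}
		d := net.Dialer{Timeout: time.Second, LocalAddr: laddr}
		conn, err := d.Dial("tcp", dst)
		if err == nil {
			return conn, nil
		}
		if !errors.Is(err, syscall.EADDRINUSE) {
			return nil, err
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	// Hypothetical source addresses; 27017 is the MongoDB listener mentioned above.
	pool := newAddrPool([]string{"127.0.0.100", "127.0.0.101"}, []int{27017})
	for i := 0; i < 3; i++ {
		fmt.Println(pool.next())
	}
}
```

The important detail is that the address is taken from the pool inside the retry loop; checking the error with errors.Is instead of matching the error string is a substitution made for this sketch.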

Paul van Brouwershaven

Aug 21, 2013, 5:47:37 AM8/21/13
to Kyle Lemons, Dave Cheney, Benjamin Measures, golang-nuts
If you remove the nf_conntrack module from your kernel (and have some fun with your manual firewall) you can go much higher still. The connection tracking module is abusing the resources of your system.