Poor performance of net.Dial() & friends


s...@uber.com

Jan 10, 2017, 4:04:33 PM
to golang-nuts
Hi Gophers,

My problem domain is such that I need to make a large number of TCP connections from a small set of hosts to many other hosts (targets) on a local network. The connections are short-lived, usually <200ms, and transfer <100 bytes in each direction. I need to do about 100k connections/second per source host.

The numbers below are all from a 24-core Intel machine running Linux with Go 1.7.3, cross-compiled from OS X. The machine has a multi-queue NIC with RSS enabled. The targets are multiple machines running Go servers listening on 200 ports each (to avoid 5-tuple exhaustion).

My Go code [1] spawns N goroutines, each of which calls net.Dial(), performs the transaction, and then sleeps for 1s.

With this approach, GOMAXPROCS=1 can't sustain 10k conns/second without triggering connection timeouts at a 400ms deadline. Similarly, GOMAXPROCS=24 can't sustain 100k conns/second. Removing the context timeout passed to Dial() improves performance to the point where GOMAXPROCS=1 can do 10k conns/second at a 1% timeout rate with a 200ms deadline.

I've written a C++ solution that uses N threads, each using epoll. Targets are assigned to threads, and each socket then stays local to its thread for the duration of the transaction. On the same host, a single thread can do 20k conns/second with a 0.12% timeout rate at a 200ms deadline. 6 threads at 10k conns/second each produce <2% timeouts at 200ms; with 16 threads at 10k each, <2% of requests exceed 200ms and <0.5% exceed 300ms.

I believe the Go solution suffers from at least two issues:

i) net.Dial() is fairly expensive, both in terms of allocations & syscalls. [2]
ii) syscalls cause the goroutine to be rescheduled, bouncing the work for a single socket across CPU cores and hurting locality. Correct me if I'm wrong here, but from my reading that's what's occurring.


I've tried a number of workarounds:

- Using net.DialTCP(): at GOMAXPROCS=4 and 40k conns/second, all requests complete in <200ms. That's an improvement, but it doesn't allow me to provide a timeout.
- Exposing the internal net.dialTCP() directly: this gives 5% timeouts at 200ms with GOMAXPROCS=4 at 40k conns/second. Setting GOMAXPROCS=24 produces a 0% timeout rate and scales up to 80k conns/second before timeouts start appearing (1% at 100k conns/second). This is the best option I've found so far, but it requires use of an internal API.
- Using syscall.Socket() directly: the problem here is receiving notification when the socket is writable (connected). There doesn't appear to be a way to hook into the netpoller. I wrote a solution using the syscall package's epoll functions directly, but that had even worse performance than the native Go solution.

Does anyone have suggestions on speeding this up? I'd prefer to keep this component written in Go, but I'm running out of options to meet the performance & efficiency targets.

Thanks,


Simon N


BenchmarkDial/dialer.DialContext-8 1000          1344 B/op     28 allocs/op
BenchmarkDial/net.Dial-8           3000           863 B/op     20 allocs/op
BenchmarkDial/net.DialTCP-8        2000           638 B/op     15 allocs/op
BenchmarkDial/net.DialTimeout-8    2000          1344 B/op     28 allocs/op
BenchmarkDial/net.dialTCP-8        1000          1120 B/op     23 allocs/op

Dave Cheney

Jan 10, 2017, 4:20:00 PM
to golang-nuts
Dumb question, what about your design prevents you from pooling and reusing connected sockets?

Dave Cheney

Jan 10, 2017, 4:24:51 PM
to golang-nuts
Second dumb question, if your messages are 100 bytes long, why not use UDP?

s...@uber.com

Jan 10, 2017, 5:28:34 PM
to golang-nuts


On Tuesday, January 10, 2017 at 1:24:51 PM UTC-8, Dave Cheney wrote:

> Dumb question, what about your design prevents you from pooling and reusing connected sockets?

The requirements call for a new TCP connection each time. I proposed reusing connections but the customer isn't happy with that solution long term.

> Second dumb question, if your messages are 100 bytes long, why not use UDP?

The protocol is unfortunately not under my control. What's in the gist is not the actual protocol but an approximation. 

Simon
