You need to be cautious when you are using averages. It would be far better to run each test 10k-100k times and then run a Student's t-test on the two distributions (assuming the variances look similar and the data looks roughly normal; further tests might be needed to check that).
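As a rough sketch of what that comparison could look like (assuming the per-run timings for both programs have already been collected into files, one number per line; the file names and run counts here are made up):

    import numpy as np
    from scipy import stats

    # Hypothetical per-run latencies in microseconds, one value per line,
    # collected from e.g. 10k runs of each program.
    c_times = np.loadtxt("c_times.txt")
    go_times = np.loadtxt("go_times.txt")

    # Check the assumptions first: roughly normal shape, similar variances.
    print(stats.shapiro(c_times[:5000]))    # normality check (p-value unreliable for huge n)
    print(stats.shapiro(go_times[:5000]))
    print(stats.levene(c_times, go_times))  # equal-variance check

    # Student's t-test assumes equal variances; Welch's (equal_var=False) does not.
    t, p = stats.ttest_ind(c_times, go_times, equal_var=True)
    print(f"t = {t:.3f}, p = {p:.4g}")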
The problem is that an average can hide the fact that the program is multi-modal: it often answers slightly faster than the measured average (say 150us), but once in a while it falls apart completely and takes 400us or more. In those cases it is usually better to work on the full data set rather than the average. If you don't see drops like these, I'd still be cautious, since a real-world system is likely to get into situations where your program doesn't get to run at all. Bootstrap methods can generally be used for outlier detection in the data set; a sketch follows below.
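One rough way to see whether the distribution hides a slow mode (a sketch, not the only approach; the file name is again hypothetical) is to look at the raw quantiles and bootstrap a statistic instead of trusting the mean:

    import numpy as np

    rng = np.random.default_rng(0)
    times = np.loadtxt("go_times.txt")  # hypothetical file of per-run timings

    # Tail quantiles expose a bimodal latency profile that the mean hides:
    # most runs near 150us, with occasional 400us+ runs showing up at p99/p99.9.
    for q in (0.5, 0.9, 0.99, 0.999):
        print(f"p{q*100:g}: {np.quantile(times, q):.1f} us")

    # Simple bootstrap: resample with replacement and see how much the mean
    # (or any other statistic) moves around; outliers widen this interval a lot.
    boot_means = np.array([
        rng.choice(times, size=times.size, replace=True).mean()
        for _ in range(10_000)
    ])
    lo, hi = np.quantile(boot_means, [0.025, 0.975])
    print(f"bootstrap 95% CI for the mean: [{lo:.1f}, {hi:.1f}] us")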
A good start is to store the data in files and run Poul-Henning Kamp's 'ministat' tool on the data set (it ships with FreeBSD, and there are ports for other systems). You can also use R, in which case you also get nice plots.
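A minimal sketch of getting the data into that shape, one measurement per line, which is what ministat (and R's read.table) can consume directly; the benchmarked function and file name are placeholders:

    import time

    def benchmark_once():
        # placeholder for whatever you are actually measuring
        sum(range(10_000))

    # Write one measurement per line; you can then compare two such files with
    # e.g.: ministat c_times.txt go_times.txt
    with open("run_times.txt", "w") as f:
        for _ in range(10_000):
            t0 = time.perf_counter_ns()
            benchmark_once()
            f.write(f"{(time.perf_counter_ns() - t0) / 1000:.1f}\n")  # microseconds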
If I were to guess at a reason, the C program is doing less work than the Go program. OTOH, that also means the C program has limits if you have multiple of these routines doing work like this. So you are paying a higher initial constant cost in exchange for some flexibility in the long run.