Tuning net package efficiency and speed


Eric Z

Jan 28, 2014, 6:33:42 PM
to golan...@googlegroups.com
Hello Golang-nuts,

Where can I find material on the performance of the net package? I have two machines connected with 10GbE cards, and with straightforward C socket code I get a TCP throughput of 1 GByte/s with less than 50% CPU utilization.

However, when I use the following Go code to send large data over the 10GbE link, I only get 0.9 GByte/s, and, worse, the CPU utilization is 100%. I need the CPU utilization to stay low so that other tasks can run in parallel.

Can anyone give me some advice on how to tune Go's net performance to an efficiency similar to that of the C code?

Thanks a lot.

Server/receiver code
http://play.golang.org/p/X_7tM3dpdL

Client/sender code
http://play.golang.org/p/h4LrW7UfGI



Dave Cheney

Jan 28, 2014, 9:17:07 PM
to Eric Z, golan...@googlegroups.com
Hello,

There are a lot of variables here, not just swapping Go for C.

Could you please provide some more details?

What OS and kernel?
Which version of Go?
What NIC?
Is the NIC operating in polled or IRQ mode? Are you using jumbo frames?
How many cores do the target machines have?
What is the CPU breakdown when the tests are running, i.e. mpstat or vmstat?

Cheers

Dave

Dmitry Vyukov

Jan 29, 2014, 1:41:11 AM
to Eric Z, golang-nuts
What does the CPU profile look like?

Try using a smaller buffer size for reads and writes, e.g. 64K.

Eric Z

Jan 29, 2014, 12:36:19 PM
to golan...@googlegroups.com, Eric Z
Thank you, Dave.

It's CentOS 6.4 with kernel 2.6.32. Go is version 1.2, and the NIC is an Intel 82599EB 10-Gigabit. Each machine has twelve cores, so I believe I have enough CPU resources. The OS, kernel, and NIC settings are the same in the C and Go experiments, which is why I suspect the problem lies in how the net package is configured in Go.

andrey mirtchovski

Jan 29, 2014, 12:50:46 PM
to Eric Z, golang-nuts
show us the C code too.

Eric Z

Jan 29, 2014, 1:06:24 PM
to golan...@googlegroups.com, Eric Z
Thank you all.

Here is what the CPU profile looks like. The Go code seems to keep the CPU about 40% busier with the same underlying OS, kernel, and setup.

When I use the C code, the CPU seems to be less busy. (Note: I have 12 physical cores with hyperthreading turned on, so you see 24 logical CPUs there.) Here is the mpstat sample while the C code is running.

10:00:46 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:00:50 AM  all    0.01    0.00    2.24    0.09    0.00    0.32    0.00    0.00   97.33
10:00:50 AM    0    0.00    0.00   61.27    2.60    0.00    8.96    0.00    0.00   27.17
10:00:50 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM    9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   13    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   15    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   19    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
10:00:50 AM   20    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   22    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:00:50 AM   23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00


Here is the mpstat sample while the Go code is running.

09:59:12 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
09:59:16 AM  all    0.34    0.00    3.23    0.24    0.01    0.61    0.00    0.00   95.57
09:59:16 AM    0    2.22    0.00   56.51    0.00    0.00   16.07    0.00    0.00   25.21
09:59:16 AM    1    1.79    0.00    9.74    0.00    0.00    0.00    0.00    0.00   88.46
09:59:16 AM    2    1.02    0.00   11.48    0.00    0.00    0.00    0.00    0.00   87.50
09:59:16 AM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM    6    0.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00   99.50
09:59:16 AM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM    8    2.07    0.00    2.33    0.00    0.00    0.00    0.00    0.00   95.61
09:59:16 AM    9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   12    0.25    0.00    0.25    5.75    0.00    0.00    0.00    0.00   93.75
09:59:16 AM   13    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
09:59:16 AM   14    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
09:59:16 AM   15    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   19    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
09:59:16 AM   20    1.01    0.00    1.52    0.00    0.00    0.00    0.00    0.00   97.47
09:59:16 AM   21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   22    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:59:16 AM   23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Skip Tavakkolian

Jan 29, 2014, 3:52:27 PM
to Eric Z, golang-nuts
can you test with iperf and compare its measurement with the one from this tool (in Go):


on identical setup the results between it and iperf are nearly identical.

Eric Z

Jan 29, 2014, 8:10:22 PM
to golan...@googlegroups.com, Eric Z
Thank you Skip.

It seems like a nice program, and better than what I have for now. The CPU usage is still pretty high. I can almost reach 10 Gb/s, but the throughput doesn't seem very stable. The round trip doesn't work, though.

Thanks a lot for sharing the code. Let me take a look at how you implemented it.

Cheers,
Eric.

Skip Tavakkolian

Jan 30, 2014, 2:17:11 AM
to Eric Z, golang-nuts
yes, UDP and RTT have UI stubs but no code behind them yet.

Dmitry Vyukov

Jan 30, 2014, 4:31:49 AM
to Eric Z, golang-nuts
Do you set GOMAXPROCS? Don't set GOMAXPROCS for this program.


And I meant a profile of the program itself, i.e. pprof or perf.

Eric Z

Jan 30, 2014, 1:58:31 PM
to Dmitry Vyukov, golang-nuts
Hi guys,

After looking through Skip's code, I figured out that what actually makes the difference is using net.DialTCP instead of net.Dial, and net.ListenTCP instead of net.Listen, for a high-performance TCP connection. I guess that when the net package wraps the TCP connection in its generic connection type, there is some obvious waste. Anyway, thanks again to all of you for the help.

Cheers,
Eric.