Re: Why is socket write 10 times slower than C?


wheelcomplex

May 29, 2013, 6:01:03 AM
to golan...@googlegroups.com
Would you please attach the source files for the C client (and C server)?

On Wednesday, May 29, 2013 10:47:50 AM UTC+8, Peter wrote:
Hi,
   I wrote a test case for socket write/read: client.go sends 2 bytes 10 million times, and server.go receives them.
   But it is too slow, about 10 times slower than C.
   Did I write something wrong?
   Thanks for teaching me.

   The test case is in the attachment.

   Thanks, all.

Peter

May 29, 2013, 7:09:44 AM
to golan...@googlegroups.com
Do you check for an error every time you send/receive in C?

Please post your C code.

On Wednesday, 29 May 2013 11:38:16 UTC+1, Peter wrote:
Oh,

I found the performance killer.

It's:

if err != nil {
   doSomething
}

even though execution never enters the "if" body.
Just the "err != nil" check makes it a lot slower.

Why?


On Wednesday, May 29, 2013 at 10:47:50 AM UTC+8, Peter wrote:

Michael Jones

May 29, 2013, 7:07:37 AM
to Peter, golang-nuts
Well, if "err != nil" then there is an error and you need to deal with it. ;-)


On Wed, May 29, 2013 at 3:38 AM, Peter <syu...@gmail.com> wrote:
Oh,

I found the performance killer.

It's:

if err != nil {
   doSomething
}

even though execution never enters the "if" body.
Just the "err != nil" check makes it a lot slower.

Why?


On Wednesday, May 29, 2013 at 10:47:50 AM UTC+8, Peter wrote:
Hi,
   I wrote a test case for socket write/read: client.go sends 2 bytes 10 million times, and server.go receives them.
   But it is too slow, about 10 times slower than C.
   Did I write something wrong?
   Thanks for teaching me.

   The test case is in the attachment.

   Thanks, all.

--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Julien Schmidt

May 29, 2013, 11:22:35 AM
to golan...@googlegroups.com
On Wednesday, May 29, 2013 12:38:16 PM UTC+2, Peter wrote:
I found the performance killer.

It's:

if err != nil {
   doSomething
}

even though execution never enters the "if" body.
Just the "err != nil" check makes it a lot slower.

If it is really just this 'if', it could be a branch misprediction. But modern CPUs have dynamic branch prediction, and you said that err != nil was never true, so at least after a few loop iterations the predictor should adapt...

Please post your code :) And don't forget to tell us on which OS and hardware you are testing.

Peter

May 29, 2013, 12:17:39 PM
to golan...@googlegroups.com
Hi Julien,

The code has been uploaded to the topic attachments.
My computer: OS: CentOS 64-bit, 2 Xeon CPUs @ 2.13GHz.
I said the branch is never actually executed because no error is printed.


On Wednesday, May 29, 2013 at 11:22:35 PM UTC+8, Julien Schmidt wrote:

Julien Schmidt

May 29, 2013, 12:59:47 PM
to golan...@googlegroups.com
Which Go version are you running?
Results with Go 1.1 (on Intel i5-2500K):
Windows 7 x64
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 18
Milliseconds      : 573
Ticks             : 185735875
TotalDays         : 0,000214972077546296
TotalHours        : 0,00515932986111111
TotalMinutes      : 0,309559791666667
TotalSeconds      : 18,5735875
TotalMilliseconds : 18573,5875

Ubuntu 13.04 x64
real 0m0.640s
user 0m0.332s
sys 0m0.300s

I think this is a result of the integrated network poller for Linux in Go 1.1.

Alexey Borzenkov

May 29, 2013, 2:00:23 PM
to Peter, golang-nuts
Hi,

Could it be because Go has NoDelay enabled by default, unlike sockets you would use in C? Take a look at SetNoDelay(false): http://golang.org/pkg/net/#TCPConn.SetNoDelay

Monnand

May 29, 2013, 2:28:22 PM
to golan...@googlegroups.com
On 05/29/2013 12:17 PM, Peter wrote:
> Hi Julien,
>
> The code has been uploaded to the topic attachments.
> My computer: OS: CentOS 64-bit, 2 Xeon CPUs @ 2.13GHz.
> I said the branch is never actually executed because no error is printed.
>

I can only see your Go code, but not the C code. I mean, the source code
of your server and client which is written in C programming language
instead of Go.

-Monnand

Peter

Jun 5, 2013, 3:36:47 AM
to golan...@googlegroups.com
I updated my test case, and the results look like this:

  Server    Client    Time needed (avg of 5 runs)
  golang    golang    12s
  cpp       cpp        6s
  golang    cpp       11s
  cpp       golang    13s

I used 2-byte packets and sent them 10 million times.

The cpp server command:    server 192.168.0.1 8999 2
The cpp client command:    client 192.168.0.2 8999 10000000 2

The golang server command: server -port 8999 -size 2
The golang client command: client -hostport 192.168.0.1:8999 -size 2 -times 10000000

The source code is in the attachment.



On Thursday, May 30, 2013 at 2:28:22 AM UTC+8, Monnand wrote:
client.cpp
client.go
server.cpp
server.go

steve wang

Jun 5, 2013, 9:25:31 AM
to golan...@googlegroups.com
Interesting.
On the server side, the number of recv invocations with the Go client is almost 10 times the number with the cpp client.

For cpp client:
recv system call times:940935
total recv bytes:20000000

For Go client:
recv system call times:8750028
total recv bytes:20000000

Dmitry Vyukov

Jun 5, 2013, 9:30:45 AM
to steve wang, golang-nuts
On Wed, Jun 5, 2013 at 5:25 PM, steve wang <steve....@gmail.com> wrote:
> Interesting.
> On the server side, the number of recv invocations with the Go client is
> almost 10 times the number with the cpp client.
>
> For cpp client:
> recv system call times:940935
> total recv bytes:20000000
>
> For Go client:
> recv system call times:8750028
> total recv bytes:20000000


How many bytes does each recv return in the C case, and how many in the Go case?
How frequently do the recv calls fail in the Go case, and with what errors?

steve wang

Jun 5, 2013, 9:44:17 AM
to golan...@googlegroups.com, steve wang
I only ran the cpp server on my 64-bit Linux machine, for both the cpp client and the Go client.
In both cases, no client reported an error while sending, and the server received 20000000 bytes.

Note that I have added code to check for errors in the source of the cpp client and, as I said above, no errors occurred:
while (times--)
{
    int n = send(localSocket, pBuffer, bytes, 0);
    if (n == -1) {
        cout << "failed!" << endl;
        break;
    }
}



Dmitry Vyukov

Jun 5, 2013, 9:46:43 AM
to steve wang, golang-nuts
But you ran it under strace, right?
If there are many more recv calls with the Go program, why are there more? Are
they returning less data, or are they failing?

chris dollin

Jun 5, 2013, 10:01:41 AM
to Dmitry Vyukov, steve wang, golang-nuts

On 5 June 2013 14:46, Dmitry Vyukov <dvy...@google.com> wrote:

| But you ran it under strace, right?
| If there are many more recv calls with the Go program, why are there more? Are
| they returning less data, or are they failing?

He's saying that it takes ten times longer, not that it takes
ten times as many recv calls.

Chris

--
Chris "allusive" Dollin

Dmitry Vyukov

Jun 5, 2013, 10:10:20 AM
to chris dollin, steve wang, golang-nuts
On Wed, Jun 5, 2013 at 6:01 PM, chris dollin <ehog....@googlemail.com> wrote:
>
> On 5 June 2013 14:46, Dmitry Vyukov <dvy...@google.com> wrote:
>
> | But you ran it under strace, right?
> | If there are many more recv calls with the Go program, why are there more? Are
> | they returning less data, or are they failing?
>
> He's saying that it takes ten times longer, not that it takes
> ten times as many recv calls.

OK, then what is the number of calls?

steve wang

Jun 5, 2013, 10:21:59 AM
to golan...@googlegroups.com, Dmitry Vyukov, steve wang, ehog....@googlemail.com
Sorry for my poor English. I meant the number of invocations.
That is:
940935 vs 8750028

Dmitry Vyukov

Jun 5, 2013, 10:27:41 AM
to steve wang, golang-nuts, chris dollin
On Wed, Jun 5, 2013 at 6:21 PM, steve wang <steve....@gmail.com> wrote:
> Sorry for my poor English. I meant the number of invocations.
> That is:
> 940935 vs 8750028

Please look at the strace output to determine why there are 10x as many recv
calls: are they returning less data? Are they failing with errors?

steve wang

Jun 5, 2013, 10:34:55 AM
to golan...@googlegroups.com, chris dollin, steve wang
I have shortened the source code for the sake of convenience.
The programs are now easy to build and run.

=====================
build:
$ g++ -o server_cpp server.cpp 
$ g++ -o client_cpp client.cpp
$ go build -o client_go client.go

=====================
run server and cpp client:
$ ./server_cpp
recv system call times: 998808
recv bytes: 20000000

$ time ./client_cpp
real 0m3.678s
user 0m0.224s
sys 0m3.440s

=====================
run server and go client:
$ ./server_cpp
recv system call times: 8877231
recv bytes: 20000000

$ time ./client_go
begin send...
client done!

real 0m23.028s
user 0m5.036s
sys 0m17.929s
client.cpp
client.go
server.cpp

Dmitry Vyukov

Jun 5, 2013, 10:46:06 AM
to steve wang, golang-nuts, chris dollin
Looks like it's because of NoDelay (as suggested earlier)

steve wang

Jun 5, 2013, 10:47:56 AM
to golan...@googlegroups.com, steve wang, chris dollin
No errors were reported on either the server side or the client side in any of my runs.
Apparently the server receives less data per recv when the Go client is connected, and I have confirmed this by printing the length of the received data.
For cpp client:
The server receives 30-40 bytes every time.
For go client:
The server receives only about 2 bytes every time.

The server is kept the same during testing.
 

 

steve wang

Jun 5, 2013, 11:20:21 AM
to golan...@googlegroups.com

Craig Mason-Jones

Jun 5, 2013, 7:42:05 PM
to golan...@googlegroups.com
Hi,

The issue is with TCP_NODELAY. It seems that C, by default, does not use TCP_NODELAY, which Go, by default, does.

I've attached 2 clients (go and C) that switch TCP_NODELAY - so the Go client turns it OFF, and the C client turns it ON. On my system, this inverts the performance - my Go client takes about 23s and my C client about 3 minutes. (in the original, C was about 19s, and Go about 3 min)

All the best,
Craig
client.go
client.cpp

steve wang

Jun 5, 2013, 9:39:39 PM
to golan...@googlegroups.com
If you switch TCP_NODELAY OFF for both clients, what is the result?
In my test, the Go client is still 3x slower.
I have added comments on the issue page.

dlin

Jun 5, 2013, 11:29:12 PM
to golan...@googlegroups.com
I packed the source and a Makefile together. 

The test result is similar, but the Go version is much slower in my testing (for cpp, I used the -O2 option).

$ ./server_cpp  # use client2_go (NoDelay is true)
recv system call times: 6901961
recv bytes: 20000000

$ ./server_cpp # use client_go (NoDelay is false)
recv system call times: 2015287
recv bytes: 20000000

$ ./server_cpp   # use client_cpp
recv system call times: 364847
recv bytes: 20000000
slowSend.tgz

Jesse McNelis

Jun 5, 2013, 11:53:14 PM
to dlin, golang-nuts
On Thu, Jun 6, 2013 at 1:29 PM, dlin <dli...@gmail.com> wrote:
> I packed the source and a Makefile together.
>
> The test result is similar, but the Go version is much slower in my testing
> (for cpp, I used the -O2 option).

The other main difference between the Go version and the C version
here is that Go uses poll() under the hood and the C version doesn't.
Calling non-blocking IO is slow, but on real world loads this cost is
tiny because you don't actually do IO very often.


--
=====================
http://jessta.id.au

steve wang

Jun 6, 2013, 2:44:51 AM
to golan...@googlegroups.com, dlin, jes...@jessta.id.au
Agreed. A program in real world rarely keeps sending data without receiving or doing other I/Os.
Hence this issue is not a serious problem.
 


Peter

Jun 6, 2013, 6:24:17 AM
to golan...@googlegroups.com, dlin, jes...@jessta.id.au
Bad luck...

I need to collect data from a large number of clients, then forward it to other nodes via pub/sub,

so I will need high-performance network I/O.



On Thursday, June 6, 2013 at 2:44:51 PM UTC+8, steve wang wrote:

Dmitry Vyukov

Jun 6, 2013, 6:35:29 AM
to Peter, golang-nuts, dlin, Jesse McNelis
On Thu, Jun 6, 2013 at 2:24 PM, Peter <syu...@gmail.com> wrote:
> Bad luck...
>
> I need to collect data from a large number of clients, then
> forward it to other nodes via pub/sub,
>
> so I will need high-performance network I/O.


If the number of clients is really large, then you cannot use the C++
code, because it requires a thread per connection. The Go code does not
have this limitation.
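
As a rough illustration of that point, here is a sketch of a Go server that handles each connection in its own goroutine. The names `serve` and `runClients`, the loopback address, and the client count are assumptions made for this example and do not come from the attached server.go.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// serve accepts connections until ln is closed and drains each one in its
// own goroutine, reporting the bytes received on results.
func serve(ln net.Listener, results chan<- int64) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func(c net.Conn) {
			defer c.Close()
			n, _ := io.Copy(io.Discard, c)
			results <- n
		}(conn)
	}
}

// runClients (hypothetical helper) starts a local server, connects
// `clients` concurrent clients that each send msg once, and returns the
// total number of bytes the server received.
func runClients(clients int, msg []byte) (int64, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	results := make(chan int64, clients)
	go serve(ln, results)

	errs := make(chan error, clients)
	for i := 0; i < clients; i++ {
		go func() {
			c, err := net.Dial("tcp", ln.Addr().String())
			if err != nil {
				errs <- err
				return
			}
			defer c.Close()
			_, err = c.Write(msg)
			errs <- err
		}()
	}
	for i := 0; i < clients; i++ {
		if err := <-errs; err != nil {
			return 0, err
		}
	}

	var total int64
	for i := 0; i < clients; i++ {
		total += <-results
	}
	return total, nil
}

func main() {
	total, err := runClients(50, []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println("total bytes received:", total)
}
```

Each handler goroutine blocks in `io.Copy` while its connection is idle, and the Go runtime's network poller multiplexes those waits instead of dedicating an OS thread to each connection.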

Jesse McNelis

Jun 6, 2013, 7:02:35 AM
to steve wang, golang-nuts, dlin
On Thu, Jun 6, 2013 at 4:44 PM, steve wang <steve....@gmail.com> wrote:
> Agreed. A program in real world rarely keeps sending data without receiving
> or doing other I/Os.
> Hence this issue is not a serious problem.

A real world program doesn't send 2 bytes per syscall 10 million times.
It's much more likely to buffer that and just do one send.

Syscalls are always expensive, they are so much more expensive
than doing pretty much anything else that it's kind of irrelevant how
expensive they are.


--
=====================
http://jessta.id.au

Anssi Porttikivi

Jun 6, 2013, 11:44:53 AM
to golan...@googlegroups.com
I was expecting John Nagle here to comment on TCP_NODELAY usage ;-)

Craig Mason-Jones

Jun 6, 2013, 7:42:39 PM
to golan...@googlegroups.com
Hi,

I'm on a MacBook Pro, running OS X 10.8.4, with 16G memory (not that that should be relevant in this case, just bragging!).

I'm getting the C++ version (compiled with -O2) taking 14s, and the Go version taking 22s.

Go's definitely slower, but not 3x slower.

All the best,
Craig

dlin

Jun 6, 2013, 9:43:22 PM
to golan...@googlegroups.com
I changed the code from 2-byte writes to 100-byte writes, using 100 bytes to simulate my real use case.

100 bytes * 200000 times
$ ./server_cpp  # CPU(97% real:0.19s user:0.01s sys:0.18s) Mem(max:4kB avg:0kB) pf:0 ./client_cpp
recv system call times: 6557
recv bytes: 20000000

$ ./server_cpp # CPU(99% real:0.36s user:0.15s sys:0.21s) Mem(max:4kB avg:0kB) pf:0 ./client_go
recv system call times: 10571
recv bytes: 20000000

100 bytes * 10000000 times
$ ./server_cpp # CPU(99% real:6.62s user:0.47s sys:6.12s) Mem(max:4kB avg:0kB) pf:0 ./client_cpp
recv system call times: 308790
recv bytes: 1000000000

$ ./server_cpp # CPU(99% real:15.69s user:5.70s sys:9.96s) Mem(max:4kB avg:0kB) pf:0 ./client_go
recv system call times: 530584
recv bytes: 1000000000

The difference shrinks as the system call count goes down. My guesses as to why the Go version is slower:
1. It is based on epoll, which requires more system calls.
2. For every system call, Go's overhead is higher than C's. (It requires converting parameter types.)
3. Go still has room to optimize its runtime speed.

John Nagle

Jun 7, 2013, 6:35:21 PM
to golan...@googlegroups.com
On 5/29/2013 8:22 AM, Julien Schmidt wrote:
> On Wednesday, May 29, 2013 12:38:16 PM UTC+2, Peter wrote:
>
>> I found the performance killer.
>>
>> It's:
>>
>> if err != nil {
>>    doSomething
>> }
>>
>> even though execution never enters the "if" body.
>> Just the "err != nil" check makes it a lot slower.
>>
>
> If it is really just this 'if', it could be a branch misprediction.
> But modern CPUs have dynamic branch prediction, and you said that err != nil
> was never true, so at least after a few loop iterations the predictor
> should adapt...

No way can that be the bottleneck, even if the branch prediction is
bad. A socket write will take hundreds to thousands of times longer
than a single if statement.

John Nagle

John Nagle

Jun 7, 2013, 8:42:49 PM
to golan...@googlegroups.com
On 5/29/2013 11:00 AM, Alexey Borzenkov wrote:
> Hi,
>
> Could it be because Go has NoDelay enabled by default, unlike sockets you
> would use in C? Take a look at SetNoDelay(false):
> http://golang.org/pkg/net/#TCPConn.SetNoDelay

That's probably it. The original poster says he is repeatedly writing
2 bytes to a socket. With Go's default, each of those two bytes goes
out in a separate packet. Normally, they'd buffer up until an ACK
came back or a full packet was reached. That's the classic tinygram
case, the worst case for running with NoDelay enabled. Network
traffic goes up by an order of magnitude or more.

John Nagle
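
A back-of-the-envelope check of the "order of magnitude" figure above, written as a small Go program. The 40-byte header size (20-byte IPv4 header plus 20-byte TCP header, no options) and the 1460-byte maximum segment size are assumptions for illustration; ACK traffic and link-layer framing are ignored, and `wireBytes` is a hypothetical helper.

```go
package main

import "fmt"

const (
	headerBytes = 40   // assumed: 20-byte IPv4 header + 20-byte TCP header, no options
	mss         = 1460 // assumed maximum segment size on Ethernet
)

// wireBytes returns payload plus header bytes for `writes` payloads of
// `size` bytes when each segment carries at most perSegment payload bytes.
func wireBytes(writes, size, perSegment int) int {
	payload := writes * size
	segments := (payload + perSegment - 1) / perSegment // ceiling division
	return payload + segments*headerBytes
}

func main() {
	const writes, size = 10000000, 2
	tiny := wireBytes(writes, size, size) // one segment per 2-byte write
	full := wireBytes(writes, size, mss)  // payload packed into full segments
	fmt.Printf("one packet per write: %d bytes\n", tiny)
	fmt.Printf("full segments:        %d bytes\n", full)
	fmt.Printf("ratio: %.1fx\n", float64(tiny)/float64(full))
}
```

Full segments are the best case. The 30-40 bytes per recv reported earlier in the thread for the cpp client suggest that real coalescing falls somewhere between the two figures.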

Julien Schmidt

Jun 8, 2013, 3:07:28 AM
to golan...@googlegroups.com, na...@animats.com
Yes, forget that.
I think working with slices in Go usually adds a lot of mispredicted branches because of the bounds checks. It may cost a few %, but it doesn't slow down Go by such a magnitude.
This single 'if' can't be the reason - at least if you don't ignore errors there.