can Go http server be faster?


ChrisLu

unread,
Oct 13, 2012, 7:09:19 AM10/13/12
to golan...@googlegroups.com
It seems the simplest web server, printing just "Hello World", is not comparable to nginx, apache, etc. I can get about 3000 requests/second from Go with GOMAXPROCS set to the number of CPUs, but nginx easily serves a larger static file at about 5000 requests/second.

I profiled the program; here is the result, with the source code at the end. Can someone please explain why syscall.Syscall and runtime.futex are taking so many cycles? Can it be improved?

(pprof) top
Total: 6053 samples
    2243  37.1%  37.1%     2249  37.2% syscall.Syscall
    2153  35.6%  72.6%     2153  35.6% runtime.futex
     443   7.3%  79.9%      446   7.4% syscall.Syscall6
      92   1.5%  81.5%       92   1.5% bytes.IndexByte
      86   1.4%  82.9%      159   2.6% scanblock
      81   1.3%  84.2%       81   1.3% syscall.RawSyscall6
      68   1.1%  85.3%       68   1.1% runtime.usleep
      37   0.6%  86.0%       67   1.1% sweep
      35   0.6%  86.5%       35   0.6% runtime.memmove
      33   0.5%  87.1%       74   1.2% runtime.MCache_Alloc
(pprof) top --cum
Total: 6053 samples
       0   0.0%   0.0%     5350  88.4% schedunlock
       3   0.0%   0.0%     2957  48.9% net/http.(*conn).serve
    2243  37.1%  37.1%     2249  37.2% syscall.Syscall
    2153  35.6%  72.7%     2153  35.6% runtime.futex
       0   0.0%  72.7%     1715  28.3% runtime.futexwakeup
       0   0.0%  72.7%     1706  28.2% runtime.notewakeup
       1   0.0%  72.7%     1638  27.1% net/http.(*response).finishRequest
       1   0.0%  72.7%     1628  26.9% bufio.(*Writer).Flush
       1   0.0%  72.7%     1628  26.9% net.(*TCPConn).Write
       2   0.0%  72.8%     1628  26.9% net.(*netFD).Write

The source code server.go:

package main

import (
	"flag"
	"log"
	"net/http"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

func HelloServer(w http.ResponseWriter, req *http.Request) {
	w.Header().Set("Content-Type", "text/plain")
	w.Write([]byte("hello, world!\n"))
}

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU())

	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		pprof.StartCPUProfile(f)
		go func() {
			time.Sleep(100 * time.Second)
			pprof.StopCPUProfile()
		}()
	}

	http.HandleFunc("/", HelloServer)

	srv := &http.Server{
		Addr:        ":8080",
		Handler:     http.DefaultServeMux,
		ReadTimeout: 5 * time.Second,
	}
	srv.ListenAndServe()
}

Rémy Oudompheng

unread,
Oct 13, 2012, 7:17:48 AM10/13/12
to ChrisLu, golan...@googlegroups.com
On 2012/10/13 ChrisLu <chri...@gmail.com> wrote:
> Seems the simplest web server printing just "Hello World" is not comparable
> to nginx, apache, etc. I can get about 3000 request/second for Go with max
> procs set to number of CPUs, but easily get nginx serving a larger static
> file about 5000 request/second.
>
> I profiled the program. Here is the result. I added the source code at the
> end also. Can someone please explain why the syscall.Syscall and
> runtime.futex are taking so much cycles? Can it be improved?

I think it was recently improved. But remember that optimizing a
micro-benchmark might not be the wisest thing to do.

Rémy.

Jesse McNelis

unread,
Oct 13, 2012, 7:41:16 AM10/13/12
to ChrisLu, golan...@googlegroups.com
On Sat, Oct 13, 2012 at 10:09 PM, ChrisLu <chri...@gmail.com> wrote:
> I profiled the program. Here is the result. I added the source code at the
> end also. Can someone please explain why the syscall.Syscall and
> runtime.futex are taking so much cycles? Can it be improved?

It's probably the scheduler. When you're doing no calculations, having
more threads will usually degrade performance: the goroutines spend all
their time moving between threads.
You'll likely get better performance with GOMAXPROCS=1.

By default the http pkg uses chunked encoding if you don't specify a
content length, and chunked encoding has some overhead.



--
=====================
http://jessta.id.au

ChrisLu

unread,
Oct 13, 2012, 3:17:07 PM10/13/12
to golan...@googlegroups.com, ChrisLu
It is much faster than before, but I still think it is not good enough when all it does is serve "Hello World".

Although it is a micro-benchmark, it is a very common use case. And it seems related to goroutine scheduling contention, as in this bug:

http://code.google.com/p/go/issues/detail?id=2933

Rémy Oudompheng

unread,
Oct 13, 2012, 3:25:44 PM10/13/12
to ChrisLu, golan...@googlegroups.com
On 2012/10/13 ChrisLu <chri...@gmail.com> wrote:
> It is much faster than before, but still not I think is not good enough when
> all it does is just "Hello World".
>
> Although it is a micro-benchmarking, it is a very common use case. And it
> seems related to the goroutine scheduling contention as in this bug:
>
> http://code.google.com/p/go/issues/detail?id=2933

You didn't explain the new benchmark results nor how you came to that
conclusion.

Rémy.

ChrisLu

unread,
Oct 13, 2012, 3:33:39 PM10/13/12
to golan...@googlegroups.com, ChrisLu
I was referring to the benchmark difference between a pre-1.0 Go version from one year ago and the current Go version. See this link: https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/zeLMYnjO_JA

Chris 

ChrisLu

unread,
Oct 13, 2012, 3:57:26 PM10/13/12
to golan...@googlegroups.com, ChrisLu, jes...@jessta.id.au
Setting GOMAXPROCS to 1 does not seem to affect the performance much at all, and the profiling results remain almost the same.

I tried setting the content length too, but it seems to have no effect on performance or profiling either. I think since the chunk size is so small, the overhead does not show up.

Chris

(pprof) top
Total: 7143 samples
    2670  37.4%  37.4%     2836  39.7% syscall.Syscall
    2561  35.9%  73.2%     2561  35.9% runtime.futex
     548   7.7%  80.9%      586   8.2% syscall.Syscall6
     109   1.5%  82.4%      109   1.5% bytes.IndexByte
     106   1.5%  83.9%      106   1.5% syscall.RawSyscall6
      96   1.3%  85.3%      163   2.3% scanblock
      60   0.8%  86.1%       60   0.8% runtime.usleep
      48   0.7%  86.8%       80   1.1% sweep
      34   0.5%  87.2%      346   4.8% runtime.mallocgc
      32   0.4%  87.7%       70   1.0% runtime.MCache_Alloc
(pprof) top --cum
Total: 7143 samples
       0   0.0%   0.0%     6350  88.9% schedunlock
       3   0.0%   0.0%     3572  50.0% net/http.(*conn).serve
    2670  37.4%  37.4%     2836  39.7% syscall.Syscall
    2561  35.9%  73.3%     2561  35.9% runtime.futex
       5   0.1%  73.3%     2035  28.5% runtime.futexwakeup
       0   0.0%  73.3%     2019  28.3% runtime.notewakeup
       2   0.0%  73.4%     1953  27.3% net/http.(*response).finishRequest
       4   0.1%  73.4%     1948  27.3% net.(*TCPConn).Write
       1   0.0%  73.4%     1947  27.3% bufio.(*Writer).Flush
       2   0.0%  73.5%     1944  27.2% net.(*netFD).Write

Dave Cheney

unread,
Oct 15, 2012, 6:52:39 PM10/15/12
to jli.ju...@gmail.com, golan...@googlegroups.com
> I'm getting the same result here.. I'm trying to build a high-performance
> web server for a particular application, but I can hardly justify it when
> it's slower than the previous C++ thread-per-connection one I'm using. Any
> more

That is very concerning. Please post your test code so others can
attempt to reproduce your results.

Dave

ChrisLu

unread,
Oct 15, 2012, 10:20:20 PM10/15/12
to golan...@googlegroups.com, jli.ju...@gmail.com
The code should be just basic "Hello World", as seen in the original post.

I also profiled the execution graph here:

Dave Cheney

unread,
Oct 15, 2012, 10:30:37 PM10/15/12
to ChrisLu, golan...@googlegroups.com, jli.ju...@gmail.com
Do you have a graph with GOMAXPROCS unset?

Related: resolving https://code.google.com/p/go/issues/detail?id=3412
may reduce the amount of time spent in Write by avoiding a scheduler
call.

Dave Cheney

unread,
Oct 15, 2012, 11:04:18 PM10/15/12
to jli.ju...@gmail.com, golan...@googlegroups.com
Thank you for posting your code. Please describe your benchmark harness.

On Tue, Oct 16, 2012 at 2:02 PM, <jli.ju...@gmail.com> wrote:
> I accidentally posted my previous message without finishing, I meant to say
> "Any more information about this would be much appreciated."
>
> Anyways, here's the code I'm using. I have a custom handler because this is
> stripped out of a bigger application.
>
> ////////////////////////////////////
> package main
>
> import (
> 	"fmt"
> 	"io"
> 	"net/http"
> 	"os"
> 	"os/signal"
> 	"runtime"
> 	"runtime/pprof"
> 	"strconv"
> )
>
> type httpHandler struct{}
>
> func (handler *httpHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
> 	response := "Test content."
> 	w.Header().Add("Content-Type", "text/plain")
> 	w.Header().Add("Content-Length", strconv.Itoa(len(response)))
> 	io.WriteString(w, response)
> }
>
> func main() {
> 	runtime.GOMAXPROCS(1) // runtime.NumCPU()
>
> 	profile := true
>
> 	if profile {
> 		f, _ := os.Create("profile.cpu")
> 		pprof.StartCPUProfile(f)
> 	}
>
> 	go func() {
> 		c := make(chan os.Signal, 1)
> 		signal.Notify(c, os.Interrupt)
> 		<-c
>
> 		if profile {
> 			pprof.StopCPUProfile()
> 		}
>
> 		fmt.Println("Caught interrupt.. shutting down.")
> 		os.Exit(0)
> 	}()
>
> 	handler := &httpHandler{}
> 	server := &http.Server{
> 		Addr:    ":8080",
> 		Handler: handler,
> 	}
> 	server.ListenAndServe()
> }
> ////////////////////////////////////
>
> And here's the pprof output:
> https://dl.dropbox.com/u/11537896/pprof11533.0.svg
>
> Any help optimizing this would be awesome!

Chris Lu

unread,
Oct 15, 2012, 11:11:59 PM10/15/12
to Dave Cheney, golan...@googlegroups.com, jli.ju...@gmail.com
Actually I tried several approaches, such as setting/unsetting GOMAXPROCS and content-length, but all got similar results.

This graph is without GOMAXPROCS setting.

http://postimage.org/image/aurx4vvmn/full/

With the code, you should be able to get similar results easily.

Chris

Dave Cheney

unread,
Oct 16, 2012, 12:44:18 AM10/16/12
to Chris Lu, golan...@googlegroups.com, jli.ju...@gmail.com
I think the best solution is to resolve
https://code.google.com/p/go/issues/detail?id=3412, this will reduce
the amount of scheduler thrashing.

Dave Cheney

unread,
Oct 18, 2012, 6:53:16 AM10/18/12
to Chris Lu, golan...@googlegroups.com, jli.ju...@gmail.com
Chris, Justin,

If you are able, could you try this CL which partially addresses issue 3412.

http://codereview.appspot.com/6739043

Benchmark results and profile svgs would be great as I don't have a
test harness that can generate enough load.

Cheers

Dave

ChrisLu

unread,
Oct 18, 2012, 3:22:59 PM10/18/12
to golan...@googlegroups.com, Chris Lu, jli.ju...@gmail.com
Here is the profiling for GOMAXPROCS=1 and GOMAXPROCS=#ofCPUs, respectively, with the profiling graph.

The previously heavy usage of syscall.Syscall seems much lower now, and I consider it a very good change. However, the overall performance still stays almost the same. The bottleneck now seems to be runtime.futex.

1) GOMAXPROCS=1


(pprof) top
Total: 3216 samples
    1406  43.7%  43.7%     1406  43.7% runtime.futex
    1216  37.8%  81.5%     1216  37.8% syscall.RawSyscall
     324  10.1%  91.6%      327  10.2% syscall.Syscall
     152   4.7%  96.3%      152   4.7% bytes.IndexByte
      30   0.9%  97.3%       30   0.9% scanblock
       6   0.2%  97.5%        7   0.2% sweepspan
       5   0.2%  97.6%        6   0.2% syscall.Syscall6
       3   0.1%  97.7%        4   0.1% MCentral_Alloc
       3   0.1%  97.8%        7   0.2% net/textproto.(*Reader).ReadMIMEHeader
       3   0.1%  97.9%        9   0.3% runtime.MCache_Alloc
(pprof) top --cum
Total: 3216 samples
       2   0.1%   0.1%     3197  99.4% schedunlock
       1   0.0%   0.1%     2331  72.5% net/http.(*conn).serve
    1406  43.7%  43.8%     1406  43.7% runtime.futex
       1   0.0%  43.8%     1401  43.6% runtime.entersyscall
       0   0.0%  43.8%     1400  43.5% type..eq.[32]string
       0   0.0%  43.8%     1398  43.5% runtime.futexwakeup
       0   0.0%  43.8%     1398  43.5% runtime.notewakeup
       0   0.0%  43.8%     1217  37.8% bufio.(*Writer).Flush
       0   0.0%  43.8%     1217  37.8% net.(*conn).Write
       0   0.0%  43.8%     1217  37.8% net.(*netFD).Write

2) GOMAXPROCS = number of CPUs

(pprof) top
Total: 4550 samples
    1351  29.7%  29.7%     1351  29.7% runtime.futex
    1181  26.0%  55.6%     1181  26.0% syscall.RawSyscall
     588  12.9%  68.6%      588  12.9% runtime.usleep
     439   9.6%  78.2%      443   9.7% syscall.Syscall
     320   7.0%  85.3%      320   7.0% syscall.Syscall6
      39   0.9%  86.1%       84   1.8% sweepspan
      37   0.8%  86.9%       37   0.8% syscall.RawSyscall6
      30   0.7%  87.6%       30   0.7% bytes.IndexByte
      27   0.6%  88.2%      179   3.9% scanblock
      25   0.5%  88.7%       25   0.5% runtime.memmove
(pprof) top --cum
Total: 4550 samples
       0   0.0%   0.0%     3518  77.3% schedunlock
       5   0.1%   0.1%     1910  42.0% net/http.(*conn).serve
    1351  29.7%  29.8%     1351  29.7% runtime.futex
       0   0.0%  29.8%     1188  26.1% net/http.(*response).finishRequest
       0   0.0%  29.8%     1187  26.1% bufio.(*Writer).Flush
       3   0.1%  29.9%     1187  26.1% net.(*conn).Write
       1   0.0%  29.9%     1184  26.0% net.(*netFD).Write
    1181  26.0%  55.8%     1181  26.0% syscall.RawSyscall
       2   0.0%  55.9%     1180  25.9% syscall.WriteNB
       0   0.0%  55.9%     1178  25.9% syscall.writeNB

Dave Cheney

unread,
Oct 18, 2012, 4:46:18 PM10/18/12
to jli.ju...@gmail.com, golan...@googlegroups.com, Chris Lu, jli.ju...@gmail.com
Well, this idea still needs validation. In theory, using the NB variant should reduce scheduler overhead by not informing it that the goroutine is about to block. However, if write(2) does more than copy the buffer into kernel space and return the number of bytes that fit, then this approach probably isn't going to improve throughput. The best way to find out is to load test and profile.
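For anyone without an ab or siege setup, a load test plus CPU profile can also be generated in-process with the standard testing package. This is a sketch using the modern standard library (the file name bench_test.go and handler are stand-ins, not code from this thread); `go test -bench . -cpuprofile cpu.out` then produces a profile for pprof:

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

// hello is a stand-in for the "hello, world" handler being measured.
func hello(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain")
	io.WriteString(w, "hello, world!\n")
}

// BenchmarkHello drives the handler over a real TCP connection, so the
// write and scheduler paths show up in the CPU profile, without
// needing an external client such as ab or siege.
func BenchmarkHello(b *testing.B) {
	srv := httptest.NewServer(http.HandlerFunc(hello))
	defer srv.Close()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			b.Fatal(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}
```

This keeps the client and server in one process, so the numbers are not directly comparable to ab against nginx, but the relative profile shape is the interesting part.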

On 19/10/2012, at 3:27, jli.ju...@gmail.com wrote:

Awesome, this looks really good! Once I get home I'll try it out and let you know how it goes.
Thanks for taking this on!

- Justin

ChrisLu

unread,
Oct 18, 2012, 5:23:41 PM10/18/12
to golan...@googlegroups.com, jli.ju...@gmail.com, Chris Lu
Please see my reply (2 hours ago) for the load test and profiling with your patch.

Chris

Dave Cheney

unread,
Oct 18, 2012, 5:30:31 PM10/18/12
to ChrisLu, golan...@googlegroups.com, jli.ju...@gmail.com, Chris Lu
Thanks Chris, sorry I didn't see your other reply till now. One of the causes of the high % of CPU spent in futex, I believe, is mutex contention. I'll keep investigating. 

Dave Cheney

unread,
Oct 18, 2012, 9:12:55 PM10/18/12
to ChrisLu, golan...@googlegroups.com, jli.ju...@gmail.com
@Chris, what program are you using to simulate the client? Are you
using siege like Justin?

ChrisLu

unread,
Oct 18, 2012, 11:20:18 PM10/18/12
to golan...@googlegroups.com, ChrisLu, jli.ju...@gmail.com
I simply use "ab -n 10000 -c 3 http://localhost:8080/".

This was run comparing a Go helloworld with Nginx. I got results like 3600 req/sec vs 5500 req/sec on my computer.

Chris

Dave Cheney

unread,
Oct 18, 2012, 11:23:32 PM10/18/12
to ChrisLu, golan...@googlegroups.com, jli.ju...@gmail.com
That explains why you are spending so much time in syscall.Close: ab
uses HTTP/1.0 mode without persistent connections.

Dave Cheney

unread,
Oct 18, 2012, 11:27:29 PM10/18/12
to jli.ju...@gmail.com, golan...@googlegroups.com, Chris Lu
Can you please try again? I've removed a few allocations in the Accept() path.

hg revert @6739043
hg clpatch 6739043

then ./make.bash

will be sufficient.

The blocking in syscall.Accept and syscall.Close is probably
unavoidable: they are blocking syscalls, so we have to inform the
scheduler so it can park the goroutine. syscall.Accept may be fixable;
I'm pretty sure Close is not.

On Fri, Oct 19, 2012 at 1:25 PM, <jli.ju...@gmail.com> wrote:
> Just benchmarked the new code (also switched to the current version in the
> repository rather than the release version). Here's the pprof output:
> http://dl.dropbox.com/u/11537896/pprof7401.0.svg It looks like the time in
> syscall is still quite high.
> The transaction rate is up somewhat though, I'm hitting around 4400
> trans/sec consistently.

ChrisLu

unread,
Oct 19, 2012, 2:25:48 AM10/19/12
to golan...@googlegroups.com, ChrisLu, jli.ju...@gmail.com
ok. To keep persistent connections, I can use "ab -n 10000 -c 3 -k http://localhost:8080/".

The performance does improve. However, Nginx improves much more.
For Go helloworld vs Nginx, I got 5200 req/sec vs 11600 req/sec.

Here is the new profile with the persistent connections:


(pprof) top
Total: 6838 samples
    1895  27.7%  27.7%     1895  27.7% runtime.futex
    1847  27.0%  54.7%     1847  27.0% syscall.RawSyscall
     648   9.5%  64.2%      653   9.5% syscall.Syscall6
     176   2.6%  66.8%      194   2.8% syscall.Syscall
     118   1.7%  68.5%      118   1.7% syscall.RawSyscall6
     114   1.7%  70.2%      114   1.7% runtime.usleep
      98   1.4%  71.6%       98   1.4% runtime.osyield
      67   1.0%  72.6%       67   1.0% runtime.memmove
      61   0.9%  73.5%      306   4.5% runtime.mallocgc
      58   0.8%  74.3%       58   0.8% runtime.memhash
(pprof) top --cum
Total: 6838 samples
       2   0.0%   0.0%     5867  85.8% schedunlock
      15   0.2%   0.2%     3663  53.6% net/http.(*conn).serve
       4   0.1%   0.3%     2195  32.1% net.(*pollServer).Run
       8   0.1%   0.4%     2017  29.5% net/http.(*response).finishRequest
       4   0.1%   0.5%     1921  28.1% bufio.(*Writer).Flush
       9   0.1%   0.6%     1918  28.0% net.(*conn).Write
       7   0.1%   0.7%     1914  28.0% net.(*netFD).Write
    1895  27.7%  28.4%     1895  27.7% runtime.futex
       2   0.0%  28.5%     1849  27.0% syscall.WriteNB
       2   0.0%  28.5%     1849  27.0% syscall.writeNB

Chris

Devon H. O'Dell

unread,
Oct 19, 2012, 2:49:43 AM10/19/12
to ChrisLu, golan...@googlegroups.com, jli.ju...@gmail.com
2012/10/19 ChrisLu <chri...@gmail.com>:
> ok. To keep the persistent connections, I can use "ab -n 10000 -c 3 -k
> http://localhost:8080/".
>
> The performance does improve. However, Nginx improves much more.
> For Go helloworld vs Nginx, I got 5200req/sec vs 11600req/sec.
>
> Here is the new profile with the persistent connections:

Looks like a bunch of lock contention to me. I really wonder if we
could get better performance with user space locks in some cases.

--dho

bryanturley

unread,
Oct 24, 2012, 2:53:05 PM10/24/12
to golan...@googlegroups.com
(pkg/runtime/sys_linux_amd64.s)
// int64 futex(int32 *uaddr, int32 op, int32 val,
//	struct timespec *timeout, int32 *uaddr2, int32 val2);
TEXT runtime·futex(SB),7,$0
	MOVQ	8(SP), DI
	MOVL	16(SP), SI
	MOVL	20(SP), DX
	MOVQ	24(SP), R10
	MOVQ	32(SP), R8
	MOVL	40(SP), R9
	MOVL	$202, AX
	SYSCALL
	RET

A syscall for a mutex does seem overboard, but I am betting it is more portable.

Though at some point in time it did increase the speed of apps:
http://kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf

You guys talked about trying your own lock code; did that work out for you? I am curious.

Ian Lance Taylor

unread,
Oct 24, 2012, 4:07:35 PM10/24/12
to bryanturley, golan...@googlegroups.com
I'm not sure I understand your question, but that is not the lock
code. That is the code for the futex system call. The lock code for
a GNU/Linux system is in runtime/lock_futex.c. The lock code uses the
futex system call to wait for a lock to become available, but, e.g.,
acquiring an unlocked lock does not make a system call.

Ian

bryanturley

unread,
Oct 24, 2012, 4:32:08 PM10/24/12
to golan...@googlegroups.com
> I am curious.

I'm not sure I understand your question, but that is not the lock
code.  That is the code for the futex system call.  

Yeah, I should have read more of the code. I did start in lock_futex.c but read through it too fast. Mostly I noted the massive style difference between Go's C code and Go's normal coding style...
I code in C almost identically to the default Go coding style.
I was responding to this thread; a pprof-generated image a few emails up shows runtime.futex() taking 27%.
 
The lock code for
a GNU/Linux system is in runtime/lock_futex.c.  The lock code uses the
futex system call to wait for a lock to become available, but, e.g.,
acquiring an unlocked lock does not make a system call.

Ian

Oh, that is good to know; I've never used the futex call.
Every time (before Go) I have needed a mutex I have written my own, but they never worked with the kernel scheduler like futex does. Then again, futex doesn't work on all (any?) non-Linux platforms. I should have read the lock code more carefully. I wonder if spinning a little longer before calling the futex stuff would speed this up.
All in all, I was curious whether they got the Go http server faster in this case.

bryanturley

unread,
Oct 24, 2012, 4:47:19 PM10/24/12
to golan...@googlegroups.com
Ian, just to be clear: "You guys talked about trying your own lock code, did that work out for you? I am curious."
was referring to the people on this thread, not the Go developers as a whole. I trust you guys ;)



ChrisLu

unread,
Oct 24, 2012, 5:04:32 PM10/24/12
to golan...@googlegroups.com
As the original poster, I would say this performance problem still exists.

Just by looking at the graph, runtime.futex costs 27.7% of the time. That seems to be all scheduling cost for the goroutines.
If so, this is a very high price just for the convenience of goroutines.


The Go scheduler grows goroutines on an as-needed basis, one by one, and never releases the idle goroutines.
This seems like a good opportunity to pool the goroutines more efficiently. Am I right?

bryanturley

unread,
Oct 24, 2012, 5:40:34 PM10/24/12
to golan...@googlegroups.com


On Wednesday, October 24, 2012 4:04:32 PM UTC-5, ChrisLu wrote:
As the original poster, I would say this performance problem still exists.

Just by looking at the graph, runtime.futex costs 27.7% of the time. That seems to be all scheduling cost for the goroutines.
If so, this is a very high price just for the convenience of goroutines.


The Go scheduler grows goroutines on an as-needed basis, one by one, and never releases the idle goroutines.

You only get new goroutines by using the go keyword. So you, or a library you are using (http), grow them on an as-needed basis; the scheduler just schedules them. If your goroutine exits, it doesn't come back to life at any point.
Though it might be interesting to know how many goroutines are alive/created/dying during this test.

I don't think it is the scheduler's job to release idle goroutines either. If one is idle and no longer needed, it should be killed by its creator, not by a 2nd/3rd-party observer.

This seems like a good opportunity to pool the goroutines more efficiently. Am I right?


I have read (I think) that there is a new scheduler coming soonish that might fix this. Though pool is not the word I would use... manage, perhaps.
If you want to get crazy, you could try changing the spin counts yourself; that might make it better, but it could also make it way, way worse.
I wouldn't recommend it unless you have some experience at that level already.

ChrisLu

unread,
Oct 24, 2012, 5:41:04 PM10/24/12
to golan...@googlegroups.com
Could be a similar issue to http://code.google.com/p/go/issues/detail?id=2933

However, it is marked as "Go 1.1 maybe". I hope it will not take forever to fix.

Chris

bryanturley

unread,
Oct 24, 2012, 5:44:28 PM10/24/12
to golan...@googlegroups.com
On Wednesday, October 24, 2012 4:40:34 PM UTC-5, bryanturley wrote:


On Wednesday, October 24, 2012 4:04:32 PM UTC-5, ChrisLu wrote:
As the original poster, I would say this performance problem still exists.

It is a mostly synthetic benchmark, though. If your code did a little more work before returning data, I bet it would even the numbers out.
I also wouldn't call it a problem; it is just that nginx is faster for the moment.
 

ChrisLu

unread,
Oct 25, 2012, 3:07:22 AM10/25/12
to golan...@googlegroups.com
Not really just synthetic. Golang is good for system programming, and many use cases involve providing web services. In my case, the Weed File System project, it is meant to serve static content via http. Even though the algorithm is fast and efficient, it is embarrassing that static file serving is so much slower.

Nginx is just one web server. There are many other servers much faster too.

If this is a known performance problem, we probably should be warned about it, given the Golang "close to the metal" marketing.

The Go scheduler clearly can be much more efficient. Don't call the benchmark synthetic, and let's look at the real issue.

Dustin

unread,
Oct 25, 2012, 3:18:48 AM10/25/12
to golan...@googlegroups.com

On Thursday, October 25, 2012 12:07:22 AM UTC-7, ChrisLu wrote:
Not really just synthetic. Golang is good for system programming, and many use cases involve providing web services. In my case, the Weed File System project, it is meant to serve static content via http. Even though the algorithm is fast and efficient, it is embarrassing that static file serving is so much slower.

Nginx is just one web server. There are many other servers much faster too.

If this is a known performance problem, we probably should be warned about it, given the Golang "close to the metal" marketing.

The Go scheduler clearly can be much more efficient. Don't call the benchmark synthetic, and let's look at the real issue.

  To be fair, you're saying the web server performance is ~20% slower than your custom C++ solution and maybe around half the speed of the fastest hand optimized web server you can find.  While it's certainly possible that it can get faster, I think it's a bit unreasonable to call this embarrassing.

Joubin Houshyar

unread,
Oct 25, 2012, 10:49:07 AM10/25/12
to golan...@googlegroups.com
That is fair, Dustin. But the subtext here is whether there even exists a path for a hand-optimized/non-idiomatic WFS.  I suggest that it would be generally helpful for the community (and also a plus for Go the language) to address this type of concern and provide guidance for improving performance, and not dismiss them out of hand.

/R

Ethan Burns

unread,
Oct 25, 2012, 11:15:24 AM10/25/12
to golan...@googlegroups.com
On Thursday, October 25, 2012 10:49:07 AM UTC-4, Joubin Houshyar wrote:
That is fair, Dustin. But the subtext here is whether there even exists a path for a hand-optimized/non-idiomatic WFS.  I suggest that it would be generally helpful for the community (and also a plus for Go the language) to address this type of concern and provide guidance for improving performance, and not dismiss them out of hand.

I don't think that anything has been dismissed.  From what I understand, this has been classified as issue 2933 (http://code.google.com/p/go/issues/detail?id=2933) which is marked as Go1.1 maybe (on the TODO list, but not at the top of it).  There are many other things that need to be fixed before Go 1.1 is ready (http://swtch.com/~rsc/go11.html), so I am sure that everyone is quite busy.


Best,
Ethan

Joubin Houshyar

unread,
Oct 25, 2012, 12:08:19 PM10/25/12
to golan...@googlegroups.com
Just in case it is not clear: the comment was a general suggestion and certainly not directed at Dustin, and, imo, it is /perfectly reasonable/ that version 1.0 of the Go runtime (or any platform, for that matter) is not matching the performance of stacks that have been under development for years. Remember this? http://openmap.bbn.com/~kanderso/performance/java/index.html




/R 

Dave Cheney

unread,
Oct 28, 2012, 1:09:48 PM10/28/12
to Joubin Houshyar, golan...@googlegroups.com
http://codereview.appspot.com/6813046/

Could those with suitable test harnesses please comment with benchmark
numbers if this change produces an improvement.
> --
>
>

Rob Pike

unread,
Oct 28, 2012, 1:13:13 PM10/28/12
to Dave Cheney, Joubin Houshyar, golan...@googlegroups.com
I don't like the change in semantics here. A blocking operation has
silently become non-blocking.

-rob

Job van der Zwan

unread,
Oct 28, 2012, 1:55:16 PM10/28/12
to golan...@googlegroups.com, Dave Cheney, Joubin Houshyar
Uhm... the function is called "WriteNB", isn't that pretty explicit?

Joubin Houshyar

unread,
Oct 28, 2012, 9:06:21 PM10/28/12
to Job van der Zwan, golan...@googlegroups.com, Dave Cheney
On Sun, Oct 28, 2012 at 1:55 PM, Job van der Zwan <j.l.van...@gmail.com> wrote:
Uhm... the function is called "WriteNB", isn't that pretty explicit?

David Anderson

unread,
Oct 28, 2012, 9:11:11 PM10/28/12
to Joubin Houshyar, Job van der Zwan, golan...@googlegroups.com, Dave Cheney
Check the comments on the code review. That concern is resolved, so assuming other things also get resolved, we should get a decent network performance boost. Yay.

- Dave

--
 
 

Dave Cheney

unread,
Nov 21, 2012, 12:33:07 AM11/21/12
to David Anderson, Joubin Houshyar, Job van der Zwan, golan...@googlegroups.com
Hello,

Some new results are available

https://codereview.appspot.com/6813046/#msg24

I am interested to see if others can verify or contradict my results.

Cheers

Dave