Websocket benchmark


Ryan Slade

Jun 13, 2012, 12:59:05 PM
to golan...@googlegroups.com
Hi

Just saw this post on Hacker News and thought it would be interesting to see your take on it:

Any idea why the latency is so much higher than Java and Erlang?

Cheers
Ryan

Rémy Oudompheng

Jun 13, 2012, 1:32:40 PM
to Ryan Slade, golan...@googlegroups.com
Go's scheduler works in a kind of round-robin fashion. If a Go server
has many outstanding requests, it can happen (depending on the design)
that if a request consists of steps A, B, C:
- all goroutines do A
- all goroutines do B
- all goroutines do C.
so that total throughput is still very high but each given request
takes a long time to complete.
Lower latency might be obtained if the number of goroutines in a
given state is limited, so that existing requests finish before new
ones begin being processed.
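
One rough way to apply that idea (a sketch only, not code from the
benchmark being discussed): use a buffered channel as a counting
semaphore so that only a bounded number of requests do work at once,
which pushes requests towards completing in roughly arrival order
instead of all advancing in lockstep.

package main

import (
	"io"
	"net/http"
)

// At most 64 requests do work at any moment; the rest wait their turn.
var sem = make(chan struct{}, 64)

func handler(w http.ResponseWriter, r *http.Request) {
	sem <- struct{}{}        // acquire a slot (blocks while 64 are in flight)
	defer func() { <-sem }() // release the slot when this request is done

	io.WriteString(w, "done\n")
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}

The limit of 64 is arbitrary; the point is only that bounding
concurrency trades some throughput for lower per-request latency under
the round-robin behaviour described above.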

Rémy.

andrey mirtchovski

Jun 13, 2012, 1:39:08 PM
to Ryan Slade, golan...@googlegroups.com
> Any idea why the latency is so much higher than Java and Erlang?

I bet the call to GOMAXPROCS is unnecessary here as the handlers never do enough work to justify the time lost communicating between processes.

I reran the tests on an old Linux box; here's a comparison between GOMAXPROCS set to 2 and left unset (the default of 1):

gomaxprocs = 2:

         {latency,
             [{min,56},
              {max,4401},
              {arithmetic_mean,169.37548638132296},
              {geometric_mean,156.3471431259773},
              {harmonic_mean,148.6543226548309},

gomaxprocs = 1:

         {latency,
             [{min,56},
              {max,418},
              {arithmetic_mean,160.74610894941634},
              {geometric_mean,154.35521105320353},
              {harmonic_mean,147.75170003484243},
              {median,146},
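
For reference, this is the kind of GOMAXPROCS call in question (a
minimal sketch; the benchmark's own source isn't reproduced in this
thread). The call returns the previous setting, and leaving it unset
keeps the default, which was 1 at the time:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// runtime.GOMAXPROCS(n) sets the limit and returns the previous value;
	// runtime.GOMAXPROCS(0) just reports the current one without changing it.
	prev := runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Printf("GOMAXPROCS was %d, now %d (NumCPU=%d)\n",
		prev, runtime.GOMAXPROCS(0), runtime.NumCPU())
}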

jorelli

Jun 13, 2012, 4:53:49 PM
to golan...@googlegroups.com, Ryan Slade
the author doesn't seem to point out that Java delivered half the messages that Go delivered over the same time frame, and nobody else really seems to notice.  People are all "wow Java is so much faster", but... it only handled half as many connections.

Erlang and Go have min latencies of 339 and 396, with maxes of 133716 and 49339496.  So Go is within 20% of Erlang in the best case, and 368 times slower in the worst case?  What?  That seems way too big of a spread to be irreconcilable.  Something seems fishy about that.  I wonder if it has to do with buffer allocation in io.Copy, which is what he's doing in the websocket handler.  From what I can tell, that's a 32KB memory overhead for every connection, so... if there are 10k of those... he's allocating 320MB of memory for the buffers?  I'm hoping I'm wrong about that and I'm misunderstanding something, because that seems like a fair amount of buffering overhead.  AWS m1.medium instances are also either 32-bit or 64-bit; he doesn't say which kind of instance it is; iirc, Go's garbage collection doesn't perform the same on 32-bit machines as it does on 64-bit machines.  But that's just what I spot with a quick glance; I haven't spent much time learning about performance analysis in Go.
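
For what it's worth, the 32KB comes from io.Copy's internal buffer:
when neither side implements ReaderFrom/WriterTo, io.Copy allocates a
32KB scratch buffer per call. Here is a minimal sketch of an echo
handler that reuses a small explicit buffer per connection instead
(assuming the websocket package now at golang.org/x/net/websocket, the
successor of the go.net package used at the time, and a hypothetical
/echo route; this is not the benchmark's code):

package main

import (
	"net/http"

	"golang.org/x/net/websocket"
)

// echoHandler echoes whatever the client sends, reusing one small buffer
// per connection rather than io.Copy's default 32KB buffer.
func echoHandler(ws *websocket.Conn) {
	buf := make([]byte, 1024)
	for {
		n, err := ws.Read(buf)
		if err != nil {
			return // connection closed or read failed
		}
		if _, err := ws.Write(buf[:n]); err != nil {
			return
		}
	}
}

func main() {
	http.Handle("/echo", websocket.Handler(echoHandler))
	http.ListenAndServe(":8080", nil)
}

Whether the buffer size actually matters for latency here is exactly
the open question; the sketch only shows where a per-connection 32KB
would come from and how one might avoid it.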

pico

Jun 14, 2012, 4:42:07 AM
to golan...@googlegroups.com
In this case, Java (or Scala with Akka) has excellent networking performance; sadly, it seems Go isn't an option for my project.
If you need something C-like, the G-WAN web application server has the highest performance in some situations, although it has been an RC
release for months and the developer seems to have disappeared. Still, it gives you an idea of how it can perform better than Java
by using specific Linux APIs. I wish Go had this capability too, so it could excel as a viable web-server solution!

pico

Jun 14, 2012, 4:49:30 AM
to golan...@googlegroups.com
Oh, correction: it should be libsxe rather than G-WAN that has the highest performance, by almost 2x. Perhaps Google may find it useful to
implement this architecture?

André Moraes

Jun 14, 2012, 8:27:05 AM
to jorelli, golan...@googlegroups.com
> irreconcilable.  Something seems fishy about that.  I wonder if it has to do
> with buffer allocation in io.Copy, which is what he's doing in the websocket
> handler.  From what I can tell, that's a 32KB memory overhead for every
> connection, so... if there are 10k of those... he's allocating 320MB of memory
> for the buffers?  I'm hoping I'm wrong about that and I'm misunderstanding

Most of the time, io.Copy is used to copy large amounts of data, not
a single line with timing information.

Also, since he was testing against a public IP, he was also testing
the speed of its client code and the speed of the network link
between the server and the client.

Too many variables that are out of control.

I rewrote some of the tests using a Go websocket client/server on
localhost, and the times were on the scale of microseconds (even
using io.Copy on a 32-bit machine).
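
A rough sketch of that kind of localhost measurement (assuming the
websocket package now at golang.org/x/net/websocket and an echo server
already listening on localhost:8080/echo, as in the earlier handler
sketch; this is not André's actual test code):

package main

import (
	"fmt"
	"log"
	"time"

	"golang.org/x/net/websocket"
)

func main() {
	// Dial the local echo server and time one request/response round trip.
	ws, err := websocket.Dial("ws://localhost:8080/echo", "", "http://localhost/")
	if err != nil {
		log.Fatal(err)
	}
	defer ws.Close()

	msg := []byte("ping")
	buf := make([]byte, len(msg))

	start := time.Now()
	if _, err := ws.Write(msg); err != nil {
		log.Fatal(err)
	}
	if _, err := ws.Read(buf); err != nil {
		log.Fatal(err)
	}
	fmt.Println("round trip:", time.Since(start))
}

On loopback, with no real network in the way, round trips in the tens
or hundreds of microseconds are plausible, which is consistent with
what is reported above.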


--
André Moraes
http://amoraes.info

roger peppe

Jun 14, 2012, 9:46:19 AM
to André Moraes, jorelli, golan...@googlegroups.com
FWIW I also wondered what was going on. I doubt that
io.Copy has anything to do with it. I suspect the scheduler.

I wrote a little bit of code to test it (to play with it,
go get code.google.com/p/rog-go/exp/cmd/websocket...)

It seems there's a big difference between GOMAXPROCS=1
and GOMAXPROCS=2 (runtime.NumCPU reports 4 on my machine).
It's unusual that GOMAXPROCS>1 speeds things up so much.

Some sample runs (I needed the long wait between runs to
prevent the network stack from running out of local addresses;
there's probably a better way):

# sysctl -w 'net.core.netdev_max_backlog=2500'
# ulimit -n 30000
# for i in 1 2 3 4; do
echo ncpu $i
GOMAXPROCS=$i websocket-stress | websocket-analyse
sleep 300
done
ncpu 1
total 31.95649s
latency: min 47us; max 862.475ms; mean 268.299392ms; median 220.062ms
connect: min 246us; max 16.983571s; mean 1.987242486s; median 284.384ms
delay: min 1us; max 120.636ms; mean 2.2523ms; median 1.941ms
ncpu 2
total 16.191334s
latency: min 35us; max 602.721ms; mean 132.065983ms; median 40.62ms
connect: min 200us; max 1.028406s; mean 93.174134ms; median 4.979ms
delay: min 0; max 72.225ms; mean 1.497035ms; median 1.292ms
ncpu 3
total 14.619608s
latency: min 30us; max 357.473ms; mean 49.460422ms; median 5.347ms
connect: min 212us; max 369.663ms; mean 18.330433ms; median 1.314ms
delay: min 1us; max 57.4ms; mean 1.346355ms; median 1.145ms
ncpu 4
total 13.646085s
latency: min 32us; max 73.886ms; mean 6.128948ms; median 1.294ms
connect: min 200us; max 100.802ms; mean 4.921183ms; median 1.028ms
delay: min 0; max 45.42ms; mean 1.264279ms; median 1.123ms
#

andrey mirtchovski

Jun 18, 2012, 6:40:34 PM
to roger peppe, André Moraes, jorelli, golan...@googlegroups.com
the author supposedly posted this on reddit:

"I wrote the thing and I can tell you that yes, these results are
pretty much crap. I incorrectly used Folsom, the project I used to
collect the stats. Folsom is meant for realtime monitoring and not
what I used it for. It only keeps 60 second windows of the data it
produces. That means all times prior to the final 60 second window were
thrown out."

http://www.reddit.com/r/programming/comments/v5ap8/web_server_benchmark_haskellsnap_erlang/c51wv9i

roger peppe

Jun 19, 2012, 5:34:49 AM
to andrey mirtchovski, André Moraes, jorelli, golan...@googlegroups.com
that should be tempered by the fact that he replied to that
post later with:
"I'm sorry I was mistaken that Folsom only keeps a 60 second window.
It has support for sliding windows but its default is to use the
entire dataset for histograms and percentiles. I take back my
statement that the results were pretty much crap. They're only slightly
crappy because I used an m1.medium box."

Jun 19, 2012, 8:38:11 AM
to golan...@googlegroups.com
As pointed out by Rémy, the increased latency has to do with Go's scheduler.

The goroutine scheduler does not know that each HTTP request is a task that completes in about 1...100 milliseconds and that HTTP requests should (in the majority of cases) terminate in the order of their arrival.

I plan to take a look at this sometime in the future and fix this issue (unless somebody fixes it before I do) by enabling Go programs to send hints to Go's scheduler. Sending information to the scheduler seems to be the best way to fix this issue.

Ryan Slade

Jul 23, 2012, 7:04:11 AM
to golan...@googlegroups.com
The author has re-run the benchmarks and has been much more rigorous this time:

The Go version is still a little behind in terms of latency, but not much.