httputil.ReverseProxy 40% slower with go 1.5.1 compared to go 1.4.1

Frank Schröder

Sep 23, 2015, 4:25:55 PM
to golang-nuts
I've written a dynamic HTTP reverse proxy based on httputil.ReverseProxy, and the binary built with Go 1.5.1 is about 40% slower than the one built with Go 1.4.1. The OS is linux/amd64.

This is reproducible with http://play.golang.org/p/smlmZl1MgJ

which I then run with 

./proxy -target http://somehost/ 

and then run a load test with weighttp to get the following results:

$ ./weighttp -n 1000000 -c 100 -t 16 "http://host:port/somePath"

direct: 14500 req/sec
go1.4.1: 12000 req/sec
go1.5.1: 7000 req/sec

The response for my test case is 2300 bytes.
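Roughly speaking, the proxy is a flag-driven wrapper around httputil.NewSingleHostReverseProxy. A minimal stand-in looks something like the sketch below (this is not the exact playground code; the -addr flag and default port are invented here, only -target comes from the command above):

package main

import (
    "flag"
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    target := flag.String("target", "", "upstream base URL to proxy to")
    addr := flag.String("addr", ":8080", "listen address (made up for this sketch)")
    flag.Parse()

    u, err := url.Parse(*target)
    if err != nil || u.Host == "" {
        log.Fatalf("invalid -target: %q", *target)
    }

    // NewSingleHostReverseProxy rewrites incoming requests to point at u
    // and forwards them over http.DefaultTransport.
    proxy := httputil.NewSingleHostReverseProxy(u)

    log.Fatal(http.ListenAndServe(*addr, proxy))
}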

I've attached the CPU profile graphs. Based on those, I suspected that copyResponse was generating a lot of garbage and tried adding a sync.Pool as follows:

net/http/httputil/reverseproxy.go:

var pool = sync.Pool{
    New: func() interface{} { return make([]byte, 4096) },
}

func (p *ReverseProxy) copyResponse(dst io.Writer, src io.Reader) {
    ...
    buf := pool.Get().([]byte)
    io.CopyBuffer(dst, src, buf)
    pool.Put(buf)
}

But that didn't help. My next assumption is that it spends more time in the scheduler. Does anyone have an idea where to look?
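(Aside on the patch above: Put of a bare []byte boxes the slice header into an interface{}, which is itself a small allocation on every call. A variant that pools a pointer avoids that; an untested sketch in the same fragment style as the patch:)

var pool = sync.Pool{
    New: func() interface{} {
        b := make([]byte, 4096)
        return &b // store a pointer so Get/Put move only a pointer through the interface
    },
}

func (p *ReverseProxy) copyResponse(dst io.Writer, src io.Reader) {
    ...
    bp := pool.Get().(*[]byte)
    io.CopyBuffer(dst, src, *bp)
    pool.Put(bp)
}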

Frank
proxy-1.4.1-cpu.pdf
proxy-1.5.1-cpu.pdf

Matthew Holt

Sep 23, 2015, 10:57:42 PM
to golang-nuts
I've noticed a similar regression with Caddy. Run with GOGC=off (which disables the collector) and you'll (probably) see performance come back up to where it was; so this slowdown is probably just due to Go 1.5's new concurrent garbage collector. Although stop-the-world pauses are nearly gone, it uses more CPU while the program runs, eating up cycles that could otherwise be used to serve requests.

At least, that's my theory -- I haven't had time to investigate and confirm it yet.

Frank Schröder

Sep 24, 2015, 3:06:06 AM
to golang-nuts
That was the hint in the right direction. 

Looks like GOGC=100 is too aggressive for this app. GOGC=200 gets me back into the range of go 1.4.1, and GOGC=400 or 800 even past it.

I didn't see any major latency spikes, but I also haven't looked very hard yet.
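(For anyone who wants to flip this knob at runtime instead of via the environment: the same setting is exposed as runtime/debug.SetGCPercent. A quick sketch, not something the proxy itself needs:)

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to starting the process with GOGC=800: the next collection
    // is triggered when the heap has grown by 800% over the live data left
    // after the previous cycle, so the GC runs far less often.
    // SetGCPercent returns the previous setting (100 unless GOGC was set).
    prev := debug.SetGCPercent(800)
    fmt.Println("previous GC percent:", prev)

    // A negative value, like GOGC=off, disables the collector entirely:
    // debug.SetGCPercent(-1)
}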

Frank

Sokolov Yura

Sep 24, 2015, 4:13:12 AM
to golang-nuts
Could it be maxLatencyWriter.Write? What if it were written as:

func (m *maxLatencyWriter) Write(p []byte) (int, error) {
    m.lk.Lock()
    n, err := m.dst.Write(p)
    m.lk.Unlock()
    return n, err
}

Giulio Iotti

Sep 24, 2015, 4:22:57 AM
to golang-nuts
func (m *maxLatencyWriter) Write(p []byte) (int, error) {
    m.lk.Lock()
    defer m.lk.Unlock()
    return m.dst.Write(p)
}

-- 
Giulio Iotti

Frank Schröder

Sep 24, 2015, 4:23:52 AM
to golang-nuts
I doubt that this would have such an impact but I'll check. 

Apparently, setting GOGC=800 also helps Go 1.4: the benchmark went from 12k req/sec to 17k req/sec.


On Thursday, September 24, 2015 at 10:13:12 AM UTC+2, Sokolov Yura wrote:

Frank Schröder

Sep 24, 2015, 4:24:49 AM
to golang-nuts
That's the way it is currently written. I think the motivation behind the suggested change is that "defer" has a cost, although I'm not sure whether that is significant.

Giulio Iotti

Sep 24, 2015, 4:26:41 AM
to golang-nuts
On Thursday, September 24, 2015 at 10:24:49 AM UTC+2, Frank Schröder wrote:
That's the way it is currently written. I think the motivation behind the suggested change is that "defer" has a cost, although I'm not sure whether that is significant.

Sorry, I should have looked. Defer has a (small) price, but it hasn't changed between Go versions.
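For anyone who wants to put a number on it, a micro-benchmark along these lines would do (the package and type names are invented, and the 2300-byte buffer just echoes the response size mentioned earlier):

package defercost

import (
    "io"
    "io/ioutil"
    "sync"
    "testing"
)

// deferWriter mirrors the shape of the current maxLatencyWriter.Write.
type deferWriter struct {
    lk  sync.Mutex
    dst io.Writer
}

func (w *deferWriter) Write(p []byte) (int, error) {
    w.lk.Lock()
    defer w.lk.Unlock()
    return w.dst.Write(p)
}

// plainWriter is the variant without defer.
type plainWriter struct {
    lk  sync.Mutex
    dst io.Writer
}

func (w *plainWriter) Write(p []byte) (int, error) {
    w.lk.Lock()
    n, err := w.dst.Write(p)
    w.lk.Unlock()
    return n, err
}

var buf = make([]byte, 2300) // roughly the response size in Frank's test

func BenchmarkDeferUnlock(b *testing.B) {
    w := &deferWriter{dst: ioutil.Discard}
    for i := 0; i < b.N; i++ {
        w.Write(buf)
    }
}

func BenchmarkExplicitUnlock(b *testing.B) {
    w := &plainWriter{dst: ioutil.Discard}
    for i := 0; i < b.N; i++ {
        w.Write(buf)
    }
}

Running "go test -bench=." on that file would show the per-call difference directly.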

-- 
Giulio Iotti 

Frank Schröder

Sep 24, 2015, 4:33:57 AM
to golang-nuts
I couldn't see a noticeable difference. So far GOGC=800 has made the biggest dent for both Go 1.4 and Go 1.5 in my case.


On Thursday, September 24, 2015 at 10:13:12 AM UTC+2, Sokolov Yura wrote:

Dave Cheney

Sep 24, 2015, 4:54:27 AM
to golang-nuts
That probably means your program is too allocation heavy. Use the -alloc_objects mode in pprof to find the biggest producers of short-lived objects.

Use GODEBUG=gctrace=1 to monitor the frequency of GC events.

If your program is constantly in GC, sync.Pool will be ineffective.
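If the proxy doesn't already expose profiling endpoints, the easiest way to get the heap profile that -alloc_objects reads is to pull in net/http/pprof on a side port. A sketch (the port and the blocking main are placeholders, not necessarily how Frank collected his profile):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve the profiling endpoints on a separate port. The heap profile at
    // http://localhost:6060/debug/pprof/heap is what
    // `go tool pprof -alloc_objects` reads.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... proxy setup would go here ...
    select {}
}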

Frank Schröder

Sep 24, 2015, 6:00:49 AM
to golang-nuts
Most allocations are in net/textproto's ReadMIMEHeader. I don't really allocate anything in my own code. I need to see whether there is something I can do about it.

With GOGC=100 my app spends 10% of time in the GC. With GOGC=800 this drops to 1-2%.

$ go tool pprof -alloc_objects -text proxy proxy-1.5.1-mem.pprof
10273938 of 10427199 total (98.53%)
Dropped 61 nodes (cum <= 52135)
      flat  flat%   sum%        cum   cum%
   1340949 12.86% 12.86%    1463923 14.04%  net/textproto.(*Reader).ReadMIMEHeader
    701099  6.72% 19.58%     701099  6.72%  net/textproto.MIMEHeader.Add
    627736  6.02% 25.60%    1160854 11.13%  net/http.(*persistConn).roundTrip
    610932  5.86% 31.46%    4048902 38.83%  net/http/httputil.(*ReverseProxy).ServeHTTP
    496070  4.76% 36.22%     496070  4.76%  net.uitoa
    466555  4.47% 40.69%     466555  4.47%  net/http.Header.clone
    459575  4.41% 45.10%     459575  4.41%  net/http.(*conn).closeNotify
    422711  4.05% 49.16%     778008  7.46%  net/http.(*Request).write
    415899  3.99% 53.14%    1813537 17.39%  net/http.(*persistConn).readLoop
    377053  3.62% 56.76%     633368  6.07%  io.copyBuffer
    325522  3.12% 59.88%     325522  3.12%  net/textproto.MIMEHeader.Set
    314856  3.02% 62.90%     314856  3.02%  time.NewTimer
    299300  2.87% 65.77%    1180637 11.32%  net/http.(*Server).newConn
    219934  2.11% 67.88%     219934  2.11%  net/textproto.(*Reader).ReadLine
    216475  2.08% 69.96%     216475  2.08%  net/http/httputil.NewSingleHostReverseProxy
    212564  2.04% 72.00%    1397595 13.40%  net/http.ReadResponse
    211633  2.03% 74.03%    1354273 12.99%  net/http.(*conn).readRequest
    208596  2.00% 76.03%     208596  2.00%  syscall.anyToSockaddr
    202912  1.95% 77.97%     202912  1.95%  net.sockaddrToTCP
    202654  1.94% 79.92%     202654  1.94%  net/http.readTransfer
    201709  1.93% 81.85%     201709  1.93%  net/http.newTransferWriter
    179387  1.72% 83.57%     182458  1.75%  net/http.newBufioWriterSize
    122974  1.18% 84.75%     122974  1.18%  net/textproto.(*Reader).readContinuedLineSlice
    116337  1.12% 85.87%     116337  1.12%  net.newTCPConn
    113691  1.09% 86.96%     113691  1.09%  net.(*conn).Read
    112090  1.07% 88.03%     508638  4.88%  net.IP.String
    104429  1.00% 89.03%     104429  1.00%  net.JoinHostPort
    103658  0.99% 90.03%     515166  4.94%  net.(*netFD).accept
    103113  0.99% 91.02%    1042956 10.00%  net/http.ReadRequest
    102850  0.99% 92.00%     102850  0.99%  strings.genSplit
    101240  0.97% 92.97%     101240  0.97%  net/url.parse
    100804  0.97% 93.94%    1261787 12.10%  net/http.(*Transport).RoundTrip
     98174  0.94% 94.88%      98174  0.94%  net/http/httputil.singleJoiningSlash
     94392  0.91% 95.79%     343563  3.29%  net/http.(*response).ReadFrom
     89277  0.86% 96.64%     103588  0.99%  sync.(*Pool).Put
     84834  0.81% 97.46%      85974  0.82%  net/http.newBufioReader
     55965  0.54% 97.99%      55965  0.54%  fmt.(*fmt).padString
     28770  0.28% 98.27%      82130  0.79%  net/http.Header.sortedKeyValues
     27026  0.26% 98.53%      82991   0.8%  fmt.(*pp).doPrintf
       193 0.0019% 98.53%     778201  7.46%  net/http.(*persistConn).writeLoop
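One way to get a number like the GC-time percentage above is runtime.MemStats.GCCPUFraction, which is new in Go 1.5. A sketch (not necessarily how I measured it; the 10-second tick is arbitrary):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    for range time.Tick(10 * time.Second) {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // GCCPUFraction is the fraction of this program's available CPU time
        // used by the GC since the program started (0.10 is roughly 10%).
        fmt.Printf("GC CPU fraction: %.1f%%  GC cycles: %d\n",
            m.GCCPUFraction*100, m.NumGC)
    }
}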
