Go http/2 implementation is x5 less performant than http/1.1


Kirth Gersen

Nov 8, 2021, 12:59:31 PM
to golang-nuts
http/2 implementation seems ~5x slower in bytes per second (when the transfer is CPU-capped).


I submitted an issue about this 3 months ago on the Go GitHub ( https://github.com/golang/go/issues/47840 ), but the first commenter misunderstood it and it got buried (they're probably just swamped with too many open issues (5k+...)).

Everything using Golang net/http is impacted, the Caddy web server for instance.

I know it probably doesn't matter for most use cases because it's only noticeable with high-throughput transfers (>1 Gbps).
Most http benchmarks focus on "requests per second" and not "bits per second", but this performance matters too sometimes.

If anyone with expertise in profiling Go code and good knowledge of the net/http lib internals could take a look, it would be nice to optimize it or at least have an explanation.

thx (sorry if this is the wrong group to post this).

Andrew Rodland

Nov 8, 2021, 3:37:30 PM
to golang-nuts
I also maintain an app that moves several Gbit/s of data, and have noticed that it bottlenecks while using http2, but handles much more throughput per instance, with lower tail latency, when running under GODEBUG=http2client=0. Possibly relevant is that it makes a large number of requests to a single upstream https host (like the PoC here does). I can't share source or any detailed data, but I'm willing to discuss further, and to test-drive any workarounds or fixes. I definitely believe there's a real performance regression here, at least under particular circumstances.

Thanks,

Andrew

robert engels

Nov 9, 2021, 12:28:16 PM
to Kirth Gersen, golang-nuts
I did a review of the codebase.

Http2 is a multiplexed protocol with independent streams. The Go implementation uses a common reader thread/routine to read all of the connection content, and then demuxes the streams and passes the data via pipes to the stream readers. This multithreaded nature requires the use of locks to coordinate. By managing the window size, the connection reader should never block writing to a stream buffer - but a stream reader may stall waiting for data to arrive - get descheduled - only to be quickly rescheduled when the reader places more data in the buffer - which is inefficient.

Out of the box, http1 is about 37 Gbps and http2 is about 7 Gbps on my machine.

Some things that jump out:

1. The chunk size is too small. Using 1MB pushed http1 from 37 Gbps to 50 Gbps, and http2 to 8 Gbps.

2. The default buffer in io.Copy() is too small. Use io.CopyBuffer() with a larger buffer - I changed to 4MB. This pushed http1 to 55 Gbps, and http2 to 8.2 Gbps. Not a big difference but needed for later.

3. The http2 receiver frame size of 16k is way too small. There is overhead on every frame - the most costly is updating the window.

I made some local mods to the net library, increasing the frame size to 256k, and the http2 performance went from 8Gbps to 38Gbps.

4. I haven’t tracked it down yet, but I don’t think the window size update code is working as intended - it seems to be sending window updates (which are expensive due to locks) far too frequently. I think this is the area that could use the most improvement - using some heuristics, it should be possible to detect the sender rate and adjust the refresh rate (using high/low water marks).

5. The implementation might need improvements using lock-free structures, atomic counters, and busy-waits in order to achieve maximum performance.

So 38Gbps for http2 vs 55 Gbps for http1. Better but still not great. Still, with some minor changes, the net package could allow setting of a large frame size on a per stream basis - which would enable much higher throughput. The gRPC library allows this.


Kirth Gersen

Nov 9, 2021, 1:18:47 PM
to golang-nuts
Great!

> I made some local mods to the net library, increasing the frame size to 256k, and the http2 performance went from 8Gbps to 38Gbps.
That is already enormous for us. thx for finding this.

4 -> Indeed, a lot of WINDOW_UPDATE messages are visible when using GODEBUG=http2debug=1

Robert Engels

Nov 9, 2021, 4:32:10 PM
to Kirth Gersen, golang-nuts
To be clear, I have no plans to submit a CL to improve this at this time.

It would require some api changes to implement properly. 


robert engels

Nov 9, 2021, 6:50:34 PM
to Kirth Gersen, golang-nuts
Well, I figured out a way to do it simply. The CL is here https://go-review.googlesource.com/c/net/+/362834

The frame size will be used for all connections using that transport, so it is probably better to create a transport specifically for the high-throughput transfers.

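You can also perform single-shot requests like:

// rt is an http.RoundTripper; useH2C, dialer, ctx and url are defined elsewhere.
if useH2C {
   rt = &http2.Transport{
      AllowHTTP: true, // permit plain-text "http" URLs (h2c)
      // Despite its name, DialTLS returns a plain TCP connection here for h2c.
      DialTLS: func(network, addr string, cfg *tls.Config) (net.Conn, error) {
         return dialer.Dial(network, addr)
      },
      MaxFrameSize: 1024 * 256, // 256k frames; this field is added by the CL above
   }
}

var body io.ReadCloser = http.NoBody

req, err := http.NewRequestWithContext(ctx, "GET", url, body)
if err != nil {
   return err
}

// Call the transport directly for a one-off request.
resp, err := rt.RoundTrip(req)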

Andrey T.

Nov 10, 2021, 1:59:58 PM
to golang-nuts
Fellas,
I would say the 5x throughput difference is a serious problem. Would you be so kind as to open an issue on GitHub about it?
Also, the PR that you have might benefit from an explanation of what you are trying to solve (and probably a link to an issue on GitHub), so it would get more attention.

Thanks!

Andrey

robert engels

Nov 10, 2021, 3:22:42 PM
to Andrey T., golang-nuts
As reported in the OP, the issue was filed long ago: https://github.com/golang/go/issues/47840

My CL https://go-review.googlesource.com/c/net/+/362834 is a viable fix (and should have been supported originally).

Andrey T.

Nov 10, 2021, 3:30:58 PM
to golang-nuts
Thank you Robert,
I somehow missed the reference to the ticket in the first message, sorry about that.

As for the CL - I think adding a link to the GitHub issue and a bit of explanation in the commit message would help.
I added a link to your CL to the GitHub issue's discussion; hopefully it will bring more attention to it.

A.

robert engels

Nov 10, 2021, 8:05:33 PM
to Andrey T., golang-nuts
No worries. I updated the issue and the CL. I will comment in the CL with a few more details.

robert engels

Nov 13, 2021, 8:11:40 PM
to Andrey T., golang-nuts
As another data point, I decided to test a few implementations of http2 downloads on OSX.

Using a Go server with default frame size (16k):

Go client:  900 MB/s
Java client: 1300 MB/s
curl: 1500 MB/s

Using a Java server with default frame size (16k):

Go client: 670 MB/s
Java client: 720 MB/s
curl: 800 MB/s

Using Go server using 256k client max frame size:

Go client: 2350 MB/s
Java client: 2800 MB/s
h2load: 4300 MB/s

Using Java server using 256k client max frame size:

Go client: 2900 MB/s
Java client: 2800 MB/s
h2load: 3750 MB/s

For h2load, I needed to create a PR to allow the frame size to be set, see https://github.com/nghttp2/nghttp2/pull/1640

Kevin Chowski

Nov 15, 2021, 10:23:14 AM
to golang-nuts
These are interesting results, thanks for investigating and sharing results!

I see that you have mostly been focusing on throughput in your posts; have you done any testing for latency differences too?

Robert Engels

Nov 15, 2021, 11:32:48 AM
to Kevin Chowski, golang-nuts
Since http2 multiplexes streams, it will definitely affect latency on other streams. This is why I suggested using multiple transports - one for high-throughput transfers and another for lower-latency “interactive” sessions.


Kirth Gersen

Nov 19, 2021, 11:58:25 AM
to golang-nuts
Your CL works well with the POC.

Side question, not specific to this issue: how do I test changes to golang.org/x/net with net/http?
Does the 'h2_bundle' trick with go generate & bundle require forking the std lib too?
I have a hard time figuring out how to do this. I tried with gotip but I get an error with "gotip generate" (bundle: internal error: package "strings" without types was imported from "golang.org/x/net/http2")
Is there any doc/tutorial on how to deal with this 'bundle' trick?

thx

Robert Engels

Nov 19, 2021, 12:34:11 PM
to Kirth Gersen, golang-nuts

Use the replace directive to point the net package to your local copy. 

Much easier to test X changes than stdlib changes. 


Kirth Gersen

Nov 19, 2021, 1:26:22 PM
to golang-nuts
(sorry for the previous reply, wrong account)
and sorry I wasn't clear enough: I already use the replace directive for x/net. This works fine for the POC, but only when explicitly using an http2 transport.
It doesn't work for net/http because of the h2_bundle thing. See https://www.youtube.com/watch?v=FARQMJndUn0&t=814s
So, for instance, if I change the line golang.org/x/net/http2/transport.go:2418 as you suggested on GitHub, it doesn't take effect with the default http server unless I rebuild the std lib.

Jean-François Giorgi

Nov 19, 2021, 1:34:03 PM
to Robert Engels, Kirth Gersen, golang-nuts
I already use the replace directive, but this doesn't work for the net/http h2_bundle thing. See https://www.youtube.com/watch?v=FARQMJndUn0&t=814s

robert engels

Nov 20, 2021, 12:40:35 AM
to Jean-François Giorgi, Kirth Gersen, golang-nuts
Correct. If you refer to the code in the OP you will see that different transports are created in order to test http1 vs http2.

If you wish to use the default behavior to use http2 you need to rebuild the stdlib (at least as far as I understand the issue with h2_bundle).

My patches only affect the client not the server, BUT, I would assume similar changes are needed on the server side to support highly efficient large uploads.

As an aside, the whole issue of the circular dependency between http1 and http2 highlights the need for a more interface-based design. If that had been done here, with the interfaces segregated from the implementations, there would have been no need for h2_bundle. There is a reason Java allows circular dependencies (to a point) and uses factories: it is often a far simpler pattern.
