I was recently profiling one of my Go servers that uses bidirectional streaming gRPC both upstream and
downstream, trying to debug some tail latency issues. Looking at the mutex profile, I see that a lot of time is
spent on lock contention in http2Client.handleData. A snippet of the profile is below, and two lines caught
my attention: line 950 and line 971.
From a quick look at the code, line 971 does not look avoidable. However, line 950 can probably be eliminated if
I don't use a dynamic window size. I'm not sure whether I should do that, because it might have unforeseen
side effects, but I wonder if anyone could help me judge whether this looks like a reasonable profile and what I can do
to reduce the overall contention (right now line 950 accounts for 64% of all delay/contention for the entire
server). I'd also really like to learn the pros and cons of a non-default initial window size.
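If I understand the transport code correctly, grpc-go disables its dynamic BDP-based window estimation (and hence the bdpPing on line 950) once you set an explicit initial window size of at least 64KB via dial options. A sketch of what I'm considering (the address is hypothetical and the 1MB values are untuned placeholders, not a recommendation):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Setting an explicit window size (>= 64KB) switches off grpc-go's
	// dynamic window estimation, so the transport should stop sending bdpPing.
	conn, err := grpc.Dial("localhost:50051", // hypothetical address
		grpc.WithInsecure(),
		grpc.WithInitialWindowSize(1<<20),     // per-stream flow-control window
		grpc.WithInitialConnWindowSize(1<<20), // per-connection flow-control window
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```

My worry is exactly the side effects of pinning these: too small and throughput suffers on high-BDP links, too large and per-stream memory grows.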
Type: delay
Time: Jun 15, 2020 at 12:37am (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list handleData
Total: 1.39hrs
0 1.12hrs (flat, cum) 80.61% of Total
. . 939: if sendBDPPing {
. . 940: // Avoid excessive ping detection (e.g. in an L7 proxy)
. . 941: // by sending a window update prior to the BDP ping.
. . 942:
. . 943: if w := t.fc.reset(); w > 0 {
. 5.92mins 944: t.controlBuf.put(&outgoingWindowUpdate{
. . 945: streamID: 0,
. . 946: increment: w,
. . 947: })
. . 948: }
. . 949:
. 35.34mins 950: t.controlBuf.put(bdpPing)
. . 951: }
. . 952: // Select the right stream to dispatch.
. 10.15ms 953: s := t.getStream(f)
. . 954: if s == nil {
. . 955: return
. . 956: }
. . 957: if size > 0 {
. 154.30ms 958: if err := s.fc.onData(size); err != nil {
. . 959: t.closeStream(s, io.EOF, true, http2.ErrCodeFlowControl, status.New(codes.Internal, err.Error()), nil, false)
. . 960: return
. . 961: }
. . 962: if f.Header().Flags.Has(http2.FlagDataPadded) {
. . 963: if w := s.fc.onRead(size - uint32(len(f.Data()))); w > 0 {
. . 964: t.controlBuf.put(&outgoingWindowUpdate{s.id, w})
. . 965: }
. . 966: }
. . 967: // TODO(bradfitz, zhaoq): A copy is required here because there is no
. . 968: // guarantee f.Data() is consumed before the arrival of next frame.
. . 969: // Can this copy be eliminated?
. . 970: if len(f.Data()) > 0 {
. 24.64mins 971: buffer := t.bufferPool.get()
. . 972: buffer.Reset()
. . 973: buffer.Write(f.Data())
. 1.48mins 974: s.write(recvMsg{buffer: buffer})
. . 975: }
. . 976: }