OOM occurring with a small heap


Yunchi Luo

Jul 1, 2019, 3:42:42 PM
to golan...@googlegroups.com, Alec Thomas
Hello, I'd like to solicit some help with a weird GC issue we are seeing.

I'm trying to debug OOMs on a service we are running in k8s. The service is just a CRUD server hitting a database (DynamoDB). Each replica serves about 300 qps of traffic. There are no memory leaks. On occasion (seemingly correlated with small latency spikes on the backend), the service OOMs. This is surprising because it has a circuit breaker that drops requests after 200 concurrent connections, which never trips, and goroutine profiles confirm that there are nowhere near 200 active goroutines.

GC logs are pasted below, interlaced with dumps of runtime.MemStats (the RSS number comes from /proc/<pid>/stats). Go version is 1.12.5, running in an Alpine 3.10 container on an Amazon Linux kernel 4.14.123-111.109.amzn2.x86_64.

The service happily serves requests using ~50MB of RSS for hours, until the last 2 seconds, where GC mark & sweep time starts growing 2-4X per cycle (e.g. 43+489/158/0.60+0.021 ms cpu in gc 41850 vs 86+990/401/0+0.14 ms cpu in gc 41851), and RSS and Sys blow up. It’s also interesting that the last log line shows `Sys=995MB RSS=861MB HeapSys=199MB`. If I’m reading this correctly, there’s at least `662MB` of memory in RSS that is not assigned to the heap. Though this might be due to the change in Go 1.12 to use MADV_FREE, so the pages are freeable but not yet reclaimed by the kernel.

I don’t understand how the heap can stay so small across GC cycles (28->42->30MB on the last line means the heap doesn't grow past 42MB?), yet RSS keeps growing. I'm assuming the increased RSS is causing the kernel to OOM-kill the service, but that should only happen if the RSS is not freeable as marked by MADV_FREE, and there doesn't seem to be any indication of that in the GC logs. I guess this all comes down to me not having a good understanding of how the GC algorithm works and how to read these logs. I'd really appreciate it if anyone can explain what's happening and why.
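For reference, the [Memory] lines below are emitted by a logger along these lines. This is a minimal sketch under assumptions, not the actual service code; in particular the readRSS helper, the 1-second interval, and parsing /proc/self/stat are mine.

package main

import (
  "io/ioutil"
  "log"
  "os"
  "runtime"
  "strconv"
  "strings"
  "time"
)

// readRSS returns the resident set size in bytes, parsed from /proc/self/stat
// (field 24 is RSS in pages). Assumes the comm field contains no spaces.
func readRSS() uint64 {
  b, err := ioutil.ReadFile("/proc/self/stat")
  if err != nil {
    return 0
  }
  fields := strings.Fields(string(b))
  if len(fields) < 24 {
    return 0
  }
  pages, _ := strconv.ParseUint(fields[23], 10, 64)
  return pages * uint64(os.Getpagesize())
}

func main() {
  const mb = 1 << 20
  var m runtime.MemStats
  for range time.Tick(time.Second) {
    runtime.ReadMemStats(&m)
    log.Printf("[Memory]: Alloc=%dMB TotalAlloc=%dMB Sys=%dMB RSS=%dMB HeapSys=%dMB HeapIdle=%dMB HeapInUse=%dMB HeapReleased=%dMB",
      m.Alloc/mb, m.TotalAlloc/mb, m.Sys/mb, readRSS()/mb,
      m.HeapSys/mb, m.HeapIdle/mb, m.HeapInuse/mb, m.HeapReleased/mb)
  }
}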

gc 41833 @19135.227s 0%: 0.019+2.3+0.005 ms clock, 0.079+0.29/2.2/5.6+0.020 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:04.886 [Memory]: Alloc=7MB TotalAlloc=230172MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=51MB HeapInUse=11MB HeapReleased=5MB
gc 41834 @19135.869s 0%: 0.005+2.9+0.003 ms clock, 0.023+0.32/2.5/6.6+0.012 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:05.886 [Memory]: Alloc=9MB TotalAlloc=230179MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=12MB HeapReleased=5MB
gc 41835 @19136.704s 0%: 0.038+2.1+0.004 ms clock, 0.15+0.35/2.1/5.3+0.016 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:06.886 [Memory]: Alloc=9MB TotalAlloc=230184MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=12MB HeapReleased=5MB
gc 41836 @19137.611s 0%: 0.009+2.1+0.003 ms clock, 0.036+0.39/2.0/5.7+0.015 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:07.887 [Memory]: Alloc=10MB TotalAlloc=230190MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=49MB HeapInUse=12MB HeapReleased=5MB
gc 41837 @19138.444s 0%: 0.008+2.1+0.004 ms clock, 0.035+0.51/2.1/5.7+0.017 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:08.887 [Memory]: Alloc=10MB TotalAlloc=230195MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=49MB HeapInUse=12MB HeapReleased=5MB
gc 41838 @19139.474s 0%: 0.005+2.6+0.003 ms clock, 0.023+0.37/2.5/4.3+0.014 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
gc 41839 @19140.173s 0%: 0.011+2.4+0.003 ms clock, 0.046+0.20/2.3/5.8+0.015 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:09.887 [Memory]: Alloc=7MB TotalAlloc=230202MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=11MB HeapReleased=5MB
gc 41840 @19140.831s 0%: 0.082+2.1+0.003 ms clock, 0.32+0.64/2.1/5.3+0.014 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:10.887 [Memory]: Alloc=9MB TotalAlloc=230209MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=12MB HeapReleased=5MB
gc 41841 @19141.655s 0%: 0.014+2.1+0.003 ms clock, 0.056+0.28/2.0/5.7+0.013 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
gc 41842 @19142.316s 0%: 0.006+2.7+0.003 ms clock, 0.027+0.29/2.6/6.2+0.014 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:11.888 [Memory]: Alloc=6MB TotalAlloc=230216MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=51MB HeapInUse=11MB HeapReleased=5MB
gc 41843 @19142.942s 0%: 0.010+2.1+0.005 ms clock, 0.040+0.29/2.0/5.7+0.023 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:12.888 [Memory]: Alloc=9MB TotalAlloc=230223MB Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=11MB HeapReleased=5MB
gc 41844 @19143.724s 0%: 0.008+2.4+0.004 ms clock, 0.035+0.38/2.0/5.7+0.017 ms cpu, 11->11->5 MB, 12 MB goal, 4 P
gc 41845 @19144.380s 0%: 10+9.3+0.044 ms clock, 43+6.1/9.2/4.4+0.17 ms cpu, 11->11->6 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:13.901 [Memory]: Alloc=6MB TotalAlloc=230230MB Sys=136MB RSS=98MB HeapSys=94MB HeapIdle=83MB HeapInUse=11MB HeapReleased=35MB
gc 41846 @19144.447s 0%: 0.008+26+0.005 ms clock, 0.033+0.46/7.8/26+0.020 ms cpu, 11->12->9 MB, 12 MB goal, 4 P
gc 41847 @19144.672s 0%: 0.013+76+0.006 ms clock, 0.053+0.20/6.4/80+0.024 ms cpu, 17->18->8 MB, 18 MB goal, 4 P
gc 41848 @19145.014s 0%: 0.008+172+0.005 ms clock, 0.035+0.13/8.5/177+0.022 ms cpu, 15->17->10 MB, 16 MB goal, 4 P
gc 41849 @19145.298s 0%: 0.007+285+0.006 ms clock, 0.030+10/285/7.6+0.024 ms cpu, 19->23->15 MB, 20 MB goal, 4 P
INFO 2019-06-30T08:46:15.052 [Memory]: Alloc=22MB TotalAlloc=230264MB Sys=598MB RSS=531MB HeapSys=265MB HeapIdle=240MB HeapInUse=25MB HeapReleased=164MB
gc 41850 @19145.665s 0%: 10+419+0.005 ms clock, 43+489/158/0.60+0.021 ms cpu, 26->30->17 MB, 30 MB goal, 4 P
gc 41851 @19146.325s 0%: 21+798+0.036 ms clock, 86+990/401/0+0.14 ms cpu, 28->42->30 MB, 34 MB goal, 4 P
INFO 2019-06-30T08:46:16.613 [Memory]: Alloc=41MB TotalAlloc=230303MB Sys=995MB RSS=861MB HeapSys=199MB HeapIdle=155MB HeapInUse=44MB HeapReleased=54MB


I also captured the OOM log from dmesg here https://gist.github.com/mightyguava/7ecc6fc55f5cd925062d6beede3783b3.

--
Yunchi

Robert Engels

Jul 1, 2019, 4:33:09 PM
to Yunchi Luo, golan...@googlegroups.com, Alec Thomas

A leak of TCP connections (maybe not properly closed)? Each TCP connection will use kernel memory and process memory (local buffers) that won't be on the Go heap (the reference to the TCP connection will be in the Go heap, but is probably much smaller than the buffer allocation).

That would be my guess - but just a guess.



Yunchi Luo

Jul 1, 2019, 5:27:33 PM
to Robert Engels, golan...@googlegroups.com, Alec Thomas
I actually have a heap profile (pasted at the bottom) from about 1 second before the service died (the goroutine that logs "[Memory]" triggers a heap profile once RSS > 100MB). I don't see TCP connections there. Maybe there are too few to be sampled. How would I verify your theory? The fact that the service dies within 2 seconds after running fine for hours makes this very hard to debug.
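The heap-profile trigger works roughly like this (a sketch under assumptions; the file naming is mine, the 100MB threshold is as described above, and readRSS is the helper from the earlier sketch):

import (
  "fmt"
  "os"
  "runtime/pprof"
  "time"
)

// maybeDumpHeap writes an inuse-space heap profile once RSS crosses 100MB.
func maybeDumpHeap() {
  if readRSS() < 100<<20 {
    return
  }
  f, err := os.Create(fmt.Sprintf("/tmp/heap-%d.pprof", time.Now().Unix()))
  if err != nil {
    return
  }
  defer f.Close()
  // The "heap" profile is what pprof's inuse_space/inuse_objects views read.
  pprof.Lookup("heap").WriteTo(f, 0)
}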

The top thing in the heap profile is from the reflect package. I initially found that suspect, but it turns out this comes from a use of httptrace.ClientTrace I had for counting new connections to DynamoDB.

tracer := &httptrace.ClientTrace{
  ConnectStart: func(_, _ string) {
    newConns.Inc()
  },
}

`newConns` is just a Prometheus counter. The `tracer` object itself is created once and re-used with every client request context. On each request, `httptrace.WithClientTrace(ctx, tracer)` uses reflection under the hood to compose the trace functions and to invoke them, hence the reflect.funcLayout and reflect.Value.call entries. Objects in `reflect` account for about 50% of the heap by size, and do seem to grow as the service runs out of memory, but that's only 12MB so I thought it was a red herring.
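For what it's worth, the composition is easy to see in isolation: each WithClientTrace call wraps the hooks of any trace already on the context in reflect.MakeFunc-generated functions. A standalone illustration (not the service code; imports are context and net/http/httptrace):

// Attaching a trace to a context that already carries one composes the two;
// the composed hooks are what show up as reflect.makeFuncStub in profiles.
ctx := httptrace.WithClientTrace(context.Background(), tracer)
ctx = httptrace.WithClientTrace(ctx, tracer) // hooks now go through reflection-composed wrappers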

Heap profile:
Type: inuse_space
Time: Jun 30, 2019 at 4:46am (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) inuse_objects
(pprof) top
Showing nodes accounting for 414485, 100% of 414485 total
Showing top 10 nodes out of 81
      flat  flat%   sum%        cum   cum%
    344074 83.01% 83.01%     344074 83.01%  reflect.funcLayout.func1
     32768  7.91% 90.92%     376842 90.92%  reflect.callReflect
     16384  3.95% 94.87%      16384  3.95%  github.com/json-iterator/go.processTags
     10923  2.64% 97.51%      10923  2.64%  context.WithValue
      8192  1.98% 99.48%       8192  1.98%  crypto/hmac.New
      1260   0.3% 99.79%       1260   0.3%  github.com/aws/aws-sdk-go/aws/signer/v4.(*signingCtx).buildCanonicalHeaders
       820   0.2%   100%        820   0.2%  github.com/stripe/veneur/tdigest.NewMerging
        64 0.015%   100%         64 0.015%  reflect.addReflectOff
         0     0%   100%        820   0.2%  git.sqcorp.co/cash/digester/lib/pkg/histogram.(*SlidingWindowDigest).Observe
         0     0%   100%        820   0.2%  git.sqcorp.co/cash/digester/lib/pkg/histogram.(*SlidingWindowDigest).openDigests

(pprof) cum
(pprof) top
Showing nodes accounting for 376842, 90.92% of 414485 total
Showing top 10 nodes out of 81
      flat  flat%   sum%        cum   cum%
         0     0%     0%     376842 90.92%  net/http/httptrace.(*ClientTrace).compose.func1
         0     0%     0%     376842 90.92%  reflect.Value.Call
         0     0%     0%     376842 90.92%  reflect.Value.call
     32768  7.91%  7.91%     376842 90.92%  reflect.callReflect
         0     0%  7.91%     376842 90.92%  reflect.makeFuncStub
    344074 83.01% 90.92%     344074 83.01%  reflect.funcLayout.func1
         0     0% 90.92%     344074 83.01%  sync.(*Pool).Get
         0     0% 90.92%      16448  3.97%  github.com/json-iterator/go._createDecoderOfType
         0     0% 90.92%      16448  3.97%  github.com/json-iterator/go.createDecoderOfType
         0     0% 90.92%      16448  3.97%  github.com/json-iterator/go.decoderOfStruct


--
Yunchi

Robert Engels

Jul 1, 2019, 5:40:21 PM
to Yunchi Luo, golan...@googlegroups.com, Alec Thomas
I don't think you are going to find it in the 'heap'; rather, it would be in native memory.

I would monitor /proc/[pid] for the process, and pay attention to 'fd', 'net' and 'statm' - if my theory is correct you will see growth there long before the process is killed. Since you are running under k8s and cgroups, you will need to do this alongside the Go process (unless you have root access to the server).
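A rough way to watch those from a sidecar process (a sketch; the pid argument, the 10-second interval, and the log format are assumptions — statm values are in pages):

import (
  "fmt"
  "io/ioutil"
  "log"
  "strings"
  "time"
)

// pollProc logs the open fd count and /proc/<pid>/statm for a target process
// every 10 seconds, so growth shows up well before the OOM kill.
func pollProc(pid int) {
  for range time.Tick(10 * time.Second) {
    fds, _ := ioutil.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))
    statm, _ := ioutil.ReadFile(fmt.Sprintf("/proc/%d/statm", pid))
    log.Printf("pid=%d open_fds=%d statm_pages=%q", pid, len(fds), strings.TrimSpace(string(statm)))
  }
}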

I *think*, depending on the kernel version, that kernel memory used counts against the process for OOM purposes, so this is a likely candidate if pprof is showing nothing.

Do you by chance do any of your own memory management (via malloc/CGO)? If so, this is not going to show in pprof either.

Yunchi Luo

Jul 1, 2019, 8:37:50 PM
to Robert Engels, golan...@googlegroups.com, Alec Thomas
Following that logic, a leak of TCP connections should manifest as a file descriptor leak. We have the process_open_fds metric from Prometheus, which is the number of open file descriptors found in /proc/<pid>/fd. The number of descriptors over time correlates well with the amount of traffic, pretty cyclic. There doesn't appear to be a leak.

We don't do our own memory management and the binary is compiled with CGO_ENABLED=0.

I still think the issue I'm seeing should be GC (or heap) related, given the explosion in mark & sweep time, HeapSys, HeapIdle, and HeapReleased just before the process dies. But I'm lacking ideas on how to track down the cause of the increase.
--
Yunchi

Michael Jones

Jul 1, 2019, 9:11:25 PM
to Yunchi Luo, Alec Thomas, Robert Engels, golan...@googlegroups.com
Does adding runtime.Gosched() calls make any difference?


--
Michael T. Jones
michae...@gmail.com

Michael Jones

Jul 1, 2019, 11:07:15 PM
to Mighty Guava, Alec Thomas, Robert Engels, Yunchi Luo, golan...@googlegroups.com
One theory you advanced is of a desperate garbage collector. If it is somehow prevented from running, then maybe that could be possible. Pursuing that line of reasoning, I thought you could improve its chances with Gosched() in your code.

On Mon, Jul 1, 2019 at 7:17 PM Mighty Guava <might...@gmail.com> wrote:
I don't understand. What would adding runtime.Gosched() do here? I don't have any explicit loops in this service.


Robert Engels

Jul 2, 2019, 12:00:57 AM
to Mighty Guava, Yunchi Luo, Michael Jones, Alec Thomas, golan...@googlegroups.com
Does your process use mmap? Maybe you are leaking there, as this counts against process memory size. 

On Jul 1, 2019, at 9:11 PM, Mighty Guava <might...@gmail.com> wrote:

I don't understand. What would adding runtime.Gosched() do here? I don't have any explicit loops in this service.
On Jul 1, 2019, 9:11 PM -0400, Michael Jones <michae...@gmail.com>, wrote:

robert engels

Jul 2, 2019, 12:26:00 AM
to Mighty Guava, Yunchi Luo, Michael Jones, Alec Thomas, golan...@googlegroups.com
Are you using anything like fasthttp, or similar? Have you tried running the server with the race detector enabled? I could see a race condition causing a rare failure that sends the code into a 'spinning loop', preventing GC from working properly. It's a guess, but I would try that to rule out the possibility.

On Jul 1, 2019, at 11:06 PM, Mighty Guava <might...@gmail.com> wrote:

I don't think GC is being blocked. GC ran multiple times during the time the service was exploding in memory. 

I'm not using mmap in the service. It's just a straightforward CRUD webserver backed by DynamoDB. The only thing non-standard I can think of is that it connects to its backing database over HTTP/1.1 (using the AWS Go SDK) instead of a custom database protocol. It's serving a fair number of requests (500 qps per replica), so I initially assumed it was under-provisioned to handle latency spikes. But I tripled the memory and it's still OOMing on occasion, even though RSS is at 9% of allocated memory just seconds before.

Yunchi

robert engels

Jul 2, 2019, 1:21:10 AM
to Mighty Guava, Yunchi Luo, Michael Jones, Alec Thomas, golan...@googlegroups.com
One other thing to think about - 500 qps is a lot for a single server if things start blocking and you have a low memory cap. Imagine a scenario where DynamoDB "locks up / heavy delays due to IO contention / something"; depending on how your server is written/configured, it may still keep accepting inbound connections, each potentially causing a thread to be created (if they are doing blocking IO). If there is not a low cap on the number of inbound connections you could quickly run out of memory. You might want to see if there is a configuration option on grpc-gateway to limit the maximum number of simultaneous inbound connections, set that to a low number, and see if it still runs out of memory.
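If grpc-gateway itself doesn't expose such a knob, the cap can also be applied at the listener. A sketch using golang.org/x/net/netutil (the 200 limit mirrors the circuit-breaker threshold mentioned earlier; `mux` and the port stand in for whatever the service actually serves):

ln, err := net.Listen("tcp", ":8080")
if err != nil {
  log.Fatal(err)
}
// Hard-cap concurrent inbound TCP connections before they reach the handler.
ln = netutil.LimitListener(ln, 200)
log.Fatal(http.Serve(ln, mux))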

This scenario is unlikely if you didn’t see the TCPConn instances in the heap dump.

On Jul 1, 2019, at 11:28 PM, Mighty Guava <might...@gmail.com> wrote:

Oh now that you mention it, I'm using grpc-gateway with the stdlib http server. Though all it really does is convert json <=> protobuf and is pretty mature so it's unlikely to be the culprit.

Running with race detector is a good idea! I'll try that tomorrow.

Thanks,
Yunchi Luo

Michael Jones

Jul 2, 2019, 2:24:48 AM
to robert engels, Mighty Guava, Yunchi Luo, Alec Thomas, golang-nuts
(I don't suspect GC either, it was a rule it out suggestion)

Bakul Shah

Jul 2, 2019, 2:25:25 AM
to Yunchi Luo, golan...@googlegroups.com, Alec Thomas
Before assuming it is the GC or something system-related, you may wish to verify it is *not your own logic*. Larger RSS could also be due to your own logic touching more and more memory due to some runaway effect. The probability this has to do with GC is very low given the very widespread use of Go, and the probability of a bug in new code is very high given that it is exercised on a much, much smaller scale.

This has the "smell" of a concurrency bug. If I were you I'd test the code for any races, I'd review the code thoroughly with someone who doesn't know the code so that I'm forced to explain it, and I'd add plenty of assertions. I'd probably first look at the circuit breaker code -- things like how does it know how many concurrent connections exist?
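For reference, a breaker of the kind described at the top of the thread is usually just a counting semaphore around the handler, which also makes the "how many concurrent requests right now" question directly answerable. A minimal sketch (the 200 limit comes from the earlier description; everything else is assumption):

// limitConcurrency rejects requests once `limit` handlers are in flight.
// len(sem) is the current concurrency, which can be exported as a metric
// or asserted against in tests.
func limitConcurrency(limit int, next http.Handler) http.Handler {
  sem := make(chan struct{}, limit)
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    select {
    case sem <- struct{}{}:
      defer func() { <-sem }()
      next.ServeHTTP(w, r)
    default:
      http.Error(w, "too many concurrent requests", http.StatusServiceUnavailable)
    }
  })
}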

In general, any hypothesis you come up with, you should have a test that *catches* the bug given the hypothesis. Elusive bugs tend to become more elusive as you are on the hunt and as you may fix other problems you discover on the way.

I even suspect you're looking at GC logs a bit too early. Instrument your own code and look at what patterns emerge. [Not to mention any time you spend on understanding your code will help improve your service; but better understanding of and debugging the GC won't necessarily help you!]

Mighty Guava

Jul 2, 2019, 9:00:56 AM
to Yunchi Luo, Michael Jones, Alec Thomas, Robert Engels, golan...@googlegroups.com
I don't understand. What would adding runtime.Gosched() do here? I don't have any explicit loops in this service.
On Jul 1, 2019, 9:11 PM -0400, Michael Jones <michae...@gmail.com>, wrote:

Mighty Guava

Jul 2, 2019, 9:00:57 AM
to robert engels, Yunchi Luo, Michael Jones, Alec Thomas, golan...@googlegroups.com
Oh now that you mention it, I'm using grpc-gateway with the stdlib http server. Though all it really does is convert json <=> protobuf and is pretty mature so it's unlikely to be the culprit.

Running with race detector is a good idea! I'll try that tomorrow.

Thanks,
Yunchi Luo
On Jul 2, 2019, 12:25 AM -0400, robert engels <ren...@ix.netcom.com>, wrote:

Mighty Guava

Jul 2, 2019, 9:01:04 AM
to Robert Engels, Yunchi Luo, Michael Jones, Alec Thomas, golan...@googlegroups.com
I don't think GC is being blocked. GC ran multiple times during the time the service was exploding in memory. 

I'm not using mmap in the service. It's just a straightforward CRUD webserver backed by DynamoDB. The only thing non-standard I can think of is that it connects to its backing database over HTTP/1.1 (using the AWS Go SDK) instead of a custom database protocol. It's serving a fair number of requests (500 qps per replica), so I initially assumed it was under-provisioned to handle latency spikes. But I tripled the memory and it's still OOMing on occasion, even though RSS is at 9% of allocated memory just seconds before.

Yunchi
On Jul 2, 2019, 12:00 AM -0400, Robert Engels <ren...@ix.netcom.com>, wrote:

Yunchi Luo

Jul 2, 2019, 12:54:20 PM
to Bakul Shah, golan...@googlegroups.com, Alec Thomas
I'm not so much pointing my finger at GC as I am hoping the GC logs could help tell the story, and that someone with a strong understanding of GC in Go could weigh in here. In the last 4 seconds before the OOM, "TotalAlloc" increased by only 80MB, yet "HeapIdle" increased from 50MB to 240MB and RSS increased by 810MB. The numbers don't add up for me: a running total of 80MB of heap objects were allocated in the time RSS increased by 10X that. Even if GC was completely off, I still wouldn't expect this to happen, which makes me want to rule out GC being blocked as the problem. Maybe there was runaway heap reservation because something in my code caused a ton of fragmentation? Is that sane? The heap profile lacking clues is also strange.

Regarding the possibility of a race, I forgot that I have goroutine profiles captured along with the heap profiles at the time memory exploded. There are only 10 goroutines running on the serving path, which rules out too many concurrent requests being served (please correct me if I'm wrong). Those fan out to 13 goroutines talking to the db, all of which are in http.Transport.roundTrip (which is blocked in runtime.gopark, so I assume they are waiting on the TCP connection). All other goroutines that don't originate in the stdlib are also blocked on `select` or sleeping. Our CI does run go test -race, but I'll try doing some load testing with a race-detector-enabled binary.

Bakul, that is sound advice. I've actually been debugging this on and off for a couple of months now, with the help of several people, a few of whom have peer-reviewed the code. I agree it's most likely some runaway code in my own logic, but we haven't been able to pinpoint the cause and I've run out of hypotheses to test at the moment. This is why I've started asking on go-nuts@. The circuit breaker code was one of the first things I checked; it has been unit tested and verified working with load tests. Now that you mention it, I actually did uncover a Go stdlib bug in http2 while doing the load tests... but that's unrelated.

--
Yunchi

andrey mirtchovski

Jul 2, 2019, 2:09:21 PM
to Yunchi Luo, Bakul Shah, golang-nuts, Alec Thomas
What I have found useful in the past is pprof's ability to diff profiles: if you capture heap profiles at regular intervals, you can see a much smaller subset of changes and compare allocation patterns.

Robert Engels

Jul 2, 2019, 2:15:13 PM
to Yunchi Luo, Bakul Shah, golan...@googlegroups.com, Alec Thomas
Did you try running on an older release of Go, like 1.10?

Yunchi Luo

Jul 2, 2019, 2:16:28 PM
to andrey mirtchovski, Bakul Shah, golang-nuts, Alec Thomas
I did try to do that! I have 3 heap profiles captured from the ~3 seconds before the crash. The only thing particularly suspicious is the httptrace call I mentioned earlier in the thread.

Diffing 1 to 2
(pprof) cum
(pprof) top 50
Showing nodes accounting for 4604.15kB, 81.69% of 5636.17kB total

      flat  flat%   sum%        cum   cum%
         0     0%     0%  5120.16kB 90.84%  net/http/httptrace.(*ClientTrace).compose.func1
         0     0%     0%  5120.16kB 90.84%  reflect.Value.Call
         0     0%     0%  5120.16kB 90.84%  reflect.Value.call
         0     0%     0%  5120.16kB 90.84%  reflect.callReflect
 5120.16kB 90.84% 90.84%  5120.16kB 90.84%  reflect.funcLayout.func1
         0     0% 90.84%  5120.16kB 90.84%  reflect.makeFuncStub
         0     0% 90.84%  4604.15kB 81.69%  sync.(*Pool).Get
         0     0% 90.84%  -516.01kB  9.16%  io.Copy
         0     0% 90.84%  -516.01kB  9.16%  io.copyBuffer
         0     0% 90.84%  -516.01kB  9.16%  io/ioutil.devNull.ReadFrom
 -516.01kB  9.16% 81.69%  -516.01kB  9.16%  io/ioutil.glob..func1


Diff 2 to 3
(pprof) top 50
Showing nodes accounting for 7680.44kB, 100% of 7680.44kB total

      flat  flat%   sum%        cum   cum%
         0     0%     0%  6144.18kB 80.00%  net/http/httptrace.(*ClientTrace).compose.func1
         0     0%     0%  6144.18kB 80.00%  reflect.Value.Call
         0     0%     0%  6144.18kB 80.00%  reflect.Value.call
  512.01kB  6.67%  6.67%  6144.18kB 80.00%  reflect.callReflect
         0     0%  6.67%  6144.18kB 80.00%  reflect.makeFuncStub
 5632.17kB 73.33% 80.00%  5632.17kB 73.33%  reflect.funcLayout.func1
         0     0% 80.00%  5632.17kB 73.33%  sync.(*Pool).Get
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/request.(*HandlerList).Run
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/request.(*Request).Send
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/request.(*Request).Sign
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/signer/v4.(*signingCtx).build
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/signer/v4.SignSDKRequest
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/signer/v4.SignSDKRequestWithCurrentTime
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/aws/signer/v4.Signer.signWithBody
         0     0% 80.00%  1024.23kB 13.34%  github.com/aws/aws-sdk-go/service/dynamodb.(*DynamoDB).GetItemWithContext

--
Yunchi

Yunchi Luo

Jul 2, 2019, 2:17:21 PM
to andrey mirtchovski, Bakul Shah, golang-nuts, Alec Thomas
I removed the httptrace call yesterday and there have been no OOMs yet. Going to let it bake for another day. If OOMs show up again, I'll try reverting to an older Go version tomorrow. Otherwise I'll point my finger at httptrace I guess.
--
Yunchi

Bakul Shah

Jul 2, 2019, 3:28:47 PM
to Yunchi Luo, andrey mirtchovski, golang-nuts, Alec Thomas
Switching Go versions seems like a stab in the dark. If the OOM symptom does show up, you have simply wasted time. If it doesn't show up, you still don't know if the bug exists and is simply hiding. Even if you think the bug is in Go code generation (or GC) and not in your code, there is nothing the Go developers can do without a concrete test. So then you have to keep changing things until the symptom disappears for a long time and cross your fingers that the bug went away....

If the symptom disappears when you take out httptrace, you have narrowed the hypothesis to httptrace but you haven't proven it. You still don't know whether removing httptrace changed the timing behavior such that the bug is hiding somewhere and will show up again in the future.

This is why one should change as little as possible outside of what is needed for testing a specific hypothesis and why you should have a test that tickles the bug. By "proving" I mean you must find a specific assertion that is violated.

I'd instrument the code in question and somehow capture the history of the last N seconds just before the OOM. Capturing a heap profile is just one way to look at what your code does. You can find other ways to look at what your code does and create a log. For instance, the extra memory use is also a symptom, just like the OOM; what you may want to check is whether your data structures are consistent. The actual inconsistency may have occurred long before the OOM crash.
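One way to capture that history without logging constantly is a small ring of snapshots that only gets dumped when things look bad. A sketch (the snapshot contents, the 60-second window, and the 100MB trigger are assumptions; readRSS is the helper sketched earlier in the thread):

// recordHistory keeps the last ~60 seconds of memory snapshots and dumps the
// whole window once RSS crosses a threshold, so the run-up to an OOM is
// preserved even if the process is killed moments later.
type snapshot struct {
  t   time.Time
  ms  runtime.MemStats
  rss uint64
}

func recordHistory() {
  const n = 60
  ring := make([]snapshot, n)
  i := 0
  for range time.Tick(time.Second) {
    var s snapshot
    s.t = time.Now()
    runtime.ReadMemStats(&s.ms)
    s.rss = readRSS()
    ring[i%n] = s
    i++
    if s.rss > 100<<20 { // dump the window, oldest to newest, while RSS is high
      for j := 0; j < n; j++ {
        old := ring[(i+j)%n]
        if old.t.IsZero() {
          continue
        }
        log.Printf("%s RSS=%dMB StackInuse=%dMB HeapSys=%dMB HeapIdle=%dMB",
          old.t.Format(time.RFC3339), old.rss>>20,
          old.ms.StackInuse>>20, old.ms.HeapSys>>20, old.ms.HeapIdle>>20)
      }
    }
  }
}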

Note that the GC logs allow you to see some aspect of the GC behavior better but if you don't understand it well enough, it may seem mysterious (compared to your mental model) and later, when you run out of other hypotheses, it may even seem suspicious, so I'd be careful about looking at such things :-)

andrey mirtchovski

Jul 2, 2019, 3:46:59 PM
to Bakul Shah, Yunchi Luo, golang-nuts, Alec Thomas
OK, this is interesting:

reflect.MakeFunc: I've never done this before. What are the allocation patterns for creating functions with reflect? I see a few crashes related to these functions but no mention of severe memory consumption.

In my opinion, trying to capture the MakeFunc patterns from your original program in a small example would be helpful.

Yunchi Luo

Jul 2, 2019, 4:16:17 PM
to andrey mirtchovski, Bakul Shah, golang-nuts, Alec Thomas
Yeah, I've been looking at the goroutine profiles and there are some strange stacks like the one below.

         1   reflect.flag.mustBeExported /go/src/reflect/value.go:213
             reflect.Value.call /go/src/reflect/value.go:424
             reflect.Value.Call /go/src/reflect/value.go:308
             net/http/httptrace.(*ClientTrace).compose.func1 /go/src/net/http/httptrace/trace.go:203
             reflect.callReflect /go/src/reflect/value.go:536
             reflect.makeFuncStub /go/src/reflect/asm_amd64.s:20
             reflect.Value.call /go/src/reflect/value.go:447
             reflect.Value.Call /go/src/reflect/value.go:308
             net/http/httptrace.(*ClientTrace).compose.func1 /go/src/net/http/httptrace/trace.go:204
             reflect.callReflect /go/src/reflect/value.go:536
             reflect.makeFuncStub /go/src/reflect/asm_amd64.s:20
             reflect.Value.call /go/src/reflect/value.go:447
             reflect.Value.Call /go/src/reflect/value.go:308
             net/http/httptrace.(*ClientTrace).compose.func1 /go/src/net/http/httptrace/trace.go:204
             reflect.callReflect /go/src/reflect/value.go:536
             reflect.makeFuncStub /go/src/reflect/asm_amd64.s:20
             reflect.Value.call /go/src/reflect/value.go:447
             reflect.Value.Call /go/src/reflect/value.go:308
             net/http/httptrace.(*ClientTrace).compose.func1 /go/src/net/http/httptrace/trace.go:204
             reflect.callReflect /go/src/reflect/value.go:536
             reflect.makeFuncStub /go/src/reflect/asm_amd64.s:20
             reflect.Value.call /go/src/reflect/value.go:447
             reflect.Value.Call /go/src/reflect/value.go:308
             net/http/httptrace.(*ClientTrace).compose.func1 /go/src/net/http/httptrace/trace.go:204
             reflect.callReflect /go/src/reflect/value.go:536
             reflect.makeFuncStub /go/src/reflect/asm_amd64.s:20
             reflect.Value.call /go/src/reflect/value.go:447
             reflect.Value.Call /go/src/reflect/value.go:308
             net/http/httptrace.(*ClientTrace).compose.func1 /go/src/net/http/httptrace/trace.go:204
             reflect.callReflect /go/src/reflect/value.go:536
             reflect.makeFuncStub /go/src/reflect/asm_amd64.s:20
             reflect.Value.call /go/src/reflect/value.go:447


The reflect.MakeFunc is invoked by the httptrace package. From what I can gather, when you call httptrace.WithClientTrace(context.Context, *httptrace.ClientTrace), the package uses reflection to compose any existing httptrace.ClientTrace objects on the context with the trace you provide. I only provide 1 trace, so it's strange that there are 6 traces being composed.

The code for adding the trace is pretty straightforward:

// This is called exactly once on service startup to insert a hook before every request to DynamoDB (from the Go AWS SDK)
func instrument(dynamo *dynamodb.DynamoDB) {
  dynamo.Client.Handlers.Send.PushFront(metricHandler())
}

func metricHandler() func(r *request.Request) {
  // There should be exactly 1 instance of this trace object, since the metricHandler() function is invoked once

  tracer := &httptrace.ClientTrace{
    ConnectStart: func(_, _ string) {
      // Incrementing a prometheus counter
      newConns.Inc()
    },
  }
  return func(r *request.Request) {
    // This should get applied once per request
    r.SetContext(httptrace.WithClientTrace(r.Context(), tracer))
  }
}


I've done a little digging along this vector, but there isn't much to look at. A value from httptrace is applied to a hook in the AWS Go SDK.
--
Yunchi

Tom Mitchell

Jul 3, 2019, 11:34:56 AM
to Yunchi Luo, golang-nuts, Alec Thomas
On Mon, Jul 1, 2019 at 12:42 PM 'Yunchi Luo' via golang-nuts <golan...@googlegroups.com> wrote:
Hello, I'd like to solicit some help with a weird GC issue we are seeing.

I'm trying to debug OOMs on a service we are running in k8s. The service is just a CRUD server hitting a database (DynamoDB). Each replica serves about 300 qps of traffic. There are no memory leaks. On occasion (seemingly correlated with small latency spikes on the backend), the service OOMs. This is surprising because it has a circuit breaker that drops requests after 200 concurrent connections, which never trips, and goroutine profiles confirm that there are nowhere near 200 active goroutines.

Just curious about the network connections.
Is there a chance that the network connections are not getting closed and cleaned up for some reason?
It was common for sockets to hang around in the thousands because a user was killing a slow tab or the browser and the full socket close never completed. The solution was to allow reliable connections to time out and finish closing, freeing up the memory. The application has closed the socket but the protocol has yet to get the last packet to complete the handshake. The shell equivalent would be zombie processes that still need to return an exit status but no process waits on that status. Debugging can be interesting in the shell case because of implied waits done by ps.

How many connections does the system kernel think there are, and what state are they in?
Look both locally and on the DB machine.
The latency spikes can be a cause or a symptom.
Look at the connections being made to the CRUD server and make sure they are being set up with short enough timers that they clean themselves up quickly. Is the CRUD server at risk of a denial of service or a random curious probe burst from an nmap script? Even firewall drops, near or far, can leave connections hanging in an incomplete state when an invalid connection is detected and blocked and long-timer reliable network connections are involved.


Yunchi Luo

Jul 3, 2019, 2:26:30 PM
to Tom Mitchell, golang-nuts, Alec Thomas
I actually just managed to track down the root cause of the bug, and it's quite surprising. It's not a heap overflow, it's a stack overflow, due to a bug in the stdlib! Specifically, adding the same httptrace.ClientTrace twice onto a request context causes a stack overflow.


This explains everything we've been seeing in the memory/gc logs.

The story goes as follows:

There was a bug in my code for adding the http trace (copied earlier in this thread). I added the trace to the request in the AWS SDK Client's Send handler, which is triggered on every retry, not just the first request. On rare occasions (every few hours, apparently), a network issue or DynamoDB backend issue causes retries in multiple client goroutines to happen at the same time. Since multiple goroutines experience runaway stacks at the same time, the service gets killed for OOM before any single stack reaches Go's 1000000000-byte limit, so we never see a stack overflow error.

This explains the relatively small increase in heap and the much larger increase in RSS. I bet if I had actually logged StackInuse and StackSys from runtime.MemStats, they would have shown an increase much closer to that of RSS.

Bug filed: https://github.com/golang/go/issues/32925. Second stdlib bug found while building my little CRUD HTTP server...
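For anyone who hits the same thing before a fix lands: one possible caller-side workaround is to skip attaching the trace when the context already carries one, e.g. in the metricHandler shown earlier (a sketch; httptrace.ContextClientTrace is the stdlib accessor for the trace on a context):

return func(r *request.Request) {
  ctx := r.Context()
  // The Send handler runs again on every retry; don't stack another copy
  // of the same trace on top of the one that's already attached.
  if httptrace.ContextClientTrace(ctx) != nil {
    return
  }
  r.SetContext(httptrace.WithClientTrace(ctx, tracer))
}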

Thanks all for helping with the debugging!
--
Yunchi