memory leak with websocket service based on go-socket.io


aide...@gmail.com

Sep 5, 2016, 10:52:35 PM
to golang-nuts
We have a simple websocket service based on go-socket.io and we are seeing a memory leak. Memory usage keeps increasing, and the idle heap is never returned to the OS. The heap pprof shows that runtime.goexit holds a lot of memory, so it seems the memory is not released after the goroutines exit.

1. heap callgraph





2. gc debug log

gc 440 @51045.897s 0%: 0.034+4182+0.96 ms clock, 0.13+0/4182/12246+3.8 ms cpu, 4304->4309->4143 MB, 8266 MB goal, 4 P
scvg340: inuse: 4404, idle: 15, sys: 4419, released: 0, consumed: 4419 (MB)
GC forced
gc 441 @51170.096s 0%: 3.7+4355+1.4 ms clock, 14+2.9/4357/12795+5.8 ms cpu, 4317->4323->4158 MB, 8287 MB goal, 4 P
GC forced
gc 442 @51294.460s 0%: 0.034+3987+1.2 ms clock, 0.13+1.5/3987/11701+4.9 ms cpu, 4336->4341->4169 MB, 8316 MB goal, 4 P
scvg341: inuse: 4318, idle: 133, sys: 4451, released: 0, consumed: 4451 (MB)
GC forced
gc 443 @51418.451s 0%: 0.36+3925+0.99 ms clock, 1.4+4.0/3925/11554+3.9 ms cpu, 4350->4356->4182 MB, 8338 MB goal, 4 P
scvg342: inuse: 4363, idle: 103, sys: 4466, released: 0, consumed: 4466 (MB)
GC forced
gc 444 @51542.394s 0%: 0.042+3986+1.6 ms clock, 0.16+0/3981/11757+6.5 ms cpu, 4361->4367->4194 MB, 8365 MB goal, 4 P
scvg343: inuse: 4404, idle: 74, sys: 4478, released: 0, consumed: 4478 (MB)
GC forced
gc 445 @51666.384s 0%: 3.4+3987+1.4 ms clock, 13+2.5/3986/11747+5.7 ms cpu, 4375->4382->4208 MB, 8388 MB goal, 4 P
scvg344: inuse: 4454, idle: 39, sys: 4493, released: 0, consumed: 4493 (MB)
GC forced
gc 446 @51790.379s 0%: 0.055+4147+1.5 ms clock, 0.22+0/4139/12125+6.2 ms cpu, 4396->4402->4220 MB, 8416 MB goal, 4 P
scvg345: inuse: 4509, idle: 5, sys: 4514, released: 0, consumed: 4514 (MB)
GC forced
gc 447 @51914.542s 0%: 0.052+4205+2.1 ms clock, 0.21+1.5/4199/12348+8.5 ms cpu, 4413->4420->4234 MB, 8441 MB goal, 4 P
GC forced
gc 448 @52038.752s 0%: 2.7+4517+1.8 ms clock, 11+2.3/4517/13245+7.2 ms cpu, 4428->4436->4247 MB, 8469 MB goal, 4 P
scvg346: inuse: 4406, idle: 142, sys: 4548, released: 0, consumed: 4548 (MB)
GC forced
gc 449 @52163.276s 0%: 0.033+4206+1.3 ms clock, 0.13+0/4206/12306+5.3 ms cpu, 4442->4449->4259 MB, 8495 MB goal, 4 P
scvg347: inuse: 4452, idle: 109, sys: 4561, released: 0, consumed: 4561 (MB)
GC forced
gc 450 @52287.491s 0%: 0.044+4262+2.0 ms clock, 0.17+0/4261/12565+8.2 ms cpu, 4452->4459->4272 MB, 8519 MB goal, 4 P
scvg348: inuse: 4498, idle: 74, sys: 4572, released: 0, consumed: 4572 (MB)
GC forced
gc 451 @52411.769s 0%: 0.028+4012+2.0 ms clock, 0.11+0.066/3992/11762+8.0 ms cpu, 4471->4477->4285 MB, 8544 MB goal, 4 P
scvg349: inuse: 4550, idle: 40, sys: 4590, released: 0, consumed: 4590 (MB)

3. example code

       server.On("error", func(so socketio.Socket, err error) {
                logger := SocketLogger(so)
                logger.Error("socket connect error")
        })

        server.On("connection", func(so socketio.Socket) {

                var (
                        uid   string
                        exist bool
                )

                logger := SocketLogger(so)

                claim := (context.Get(so.Request(), "user")).(*jwt.Token).Claims
                // after getting the claims, clear the request context
                context.Clear(so.Request())

                var rawUID interface{}
                if user, ok := claim.(jwt.MapClaims); ok {
                        if rawUID, ok = user[setting.JwtUserClaimField]; !ok {
                                logger.Error("invalid user claim")
                                so.Emit("disconnect", "invalid user claim")
                        }
                } else {
                        logger.Errorf("invalid jwt claim %s", claim)
                        so.Emit("disconnect", "invalid user claim")
                }

                if uid, exist = rawUID.(string); exist {
                        // Multiple connections from the same user join the same room
                        so.Join(uid)

                        // room for broadcasting to all users
                        so.Join(Hourse)

                        c.users.Add(uid, 1)

                        logger.Debug("socket connected")

                        if setting.DEBUG {
                                so.Emit("debug", fmt.Sprintf("Your uid is %s, sid is %s", uid, so.Id()))
                        }
                } else {
                        so.Emit("disconnect", "invalid user claim")
                }

                so.On("debug", func(data string) {
                        log.Debugf("debug data from client %s", data)
                })

                so.On("disconnection", func(data string) {
                        logger.Debugf("socket disconnected")
                        c.users.Add(uid, -1)
                                        })
        })



Problem: 

1. Is this really a memory leak? Or is it just normal behavior, because Go does not release memory back to the OS until the OS needs it?
2. How can I find where the problem is?




 

Dave Cheney

Sep 5, 2016, 11:07:42 PM
to golang-nuts
It looks like your application is using 4.5 GB of RAM. Check the number of sockets and goroutines you have running. If there are no timeouts then the goroutines could remain alive, pinning a lot of memory.
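
A minimal sketch of one way to watch that, assuming you can add a small monitoring goroutine to the service (the interval and log format here are arbitrary):

package main

import (
        "log"
        "runtime"
        "time"
)

// logGoroutines prints the current goroutine count once a minute.
// A count that keeps climbing usually means goroutines are leaking.
func logGoroutines() {
        for range time.Tick(time.Minute) {
                log.Printf("goroutines: %d", runtime.NumGoroutine())
        }
}

func main() {
        go logGoroutines()
        select {} // stand-in for the real service; block forever
}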

aide...@gmail.com

Sep 6, 2016, 1:56:04 AM
to golang-nuts
The number of goroutines is normal, but the memory keeps increasing. I do not understand why runtime.goexit holds so much memory.


On Tuesday, September 6, 2016 at 11:07:42 AM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 2:05:38 AM
to golang-nuts, aide...@gmail.com
runtime.goexit is the bottommost stack frame of any running goroutine. When the caller of runtime.goexit returns, this function cleans up the goroutine.

If you are not leaking goroutines, then your application must be keeping too much memory live, possibly in a shared map.

It's probably also a good idea to build your application with the -race detector and fix any issues that it reports.
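
For illustration, a minimal sketch of that shared-map case (all names here are hypothetical): entries added on connect but never removed stay reachable, so the garbage collector can never free them.

package main

import "sync"

var (
        mu       sync.Mutex
        sessions = map[string][]byte{} // hypothetical shared map keyed by session id
)

// onConnect stores per-connection state in the shared map.
func onConnect(id string) {
        mu.Lock()
        sessions[id] = make([]byte, 64<<10) // per-session buffer stays live
        mu.Unlock()
}

// onDisconnect must remove the entry; if it is never called (or the delete
// is forgotten), the buffer remains reachable and is never collected.
func onDisconnect(id string) {
        mu.Lock()
        delete(sessions, id)
        mu.Unlock()
}

func main() {
        onConnect("abc")
        // without a matching onDisconnect("abc"), the buffer above stays live forever
}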

aide...@gmail.com

Sep 6, 2016, 2:37:45 AM
to golang-nuts, aide...@gmail.com
Building with the -race flag reports nothing. One interesting thing is that in the dev environment the GC releases the memory, but in the production environment the memory does not seem to be released to the OS.



On Tuesday, September 6, 2016 at 2:05:38 PM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 2:49:14 AM
to golang-nuts, aide...@gmail.com
Your operating system may not be reporting memory correctly. To get an accurate report of the memory your Go process uses:

http://talks.godoc.org/github.com/davecheney/presentations/seven.slide#23

Install the pprof handler, then run:

go tool pprof http://localhost:3999/debug/pprof/heap

The memory statistics will be reported at the bottom

# HeapSys = 4882432

^ this is the number of bytes your operating system has given to the Go process.
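
A minimal sketch of installing that handler, assuming the service is not already serving HTTP on this port:

package main

import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
        // With this running you can do:
        //   go tool pprof http://localhost:3999/debug/pprof/heap
        // and the memory statistics are printed at the bottom of the output.
        log.Fatal(http.ListenAndServe("localhost:3999", nil))
}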

aide...@gmail.com

Sep 6, 2016, 3:20:36 AM
to golang-nuts, aide...@gmail.com
I have installed the pprof handler and also enabled the GC debug output. HeapSys on the /debug/pprof/heap page always increases, and HeapReleased is always 0. The GC debug log shows that the scavenger (the scvg# lines) is not releasing any memory.

When runtime.goexit is called, the goroutine has exited. So why is the memory owned by the goroutine not released after the goroutine finishes? From the callgraph we can see that runtime.goexit owns a lot of heap.

MemStats from the /debug/pprof/heap page:

# runtime.MemStats
# Alloc = 211882848
# TotalAlloc = 1640254736
# Sys = 315858600
# Lookups = 51944
# Mallocs = 5738397
# Frees = 4574009
# HeapAlloc = 211882848
# HeapSys = 279576576
# HeapIdle = 56934400
# HeapInuse = 222642176
# HeapReleased = 0
# HeapObjects = 1164388
# Stack = 19267584 / 19267584
# MSpan = 3026880 / 3194880
# MCache = 4800 / 16384
# BuckHashSys = 1483005
# NextGC = 290484807
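
For reference, the same counters can also be sampled from inside the process; a minimal sketch, assuming a once-a-minute log line is enough:

package main

import (
        "log"
        "runtime"
        "time"
)

func main() {
        var m runtime.MemStats
        for range time.Tick(time.Minute) {
                // ReadMemStats fills m with the same numbers shown on the pprof heap page.
                runtime.ReadMemStats(&m)
                log.Printf("HeapAlloc=%d HeapSys=%d HeapIdle=%d HeapReleased=%d",
                        m.HeapAlloc, m.HeapSys, m.HeapIdle, m.HeapReleased)
        }
}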




On Tuesday, September 6, 2016 at 2:49:14 PM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 3:36:13 AM
to golang-nuts
I think your program is leaking memory somewhere else. Is the source of your application available?

Can you please share a graph of the -alloc_objects profile?

Dave Cheney

Sep 6, 2016, 3:42:06 AM
to golang-nuts
Are you sure that goroutines are exiting? You can check this in the /debug/pprof handler. It will tell you how many goroutines are currently running.
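
A minimal sketch of scripting that check, assuming the pprof handler is listening on localhost:3999; the first line of the debug=1 goroutine profile carries the total count:

package main

import (
        "bufio"
        "fmt"
        "log"
        "net/http"
)

func main() {
        // The text form of the profile starts with a line like
        // "goroutine profile: total 4583".
        resp, err := http.Get("http://localhost:3999/debug/pprof/goroutine?debug=1")
        if err != nil {
                log.Fatal(err)
        }
        defer resp.Body.Close()

        sc := bufio.NewScanner(resp.Body)
        if sc.Scan() {
                fmt.Println(sc.Text())
        }
}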

Dave Cheney

Sep 6, 2016, 3:54:37 AM
to golang-nuts
Also, HeapReleased will not grow until you stop the memory leak and enough of the heap has remained idle for more than 10 minutes.
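
If you want to take the scavenger's delay out of the picture while debugging, you can force an immediate return of idle pages; a minimal sketch (the endpoint path here is made up, and forcing a GC like this is only a diagnostic, not a fix):

package main

import (
        "log"
        "net/http"
        "runtime/debug"
)

func main() {
        // FreeOSMemory forces a garbage collection and returns as much idle
        // heap to the OS as possible, so HeapReleased reflects the current
        // state instead of waiting several minutes for the background scavenger.
        http.HandleFunc("/debug/free-os-memory", func(w http.ResponseWriter, r *http.Request) {
                debug.FreeOSMemory()
                w.Write([]byte("ok\n"))
        })
        log.Fatal(http.ListenAndServe("localhost:3999", nil))
}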

aide...@gmail.com

Sep 6, 2016, 4:07:42 AM
to golang-nuts
I have restarted the websocket service and collected new callgraphs.

1. in-use heap callgraph 


2. heap alloc callgraph







On Tuesday, September 6, 2016 at 3:42:06 PM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 4:25:40 AM
to golang-nuts, aide...@gmail.com
This error handling looks wrong,


Any error from socketio is ignored, and the method always succeeds unconditionally.

Also, which version of Go are you using?

aide...@gmail.com

Sep 6, 2016, 4:49:29 AM
to golang-nuts, aide...@gmail.com

I have tried Go 1.6 and 1.7; the problem still stands.


On Tuesday, September 6, 2016 at 4:25:40 PM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 5:05:05 AM
to golang-nuts, aide...@gmail.com
Are you sure that goroutines are exiting? You can check this in the /debug/pprof handler. It will tell you how many goroutines are currently running.

aide...@gmail.com

Sep 6, 2016, 5:14:29 AM
to golang-nuts, aide...@gmail.com
I am sure the goroutines exit, because the goroutine count is stable while the heap keeps increasing. You can see more detailed pprof output at https://comet.shiyanlou.com/debug/pprof/.


On Tuesday, September 6, 2016 at 5:05:05 PM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 5:22:39 AM
to golang-nuts, aide...@gmail.com
You have 4601 goroutines blocked on reading from the websocket

goroutine profile: total 4583
1526 @ 0x42d77a 0x42806b 0x4277c9 0x67ffb8 0x680024 0x681861 0x6924e0 0x5b2e5c 0x5b34bc 0x7bc06d 0x7bc1b6 0x7bd237 0x7a2d94 0x45da11
#	0x4277c8	net.runtime_pollWait+0x58										/usr/lib/go-1.7/src/runtime/netpoll.go:160
#	0x67ffb7	net.(*pollDesc).wait+0x37										/usr/lib/go-1.7/src/net/fd_poll_runtime.go:73
#	0x680023	net.(*pollDesc).waitRead+0x33										/usr/lib/go-1.7/src/net/fd_poll_runtime.go:78
#	0x681860	net.(*netFD).Read+0x1a0											/usr/lib/go-1.7/src/net/fd_unix.go:243
#	0x6924df	net.(*conn).Read+0x6f											/usr/lib/go-1.7/src/net/net.go:173
#	0x5b2e5b	bufio.(*Reader).fill+0x10b										/usr/lib/go-1.7/src/bufio/bufio.go:97
#	0x5b34bb	bufio.(*Reader).Read+0x1bb										/usr/lib/go-1.7/src/bufio/bufio.go:209
#	0x7bc06c	bitbucket.org/sjtushi/comet/vendor/github.com/gorilla/websocket.(*Conn).readFull+0x8c	

If the websocket never disconnects, then the goroutine will pin all memory it is using. You probably need to add some timeouts to your application.
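
A minimal sketch of what such a timeout could look like on the underlying gorilla/websocket connection (go-socket.io wraps this, so exactly where it would hook in is an assumption): a read deadline that is refreshed by pongs makes a blocked Read fail for dead peers instead of parking the goroutine forever.

package main

import (
        "time"

        "github.com/gorilla/websocket"
)

const pongWait = 60 * time.Second

// keepAlive makes the read loop give up on clients that stop answering pings,
// so the reading goroutine (and everything it pins) can exit.
func keepAlive(conn *websocket.Conn) {
        conn.SetReadDeadline(time.Now().Add(pongWait))
        conn.SetPongHandler(func(string) error {
                // each pong from a live client pushes the deadline out again
                return conn.SetReadDeadline(time.Now().Add(pongWait))
        })
}

func main() {} // wiring this into go-socket.io is left out; see the note above

The server would also need to send pings periodically (for example with conn.WriteControl(websocket.PingMessage, nil, time.Now().Add(time.Second))) so that healthy clients keep triggering the pong handler.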

aide...@gmail.com

Sep 6, 2016, 5:48:01 AM
to golang-nuts, aide...@gmail.com
When a user refreshes the page, the old websocket connection is closed and a new one is created.

I do not understand "If the websocket never disconnects, then the goroutine will pin all memory it is using."

The websocket connection blocks on read; why would that pin all the memory?


On Tuesday, September 6, 2016 at 5:22:39 PM UTC+8, Dave Cheney wrote:

Dave Cheney

Sep 6, 2016, 5:54:34 AM
to golang-nuts
I'm sorry, I cannot help you; I'm probably wrong about the goroutines holding on to websockets. Perhaps one of the libraries you use has a memory leak bug.

aide...@gmail.com

Sep 6, 2016, 6:00:57 AM
to golang-nuts

In any case, thanks so much.

On Tuesday, September 6, 2016 at 5:54:34 PM UTC+8, Dave Cheney wrote:

aide...@gmail.com

Sep 13, 2016, 4:59:00 AM
to golang-nuts, aide...@gmail.com
Finally I fixed the problem.

I use https://github.com/auth0/go-jwt-middleware for JWT authentication. I also set KeepContext on the gorilla mux router to true so that I can read the request's JWT token from the gorilla context. Once I have read the token, I clear the request context immediately (via context.Clear(r)), and on that path everything works correctly. But when go-jwt-middleware fails the authentication, the request context is never cleared, because KeepContext on mux is true. That was the root cause.
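
A minimal sketch of what that fix could look like with gorilla/mux and gorilla/context (the router setup here is assumed, not taken from the real code): wrapping the router in context.ClearHandler clears the request's entries in gorilla/context's package-level map after every request, including the ones where go-jwt-middleware rejects the token and the handler's own context.Clear(r) is never reached.

package main

import (
        "log"
        "net/http"

        "github.com/gorilla/context"
        "github.com/gorilla/mux"
)

func main() {
        r := mux.NewRouter()
        // keep the context values long enough for the socket.io handler to read the token
        r.KeepContext = true

        // ClearHandler removes this request's entries from gorilla/context's
        // global map once the handler returns, whether or not auth succeeded,
        // so failed handshakes no longer pin their *http.Request forever.
        log.Fatal(http.ListenAndServe(":8080", context.ClearHandler(r)))
}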



On Tuesday, September 6, 2016 at 10:52:35 AM UTC+8, aide...@gmail.com wrote: