memory leak, throw: out of memory (64bit)

793 views
Skip to first unread message

Peter Kleiweg

unread,
Jun 12, 2012, 6:15:32 AM6/12/12
to golan...@googlegroups.com

I have seen other mentions of out of memory problems,
but these all seem to happen only on 32 bit. I am running on 64
bit.

I have this program that runs in a loop and should use a
limited amount of memory. However, 'top' says its memory usage
keeps growing. The program crashes with a "throw: out of
memory", even though 'top' says there is still plenty of memory
available.

While it's still running, I see the program pause for a few
seconds, ever couple of thousand iterations. I assume this is
when it does garbage collecting.

I have another program that is very similar that doesn't show
these problems. So now I am trying to figure out, is there
something wrong with my program, or with package mgo, or with Go
itself?

go version go1.0.1
GOARCH="amd64"
GOOS="linux"


Here is the code that causes the problem:

location, err := time.LoadLocation("Europe/Amsterdam")

session, err := mgo.Dial(servers)
cTweets := session.DB("twitter").C("tweets")
query := cTweets.Find(bson.M{"CREATED_AT_ZONE": bson.M{"$exists": false }})
count, _ := query.Count()
iter := query.Iter()
tweet := map[string]interface{}{}
for iter.Next(&tweet) { ////// THROW HAPPENS ON THIS LINE
count--
if count%100 == 0 {
fmt.Println(count)
}

// parsed date and time
t, err := time.Parse("Mon Jan 02 15:04:05 -0700 2006", tweet["created_at"].(string))
checkErr(err)
t = t.In(location)
tweet["CREATED_AT_DATE"] = fmt.Sprintf("%04d-%02d-%02d", t.Year(), t.Month(), t.Day())
tweet["CREATED_AT_TIME"] = fmt.Sprintf("%02d:%02d:%02d", t.Hour(), t.Minute(), t.Second())
tweet["CREATED_AT_YEAR"] = t.Year()
tweet["CREATED_AT_MONTH"] = t.Month()
tweet["CREATED_AT_DAY"] = t.Day()
tweet["CREATED_AT_WEEKDAY"] = t.Weekday()
tweet["CREATED_AT_HOUR"] = t.Hour()
tweet["CREATED_AT_MINUTE"] = t.Minute()
tweet["CREATED_AT_SECOND"] = t.Second()
tweet["CREATED_AT_ZONE"], _ = t.Zone()

_, e2 := cTweets.Upsert(bson.M{"id_str": tweet["id_str"]}, tweet)
if e2 != nil {
fmt.Fprintf(os.Stderr, "Update tweet %s errror: %s\n", tweet["id_str"].(string), e2)
}
}

The program that doesn't have problems doesn't get data from
mongo, but from json-files.


Here is the stack trace:

throw: out of memory

goroutine 1 [running]:
reflect.unsafe_New(0xf840060000, 0x4bcd98, 0x11, 0x4bcd98, 0x451bb1, ...)
/net/aps/64/opt/go/src/pkg/runtime/iface.c:682 +0x6d
reflect.New(0xf840060000, 0x4bcd98, 0x11, 0x4bf520, 0xfc3ff7aff0, ...)
/net/aps/64/opt/go/src/pkg/reflect/value.go:1738 +0x6c
launchpad.net/mgo/bson.(*decoder).readDocTo(0xfc3ff70100, 0x4bd248, 0xf8400534d8, 0x156)
/net/aps/64/opt/gosite/src/launchpad.net/mgo/bson/decode.go:189 +0x78f
launchpad.net/mgo/bson.Unmarshal(0xf9a257e400, 0x36900000369, 0x4ba1a0, 0xf8400534d8, 0x0, ...)
/net/aps/64/opt/gosite/src/launchpad.net/mgo/bson/bson.go:439 +0x113
launchpad.net/mgo.(*Iter).Next(0xf840092dd0, 0x4ba1a0, 0xf8400534d8, 0xf8400534d8, 0xfc3ff70ac0, ...)
/net/aps/64/opt/gosite/src/launchpad.net/mgo/session.go:2020 +0x250
main.main()
/net/aistaff/kleiweg/Twitter/fixdate.go:50 +0x704

goroutine 2 [syscall]:
created by runtime.main
/net/aps/64/opt/go/src/pkg/runtime/proc.c:221

goroutine 3 [select]:
launchpad.net/mgo.(*mongoCluster).syncServersLoop(0xf840066160, 0x0)
/net/aps/64/opt/gosite/src/launchpad.net/mgo/cluster.go:307 +0x456
created by launchpad.net/mgo.newCluster
/net/aps/64/opt/gosite/src/launchpad.net/mgo/cluster.go:60 +0x109

goroutine 7 [syscall]:
created by addtimer
/net/aps/64/opt/go/src/pkg/runtime/ztime_amd64.c:69

goroutine 5 [syscall]:
syscall.Syscall6()
/net/aps/64/opt/go/src/pkg/syscall/asm_linux_amd64.s:40 +0x5
syscall.EpollWait(0xf800000006, 0xf8400662d0, 0xa0000000a, 0xffffffff, 0xc, ...)
/net/aps/64/opt/go/src/pkg/syscall/zerrors_linux_amd64.go:1781 +0xa1
net.(*pollster).WaitFD(0xf8400662c0, 0xf840041f00, 0x0, 0x0, 0x0, ...)
/net/aps/64/opt/go/src/pkg/net/fd_linux.go:146 +0x110
net.(*pollServer).Run(0xf840041f00, 0x0)
/net/aps/64/opt/go/src/pkg/net/fd.go:236 +0xe4
created by net.newPollServer
/net/aps/64/opt/go/src/pkg/net/newpollserver.go:35 +0x382

goroutine 6 [runnable]:
syscall.Syscall()
/net/aps/64/opt/go/src/pkg/syscall/asm_linux_amd64.s:30 +0x4b
syscall.Read(0xf800000003, 0xf840043bd0, 0x2400000024, 0x0, 0x0, ...)
/net/aps/64/opt/go/src/pkg/syscall/zerrors_linux_amd64.go:2137 +0x78
net.(*netFD).Read(0xf840086090, 0xf840043bd0, 0x2400000024, 0x2ae400000000, 0x0, ...)
/net/aps/64/opt/go/src/pkg/net/fd.go:424 +0x107
net.(*TCPConn).Read(0xf840053248, 0xf840043bd0, 0x2400000024, 0xf840078000, 0x0, ...)
/net/aps/64/opt/go/src/pkg/net/tcpsock_posix.go:87 +0xce
launchpad.net/mgo.fill(0xf840053248, 0xf840043bd0, 0x2400000024, 0x0, 0xfc3ff52f00, ...)
/net/aps/64/opt/gosite/src/launchpad.net/mgo/socket.go:367 +0x44
launchpad.net/mgo.(*mongoSocket).readLoop(0xf840078000, 0x0)
/net/aps/64/opt/gosite/src/launchpad.net/mgo/socket.go:384 +0xec
created by launchpad.net/mgo.newSocket
/net/aps/64/opt/gosite/src/launchpad.net/mgo/socket.go:111 +0x255

goroutine 8 [finalizer wait]:
created by runtime.gc
/net/aps/64/opt/go/src/pkg/runtime/mgc0.c:882



--
Peter Kleiweg
my Go programming cookbook: http://www.let.rug.nl/~kleiweg/go/

roger peppe

unread,
Jun 12, 2012, 6:26:25 AM6/12/12
to Peter Kleiweg, golan...@googlegroups.com
On 12 June 2012 11:15, Peter Kleiweg <pkle...@xs4all.nl> wrote:
>    tweet := map[string]interface{}{}
>    for iter.Next(&tweet) {            ////// THROW HAPPENS ON THIS LINE

does it make any difference if you change the previous two lines to this?

for {
var tweet map[string]interface{}
if !iter.Next(&tweet) {
break
}

Peter Kleiweg

unread,
Jun 12, 2012, 5:11:55 PM6/12/12
to golang-nuts
On 12 jun, 12:26, roger peppe <rogpe...@gmail.com> wrote:
> On 12 June 2012 11:15, Peter Kleiweg <pklei...@xs4all.nl> wrote:
>
> >    tweet := map[string]interface{}{}
> >    for iter.Next(&tweet) {            ////// THROW HAPPENS ON THIS LINE
>
> does it make any difference if you change the previous two lines to this?
>
> for {
>     var tweet map[string]interface{}
>     if !iter.Next(&tweet) {
>         break
>     }

This doesn't change it.

Archos

unread,
Jun 12, 2012, 5:36:39 PM6/12/12
to golang-nuts

On Jun 12, 11:15 am, Peter Kleiweg <pklei...@xs4all.nl> wrote:
> I have seen other mentions of out of memory problems,
> but these all seem to happen only on 32 bit. I am running on 64
> bit.
>
> I have this program that runs in a loop and should use a
> limited amount of memory. However, 'top' says its memory usage
> keeps growing. The program crashes with a "throw: out of
> memory", even though 'top' says there is still plenty of memory
> available.
>
> While it's still running, I see the program pause for a few
> seconds, ever couple of thousand iterations. I assume this is
> when it does garbage collecting.
>
> I have another program that is very similar that doesn't show
> these problems. So now I am trying to figure out, is there
> something wrong with my program, or with package mgo, or with Go
> itself?
I would test it with another DBMS to know if it's a Go problem.

Rémy Oudompheng

unread,
Jun 12, 2012, 5:51:36 PM6/12/12
to Peter Kleiweg, golang-nuts
How much memory is used before it crashes? Can you do some profiling?

Rémy.

Peter Kleiweg

unread,
Jun 12, 2012, 7:36:57 PM6/12/12
to golang-nuts
On 12 jun, 23:51, Rémy Oudompheng <remyoudomph...@gmail.com> wrote:
> How much memory is used before it crashes? Can you do some profiling?

I added code to keep track of how much memory the program is using. I
see a constant rise. The program is running now for half a day, it
hasn't crashed yet, but it now uses 12g of resident memory, more than
20%.

I added profiling code (runtime/pprof), but then the programs keeps
slowing down with its actual work, until it seems to be doing nothing
but garbage collecting, and still the memory consumption keeps
growing.

I removed the Upsert command from the loop, so all the program doesn't
is fetching objects from the database, processing them, but not
writing them back to the database. In this case, memory usage remains
constant.

I replaced the Upsert command with the Update command (I know I am
always updating objects, never adding new ones), and the memory
consumption keeps growing, but not as fast as when using Upsert.

I don't see how my code can be the cause of the problem. So it must be
either mgo or Go. I discovered there is a mgo-users list. Should I
move this topic to that list?

Russ Cox

unread,
Jun 12, 2012, 11:51:49 PM6/12/12
to Peter Kleiweg, golang-nuts
If you turn on profiling by just importing net/http/pprof (don't
change runtime.MemProfileRate) then you should be able to grab
profiles from the web server every once in a while (or grab them using
runtime.MemProfile and write them to successive files) for analysis
later. It would be interesting to know what the profiles look like:
where is the memory going at those times.

Turning on profiling should not cause unreasonable expense, especially
early in the life of the program, unless runtime.MemProfileRate is set
very low.

Russ

Peter Kleiweg

unread,
Jun 13, 2012, 4:56:28 AM6/13/12
to golang-nuts
On 13 jun, 01:36, Peter Kleiweg <pklei...@xs4all.nl> wrote:
> On 12 jun, 23:51, Rémy Oudompheng <remyoudomph...@gmail.com> wrote:
>
> > How much memory is used before it crashes? Can you do some profiling?
>
> I added code to keep track of how much memory the program is using. I
> see a constant rise. The program is running now for half a day, it
> hasn't crashed yet, but it now uses 12g of resident memory, more than
> 20%.

It crashed finally, just after 'ps' reported these values:

vsz : 18182496
rss : 18119860
size : 18174776
sz : 4545624
%mem : 27.7

There were other programs running using a lot of memory.

Dave Cheney

unread,
Jun 13, 2012, 5:07:11 AM6/13/12
to Peter Kleiweg, golang-nuts
Can you post your program online, or better a short program which demonstrates the behavior?

Rémy Oudompheng

unread,
Jun 13, 2012, 6:47:15 AM6/13/12
to Peter Kleiweg, golang-nuts
it is hardcoded that the Go runtime will crash at 16GB of memory even
if you have more RAM available.

You should dump a heap profile and find where the leak comes from.

2012/6/13, Peter Kleiweg <pkle...@xs4all.nl>:

Jingcheng Zhang

unread,
Jun 13, 2012, 6:52:07 AM6/13/12
to Rémy Oudompheng, Peter Kleiweg, golang-nuts
Why 16GB memory limit hardcoded in runtime? Will it be removed in the future?
--
Best regards,
Jingcheng Zhang
Beijing, P.R.China

Peter Kleiweg

unread,
Jun 13, 2012, 7:17:38 AM6/13/12
to golang-nuts
On 13 jun, 11:07, Dave Cheney <d...@cheney.net> wrote:
> Can you post your program online, or better a short program which demonstrates the behavior?

For a demonstration, you would need a Mongo database with a lot of
data.

I posted some material here: http://pkleiweg.home.xs4all.nl/mgo/

The program, without profiling code: http://pkleiweg.home.xs4all.nl/mgo/fixdate.go

The program works on a database of tweets. Here is a sample tweet:
http://pkleiweg.home.xs4all.nl/mgo/tweet.txt

I ran the program with profiling, working on 3 million tweets, and the
output at several stages is listed here: http://pkleiweg.home.xs4all.nl/mgo/log.txt

Here is a graph that shows memory usage, as reported by 'ps':
http://pkleiweg.home.xs4all.nl/mgo/mgo.png

Peter Kleiweg

unread,
Jun 13, 2012, 9:14:09 AM6/13/12
to golang-nuts
On 13 jun, 13:17, Peter Kleiweg <pklei...@xs4all.nl> wrote:
> On 13 jun, 11:07, Dave Cheney <d...@cheney.net> wrote:
>
> > Can you post your program online, or better a short program which demonstrates the behavior?
>
> For a demonstration, you would need a Mongo database with a lot of
> data.

You can get the database here, with instructions on how to import it:
http://www.let.rug.nl/~kleiweg/mgo/

Warning: it's 4.7 Gbytes

This database is already "fixed", so to use this for testing, you need
to change the following line in the Go program:

query := cTweets.Find(bson.M{"CREATED_AT_ZONE": bson.M{"$exists":
false }})

Change it to this:

query := cTweets.Find(nil)

The database contains little over 3 million tweets. Growth of memory
usage starts somewhere after processing 1 to 2 million tweets.

> I posted some material here: http://pkleiweg.home.xs4all.nl/mgo/
>
> The program, without profiling code:  http://pkleiweg.home.xs4all.nl/mgo/fixdate.go
>
> The program works on a database of tweets. Here is a sample tweet: http://pkleiweg.home.xs4all.nl/mgo/tweet.txt
>
> I ran the program with profiling, working on 3 million tweets, and the
> output at several stages is listed here: http://pkleiweg.home.xs4all.nl/mgo/log.txt

This was done using net/http/pprof, as suggested by Russ Cox.

The profiler reports that between the last two dumps, heap memory size
(?) dropped from 451.6 MB to 108.2 MB. But according to 'top',
resident memory size increased from 1.1g to 1.6g. Is that supposed to
happen? See the log.txt above.

Peter Kleiweg

unread,
Jun 13, 2012, 9:52:49 AM6/13/12
to golang-nuts


On 13 jun, 15:14, Peter Kleiweg <pklei...@xs4all.nl> wrote:

> You can get the database here, with instructions on how to import it: http://www.let.rug.nl/~kleiweg/mgo/
>
> Warning: it's 4.7 Gbytes

I compressed it with gzip. Now it is 742 Mbytes.

Russ Cox

unread,
Jun 13, 2012, 11:27:06 AM6/13/12
to Peter Kleiweg, golang-nuts
On Wed, Jun 13, 2012 at 9:14 AM, Peter Kleiweg <pkle...@xs4all.nl> wrote:
> The profiler reports that between the last two dumps, heap memory size
> (?) dropped from 451.6 MB to 108.2 MB. But according to 'top',
> resident memory size increased from 1.1g to 1.6g. Is that supposed to
> happen? See the log.txt above.

pprof is reporting the memory in use after the last garbage collection.
Due to having reclaimed things and also fragmentation, that can often
be smaller than what top reports. All your profiles look like this one:

Total: 246.1 MB
242.0 98.3% 98.3% 245.6 99.8% launchpad.net/mgo.(*mongoSocket).readLoop
2.1 0.9% 99.2% 2.1 0.9% launchpad.net/mgo.(*queue).expand
1.5 0.6% 99.8% 1.5 0.6% runtime.convT2E
0.5 0.2% 100.0% 0.5 0.2% reflect.mapassign

which suggests that readLoop is the one leaking memory. The next command
to run in pprof would be 'list readLoop', which will print the source code for
readLoop, with allocation counts on each line.

If you haven't seen it already,
http://blog.golang.org/2011/06/profiling-go-programs.html
might be worth a look.

My guess is that the allocations will all be on this line:

b := make([]byte, int(getInt32(s, 0)))

in which case it might help to insert

println("allocated", getInt32(s, 0))

and see what kinds of numbers you get during a problematic run.

Russ

unread,
Jun 13, 2012, 2:34:26 PM6/13/12
to golan...@googlegroups.com
A possible cause of the problem may be that the number of elements stored in http://bazaar.launchpad.net/~niemeyer/mgo/trunk/view/head:/queue.go keeps going up. I would suggest to use chan interface{} with capacity 1024. mgo uses locking when accessing the queue, so replacing []interface{} with a channel requires some redesign of the code.

Peter Kleiweg

unread,
Jun 14, 2012, 2:32:35 AM6/14/12
to golang-nuts
I have located the root of the problem. It has to do with the Prefetch
method.

To summarise:

I do a query. I use Iter.Next to access all objects returned for the
query. I process the object, and use an Update to put the modified
object back into the database. The memory usage keeps growing.

I remove the Update. Memory usage remains constant. But the Update
function is not causing the problems. The problems is in the iterator.

When you use the iterator, and the portion of still to process objects
drops below a threshold, a new batch of matching objects is fetched
from the server, and added to the queue. When the queue isn't empty,
the new objects are added to all preceding objects, and the queue
keeps growing, still containing the objects that are already
processed. (Or references to processed objects stay in existence
preventing those processed objects from being garbage collected.

What happens when I remove the Update function is, I guess, something
like this. In this case the program runs very fast, processing objects
faster than they can be retrieved, so the queue runs empty before
Prefetch has a chance to add more objects.

In may case, I could fix the program by disabling prefetching, with
the following query:

query := cTweets.Find(bson.M{"CREATED_AT_ZONE": bson.M{"$exists":
false }}).Prefetch(0.0)

Gustavo Niemeyer

unread,
Jul 2, 2012, 2:09:25 PM7/2/12
to r...@golang.org, Peter Kleiweg, golang-nuts
On Wed, Jun 13, 2012 at 12:27 PM, Russ Cox <r...@golang.org> wrote:
> Total: 246.1 MB
>    242.0  98.3%  98.3%    245.6  99.8% launchpad.net/mgo.(*mongoSocket).readLoop
>      2.1   0.9%  99.2%      2.1   0.9% launchpad.net/mgo.(*queue).expand
>      1.5   0.6%  99.8%      1.5   0.6% runtime.convT2E
>      0.5   0.2% 100.0%      0.5   0.2% reflect.mapassign
>
> which suggests that readLoop is the one leaking memory. The next command

readLoop is the place where data gets out of the database and into the
process, so it will definitely be marked as the top-level entrance for
memory consumption. The question now is what is holding up that data
in memory. If Prefetch is the issue as Peter hinted, it may turn out
that there's no leak, and it's just data sitting on the iterator
waiting to be consumed.

I'll have a deeper look, though. Thanks for debugging this over.


gustavo @ http://niemeyer.net

Gustavo Niemeyer

unread,
Jul 3, 2012, 5:28:24 PM7/3/12
to Peter Kleiweg, Russ Cox, golan...@googlegroups.com, 0xe2.0x...@gmail.com
On Mon, Jul 2, 2012 at 3:09 PM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
> readLoop is the place where data gets out of the database and into the
> process, so it will definitely be marked as the top-level entrance for
> memory consumption. The question now is what is holding up that data
> in memory. If Prefetch is the issue as Peter hinted, it may turn out
> that there's no leak, and it's just data sitting on the iterator
> waiting to be consumed.
>
> I'll have a deeper look, though. Thanks for debugging this over.

It's a bug in mgo indeed, but not actually a leak, strictly speaking.
The prefetch logic was incorrect, and ended up fetching documents
sooner than it should, which meant the backlog kept growing a bit per
iteration. At the end of the iteration, though, all documents will be
gone.

The fix is a one-liner in session.go, line 2719:

- iter.docsBeforeMore = rdocs - int(iter.prefetch*float64(rdocs))
+ iter.docsBeforeMore = iter.docData.Len() + rdocs -
int(iter.prefetch*float64(rdocs))

Sorry for the trouble.


gustavo @ http://niemeyer.net
Reply all
Reply to author
Forward
0 new messages