Help with strange out of memory error

408 views
Skip to first unread message

Charles Vejnar

unread,
Feb 20, 2014, 12:05:03 PM2/20/14
to golan...@googlegroups.com
Hi,

I started to use Go just a few weeks ago and find it amazing; I am coming from Python, and it was quite easy to start and get very efficient code (compared to Python). Great job!

I have my first problem that I am not able to solve. Sorry if it's actually a quick fix; I am still new to Go.

I have a large tab-separated file (~15GiB) like this:
name1 xx gg ss gg
name2 gg tt 3d4 bbgf sdfh
name1 h ysfh gsfhg


(To create an input file, you can use the attached gen.py; sorry it's in Python)

The first column is a name that can be seen multiple times in different line: I would like to add a column at the end of each line with number of time the name appears in the file:
name1 xx gg ss gg 2
name2 gg tt 3d4 bbgf sdfh 1
name1 h ysfh gsfhg 2

I wrote a simple solution (add_col.go): I first create a map with all names and counts, then re-open the input file and write the output file with the added count.

On small input file it runs as expected. But with a large input file, it dies with an "out of memory" error. The most surprising is that it dies *after* the big map is done while my program is writing the new file. I did some memory profiling and indeed, the memory consumption increases while writing the new file, which I don't understand. Also the total RAM used (I check with top and time util) is 34969728kB, far from the available RAM (I am using Go 1.2 on Archlinux, AMD64 with 128MB of RAM); I confirmed it by starting another program that was able to use much more RAM (80GiB); also no limit are set with ulimit (I checked).

The line that apparently makes the program to die:
line, err := tab.Reader.ReadString('\n')

I also tried to add "runtime.GC()": for smaller input file it solved the problem, but not for bigger ones.

Any idea how to fix this?

Thanks for help,

Charles

---------------------
The stack when it dies:

$ ./xtime go run ./add_col.go out.tab out_wnh.tab
Closing  &{0xc21000a2d0}
Output
GC
GC
GC
GC
fatal error: runtime: out of memory

goroutine 1 [running]:
runtime.throw(0x595257)
        /usr/lib/go/src/pkg/runtime/panic.c:464 +0x69 fp=0x7f3c2f74b9c0
runtime.SysMap(0xc9c6960000, 0x100000, 0x5a1bb8)
        /usr/lib/go/src/pkg/runtime/mem_linux.c:131 +0xfe fp=0x7f3c2f74b9f0
runtime.MHeap_SysAlloc(0x5abb00, 0x100000)
        /usr/lib/go/src/pkg/runtime/malloc.goc:473 +0x10a fp=0x7f3c2f74ba30
MHeap_Grow(0x5abb00, 0x10)
        /usr/lib/go/src/pkg/runtime/mheap.c:241 +0x5d fp=0x7f3c2f74ba70
MHeap_AllocLocked(0x5abb00, 0x1, 0x11)
        /usr/lib/go/src/pkg/runtime/mheap.c:126 +0x305 fp=0x7f3c2f74bab0
runtime.MHeap_Alloc(0x5abb00, 0x1, 0x11, 0x7f3c00000001)
        /usr/lib/go/src/pkg/runtime/mheap.c:95 +0x7b fp=0x7f3c2f74bad8
MCentral_Grow(0x5b3760)
        /usr/lib/go/src/pkg/runtime/mcentral.c:180 +0x8c fp=0x7f3c2f74bb38
runtime.MCentral_AllocList(0x5b3760, 0x7f3c2f8e2120)
        /usr/lib/go/src/pkg/runtime/mcentral.c:46 +0x4f fp=0x7f3c2f74bb60
runtime.MCache_Refill(0x7f3c2f8e2000, 0x11)
        /usr/lib/go/src/pkg/runtime/mcache.c:22 +0x7c fp=0x7f3c2f74bb80
runtime.mallocgc(0x100, 0x48cfe1, 0x1)
        /usr/lib/go/src/pkg/runtime/malloc.goc:71 +0xff fp=0x7f3c2f74bbf0
cnew(0x48cfe0, 0xf2, 0x7f3b00000001)
        /usr/lib/go/src/pkg/runtime/malloc.goc:718 +0xc1 fp=0x7f3c2f74bc10
runtime.cnewarray(0x48cfe0, 0xf2)
        /usr/lib/go/src/pkg/runtime/malloc.goc:731 +0x3a fp=0x7f3c2f74bc30
makeslice1(0x487d40, 0xf2, 0xf2, 0x7f3c2f74bc90)
        /usr/lib/go/src/pkg/runtime/slice.c:57 +0x4d fp=0x7f3c2f74bc48
runtime.makeslice(0x487d40, 0xf2, 0xf2, 0xf2, 0xf2, ...)
        /usr/lib/go/src/pkg/runtime/slice.c:38 +0x98 fp=0x7f3c2f74bc78
bufio.(*Reader).ReadBytes(0xc5dd88d600, 0xa, 0x0, 0x0, 0x0, ...)
        /usr/lib/go/src/pkg/bufio/bufio.go:378 +0x15b fp=0x7f3c2f74bd98
bufio.(*Reader).ReadString(0xc5dd88d600, 0xa, 0x0, 0x0, 0x0, ...)
        /usr/lib/go/src/pkg/bufio/bufio.go:395 +0x55 fp=0x7f3c2f74bdd8
main.main()
        /home/charles/work/seq/dvt/add_col.go:87 +0x470 fp=0x7f3c2f74bf48
runtime.main()
        /usr/lib/go/src/pkg/runtime/proc.c:220 +0x11f fp=0x7f3c2f74bfa0
runtime.goexit()
        /usr/lib/go/src/pkg/runtime/proc.c:1394 fp=0x7f3c2f74bfa8
exit status 2
Command exited with non-zero status 1
119.94u 21.58s 141.32r 34969728kB go run./add_col.go out.tab out_wnh.tab

add_col.go
gen.py

Rodrigo Kochenburger

unread,
Feb 20, 2014, 12:12:06 PM2/20/14
to Charles Vejnar, golan...@googlegroups.com
You said the problem happen when writing the file but isn't tab.Reader.Read reading the file?

Loading 15gb of data into a map on a computer with 128mb of memory will likely run out of memory.
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


--
- RK

Charles Vejnar

unread,
Feb 20, 2014, 12:22:05 PM2/20/14
to golan...@googlegroups.com, Charles Vejnar
Sorry: 128GB of RAM of course, not 128MB :)


> You said the problem happen when writing the file but isn't tab.Reader.Read reading the file?
Yes, but what I mean by that is: the program doesn't die while creating the big map, but when I read again the file in order to add the column with the count. Actually it dies w/o without even writing it (I commented out that part in add_col.go).

Charles
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.


--
- RK

Message has been deleted

Gyepi SAM

unread,
Feb 20, 2014, 3:28:56 PM2/20/14
to Charles Vejnar, golan...@googlegroups.com
On Thu, Feb 20, 2014 at 09:05:03AM -0800, Charles Vejnar wrote:
> I wrote a simple solution (add_col.go): I first create a map with all names
> and counts, then re-open the input file and write the output file with the
> added count.
>
> On small input file it runs as expected. But with a large input file, it
> dies with an "out of memory" error. The most surprising is that it dies
> *after* the big map is done while my program is writing the new file. I did
> some memory profiling and indeed, the memory consumption increases while
> writing the new file, which I don't understand. Also the total RAM used (I
> check with top and time util) is 34969728kB, far from the available RAM (I
> am using Go 1.2 on Archlinux, AMD64 with 128MB of RAM); I confirmed it by
> starting another program that was able to use much more RAM (80GiB); also
> no limit are set with ulimit (I checked).
>
> The line that apparently makes the program to die:
> line, err := tab.Reader.ReadString('\n')

ReadString (actually ReadBytes) slurps input until it finds the needle.
In this case, the input is large. My guess is that you are running out of memory because of
fragmentation: there's enough memory available, just not a large enough chunk.

You could free up some memory by changing the second

tab := OpenTab(os.Args[1])

to
tab = OpenTab(os.Args[1])

Otherwise the first one (with its large buffer) is still referenced and its
rather large buffer is not garbage collected.

For a problem like this, I would use mmap and alternately scan for '\t'
and '\n'. The output can be simplified as well to use much less memory
if you create an array of line ending offsets during the first pass
and use that to control the second reads rather than ReadString().

-Gyepi

Charles Vejnar

unread,
Feb 20, 2014, 5:20:49 PM2/20/14
to golan...@googlegroups.com, Charles Vejnar, self-...@gyepi.com


Le jeudi 20 février 2014 15:28:56 UTC-5, Gyepi SAM a écrit :
On Thu, Feb 20, 2014 at 09:05:03AM -0800, Charles Vejnar wrote:
> I wrote a simple solution (add_col.go): I first create a map with all names
> and counts, then re-open the input file and write the output file with the
> added count.
>
> On small input file it runs as expected. But with a large input file, it
> dies with an "out of memory" error. The most surprising is that it dies
> *after* the big map is done while my program is writing the new file. I did
> some memory profiling and indeed, the memory consumption increases while
> writing the new file, which I don't understand. Also the total RAM used (I
> check with top and time util) is 34969728kB, far from the available RAM (I
> am using Go 1.2 on Archlinux, AMD64 with 128MB of RAM); I confirmed it by
> starting another program that was able to use much more RAM (80GiB); also
> no limit are set with ulimit (I checked).
>
> The line that apparently makes the program to die:
> line, err := tab.Reader.ReadString('\n')


Thanks you very much for your reply.
 
ReadString (actually ReadBytes) slurps input until it finds the needle.
In this case, the input is large. My guess is that you are running out of memory because of
fragmentation: there's enough memory available, just not a large enough chunk.

Sorry, but I don't understand that. Each line has a maximum of about 400 letters. How can it be possible to run out of space to store 400 letters?

Is there anyway to reuse always the same memory to store each line?
 
You could free up some memory by changing the second

        tab := OpenTab(os.Args[1])

to
        tab = OpenTab(os.Args[1])

I did that. It might have made the situation better but it still died out of memory.
 
Otherwise the first one (with its large buffer) is still referenced and its
rather large buffer is not garbage collected.

For a problem like this, I would use mmap and alternately scan for '\t'
and '\n'. The output can be simplified as well to use much less memory
if you create an array of line ending offsets during the first pass
and use that to control the second reads rather than ReadString().

Thanks for the suggestion.

Charles

 
-Gyepi

Dave Cheney

unread,
Feb 20, 2014, 6:51:57 PM2/20/14
to golan...@googlegroups.com, Charles Vejnar, self-...@gyepi.com
The bug is earlier in the program.

http://play.golang.org/p/EX0Fvdm-nx

the name var contains the whole line, not the first element, so you are paying for the cost of storing the whole file in memory plus the slice overhead.

The current Go runtime limits the heap on 64bit machines to 16Gb, which is only slightly larger than your input text.

Charles Vejnar

unread,
Feb 20, 2014, 8:58:52 PM2/20/14
to golan...@googlegroups.com, Charles Vejnar, self-...@gyepi.com
Thanks for your reply.

I changed the line to (1 to 2):
name := strings.SplitAfterN(line, "\t", 2)[0]

But it still dies out of memory. To my surprise, actually, the RAM consumption does drop that much (29G to 26G)

Here the maximum seems to be 128GB:
https://code.google.com/p/go/source/browse/src/pkg/runtime/malloc.goc?name=release-branch.go1.2

Charles

Dave Cheney

unread,
Feb 20, 2014, 9:59:56 PM2/20/14
to golan...@googlegroups.com, Charles Vejnar, self-...@gyepi.com
There are probably a few issues,

The first is the key is random, I ran your gen.py out to 1.2gb and found few duplicates. 8 bytes per key plus overhead of the map and string header is going to cost a lot.

The second is that strings.Split returns a subslice of the original string, which itself may be a subslice of a larger string. In essence using the key in that way will pin all the memory that the original string occupied.

Try something like this to work around that issue

        name := strings.SplitAfterN(line, "\t", 2)[0]
        refsMultis[string([]byte(name))]++


Finally, the runtime: out of memory error comes from the garbage collector. It is trying to grow the underlying heap (double it) and the operating system said that was not possible.

You're going to have to do some debugging here because I don't have hardware capable of replicating your results.

Run your program with strace, something like 

 strace -e trace=mmap

and paste the results. Don't forget to use go build to build the program, don't use go run our you'll be stracing the wrapper process.

Your machine may have 128gb but there may be other factors limiting of fragmenting the virtual address space available to the process.

On Friday, 21 February 2014 12:58:52 UTC+11, Charles Vejnar wrote:
Thanks for your reply.

I changed the line to (1 to 2):
name := strings.SplitAfterN(line, "\t", 2)[0]

But it still dies out of memory. To my surprise, actually, the RAM consumption does drop that much (29G to 26G)

Here the maximum seems to be 128GB:
https://code.google.com/p/go/source/browse/src/pkg/runtime/malloc.goc?name=release-branch.go1.2

You are correct, I was mistaken.

Charles Vejnar

unread,
Feb 21, 2014, 2:56:56 PM2/21/14
to golan...@googlegroups.com, Charles Vejnar, self-...@gyepi.com
Hi,

I tried the string([]byte()) work around, and it did drastically reduce the RAM consumption. Thanks again for your help. Is there anyway to do it better than this, I mean getting only the substring?

Now the program doesn't die with a ~15GiB input. I tried with a bigger input (~60GiB). The program dies with out of memory but while creating the big map, which makes more sense. But free RAM is still available at that moment. I guess the problem might be RAM fragmenting? (I don't understand why the free memory can't be used; if you can explain that would be great) Anyway around that? Pre-allocating a big heap?

I did the strace with the string([]byte()) work around. I attached it.

Charles
trace.txt

Dave Cheney

unread,
Feb 21, 2014, 3:10:16 PM2/21/14
to Charles Vejnar, golang-nuts, self-...@gyepi.com
Can you post the output of pmap $THEPIDOFYOURPROGRAM while it is running, the closer to the point it panics the better. Also post the complete panic output.

Cheers

Dave


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/xWJ1_r3cCao/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.

peterGo

unread,
Feb 21, 2014, 5:00:02 PM2/21/14
to golan...@googlegroups.com
Charles,

These are my results from the attached program.

$ go version
go version devel +b5eda189b974 Thu Feb 20 16:18:05 2014 -0500 linux/amd64
$ go fmt && go build peter.go && time ./peter ~/gen.txt ~/results.txt
File: gen.txt 14460003485
Counts: 59929991
Memory: 5253939872 7041056112
Memory: 5138367656 7521066416
real    9m4.920s
user    1m47.372s
sys    0m32.620s
$

The machine has 8GB of memory and no swap space.

What results do you get?

Peter
peter.go

Charles Vejnar

unread,
Feb 25, 2014, 2:38:50 PM2/25/14
to golan...@googlegroups.com
Hi Peter,

I tested your solution. It's great! It uses less RAM than mine (from 13G to 7G with the same input of 14G) and it's about 3 times faster. I also tested with my larger input (69G): it used 39G of RAM and worked perfectly.

As you read bytes, and later convert to string, you don't have the problem I had.


Thanks again for your help.

Charles

Charles Vejnar

unread,
Feb 25, 2014, 4:49:13 PM2/25/14
to golan...@googlegroups.com
Actually the output that I get is:

File: in.tab 14460002275
Counts: 59929985
Memory: 5738225984 7527432000
Memory: 6698237024 8487443040

The memory increased during writing for me; for you it did not. This is weird no? Any idea?


Charles

Le vendredi 21 février 2014 17:00:02 UTC-5, peterGo a écrit :
Reply all
Reply to author
Forward
0 new messages