Hi,
I started to use Go just a few weeks ago and find it amazing; I am coming from Python, and it was quite easy to start and get very efficient code (compared to Python). Great job!
I have my first problem that I am not able to solve. Sorry if it's actually a quick fix; I am still new to Go.
I have a large tab-separated file (~15GiB) like this:
name1 xx gg ss gg
name2 gg tt 3d4 bbgf sdfh
name1 h ysfh gsfhg
(To create an input file, you can use the attached gen.py; sorry it's in Python)
The first column is a name that can appear multiple times, on different lines. I would like to add a column at the end of each line with the number of times that name appears in the file:
name1 xx gg ss gg 2
name2 gg tt 3d4 bbgf sdfh 1
name1 h ysfh gsfhg 2
I wrote a simple solution (add_col.go): I first create a map with all names and counts, then re-open the input file and write the output file with the added count.
On a small input file, add_col.go runs as expected. But with a large input file, it dies with an "out of memory" error. The most surprising part is that it dies *after* the big map is built, while my program is writing the new file. I did some memory profiling and indeed, memory consumption keeps increasing while the new file is being written, which I don't understand. Also, the total RAM used (checked with top and the time utility) is 34969728kB, far below the available RAM (I am using Go 1.2 on Arch Linux, AMD64, with 128GB of RAM). I confirmed this by starting another program that was able to use much more RAM (80GiB). No limits are set with ulimit either (I checked).
The line that apparently causes the program to die:
line, err := tab.Reader.ReadString('\n')
I also tried adding "runtime.GC()": for smaller input files it solved the problem, but not for bigger ones.
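For anyone who wants to watch the memory growth from inside the program, a sketch like the following could be used. It is only an illustration, not what add_col.go does: memSummary is a hypothetical helper, and the loop that allocates 1 MiB buffers merely stands in for the real write loop. runtime.GC, runtime.ReadMemStats and the MemStats fields used here are from the standard runtime package.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSummary reports a few runtime.MemStats fields, in MiB
// (hypothetical helper, for illustration only).
func memSummary() string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const MiB = 1 << 20
	return fmt.Sprintf("HeapAlloc=%dMiB HeapSys=%dMiB Sys=%dMiB NumGC=%d",
		m.HeapAlloc/MiB, m.HeapSys/MiB, m.Sys/MiB, m.NumGC)
}

func main() {
	// Stand-in workload: keep 1 MiB buffers alive so the heap grows.
	// In the real program this would be the loop over input lines.
	var keep [][]byte
	for i := 1; i <= 50; i++ {
		keep = append(keep, make([]byte, 1<<20))
		if i%10 == 0 {
			runtime.GC() // force a collection, as tried above
			fmt.Println(i, memSummary())
		}
	}
	fmt.Println("buffers kept:", len(keep))
}
```

Comparing HeapAlloc (live heap) with Sys (memory obtained from the OS) at regular intervals may help tell whether the growth comes from live objects or from memory the runtime is holding on to.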
Any idea how to fix this?
Thanks for your help,
Charles
---------------------
The stack when it dies:
$ ./xtime go run ./add_col.go out.tab out_wnh.tab
Closing &{0xc21000a2d0}
Output
GC
GC
GC
GC
fatal error: runtime: out of memory
goroutine 1 [running]:
runtime.throw(0x595257)
/usr/lib/go/src/pkg/runtime/panic.c:464 +0x69 fp=0x7f3c2f74b9c0
runtime.SysMap(0xc9c6960000, 0x100000, 0x5a1bb8)
/usr/lib/go/src/pkg/runtime/mem_linux.c:131 +0xfe fp=0x7f3c2f74b9f0
runtime.MHeap_SysAlloc(0x5abb00, 0x100000)
/usr/lib/go/src/pkg/runtime/malloc.goc:473 +0x10a fp=0x7f3c2f74ba30
MHeap_Grow(0x5abb00, 0x10)
/usr/lib/go/src/pkg/runtime/mheap.c:241 +0x5d fp=0x7f3c2f74ba70
MHeap_AllocLocked(0x5abb00, 0x1, 0x11)
/usr/lib/go/src/pkg/runtime/mheap.c:126 +0x305 fp=0x7f3c2f74bab0
runtime.MHeap_Alloc(0x5abb00, 0x1, 0x11, 0x7f3c00000001)
/usr/lib/go/src/pkg/runtime/mheap.c:95 +0x7b fp=0x7f3c2f74bad8
MCentral_Grow(0x5b3760)
/usr/lib/go/src/pkg/runtime/mcentral.c:180 +0x8c fp=0x7f3c2f74bb38
runtime.MCentral_AllocList(0x5b3760, 0x7f3c2f8e2120)
/usr/lib/go/src/pkg/runtime/mcentral.c:46 +0x4f fp=0x7f3c2f74bb60
runtime.MCache_Refill(0x7f3c2f8e2000, 0x11)
/usr/lib/go/src/pkg/runtime/mcache.c:22 +0x7c fp=0x7f3c2f74bb80
runtime.mallocgc(0x100, 0x48cfe1, 0x1)
/usr/lib/go/src/pkg/runtime/malloc.goc:71 +0xff fp=0x7f3c2f74bbf0
cnew(0x48cfe0, 0xf2, 0x7f3b00000001)
/usr/lib/go/src/pkg/runtime/malloc.goc:718 +0xc1 fp=0x7f3c2f74bc10
runtime.cnewarray(0x48cfe0, 0xf2)
/usr/lib/go/src/pkg/runtime/malloc.goc:731 +0x3a fp=0x7f3c2f74bc30
makeslice1(0x487d40, 0xf2, 0xf2, 0x7f3c2f74bc90)
/usr/lib/go/src/pkg/runtime/slice.c:57 +0x4d fp=0x7f3c2f74bc48
runtime.makeslice(0x487d40, 0xf2, 0xf2, 0xf2, 0xf2, ...)
/usr/lib/go/src/pkg/runtime/slice.c:38 +0x98 fp=0x7f3c2f74bc78
bufio.(*Reader).ReadBytes(0xc5dd88d600, 0xa, 0x0, 0x0, 0x0, ...)
/usr/lib/go/src/pkg/bufio/bufio.go:378 +0x15b fp=0x7f3c2f74bd98
bufio.(*Reader).ReadString(0xc5dd88d600, 0xa, 0x0, 0x0, 0x0, ...)
/usr/lib/go/src/pkg/bufio/bufio.go:395 +0x55 fp=0x7f3c2f74bdd8
main.main()
/home/charles/work/seq/dvt/add_col.go:87 +0x470 fp=0x7f3c2f74bf48
runtime.main()
/usr/lib/go/src/pkg/runtime/proc.c:220 +0x11f fp=0x7f3c2f74bfa0
runtime.goexit()
/usr/lib/go/src/pkg/runtime/proc.c:1394 fp=0x7f3c2f74bfa8
exit status 2
Command exited with non-zero status 1
119.94u 21.58s 141.32r 34969728kB go run ./add_col.go out.tab out_wnh.tab