Avoid the garbage in the first place. For example, instead of allocating and returning a new string in the String() method of your object, you might want to implement the WriteTo method (or a similar interface). The standard library doesn't produce much garbage, so it's probably your program that allocates all those objects. Use the memory profiler to find those parts.
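A minimal sketch of that idea (the Message type and its fields are made up for illustration): instead of returning a fresh string from String(), let the caller supply the destination and implement io.WriterTo, so nothing temporary needs to be allocated and collected per call.

    package main

    import (
            "bytes"
            "fmt"
            "io"
            "os"
    )

    // Message is a hypothetical type used only to illustrate the idea.
    type Message struct {
            From, Body string
    }

    // String allocates a new string on every call -- this is the kind of
    // garbage the advice above is about.
    func (m *Message) String() string {
            return fmt.Sprintf("%s: %s", m.From, m.Body)
    }

    // WriteTo writes the same representation into a writer supplied by the
    // caller (e.g. a bufio.Writer around the connection), so no temporary
    // string is created. It satisfies io.WriterTo.
    func (m *Message) WriteTo(w io.Writer) (int64, error) {
            var n int64
            k, err := io.WriteString(w, m.From)
            n += int64(k)
            if err != nil {
                    return n, err
            }
            k, err = io.WriteString(w, ": ")
            n += int64(k)
            if err != nil {
                    return n, err
            }
            k, err = io.WriteString(w, m.Body)
            return n + int64(k), err
    }

    func main() {
            m := &Message{From: "alice", Body: "hello"}
            var buf bytes.Buffer // reusable buffer owned by the caller
            m.WriteTo(&buf)
            buf.WriteTo(os.Stdout)
    }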
Hello everyone,
Our business has been suffering from an annoying problem. We are
developing an iMessage-like service in Go. The server can serve
hundreds of thousands of concurrent TCP connections per process, and
it's robust (it has been running for about a month), which is awesome.
However, the process quickly consumes 16GB of memory: with so many
connections there are also a lot of goroutines and buffers in use. I
extended the memory limit to 64GB by changing runtime/malloc.h and
runtime/malloc.goc. That works, but it brings a big problem too: the
garbage collection is then extremely slow. It stops the world for
about 10 seconds every 2 minutes, which causes problems that are very
hard to trace; for example, while the world is stopped, messages being
delivered may be lost. This is a disaster, since ours is a real-time
service that requires delivering messages as fast as possible, and
there should be no stops and no message loss at all.
I'm planning to split the "big server process" into many "small
processes" to avoid this problem (a smaller memory footprint results
in a shorter stop-the-world pause), and to wait for Go's new GC
implementation.
Do you have any suggestions for improving our service right now? I
don't know when Go's new latency-free garbage collection will arrive.
Thanks.
--
Best regards,
Jingcheng Zhang
Beijing, P.R.China
--
I am curious: have you considered reordering fields in structs so that pointers are packed together? I understand that it's impossible in the general case (when there are sub-structs), but otherwise I think it's OK to arbitrarily reorder fields. Then you can say in the metainfo: in this object of size 128, scan only 4 words. This, of course, complicates things.
And additionally, you know the size of the pointed-to object -- it's either determined by the type, or if it's a slice then the size is in the subsequent word. Right?
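A small sketch of what I mean (the Conn-like types here are made up; Peer and Session are placeholders): grouping the pointer-bearing fields at the front would let a collector that knows the layout scan only the first few words of each object.

    package main

    // Peer and Session are placeholders just to make the sketch compile.
    type Peer struct{}
    type Session struct{}

    // A hypothetical per-connection struct with fields in arbitrary order:
    type connBefore struct {
            id       uint64
            peer     *Peer  // pointer
            lastSeen int64
            outbuf   []byte // slice header carries a pointer
            flags    uint32
            session  *Session // pointer
    }

    // The same fields with the pointer-bearing ones packed at the front.
    // A GC that records "scan only the first N words" in the metainfo
    // would only have to look at the first three words here.
    type connAfter struct {
            peer     *Peer
            session  *Session
            outbuf   []byte
            id       uint64
            lastSeen int64
            flags    uint32
    }

    func main() {}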
Thanks for your explanation. So what determines the GC duration? The memory arena size? I changed the arena limit to 64GB; before the change, the GC completed quickly. It is fine for our business to stop for about 2 seconds, but bad to stop for 10 seconds.
On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane <sou...@google.com> wrote:
> For vtocc (vitess), we measured an overhead of about 40K per connection. So,
> 16G sounds a little high, even for 100k connections. You may want to profile
> your memory to get a better picture of what's going on. We typically run
> anywhere between 5-20k connections, and rarely exceed 1G.
> Are you using Go 1? If so, you should try out a newer build with parallel
> GC. It should give you a speedup proportional to the number of CPUs you have.
> If most of your memory is due to large buffer sizes, you should tune
> GOGC lower (try 50?). This will cause the garbage collector to run more
> often, with shorter pauses. This is because the GC does not scan inside byte
> slices.
Currently we serve 600,000 concurrent, keep-alive TCP connections per
process. The process consumes 16GB of resident memory, so each
connection costs about 28KB.
The Go version is 1.0.3, amd64, with GOGC set to 200.
I'll tune GOGC and the scavenger's GC frequency to see if there is any
room for improvement besides code optimization.
Thanks for your help.
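While experimenting with GOGC, one way to see whether the pauses actually improve is to read the pause history the runtime already records. A rough sketch (the 30-second logging interval is arbitrary):

    package main

    import (
            "log"
            "runtime"
            "time"
    )

    // logGCPauses periodically reports the most recent GC pause and the
    // number of collections so far. PauseNs is a circular buffer of
    // recent pause times in nanoseconds.
    func logGCPauses() {
            var ms runtime.MemStats
            for {
                    time.Sleep(30 * time.Second)
                    runtime.ReadMemStats(&ms)
                    last := ms.PauseNs[(ms.NumGC+255)%256]
                    log.Printf("GCs=%d last pause=%v total pause=%v heap=%d MB",
                            ms.NumGC,
                            time.Duration(last),
                            time.Duration(ms.PauseTotalNs),
                            ms.HeapAlloc>>20)
            }
    }

    func main() {
            go logGCPauses()
            select {} // stand-in for the real server loop
    }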
600k is a lot of connections :). However, a pause time of 10 seconds
seems suspicious for 16G. It should be in the ballpark of 1-2 seconds
for an 8-core box. This makes me think that 1.0.3 doesn't have the
parallel GC improvements. I assume you have GOMAXPROCS set correctly.
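For reference, a minimal sketch of setting it explicitly (in Go 1.0 the default is a single CPU unless GOMAXPROCS is set in the environment or in code):

    package main

    import (
            "log"
            "runtime"
    )

    func main() {
            // Let the runtime (including the parallel collector) use all CPUs.
            runtime.GOMAXPROCS(runtime.NumCPU())
            log.Printf("running with GOMAXPROCS=%d", runtime.GOMAXPROCS(0)) // 0 only queries
            // ... start the server here ...
    }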
--
Tip.
--
Does this mean that there is an internal branch of the tip?
Or do you only update to the current tip when there are changes that
improve stability?
Thanks, Ian, for your explanation. So after precise GC, there should
be another improvement to make it latency-free (ultimately a precise,
parallel, latency-free GC), right?
On Mon, Nov 19, 2012 at 4:29 PM, bryanturley <bryan...@gmail.com> wrote:
> Could manually running the gc more often help in this case? Fewer dead
> objects to scan, perhaps.
Dead objects are not scanned. They are only swept.
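For completeness, forcing collections more often is only a few lines (a sketch; the 30-second interval is arbitrary). As noted above, dead objects are only swept, not scanned, so this mostly changes how often the pauses happen rather than how much gets scanned, much like lowering GOGC:

    package main

    import (
            "runtime"
            "time"
    )

    // forceGC triggers a full collection on a fixed interval.
    func forceGC(interval time.Duration) {
            for {
                    time.Sleep(interval)
                    runtime.GC()
            }
    }

    func main() {
            go forceGC(30 * time.Second) // interval chosen arbitrarily for the sketch
            select {}                    // stand-in for the real server loop
    }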
Hello everyone,
Thanks for all your help. I updated our Go version to:
go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100
and rebuilt our servers. The GC duration is now reduced to 1-2
seconds, which is a big improvement!
On Wednesday, November 21, 2012 6:50:03 PM UTC+8, Jingcheng Zhang wrote:
> Hello everyone,
> Thanks for all your help. I updated our Go version to:
> go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100
> and rebuilt our servers. The GC duration is now reduced to 1-2
> seconds, which is a big improvement!
Is it possible that the GC does even better? One second is still a
noticeable interruption when serving game players.
Possibly; the OP has not yet provided the debugging information that was requested.
--
Thanks to the contributors of the new GC!
Guys, what is the best way to measure garbage collection times in Go?
Thanks.
GOGCTRACE=1 ./executable
On Nov 28, 2012 9:59 PM, "bryanturley" <bryan...@gmail.com> wrote:
>
> On Wednesday, November 28, 2012 2:02:18 PM UTC-6, ⚛ wrote:
>>
>> GOGCTRACE=1 ./executable
>
>
> It might help if you told him what the fields mean exactly (from go 1.0.3, maybe less cryptic in tip)
>
> "gc63(4): 0+0+0 ms 1 -> 0 MB 8257 -> 1073 (92277-91204) objects 127 handoff"
>
> and from pkg/runtime/mgc0.c
>
> runtime·printf("gc%d(%d): %D+%D+%D ms %D -> %D MB %D -> %D (%D-%D) objects %D handoff\n",
> mstats.numgc, work.nproc, (t1-t0)/1000000, (t2-t1)/1000000, (t3-t2)/1000000,
> heap0>>20, heap1>>20, obj0, obj1,
> mstats.nmalloc, mstats.nfree,
> nhandoff);
>
> Without reading much of this code, I am assuming obj0/heap0 are the before and obj1/heap1 are the after?
That is correct.
The sum of the 3 numbers before "ms" is the GC pause time.
There is also: godoc runtime MemStats
> nmalloc and nfree seem obvious enough.
nmalloc and nfree are totals since the start of the program.
> not even a guess as to what handoff is though ;)
handoff is communication between GC threads.
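Putting bryanturley's example line next to that printf, the fields read roughly as follows (my own annotation of the numbers above):

    gc63(4): 0+0+0 ms 1 -> 0 MB 8257 -> 1073 (92277-91204) objects 127 handoff

    gc63           the 63rd collection since the program started (mstats.numgc)
    (4)            4 GC threads took part (work.nproc)
    0+0+0 ms       the three phase times in ms; their sum is the stop-the-world pause
    1 -> 0 MB      heap size before -> after the collection (heap0, heap1)
    8257 -> 1073   objects before -> after the collection (obj0, obj1)
    (92277-91204)  total mallocs and total frees since the program started
    127 handoff    work handed off between the GC threads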