Solving the memory leak in "god load"

Eric Lindvall

Nov 9, 2009, 4:11:43 PM
to god.rb
Short Story:

Fix for infinite log capture on "god load":
http://github.com/eric/god/commit/83968f49057fe79dbd365734bfa2ab3df1778c8a

Fix for not cleaning up threads (causing references to be kept for all
tasks) on task unregister/reload:
http://github.com/eric/god/commit/3e4fd2898cd4f92afeba120776dc3ba7f4a6ff0f

Pretty graphs to show things are better:
http://skitch.com/lindvall/nga1f/god-load-memory-leak

Long Story:

I've been afflicted by the leak in "god load" for a long time now and
recently resorted to having god automatically restart itself when it
reaches a given memory size: http://gist.github.com/216414

While this solution worked, it didn't make me very happy, and produced
some memory usage graphs that looked like this:
http://skitch.com/lindvall/nrsb6/chart-of-god-memory-usage

Through more than a few nights of trying to hunt down the leak, I
finally realized the leak was a result of issuing "god load" commands
from a long-running god instance. This explains why many people don't
suffer from this problem at the same level that others do.

In the end, two leaks were found:
1. The Driver thread was not stopped when a task was unregistered, so
the thread stayed alive and kept a reference to the watcher. gdb.rb
really helped to track this down quickly.
Fix (branch eric/unload-driver): http://github.com/eric/god/commit/3e4fd2898cd4f92afeba120776dc3ba7f4a6ff0f

2. When "god load" is executed, LOG.start_capture is invoked to give
nice logs if a watcher can't start. The problem is that the capture is
never stopped unless there's an error, so the captured log grew without
bound. In the end, it was just reading through the source code that
caused me to find this one.
Fix (branch eric/load-logger-leak):
http://github.com/eric/god/commit/83968f49057fe79dbd365734bfa2ab3df1778c8a
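
For anyone who wants the shape of the two leaks without reading the commits, here is a rough sketch of both patterns. It is not god's actual code: SketchTask, SketchDriver, SketchLog, load_with_capture and finish_capture are hypothetical names made up for illustration (only start_capture is mentioned above), and the real fixes are in the linked commits.

```ruby
require "stringio"

# Hypothetical stand-in for a task/watcher; not god's real class.
class SketchTask
  attr_reader :name, :handled

  def initialize(name)
    @name = name
    @handled = []
  end

  def handle(msg)
    @handled << msg
  end
end

# Leak 1 pattern: a per-task worker thread. While the thread is alive,
# its block keeps the task reachable, so the task can never be collected.
class SketchDriver
  attr_reader :thread

  def initialize(task)
    @queue = Queue.new
    @thread = Thread.new do
      # Loop until a :stop message arrives.
      while (msg = @queue.pop) != :stop
        task.handle(msg)
      end
    end
  end

  def message(msg)
    @queue << msg
  end

  # The fix pattern: call this when the task is unregistered so the
  # thread exits and drops its reference to the task.
  def shutdown
    @queue << :stop
    @thread.join
  end
end

# Leak 2 pattern: a log capture that buffers everything once started.
class SketchLog
  def start_capture
    @capture = StringIO.new
  end

  def info(text)
    @capture.puts(text) if @capture
  end

  def capturing?
    !@capture.nil?
  end

  # Hypothetical name: stops capturing and returns what was buffered.
  def finish_capture
    text = @capture ? @capture.string : ""
    @capture = nil
    text
  end
end

# The fix pattern: stop the capture on success as well as on failure.
# Returns the captured text only when the block failed.
def load_with_capture(log)
  log.start_capture
  failed = false
  begin
    yield
  rescue StandardError
    failed = true
  ensure
    captured = log.finish_capture
  end
  failed ? captured : nil
end
```

In the leaky version of the first pattern nothing ever calls shutdown, so every reload leaves one more live thread and one more referenced task behind. In the leaky version of the second, the capture is finished only in the failure path, so a long run of successful loads keeps appending to the same buffer.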

You can see the nice graphs of how the min/max memory usage of each
god process quit being so varied and things stabilized nicely on Nov 7
when I deployed the fixed build to the systems:
http://skitch.com/lindvall/nga1f/god-load-memory-leak

Many thanks to tmm1 for lending a hand and an ear with all of the
profilers he could think of to try to track this down, and to Scout
for tracking and generating the graphs that got me off my ass to solve
this problem for real.

If you haven't checked out tmm1's perftools.rb
<http://github.com/tmm1/perftools.rb> and gdb.rb
<http://github.com/tmm1/gdb.rb> or Scout <http://scoutapp.com/>, you
really should.

I don't have an official build to point everyone to, but hopefully
mojombo will have a chance to release one in the next couple days.

Eric

Jesse Newland

Nov 9, 2009, 4:24:59 PM
to god.rb
+1

Excellent work tracking this down, Eric.


lardawge

Nov 12, 2009, 8:44:32 AM
to god.rb
Any word on when this is getting applied and released? +1 if it
matters...
