Short Story:
Fix for the log capture growing without bound on "god load":
http://github.com/eric/god/commit/83968f49057fe79dbd365734bfa2ab3df1778c8a
Fix for threads not being cleaned up on task unregister/reload (which
kept references to every task alive):
http://github.com/eric/god/commit/3e4fd2898cd4f92afeba120776dc3ba7f4a6ff0f
Pretty graphs to show things are better:
http://skitch.com/lindvall/nga1f/god-load-memory-leak
Long Story:
I've been afflicted by the leak in "god load" for a long time now and
recently resorted to having god automatically restart itself when it
reaches a given memory size:
http://gist.github.com/216414
This worked, but it didn't make me very happy, and it produced memory
usage graphs that looked like this:
http://skitch.com/lindvall/nrsb6/chart-of-god-memory-usage
After more than a few nights of hunting for the leak, I finally
realized it was caused by issuing "god load" commands against a
long-running god instance. This explains why some people are hit by
this problem much harder than others.
In the end, two leaks were found:
1. The Driver thread was not stopped when a task was unregistered,
which left the thread running and the watcher still referenced.
gdb.rb really helped me track this down quickly.
Fix (branch eric/unload-driver):
http://github.com/eric/god/commit/3e4fd2898cd4f92afeba120776dc3ba7f4a6ff0f
2. When "god load" is executed, LOG.start_capture is called so that
useful logs can be returned if a watcher fails to start. The problem
is that the capture is never stopped unless there's an error, so the
captured log grows without bound. In the end, I found this one just
by reading through the source code.
Fix (branch eric/load-logger-leak):
http://github.com/eric/god/commit/83968f49057fe79dbd365734bfa2ab3df1778c8a
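The two fix patterns can be sketched in Ruby roughly like this. It is a minimal sketch, not the code from the commits above: SketchDriver, SketchRegistry, SketchCaptureLog and sketch_load are hypothetical names, and it assumes a per-task driver thread fed by a Queue and a log capture that should only be active for the duration of a load.

```ruby
require 'thread' # Queue; already loaded by default on modern Ruby

# Hypothetical stand-in for a per-task driver thread (not god's Driver class).
class SketchDriver
  attr_reader :thread

  def initialize(task)
    @task = task
    @events = Queue.new
    # The block references self, so while this thread is alive the driver,
    # and through it @task, stays reachable and cannot be collected.
    @thread = Thread.new do
      loop do
        event = @events.pop
        break if event == :shutdown
        # ... act on event for @task ...
      end
    end
  end

  def push(event)
    @events << event
  end

  # Ask the thread to exit and wait for it to finish.
  def shutdown
    @events << :shutdown
    @thread.join
  end
end

# Hypothetical registry of running tasks.
class SketchRegistry
  def initialize
    @drivers = {}
  end

  def register(name, task)
    @drivers[name] = SketchDriver.new(task)
  end

  # Fix 1: unregistering must also stop the driver thread. Deleting the
  # hash entry alone leaves the thread (and the task it holds) alive.
  def unregister(name)
    driver = @drivers.delete(name)
    driver&.shutdown
    driver
  end
end

# Hypothetical logger that buffers messages while a load runs
# (not god's LOG object).
class SketchCaptureLog
  def initialize
    @capture = nil
  end

  def capturing?
    !@capture.nil?
  end

  def info(msg)
    @capture << msg if @capture # only buffered while capturing
  end

  def start_capture
    @capture = []
  end

  # Stop capturing and return what was buffered.
  def finish_capture
    lines = @capture || []
    @capture = nil
    lines.join("\n")
  end
end

# Fix 2: run a load with the capture stopped on every path.
# Returns the captured log on failure, nil on success.
def sketch_load(log)
  log.start_capture
  begin
    yield
    nil
  rescue StandardError => e
    log.info("load failed: #{e.message}")
    log.finish_capture
  ensure
    # Without this, a successful load would leave the capture running
    # and the buffer would keep growing with every later log call.
    log.finish_capture if log.capturing?
  end
end
```

After `unregister`, the returned driver's `thread.alive?` is false. `sketch_load(log) { raise "boom" }` returns `"load failed: boom"` and leaves `log.capturing?` false. The `ensure` clause is what closes the success path that the original leak left open.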
You can see in the graphs how the min/max memory usage of each god
process stopped varying so widely and stabilized nicely on Nov 7,
when I deployed the fixed build to the systems:
http://skitch.com/lindvall/nga1f/god-load-memory-leak
Many thanks to tmm1 for lending a hand and an ear and for trying every
profiler he could think of to track this down, and to Scout for
tracking and generating the graphs that got me off my ass to solve
this problem for real.
If you haven't checked out tmm1's perftools.rb
<http://github.com/tmm1/perftools.rb> and gdb.rb
<http://github.com/tmm1/gdb.rb>, or Scout <http://scoutapp.com/>,
you really should.
I don't have an official build to point everyone to, but hopefully
mojombo will have a chance to release one in the next couple of days.
Eric