performance and garbage collection


Shane Caraveo

Jun 23, 2009, 1:09:45 PM
to trac...@googlegroups.com
Hello,
I've been looking into performance and profiling, and noticed the
gc.collect that happens at the end of every request (in trac.web.main).
On my osx dev box, this garbage collection is taking roughly 37% of
the request time on a simple template, and removing it shows roughly
the same percentage increase in requests/second (with ab). I haven't
tested with more intensive templates (e.g. reports or timeline), where I
assume it would take a smaller percentage; still, this is pretty significant.

Removing it unfortunately introduces some issues with the database pool
and postgres (may be more related to how we're using the database). It
also looks like, based on ticket 6614, that it will introduce a number
of other issues if it is removed. I haven't fully digested 6614 yet,
it's pretty lengthy.

I'm just wondering if anyone has any comments or thoughts about this.

Regards,
Shane
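
A minimal sketch of how the share of request time spent in an explicit gc.collect() could be measured. It is written for current Python 3 rather than the Python 2 of this thread, and `handle_request` and `timed_request` are hypothetical stand-ins, not Trac code; the real measurement above was done with a profiler and ab.

```python
import gc
import time


def handle_request():
    """Hypothetical stand-in for request processing: allocate short-lived objects."""
    rows = [{"id": i, "summary": "ticket %d" % i} for i in range(1000)]
    return len(rows)


def timed_request(collect=True):
    """Return (total_seconds, gc_seconds) for one simulated request."""
    start = time.perf_counter()
    handle_request()
    gc_seconds = 0.0
    if collect:
        # Mirrors the explicit collection at the end of every request.
        gc_start = time.perf_counter()
        gc.collect()
        gc_seconds = time.perf_counter() - gc_start
    total = time.perf_counter() - start
    return total, gc_seconds


if __name__ == "__main__":
    total, gc_seconds = timed_request()
    print("gc share of request: %.0f%%" % (100.0 * gc_seconds / total))
```

The percentage printed depends heavily on how many objects are alive in the process, so numbers from this toy will not match those of a real Trac environment.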

Christian Boos

Jun 23, 2009, 1:22:04 PM
to trac...@googlegroups.com

It was discussed not so long ago on Trac-Dev, see
http://groups.google.com/group/trac-dev/msg/9dcdaffccc74471c
As you can see there, I did some tests recently, but unfortunately not
on Linux, where the benefits of the explicit gc.collect() were the most
significant, IIRC (not sure I ever digested #6614 either ;-) ).

It would be nice to test again on Linux. Maybe other approaches are
possible, in order to lower the frequency of the collections. In any
case, the lesson learnt was that, in long-running programs, Python
can't necessarily be trusted to do the right thing unless it's told to
do gc explicitly. Maybe this has changed with 2.6, though.

-- Christian
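
One way of lowering the frequency of the collections could look like the sketch below: collect once every N requests instead of after each one. This is only an illustration under assumptions; `ThrottledCollector`, `request_finished` and the interval of 10 are hypothetical names and values, not anything that exists in Trac.

```python
import gc
import threading


class ThrottledCollector:
    """Run gc.collect() once every `interval` calls instead of on every call."""

    def __init__(self, interval=10):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self._count = 0
        self._lock = threading.Lock()  # requests may finish on several threads

    def request_finished(self):
        """Call at the end of each request; return True if a collection ran."""
        with self._lock:
            self._count += 1
            if self._count < self.interval:
                return False
            self._count = 0
        # Collect outside the lock so other threads are not blocked on the counter.
        gc.collect()
        return True
```

Whether this is safe depends on what the per-request collection was compensating for, such as the database pool issues mentioned earlier in the thread, so it would need testing before use.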

Shane Caraveo

Jun 23, 2009, 1:46:15 PM
to trac...@googlegroups.com

Having dealt with gc issues in pyxpcom for several years now, in
long-lived multithreaded apps, I've found that Python needs to be handled
carefully if you want it to auto-gc correctly, but it can be done and is
often preferable. Our problems were further exaggerated by the possibility
that a JavaScript or C++ component could be holding onto Python objects,
creating circular references.

I'll dig back through the other thread and the bug. I'm also going to
profile heavier templates to see what impact gc has on those.

Shane
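
To illustrate why circular references matter here: in CPython, objects in a reference cycle are not freed by reference counting alone and stay alive until the cyclic collector runs. The sketch below shows this with pure Python objects only; `Node`, `make_cycle` and `cycle_survives_without_gc` are hypothetical names, and it does not model the cross-language references described above.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Create two objects that reference each other; return a weakref to one."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


def cycle_survives_without_gc():
    """Return (alive_before_collect, alive_after_collect) for a dropped cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure no automatic collection interferes
    try:
        ref = make_cycle()
        alive_before = ref() is not None  # refcounts never reach zero
        gc.collect()                      # explicit collection breaks the cycle
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```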

Shane Caraveo

Jun 23, 2009, 1:59:04 PM
to trac...@googlegroups.com
On 6/23/09 10:22 AM, Christian Boos wrote:
> Shane Caraveo wrote:
>> Hello,
>> I've been looking into performance and profiling, and noticed the
>> gc.collect that happens at the end of every request (in trac.web.main).
>> On my osx dev box, this garbage collection is taking roughly 37% of
>> the request time on a simple template, and removing it shows roughly
>> the same percentage increase in requests/second (with ab). I haven't
>> tested with more intensive templates (e.g. reports or timeline), where I
>> assume it would take a smaller percentage; still, this is pretty significant.
>>
>> Removing it unfortunately introduces some issues with the database pool
>> and postgres (may be more related to how we're using the database). It
>> also looks like, based on ticket 6614, that it will introduce a number
>> of other issues if it is removed. I haven't fully digested 6614 yet,
>> it's pretty lengthy.
>>
>> I'm just wondering if anyone has any comments or thoughts about this.
>>
>
> It was discussed not so long ago on Trac-Dev, see
> http://groups.google.com/group/trac-dev/msg/9dcdaffccc74471c
> As you can see there, I did some tests recently, but unfortunately not
> on Linux, where the benefits of the explicit gc.collect() were the most
> significant, IIRC (not sure I ever digested #6614 either ;-) ).

That thread happened while I was away. The connection pooling is
definitely a problem for me and easy to reproduce, and would have to be
the first thing I fix if I remove the garbage collection.

I would suggest using ab (apache bench) for simple stress testing.

Shane

Shane Caraveo

Jun 23, 2009, 3:09:53 PM
to trac...@googlegroups.com

FYI, by removing gc and the repository sync for each request, I cut
request time roughly in half depending on the template being requested.
I'm going to be doing more testing on this, and looking at alternative
methods to handle these two items.

Shane

Colin Guthrie

Jun 23, 2009, 7:32:13 PM
to trac...@googlegroups.com
'Twas brillig, and Shane Caraveo at 23/06/09 20:09 did gyre and gimble:

> FYI, by removing gc and the repository sync for each request, I cut
> request time roughly in half depending on the template being requested.
> I'm going to be doing more testing on this, and looking at alternative
> methods to handle these two items.

I found that with the git plugin the sync on every request was *really*
impacting performance.

I hacked up a simple option to disable this automatic sync and set it in
the trac.ini file. This really helped at runtime.
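
As a rough illustration of how such a switch behaves, the sketch below reads the option from an ini fragment. The option name `disable_repository_autosync` in the `[trac]` section is taken from the patch further down in this message; everything else is an assumption. In particular, it uses the standard library `configparser` as a stand-in for Trac's own configuration class, and `autosync_disabled` is a hypothetical helper.

```python
import configparser

# Fragment as it might appear in trac.ini once the patch is applied.
TRAC_INI = """
[trac]
disable_repository_autosync = true
"""


def autosync_disabled(ini_text):
    """Return True if the per-request repository sync is switched off."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Default to False so an unmodified trac.ini keeps the existing behaviour.
    return parser.getboolean("trac", "disable_repository_autosync", fallback=False)
```

Defaulting to False matches the intent stated in this message: the default behaviour is unchanged unless the option is set.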

Then in a second change I hacked up a mod to the trac-admin interface to
allow for the incremental sync to be called manually. This call was then
hooked up to the git receive hook (or whatever it's called) so that when
the repository is updated trac was updated, but not at any other times.

I've dug out the patches (they were a bit rough and ready but very
simple) and perhaps someone can apply them for the upcoming release (I
think they are safe enough to add in as they don't touch the default
behaviour).

Col

Index: trac/versioncontrol/api.py
===================================================================
--- trac/versioncontrol/api.py (revision 8289)
+++ trac/versioncontrol/api.py (working copy)
@@ -81,7 +81,8 @@

     def pre_process_request(self, req, handler):
         from trac.web.chrome import Chrome, add_warning
-        if handler is not Chrome(self.env):
+        nosync = self.env.config.getbool('trac', 'disable_repository_autosync')
+        if not nosync and handler is not Chrome(self.env):
             try:
                 self.get_repository(req.authname).sync()
             except TracError, e:
Index: trac/admin/console.py
===================================================================
--- trac/admin/console.py (revision 8289)
+++ trac/admin/console.py (working copy)
@@ -645,7 +645,7 @@
             config_path=os.path.join(self.envname, 'conf', 'trac.ini')))

     _help_resync = [('resync', 'Re-synchronize trac with the repository'),
-                    ('resync <rev>', 'Re-synchronize only the given <rev>')]
+                    ('resync <rev>', 'Re-synchronize only the given <rev> (if rev is "--latest" it will just sync the repository)')]

     def _resync_feedback(self, rev):
         sys.stdout.write(' [%s]\r' % rev)
@@ -658,6 +658,10 @@
         if argv:
             rev = argv[0]
         if rev:
+            if "--latest" == rev:
+                repos = env.get_repository().sync(self._resync_feedback)
+                printout(_("Done."))
+                return
             env.get_repository().sync_changeset(rev)
             printout(_("%(rev)s resynced.", rev=rev))
             return
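
A hook along the lines described above might look like the following sketch. It assumes the patch above has been applied, since `resync --latest` does not exist without it. The function names and the way the environment path is passed in are hypothetical, and `trac-admin` is assumed to be on the PATH.

```python
import subprocess
import sys


def resync_command(env_path, trac_admin="trac-admin"):
    """Build the command line that asks Trac for an incremental sync."""
    return [trac_admin, env_path, "resync", "--latest"]


def main(argv):
    """Entry point for a repository hook; argv[1] is the Trac environment path."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s /path/to/trac/env\n" % argv[0])
        return 2
    # Return trac-admin's exit status so a failed sync is visible to the caller.
    return subprocess.call(resync_command(argv[1]))
```

A hook script would call `sys.exit(main(sys.argv))` and be installed as the repository's receive hook, so that the sync runs only when the repository actually changes.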

--

Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited [http://www.tribalogic.net/]
Open Source:
Mandriva Linux Contributor [http://www.mandriva.com/]
PulseAudio Hacker [http://www.pulseaudio.org/]
Trac Hacker [http://trac.edgewall.org/]

Remy Blank

Jun 23, 2009, 8:00:12 PM
to trac...@googlegroups.com
Shane Caraveo wrote:
> FYI, by removing gc and the repository sync for each request, I cut
> request time roughly in half depending on the template being requested.
> I'm going to be doing more testing on this, and looking at alternative
> methods to handle these two items.

The multirepos branch already solves this by requiring post-commit hooks
in all repositories to call "trac-admin changeset added", which triggers
the sync. The default repository is still synced at every request (for
backward compatibility), but you can choose not to have a default
repository at all. Maybe we should even allow disabling that per-request
sync with an option in trac.ini.

-- Remy


Noah Kantrowitz

Jun 25, 2009, 3:50:34 AM
to trac...@googlegroups.com
Pretty sure trac-hacks has the automated sync disabled too. I know
Michael moved a lot of those maintenance tasks into a cron job that
runs every few minutes.

--Noah
