my various performance tweaks to trac


Shane Caraveo

Jul 20, 2009, 9:25:57 PM
to trac...@googlegroups.com
I was going to do this as a ticket since it's so long, but Edgewall likes
to think of me as a spammer sometimes. This was partly written as a
response/suggestion to people posting on ticket 7490 as well, so it
includes suggestions.

Shane

----

I've recently reduced the per-request time for *simple* template generation
from 150ms to 45ms on my laptop (dual-core 1.6GHz Core Duo, 2GB RAM, tested
with ab against a simple template). I also reduced page load times (reported
via Google Page Speed/Firebug) to around a second (from 2-8 seconds in
some cases). These timings are without any network overhead since I'm
testing on the local machine. Times are all taken on my laptop with my
typical environment/applications open (lots of crap).

This is all done with 0.11.5.

Even with the gains I still get 100% CPU spikes on every request, but at
least they are of shorter duration. I'm still digging further into that. I
have also done a bit of work on overall browser page load.

Here's my current brain dump. If some of my conclusions are silly,
please let me know why.

== General ==

In general there are only a couple of big wins. For me they were removing
gc.collect (see trac.main) and the Timing and Estimation plugin.
Everything else was small potatoes in comparison (10ms here, 5ms there),
but together they added up to a good 40-50ms per request. Think of it
this way: at 100% CPU and 50ms/request, you are limited to a maximum of 20
requests/second/CPU. Every ms counts if we want decent throughput. I'd
like to get under 30ms.

=== How I timed ===

My test setup is Apache prefork with mod_wsgi in daemon mode, one process,
and mod_deflate enabled. My laptop is pretty loaded, with very little free
memory, but generally enough that Trac shouldn't be swapping. PostgreSQL
is the database, also running on the laptop.

For timing template generation I used ApacheBench and tossed out
"warmup" requests, sometimes testing with keep-alive (which gives slightly
better req/s):

{{{
ab [-k] -c 1 -n 10 url
}}}

With zero network latency, the timing is essentially how much time Trac
takes to generate the content. I tested against a simple template
generated from a small plugin that just renders a template file (to
involve as little other processing as possible, such as querying the
ticket table or permissions), and against WikiStart.

For "page load" I used Google Page Load extension for firebug and
firefox 3.5. I also tested with Safari 4 which has similar
functionality now, it's much faster than firefox :(

I tested with plugins enabled since I also wanted to discover whether
any plugins were causing me grief; some were.

I tested against empty projects, so table indexing and table size are not
involved. I also tested against a small-sized project in some
instances, which turned up issues in a couple of plugins (e.g. Timing and
Estimation).

=== Profiling issues ===

It seems to me that cProfile attributes time spent in generators to the
function that consumes them, masking where the real time is spent. That
makes profiling Genshi terribly difficult. I also tried dtrace; it gives
the same rough data as cProfile. Knowing a tiny bit about the Python
engine (having written a C-level debugger extension), I figure that it
has something to do with how generators work in the Python core.

I rewrote in C a couple of areas where cProfile claimed a lot of exclusive
time was spent in Genshi. This didn't change my performance at all, but it
did shift timing to other areas, which helped me find a couple of the
actual time sinks in Genshi (see below).

== For general performance ==

Use either a multithreaded Apache if you use mod_python, or use mod_wsgi
in multithreaded daemon mode. It won't remove high-CPU slow requests, but
it does lower memory use to some extent. While that didn't improve the
per-request performance, I got higher concurrent throughput with that
setup. Apache prefork with mod_python, or mod_wsgi in embedded mode, will
chew up memory, and due to the high CPU usage the processes will fight
each other more for resources.

=== customizations and plugins ===

The single largest gain I got with plugins was removing the Timing and
Estimation plugin (primarily seen on the ticket reports page, where it does
another SQL query for each ticket, doubling the request time). Its
stream filters slow lots of stuff down. Removing other plugins with
stream filters, or reworking them to not use stream filters when possible,
is a win. In some instances the TracHours plugin is a better solution if
you don't need all the Timing and Estimation features.

Get rid of stuff like the Google Analytics plugin and just add the code
to a site or theme template; you'll get better performance. It's not
that it's a bad plugin, but it does something simple that adds *Genshi*
overhead.

Examine any custom templates for extraneous py:match use and get rid of
it. I had one match in the site template that removed something from
an admin panel, which I had forgotten about; a big mistake, and it cost
performance on *every* page. I patched the template directly
instead and saved myself 10ms per request. This kind of thing doesn't
show up in the profiler; it was some debug dumps I inserted deep into
the bowels of Genshi that showed me the light.

Make sure all static resources for all plugins are exported. trac-admin
deploy will only deploy static resources for enabled plugins in that
project.

=== javascript ===

This is mostly about page load times.

Minify all the JavaScript output by trac-admin deploy, and make sure it
gets cache headers via the Apache config.

Upgrade to jQuery 1.3, it's much faster. Trac trunk has done this, and
there is a diff somewhere that shows what kind of changes have to be
made. You'd have to examine any plugins for similar jQuery updates that
need to be done.

If you use Firebug, be sure you have disabled it (or at least the console
tab) for Trac pages; the console somehow interferes and adds a couple of
seconds of delay in the browser.

=== css ===

I haven't gotten to this yet, but there is a lot of slow CSS there.
Removing all the CSS as a test, my page load in the browser was 200ms
faster. Google Page Speed is good at reporting which CSS *might* be slow;
it reports lots of inefficient CSS in Trac.

== trac issues ==

=== gc.collect ===

The next big item was removing gc.collect. If I recall correctly,
gc.collect took at least 50ms of the request time, sometimes upwards of
100ms. It also hides other bugs (ticket:8443 for example: the PostgreSQL
idle bug came back without gc.collect; I've got a patch in that ticket).
gc.collect could be done in a background thread in tracd, but I
wouldn't suggest background threads for mod_python prefork. With WSGI
daemon mode, just limit the number of requests before a restart to
something you think is fine and test memory consumption. I think the
use of gc.collect should be seriously reconsidered.
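
As a rough sketch of what I mean by a background thread (untested against
Trac itself; the function name and interval are arbitrary):

{{{
import gc
import threading
import time

def start_gc_thread(interval=30.0):
    """Hypothetical sketch: collect periodically instead of per request."""
    def _collector():
        while True:
            time.sleep(interval)
            gc.collect()
    t = threading.Thread(target=_collector, name='gc-collector')
    t.setDaemon(True)  # don't keep the process alive on shutdown
    t.start()
    return t
}}}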

I have a sneaking suspicion (unproven) that people who use mod_python
and report having to turn off keep-alive and/or mod_deflate are having
problems due to gc.collect. As I understand Apache filters (e.g.
mod_deflate), they won't finish up the request until the mod_python
handler has returned. So the extra time in gc.collect delays the request
being returned, which delays mod_deflate finishing. It also delays the
start of the next request over a persistent socket connection (i.e.
keep-alive).

=== filters ===

Going through all the match_request implementations, removing
permission checking (put it in process_request instead), making sure
regexes are precompiled, and generally simplifying things helps. There
are a number of these in Trac core, but plugins are the greater abusers
in this area. Also examine any IRequestFilter use and simplify it.
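
For illustration, here is a minimal sketch of the shape I mean (the
component, path and permission names are made up): match_request does only
a cheap, precompiled path test, and the permission check moves to
process_request.

{{{
import re

from trac.core import Component, implements
from trac.web.api import IRequestHandler

class MyPathHandler(Component):
    """Hypothetical handler: cheap match_request, everything else deferred."""
    implements(IRequestHandler)

    _path_re = re.compile(r'/mypath(?:/|$)')  # compiled once at import time

    def match_request(self, req):
        # Called for *every* request, so keep it to a simple path test.
        return self._path_re.match(req.path_info) is not None

    def process_request(self, req):
        # The permission check happens here, only for matching requests.
        req.perm.require('WIKI_VIEW')
        return 'mypage.html', {}, None
}}}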

=== trac.config ===

Other than Genshi, profiling showed trac.config to be the single largest
time sink in simple template generation. Profiling showed me that the get
method of the Configuration class (trac.config.Configuration) was slow.
I added caching there for a few extra milliseconds of boost. I'm also
looking at removing the auto-reload when the ini file changes, maybe using
Spread or memcached to send reload notifications, to get rid of the file
stats, but timing doesn't show this to be a large issue on my laptop.
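
A minimal sketch of that kind of caching (not the actual patch; the
wrapper class and the ini path handling are placeholders) could look like
this:

{{{
import os

class CachingConfig(object):
    """Hypothetical sketch: memoize option lookups, reset on ini mtime change."""

    def __init__(self, config, filename):
        self._config = config      # a trac.config.Configuration instance
        self._filename = filename  # path to trac.ini
        self._cache = {}
        self._mtime = os.path.getmtime(filename)

    def get(self, section, name, default=''):
        mtime = os.path.getmtime(self._filename)
        if mtime != self._mtime:   # ini file changed: drop the cache
            self._cache.clear()
            self._mtime = mtime
        key = (section, name, default)
        if key not in self._cache:
            self._cache[key] = self._config.get(section, name, default)
        return self._cache[key]
}}}

Note that this still stats the ini file on every lookup; getting rid of
the stat entirely is the separate reload-notification idea above.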

=== repository ===

While I still want to remove the sync on every request (to get rid of
system calls/file stats), I have been unable to show that performance
changes much when I test with it removed. There are still bigger fish
to fry.

=== database pool ===

Watching the PostgreSQL log file, I can tell that a lot of cursors get
created without actually being used to do a query. This shows up
because psycopg2 executes a BEGIN when a cursor is created. I haven't
yet looked into where that is happening, but it's extra work the system
is doing for nothing.

=== wiki ===

The current wiki parser design is slow, doing the same large regex over
each line. I think a general redesign to use re.finditer, rather than
line split then re.match, would help improve performance here. However,
caching the rendered content would be better. I'm experimenting with
partial caching of the wiki content and have reduced my request timing
on WikiStart from 120ms to 75ms while still getting correct content.
The patch I have doesn't cache macros unless the macro arguments have
'cache' in them (good for the page outline macro, which, by the way, seems
to re-parse the entire page, macros included). There may be other issues
with the approach I have taken; I haven't tested it much. Once I get
further, I might actually pre-process the wiki page when it is saved and
stash the result into the database, avoiding the parse in most requests.
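
As a rough illustration of the render-and-cache idea (not the actual
patch; the module-level dict and helper function are simplified
placeholders, and a real version would skip content that must not be
cached, like the macros mentioned above):

{{{
from trac.wiki.formatter import format_to_html

_wiki_cache = {}  # (page name, version) -> rendered XHTML fragment

def render_wiki_page(env, context, page):
    """Hypothetical sketch: cache the rendered wiki text per page/version."""
    key = (page.name, page.version)
    if key not in _wiki_cache:
        _wiki_cache[key] = format_to_html(env, context, page.text)
    return _wiki_cache[key]
}}}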

== Genshi ==

Use Genshi trunk, some performance gain there.

With Genshi, one big win is turning off the whitespace filter; I think I
got around 10ms on a small page, and the gain would be larger on larger
pages. I did that by patching trac.web.chrome where there is a call to
stream.render, adding strip_whitespace=False. While this slightly
increases the size of the delivered file (and adds a lot of whitespace to
the source view), that can be countered by mod_deflate. The namespace
flattener is another slow filter, but it's a necessary one.
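
The change is roughly of this shape (a sketch, not the exact
trac.web.chrome code; the surrounding variable names are placeholders):

{{{
# In trac.web.chrome, where the template stream is serialized:
#   stream.render(method, doctype=doctype, out=buffer)
# becomes, with Genshi's whitespace filter disabled:
stream.render(method, doctype=doctype, out=buffer,
              strip_whitespace=False)
}}}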

Another win with Genshi is patching genshi.output.encode and getting rid
of the iteration that calls _encode a bazillion times. Instead, just do:

{{{
out.write(_encode(u''.join(list(iterator))))
}}}

This is much faster.

I should note that I also tried changing Trac so that the output went
straight through to the WSGI layer (rather than buffering it in a
cStringIO object). While this improved the initial wait time
dramatically, it actually slowed down the overall request time. I
believe this slowdown was also due to the _encode calls inside the
iteration over the stream. I'm still going to experiment with this more
at some point.

Overall, I like Genshi; now that I kind of get how it works it has a
certain elegance, but I am left unconvinced that its design and
flexibility outweigh the performance issues and the added difficulty
in figuring them out (see the profiling notes at the top). One way I'm
thinking of working around this is to beef up the XML-RPC interfaces and
do everything with Ajax.

== Misc ==

=== plugins ===

I don't think most plugin authors know where they will generate
performance penalties. It's hard to know all those details without
spending a lot of time on it.

Generally, pay attention to any plugins that use stream filters or
request filters. These are called for every request and more often than
not do something that is time intensive. Additionally, IMO, a request
handler's match_request should never do anything beyond
req.path_info.startswith('/mypath'). Many match_request implementations
do a lot more, which gets executed on every request.

=== wysiwyg plugin ===

This adds a large JS file to *every* request, whether useful or not. I
removed the large script from the request filter, then changed the
loader script to check whether there is a textarea element in the DOM
and load the larger script only if necessary.


Remy Blank

Jul 21, 2009, 1:35:47 AM
to trac...@googlegroups.com
Shane Caraveo wrote:
> I was going to do this as a ticket since it's so long, but Edgewall likes
> to think of me as a spammer sometimes. This was partly written as a
> response/suggestion to people posting on ticket 7490 as well, so
> includes suggestions.

This is invaluable information, thanks! Profiling Trac was also on my
to-do list for this summer, but you've done it better than I could have
dreamed of doing it.

I'd really like to keep this information close at hand. Could I convince
you to keep your notes on the t.e.o wiki instead of (or in addition to)
the mailing list?

Also, any chance that you could document how to set up profiling so that
other developers (also plugin developers) could set it up on their machine?

-- Remy


Christian Boos

Jul 22, 2009, 9:08:56 AM
to trac...@googlegroups.com
Hello Shane,

First, great job at looking into the bowels of Trac ;-)
Then, as a general comment, I see that some of your suggestions actually
go against some of the changes I made in #6614, so not surprisingly we
have a trade-off of memory vs. speed. In some environments where
memory is strictly restricted, we have no choice but to optimize for
memory, to the detriment of speed. But in most environments, the extra
memory needed to achieve decent speed might be affordable. So I see
here the opportunity for a configuration setting, something like [trac]
favor_speed_over_memory, defaulting to true, that people with low
resources could turn off.

Now let me comment on some of your observations.


Shane Caraveo wrote:
> I was going to do this as a ticket since it's so long, but Edgewall likes
> to think of me as a spammer sometimes.

[OT] Even with your Preferences/General/Full name and Email address set?

> This was partly written as a
> response/suggestion to people posting on ticket 7490 as well, so
> includes suggestions.
>
> Shane
>
> ----
>
> I've recently reduced per request time for *simple* template generation
> from 150ms to 45ms on my laptop (dual 1.6ghz core duo 2GB, tested with
> ab against a simple template). I also reduced page load times (reported
> via google page load/firebug) to around a second (from 2-8 seconds in
> some cases). These timings are without any network overhead since I'm
> testing on the local machine. Times are all taken on my laptop with my
> typical environment/applications open (lots of crap).
>
> This is all done with 11.5.
>
> Even with the gains I still get 100% cpu spikes every request, but at
> least it's a shorter duration. Still digging further on that. I also
> have done a bit on overall browser page load.
>

That spike is probably occurring during Genshi's generate and render calls.

> Here's my current brain dump. If some of my conclusions are silly,
> please let me know why.
>
> == General ==
>
> In general there are only a couple big wins. For me it was removing
> gc.collect (see trac.main) and the timing and estimation plugin.
> Everything else was small potatoes in comparison (10ms here, 5ms there),
> but together they added up to a good 40-50ms per request. Think of it
> this way, using 100%cpu and 50ms/request limits you to a max of 20
> requests/second/cpu. Every ms counts if we want decent throughput. I'd
> like to get under 30ms.
>

The gc on every request is the typical memory vs. speed trade-off. If it
can be shown that, despite not doing a gc after every request, the memory
usage stays within bounds, then I think we can make it optional. As you
said elsewhere, it's quite possible that this explicit gc simply hides a
real memory leak that could be avoided by other means (like fixing the db
pool issue with PostgreSQL).

Yes, py:match directives are real killers. When an extra py:match
was introduced for the theme support, I remember measuring a 10%
increase in the average time needed for any request.

> Make sure all static resources for all plugins are exported. trac-admin
> deploy will only deploy static resources for enabled plugins in that
> project.
>
> === javascript ===
>
> This is mostly about page load times.
>
> minify all the javascript output by trac-admin deploy, and make sure it
> gets cache headers via apache conf.
>
> Upgrade to jquery 1.3, it's much faster. trac trunk has done this, and
> there is a diff somewhere that shows what type of changes have to be
> made. You'd have to examine any plugins for similar jquery updates that
> need to be done.
>

Backport from trunk to 0.11-stable welcomed ;-)

With regard to mod_deflate, I'm not sure how an extra 100ms can explain
the reported difference in behavior.
IIUC, you're using mod_deflate without any trouble, and switching it off
doesn't make a difference?
Was that also the case for you with 0.11.4?

> === filters ===
>
> Going through all the match_request implementations and removing
> permission checking (put it in process_request), make sure regex matches
> are precompiled, and generally simplifying things helps. There are a
> number of those in trac core, but plugins are greater abusers in this

> area. Also examine any IRequestFilter use and simplify.
>

I'm not sure the advice of removing permission checking from
match_request is a good one.
If it is done after the path_info test, the permission check shouldn't
have a big impact, and it might be needed to enable the use of a fallback
handler.

> === trac.config ===
>
> Other than Genshi, profiling showed trac.config to be the single largest
> time on simple template generation. Profiling showed me that the get
> method in the Configuation class (trac.config.Configuration) was slow.
> I added caching there for a few extra milliseconds boost. I'm also
> looking at removing the auto-reload if the ini file changes, maybe using
> spread or memcached to create reload notifications, to get rid of file
> stats, but timing doesn't show this to be a large issue on my laptop.
>

Interesting, can you make a ticket out of that?

> === repository ===
>
> while I still want to remove the sync on every request (get rid of
> system calls/file stats), I have been unable to show that performance
> changes much when I test with it removed. There are still bigger fish
> to fry.
>

This will be addressed in the multirepos branch; I think we discussed
adding a setting for this, in order to keep backward-compatible
behavior for the default repository.

> === database pool ===
>
> Watching the postgres log file, I can tell that a lot of cursors get
> created without being actually used to do a query. This shows up
> because psycopg2 executes a BEGIN when a cursor is created. I haven't
> yet looked into where that is happening, but it's extra work the system
> is doing for nothing.
>

That's probably also worth a ticket on its own, unless this could be
handled in #8443.

> === wiki ===
>
> The current wiki parser design is slow, doing the same large regex over
> each line. I think a general redesign to use re.finditer, rather than
> line split then re.match, would help improve performance here. However,
> post-caching the content would be better. I'm experimenting with
> partial caching of the wiki content and have reduced my request timing
> on WikiStart from 120ms to 75ms while still getting correct content.
> The patch I have doesn't cache macro's unless the macro arguments have
> 'cache' in them (good for page outline macro, which btw seems to
> re-parse the entire page, macros included). There may be other issues
> with the approach I have taken, I haven't tested it much. Once I get
> further, I might actually pre-process when the wiki page is saved and
> stash the result into the database, avoiding the parse in most requests.
>

Interesting as well, be sure to read the discussions about caching wiki
content (#7739 and #1216).

> == Genshi ==
>
> Use Genshi trunk, some performance gain there.
>
> With Genshi, one big win is turning off the whitespace filter, I think I
> got around 10ms on a small page, the gain would be larger on larger
> pages. I did that by patching trac.web.chrome where there is a call to
> stream.render, and adding strip_whitespace=False. While this increases
> the file size delivered slightly (and adds a lot of whitespace to the
> source view), that can be countered by mod_deflate. The namespace
> flattener is another slow filter, but it's necessary.
>
> Another win with Genshi is patching genshi.output.encode and getting rid
> of the iteration that calls _encode a bazillion times. Instead, just do:
>
> {{{
> out.write(_encode(u''.join(list(iterator))))
> }}}
>
> This is much faster.
>

The above was used before #6614, but it triggers the creation of a
huge list and a huge unicode string. We could revert to doing that
when `favor_speed_over_memory` is true.

> I should note, I also tried changing trac so that the output went
> straight through to the wsgi layer (rather than buffering it in a
> cStringIO object). While this improved the initial wait time
> dramatically, it actually slowed down the overall request time. I
> believe this slowdown was also due to the _encode calls inside the
> iteration over the stream. I'm still going to experiment with this more
> at some point.
>

Interesting, but this will break the keep-alive support for tracd
(introduced in 0.11.5) as you won't be able to get the Content-Length
this way (one solution to this would be to add support for chunked
transfer encoding to tracd).

> Overall, I like Genshi, now that I kind of get how it works it has a
> certain elegance, but I am left unconvinced that it's design and
> flexibility outweighs the performance issues, and the added difficulty
> in figuring them out (refer to profiling at the top).

Oh yes, same feeling here ;-)

> One way I'm
> thinking of working around this is to beef up the xmlrpc interfaces and
> do everything with ajax.
>

Maybe. Another idea would be to make more things "opaque" to Genshi, by
pre-rendering chunks that we know to be irrelevant to further
transformations. A typical example would be the diffs and the highlighted
lines in the source browser.

> == Misc ==
>
> === plugins ===
>
> I don't think most plugin authors know where they will generate
> performance penalties. It's hard to know all those details without
> spending a lot of time on it.
>
> generally pay attention to any plugins that use stream filters or
> request filters. These are called for every request and more often than
> not do some thing that is time intensive. Additionally, IMO, a request
> handlers match_request should never do anything beyond
> req.path_info.startswith(/mypath). Many match_request implementations
> do a lot more, which gets executed for every request.
>

Good tips, this should definitely go into the wiki somewhere
(TracDev/Performance?)

> === wysiwyg plugin ===
>
> this adds a large js file to *every* request whether useful or not. I
> removed the large script from the request filter, then I changed the
> load script to check and see if there is a textarea element in the dom,
> then load the larger script if necessary.
>

Again, thanks for working on this!

-- Christian

Shane Caraveo

Jul 22, 2009, 12:21:59 PM
to trac...@googlegroups.com
On 7/20/09 10:35 PM, Remy Blank wrote:
> Shane Caraveo wrote:
>> I was going to do this as a ticket since it's so long, but Edgewall likes
>> to think of me as a spammer sometimes. This was partly written as a
>> response/suggestion to people posting on ticket 7490 as well, so
>> includes suggestions.
>
> This is invaluable information, thanks! Profiling Trac was also on my
> to-do list for this summer, but you've done it better than I could have
> dreamed of doing it.
>
> I'd really like to keep this information close at hand. Could I convince
> you to keep your notes on the t.e.o wiki instead of (or in addition to)
> the mailing list?

Yeah, do I need a login?

> Also, any chance that you could document how to set up profiling so that
> other developers (also plugin developers) could set it up on their machine?
>

I'm actually cleaning up my WSGI wrapper, which has various profiling and
debugging hooks. I can post that later this week. It could be a start
towards a developer debug/profile plugin, or an addition to the existing
extension developer plugin on trac-hacks.

Shane

Shane Caraveo

Jul 22, 2009, 12:48:11 PM
to trac...@googlegroups.com
On 7/22/09 6:08 AM, Christian Boos wrote:
> Hello Shane,
>
> First, great job at looking into the bowels of Trac ;-)
> Then, as a general comment, I see that some of your suggestions actually
> go against some of the changes I did in #6614, so we have not
> surprisingly a trade-off of memory vs. speed. In some environments where
> memory is strictly restricted, we have no choice but to optimize for
> memory, at the detriment of speed. But in most environments, the extra
> memory needed for achieving decent speed might be affordable. So I see
> here the opportunity for a configuration setting, something like [trac]
> favor_speed_over_memory, defaulting to true, that people having low
> resources could turn off.

For the gc.collect item, I think it should be a configurable background
thread rather than happening in the request loop. I've been meaning to
explore the memory use with source browsing and the timeline so I can
understand what is happening there, but I haven't gotten around to it. For
the encode loop, I was hoping that sending the output straight through
would be a gain; I think there still might be some opportunity around
that idea.

Another thought: again from a background thread, monitor memory usage
and call gc.collect at some threshold. Low-memory environments would end
up doing this more often.

>>
>> Even with the gains I still get 100% cpu spikes every request, but at
>> least it's a shorter duration. Still digging further on that. I also
>> have done a bit on overall browser page load.
>>
>
> That spike is probably occurring during Genshi's generate and render calls.

Yeah, yesterday I added a CPU usage monitoring class that throws an
exception when the CPU gets high. It always happens deep in Genshi
during the iterations through the stream.

>> Here's my current brain dump. If some of my conclusions are silly,
>> please let me know why.
>>
>> == General ==
>>
>> In general there are only a couple big wins. For me it was removing
>> gc.collect (see trac.main) and the timing and estimation plugin.
>> Everything else was small potatoes in comparison (10ms here, 5ms there),
>> but together they added up to a good 40-50ms per request. Think of it
>> this way, using 100%cpu and 50ms/request limits you to a max of 20
>> requests/second/cpu. Every ms counts if we want decent throughput. I'd
>> like to get under 30ms.
>>
>
> The gc on every request is the typical memory vs. speed trade-off. If it
> can be shown that despite not doing gc after every request, the memory
> usage stays within bound, then I think we can make that optional. As you
> said elsewhere, it's quite possible that this explicit gc simply hides a
> real memory leak that can be avoided by other means (like fixing the db
> pool issue with PostgreSQL).

out of sight, out of mind ;)

>> Upgrade to jquery 1.3, it's much faster. trac trunk has done this, and
>> there is a diff somewhere that shows what type of changes have to be
>> made. You'd have to examine any plugins for similar jquery updates that
>> need to be done.
>>
>
> Backport from trunk to 0.11-stable welcomed ;-)

My patches are against 0.11-stable; I'll try to get a set of bugs with
patches up this week. I also patched a number of plugins; those patches
just change the attribute selectors for jQuery.


>> I have a sneaking suspicion (unproven) that people who use mod_python
>> and claim turning off keep alive and/or mod_deflate are having problems
>> due to gc.collect. As I understand apache filters (e.g. mod_deflate)
>> they wont finish up the request until the mod_python handler has
>> returned. So that extra time in gc.collect delays the request being
>> returned, which delays mod_deflate finishing. It also delays the start
>> of the next request over a persistent socket connection (ie. keep alive).
>>
>
> With regard to mod_deflate, I'm not sure how an extra 100ms can explain
> the reported difference in behavior.
> IIUC, you're using mod_deflate without any trouble, and switching it off
> doesn't make a difference?
> Was that also the case for you with 0.11.4?

It never did for me; when I read that, I tried it right away. However,
if it does happen in some situations, gc.collect could be a contributor
to it. It's also possible that it is not mod_deflate alone, but
mod_deflate in combination with some other Apache filter.

>> === filters ===
>>
>> Going through all the match_request implementations and removing
>> permission checking (put it in process_request), make sure regex matches
>> are precompiled, and generally simplifying things helps. There are a
>> number of those in trac core, but plugins are greater abusers in this
>> area. Also examine any IRequestFilter use and simplify.
>>
>
> Not sure if the advice of removing permission checking in match_request
> is a good one.
> If done after the path_info test, doing the permission check shouldn't
> have a big impact and might be needed to enable the use of a fallback
> handler.

If I recall correctly, it's quite often done prior to the path matching.
The other item is the use of uncompiled regexes to match a path. I would
say that 80% of plugins are simple, don't need to do more than a simple
path check, and can defer the permission check to process_request.

>> === trac.config ===
>>
>> Other than Genshi, profiling showed trac.config to be the single largest
>> time on simple template generation. Profiling showed me that the get
>> method in the Configuation class (trac.config.Configuration) was slow.
>> I added caching there for a few extra milliseconds boost. I'm also
>> looking at removing the auto-reload if the ini file changes, maybe using
>> spread or memcached to create reload notifications, to get rid of file
>> stats, but timing doesn't show this to be a large issue on my laptop.
>>
>
> Interesting, can you make a ticket out of that?

Sure, with a patch :) At least for the caching part, which I have working.

>> === repository ===
>>
>> while I still want to remove the sync on every request (get rid of
>> system calls/file stats), I have been unable to show that performance
>> changes much when I test with it removed. There are still bigger fish
>> to fry.
>>
>
> This will be addressed in the multirepos branch, I think we discussed
> making a setting for this, in order to keep a backward compatible
> behavior for the default repository.
>
>> === database pool ===
>>
>> Watching the postgres log file, I can tell that a lot of cursors get
>> created without being actually used to do a query. This shows up
>> because psycopg2 executes a BEGIN when a cursor is created. I haven't
>> yet looked into where that is happening, but it's extra work the system
>> is doing for nothing.
>>
>
> That's probably also worth a ticket on its own, unless this could be
> handled in #8443.

I think it's a different issue from #8443. If someone does
env.get_cnx/cnx.cursor, then does some other check and bails before using
the cursor, you get an extra transaction start/rollback.
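
For illustration, the anti-pattern looks roughly like this (a made-up
example, not a specific spot in the code):

{{{
def lookup_summary(env, req):
    db = env.get_db_cnx()       # pooled connection
    cursor = db.cursor()        # per the postgres log, a BEGIN shows up here
    if not req.args.get('id'):  # some unrelated check...
        return None             # ...bails out: the cursor was never used,
                                # leaving an empty transaction to roll back
    cursor.execute("SELECT summary FROM ticket WHERE id=%s",
                   (req.args['id'],))
    return cursor.fetchone()
}}}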

>> === wiki ===
>>
>> The current wiki parser design is slow, doing the same large regex over
>> each line. I think a general redesign to use re.finditer, rather than
>> line split then re.match, would help improve performance here. However,
>> post-caching the content would be better. I'm experimenting with
>> partial caching of the wiki content and have reduced my request timing
>> on WikiStart from 120ms to 75ms while still getting correct content.
>> The patch I have doesn't cache macro's unless the macro arguments have
>> 'cache' in them (good for page outline macro, which btw seems to
>> re-parse the entire page, macros included). There may be other issues
>> with the approach I have taken, I haven't tested it much. Once I get
>> further, I might actually pre-process when the wiki page is saved and
>> stash the result into the database, avoiding the parse in most requests.
>>
>
> Interesting as well, be sure to read the discussions about caching wiki
> content (#7739 and #1216).

Yeah, I looked at those before doing my own thing. Doing a partial
cache requires being in the formatter; I don't think it can be done
properly as a generic solution via a plugin. Having some generic
memcache for sharing cache data between instances would be great, but I
think I like the idea of pre-processing the wiki text and stashing it in
the db, or making the wiki parser faster in combination with a
simple cache like the one I've implemented.

I'll try to get the time this week to get some more bugs/patches posted.

Shane

Shane Caraveo

Jul 23, 2009, 1:42:46 PM
to trac...@googlegroups.com
On 7/20/09 10:35 PM, Remy Blank wrote:

> Also, any chance that you could document how to set up profiling so that
> other developers (also plugin developers) could set it up on their machine?

I just put my WSGI code into a ticket at
http://trac.edgewall.org/ticket/8507

It includes a couple of methods for profiling as well as a DBGP harness
for debugging. I've also got a garbage collection background thread in
there.

Shane

Shane Caraveo

Jul 23, 2009, 2:13:53 PM
to trac...@googlegroups.com
On 7/22/09 6:08 AM, Christian Boos wrote:
>> > === wiki ===
>> >
>> > The current wiki parser design is slow, doing the same large regex over
>> > each line. I think a general redesign to use re.finditer, rather than
>> > line split then re.match, would help improve performance here. However,
>> > post-caching the content would be better. I'm experimenting with
>> > partial caching of the wiki content and have reduced my request timing
>> > on WikiStart from 120ms to 75ms while still getting correct content.
>> > The patch I have doesn't cache macro's unless the macro arguments have
>> > 'cache' in them (good for page outline macro, which btw seems to
>> > re-parse the entire page, macros included). There may be other issues
>> > with the approach I have taken, I haven't tested it much. Once I get
>> > further, I might actually pre-process when the wiki page is saved and
>> > stash the result into the database, avoiding the parse in most requests.
>> >
>
> Interesting as well, be sure to read the discussions about caching wiki
> content (#7739 and #1216).
>

I've attached (to #1216) the cache idea I am playing with right now for
others to see.

Shane

Remy Blank

Jul 23, 2009, 4:11:27 PM
to trac...@googlegroups.com
Shane Caraveo wrote:
> On 7/20/09 10:35 PM, Remy Blank wrote:
>> I'd really like to keep this information close at hand. Could I convince
>> you to keep your notes on the t.e.o wiki instead of (or in addition to)
>> the mailing list?
>
> Yeah, do I need a login?

I don't think so, just create a new page, probably below TracDev/.

-- Remy


Leonardo Santagada

Jul 23, 2009, 10:32:00 PM
to trac...@googlegroups.com

[snip]


>>> == General ==
>>>
>>> In general there are only a couple big wins. For me it was removing
>>> gc.collect (see trac.main) and the timing and estimation plugin.
>>> Everything else was small potatoes in comparison (10ms here, 5ms
>>> there),
>>> but together they added up to a good 40-50ms per request. Think
>>> of it
>>> this way, using 100%cpu and 50ms/request limits you to a max of 20
>>> requests/second/cpu. Every ms counts if we want decent
>>> throughput. I'd
>>> like to get under 30ms.
>>>
>>
>> The gc on every request is the typical memory vs. speed trade-off.
>> If it
>> can be shown that despite not doing gc after every request, the
>> memory
>> usage stays within bound, then I think we can make that optional.
>> As you
>> said elsewhere, it's quite possible that this explicit gc simply
>> hides a
>> real memory leak that can be avoided by other means (like fixing
>> the db
>> pool issue with PostgreSQL).
>
> out of site, out of mind ;)

Why should Trac even use the gc manually? The only answer would be to
free db pool connections, and that is better fixed on its own (though I
don't know if that is possible). Leaking memory is not going to be fixed
by calling gc.collect. The garbage collector already collects
automatically, and its parameters can be tuned in a trac.wsgi script if
some user really needs it.
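
For example, the collector's thresholds can be tuned from the WSGI script
with the standard gc module (the numbers below are only illustrative):

{{{
import gc

# CPython defaults are (700, 10, 10); raising threshold0 makes
# generation-0 collections less frequent, trading memory for less
# collector overhead.
gc.set_threshold(10000, 10, 10)
print gc.get_threshold()  # verify the new settings
}}}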

I would really like to understand why Trac calls it on every request.

--
Leonardo Santagada
santagada at gmail.com

Christian Boos

Jul 25, 2009, 2:12:33 AM
to trac...@googlegroups.com
Hello Leonardo,

Theoretically, yes.

> I really would like to understand why trac calls it at every request.
>

If you're curious about this, read http://trac.edgewall.org/ticket/6614.
Executive summary: in the early days of 0.11dev, Trac displayed some
pathological memory usage. Calling the gc after every request allowed
Trac's memory usage to stay within bounds, while otherwise the memory would
grow to a few gigabytes before the process ran out of memory.
Something that "shouldn't happen", but it did. Eventually a few Python bugs
were uncovered, but the explicit gc stayed, as the memory usage was
still much more conservative with the explicit gc than without it.

Also refer to my other reply to Shane in the "[Trac-dev] performance and
garbage collection" thread. There you can see some recent numbers
about the memory usage with and without explicit gc, on Windows. The
difference is not as dramatic these days as it used to be, but the
most pathological results in #6614 were obtained on Linux anyway, so the
experiment should be tried on that platform as well.

-- Christian
