Things were very quiet from my side these past months, but I intend to progressively come back to a more intensive participation.
Among the first steps I'd like to take would be to work towards a 1.0beta1 release (as it's going to be 1.0 and not 0.13, remember? see [1] if not ;-) )
We have a pretty good state on trunk, and though it feels a bit unfinished here and there, I think it's at least as good as 0.12-stable stability wise, so why not get more exposure for the new features and APIs? We can always make things a bit better and more polished in a later 1.0.x maintenance release.
One big reorganization that I think would still be important to do even before the beta1 is the move of the svn support to tracopt. Now that we have Git support there, there's no reason to keep svn directly in trac/versioncontrol (see [2]). I think we can leave around a few imports for backward compatibility. I created #10712 for that.
Besides that, I'll go through some of the pending patches that can be finished soon, then upgrade the version numbers, prepare the beta release and get some feedback from the field.
And then, what I'd really like to finish before the final 1.0 are the other tickets listed in TracDev/ToDo#Trac1.0.
Christian Boos wrote:
> One big reorganization that I think would still be important to do even
> before the beta1 is the move of the svn support to tracopt. Now that we
> have Git support there, there's no reason to keep svn directly in
> trac/versioncontrol (see [2]). I think we can leave around a few
> imports for backward compatibility. I created #10712 for that.
IMO the current PyGIT isn't really suitable for a 1.0 release. :)
I've started work on a pygit2 based replacement, but am in the middle
of an interrupt storm at the moment, keeping me busy elsewhere. I'd
be happy if someone else wants to also work with this.
It would be very nice to have it included at least as an alternative.
Also, while doing this, I've understood just how badly the Git data
model fits into Trac's Repository data model. :\
The get_youngest_rev(), previous_rev() and child_revs() Repository
operations are very very expensive; they require traversing the
entire commit graph!
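
To illustrate why a child lookup is costly: a Git commit records only its parents, so answering "which commits are children of X?" means walking the graph from the branch heads. The following is a minimal sketch on a toy in-memory graph, not PyGIT or pygit2 code; `build_children_index`, the `parents` mapping and the revision names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_children_index(parents: Dict[str, List[str]],
                         heads: List[str]) -> Tuple[Dict[str, List[str]], int]:
    """Invert parent links into child links by walking from the heads.

    Returns the children index and the number of commits visited.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    stack = list(heads)
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        for parent in parents[rev]:
            # Only now do we learn that `rev` is a child of `parent`.
            children[parent].append(rev)
            stack.append(parent)
    return children, len(seen)


# Toy history (hypothetical): a -> b, a -> c, and d merges b and c.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
children, visited = build_children_index(parents, heads=["d"])
print(sorted(children["a"]), visited)
```

Even to find the children of a single commit, every commit reachable from the heads is visited once, which is the cost described above.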
> Christian Boos wrote:
>> One big reorganization that I think would still be important to do even
>> before the beta1 is the move of the svn support to tracopt. Now that we
>> have Git support there, there's no reason to keep svn directly in
>> trac/versioncontrol (see [2]). I think we can leave around a few
>> imports for backward compatibility. I created #10712 for that.
> IMO the current PyGIT isn't really suitable for a 1.0 release. :)
Well, we can label the git support as "experimental", like we did for the Mercurial support until it got improved to the point where it is now able to handle huge repositories reasonably well. Note that you'll get similar performance issues for big Subversion repositories, as our svn support is also not lightning fast... This, as well as the numerous other known performance weak points (*), won't prevent us from making a 1.0 release, as we're mainly doing it 1) to have a fresh start for a new release schedule (i.e. doing regular releases for both maintenance and development branches) and 2) to acknowledge the fact that after many years of continuous improvement, Trac is now more than reasonably stable and solid. We don't use 1.0 to mean it's perfect on all accounts (that will be 2.0! ;-) ).
> I've started work on a pygit2 based replacement, but am in the middle
> of an interrupt storm at the moment, keeping me busy elsewhere. I'd
> be happy if someone else wants to also work with this.
I'd be happy to have a look, and experiment a bit. A fork and a few instructions about how to set up the dependencies would be a good start (as well as the TracDev/Performance/Git page I mentioned in #10594).
> It would be very nice to have it included at least as an alternative.
Sure, but that will have to wait for whichever Trac 1.1.x or 1.3.x release will be current when this effort is ready.
> Also, while doing this, I've understood just how badly the Git data
> model fits into Trac's Repository data model. :\
True.
> The get_youngest_rev(), previous_rev() and child_revs() Repository
> operations are very very expensive; they require traversing the
> entire commit graph!
That was pretty much my point in our little discussion in #10594.
I'm not convinced that simply switching the way we use git (command-line vs. library bindings) will be enough to address the performance issues; it's rather rethinking carefully how to access the information, when and what to cache, etc.
Christian Boos wrote:
>> IMO the current PyGIT isn't really suitable for a 1.0 release. :)
> Well, we can label the git support as "experimental"
Please include a prominent note about the moderate performance,
and/or mention that PyGIT works by forking and execing many git
processes per web request.
>> I've started work on a pygit2 based replacement
> I'd be happy to have a look, and experiment a bit.
> (as well as the TracDev/Performance/Git page I mentioned in #10594).
As I wrote I think it's way premature to do any Git-specific
performance discussion as long as PyGIT is still used.
>> It would be very nice to have it included at least as an alternative.
> Sure, but that will have to wait for whichever Trac 1.1.x or 1.3.x
> release will be current when this effort is ready.
Fair enough!
>> The get_youngest_rev(), previous_rev() and child_revs() Repository
>> operations are very very expensive; they require traversing the
>> entire commit graph!
> That was pretty much my point in our little discussion in #10594.
> I'm not convinced that simply switching the way we use git (command-line
> vs. library bindings) will be enough to address the performance issues;
You know that fork and exec are expensive, right? It's not cool to
have web software work like this in 2012. It was halfway OK in 1995,
but not anymore. :)
Also, "simply" is downplaying PyGIT. Because it uses git commands it
remains limited to what git commands can do, where the Git data model
isn't very well exposed, nor in a manner which is very useful. It's
all sorts of wrong.
> it's rather rethinking carefully how to access the information, when
> and what to cache, etc.
I very much welcome this effort to review the Trac Repository data
model! But I have zero expectation that it will result in substantial
code changes within say the next few months.
Native repo access is on the other hand a bite sized problem, and
will certainly have noticeable impact on performance. Hopefully you
and others will turn the Repository interface inside out in parallel
with pygit2 work, to get even more out of Trac in the end! :)
I would like to comment from the sidelines. At the Haiku project we
run one of these huge Git repositories, and for us currently the
trac-gitplugin is unusable. I have done some work and research to
understand why, and I have found that the main issue is that the
information Trac wants to display is not efficiently retrievable,
due to the way the Git data model is designed.
On Tue, Jun 5, 2012 at 3:10 AM, Peter Stuge <pe...@stuge.se> wrote:
>>> The get_youngest_rev(), previous_rev() and child_revs() Repository
>>> operations are very very expensive; they require traversing the
>>> entire commit graph!
>> That was pretty much my point in our little discussion in #10594.
>> I'm not convinced that simply switching the way we use git (command-line
>> vs. library bindings) will be enough to address the performance issues;
> You know that fork and exec are expensive, right? It's not cool to
> have web software work like this in 2012. It was halfway OK in 1995,
> but not anymore. :)
> Also, "simply" is downplaying PyGIT. Because it uses git commands it
> remains limited to what git commands can do, where the Git data model
> isn't very well exposed, nor in a manner which is very useful. It's
> all sorts of wrong.
I can concur. In my own experiments [1] I have tried to use Dulwich
[2] which is a pure-python implementation of the Git data model, and a
part of the git command set.
The biggest problem is with file history. For example, the simple tree
view that the source browser has now already poses problems. This is
because to get the commit revision in which the file was last changed,
you need to traverse all the commits (starting from the top) up to the
commit it was last changed. In the Haiku root we have one file that
was last changed about 30,000 commits ago. This means that in an
uncached universe the server is always processing the commits for each
and every file in the source browser!
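
The traversal described above can be sketched as follows. This is a toy model only, assuming a linear first-parent history and a flattened path-to-blob mapping instead of real Git tree objects; `Commit` and `last_changed` are hypothetical names, not Dulwich or Trac APIs.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Commit(NamedTuple):
    parent: Optional[str]   # first parent only; merges are ignored in this sketch
    tree: Dict[str, str]    # path -> blob id (flattened stand-in for a Git tree)


def last_changed(commits: Dict[str, Commit], head: str, path: str) -> Tuple[str, int]:
    """Return (commit that last changed `path`, number of commits visited)."""
    visited = 0
    current = head
    while True:
        visited += 1
        commit = commits[current]
        parent = commits[commit.parent] if commit.parent else None
        # The path was last changed here if the parent holds a different blob
        # (or none at all), or if there is no parent.
        if parent is None or parent.tree.get(path) != commit.tree.get(path):
            return current, visited
        current = commit.parent


# Toy history: "README" is only written in the first commit,
# while "main.c" changes in every commit.
commits: Dict[str, Commit] = {}
previous = None
for n in range(1000):
    cid = f"c{n}"
    commits[cid] = Commit(previous, {"README": "blob-readme", "main.c": f"blob-{n}"})
    previous = cid

print(last_changed(commits, "c999", "main.c"))
print(last_changed(commits, "c999", "README"))
```

A recently changed file is resolved after one step, while the untouched file forces a walk over the whole history. A directory listing repeats this walk for every entry unless the results are cached.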
>> it's rather rethinking carefully how to access the information, when
>> and what to cache, etc.
> I very much welcome this effort to review the Trac Repository data
> model! But I have zero expectation that it will result in substantial
> code changes within say the next few months.
> Native repo access is on the other hand a bite sized problem, and
> will certainly have noticeable impact on performance. Hopefully you
> and others will turn the Repository interface inside out in parallel
> with pygit2 work, to get even more out of Trac in the end! :)
While I agree that improving the responsiveness of the back-end might
improve the situation for small and mid-sized repositories, it is not
a sustainable solution for the future. After all: small repositories
(hope to) grow big!
One part of the solution is to improve caching. There are two ways to
do this. In trac-dulwich I experimented with doing a full cache, which
means reproducing the structure of the repository in a way that Trac
can easily get the information that it needs. While I got nowhere near a
full solution (I did not get to caching the relations between files),
I think this might not be a good idea in the end. For merely caching
all file and directory revisions in the Haiku repository, I now have a
sqlite database of 192 MB. Imagine that all the relations between file
revisions are also stored in there.
I am starting to see more potential in an alternative kind of caching,
at a higher level. I looked at cgit as an example[3][4]. cgit is
extreme in its caching: it basically caches the html output which I
doubt is acceptable for Trac itself. However, I do see potential for
the versioncontrol module to cache data structures that have been
requested. This way, whenever a request is repeated, and the
underlying datamodel did not change, then the cached data can be
fetched instead.
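
A minimal sketch of that kind of request-level cache, assuming the repository state can be summarised by one hashable value (for example, a digest of all branch tips). `RequestCache`, `repo_state` and the request tuples are hypothetical and not part of Trac's versioncontrol module.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class RequestCache:
    """Cache computed data structures, keyed by repository state and request."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[Hashable, Hashable], Any] = {}
        self.misses = 0

    def get(self, repo_state: Hashable, request: Hashable,
            compute: Callable[[], Any]) -> Any:
        key = (repo_state, request)
        if key not in self._entries:
            # Only pay for the expensive computation on a cache miss.
            self.misses += 1
            self._entries[key] = compute()
        return self._entries[key]

    def prune(self, current_state: Hashable) -> None:
        """Drop entries that were computed for an older repository state."""
        self._entries = {k: v for k, v in self._entries.items()
                         if k[0] == current_state}


def expensive_listing() -> list:
    # Stand-in for a costly repository query.
    return ["README", "main.c"]


cache = RequestCache()
cache.get("state-1", ("ls", "/"), expensive_listing)  # miss: computed
cache.get("state-1", ("ls", "/"), expensive_listing)  # hit: same state, same request
cache.get("state-2", ("ls", "/"), expensive_listing)  # miss: repository changed
print(cache.misses)
```

Because the repository state is part of the key, a push invalidates old entries implicitly, and `prune` only reclaims memory.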
Another change, at the higher level, would be to have incremental
loading for some operations that we know are expensive. For now I can
think of only one, showing a path history. Basically, load a frame
page, after which the history would be incrementally loaded. However,
this cannot be a full replacement for caching.
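
One way to picture incremental loading is a history walk that is served in chunks, where each response carries a cursor telling the page where to resume. The sketch below assumes a simple linear history held in memory; `path_history`, `history_chunk` and the revision names are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterator, List, Optional, Tuple


def path_history(parents: Dict[str, Optional[str]], start: str) -> Iterator[str]:
    """Lazily yield revisions from `start` back to the root."""
    current: Optional[str] = start
    while current is not None:
        yield current
        current = parents[current]


def history_chunk(parents: Dict[str, Optional[str]], start: str,
                  size: int) -> Tuple[List[str], Optional[str]]:
    """Return up to `size` revisions plus the cursor for the next chunk.

    The cursor is None once the history is exhausted.
    """
    # Fetch one extra revision to learn whether another chunk exists.
    chunk = list(islice(path_history(parents, start), size + 1))
    if len(chunk) > size:
        return chunk[:size], chunk[size]
    return chunk, None


# Toy linear history (hypothetical): r5 is the newest revision.
parents = {"r5": "r4", "r4": "r3", "r3": "r2", "r2": "r1", "r1": None}

pages = []
cursor: Optional[str] = "r5"
while cursor is not None:
    # In a web setting, each iteration would be one follow-up request.
    revisions, cursor = history_chunk(parents, cursor, 2)
    pages.append(revisions)
print(pages)
```

Each request does a bounded amount of work, so the first rows can be shown quickly, but the total cost of the walk is unchanged. That is why it complements caching rather than replacing it.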
I'm glad I'm not the only one trying to wrap my head around this
problem. My next step personally would be to stop with the 'cache
everything' strategy (for now) and go on to the more intelligent
higher level caching. I could use some ideas and input for that.
Christian Boos wrote:
> Besides that, I'll go through some of the pending patches that can be
> finished soon, then upgrade the version numbers, prepare the beta
> release and get some feedback from the field.
I have just updated the version numbers.
> And then, what I'd really like to finish before the final 1.0 are the
> other tickets listed in TracDev/ToDo#Trac1.0.
I have triaged my remaining tickets for 1.0 (and fixed a few) this
week-end, and I'm down to the following:
Patch refreshed and ready for testing. Unfortunately, I still don't have
a setup to test it. I will try to set this up this week, but I won't be
angry if anyone beats me to it :) It's very low risk, as the code won't
be active if `[trac] use_xsendfile = false` (the default).
So from my side, all I need is some review and testing, and I'm done. I
see there are currently 58 tickets assigned to milestone 1.0. I have
created a temporary milestone "1.0-triage" and moved most of the tickets
with no recent activity there, for later triage.
To prepare for 1.0, I suggest we keep only those tickets on 1.0 that we
know we will fix before the release. This leaves us with 15 tickets
(including the 3 above). I'd like to ask each of the owners to go
through your tickets in 1.0 and move those that you don't think you can
fix within a week or two to a different milestone. You may also want to
browse through 1.0-triage to make sure I didn't move any important ones
out. But please only move tickets to 1.0 if you are confident that you
will fix them shortly.
This should leave us with very few tickets, and we can hope to fix them
all in a short time, therefore opening the way to a 1.0beta1 release in
the very near future.
Oh, and one more thing: batch modifications rock! Thanks Peter for
integrating this very useful feature.
> Patch refreshed and ready for testing. Unfortunately, I still don't have
> a setup to test it. I will try to set this up this week, but I won't be
> angry if anyone beats me to it :) It's very low risk, as the code won't
> be active if `[trac] use_xsendfile = false` (the default).
Nginx supports the same mechanism, but using an X-Accel-Redirect
header with slightly different features.
Yes, I know, we discussed this on the ticket. We may support it at some
point, but it's less trivial with nginx, and currently we're trying to
close the feature set for the release, rather than adding new features.
So not for this release, sorry.
Please, before you move to 1.0, could you add a project_id column to most of the database tables? That would make it easier to develop multi-project plugins.
At present, my teammate and I are implementing the TracMultipleProjects/SingleEnvironment approach in http://trac-hacks.org/wiki/SimpleMultiProjectPlugin. Because of the missing column, we had to add additional mapping tables of the form project<->{resource}.
> I'd like to ask each of the owners to go
> through your tickets in 1.0 and move those that you don't think you can
> fix within a week or two to a different milestone.
Sure. (Cheating a bit with #10594 but I really think it makes sense to close it.)
> You may also want to
> browse through 1.0-triage to make sure I didn't move any important ones
> out. But please only move tickets to 1.0 if you are confident that you
> will fix them shortly.
I think the patches are good enough to be applied as they are. I could probably do that before the release.
But since the patches are kind of large, we could also wait until after the release, in case anyone wants to review, test or improve the patches some more.
> Oh, and one more thing: batch modifications rock! Thanks Peter for
> integrating this very useful feature.
If it helped you prepare for Trac 1.0, it has already been well worth it. :)
Peter Suter wrote:
>> I'd like to ask each of the owners to go
>> through your tickets in 1.0 and move those that you don't think you can
>> fix within a week or two to a different milestone.
> Sure. (Cheating a bit with #10594 but I really think it makes sense to
> close it.)
I don't think it's so cool to close a ticket as fixed just because
you don't know enough to fix it. :\
Peter Suter wrote:
> Sure. (Cheating a bit with #10594 but I really think it makes sense to
> close it.)
The post-commit script isn't a release blocker anyway, as we can provide
it through other channels if needed. Feel free to file a new ticket
about it already now, so that we don't forget. Of course, if someone
wants to contribute a script now, to be included in 1.0, even better.
> I think the patches are good enough to be applied as they are. I could
> probably do that before the release.
> But since the patches are kind of large, we could also wait until after
> the release, in case anyone wants to review, test or improve the patches
> some more.
I hesitated for a long time with #1942. As Christian mentioned, current
trunk is very stable and has been for a long time. The patches in #1942
are sufficiently large and invasive that I'm not confident that they
won't affect this stability. I would rather apply them shortly after the
release, early in the development cycle, so that they get some exposure
on trunk and any issues get ironed out.
> The post-commit script isn't a release blocker anyway, as we can provide
> it through other channels if needed. Feel free to file a new ticket
> about it already now, so that we don't forget. Of course, if someone
> wants to contribute a script now, to be included in 1.0, even better.
Right, I created #10730.
> I hesitated for a long time with #1942. As Christian mentioned, current
> trunk is very stable and has been for a long time. The patches in #1942
> are sufficiently large and invasive that I'm not confident that they
> won't affect this stability. I would rather apply them shortly after the
> release, early in the development cycle, so that they get some exposure
> on trunk and any issues get ironed out.
Christian Boos wrote:
> And then, what I'd really like to finish before the final 1.0 are the
> other tickets listed in TracDev/ToDo#Trac1.0.
Short status update: we're down to 6 remaining tickets, 3 of which have
patches attached. I'm looking forward to seeing these patches land on
trunk, and to finishing the other 3!
Christian, are you sure you want to move SVN support to tracopt? If so,
I can do the work, so that you get a bit more time to work on the other
tickets (yes, they're all yours ;). Did you have anything special in
mind about backward compatibility? Keep stub modules in the old
locations, and import the global namespace from the new modules?
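
The stub-module idea can be sketched as follows. To keep the example runnable in a single file, the two modules are built in memory under made-up names (`newpkg_svn_fs` and `oldpkg_svn_fs`). In a real tree the stub would simply be a file at the old location whose body is one star import from the new module. None of these names are claimed to match Trac's actual layout.

```python
import sys
import types

# Hypothetical new location that holds the real implementation.
new_mod = types.ModuleType("newpkg_svn_fs")
exec(
    "class SubversionRepository:\n"
    "    backend = 'svn'\n",
    new_mod.__dict__,
)
sys.modules["newpkg_svn_fs"] = new_mod

# Hypothetical stub at the old location: its entire body is a star import,
# which re-exports every public name of the new module.
old_mod = types.ModuleType("oldpkg_svn_fs")
exec("from newpkg_svn_fs import *", old_mod.__dict__)
sys.modules["oldpkg_svn_fs"] = old_mod

# Existing plugin code keeps importing from the old location...
from oldpkg_svn_fs import SubversionRepository as OldName
# ...while new code imports from the new one.
from newpkg_svn_fs import SubversionRepository as NewName

print(OldName is NewName)
```

Both imports resolve to the same class object, so `isinstance` checks and subclassing keep working across old and new import paths. Note that a star import skips names starting with an underscore unless the new module lists them in `__all__`.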
> Christian Boos wrote:
>> And then, what I'd really like to finish before the final 1.0 are the
>> other tickets listed in TracDev/ToDo#Trac1.0.
> Short status update: we're down to 6 remaining tickets, 3 of which have
> patches attached. I'm looking forward to seeing these patches land on
> trunk, and to finishing the other 3!
Yes! Great work from you, and also thanks to Jun and Peter who have been responsive.
> Christian, are you sure you want to move SVN support to tracopt?
Yes, I think the switch to 1.0 is also a good time to show that svn no longer has a "privileged" place within Trac, and is just one (optional) system supported among others.
> If so,
> I can do the work, so that you get a bit more time to work on the other
> tickets (yes, they're all yours ;).
Thanks! Indeed, I haven't started anything in this area, so your help will be welcome.
> Did you have anything special in
> mind about backward compatibility? Keep stub modules in the old
> locations, and import the global namespace from the new modules?