Version control refactoring suggestions

12 views
Skip to first unread message

Peter Dimov

unread,
Jan 29, 2007, 10:11:31 AM1/29/07
to trac...@googlegroups.com
Hello everybody,

A main assumption in current implementation is that a revision is global
to the whole repository. This is however not the case with bazaar and
other distributed version control systems.

Additionally there is some unnecessary duplication of functionality in
the current api. For example, there are two ways to create a changeset –
either from a Changeset.get_changes() from a Repository.get_changes().
In the first case path restriction is implemented by Trac, whereas in
the second it is implemented by the version control backend. This could
surely be avoided.

Let me do a summary of the functionality we have:

changeset – displays all / some changes committed in a single revision
diff – displays all / some changes committed over multiple revisions,
possibly between different paths
log – lists all revision that modify a given path
source – lists the inventory at a given revision

Suggestions:

1. Since a changeset is a special diff, I’d suggest that they share
implementation in class Changeset.

2. Also, revision is currently just a string and that is insufficient
(e.g. to better implement bzr support) and IMHO we’d better make a class
out of it. A revision would be a snapshot of (part of) the repository at
the time of a commit. Let that be the class Revision.

3. A Revision object could be extracted from a Repository from a
revision number (currently hex allowed) and path (to support bzr like
systems). The reason is that revno 100 at path1 would be different from
revno 100 at path2.

4. A Changeset would then be the difference between two nodes. A node is
a file or dir at a specific revision (as it is now). A changeset
(current implementation) would be the difference between '/' at revno 99
and '/' at revno 100.

To summarize, here are the classes with some of their methods:

Repository
get_revision (path, rev)
get_revisions (start, end)

Revision
properties: revno, message, time, author, etc.
get_node (path)
Changeset
create (node1, node2)
get_changes()

I plan to try this out with the bzr backend (Aaron Bentley's trac+bzr)
and would really appreciate if somebody gives me feedback, or even
better, works in parallel on the svn backend.

Regards,

Peter

Thomas Moschny

unread,
Jan 29, 2007, 4:50:42 PM1/29/07
to trac...@googlegroups.com
On Monday 29 January 2007 16:11, Peter Dimov wrote:
> A main assumption in current implementation is that a revision is global
> to the whole repository. This is however not the case with bazaar and
> other distributed version control systems.

Really? While I am not an bzr user, I thought that each branch in bzr has its
own revision numbering. Thus supporting multiple locations (with independent
revision numbering) really means supporting multiple repositories (see
#2086), which, in the case of bzr, means supporting multiple branches, no?

A more severe problem for vc backends other than svn is imho the fact that the
vc api currently only supports a linear history, which means that (from
Trac's point of view) each revision can only have a single ancestor. This is
not the case e.g. for Monotone or Mercurial, and maybe others.

- Thomas

--
Thomas Moschny <thomas....@gmx.de>

Christian Boos

unread,
Jan 29, 2007, 5:44:03 PM1/29/07
to trac...@googlegroups.com
Hello Peter,

Peter Dimov wrote:
> A main assumption in current implementation is that a revision is global
> to the whole repository. This is however not the case with bazaar and
> other distributed version control systems.
>

Well, on one end of the spectrum, some dvcs use SHA1 hashes as revision
numbers, so that they're globally unique for practical purposes. At the
other end of the spectrum, traditional systems like CVS have a version
number which is only meaningful at the level of a given file. So bazaar
can be seen somewhere in between, IIUC.

Even if CVS is of no immediate interest to us (there's an Request-a-Hack
for it on TracHacks, though!), there's the multi-repository support that
would bring similar requirements. A revision 100 in repository A has
nothing in common with the revision 100 in repository B, of course. One
way to implement multi-repository support would be to have a virtual
top-level folder containing a virtual folder for each repository. In
this scenario, it also makes sense to require the path and the revision
number at the same time in order to identify a specific revision.

(I've just seen Thomas' mail pointing out the same similarity)

Another scenario would be for getting some convergence with Trac's own
versioned resource. Similar to CVS, we also need to combine path
information (the resource realm + id) and version number in order to
identify a specific revision of a resource (e.g. version 15 of
wiki:WikiStart). Getting some convergence between the resource
manipulation API and the versioncontrol API would be interesting to get
some algorithms (e.g. annotate) work either on vc nodes or on versioned
resources.

> Additionally there is some unnecessary duplication of functionality in
> the current api. For example, there are two ways to create a changeset –
> either from a Changeset.get_changes() from a Repository.get_changes().
> In the first case path restriction is implemented by Trac, whereas in
> the second it is implemented by the version control backend. This could
> surely be avoided.
>

Well, here despite of the same name and similar output, the actions are
quite different, so I'm not sure it's a good idea to merge the two - as
in a merged implementation, we would probably have to differentiate both
situations and have two distinct code paths anyway. But I'm open, and we
could try out to see how it would look like.

> Let me do a summary of the functionality we have:
>
> changeset – displays all / some changes committed in a single revision
> diff – displays all / some changes committed over multiple revisions,
> possibly between different paths
> log – lists all revision that modify a given path
> source – lists the inventory at a given revision
>
> Suggestions:
>
> 1. Since a changeset is a special diff, I’d suggest that they share
> implementation in class Changeset.
>

see reservation above for point 1.

> 2. Also, revision is currently just a string and that is insufficient
> (e.g. to better implement bzr support) and IMHO we’d better make a class
> out of it. A revision would be a snapshot of (part of) the repository at
> the time of a commit. Let that be the class Revision.
>

ok ...

> 3. A Revision object could be extracted from a Repository from a
> revision number (currently hex allowed) and path (to support bzr like
> systems). The reason is that revno 100 at path1 would be different from
> revno 100 at path2.
>

Right, I see this as the major improvement proposed here.

> 4. A Changeset would then be the difference between two nodes. A node is
> a file or dir at a specific revision (as it is now). A changeset
> (current implementation) would be the difference between '/' at revno 99
> and '/' at revno 100.
>

See reservation above. Actually, Trac used to implement things that way
a long time ago, but - for Subversion at least, you loose important
information if you treat the changeset 100 as being simply "the
difference between '/' at revno 99 and '/' at revno 100". This is why we
switched to using a dedicated operation for retrieving the changeset
changes (svn_repos_replay) and only later we reintroduce the use of the
general diff operation (svn_repos_dir_delta) for arbitrary diffs. So at
least for Subversion (and Mercurial as well), we would have to have a
dedicated implementation for the two different situations.

Also they offer a fundamental distinction w.r.t. to the cache behavior:
the changeset corresponding to a revision could be cached (as there are
a finite number of them), the others would always have to be generated
dynamically (as there's a huge number of combinations). I'm actually
quite sure that in the end, the generalized diffs could be generated
*from the cache*. And by the way, that's also something I'd like that we
put some thought in, as I see a great potential to do most operations
through the cache. This would benefit _all_ vc backends, as they would
mainly have to provide revision changeset information, and little more
(retrieving the actual content of a node, getting the node properties...
such things).

> To summarize, here are the classes with some of their methods:
>
> Repository
> get_revision (path, rev)
>

Yes, this seems to be the right thing to do...

> get_revisions (start, end)
>
> Revision
> properties: revno, message, time, author, etc.
> get_node (path)
>

Yes, plus there should be a Revision.get_changeset(path) method, in
order to get the full list of changes if path is empty or the
"restricted changeset" if not, which corresponds to this revision.

Then, following-up to what Thomas wrote, there should be some revision
navigation methods for this class as well: perhaps children() /
parents() revisions, as a generalization of the linear next_revision /
previous_revision we have now (see #1492).
Likewise, the Repository would have to extended in order to give the
revision "roots" (instead of oldest_revision) and the revision "heads"
(instead of the youngest_revision).

> Changeset
> create (node1, node2)
> get_changes()
>
> I plan to try this out with the bzr backend (Aaron Bentley's trac+bzr)
> and would really appreciate if somebody gives me feedback, or even
> better, works in parallel on the svn backend.
>

Great! The more Trac hackers, the better Trac in the end ;-)
Thank you very much for your interest in making Trac better!

-- Christian

Peter Dimov

unread,
Jan 29, 2007, 7:56:26 PM1/29/07
to trac...@googlegroups.com
Thomas Moschny wrote:
> On Monday 29 January 2007 16:11, Peter Dimov wrote:
>> A main assumption in current implementation is that a revision is global
>> to the whole repository. This is however not the case with bazaar and
>> other distributed version control systems.
>
> Really? While I am not an bzr user, I thought that each branch in bzr has its
> own revision numbering.

Actually each bzr revision has a GUID. The same revision could be in
multiple branches under a different revno. Each bazaar branch does have
its own numbering.

> Thus supporting multiple locations (with independent
> revision numbering) really means supporting multiple repositories (see
> #2086), which, in the case of bzr, means supporting multiple branches, no?

Standalone branches have their own repository. Bazaar also has the
concept of a shared repository where multiple branches share the same
revision store. With a shared repository when you branch within the
repository you don't copy its revisions. You just refer to the ones in
the shared store. If you branch to a standalone branch you need to copy
all revisions.

trac+bzr does support multiple repositories. I'm currently using a
single one but I'll switch to the setup Aaron Bentley proposed me,
namely giving each developer their own repository to avoid issues with
umasks and access rights.

> A more severe problem for vc backends other than svn is imho the fact that the
> vc api currently only supports a linear history, which means that (from
> Trac's point of view) each revision can only have a single ancestor. This is
> not the case e.g. for Monotone or Mercurial, and maybe others.

bzr keeps a linear revision history. Thus revious_rev would always work.
When you are at revision 50 and you merged another branch in, you
commit revision 51. The previous revision of 51 is 50 but if you look at
the changeset 51 you'll see it has two parent trees. You can browse to
the other one from there.

Regards,

Peter

Peter Dimov

unread,
Jan 29, 2007, 9:07:05 PM1/29/07
to trac...@googlegroups.com
Hi Christian,

Thanks for the detailed reply and interesting comments!

Christian Boos wrote:
> Well, on one end of the spectrum, some dvcs use SHA1 hashes as revision
> numbers, so that they're globally unique for practical purposes. At the
> other end of the spectrum, traditional systems like CVS have a version
> number which is only meaningful at the level of a given file. So bazaar
> can be seen somewhere in between, IIUC.

As I already wrote in my reply to Thomas, bzr does have GUIDs as
revision IDs. The same revision may be found under different revno in
different branches. I find revnos much more user friendly than SHA1
revids and the path gives some additional information too. I find
cahngeset:100/path/to/branch better much much better than
changeset:1ADC452...

If you browse revision 100 of branch, you could see some file.txt having
revision 44 because it was last changed back then. Of course if you
request file.txt from revision 100 you'll get the same as file.txt are
revision 44 ... I guess that's why you say it is somewhere in between
but... isn't that what svn is doing too?

> Another scenario would be for getting some convergence with Trac's own
> versioned resource. Similar to CVS, we also need to combine path
> information (the resource realm + id) and version number in order to
> identify a specific revision of a resource (e.g. version 15 of
> wiki:WikiStart). Getting some convergence between the resource
> manipulation API and the versioncontrol API would be interesting to get
> some algorithms (e.g. annotate) work either on vc nodes or on versioned
> resources.

That's an interesting idea and I see no reason this would not work.

>> Additionally there is some unnecessary duplication of functionality in
>> the current api. For example, there are two ways to create a changeset –
>> either from a Changeset.get_changes() from a Repository.get_changes().
>> In the first case path restriction is implemented by Trac, whereas in
>> the second it is implemented by the version control backend. This could
>> surely be avoided.
>>
>
> Well, here despite of the same name and similar output, the actions are
> quite different, so I'm not sure it's a good idea to merge the two - as
> in a merged implementation, we would probably have to differentiate both
> situations and have two distinct code paths anyway. But I'm open, and we
> could try out to see how it would look like.

For example it should be possible to reimplement the function that
restrict the path of a changeset. That could be done much more
efficiently by using the bzr api. I'm not really sure about that, but I
think it would be possible to generate the diff using the bzr api too.

I think I now see what is different between the two functions.
Repository.get_changes() is supposed to work even on totally unrelated
paths and not only on different revisions of the same path. I think that
if the paths are unrelated then trac should take care of creating the
diff. Trac would get the two nodes and run the diff on them. But the
changeset path restriction should be left to the version control
implementation. A default implementation could of course be provided in
the base Changeset.

> See reservation above. Actually, Trac used to implement things that way
> a long time ago, but - for Subversion at least, you loose important
> information if you treat the changeset 100 as being simply "the
> difference between '/' at revno 99 and '/' at revno 100". This is why we
> switched to using a dedicated operation for retrieving the changeset
> changes (svn_repos_replay) and only later we reintroduce the use of the
> general diff operation (svn_repos_dir_delta) for arbitrary diffs. So at
> least for Subversion (and Mercurial as well), we would have to have a
> dedicated implementation for the two different situations.

Maybe the above suggestion is applicable to svn and mercurial (i.e. trac
to do arbitrary diffs and vcs to retrieve changeset changes)?

> Also they offer a fundamental distinction w.r.t. to the cache behavior:
> the changeset corresponding to a revision could be cached (as there are
> a finite number of them), the others would always have to be generated
> dynamically (as there's a huge number of combinations). I'm actually
> quite sure that in the end, the generalized diffs could be generated
> *from the cache*. And by the way, that's also something I'd like that we
> put some thought in, as I see a great potential to do most operations
> through the cache. This would benefit _all_ vc backends, as they would
> mainly have to provide revision changeset information, and little more
> (retrieving the actual content of a node, getting the node properties...
> such things).

Caching provided by trac would be a really great thing! I've looked at
the svn cache repository and was thinking how I could reuse it for
trac+bzr. In the perfect scenario the vcs backend wouldn't care about
caching and have Trac handle all the details.

Restricting what should be cached shouldn't be a big deal. The Changeset
caching decorator would take the decision what is forwarded to the
backend and what is read from the cache. We would only cache changesets
where new is specified and all the others (new_path, old, old_path) are
None.

>> Revision
>> properties: revno, message, time, author, etc.
>> get_node (path)
>>
>
> Yes, plus there should be a Revision.get_changeset(path) method, in
> order to get the full list of changes if path is empty or the
> "restricted changeset" if not, which corresponds to this revision.

Yes, that's great!

> Then, following-up to what Thomas wrote, there should be some revision
> navigation methods for this class as well: perhaps children() /
> parents() revisions, as a generalization of the linear next_revision /
> previous_revision we have now (see #1492).

Getting the parent trees is done already. I'm not sure it would be
possible to get the children though. A branch wouldn't know about its
children unless some of them are merged back in. I'll have to check that
with Aaron.

> Likewise, the Repository would have to extended in order to give the
> revision "roots" (instead of oldest_revision) and the revision "heads"
> (instead of the youngest_revision).

You are generally right but... I don't see much sense of having this
youngest_rev altogether. I mean, it is used to find out whether the
revision you have is the oldest one

if rev == youngest_rev:

We'd better use

if rev.next_rev:

or something like that. I think we should refactor youngest_rev away.
What do others think?

Looking forward to your comments!

Regards,

Peter

Thomas Moschny

unread,
Jan 30, 2007, 4:14:09 AM1/30/07
to trac...@googlegroups.com
On Tuesday 30 January 2007 01:56, Peter Dimov wrote:
> Standalone branches have their own repository. Bazaar also has the
> concept of a shared repository where multiple branches share the same
> revision store. With a shared repository when you branch within the
> repository you don't copy its revisions. You just refer to the ones in
> the shared store. If you branch to a standalone branch you need to copy
> all revisions.

We might have a problem using different terms for different things here.

A subversion repository in fact only represents one real branch (in bzr
terms). Subversion 'branches' are only a convention, they are not first class
objects. There is nothing in the repository representing the concept of a
branch (or a tag). As trac was and is tightly connected to subversion, it
currently also supports only one branch.

As such, there is per se no problem in supporting bzr and bzr revisions, as
long as you use only one bzr branch.

Now, if trac had support for using multiple repositories, it could also
support multiple bzr-branches. Whether these bzr-branches are standalone or
compressed together in a shared repository, is imho not really relevant to
trac, it's only an implementation detail.

> > A more severe problem for vc backends other than svn is imho the fact
> > that the vc api currently only supports a linear history, which means
> > that (from Trac's point of view) each revision can only have a single
> > ancestor. This is not the case e.g. for Monotone or Mercurial, and maybe
> > others.
>
> bzr keeps a linear revision history. Thus revious_rev would always work.
> When you are at revision 50 and you merged another branch in, you
> commit revision 51. The previous revision of 51 is 50 but if you look at
> the changeset 51 you'll see it has two parent trees. You can browse to
> the other one from there.

What you describe is not a linear history. Revision 51 on that branch has in
fact two parent revisions, 50 on that branch and 42 (or whatever) on the
other branch.

Wouldn't it be nice, if trac could show a combined history of all revisions
that led to the one you are looking at?

Those problems can all be solved by having trac support (a) multiple
repositories and (b) multiple parents per revision.

Regards,
Thomas

Peter Dimov

unread,
Jan 30, 2007, 7:25:43 AM1/30/07
to trac...@googlegroups.com
Hi Thomas,

Thanks for the reply!

Thomas Moschny wrote:
> A subversion repository in fact only represents one real branch (in bzr
> terms). Subversion 'branches' are only a convention, they are not first class
> objects. There is nothing in the repository representing the concept of a
> branch (or a tag). As trac was and is tightly connected to subversion, it
> currently also supports only one branch.
>
> As such, there is per se no problem in supporting bzr and bzr revisions, as
> long as you use only one bzr branch.

A lot is done with branches in bzr and having only a single branch in
trac not at all feasible. Below are some example directory structures, I
hope they help clear terminology differences:

/ bzr shared repository
trunk bzr branch (the official one)
releases normal dir (unversioned)
x.y.z bzr branch
...
developers normal dir (unversioned)
john either normal dir or a (nested!) bzr shared repository
john-dev branch

The above does work with trac+bzr, so trac+bzr does support multiple
repositories. Another scenario that would work with trac+bzr:

/ normal dir (unversioned)
project1 bzr shared repository
trunk bzr branch
...
project2 bzr shared repositroy
trunk bzr branch
...
developers normal dir (unversioned)
john shared repository
john-dev branch

> Now, if trac had support for using multiple repositories, it could also
> support multiple bzr-branches. Whether these bzr-branches are standalone or
> compressed together in a shared repository, is imho not really relevant to
> trac, it's only an implementation detail.

IMHO implementing multiple repositories is the sole responsibility of
the vcs backend. As you see trac+bzr provides this functionality. All we
need to do in Trac is provide both the path and the revno. I'm also
quite sure it would be possible to implement the svn backend to support
multiple branches just like bzr does.

I guess the following would work with svn:

/ normal dir (unversioned)
project1 svn repository
trunk svn versioned dir
...
project2 svn repository
trunk svn versioned dir
...

For that to work the svn backend would need both the revno and the path
to know where to forward the request. Actually I can imagine that for
project1 and project2 you create two separate SvnRepository objects and
have SvnRepositryFacade, which also implements versioncontrol.Repository
and delegates calls depending on the path prefix. What do you think?

Of course the additional support we would need from Trac is the ability
to have different milestones, versions and tickets for each project. I
remember reading a ticket proposing that...

>> bzr keeps a linear revision history. Thus revious_rev would always work.
>> When you are at revision 50 and you merged another branch in, you
>> commit revision 51. The previous revision of 51 is 50 but if you look at
>> the changeset 51 you'll see it has two parent trees. You can browse to
>> the other one from there.
>
> What you describe is not a linear history. Revision 51 on that branch has in
> fact two parent revisions, 50 on that branch and 42 (or whatever) on the
> other branch.

Yes, you are right! It is not really linear but each revision you commit
in the branch goes in the revision history (which is just a list) and it
gets assigned the next revno. If we are at 50 and we merge and commit
we'll be at 51. So from the timeline point of view the previous rev
would be 50. Generally, you are right, 51 has two parent revisions: 50
and 44.2.3. Now, 44.2.3 is also a revision from the *same* branch. Once
you pull the new revisions from the other branch you assign them new
dotted revnos (but don't ask me how exactly they are chosen).

BzrChangesets.get_properties() returns not only date, message, author
but also the list (with hrefs) to the parent revisions. For the above
case you'll see the links to [50/branch] and [44.2.3/branch], so you do
have the option to browse both parent revisions.

> Wouldn't it be nice, if trac could show a combined history of all revisions
> that led to the one you are looking at?

Yes, definitely! A graphical display would be an especially great thing!
Can you provide some references to past discussions?

> Those problems can all be solved by having trac support (a) multiple
> repositories and (b) multiple parents per revision.

I just checked #2086 and I agree that revision should have a member
either base_path, or branch, or repo, or whatever. Each backend could
decide for itself what it needs. Then multiple branches/repositores/etc.
could completely be implemented in the vcs backend.

Looking forward to any comments!

Regards,

Peter

Thomas Moschny

unread,
Jan 30, 2007, 7:41:50 AM1/30/07
to trac...@googlegroups.com
On Tuesday 30 January 2007 10:14, I wrote:
> Wouldn't it be nice, if trac could show a combined history of all revisions
> that led to the one you are looking at?
>
> Those problems can all be solved by having trac support (a) multiple
> repositories and (b) multiple parents per revision.

Btw, once we have support for that, we could write a pseudo-svn backend that
yields a view to the subversion repository that matches the view the users
have, e.g. by mapping the branches-by-convention to real branches.

IIUC, something similar is currently done /within/ the RevTree Pluginm [1],
but in fact it would be nice to shift that functionality into a separate
module and make the RevTree Plugin work on the (yet to be defined)
generalized vc api.

Regards,
Thomas

[1] http://www.trac-hacks.org/wiki/RevtreePlugin

--
Thomas Moschny <thomas....@gmx.de>

Emmanuel Blot

unread,
Jan 30, 2007, 8:30:50 AM1/30/07
to trac...@googlegroups.com
> IIUC, something similar is currently done /within/ the RevTree Pluginm [1],
> but in fact it would be nice to shift that functionality into a separate
> module and make the RevTree Plugin work on the (yet to be defined)
> generalized vc api.

> [1] http://www.trac-hacks.org/wiki/RevtreePlugin

BTW, feel free to comment/correct
http://www.trac-hacks.org/wiki/RevtreePlugin/Limitations#Versioncontrolbackend
for non-SVN backend (I don't know much about VCS except CVS, SVN and
ClearCase)

TIA,
Manu

Thomas Moschny

unread,
Feb 2, 2007, 9:25:27 AM2/2/07
to trac...@googlegroups.com
On Tuesday 30 January 2007 13:25, Peter Dimov wrote:
> IMHO implementing multiple repositories is the sole responsibility of
> the vcs backend

Imho this is the wrong way to go.

By simply introducing a virtual toplevel directory and virtual directories for
each branch/repository, one loses important information (besides the problem
that there can't be nameless branches). A branch/repo simply is not identical
to a toplevel dir.

All source: wiki links for example have to carry that faked toplevel dir, but
for anyone not knowing the setup of that trac, this remains unclear.

Instead the repository should be specified by a mapable prefix similar to the
way the InterTrac links work (e.g. separated by a colon, as
in "source:my-branch:README@123"). This way, the source links stil carry the
branch/repo information, but now in an explicit way. If there's a mapping
between these prefix and the branches/repos somewhere, one can even
move/rename repositories, extract sub-projects, and the like. (We want to
have that going hand-in-hand with multiple projects support, don't we?)

Regards,

Peter Dimov

unread,
Feb 3, 2007, 7:25:26 PM2/3/07
to trac...@googlegroups.com
Dear all,

Attached is a patch based on trac-0.10.3 containing some refactoring to
the versioncontrol module (not the svn backend though).

Here is a summary of the changes:

class Repository:
def get_revision(self, path, rev): return Revision
def get_node(self, path, rev): return Node

class Node:
self.path
self.rev
def get_revision(self): return Revision
def get_history(self, limit, stop_rev):return [Node, ...]

class Revision:
self.message
self.author
self.date
self.base_path # branch/repository base
def get_node(path): return Node

I currently have the browser and log almost completely working with the
new interface and the modified trac+bzr plugin but I’m still far from ready.

I see plenty of calls to normalize_rev and normalize_path in the
trac.log. I would like to suggest removing the functions normalize_*.
The normalization would be done by the implementation of Repository in
functions like get_revision(path, rev) and get_node(path, rev). These
functions return objects of types Revision and Node and this objects
have members keeping the normalized values (you can see this approach in
the attached path). I would also like to clean up the interfaces and
make them simpler.

I really liked Christian’s suggestion to use
revision.get_changeset(path). The second way of getting a changeset
would be by calling repository.get_changeset(new_node, old_node).
Furthermore, I thought it might make sense to have the Changeset class
ONLY represent the changes (files) but NOT the revision (message, data,
author). I feel like a lot of refactoring can be done in
web_ui.changeset and I guess I’ll start from there before further
refactoring the versioncontrol api.

trac-0.10.3-vc.refactoring.diff

Christian Boos

unread,
Feb 5, 2007, 1:44:27 AM2/5/07
to trac...@googlegroups.com
Peter Dimov wrote:
> Dear all,
>
> Attached is a patch based on trac-0.10.3 containing some refactoring to
> the versioncontrol module (not the svn backend though).
>

Ouch, while it's OK to use 0.10-stable in order to show example code, we
can't base further developments on that.
The codebase has already evolved significantly in trunk from
0.10-stable; not that much, but enough to make merging painful, I guess.
0.11dev is still a "moving target", but it won't move a lot in the
versioncontrol area, so it's relatively safe to hack things there...

> Here is a summary of the changes:
>
> class Repository:
> def get_revision(self, path, rev): return Revision
> def get_node(self, path, rev): return Node
>
> class Node:
> self.path
> self.rev
> def get_revision(self): return Revision
> def get_history(self, limit, stop_rev):return [Node, ...]
>
> class Revision:
> self.message
> self.author
> self.date
> self.base_path # branch/repository base
> def get_node(path): return Node
>
> I currently have the browser and log almost completely working with the
> new interface and the modified trac+bzr plugin but I’m still far from ready.
>
> I see plenty of calls to normalize_rev and normalize_path in the
> trac.log. I would like to suggest removing the functions normalize_*.
>

Sure, now the Revision object *is* the normalization of the rev info.
For the path, yes, it could probably be done in the public API methods.

> ...


>
> I really liked Christian’s suggestion to use
> revision.get_changeset(path). The second way of getting a changeset
> would be by calling repository.get_changeset(new_node, old_node).
> Furthermore, I thought it might make sense to have the Changeset class
> ONLY represent the changes (files) but NOT the revision (message, data,
> author).

Ok.

> I feel like a lot of refactoring can be done in
> web_ui.changeset and I guess I’ll start from there before further
> refactoring the versioncontrol api.
>

So, as I said, better base your changes on 0.11, as if we setup a branch
in the future to go on with this approach, it'll be based on trunk, of
course.

> Looking forward to your comments

Last minor point: I've seen that you removed a bunch of the logic
regarding show_path_history in log.py. While this code can certainly be
cleaned-up, removing it is not the way to go ... better add comments for
the suspicious parts (or comment them out).

The show_path_history stuff is rather complex in Subversion, I've not
even started to implement it in Mercurial. The point is that those
things will be quite easy to implement in a cross-vc way once we'll have
the new cache. Also, it will be 10x or 100x faster ;)

... which brings me to another point, the main refactoring I planned for
0.12 was to introduce this new cache, in order to speed-up the browsing
for distributed vcs like Mercurial. The other points (multi-repository
support, new "merge" change kind, parents/children relationship between
changesets and DAG-like revision log) must of course be considered while
designing and implementing this new cache. But likewise, the new API
must also keep in mind the cache, in order to avoid to have to break the
API once again.

-- Christian

Peter Dimov

unread,
Feb 6, 2007, 8:23:48 AM2/6/07
to trac...@googlegroups.com
Hi Christian,

> Ouch, while it's OK to use 0.10-stable in order to show example code, we
> can't base further developments on that.
> The codebase has already evolved significantly in trunk from
> 0.10-stable; not that much, but enough to make merging painful, I guess.
> 0.11dev is still a "moving target", but it won't move a lot in the
> versioncontrol area, so it's relatively safe to hack things there...

OK, not a problem. I'm running a 0.10.3 on my server now but I could use
0.11dev for development.

> Last minor point: I've seen that you removed a bunch of the logic
> regarding show_path_history in log.py. While this code can certainly be
> cleaned-up, removing it is not the way to go ... better add comments for
> the suspicious parts (or comment them out).

I actually thought some of this could be moved from the ui to the
backend (base-class implementation for example).

> ... which brings me to another point, the main refactoring I planned for
> 0.12 was to introduce this new cache, in order to speed-up the browsing
> for distributed vcs like Mercurial. The other points (multi-repository
> support, new "merge" change kind, parents/children relationship between
> changesets and DAG-like revision log) must of course be considered while
> designing and implementing this new cache. But likewise, the new API
> must also keep in mind the cache, in order to avoid to have to break the
> API once again.

I think I know what you mean... I'll think about what can/makes sense to
be cached and what not. I'll post some thought once I could say more on
the topic.

Regards,

Peter

Peter Dimov

unread,
Mar 6, 2007, 3:44:18 PM3/6/07
to trac...@googlegroups.com
Christian Boos wrote:
> Peter Dimov wrote:
>> Dear all,
>>
>> Attached is a patch based on trac-0.10.3 containing some refactoring to
>> the versioncontrol module (not the svn backend though).
>>
>
> Ouch, while it's OK to use 0.10-stable in order to show example code, we
> can't base further developments on that.

I finally got some spare time to look at that one again. I actually
wanted to port my previous modifications to the latest revision in trunk
but that required me to do some modifications in
LogModule.process_request(), which was about 140 lines and called for
refactoring!

Instead of writing big functions I normally like to write smaller, more
cohesive ones, which do only a single coherent part of the whole work.
The "big" function just handles the logic and delegates to the smaller
functions accordingly. IMHO this way the source code becomes much more
readable and easier to maintain. That’s definitely an approach that
could be applied at different places in trac (from what I’ve seen,
LogModule and ChangesetModule at least).

I’ve refactored LogModule.process_request() having the above described
considerations in mind. That’s just the first step and I guess it might
make sense to further split some of the new functions. I’ve added an new
class LogRequest which handles a single request to the log module.
__init__() parses the inputs and the object stores the request state
between the invocations of the individual request processing steps. I
haven’t thoroughly tested the patch, which means I may have forgotten to
add a self. in front of a var somewhere.

I plan to do similar refactoring to ChangesetModule before proceeding
with the version control api refactoring. Actually I would further
refactor LogRequest and ChangesetRequest to use the "perfect" version
control api. I guess we would have some discussion about how perfect it
is before actually going ahead and changing it ;-)

Well, I’d like to know what you think about all this before I do any
further work.

Best regards,

Peter

trac-r4937-LogRequest.patch

Christian Boos

unread,
Mar 7, 2007, 11:46:33 AM3/7/07
to trac...@googlegroups.com
Peter Dimov wrote:
> I finally got some spare time to look at that one again. I actually
> wanted to port my previous modifications to the latest revision in trunk
> but that required me to do some modifications in
> LogModule.process_request(), which was about 140 lines and called for
> refactoring!

Indeed ;-)

> Instead of writing big functions I normally like to write smaller, more
> cohesive ones, which do only a single coherent part of the whole work.
> The "big" function just handles the logic and delegates to the smaller
> functions accordingly. IMHO this way the source code becomes much more
> readable and easier to maintain. That’s definitely an approach that
> could be applied at different places in trac (from what I’ve seen,
> LogModule and ChangesetModule at least).
>

I think it's a good idea to have a request handler object that can be
used to better encapsulate the processing of one request, as we can
store the request and the parameters and access them from processing
methods and helper methods. This is not possible to do at the level of
the module component itself, which is shared by different requests.

So I've started from your patch and made some further modification.
The LogModule.process_request just do:

return LogRequest(self, req).process()

instead of using an API of the LogRequest class. process() is a generic
method from the parent RequestHandler class, which will dispatch to the
appropriate method of the subclass according to the action parameter
('do_<action>' or 'render_<action>' ones, the former for GET requests
and the latter for POST requests). This is not that useful for the
LogRequest, where the only action is 'view' (the default), but would be
for other handlers.

> I’ve refactored LogModule.process_request() having the above described
> considerations in mind. That’s just the first step and I guess it might
> make sense to further split some of the new functions. I’ve added an new
> class LogRequest which handles a single request to the log module.
> __init__() parses the inputs and the object stores the request state
> between the invocations of the individual request processing steps. I
> haven’t thoroughly tested the patch, which means I may have forgotten to
> add a self. in front of a var somewhere.
>

Yes, the path_history mode hasn't been tested, I fixed it.
Also, self.revs was used in two different places for a different
purpose. I've used locals instead.
I also changed the methods a bit to make more apparent what step they
were doing, by using a more functional style:

def render_view(self):
history = self._get_history_provider()
items = self._get_log_items(history)
data = self._get_template_context(items)

if len(items) == self.limit+1:
...

instead of:

request.prepare_history()
request.extract_info()
request.extract_changes()
request.prepare_response()
return request.get_response()

> I plan to do similar refactoring to ChangesetModule before proceeding
> with the version control api refactoring. Actually I would further
> refactor LogRequest and ChangesetRequest to use the "perfect" version
> control api. I guess we would have some discussion about how perfect it
> is before actually going ahead and changing it ;-)

Please take a look at my updated patch. If we both feel we can achieve
something going on this way, maybe we should restart a vc-refactoring
branch ;-) As you proposed, we should first start there by some
clean-ups and refactoring of the current code base.

-- Christian

LogRequest-refactoring-r4941.diff

Peter Dimov

unread,
Mar 7, 2007, 1:30:24 PM3/7/07
to trac...@googlegroups.com
Christian Boos wrote:
> instead of using an API of the LogRequest class. process() is a generic
> method from the parent RequestHandler class, which will dispatch to the
> appropriate method of the subclass according to the action parameter
> ('do_<action>' or 'render_<action>' ones, the former for GET requests
> and the latter for POST requests). This is not that useful for the
> LogRequest, where the only action is 'view' (the default), but would be
> for other handlers.

The RequestHandler idea is great! The definition of RequestHandler
should be moved elsewhere though. You'd know better than me which module
is most suitable.

Do we really need two different handlers ('do_<action>' or
'render_<action>') for the *same* action? I wouldn't actually care
whether I got the data form the query string with of a GET request or in
the message body of a POST request... all the handler should care about
is to get the data and handle the request. Or maybe there's a difference
in the way trac handles GET and POST requests that I'm unaware of?

> I also changed the methods a bit to make more apparent what step they
> were doing, by using a more functional style:
>
> def render_view(self):
> history = self._get_history_provider()
> items = self._get_log_items(history)
> data = self._get_template_context(items)
>
> if len(items) == self.limit+1:
> ...

Nice! :-)

>> I plan to do similar refactoring to ChangesetModule before proceeding
>> with the version control api refactoring. Actually I would further
>> refactor LogRequest and ChangesetRequest to use the "perfect" version
>> control api. I guess we would have some discussion about how perfect it
>> is before actually going ahead and changing it ;-)
>
> Please take a look at my updated patch. If we both feel we can achieve
> something going on this way, maybe we should restart a vc-refactoring
> branch ;-) As you proposed, we should first start there by some
> clean-ups and refactoring of the current code base.

Yes, we will definitely achieve something! I hope I'll be able to
refactor BrowserModule and ChangesetModule and provide a patch for
review by the end of the week :-).

Regards,

Peter

Christian Boos

unread,
Mar 7, 2007, 4:03:01 PM3/7/07
to trac...@googlegroups.com
Peter Dimov wrote:
> Christian Boos wrote:
>
>> instead of using an API of the LogRequest class. process() is a generic
>> method from the parent RequestHandler class, which will dispatch to the
>> appropriate method of the subclass according to the action parameter
>> ('do_<action>' or 'render_<action>' ones, the former for GET requests
>> and the latter for POST requests). This is not that useful for the
>> LogRequest, where the only action is 'view' (the default), but would be
>> for other handlers.
>>
>
> The RequestHandler idea is great! The definition of RequestHandler
> should be moved elsewhere though. You'd know better than me which module
> is most suitable.
>

Well, ideally it should go in trac/web/api.py, next to the IRequestHandler.
Hm, probably a more distinctive name would be useful then, like
RequestProcessor.

> Do we really need two different handlers ('do_<action>' or
> 'render_<action>') for the *same* action? I wouldn't actually care
> whether I got the data form the query string with of a GET request or in
> the message body of a POST request... all the handler should care about
> is to get the data and handle the request. Or maybe there's a difference
> in the way trac handles GET and POST requests that I'm unaware of?
>

We usually use posts to trigger some side-effect in the database, so
after having processed the POST we usually send a redirect so that the
client GETs the latest version of the resource. But you're right, there
are situations where we don't differentiate, and we should probably
default to the render_<action> method when the do_<action> one is not
implemented. The opposite is not desired however, as we have some
protective measures for submissions that should be done exclusively with
a POST (grep for form_token in trac/web).

>>> I plan to do similar refactoring to ChangesetModule before proceeding
>>> with the version control api refactoring. Actually I would further
>>> refactor LogRequest and ChangesetRequest to use the "perfect" version
>>> control api. I guess we would have some discussion about how perfect it
>>> is before actually going ahead and changing it ;-)
>>>
>> Please take a look at my updated patch. If we both feel we can achieve
>> something going on this way, maybe we should restart a vc-refactoring
>> branch ;-) As you proposed, we should first start there by some
>> clean-ups and refactoring of the current code base.
>>
>
> Yes, we will definitely achieve something! I hope I'll be able to
> refactor BrowserModule and ChangesetModule and provide a patch for
> review by the end of the week :-).
>

Ok, so I'd like to restart the sandbox/vc-refactoring branch, and I'd
like that Peter could commit there as well.
The idea would be to get that last patch there, then move the
RequestHandler to trac.web.api.RequestProcessor and do a similar
refactoring for the Browser and Changeset modules. This cleanup would be
beneficial to 0.11 and will enable Peter to become familiar with the
existing code; Peter who has already volunteered for providing us with
the perfect versioncontrol API for 0.12 (as I understood it, right? ;-) ).

Any dissenting opinion?

-- Christian

Emmanuel Blot

unread,
Mar 7, 2007, 6:50:08 PM3/7/07
to trac...@googlegroups.com
> Any dissenting opinion?

Nope, but could either of you maintain a Trac page that describes any
break that could occur in the internal APIs - in order to ease the
migration of related plugins ?

TIA,
Manu

Peter Dimov

unread,
Mar 9, 2007, 8:26:57 PM3/9/07
to trac...@googlegroups.com
Christian Boos wrote:
> Peter Dimov wrote:
>> The RequestHandler idea is great! The definition of RequestHandler
>> should be moved elsewhere though. You'd know better than me which module
>> is most suitable.
>>
>
> Well, ideally it should go in trac/web/api.py, next to the IRequestHandler.
> Hm, probably a more distinctive name would be useful then, like
> RequestProcessor.

I did that.

> We usually use posts to trigger some side-effect in the database, so
> after having processed the POST we usually send a redirect so that the
> client GETs the latest version of the resource.

OK, that makes perfect sense!

> Ok, so I'd like to restart the sandbox/vc-refactoring branch, and I'd
> like that Peter could commit there as well.

Who should I talk to regarding permission rights?

> The idea would be to get that last patch there, then move the
> RequestHandler to trac.web.api.RequestProcessor and do a similar
> refactoring for the Browser and Changeset modules.

Attached is a patch containing Christian's modifications of the
LogRequest refactoring + RequestHandler moved to
trac.web.api.RequestProcessor + refactored BrowserModule. Please have a
look and comment!

I'm not sure why the colorization doesn't work for me but it didn't work
even before the refactoring. Any idea?

Just to make sure I got it right, what is the semantic of
Node.last_modified? Is it the modification time of the entry path@rev?
If that is the case then we don't need to do the lookup
changes[node.rev].date when sorting the entries but we can use
node.last_modified instead... seems to work at least (see
_date_file_order()).


> This cleanup would be
> beneficial to 0.11 and will enable Peter to become familiar with the
> existing code; Peter who has already volunteered for providing us with
> the perfect versioncontrol API for 0.12 (as I understood it, right? ;-) ).

:-) volunteer I do and we'll definitely need to discuss here on the list
what "perfect" means ;-)


Regards,

Peter

trac-r4941-versioncontrol.web_ui.cleanup.patch
Reply all
Reply to author
Forward
0 new messages