A main assumption in current implementation is that a revision is global to the whole repository. This is however not the case with bazaar and other distributed version control systems.
Additionally there is some unnecessary duplication of functionality in the current api. For example, there are two ways to create a changeset – either from a Changeset.get_changes() from a Repository.get_changes(). In the first case path restriction is implemented by Trac, whereas in the second it is implemented by the version control backend. This could surely be avoided.
Let me do a summary of the functionality we have:
changeset – displays all / some changes committed in a single revision diff – displays all / some changes committed over multiple revisions, possibly between different paths log – lists all revision that modify a given path source – lists the inventory at a given revision
Suggestions:
1. Since a changeset is a special diff, I’d suggest that they share implementation in class Changeset.
2. Also, revision is currently just a string and that is insufficient (e.g. to better implement bzr support) and IMHO we’d better make a class out of it. A revision would be a snapshot of (part of) the repository at the time of a commit. Let that be the class Revision.
3. A Revision object could be extracted from a Repository from a revision number (currently hex allowed) and path (to support bzr like systems). The reason is that revno 100 at path1 would be different from revno 100 at path2.
4. A Changeset would then be the difference between two nodes. A node is a file or dir at a specific revision (as it is now). A changeset (current implementation) would be the difference between '/' at revno 99 and '/' at revno 100.
To summarize, here are the classes with some of their methods:
Revision properties: revno, message, time, author, etc. get_node (path) Changeset create (node1, node2) get_changes()
I plan to try this out with the bzr backend (Aaron Bentley's trac+bzr) and would really appreciate if somebody gives me feedback, or even better, works in parallel on the svn backend.
On Monday 29 January 2007 16:11, Peter Dimov wrote:
> A main assumption in current implementation is that a revision is global > to the whole repository. This is however not the case with bazaar and > other distributed version control systems.
Really? While I am not an bzr user, I thought that each branch in bzr has its own revision numbering. Thus supporting multiple locations (with independent revision numbering) really means supporting multiple repositories (see #2086), which, in the case of bzr, means supporting multiple branches, no?
A more severe problem for vc backends other than svn is imho the fact that the vc api currently only supports a linear history, which means that (from Trac's point of view) each revision can only have a single ancestor. This is not the case e.g. for Monotone or Mercurial, and maybe others.
Peter Dimov wrote: > A main assumption in current implementation is that a revision is global > to the whole repository. This is however not the case with bazaar and > other distributed version control systems.
Well, on one end of the spectrum, some dvcs use SHA1 hashes as revision numbers, so that they're globally unique for practical purposes. At the other end of the spectrum, traditional systems like CVS have a version number which is only meaningful at the level of a given file. So bazaar can be seen somewhere in between, IIUC.
Even if CVS is of no immediate interest to us (there's an Request-a-Hack for it on TracHacks, though!), there's the multi-repository support that would bring similar requirements. A revision 100 in repository A has nothing in common with the revision 100 in repository B, of course. One way to implement multi-repository support would be to have a virtual top-level folder containing a virtual folder for each repository. In this scenario, it also makes sense to require the path and the revision number at the same time in order to identify a specific revision.
(I've just seen Thomas' mail pointing out the same similarity)
Another scenario would be for getting some convergence with Trac's own versioned resource. Similar to CVS, we also need to combine path information (the resource realm + id) and version number in order to identify a specific revision of a resource (e.g. version 15 of wiki:WikiStart). Getting some convergence between the resource manipulation API and the versioncontrol API would be interesting to get some algorithms (e.g. annotate) work either on vc nodes or on versioned resources.
> Additionally there is some unnecessary duplication of functionality in > the current api. For example, there are two ways to create a changeset – > either from a Changeset.get_changes() from a Repository.get_changes(). > In the first case path restriction is implemented by Trac, whereas in > the second it is implemented by the version control backend. This could > surely be avoided.
Well, here despite of the same name and similar output, the actions are quite different, so I'm not sure it's a good idea to merge the two - as in a merged implementation, we would probably have to differentiate both situations and have two distinct code paths anyway. But I'm open, and we could try out to see how it would look like.
> Let me do a summary of the functionality we have:
> changeset – displays all / some changes committed in a single revision > diff – displays all / some changes committed over multiple revisions, > possibly between different paths > log – lists all revision that modify a given path > source – lists the inventory at a given revision
> Suggestions:
> 1. Since a changeset is a special diff, I’d suggest that they share > implementation in class Changeset.
see reservation above for point 1.
> 2. Also, revision is currently just a string and that is insufficient > (e.g. to better implement bzr support) and IMHO we’d better make a class > out of it. A revision would be a snapshot of (part of) the repository at > the time of a commit. Let that be the class Revision.
ok ...
> 3. A Revision object could be extracted from a Repository from a > revision number (currently hex allowed) and path (to support bzr like > systems). The reason is that revno 100 at path1 would be different from > revno 100 at path2.
Right, I see this as the major improvement proposed here.
> 4. A Changeset would then be the difference between two nodes. A node is > a file or dir at a specific revision (as it is now). A changeset > (current implementation) would be the difference between '/' at revno 99 > and '/' at revno 100.
See reservation above. Actually, Trac used to implement things that way a long time ago, but - for Subversion at least, you loose important information if you treat the changeset 100 as being simply "the difference between '/' at revno 99 and '/' at revno 100". This is why we switched to using a dedicated operation for retrieving the changeset changes (svn_repos_replay) and only later we reintroduce the use of the general diff operation (svn_repos_dir_delta) for arbitrary diffs. So at least for Subversion (and Mercurial as well), we would have to have a dedicated implementation for the two different situations.
Also they offer a fundamental distinction w.r.t. to the cache behavior: the changeset corresponding to a revision could be cached (as there are a finite number of them), the others would always have to be generated dynamically (as there's a huge number of combinations). I'm actually quite sure that in the end, the generalized diffs could be generated *from the cache*. And by the way, that's also something I'd like that we put some thought in, as I see a great potential to do most operations through the cache. This would benefit _all_ vc backends, as they would mainly have to provide revision changeset information, and little more (retrieving the actual content of a node, getting the node properties... such things).
> To summarize, here are the classes with some of their methods:
> Repository > get_revision (path, rev)
Yes, this seems to be the right thing to do...
> get_revisions (start, end)
> Revision > properties: revno, message, time, author, etc. > get_node (path)
Yes, plus there should be a Revision.get_changeset(path) method, in order to get the full list of changes if path is empty or the "restricted changeset" if not, which corresponds to this revision.
Then, following-up to what Thomas wrote, there should be some revision navigation methods for this class as well: perhaps children() / parents() revisions, as a generalization of the linear next_revision / previous_revision we have now (see #1492). Likewise, the Repository would have to extended in order to give the revision "roots" (instead of oldest_revision) and the revision "heads" (instead of the youngest_revision).
> I plan to try this out with the bzr backend (Aaron Bentley's trac+bzr) > and would really appreciate if somebody gives me feedback, or even > better, works in parallel on the svn backend.
Great! The more Trac hackers, the better Trac in the end ;-) Thank you very much for your interest in making Trac better!
Thomas Moschny wrote: > On Monday 29 January 2007 16:11, Peter Dimov wrote: >> A main assumption in current implementation is that a revision is global >> to the whole repository. This is however not the case with bazaar and >> other distributed version control systems.
> Really? While I am not an bzr user, I thought that each branch in bzr has its > own revision numbering.
Actually each bzr revision has a GUID. The same revision could be in multiple branches under a different revno. Each bazaar branch does have its own numbering.
> Thus supporting multiple locations (with independent > revision numbering) really means supporting multiple repositories (see > #2086), which, in the case of bzr, means supporting multiple branches, no?
Standalone branches have their own repository. Bazaar also has the concept of a shared repository where multiple branches share the same revision store. With a shared repository when you branch within the repository you don't copy its revisions. You just refer to the ones in the shared store. If you branch to a standalone branch you need to copy all revisions.
trac+bzr does support multiple repositories. I'm currently using a single one but I'll switch to the setup Aaron Bentley proposed me, namely giving each developer their own repository to avoid issues with umasks and access rights.
> A more severe problem for vc backends other than svn is imho the fact that the > vc api currently only supports a linear history, which means that (from > Trac's point of view) each revision can only have a single ancestor. This is > not the case e.g. for Monotone or Mercurial, and maybe others.
bzr keeps a linear revision history. Thus revious_rev would always work. When you are at revision 50 and you merged another branch in, you commit revision 51. The previous revision of 51 is 50 but if you look at the changeset 51 you'll see it has two parent trees. You can browse to the other one from there.
Thanks for the detailed reply and interesting comments!
Christian Boos wrote: > Well, on one end of the spectrum, some dvcs use SHA1 hashes as revision > numbers, so that they're globally unique for practical purposes. At the > other end of the spectrum, traditional systems like CVS have a version > number which is only meaningful at the level of a given file. So bazaar > can be seen somewhere in between, IIUC.
As I already wrote in my reply to Thomas, bzr does have GUIDs as revision IDs. The same revision may be found under different revno in different branches. I find revnos much more user friendly than SHA1 revids and the path gives some additional information too. I find cahngeset:100/path/to/branch better much much better than changeset:1ADC452...
If you browse revision 100 of branch, you could see some file.txt having revision 44 because it was last changed back then. Of course if you request file.txt from revision 100 you'll get the same as file.txt are revision 44 ... I guess that's why you say it is somewhere in between but... isn't that what svn is doing too?
> Another scenario would be for getting some convergence with Trac's own > versioned resource. Similar to CVS, we also need to combine path > information (the resource realm + id) and version number in order to > identify a specific revision of a resource (e.g. version 15 of > wiki:WikiStart). Getting some convergence between the resource > manipulation API and the versioncontrol API would be interesting to get > some algorithms (e.g. annotate) work either on vc nodes or on versioned > resources.
That's an interesting idea and I see no reason this would not work.
>> Additionally there is some unnecessary duplication of functionality in >> the current api. For example, there are two ways to create a changeset – >> either from a Changeset.get_changes() from a Repository.get_changes(). >> In the first case path restriction is implemented by Trac, whereas in >> the second it is implemented by the version control backend. This could >> surely be avoided.
> Well, here despite of the same name and similar output, the actions are > quite different, so I'm not sure it's a good idea to merge the two - as > in a merged implementation, we would probably have to differentiate both > situations and have two distinct code paths anyway. But I'm open, and we > could try out to see how it would look like.
For example it should be possible to reimplement the function that restrict the path of a changeset. That could be done much more efficiently by using the bzr api. I'm not really sure about that, but I think it would be possible to generate the diff using the bzr api too.
I think I now see what is different between the two functions. Repository.get_changes() is supposed to work even on totally unrelated paths and not only on different revisions of the same path. I think that if the paths are unrelated then trac should take care of creating the diff. Trac would get the two nodes and run the diff on them. But the changeset path restriction should be left to the version control implementation. A default implementation could of course be provided in the base Changeset.
> See reservation above. Actually, Trac used to implement things that way > a long time ago, but - for Subversion at least, you loose important > information if you treat the changeset 100 as being simply "the > difference between '/' at revno 99 and '/' at revno 100". This is why we > switched to using a dedicated operation for retrieving the changeset > changes (svn_repos_replay) and only later we reintroduce the use of the > general diff operation (svn_repos_dir_delta) for arbitrary diffs. So at > least for Subversion (and Mercurial as well), we would have to have a > dedicated implementation for the two different situations.
Maybe the above suggestion is applicable to svn and mercurial (i.e. trac to do arbitrary diffs and vcs to retrieve changeset changes)?
> Also they offer a fundamental distinction w.r.t. to the cache behavior: > the changeset corresponding to a revision could be cached (as there are > a finite number of them), the others would always have to be generated > dynamically (as there's a huge number of combinations). I'm actually > quite sure that in the end, the generalized diffs could be generated > *from the cache*. And by the way, that's also something I'd like that we > put some thought in, as I see a great potential to do most operations > through the cache. This would benefit _all_ vc backends, as they would > mainly have to provide revision changeset information, and little more > (retrieving the actual content of a node, getting the node properties... > such things).
Caching provided by trac would be a really great thing! I've looked at the svn cache repository and was thinking how I could reuse it for trac+bzr. In the perfect scenario the vcs backend wouldn't care about caching and have Trac handle all the details.
Restricting what should be cached shouldn't be a big deal. The Changeset caching decorator would take the decision what is forwarded to the backend and what is read from the cache. We would only cache changesets where new is specified and all the others (new_path, old, old_path) are None.
>> Revision >> properties: revno, message, time, author, etc. >> get_node (path)
> Yes, plus there should be a Revision.get_changeset(path) method, in > order to get the full list of changes if path is empty or the > "restricted changeset" if not, which corresponds to this revision.
Yes, that's great!
> Then, following-up to what Thomas wrote, there should be some revision > navigation methods for this class as well: perhaps children() / > parents() revisions, as a generalization of the linear next_revision / > previous_revision we have now (see #1492).
Getting the parent trees is done already. I'm not sure it would be possible to get the children though. A branch wouldn't know about its children unless some of them are merged back in. I'll have to check that with Aaron.
> Likewise, the Repository would have to extended in order to give the > revision "roots" (instead of oldest_revision) and the revision "heads" > (instead of the youngest_revision).
You are generally right but... I don't see much sense of having this youngest_rev altogether. I mean, it is used to find out whether the revision you have is the oldest one
if rev == youngest_rev:
We'd better use
if rev.next_rev:
or something like that. I think we should refactor youngest_rev away. What do others think?
On Tuesday 30 January 2007 01:56, Peter Dimov wrote:
> Standalone branches have their own repository. Bazaar also has the > concept of a shared repository where multiple branches share the same > revision store. With a shared repository when you branch within the > repository you don't copy its revisions. You just refer to the ones in > the shared store. If you branch to a standalone branch you need to copy > all revisions.
We might have a problem using different terms for different things here.
A subversion repository in fact only represents one real branch (in bzr terms). Subversion 'branches' are only a convention, they are not first class objects. There is nothing in the repository representing the concept of a branch (or a tag). As trac was and is tightly connected to subversion, it currently also supports only one branch.
As such, there is per se no problem in supporting bzr and bzr revisions, as long as you use only one bzr branch.
Now, if trac had support for using multiple repositories, it could also support multiple bzr-branches. Whether these bzr-branches are standalone or compressed together in a shared repository, is imho not really relevant to trac, it's only an implementation detail.
> > A more severe problem for vc backends other than svn is imho the fact > > that the vc api currently only supports a linear history, which means > > that (from Trac's point of view) each revision can only have a single > > ancestor. This is not the case e.g. for Monotone or Mercurial, and maybe > > others.
> bzr keeps a linear revision history. Thus revious_rev would always work. > When you are at revision 50 and you merged another branch in, you > commit revision 51. The previous revision of 51 is 50 but if you look at > the changeset 51 you'll see it has two parent trees. You can browse to > the other one from there.
What you describe is not a linear history. Revision 51 on that branch has in fact two parent revisions, 50 on that branch and 42 (or whatever) on the other branch.
Wouldn't it be nice, if trac could show a combined history of all revisions that led to the one you are looking at?
Those problems can all be solved by having trac support (a) multiple repositories and (b) multiple parents per revision.
Thomas Moschny wrote: > A subversion repository in fact only represents one real branch (in bzr > terms). Subversion 'branches' are only a convention, they are not first class > objects. There is nothing in the repository representing the concept of a > branch (or a tag). As trac was and is tightly connected to subversion, it > currently also supports only one branch.
> As such, there is per se no problem in supporting bzr and bzr revisions, as > long as you use only one bzr branch.
A lot is done with branches in bzr and having only a single branch in trac not at all feasible. Below are some example directory structures, I hope they help clear terminology differences:
/ bzr shared repository trunk bzr branch (the official one) releases normal dir (unversioned) x.y.z bzr branch ... developers normal dir (unversioned) john either normal dir or a (nested!) bzr shared repository john-dev branch
The above does work with trac+bzr, so trac+bzr does support multiple repositories. Another scenario that would work with trac+bzr:
/ normal dir (unversioned) project1 bzr shared repository trunk bzr branch ... project2 bzr shared repositroy trunk bzr branch ... developers normal dir (unversioned) john shared repository john-dev branch
> Now, if trac had support for using multiple repositories, it could also > support multiple bzr-branches. Whether these bzr-branches are standalone or > compressed together in a shared repository, is imho not really relevant to > trac, it's only an implementation detail.
IMHO implementing multiple repositories is the sole responsibility of the vcs backend. As you see trac+bzr provides this functionality. All we need to do in Trac is provide both the path and the revno. I'm also quite sure it would be possible to implement the svn backend to support multiple branches just like bzr does.
I guess the following would work with svn:
/ normal dir (unversioned) project1 svn repository trunk svn versioned dir ... project2 svn repository trunk svn versioned dir ...
For that to work the svn backend would need both the revno and the path to know where to forward the request. Actually I can imagine that for project1 and project2 you create two separate SvnRepository objects and have SvnRepositryFacade, which also implements versioncontrol.Repository and delegates calls depending on the path prefix. What do you think?
Of course the additional support we would need from Trac is the ability to have different milestones, versions and tickets for each project. I remember reading a ticket proposing that...
>> bzr keeps a linear revision history. Thus revious_rev would always work. >> When you are at revision 50 and you merged another branch in, you >> commit revision 51. The previous revision of 51 is 50 but if you look at >> the changeset 51 you'll see it has two parent trees. You can browse to >> the other one from there.
> What you describe is not a linear history. Revision 51 on that branch has in > fact two parent revisions, 50 on that branch and 42 (or whatever) on the > other branch.
Yes, you are right! It is not really linear but each revision you commit in the branch goes in the revision history (which is just a list) and it gets assigned the next revno. If we are at 50 and we merge and commit we'll be at 51. So from the timeline point of view the previous rev would be 50. Generally, you are right, 51 has two parent revisions: 50 and 44.2.3. Now, 44.2.3 is also a revision from the *same* branch. Once you pull the new revisions from the other branch you assign them new dotted revnos (but don't ask me how exactly they are chosen).
BzrChangesets.get_properties() returns not only date, message, author but also the list (with hrefs) to the parent revisions. For the above case you'll see the links to [50/branch] and [44.2.3/branch], so you do have the option to browse both parent revisions.
> Wouldn't it be nice, if trac could show a combined history of all revisions > that led to the one you are looking at?
Yes, definitely! A graphical display would be an especially great thing! Can you provide some references to past discussions?
> Those problems can all be solved by having trac support (a) multiple > repositories and (b) multiple parents per revision.
I just checked #2086 and I agree that revision should have a member either base_path, or branch, or repo, or whatever. Each backend could decide for itself what it needs. Then multiple branches/repositores/etc. could completely be implemented in the vcs backend.
> Wouldn't it be nice, if trac could show a combined history of all revisions > that led to the one you are looking at?
> Those problems can all be solved by having trac support (a) multiple > repositories and (b) multiple parents per revision.
Btw, once we have support for that, we could write a pseudo-svn backend that yields a view to the subversion repository that matches the view the users have, e.g. by mapping the branches-by-convention to real branches.
IIUC, something similar is currently done /within/ the RevTree Pluginm [1], but in fact it would be nice to shift that functionality into a separate module and make the RevTree Plugin work on the (yet to be defined) generalized vc api.
> IIUC, something similar is currently done /within/ the RevTree Pluginm [1], > but in fact it would be nice to shift that functionality into a separate > module and make the RevTree Plugin work on the (yet to be defined) > generalized vc api. > [1] http://www.trac-hacks.org/wiki/RevtreePlugin
On Tuesday 30 January 2007 13:25, Peter Dimov wrote:
> IMHO implementing multiple repositories is the sole responsibility of > the vcs backend
Imho this is the wrong way to go.
By simply introducing a virtual toplevel directory and virtual directories for each branch/repository, one loses important information (besides the problem that there can't be nameless branches). A branch/repo simply is not identical to a toplevel dir.
All source: wiki links for example have to carry that faked toplevel dir, but for anyone not knowing the setup of that trac, this remains unclear.
Instead the repository should be specified by a mapable prefix similar to the way the InterTrac links work (e.g. separated by a colon, as in "source:my-branch:README@123"). This way, the source links stil carry the branch/repo information, but now in an explicit way. If there's a mapping between these prefix and the branches/repos somewhere, one can even move/rename repositories, extract sub-projects, and the like. (We want to have that going hand-in-hand with multiple projects support, don't we?)