How important is it to have revisions in the diffs that are uploaded?

9 views
Skip to first unread message

Elliot Murphy

unread,
Jun 19, 2007, 12:00:07 AM6/19/07
to revie...@googlegroups.com
Looking at the diffs that are generated by Bazaar, I'm noticing that
they don't include revision numbers, like the subversion diffs do.
Instead, they include timestamps.

How important is it to reviewboard to have revisions for a patch? Is it
useful to just use the latest version of the file referenced by the patch?

Alternatively, bazaar has these things called bundles, which look like a
patch at the top, but then have some metadata bundled in the bottom
where I can extract the parent revision id from, so I could require that
people uploading bazaar patches always submit bundles rather than plain
diffs.

I see both sides of this - it doesn't make much sense to review/approve
a patch that is invalid because development has moved on...except when
it does because development is moving very fast and someone just needs a
review on the general concept of what they are working on, and so then
it would be better to do the review based on the revision the patch was
generated against.

What does everyone else think? What are other SCMtool authors doing?

--
Elliot Murphy | https://launchpad.net/~emurphy/

Christian Hammond

unread,
Jun 19, 2007, 12:03:10 AM6/19/07
to revie...@googlegroups.com
It's absolutely essential. Even if you wanted to force users to keep a diff up-to-date, it doesn't make sense after it's reviewed. The patch wouldn't be able to apply and so old review requests would generate errors and be unreadable. It also would generate a tremendous amount of frustration for contributors. Imagine uploading a new diff only to find that someone updated the file between the time you generated the diff and posted the review request. Developers would have to keep asking users to re-upload a new diff and users would eventually get pissed off and stop using Review Board.

Christian
--
Christian Hammond - chi...@chipx86.com
VMware, Inc.

Elliot Murphy

unread,
Jun 19, 2007, 12:25:48 AM6/19/07
to revie...@googlegroups.com
I agree with what you are wanting to accomplish. I'm still wondering if
having the revisions embedded in the diff file itself is going to work
for various SCMs. I've seen a comment about the Git SCMTool in progress
that using revisions is clunky, and looking at the mercurial diffs I
don't see revisions listed there either (although it may be possible to
get some kind of unique identifier).

For bazaar, I can definitely get the exact information about what a
bundle was made against, but it feels funny to try and teach
diffviewer.parser about diff formats for different SCM tools.

What do you think about extending the SCMTool API so that a file is
uploaded, and then handed off to the SCMTool to parse out the list of
files and revisions?

Certainly the diff format is standardized enough that all these tools
generate files that will be applied by the patch command, but the extra
data (like revisions) embedded in those diffs are not standardized at all.

Another, perhaps orthogonal thing that could be done is to cache the
files, rather than re-requesting them every time a diff was viewed. This
might be a good thing to do regardless.

On 06/19/2007 12:03 AM, Christian Hammond wrote:
> It's absolutely essential. Even if you wanted to force users to keep a diff
> up-to-date, it doesn't make sense after it's reviewed. The patch wouldn't be
> able to apply and so old review requests would generate errors and be
> unreadable. It also would generate a tremendous amount of frustration for
> contributors. Imagine uploading a new diff only to find that someone updated
> the file between the time you generated the diff and posted the review
> request. Developers would have to keep asking users to re-upload a new diff
> and users would eventually get pissed off and stop using Review Board.

--
Elliot Murphy | https://launchpad.net/~emurphy/

Christian Hammond

unread,
Jun 19, 2007, 1:58:05 AM6/19/07
to revie...@googlegroups.com
CodeCollab from Smart Bear Software has to face this problem too. They provide tools for submitting diffs and have special versions of those for SCMs whose tools don't generate usable diffs. We may have to go a similar route and focus more heavily on post-review (which I want to do anyway) and some kind of a special differ tool (or that could just stay in post-review with a --show-diff).

I'm fine with extending te SCMTool API to allow them to better parse parts of diffs, but we need to figure out how best to do that. It still won't cover every SCM tool, but if we can reduce the need for special tools when possible, that's great.

As for caching, we already do this. Three times, actually. We cache the files coming from the repository, we cache the patched version, and we cache the list of changes for display in the diff. The repository should only ever be contacted the first time the diff is viewed (provided it hasn't fallen out of the cache since), and the diff should only be generated the first time (again, as long as it's in the cache).

Christian


On 6/18/07, Elliot Murphy <elliot...@gmail.com> wrote:

Nick Loeve

unread,
Jun 19, 2007, 3:33:44 AM6/19/07
to reviewboard
Hi there,

On Jun 19, 6:25 am, Elliot Murphy <elliot.mur...@gmail.com> wrote:
> I agree with what you are wanting to accomplish. I'm still wondering if
> having the revisions embedded in the diff file itself is going to work
> for various SCMs. I've seen a comment about the Git SCMTool in progress
> that using revisions is clunky,

It was more clunky in that Git does not use the term revision. My
patch for the Git SCMTool instead scans for the object ids, which are
the unique identifiers of content at commit time in Git.

But yeah, i had to change the diffparser to grab these from the diff
file. It would be good to able to hand off diff parsing to the
SCMTool, but not sure of the best way to do that. I originally had the
patch allowing the SMCTool to swap in a different diffparser.parseFile
function, but thats just going to lead to code duplication, as alot of
the code was still the same.

dsve...@gmail.com

unread,
Jun 19, 2007, 7:15:08 AM6/19/07
to reviewboard
On 19 Juni, 07:58, "Christian Hammond" <chip...@chipx86.com> wrote:
> I'm fine with extending te SCMTool API to allow them to better parse parts
> of diffs, but we need to figure out how best to do that. It still won't
> cover every SCM tool, but if we can reduce the need for special tools when
> possible, that's great.
Ops, managed to press the wrong button and answer the author instead
of the list.

Anyway...

I beg to differ. It would support every SCM tool that can be used with
ReviewBoard if each tool parsed the diff produced by that scm tool.
This would probably only apply to headers, so it's just about filling
the "patch" with metadata, such as files and revisions.

It's also pretty common that people use the regular plain old diff
tool to submit patches to projects. Sure, projects could enforce that
only correct patches are accepted, but that might piss some people
off, so it would be nice if patches without revision was supported.
The support for this would be simple once "metadata" is produced out
in the SCMTools as if the tool can't find the revision, it fetches the
head revision and associates that revision with the patch.

--
Daniel Svensson

Robin Ericsson

unread,
Jun 19, 2007, 7:21:09 AM6/19/07
to revie...@googlegroups.com
On 6/19/07, dsve...@gmail.com <dsve...@gmail.com> wrote:
> I beg to differ. It would support every SCM tool that can be used with
> ReviewBoard if each tool parsed the diff produced by that scm tool.
> This would probably only apply to headers, so it's just about filling
> the "patch" with metadata, such as files and revisions.

I second that, this is the problem I'm having with CVS where not all
the information in the patch is provided via the regular parse. I've
added an extra parse method to the scmtool in my patch, but a meta way
of parsing would be a lot better.

--
regards,
Robin

Al

unread,
Jun 19, 2007, 9:39:59 AM6/19/07
to reviewboard
Isn't this revision issue a general symptom of trying to integrate in
a distributed SCM? SVN and P4 are both centralized, but Bazaar and Git
are both distributed. With the centralized SCM you get a common point
of reference and handy concepts like "pending changes", but with most
distributed SCM you only get unique identifiers and some lineage
information. Every local change is already committed.

To make a distributed SCM work you need an external repository that
the reviews are against so that you can meaningfully talk about "what
has changed". For most of these systems the interaction between
versions is more complicated than a simple revision number, you would
also need to make sure that the external repository is always merged
(some DSCM let you have multiple heads e.g. Mercurial).

Christian Hammond

unread,
Jun 19, 2007, 4:39:17 PM6/19/07
to revie...@googlegroups.com
The reason I said it won't work with every SCM Tool is because not every SCM tool's diff output contains revision information. It simply can't work without that information. Because of this, a special tool would need to be provided in order to insert this information when creating the diff.

Fetching the head revision is only a guess, and it may be a bad one. Far too often, you'll get conflicts, requiring a rebuild of the diff (and in busy projects, this will be like playing cat and mouse as you try to get a valid diff generated and uploaded in time for it to work). It's conceivable that you may even reach points where it accepts the diff but due to some changes, the diff appears to be rearranging code in weird ways, adding lines that shouldn't be there or deleting lines that now should. Of course, in most cases you'll just get a conflict.

Now maybe we can indeed support a lot of these revisionless diffs. One way is to use the date associated with the source file (they should at least provide a date if they don't provide a revision) and query the tool for the file last modified around that date, but I don't know if all tools do this and whether you'd get accurate results.

I agree that it would be nice to support all diffs without special tools, but I just don't know that it's possible. Smart Bear Software has been doing this kind of stuff for a long time, and I don't think they've even worked out a good solution that works for all tools.

Christian

Christian Hammond

unread,
Jun 19, 2007, 4:41:25 PM6/19/07
to revie...@googlegroups.com
Sure, I completely agree with this. I'm thinking more in terms of how we handle SCMs whose tools generate diffs without the information we need.

Christian

Christian Hammond

unread,
Jun 19, 2007, 4:43:24 PM6/19/07
to revie...@googlegroups.com
One thing we talked about doing for distributed SCMs is to allow the source files to be uploaded along with the diffs so that you communication with the distributed servers isn't needed. Of course, this is problematic in the web UI, but could be simplified greatly with a standardized tool. This would also help for people using tools like quilt to maintain patchsets, as they could upload patches against their patched versions even on Perforce or SVN.

Christian

chi...@chipx86.com
VMware, Inc.

Elliot Murphy

unread,
Jun 19, 2007, 4:59:04 PM6/19/07
to revie...@googlegroups.com
On 06/19/2007 04:43 PM, Christian Hammond wrote:
> One thing we talked about doing for distributed SCMs is to allow the source
> files to be uploaded along with the diffs so that you communication with the
> distributed servers isn't needed. Of course, this is problematic in the web
> UI, but could be simplified greatly with a standardized tool. This would
> also help for people using tools like quilt to maintain patchsets, as they
> could upload patches against their patched versions even on Perforce or SVN.

For bazaar, communication with a server isn't a problem, most projects
still maintain a central "main" repo that releases are done out of. I
believe this also holds true for mercurial and git (with the linux
kernel being a possible exception).

Still, this idea sounds like it has a *lot* of potential. You can easily
get the original version of the file even with SVN, and not needing to
have access to the SCM repository on the web server sounds like a good
thing.

What would something like this look like? Upload a patch file along with
a tarball, then extract the tarball into a temp area and use the
LocalFileTool to get the original files? If we did this, would the
various SCMTool classes still be needed on the reviewboard server?

Christian Hammond

unread,
Jun 19, 2007, 5:23:50 PM6/19/07
to revie...@googlegroups.com
I should make it clear that the revision method is still going to be the primary method. We're not replacing it or requiring that files be uploaded, as that would just require more storage. Right now, due to the caching and revisions, several people can do a diff against a certain file and it will only be stored once. If we require that everyone uploads files, we'll lose that benefit.

I don't know what the implementation will be like, but I don't think we'll have any web UI exposed. It'll all be done by the tool, depending on the SCM.

Christian


Robin Ericsson

unread,
Jun 19, 2007, 5:30:53 PM6/19/07
to revie...@googlegroups.com
On 6/19/07, Christian Hammond <chi...@chipx86.com> wrote:
> Sure, I completely agree with this. I'm thinking more in terms of how we
> handle SCMs whose tools generate diffs without the information we need.

Should I proceed with my current extra method for the moment? I can't
doing anything this week anyway as my development laptop is on
vacation with my wife :-)

--
regards,
Robin

Christian Hammond

unread,
Jun 19, 2007, 5:45:13 PM6/19/07
to revie...@googlegroups.com
Sure.

So I think this thread has forked into two topics and there's confusion between the two.

1) What do we do about diffs that just don't have revision information.

2) What do we do about diffs that have revision information but requires custom parsing.

My comment you replied to is about #1, and your method is for #2, right?

Christian


On 6/19/07, Robin Ericsson < lob...@gmail.com> wrote:

Elliot Murphy

unread,
Jun 19, 2007, 6:11:46 PM6/19/07
to revie...@googlegroups.com
On 06/19/2007 05:23 PM, Christian Hammond wrote:
> I should make it clear that the revision method is still going to be the
> primary method. We're not replacing it or requiring that files be uploaded,
> as that would just require more storage.

At least not in your version of the project :-)

Seriously though, I still think this is worth experimenting with in
order to support distributed SCMs much better than the revision
approach, and review board already appears to have the concept of
exposing different things in the web ui based on which SCMtool is
configured. I'm not advocating throwing away the support for SVN and P4
that is currently working well, just trying to talk through some of the
design tradeoffs that I see rather than just going off and working on it
in isolation.

I'm curious, do you think the storage impact of requiring file uploads
would really be significant? If it is a big deal I could put together a
single-instance-store cache for the files really easily that should
address most of the storage overhead.

Another benefit of uploading the files along with the patch rather than
having the server pull the files from a repository that is set in the
reviewboard configuration is that it adapts to handling reviews on
multiple branches much more easily - say you have A, B, C branches of a
code base. If the server has to pull the original file from the SCM
repository, then you have to have an administrator configure each of
those repositories, ensure access, etc. If you upload the files along
with the patch, you can skip all that admin overhead.

With subversion, you probably don't get this overhead because svn stores
the branches within the same repository, so you specify a different
"base path" for the diff in order to indicate that your patch is
relative to a branch, to trunk, etc. Does perforce do branches the same
way as svn?

And thanks again for putting review board out there as open source for
us to use, hack on, subvert, play with, etc. It's really great to have
this tool available.

Robin Ericsson

unread,
Jun 19, 2007, 6:13:13 PM6/19/07
to revie...@googlegroups.com
On 6/19/07, Christian Hammond <chi...@chipx86.com> wrote:
> 1) What do we do about diffs that just don't have revision information.
>
> 2) What do we do about diffs that have revision information but requires
> custom parsing.
>
> My comment you replied to is about #1, and your method is for #2, right?

Yes, sorry about busting out of the discussion :)

--
regards,
Robin

Christian Hammond

unread,
Jun 19, 2007, 6:34:54 PM6/19/07
to revie...@googlegroups.com
Yep, that's what I'm saying :) We'll be using this new method for the distributed SCMs. The revision method just won't work for many common use cases with those. We could potentially have some heuristic in the post-review script for using the revision method when it makes sense, though, and then just uploading the files the rest of the time.

The branch issue is non-existent with SVN and Perforce. I'm not really sure which SCM it would have serious problems with. In most, if you have access to HEAD, you have access to branches, and it's just a matter of passing flags or forming the right path in the repository.

Christian


On 6/19/07, Elliot Murphy <elliot...@gmail.com> wrote:



--
Reply all
Reply to author
Forward
0 new messages