git and gerrit

Jason Grout

unread,

Feb 16, 2012, 10:06:49 AM2/16/12

to sage-...@googlegroups.com

Kini is working on an SEP for moving to git for revision control. What
experience/feelings do people have for gerrit [1]?

Gerrit is used for Android and Eclipse, for example.

Thanks,

Jason

[1] http://code.google.com/p/gerrit/

Talk on gerrit: http://vimeo.com/23609339
Short Tutorial on gerrit/jenkins: http://vimeo.com/20084957
An old post about gerrit:
http://unethicalblogger.com/2009/12/07/code-review-with-gerrit-a-mostly-visual-guide.html

Keshav Kini

unread,

Feb 16, 2012, 11:31:30 AM2/16/12

to sage-...@googlegroups.com

Jason Grout <jason...@creativetrax.com> writes::

> Kini is working on an SEP for moving to git for revision control.
> What experience/feelings do people have for gerrit [1]?

After looking into it a bit, I've found some things that I really don't
like, for example the following admonition from `a wiki page about
gerrit workflows`_::

> If the master repository has changed since you started, you should
> rebase your changes to the current state. And if you have made many
> small commits, you should squash them so that they do not show up in
> the public repository. Remember: each commit will become a change in
> Gerrit, and need to be approved separately. If you are making one
> "change" to the project, squash your many "checkpoint" commits into
> one commit for public consumption.

This strikes me as very bad. Rebasing, OK, though personally I think
merging is usually a better strategy, especially since it preserves
history more nicely. Squashing commits is definitely a bad idea, though,
as it makes future rebasing harder when you have large single commits
instead of many small commits, which is in fact one of the problems with
our current patch-based system.

.. _a wiki page about gerrit workflows:
http://wiki.openstack.org/GerritWorkflow#Long-lived_Topic_Branches

-Keshav

----
Join us in #sagemath on irc.freenode.net !

Christopher Swenson

unread,

Feb 16, 2012, 11:57:02 AM2/16/12

to sage-...@googlegroups.com

Reviewed changes should be very small so that they are easy to verify and review. Also, most changes are bugfixes or minor feature adds, so it should not create large, hard to merge commits.

If you have such a large change, you should probably be splitting into multiple reviews anyway.

I know the authors of gerrit, and would be willing to ask them a few questions if people had specific concerns.

--Christopher

--
To post to this group, send an email to sage-...@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Jason Grout

unread,

Feb 16, 2012, 12:19:53 PM2/16/12

to sage-...@googlegroups.com

On 2/16/12 10:57 AM, Christopher Swenson wrote:
> Reviewed changes should be very small so that they are easy to verify
> and review. Also, most changes are bugfixes or minor feature adds, so it
> should not create large, hard to merge commits.
>
> If you have such a large change, you should probably be splitting into
> multiple reviews anyway.
>
> I know the authors of gerrit, and would be willing to ask them a few
> questions if people had specific concerns.

However, if I have a "topic" I'd like to commit, and all the commits
really should be reviewed as a package together, or if I'm collaborating
with others on a branch, then it is incredibly convenient to look at
patchsets at a topic-branch level (ala github) rather than the
individual commit level, like gerrit seems to be insisting. It seems
that the only way to get a topic-level view of things in gerrit is to
insist that you squash all the commits into one commit, which is usually
bad, in my opinion.

It is nice that gerrit preserves the history of each individual commit
that is reviewed. Can this history be pushed or mirrored somewhere else
(e.g., can I clone the gerrit repository and get these fake
ref/for/master branches?)

Jason

Christopher Swenson

unread,

Feb 21, 2012, 10:26:16 AM2/21/12

to sage-...@googlegroups.com

Jason,

I forwarded this thread to Shawn Pearce, the primary author of gerrit, and have included his response below.

--Christopher

---------- Forwarded message ----------
From: Shawn Pearce <s...@google.com>
Date: Tue, Feb 21, 2012 at 15:16
Subject: Re: [sage-devel] Re: git and gerrit
To: Christopher Swenson <ch...@caswenson.com>
Cc: Christopher Swenson <cswe...@google.com>

Inline below, feel free to forward this.

> ---------- Forwarded message ----------
> From: Jason Grout <jason...@creativetrax.com>
> Date: Thu, Feb 16, 2012 at 09:19
> Subject: [sage-devel] Re: git and gerrit
> To: sage-...@googlegroups.com
>
>
> However, if I have a "topic" I'd like to commit, and all the commits really
> should be reviewed as a package together, or if I'm collaborating with
> others on a branch, then it is incredibly convenient to look at patchsets at
> a topic-branch level (ala github) rather than the individual commit level,
> like gerrit seems to be insisting.

Currently the topic branch author can include a topic name when
pushing their changes to Gerrit, e.g. "git push review
work:refs/for/master/my-topic" will upload the changes for the master
branch, with the topic label of my-topic. Reviews can click on the
"my-topic" label, or use a search query like "status:open
topic:my-topic" to find open changes with this label and review them.

Kitware has been working on topic branch support, where a topic branch
is a more fundamental concept in Gerrit that can be reviewed and
submitted as a unit. Unfortunately this work is still in progress and
has not made it to a release version yet.

> It seems that the only way to get a
> topic-level view of things in gerrit is to insist that you squash all the
> commits into one commit, which is usually bad, in my opinion.

If the commits are going to be preserved in history, they should be
reviewed individually to ensure each commit is consistent on its own.
If it isn't, it shouldn't be in the history, and should therefore be
squashed together with something else, or removed altogether.
Reviewing the topic branch as a diff of its tip commit against the
common ancestor doesn't really provide a way to review each commit. So
it hasn't really been supported by Gerrit.

If you really need the overall diff in order to get a sense of what
the change is, you can fetch the tip commit from Gerrit and then run
`git diff` on your workstation to glance over the entire diff. Yes you
lose the ability to comment on individual chunks of code. But we (the
Gerrit community) stand behind the notion that if a commit is
worthwhile to keep forever as part of the project history, then it is
worthwhile to review on its own, and it must stand on its own to
justify being included in the project. Thus, this topic branch overall
diff isn't as useful as it sounds.

> It is nice that gerrit preserves the history of each individual commit that
> is reviewed. Can this history be pushed or mirrored somewhere else (e.g.,
> can I clone the gerrit repository and get these fake ref/for/master
> branches?)

Yes. Each change is given its own refs/changes/... branch name in the
Gerrit repository. These can be mirrored elsewhere using stock Git
tools, or the replication feature that is built into Gerrit. Users can
see what branch name was assigned by looking at a patch set on the
change page in the web UI. For each patch set Gerrit generates a
number of Git commands that can be used to fetch the change using this
unique branch name, and then perform common actions against it (e.g.
checkout, pull, cherry-pick, diff).

Jason Grout

unread,

Feb 21, 2012, 10:38:13 AM2/21/12

to sage-...@googlegroups.com

On 2/21/12 9:26 AM, Christopher Swenson wrote:
> Jason,
>
> I forwarded this thread to Shawn Pearce, the primary author of gerrit,
> and have included his response below.

Cool! Thanks!

I'll have to think more about whether I agree with the "each commit
should stand on its own" philosophy mentioned below. It certainly makes
bisection easier. But sometimes huge patches make me think that each
*branch* should stand on its own, and be merged with no fast-forward.
Then it's much easier to tease apart logically separate, but
interrelated, sets of changes. Maybe all that is going on here is
miscommunication about what "stand on its own" means. Does it mean that
the software will run after each commit? Or does it mean that each
commit represents a single logical change, though it may temporarily put
the software into an unreliable or error-ridden state?

Jason

Christopher Swenson

unread,

Feb 21, 2012, 10:43:52 AM2/21/12

to sage-...@googlegroups.com

Well, I believe that gerrit was designed with the Google philosophy in mind, so I would assume that this would mean that the software must run after each commit that is mailed out. (This is sort of equivalent to the way patches work now, as they cannot be committed unless all tests pass.)

There are some exceptions, naturally: when large, new sections of code are being developed, it is not unheard of to accept commits that don't actually work.

--Christopher

Jason

--
To post to this group, send an email to sage-...@googlegroups.com

To unsubscribe from this group, send an email to sage-devel+unsubscribe@googlegroups.com

Jason Grout

unread,

Feb 21, 2012, 10:48:24 AM2/21/12

to sage-...@googlegroups.com

On 2/21/12 9:43 AM, Christopher Swenson wrote:
> Well, I believe that gerrit was designed with the Google philosophy in
> mind, so I would assume that this would mean that the software must run
> after each commit that is mailed out. (This is sort of equivalent to
> the way patches work now, as they cannot be committed unless all tests
> pass.)
>
> There are some exceptions, naturally: when large, new sections of code
> are being developed, it is not unheard of to accept commits that don't
> actually work.

Thanks. That sheds light on the subject.

Jason

Robert Bradshaw

unread,

Feb 21, 2012, 1:19:01 PM2/21/12

to sage-...@googlegroups.com

On Tue, Feb 21, 2012 at 7:43 AM, Christopher Swenson
<ch...@caswenson.com> wrote:
> Well, I believe that gerrit was designed with the Google philosophy in mind,
> so I would assume that this would mean that the software must run after each
> commit that is mailed out. (This is sort of equivalent to the way patches
> work now, as they cannot be committed unless all tests pass.)
>
> There are some exceptions, naturally: when large, new sections of code are
> being developed, it is not unheard of to accept commits that don't actually
> work.

It gets a bit fuzzier when there are multiple authors on a single
feature, spread out through time. I think this is more common for a
project like Sage (lots of very-part-time contributors). It can also
be handy to spread out history into smaller or logical chunks (e.g.
big automatic renames/file moves, and then manual fixups, where the
former might not give a consistant state but is easy to review because
it's just the result of a sed script). I do, however, like the model
of iteratively working on a feature, including incorporating peer
feedback and fixes, adn then squashing the history down into a single
coherent commit at the end. Perhaps we could require that every state
of the main line be consistant, and if it's worthwhile to preserve
inconsistent history we merge it as a branch rather than rebasing.

- Robert

Jason Grout

unread,

Feb 21, 2012, 1:24:03 PM2/21/12

to sage-...@googlegroups.com

On 2/21/12 12:19 PM, Robert Bradshaw wrote:
> Perhaps we could require that every state
> of the main line be consistant, and if it's worthwhile to preserve
> inconsistent history we merge it as a branch rather than rebasing.

That seems very reasonable to me.

Thanks,

Jason

Fernando Perez

unread,

Feb 21, 2012, 1:52:42 PM2/21/12

to sage-...@googlegroups.com

On Tue, Feb 21, 2012 at 7:38 AM, Jason Grout
<jason...@creativetrax.com> wrote:
> I'll have to think more about whether I agree with the "each commit should
> stand on its own" philosophy mentioned below. It certainly makes bisection
> easier. But sometimes huge patches make me think that each *branch* should
> stand on its own, and be merged with no fast-forward. Then it's much easier
> to tease apart logically separate, but interrelated, sets of changes.

Many thanks for this super useful discussion! From my own experience
with github's branch-based review model, I lean strongly towards the
view you summarize above. I prefer to let authors make very liberal
use of the commit abilities in git and review with the branch as the
'atomic unit' I worry about, giving only secondary consideration to
individual commits.

Obviously if the commit history is a complete mess we may require that
it gets rebased and cleaned up, but in general we tend to be
relatively lax with how the individual commits stack up, as long as
the entire branch diff is sensible and easy to analyze on its own.

From our own experience in ipython with github and how unbelievably
fluid this has made the process both for new contributors and for
reviewers, I think I really prefer this over the view that gerrit
seems to propose. But it's been very enlightening to read their
rationale, and I certainly see it as a coherent perspective one can
choose to adopt.

Cheers,

f

Christopher Swenson

unread,

Feb 21, 2012, 2:03:10 PM2/21/12

to sage-...@googlegroups.com

I agree with a lot of what was just said.

My only possible issue with the github workflow: I'm not sure how it interacts with having multiple people who have control of the "master" (central) repo. When a pull request comes in, can anyone who has push access to the repo take control of that pull request?

Also, in my vastly finite experience with github, I've had problems with the fact that pull requests seem to be immutable: once a request has been created, it doesn't seem to be easy to add more commits to it, or change the commits. (This may just be my naïvety.)

But other than, I think github does have a great model. Is github on the table for a future platform to move to?

I think gerrit is definitely a heavy-handed approach to reviews that seems to work better with larger products (like Android), since you do sort of have to adapt to its style of reviews. And I don't know if it is the right tool for Sage (not sure if anything is), but it is certainly worth looking at.

--Christopher

f

Jason Grout

unread,

Feb 21, 2012, 2:16:04 PM2/21/12

to sage-...@googlegroups.com

On 2/21/12 1:03 PM, Christopher Swenson wrote:
> I agree with a lot of what was just said.
>
> My only possible issue with the github workflow: I'm not sure how it
> interacts with having multiple people who have control of the "master"
> (central) repo. When a pull request comes in, can anyone who has push
> access to the repo take control of that pull request?

Yes. A pull request is just a remote branch. Anyone with commit access
to the repository can merge the remote branch and push to the github
repository.

>
> Also, in my vastly finite experience with github, I've had problems with
> the fact that pull requests seem to be immutable: once a request has
> been created, it doesn't seem to be easy to add more commits to it, or

> change the commits. (This may just be my na�vety.)

Again, a pull request is just a remote branch. If I have submitted a
pull request, that just means that I've submitted a certain branch for
merging. If I push more commits to that branch, they automatically show
up in the pull request. In fact, I can rebase the branch and push -f
the branch and the new commits will overwrite whatever was there.

One thing I like about github is that it is so close to the underlying
git philosophy, so that once you understand git, it's easy to understand
github.

> But other than, I think github does have a great model. Is github on the
> table for a future platform to move to?

Yes. We've already moved the Sage notebook and the Sage Cell server to
github.

>
> I think gerrit is definitely a heavy-handed approach to reviews that
> seems to work better with larger products (like Android), since you do
> sort of have to adapt to its style of reviews. And I don't know if it
> is the right tool for Sage (not sure if anything is), but it is
> certainly worth looking at.
>

Thanks again for your help with understanding gerrit better!

Jason

Fernando Perez

unread,

Feb 21, 2012, 2:17:46 PM2/21/12

to sage-...@googlegroups.com

On Tue, Feb 21, 2012 at 11:03 AM, Christopher Swenson
<ch...@caswenson.com> wrote:
> My only possible issue with the github workflow: I'm not sure how it
> interacts with having multiple people who have control of the "master"
> (central) repo. When a pull request comes in, can anyone who has push access
> to the repo take control of that pull request?

Yes. Typically one or more core committers participate in the
discussion, and it's easy to informally choose who does the final
merge. But if you want to be more formal about it, there's an
assignee field and a project can choose to use it explicitly and
require that person to be the one who does the actual final merge.

> Also, in my vastly finite experience with github, I've had problems with the
> fact that pull requests seem to be immutable: once a request has been
> created, it doesn't seem to be easy to add more commits to it, or change the
> commits. (This may just be my naïvety.)

No, you can keep pushing to the branch you created the PR from, and
new commits show up as they are made. You can even rebase and force
push, and the PR will get rebased too. We do the first all the time,
and the second also, though less frequently.

Cheers,

f

Jason Grout

unread,

Feb 21, 2012, 2:39:51 PM2/21/12

to sage-...@googlegroups.com

On 2/21/12 1:17 PM, Fernando Perez wrote:
> Yes. Typically one or more core committers participate in the
> discussion, and it's easy to informally choose who does the final
> merge. But if you want to be more formal about it, there's an
> assignee field and a project can choose to use it explicitly and
> require that person to be the one who does the actual final merge.

Interesting. Where is this assignee field? Is it the assignee field in
the Issues? Does setting that require that the committer be the assignee?

Thanks,

Jason

Robert Bradshaw

unread,

Feb 21, 2012, 3:32:47 PM2/21/12

to sage-...@googlegroups.com

On Tue, Feb 21, 2012 at 11:17 AM, Fernando Perez <fpere...@gmail.com> wrote:
> On Tue, Feb 21, 2012 at 11:03 AM, Christopher Swenson
> <ch...@caswenson.com> wrote:
>> My only possible issue with the github workflow: I'm not sure how it
>> interacts with having multiple people who have control of the "master"
>> (central) repo. When a pull request comes in, can anyone who has push access
>> to the repo take control of that pull request?
>
> Yes. Typically one or more core committers participate in the
> discussion, and it's easy to informally choose who does the final
> merge. But if you want to be more formal about it, there's an
> assignee field and a project can choose to use it explicitly and
> require that person to be the one who does the actual final merge.

I think we'd still keep our code review process, and the release
manager would to the actual merge, but it would make submitting,
pulling, revewing, collaborating on, and merging patches much easier.

>> Also, in my vastly finite experience with github, I've had problems with the
>> fact that pull requests seem to be immutable: once a request has been
>> created, it doesn't seem to be easy to add more commits to it, or change the
>> commits. (This may just be my naïvety.)
>
> No, you can keep pushing to the branch you created the PR from, and
> new commits show up as they are made. You can even rebase and force
> push, and the PR will get rebased too. We do the first all the time,
> and the second also, though less frequently.

What happens to (inline) comments in this case?

Fernando Perez

unread,

Feb 21, 2012, 3:56:50 PM2/21/12

to sage-...@googlegroups.com

On Tue, Feb 21, 2012 at 12:32 PM, Robert Bradshaw
<robe...@math.washington.edu> wrote:
>
>> No, you can keep pushing to the branch you created the PR from, and
>> new commits show up as they are made. You can even rebase and force
>> push, and the PR will get rebased too. We do the first all the time,
>> and the second also, though less frequently.
>
> What happens to (inline) comments in this case?

I think they get preserved as static snippets in the discussion page,
but I'm not 100% sure. I know we have PRs that have gone through
this, but I wouldn't know how to find one right now to verify; if
you're really curious we could do a quick experiment with a fake PR to
test what happens, it would not take long.

Cheers,

f

Fernando Perez

unread,

Feb 21, 2012, 3:58:21 PM2/21/12

to sage-...@googlegroups.com

On Tue, Feb 21, 2012 at 11:39 AM, Jason Grout
<jason...@creativetrax.com> wrote:
>
> Interesting. Where is this assignee field? Is it the assignee field in the
> Issues? Does setting that require that the committer be the assignee?

It's in the gray bar below the title for the PR. Anyone in the team
who owns the target repo can be assignee, it's a drop-list that shows
all the committers to the target repo.

Cheers,

f

Keshav Kini

unread,

Feb 21, 2012, 6:23:54 PM2/21/12

to sage-...@googlegroups.com

Jason Grout <jason...@creativetrax.com> writes:

+1. IMO, merging into master is a more "correct" way of squashing
multiple commits into one commit (the merge commit, from the point of
view of the main trunk), since you still have the original history.

Merging into the main line (the master branch) would be equivalent to
what right now we call getting your patch merged.

Robert Bradshaw

unread,

Mar 2, 2012, 4:06:47 AM3/2/12

to sage-...@googlegroups.com

Though for a single commit, a rebase is cleaner IMHO than a merge, and
squashing commits improves the signal-to-noise ratio of the history
(and it's nice to be able to squash away stuff like typos).

Keshav Kini

unread,

Mar 2, 2012, 9:13:45 AM3/2/12

to sage-...@googlegroups.com

Robert Bradshaw <robe...@math.washington.edu> writes:
> Though for a single commit, a rebase is cleaner IMHO than a merge,

The problem with rebasing is that unless you do it manually, the actual
commits (i.e. snapshots of the codebase) do not necessarily represent code that
anyone has actually looked at. Just because a merge or rebase is
possible to resolve automatically doesn't mean that it the result is
what you intended. With merging, you isolate this badness in the merge
commit, and can commit on top of your merge to fix any incompatibilities
between your original code and the current state of the other branch:

X --------------- Y --- Z
\ \
A --- B --- C --- D --- M --- E

Changes made in A through D do not directly conflict with changes made
in Y and Z, but the changes in Y and Z make the code added in A through
D perform differently from expected. The change in E fixes this. Every
commit here except M represents a state of the codebase that was
considered coherent and correct by whoever authored the commit.

Now in the case of rebasing, you get this:

X --- Y --- Z
\ \
\ A'--- B'--- C'--- D' --- E
\
A - - B - - C - - D <- - (these will disappear)

As before, changes made in A through D do not directly conflict with
changes made in Y and Z, so the user rebases A through D to A' through
D' automatically. But now, none of A' through D' represent states of the
codebase that were considered coherent and correct by the people who
made A through D respectively, even though A' through D' bear the names
of those people in their headers.

Furthermore, once any problems in the interaction between Y or Z and A
through D become apparent, they are recorded in E, but these changes are
already several nodes away from Z, and so appear out of their natural
context.

Worse, A through D will either hang around in other people's
repositories, confusing matters, or, if you were wise and didn't share
commits that you later rebased, they have disappeared and, once garbage
collected, are not recoverable.

I think the above is still true when A through D is actually A through A
(i.e. a single commit).

> and
> squashing commits improves the signal-to-noise ratio of the history
> (and it's nice to be able to squash away stuff like typos).

You can `git commit --amend` away stuff like typos, but squashing is bad
for various reasons. The obvious ones are preserving history, observing
developer intent in context, logical breakdown of large code changes,
etc., but another important one is that merges can be done automatically
more easily if commits are small.

If we follow IPython's lead and never commit directly to the master
branch, only merging into it, then `git log --merges` basically gives
you your squashed commits without actually destroying the originals.

Fernando Perez

unread,

Mar 2, 2012, 10:10:04 AM3/2/12

to sage-...@googlegroups.com

On Fri, Mar 2, 2012 at 1:06 AM, Robert Bradshaw
<robe...@math.washington.edu> wrote:
> Though for a single commit, a rebase is cleaner IMHO than a merge, and
> squashing commits improves the signal-to-noise ratio of the history
> (and it's nice to be able to squash away stuff like typos).

That was my original position, but I recently changed my mind after
discussions with Min Ragan-Kelley and Aaron Smeurer (sympy lead). The
main reason (in addition to Keshav's valid arguments) was to have the
merge commit serve as an indicator of the fact that there was review,
and who did the review, straight into the log. Seen in that way, the
merge commit is signal, not noise.

More details behind my reasoning here:

http://mail.scipy.org/pipermail/ipython-dev/2012-January/008630.html

We've made the change and we're all quite happy with the new approach.

Cheers,

f

Reply all

Reply to author

Forward