Avoiding big branches

Aaron Meurer

Jul 25, 2011, 6:24:41 PM

to sy...@googlegroups.com

Hi everyone.

I wrote a blog post about this
(http://asmeurersympy.wordpress.com/2011/07/25/merging-integration3-with-sympy-0-7-0-nightmare/),
but I think it's important enough that it should be discussed here.

You can read my blog post for the details, but basically I spent the
past three weeks on and off merging master into my integration3
development branch. I'm not yet ready to submit this for review: I
just did the merge so that I could get some upstream fixes. But, for
whatever reason, git decided that I needed to review basically every
change made in master since the base of my branch.

But even other than that, I found at least two regressions in the
polys and two regressions outside the polys, had to make several
changes to my code to make it run again (like replacing as_basic()
with as_expr()), and had to manually check the diff of the merge to
make sure I handled all the merge conflicts correctly.

We talked about this a little back when we did the polys12 merge, but
I want to reiterate it. Having big branches like this is very bad.
Keeping things up to date with master is a nightmare. Furthermore,
there are bound to be regressions in master that are not noticed
because they only trigger test failures in your branch.

Therefore, I recommend that anyone who has a big branch of work try
to get it merged in, and that in the future you try to submit your
fixes as separate small (but still meaningful) pull requests. git
makes it easy to work with several small branches; much easier than
working with one big branch, in my opinion. This is especially
true if you are like me and do not like rebasing. You can just merge
your small branches together, and the merges will be easier because
the size of the changes will be smaller. And since we basically never
rebase pull requests any more due to the merge button, you can usually
just use the exact same commits.
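
As a rough illustration of the small-branch workflow described above,
the following sketch builds a throwaway repository, makes two
independent topic branches off master, and merges both into a local
working branch. The branch names (fix-a, fix-b, work), the file names,
and the demo identity are made up for the example; nothing here is
taken from the SymPy repository.

```shell
#!/bin/sh
# Sketch only: a scratch repository with made-up branch and file names.
set -e

# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

# Two small, independent branches, each one a candidate pull request.
git checkout -q -b fix-a master
echo a > a.txt
git add a.txt
git commit -q -m "Fix A"

git checkout -q -b fix-b master
echo b > b.txt
git add b.txt
git commit -q -m "Fix B"

# A local branch that needs both fixes: just merge them, no rebasing.
git checkout -q -b work master
git merge -q --no-ff -m "Merge fix-a" fix-a
git merge -q --no-ff -m "Merge fix-b" fix-b

# The original commits are reused unchanged, so the very same commits
# can later reach master through their own pull requests.
git merge-base --is-ancestor fix-a work && echo "fix-a is contained in work"
git merge-base --is-ancestor fix-b work && echo "fix-b is contained in work"
git --no-pager log --oneline --graph work
```

Because each topic branch touches its own small set of files, each
merge stays small, and a conflict (if any) is confined to one branch
rather than spread over one big branch.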

But even if your git habits are more rebase oriented, you should still
do this, because rebases can be just as painful as merges for big
branches. And regardless of your merging habits, you still have the
same scenario where someone changes something in master and makes
fixes everywhere where necessary, but this does not include your code,
because it's separate from master. For example, we recently changed
the printer to use lexicographic ordering by default. So all the
doctests in SymPy had to be updated. Anyone who had a development
branch had to change his own doctests when he merged/rebased over this
change. But if those dev changes were merged with master, they would
have been fixed by Mateusz when he changed all the docstrings in
SymPy.

I've noticed that Mateusz has stopped using the big branch model.
Rather, he submits all changes as pull requests to master. And so far
we have not had anything near what we had with polys12. The changes
are all reviewed (another problem with big branches is that they are
harder to review), merge conflicts are minimal, since only specific
pull requests get merge conflicts, rather than a whole big branch,
and it's easier to understand what is being done with each pull
request.

Sure, some of his requests haven't been merged yet, but actually,
because of those that *have* been merged, more of his work is in
master than if he had a polys13 with everything in it and none of it
merged with master. And
like I said, with git, it's very easy to take in the changes you need
locally if they haven't been merged yet.

Other experienced devs could probably also share their experiences
with this sort of thing.

I want to get this message across especially to our GSoC students, who
are the ones making the most changes right now, and also may not
remember all the polys12 stuff enough to see the problems that were
shown with it.

Some of the GSoC students are doing a good job of submitting smaller
changes back as pull requests, especially atomic changes that do not
require the rest of their work to be merged. Others are not doing so
well, I think. I could go through and tell you each how well you are
doing more specifically if you want.

Even if you have development work that is not ready for user-level
interaction, you should still get this merged with master. Then
people will notice regressions against your code when your tests fail,
and if you allow some user interaction, for example, by turning it off
by default or by marking the module as unstable, some people will use
it, and find bugs in it for you. You may even get some people
submitting patches fixing your code. People do this with code that's
in master, but very rarely with code that's buried in a development
branch.

As for my branch, I still can't merge it because the code relies
pretty heavily on a regression I had to make, which was basically to
disable algebraic substitution in exp.subs. So I am going to put
priority on fixing subs (see
http://groups.google.com/group/sympy/browse_thread/thread/4a19d0f39f51fda6#)
over any additional improvements to the Risch algorithm. Then, when
this is fixed, I will submit my branch for review, and any additional
fixes I make, no matter how small, I will submit immediately as pull
requests, rather than storing them up in some dev branch.

Aaron Meurer

Ondřej Čertík

Jul 26, 2011, 2:38:48 PM

to sy...@googlegroups.com

On Mon, Jul 25, 2011 at 3:24 PM, Aaron Meurer <asme...@gmail.com> wrote:
> Hi everyone.

Can you describe what exactly you did with your branch? I.e.:

1) forked sympy in summer 2010, kept adding patches
2) merged with polys1 branch
3) polys12 (incompatible with polys1) merged to master
4) merged with master (thus *conflicting* with polys1, merged to your branch)

I think that maybe all (?) your troubles were caused by merging the
polys1 with polys12, which were known to be incompatible, because
polys12 was rebased. In other words, maybe this is just a
manifestation of "never rebase, only merge", if people are using your
branch.

I think that as long as the only thing that you do is "merge", things
should be much easier. I have personally worked with some larger
branches too (~250 commits easily), and my experience so far is that
as long as there is no "double merge" (like polys1 and polys12),
things are quite simple. My longest rebase was I think 1h of work.

Ondrej

Aaron Meurer

Jul 26, 2011, 4:48:00 PM

to sy...@googlegroups.com

Basically, it was like this:

Mateusz had his polys branch, which had all his polys fixes. Every
time Mateusz wanted to be up to date with master, he rebased, and
created a new branch, polys2, polys3, etc.

My branch from the start, integration, was based on one of these polys
branches. Then, when I wanted an update from one of the polys
branches, I would rebase over the newest polysn and renumber my
branch, integration2, and then integration3. But at this point, I
realized that rebasing is bad, and I should really not be doing it.

So from that point on, I have never rebased the branch. This is why
my branch is called integration3 instead of just integration, btw.

Probably if I had been rebasing over master instead of merging, this
would have been easier. But it would have been a false sense of easy.
This is because only half of the problems came from merge conflicts.
Other things were changes in the code. For example, as_basic was
renamed to as_expr. If I had rebased my branch over master, not a
single commit in my history would have worked any more, because they
all use as_basic. So my choices would have been:

1. Go back and fix each commit in the history so that the tests pass.
This would have been even *harder* than the merge, I think.

2. Have a branch where bisecting is basically impossible.

Another problem with rebasing is that sometimes changes are rebased
out, so you are left with commits that no longer change what they say
they change. I have commits like this in my branch, from when I was
rebasing. For example,
https://github.com/asmeurer/sympy/commit/4fbc80c12aae45f3a95e05740b62c4d546ada337,
which is from my original integration branch, does what it says, it
adds a check for the Python version. But this was backported to master
since my integration3 rebase, so now the corresponding commit
2ffedb50a3e0a4ae274d117ef87e09834ff8c98b looks confusing, because the
part that does what the commit message says was rebased out. We also
see the problems of rebasing from the original polys merge (not
polys12, but the very first new polys branch). There are a bunch of
commits near the beginning of that branch for which sympy does not
import; you may have noticed these if you've tried bisecting back that
far. No doubt these worked originally, but the rebase broke them
somehow.
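
One partial workaround when bisecting through commits like these is
to have git bisect skip the commits that cannot be tested at all.
With git bisect run, a test command that exits with status 125 marks
the current commit as untestable. The following is only a sketch on a
throwaway repository: the file names (code.txt, IMPORT_BROKEN) and the
commit contents are invented stand-ins for "sympy does not import" and
for a real regression.

```shell
#!/bin/sh
# Sketch only: IMPORT_BROKEN stands in for "the package does not import".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

commit() {  # commit everything in the working tree with the given message
    git add -A
    git commit -q -m "$1"
}

echo ok > code.txt;      commit "c1: working code"
good=$(git rev-parse HEAD)
echo more >> code.txt;   commit "c2: more working code"
touch IMPORT_BROKEN;     commit "c3: untestable commit"
rm IMPORT_BROKEN;        commit "c4: importable again"
echo bug >> code.txt;    commit "c5: introduces the regression"
first_bad_expected=$(git rev-parse HEAD)
echo later >> code.txt;  commit "c6: later work"

# Bisect between the known-good c1 and the bad tip.
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c '
    [ -f IMPORT_BROKEN ] && exit 125   # cannot test this commit: skip it
    ! grep -q bug code.txt             # exit 0 means good, 1 means bad
' >/dev/null

found=$(git rev-parse refs/bisect/bad)   # the commit bisect settled on
git bisect reset >/dev/null 2>&1
git --no-pager log -1 --format='first bad commit: %s' "$found"
```

Skipping only helps when the untestable commits are not right next to
the commit that introduced the problem; if they are, bisect can only
narrow the answer down to a range of commits.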

This is why I believe you when you say your longest *rebase* was 1h of
work, because rebasing is easier in the sense that it hides these
problems from you. How long would it have taken if you had done the
right thing and merged instead?

So yes, I think if Mateusz had only merged with polys and I did the
same with integration, then this would have been easier. As it is,
whenever my branch gets in, there will be copies of commits, I think
up to three or more in some places, because of the different versions
of the polys that have been merged into my branch.

On the other hand, even if I had totally screwed this merge up, I can
still do git checkout HEAD~1 and get *exactly* the same commit that I
had before the merge, and everything will still work exactly as it did
before.
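
This works because HEAD~1 follows the first parent of a merge commit,
which is the commit the branch pointed at before the merge. A small
sketch on a scratch repository that checks this (the branch name dev
and the file names are invented for the example):

```shell
#!/bin/sh
# Sketch only: scratch repository, made-up branch and file names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Development branch with its own work.
git checkout -q -b dev
echo work > work.txt
git add work.txt
git commit -q -m "Work on dev"

# Meanwhile master moves on.
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Merge master into dev, remembering where dev was beforehand.
git checkout -q dev
before=$(git rev-parse HEAD)
git merge -q -m "Merge master into dev" master

# The first parent of the merge is the pre-merge tip of dev.
first_parent=$(git rev-parse HEAD~1)
[ "$first_parent" = "$before" ] && echo "HEAD~1 is the pre-merge commit"

# Checking it out gives back the tree exactly as it was before the merge.
git checkout -q HEAD~1
[ ! -e upstream.txt ] && echo "upstream change is not in this checkout"
```

After a rebase there is no such guarantee from HEAD~1, since the
rebased commits are new commits with new parents.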

I think that new git users have to learn how to rebase, because they
don't know good commit habits and will have to go back and rewrite
their history anyway. But once you learn good commit habits, and
never have to squash any more or go back and rewrite bad commit
messages, I think you should treat your commits as immutable; once
they are made, you should never change them, and if you want to get
new changes, you should only merge. Especially for large branches
(for one commit branches, rebasing can be OK).

This is another thing for you GSoC students to start considering.
Once you consider yourselves "git gurus" (some of you are there and
some of you aren't yet), you should consider no longer rebasing. Anyway,
don't do it until you feel comfortable enough with git and with your
commit habits that you will never want to edit a commit after you make
it. But it actually doesn't take that long to get to that point.

Aaron Meurer

2011/7/26 Ondřej Čertík <ondrej...@gmail.com>:

> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To post to this group, send email to sy...@googlegroups.com.
> To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
>
>

Aaron Meurer

Jul 26, 2011, 4:48:46 PM

to sy...@googlegroups.com

Sorry, meant to give a url here:
https://github.com/asmeurer/sympy/commit/2ffedb50a3e0a4ae274d117ef87e09834ff8c98b

Aaron Meurer

Ondřej Čertík

Jul 26, 2011, 5:47:21 PM

to sy...@googlegroups.com

My branch was messy, so I didn't want to have it "just merged". Also,
I realized that on one occasion it took me more like 2h, I think.
Still, not a big deal.

From what you say, I think that the conclusions are clear:

1) Mateusz should have just kept merging, and you should have kept
merging to his branch.

2) The only time rebase is allowed is on your private branch, that
nobody is using (yet).

There is also another big problem --- when reviewing a pull request,
some people just rebase everything, and there is no way for me to
check what changes they actually made. I think that a better practice
is to do "the best" and submit a good solid pull request. And then
just keep adding patches. (Unless it's so screwed up, that a good nice
rebase is appropriate, for example when one is new to git, and one
should do it at the end of the review, if nobody is using that branch
yet)

Ondrej

Aaron Meurer

Jul 26, 2011, 5:50:19 PM

to sy...@googlegroups.com

2011/7/26 Ondřej Čertík <ondrej...@gmail.com>:

Also, you lose any commit comments, though I talked to the GitHub
guys about this and they said they might fix it.

But I agree. It's hard to do a diff against non-fast-forward commits.

Aaron Meurer

Brian Granger

Jul 26, 2011, 11:16:45 PM

to sy...@googlegroups.com

But all of the technical details about merging and rebasing aside, I
completely agree that huge branches should be avoided whenever
possible.

Cheers,

Brian

--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgra...@calpoly.edu and elli...@gmail.com