Django git guidelines

333 views
Skip to first unread message

Anssi Kääriäinen

unread,
May 18, 2012, 4:43:13 AM5/18/12
to Django developers
A heads up: I am working on Git and Github usage guidelines. There is
a ticket https://code.djangoproject.com/ticket/18307, and I have a
github branch with some initial work https://github.com/akaariai/django/tree/django_git_guidelines
(or for changeset https://github.com/akaariai/django/compare/django_git_guidelines)

The guidelines in short:
- For trivial patches use pull requests directly
- For non-trivial patches, create a trac ticket, announce your work
by linking to your github branch, and when your work is ready to be
pulled in, only then do a pull request
- Aim for logically contained commits, commit messages of 50 char
summary line, 72 char paragraphs thereafter.
- When upstream has changed use git rebase instead of git pull
- When you do additional fixes to your work, use git rebase -i so
that your work still fullfills the logical commits requirement.

Lots of more details in the WIP branch. All feedback welcomed. Lets
keep the discussion of any high-level issues here on django-
developers, as the choices made impact the whole community.

The biggest issue is how we aim to use the pull requests. My take on
this is that pull requests should be only used for work ready for
committer. That is, the original author feels the work is ready, or he
doesn't know how to do anything more. If the pull requests are used
for feature requests or work-in-progress patches we risk having lots
of open tickets and lots of open pull requests.

I have tried to gather pieces of wisdom around the net. I am not too
experienced with Git, so if you have experience with the above
mentioned and/or other Git workflows feedback is appreciated.

- Anssi

Andrei Antoukh

unread,
May 18, 2012, 4:50:39 AM5/18/12
to django-d...@googlegroups.com
2012/5/18 Anssi Kääriäinen <anssi.ka...@thl.fi>
Looks excellent. Great work! 

--
Andrei Antoukh - <ni...@niwi.be>
http://www.niwi.be/page/about/

"Linux is for people who hate Windows, BSD is for people who love UNIX"
"Social Engineer -> Because there is no patch for human stupidity"

Calvin Spealman

unread,
May 18, 2012, 6:44:42 AM5/18/12
to django-d...@googlegroups.com


On May 18, 2012 4:43 AM, "Anssi Kääriäinen" <anssi.ka...@thl.fi> wrote:
>
> A heads up: I am working on Git and Github usage guidelines. There is
> a ticket https://code.djangoproject.com/ticket/18307, and I have a
> github branch with some initial work https://github.com/akaariai/django/tree/django_git_guidelines
> (or for changeset https://github.com/akaariai/django/compare/django_git_guidelines)
>
> The guidelines in short:
>  - For trivial patches use pull requests directly
>  - For non-trivial patches, create a trac ticket, announce your work
> by linking to your github branch, and when your work is ready to be
> pulled in, only then do a pull request
>  - Aim for logically contained commits, commit messages of 50 char
> summary line, 72 char paragraphs thereafter.
>  - When upstream has changed use git rebase instead of git pull

A rebase workflow has always been problematic, in my experience. It will cause previously published commits to be rewritten, changing their hashes, and invalidating other branches, forks, or installations depending on them.

I also think rebasing can seriously devalue the history of a branch, by eradicating the context in which changes had been made.

Even the got developers will recommend not rebasing public commits. I think we should follow that advice.

>  - When you do additional fixes to your work, use git rebase -i so
> that your work still fullfills the logical commits requirement.
>
> Lots of more details in the WIP branch. All feedback welcomed. Lets
> keep the discussion of any high-level issues here on django-
> developers, as the choices made impact the whole community.
>
> The biggest issue is how we aim to use the pull requests. My take on
> this is that pull requests should be only used for work ready for
> committer. That is, the original author feels the work is ready, or he
> doesn't know how to do anything more. If the pull requests are used
> for feature requests or work-in-progress patches we risk having lots
> of open tickets and lots of open pull requests.
>
> I have tried to gather pieces of wisdom around the net. I am not too
> experienced with Git, so if you have experience with the above
> mentioned and/or other Git workflows feedback is appreciated.
>
> - Anssi
>

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>

Anssi Kääriäinen

unread,
May 18, 2012, 7:38:38 AM5/18/12
to Django developers
On May 18, 1:44 pm, Calvin Spealman <ironfro...@gmail.com> wrote:
> >  - When upstream has changed use git rebase instead of git pull
>
> A rebase workflow has always been problematic, in my experience. It will
> cause previously published commits to be rewritten, changing their hashes,
> and invalidating other branches, forks, or installations depending on them.
>
> I also think rebasing can seriously devalue the history of a branch, by
> eradicating the context in which changes had been made.
>
> Even the got developers will recommend not rebasing public commits. I think
> we should follow that advice.

The idea is that when working on a topic branch, use git rebase. The
published branches (django/django/master, django/django/stable/1.4.x)
will never get rewritten history (except for truly exceptional
situations). The viewpoint is that the github branches people are
working on are considered topic branches, and thus nobody should base
their work on these branches.

The take above is that topic branches published on github are nothing
more than a way to publish a patch set. They are _not intended for
others to work upon_. The main problem I have with assuming the github
branches should never be rebased is this:
1. A developer publishes some work on github
2. A reviewer spots some minor issues (typos, whitespace issues,
pep-8 issues).
x.
4. Committer merges the final work into master branch.

Now, the issue is what should the original developer do in step x.? He
could do the fixes and add a commit. However, we do not want (possibly
multiple) extraneous "whitespace cleanup" commits in there. So, the
branch needs to be rebased. Other option is that the committer does a
squash merge of the work. The clearest example why we can't just merge
the full history is that there might be commits which break Django
totally and these commits must not be merged to master.

From Git user manual (http://schacon.github.com/git/user-
manual.html#problems-With-rewriting-history):
"""
You may still choose to publish branches whose history is rewritten,
and it may be useful for others to be able to fetch those branches in
order to examine or test them, but they should not attempt to pull
such branches into their own work.

For true distributed development that supports proper merging,
published branches should never be rewritten.
"""

So, the issue really is what is the take on those topic branches. Are
they public as in other should base their work upon the branch or not.
I think most of the branches are there just for easy code review, and
these branches should use the rebase workflow.

- Anssi

Anssi Kääriäinen

unread,
May 18, 2012, 10:24:03 AM5/18/12
to Django developers
On May 18, 11:43 am, Anssi Kääriäinen <anssi.kaariai...@thl.fi> wrote:
> A heads up: I am working on Git and Github usage guidelines. There is
> a tickethttps://code.djangoproject.com/ticket/18307, and I have a
> github branch with some initial workhttps://github.com/akaariai/django/tree/django_git_guidelines
> (or for changesethttps://github.com/akaariai/django/compare/django_git_guidelines)

While looking at the pull requests the biggest obstacle to pulling the
changes in seems to be the commit messages, or having multiple commits
where there should be only one. The second issue can be resolved with
git rebase -i. For the first issue, here are the commit guidelines.
Message format is:

"""
Fixed #nnnnn -- Made sure foo does bar consistently

The summary line (the line above) should be less than 50 chars. It
should be prefixed with "Fixed #nnnnn -- " where #nnnnn is the Trac
ticket number. If the commit just references some ticket, use Refs
#nnnnn in the message body.

If the commit targets some other branch than the master branch, it
should be prefixed with the branch name ([1.4.x] for example). So,
the above would be:
[1.4.x] Fixed #nnnnn -- Made sure foo does bar consistently

The message body should be composed of lines no longer than 72 chars
each. Each paragraph should be separated from each other by a newline.

The line length restrictions aren't hard ones. In particular, the
summary line will often be longer than 50 chars, as the prefix part
can be almost half of the 50 chars already.
"""

More details in the message is better than less. The commit message
should be in past tense.

It might be we need to revisit the summary line format (24 chars used
for just the prefix part above). Also, the tense format can be a
little annoying as git doesn't use that for its revert and merge
messages.

Still, at this point aim to use the above format. If you do, you give
committers a chance to directly merge your commits, which makes
merging the trivial patches much easier. In the pull requests you
should mention if you have ran any tests. Having "passes all tests on
sqlite3" is a good signal for a committer. Always remember to mention
if you think there is still some work to be done.

You can edit the already published commit message with (git rebase -i
if you have multiple commits to squash or edit) or git commit --amend.
When finished, push the work to your topic branch with git push -f
origin topic_branch. Note that you should use the commit --amend and
push -f only if you do not intend others to base work on the
topic_branch.

- Anssi

Carl Meyer

unread,
May 18, 2012, 11:04:12 AM5/18/12
to django-d...@googlegroups.com
Hi Anssi,

Thanks for working on git usage guidelines! I very much agree that a
pull request should only be created when the contributor considers the
branch finished and ready for review and merge (for instance, there
should never be a pull request created without the necessary docs and
tests). Having lots of half-finished pull requests in the queue is a
burden on everyone.

On 05/18/2012 05:38 AM, Anssi K��ri�inen wrote:
> The take above is that topic branches published on github are nothing
> more than a way to publish a patch set. They are _not intended for
> others to work upon_. The main problem I have with assuming the github
> branches should never be rebased is this:
> 1. A developer publishes some work on github
> 2. A reviewer spots some minor issues (typos, whitespace issues,
> pep-8 issues).
> x.
> 4. Committer merges the final work into master branch.
>
> Now, the issue is what should the original developer do in step x.? He
> could do the fixes and add a commit. However, we do not want (possibly
> multiple) extraneous "whitespace cleanup" commits in there. So, the
> branch needs to be rebased. Other option is that the committer does a
> squash merge of the work. The clearest example why we can't just merge
> the full history is that there might be commits which break Django
> totally and these commits must not be merged to master.

I think you've correctly identified the two options here: rebase or
squashed merge.* I'm not sure it's clear that rebase is the better
choice of those two.

Advantages of rebase:

* The contributor gets their own name on the final commit to master.

* Less work for committers, more work for contributors (arguable whether
this is an advantage or disadvantage).

Advantages of squashed merge:

* Line-by-line code comments on the pull request (and in general,
in-context history of the development of the pull request) are not lost
every time the contributor rebases and force-pushes.

* We don't ask every contributor to Django to learn advanced
history-rewriting features that some find difficult to use, and that
lead to more risk of data loss (or at least perceived data loss for
users who aren't familiar with the magic of the ref-log). Only core
committers need to use an advanced feature of git.

All in all, I would be -1 on making rebase a required part of
contributing to Django. I would prefer to leave this choice up to the
contributor, and say something like:

"If you want your own name on the final commit in master, feel free to
rebase your pull request down to a single commit with a well-formatted
commit message. Otherwise, we will 'merge --squash' your pull request
into a single commit on master, and credit you in the commit message."

Carl


*The third option, just doing regular merges of multiple-commit pull
requests, I don't think is nearly as bad as you make it out to be:
master would never actually be in a broken state for anyone tracking it,
even if there were broken halfway states in the merged branch, and it is
possible to track the "main line" of development back through each merge
commit, without ever touching any broken or in-progress commits. But
those in-progress commits would still be in the repo, and would be shown
by default by "git log" and e.g. the github commit list - I agree that
this makes it harder to quickly scan meaningful changes in the history,
and is not preferable.

Donald Stufft

unread,
May 18, 2012, 11:08:55 AM5/18/12
to django-d...@googlegroups.com
On Friday, May 18, 2012 at 11:04 AM, Carl Meyer wrote:
Hi Anssi,

Thanks for working on git usage guidelines! I very much agree that a
pull request should only be created when the contributor considers the
branch finished and ready for review and merge (for instance, there
should never be a pull request created without the necessary docs and
tests). Having lots of half-finished pull requests in the queue is a
burden on everyone.

I personally prefer doing normal merges with --no-ff. While "clean up whitespace"
commits are extraneous, they don't particularly hurt anything. If an incoming pull
request is particularly messy it's easy enough to say that the pull request is
sound in theory/implementation but that they need to rebase it to clean up
the history. 

Anssi Kääriäinen

unread,
May 18, 2012, 12:30:19 PM5/18/12
to Django developers
On May 18, 6:08 pm, Donald Stufft <donald.stu...@gmail.com> wrote:
> I personally prefer doing normal merges with --no-ff. While "clean up whitespace"
> commits are extraneous, they don't particularly hurt anything. If an incoming pull
> request is particularly messy it's easy enough to say that the pull request is
> sound in theory/implementation but that they need to rebase it to clean up
> the history.

While the white space commits aren't that serious, there are a couple
of issues which need rebasing:
- If we aim to have well formatted commit messages, any bad commit
messages must be rewritten by changing history. Git ensures there
isn't any other way.
- I believe merging in broken states (code doesn't compile etc) will
make bisecting much harder. I am not sure of this...
- I don't find it particularly good idea to have 10 lines patches
come in in 5 commits when just a single one is required. If you looks
at the pull requests, you will see this is not far fetched.


- Anssi

Donald Stufft

unread,
May 18, 2012, 12:48:16 PM5/18/12
to django-d...@googlegroups.com
I think here is where it's going to come down to judgement calls. If their is an
undue amount of extraneous commits for small patches then sure either rebasing
or squashing probably makes sense. (Or rejecting telling the author to clean up their history).

Commits broken for reasons other then what you are bisecting for make bisect a tad bit harder
but not terribly so, basically you'd just take a peek at the log and switch to a nearby commit and
test that one instead. 


- Anssi

Alex Ogier

unread,
May 18, 2012, 1:40:41 PM5/18/12
to django-d...@googlegroups.com
I am +1 on merge --squash. The reason is that there is tremendous
value to having a mostly linear mainline development branch,
especially for one so widely depended on as Django's. My feeling is
that we should aim to have the official branches consist of only those
commits that we would feel comfortable checking out and developing
with. It is much nicer to run "git log" and see a series of focused
bug fix patches than it is to see a tangle of "Fix the reticulating
splines by frobbing the axis" "Oops, the axis was already fixed,
revert that commit" "Do the real work in hobnob.py".

In addition to making automated git bisects possible, it also lowers
the bar for contributions: so long as the code changes in a pull
request are sound, we can accept fragmented histories and badly
formatted commit messages and put the onus on the committers to clean
up commit messages for posterity.

Best,
Alex Ogier

Donald Stufft

unread,
May 18, 2012, 1:43:54 PM5/18/12
to django-d...@googlegroups.com
On Friday, May 18, 2012 at 1:40 PM, Alex Ogier wrote:
I am +1 on merge --squash. The reason is that there is tremendous
value to having a mostly linear mainline development branch,
especially for one so widely depended on as Django's. My feeling is
that we should aim to have the official branches consist of only those
commits that we would feel comfortable checking out and developing
with. It is much nicer to run "git log" and see a series of focused
bug fix patches than it is to see a tangle of "Fix the reticulating
splines by frobbing the axis" "Oops, the axis was already fixed,
revert that commit" "Do the real work in hobnob.py".

In addition to making automated git bisects possible, it also lowers
the bar for contributions: so long as the code changes in a pull
request are sound, we can accept fragmented histories and badly
formatted commit messages and put the onus on the committers to clean
up commit messages for posterity.
In my experience, squash makes git bisect harder, instead finding the bad
commit to be a small, atomic change, you're often given a large change where
you must then determine which change out of the entire commit caused the issue. 

Anssi Kääriäinen

unread,
May 18, 2012, 3:10:12 PM5/18/12
to Django developers
On May 18, 8:40 pm, Alex Ogier <alex.og...@gmail.com> wrote:
> I am +1 on merge --squash. The reason is that there is tremendous
> value to having a mostly linear mainline development branch,
> especially for one so widely depended on as Django's. My feeling is
> that we should aim to have the official branches consist of only those
> commits that we would feel comfortable checking out and developing
> with. It is much nicer to run "git log" and see a series of focused
> bug fix patches than it is to see a tangle of "Fix the reticulating
> splines by frobbing the axis" "Oops, the axis was already fixed,
> revert that commit" "Do the real work in hobnob.py"
>
> In addition to making automated git bisects possible, it also lowers
> the bar for contributions: so long as the code changes in a pull
> request are sound, we can accept fragmented histories and badly
> formatted commit messages and put the onus on the committers to clean
> up commit messages for posterity.

The problem with this approach is that the committer bandwidth is
really limited. It is the bottleneck currently. The more committers
need to do per patch, the more visible this limit will be. I really
don't see the availability of contributors as a big problem - how to
get their work into Django is the problem. The amount of Trac's open
tickets with patches supports this.

I agree that requiring everybody to create perfect pull requests will
not work. But on the other hand we really should aim for a workflow
where the more active contributors are encouraged, if not required, to
create patches 100% ready for commit. The idea is that if a
contributor creates constantly high quality work, there really isn't
much difference between the core committer and such a contributor. A
committer can just pull in the contributors work with quick sanity
check (or even without one if the contributor is trusted enough).

Just to be clear: Many if not most patches will need to be eventually
merged in by some way of squashing. Still, if we do not even try to
aim for a workflow where the contributors do as much of the work as
possible, then we are throwing away part of the power of Git.
Documenting that "you don't need to worry about commit message
guidelines, or logical commits, the committer will take care of this
by squash merging your work" will ensure we will get only large
monolithic commits. The committer simply can not rework a dirty
history into logical commits, or at least not with the time the
committers have available.

There isn't a big disagreement here. It is more about the tone of the
requirements of a good pull request. We should not require perfect
pull request, but we should say that if you aim to contribute to
Django long term, you should learn to create good quality pull
requests.

BTW It is perfectly possible to have good linear commit history even
without doing any squash merges:

git checkout master
git pull upstream master
# fetch and apply the commit/commits from the pull request
curl https://github.com/django/django/pull/NN.patch | git am
# if the work isn't ready, you can do commits, rebase -i the commits
etc..
# verify the push
git push --dry-run upstream master
git push upstream master

- Anssi

Anssi Kääriäinen

unread,
May 18, 2012, 3:19:34 PM5/18/12
to Django developers
On May 18, 8:43 pm, Donald Stufft <donald.stu...@gmail.com> wrote:
> In my experience, squash makes git bisect harder, instead finding the bad
> commit to be a small, atomic change, you're often given a large change where
> you must then determine which change out of the entire commit caused the issue.

I do not want us to have single commit per pull request. Instead I
would like to see nice clean logical commits. These make bisecting as
easy as possible. So, we mostly agree here.

If you edit a file I do think it is a good idea to have two separate
commits if one is for whitespace cleanup of already existing errors in
the file, and then another for your actual changes. But there should
not be two commits if one is for your changes, and then another for
whitespace cleanup of _your changes_.

- Anssi

Ryan D Hiebert

unread,
May 18, 2012, 3:27:35 PM5/18/12
to django-d...@googlegroups.com
On May 18, 2012, at 12:19 PM, Anssi Kääriäinen wrote:
> I do not want us to have single commit per pull request. Instead I
> would like to see nice clean logical commits. These make bisecting as
> easy as possible. So, we mostly agree here.

My feeling is that we want every commit in the main tree to be a working
commit, even if its an early one in a multi-commit pull. This doesn't
eliminate the possibility of multi-commit pull requests, but it seems to
point to most small-change pull requests typically being a single commit.

I'm not entirely sure what that should mean to feature branches that get
merged in, since they can often be very large and start with commits that
don't have the feature working. Probably that they just need to make sure
that each commit along the way does not break anything outside of the new
feature.

Calvin Spealman

unread,
May 18, 2012, 3:29:33 PM5/18/12
to django-d...@googlegroups.com
On Fri, May 18, 2012 at 7:38 AM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> On May 18, 1:44 pm, Calvin Spealman <ironfro...@gmail.com> wrote:
>> >  - When upstream has changed use git rebase instead of git pull
>>
>> A rebase workflow has always been problematic, in my experience. It will
>> cause previously published commits to be rewritten, changing their hashes,
>> and invalidating other branches, forks, or installations depending on them.
>>
>> I also think rebasing can seriously devalue the history of a branch, by
>> eradicating the context in which changes had been made.
>>
>> Even the got developers will recommend not rebasing public commits. I think
>> we should follow that advice.
>
> The idea is that when working on a topic branch, use git rebase. The
> published branches (django/django/master, django/django/stable/1.4.x)
> will never get rewritten history (except for truly exceptional
> situations). The viewpoint is that the github branches people are
> working on are considered topic branches, and thus nobody should base
> their work on these branches.

You are making a lot of assumptions about the branches someone might send
via pull request. It is entirely valid, and often _preferable_ that someone is
depending on these. Maybe they were running their site on this branch,
developing
the feature to support something they needed, or simply as a way to test it
in the wild. Maybe someone else based their own work on an existing topic
branch.

Also, we can't assume that merging back to upstream is the only reason someone
may have created this branch.

> The take above is that topic branches published on github are nothing
> more than a way to publish a patch set. They are _not intended for
> others to work upon_. The main problem I have with assuming the github
> branches should never be rebased is this:
>  1. A developer publishes some work on github
>  2. A reviewer spots some minor issues (typos, whitespace issues,
> pep-8 issues).
>  x.
>  4. Committer merges the final work into master branch.
>
> Now, the issue is what should the original developer do in step x.? He
> could do the fixes and add a commit. However, we do not want (possibly
> multiple) extraneous "whitespace cleanup" commits in there. So, the
> branch needs to be rebased. Other option is that the committer does a
> squash merge of the work. The clearest example why we can't just merge
> the full history is that there might be commits which break Django
> totally and these commits must not be merged to master.

Frankly I would prefer keeping whitespace-only commits. Better than muddling
up *real* commits with changes that had nothing to do with them.

> From Git user manual (http://schacon.github.com/git/user-
> manual.html#problems-With-rewriting-history):
> """
> You may still choose to publish branches whose history is rewritten,
> and it may be useful for others to be able to fetch those branches in
> order to examine or test them, but they should not attempt to pull
> such branches into their own work.
>
> For true distributed development that supports proper merging,
> published branches should never be rewritten.
> """
>
> So, the issue really is what is the take on those topic branches. Are
> they public as in other should base their work upon the branch or not.
> I think most of the branches are there just for easy code review, and
> these branches should use the rebase workflow.
>
>  - Anssi
>
> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>



--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

Anssi Kääriäinen

unread,
May 18, 2012, 4:34:37 PM5/18/12
to Django developers
On May 18, 10:29 pm, Calvin Spealman <ironfro...@gmail.com> wrote:
> > The idea is that when working on a topic branch, use git rebase. The
> > published branches (django/django/master, django/django/stable/1.4.x)
> > will never get rewritten history (except for truly exceptional
> > situations). The viewpoint is that the github branches people are
> > working on are considered topic branches, and thus nobody should base
> > their work on these branches.
>
> You are making a lot of assumptions about the branches someone might send
> via pull request. It is entirely valid, and often _preferable_ that someone is
> depending on these. Maybe they were running their site on this branch,
> developing
> the feature to support something they needed, or simply as a way to test it
> in the wild. Maybe someone else based their own work on an existing topic
> branch.
>
> Also, we can't assume that merging back to upstream is the only reason someone
> may have created this branch.

I am talking about topic branches here. If you have a branch of "admin-
next" for example then the workflow might need to be different.

It seems the correct way to describe a Github topic branches is that
they are just another way of publishing a _patch series_. In Linux
this is done by sending the patches to LKML. We used to do this by
Trac patches (still accepted) and now by pull requests in Github. The
confusion comes from the note that "public branches should never be
rebased". The topic branches should not be viewed as public branches,
just as a way to publish patch series.

I am not sure what to do with a pull request of a branch a site is
depending upon. If the pull is not accepted into Django, how is the
depended-upon branch supposed to continue and be acceptable for
further pull requests into Django? If you merge and resolve conflicts
(by a competing approach maybe), then you will have such a messy
history it will not be pulled into Django. If you rebase, everybody
depending on the branch will need to do the same. If you revert the
rejected patch, then merge, you are again having a history which
should not be merged into Django. Am I missing something important
here?

> Frankly I would prefer keeping whitespace-only commits. Better than muddling
> up *real* commits with changes that had nothing to do with them.

I agree, seems like there has been some confusion about this. I do
think it is a good idea to have whitespace cleanup and real commits
different. But having one real commit, then whitespace cleanup to
_that commit_ doesn't serve any purpose except of confusing the commit
history.

- Anssi

Michael Mior

unread,
May 19, 2012, 10:00:53 AM5/19/12
to django-d...@googlegroups.com
Hi all,

New to this list but I saw this post and thought I would chime in with my two cents. I'm not really a Django contributor (yet) but I have a fair bit of experience using Git and Django both personally and professionally.

I can understand the hesitation for using rebase on public commits. I think rebasing after a pull request has already been issued might not be the best idea. However, I think contributors should generally be encouraged to rebase rather than pull. The reason being is that if a topic branch is closely tracking it's parent (which I assume would be a common situation) you end up with a lot of merge commits which make the history hard to follow.

If a change in a commit requires adapting due to upstream changes, I personally think it's preferable for the contributor to do the work of hiding this fact via rebase. It helps group logical changes together and simply makes the history much easier to read. For development branches, I don't see the fact that history is changing  to be a problem. (If someone is basing work off a development branch, should they not expect things to break?)

Anyway, really happy to see Django move to GitHub and looking forward to tossing some code your way in the future. :)

Cheers, 
-- Michael

Anssi Kääriäinen

unread,
May 22, 2012, 7:18:57 AM5/22/12
to Django developers
I have done some more research. This post is somewhat long, so if you
want just the conclusions, it is enough to read the last three
paragraphs.

First, some useful links:
http://kerneltrap.org/Linux/Git_Management
http://sethrobertson.github.com/GitBestPractices/#workflow

From the above, and elsewhere in the net, it is pretty easy to see
that getting the "sausage making" (see GitBestPractices link above)
steps into the core repository history is not wanted. Hiding sausage
making is just a way to say we want good logical commits, but we do
not want back-and-forth development commits into the history. Getting
the development commits into the history makes the history messier,
and it makes bisecting harder. Seeing a "final whitespace cleanup" as
the commit in git blame isn't good...

In Linux the workflow is as follows: topic branches are dealt in patch
format alone. This means that there are no public topic branches, and
this again means the patch author is free to rewrite the history at
will - and that the (subsystem) committer is free to do additional
edits to the patches when committing (I don't know if they do this -
but they could). The patches sent to the mailing lists will at least
sometimes need to go through a couple of cycles of review - tune, so
additional changes to the topic branches are usually required.

When the patch is good to go, the subsystem maintainer commits it.
After the commit, the patch is essentially part of the history of
Linux kernel already. Linus may reject it, or ask for some more
polishing, but he usually does not do this. The subsystem maintainer
is really the maintainer of that part of the kernel, and is officially
so. If for some reason revert is needed, the subsystem maintainer
simply reverts the patch, and Linux master will get the commit +
revert commit of the rejected patch.

The Git team uses a different workflow: they have public branches they
reset once in a while. The 'pu' (proposed updates) branch mentioned in
these links: http://www.kernel.org/pub/software/scm/git/docs/v1.7.2.2/gitworkflows.html,
http://stackoverflow.com/a/5713627. So, even the Git developers do
rebase / reset public branches. We do not have a "pu" branch, but one
might be a good idea in the future.

In the http://kerneltrap.org/Linux/Git_Management link Linus argues
that public branches should never be rebased - and his take on this is
very strict, this would apply to any branch that is potentially
public, and thus to github topic branches too. This would cause a
messy history to appear in the commit history of the topic branches,
and as mentioned above this is not a good idea.

For the idea of making pull requests from a branch which a site is
based on the answer is: sorry, it does not work that way. This turns
the dependency around - the branch the site is dependent on should not
dominate Django's development. Only official subsystem branches have
the property that their history is part of Django's history. The
possible tunings/reverts in the site-based branch have no business of
getting into Django's history. What the pull requester should do is
cherry pick the commits he/she wants to pull into Django, create
another branch for pull, rework the commit history into nice logical
commits and ask for pull from that branch.

It is a good question who should do the squashing - the patch author
continuously, the patch author when the patch has passed reviews, or
the committer. It seems that more often than not the committer will
need to do some final cleanup. Requiring 100% ready pull requests all
the time seems like waste of both committer and patch submitter time.
For almost all patches the squash merge workflow works well, it is
only the largest feature patches where this will lead to too large
commits. Still, to me it seems that we should encourage developers to
aim for 100% ready pull requests in the hope that some contributors
are willing to learn to create such pulls.

Everything I have found suggests that rebased topic branches are a
good idea, and these branches should hide the internal development
stages from public view. If they need to be held non-public just to
make sure they are not a base for other peoples work, then it means we
need to deal with patches instead of github branches and pull
requests. I think that is a bad trade-off. However, there are good
reasons to not rebase the work continuously (collaborative effort for
example).

So, lets aim for a compromise solution. We do not require constant
rebasing or keeping the patch as a series of clean commits. It should
(note: no must here) be series of clean commits when the pull request
is initiated. This makes initial review easier. Then, based on reviews
the branch will get some cleanup commits. At this stage we do not
encourage rebasing or rewriting the history of the work. It is OK to
do merge of master if absolutely needed (but usually this is not
needed, even if there are conflicting changes in the master branch),
and it is OK to get a messy history. When the reviews are done, the
history should be rewritten so that the patch is again a series of
clean commits. This can be done by the patch author, or by the
committer. The easiest way to do this is by squash merge. At this
stage the final merge conflicts are resolved, and the patch series is
turned into a nice clean commit history again.

The "rewrite history only at final stage" workflow allows for easy
collaborative effort, yet it provides a nice clean history into Django
master. It allows the reviewer to make a pull request of spotted
errors. For at least 9/10 of the patches a squash merge is enough. The
catch is that if the patch author doesn't do the final squashing, then
she/he will not end up as the "Author" in the commit logs. If the
changes are clean and small enough in the review stage, then merging
the branch directly can be done. Retaining the reviewer requested
changes as a separate (single) commit might be a good idea actually.
So, this is my current suggestion: The rebase workflow, but with
minimal rebasing. The topic branches are usable as a base for work,
but not in the long term.

The above of course needs to be turned into nice easy to follow
guidelines. We need as-easy-as-possible path for trivial changes. As
suggested upthread, we need to make contributing to Django easy - but
it is also a good idea to ask more from those contributors who can and
have will to do more work. Unfortunately I do not have time to write
the guidelines this week, but it is of course OK to make pull requests
into https://github.com/akaariai/django/compare/django_git_guidelines
- I will not rebase the branch (until submitting it for final review)
- it will be using the above guidelines hereupon.

- Anssi

Richard Laager

unread,
May 22, 2012, 4:34:58 PM5/22/12
to Anssi Kääriäinen, Django developers
On Tue, 2012-05-22 at 04:18 -0700, Anssi Kääriäinen wrote:
> if the patch author doesn't do the final squashing, then
> she/he will not end up as the "Author" in the commit logs.

This isn't an issue. Just do:
git commit --author "John Doe <jd...@example.com>"

And if the "squash merge" workflow (which isn't something I've used)
doesn't allow you to set the author, then just follow it by:
git commit --amend --author "John Doe <jd...@example.com>"

--
Richard
signature.asc

Alex Ogier

unread,
May 22, 2012, 4:54:20 PM5/22/12
to django-d...@googlegroups.com
Git actually has native support for this workflow. Each commit has an
"author" and a "committer" which are typically the same, but in the
case of a squash merge or patch are different.

For example, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=72c04af9a2d57b7945cf3de8e71461bd80695d50

Best,
Alex Ogier

Anssi Kääriäinen

unread,
May 22, 2012, 5:39:41 PM5/22/12
to Django developers
On May 22, 11:54 pm, Alex Ogier <alex.og...@gmail.com> wrote:
> On Tue, May 22, 2012 at 4:34 PM, Richard Laager <rlaa...@wiktel.com> wrote:
> > On Tue, 2012-05-22 at 04:18 -0700, Anssi Kääriäinen wrote:
> >> if the patch author doesn't do the final squashing, then
> >> she/he will not end up as the "Author" in the commit logs.
>
> > This isn't an issue. Just do:
> >    git commit --author "John Doe <j...@example.com>"
>
> > And if the "squash merge" workflow (which isn't something I've used)
> > doesn't allow you to set the author, then just follow it by:
> >    git commit --amend --author "John Doe <j...@example.com>"
>
> Git actually has native support for this workflow. Each commit has an
> "author" and a "committer" which are typically the same, but in the
> case of a squash merge or patch are different.

The catch was meant to be that you won't get to be official author if
you don't do the final polish yourself. Maybe that isn't the brightest
of idea, we would probably still want that author information in this
case, too.

The more I work with the pull requests, the clearer it is that we need
not be overly pedantic with requiring the author to do all the
polishing. Requiring the author to remove one space, then waiting for
that space to be removed and finally merging the work is a lot more
laborious than just doing this simple change on commit.

So, I will try to reword the doc changes to suggest a lightweight path
for those who just want their change into Django (patch in Trac ticket
or a pull request, no need for good commit messages or rebasing), and
then have documentation for how to create perfect commit ready pull
requests, and suggest trying to use that path if the contributor wants
to contribute to Django more than just a single patch.

This is very much work in progress and I expect more changes to the
workflow still.

Thank you all very much for the feedback so far,
- Anssi

Anssi Kääriäinen

unread,
Jun 6, 2012, 3:09:22 PM6/6/12
to Django developers
I have update the working with git guidelines based on the above
discussions. Here is the stats of the whole of the work:
docs/internals/contributing/committing-code.txt | 108 ++++++++-
.../contributing/writing-code/branch-policy.txt | 171
-------------
docs/internals/contributing/writing-code/index.txt | 2 +-
.../writing-code/submitting-patches.txt | 61 +++--
.../contributing/writing-code/working-with-git.txt | 228 ++++++++++++
++++++
docs/internals/git.txt | 198 ++++++++++++
+++
docs/internals/index.txt | 2 +-
docs/internals/svn.txt | 254
--------------------
8 files changed, 566 insertions(+), 458 deletions(-)

The biggest changes to the previous work are:
- It is not required to create "perfect" pull requests
- It is recommended that the committer do the final squashing
- For basic patches the suggested (though not required) workflow is:
- create a trac ticket
- once you have some code available, push your work to github,
announce your branch in the ticket
- once you have something committable available, make a pull
request. Logical commits at this time are suggested.
- based on reviews, add commits to your work, do not squash work
at this stage, this allows the reviewer easily see what you changed.
- If it seems the work is far away of being commit quality, or the
patch author does not respond in reasonable time, the pull request
should be closed (the patch author is free to reopen after dealing
with the review issues).
- once the work is ready for committer, the committer will rework
the history according to her judgment. Logical commits is the
suggestion, but the documentation intentionally leaves some freedom
for the committer.

I am sure there is still a lot to polish. I might have tried to change
too big portion of the docs in one go. Still, I would like to commit
what I have tomorrow, so that the sprinters at djangocon have the
possibility to use the guidelines and begin the polishing work.
Alternatively, if it is possible to build the docs and make them
available somewhere for the sprinters, I could wait for reviews from
the sprinters, then commit the code early next week. This would make
for a really good review for the guidelines.

The work is available from:
https://github.com/akaariai/django/tree/django_git_guidelines
or for compare view:
https://github.com/akaariai/django/compare/django_git_guidelines

- Anssi

Aymeric Augustin

unread,
Jun 7, 2012, 12:53:26 PM6/7/12
to django-d...@googlegroups.com
On 6 juin 2012, at 21:09, Anssi Kääriäinen wrote:
> I am sure there is still a lot to polish. I might have tried to change
> too big portion of the docs in one go. Still, I would like to commit
> what I have tomorrow, so that the sprinters at djangocon have the
> possibility to use the guidelines and begin the polishing work.
> Alternatively, if it is possible to build the docs and make them
> available somewhere for the sprinters, I could wait for reviews from
> the sprinters, then commit the code early next week. This would make
> for a really good review for the guidelines.
>
> The work is available from:
> https://github.com/akaariai/django/tree/django_git_guidelines
> or for compare view:
> https://github.com/akaariai/django/compare/django_git_guidelines

Hello Anssi,

As discussed on IRC, I reviewed your patch, copy-edited it a bit slightly and committed it.

This can definitely still be polished, but we had to start somewhere, and it's done. We'll adjust it as necessary.

Many thanks for your work!

--
Aymeric.

Jeremy Dunck

unread,
Jul 18, 2012, 5:46:21 AM7/18/12
to django-d...@googlegroups.com
I noticed this hasn't made it to master yet. Could it? I'm running
sprints and there's a bit of confusion on how to contribute to git.

Cheers,
Jeremy

Aymeric Augustin

unread,
Jul 18, 2012, 7:06:51 AM7/18/12
to django-d...@googlegroups.com
On 18 juil. 2012, at 11:46, Jeremy Dunck wrote:

> I noticed this hasn't made it to master yet. Could it? I'm running
> sprints and there's a bit of confusion on how to contribute to git.

Hi Jeremy,

This was committed. It's under Docs > How to get involved > Working with Git and GitHub.

https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/working-with-git/

Best regards,

--
Aymeric.
Reply all
Reply to author
Forward
0 new messages