[DISCUSS] Pull Request merge policy for core

105 views
Skip to first unread message

Stephen Connolly

unread,
Oct 20, 2015, 6:09:07 AM10/20/15
to jenkin...@googlegroups.com
We seem to have some strong opinions on how pull requests should be
merged for core.

On the one hand, there is the view (which I share) that we should not
rewrite the work of contributors.

On the other hand, there is the view that we should squash all pull
requests before merging.

Now when I look at the other point of view, I believe I can see where
they are coming from:

* The GitHub UI for merging pull requests currently does not allow you
to edit the first line of the commit message, forcing it to be "Merge
pull request #1234 from joebloggs/jenkins-32198"

* The GitHub UI for browsing the commit history currently does not
allow you to view the history of the branch from the --first-parent
point of view

* The GitHub UI for looking at the blame on a file currently does not
allow you to view the blame of the file with the --first-parent point
of view

So if we restrict ourselves to using the GitHub UI, you need to trade
off between the ability to do code archeology from the release
viewpoint and the ability to do code archeology from the what were
they thinking viewpoint.

Since there is a slight majority of use cases looking at things from
the release viewpoint, it therefore makes sense *for users of the
GitHub UI* to request that pull requests be merged... in other words:

Assume that we are using the GitHub UI
Assume that release viewpoint archeology is more frequent
Then we should squash commits before merging

Now I have a number of issues with the above set of assumptions, but
the main one as I see it is this:

When you throw away data, you can never recover it. I remember a
science project where we recorded the temperature at different times
of the day over a month... every day we would take a number of
readings and record the average of the days readings... at the end of
the month we were introduced to statistical analysis (this was evil
teacher making us learn through making mistakes) so that we would
realise that *YOU NEVER THROW AWAY THE RAW DATA* because you do not
know what you will need it for in the future.

I agree with some things:

* The GitHub UI for merging commits is braindead and gives crappy
commit summary messages... it would totally be much better if the
merge commit summary was actually the PR branch description (i.e.
something like what you would get locally with `merge.branchdesc=true`
in your git config). I have lodged a feature request for that with
GitHub.

* The GitHub UI for browsing commits is a pain as you cannot see the
`git log --first-parent` view (but while we merge pull requests using
the GitHub UI with it's crappy "Merge pull request #1234 from
joebloggs/jenkins-32198" style summaries it's not like `git log
--first-parent` gives us a great history view either)

* The GitHub UI for attributing blame should allow for the
`--first-parent` `-w` `-C` and `-M` options to be configured.

So I see lots of issues with the GitHub UI... but I do not see that
the solution is to squash commits.

Firstly, if you are squashing commits, then you have to be merging
from the command line... and if you are merging from the command line,
well you can give a proper merge commit message and then `git log
--first-parent` is beautiful and we don't need to throw away the raw
data.

Secondly, the code archeology almost always starts from `git blame`
and the number of hours I have wasted with `git blame` when I really
wanted `git blame --first-parent -w` (sometimes with a mix of `-C`
`-M`) is such that I have learned *never* to use the GitHub UI to
track these things down. Attempting to do any code archeology from the
GitHub UI is a complete and utter waste of time and will cause you to
lose the will to live.

Thirdly, there is an argument about cherry picking which seems to
ignore two simple facts:
* You can cherry pick merge commits - provided you know what number
to pass to `-m`
* When you use `git log --first-parent` then you know it's always
`git cherry-pick -m 1`

So I see the advocates of squashing commits as looking for
* a short-term gain (because GitHub will eventually fix their crappy UI bugs)
* of limited utility (because you almost always are better off with
the Git CLI anyway)
* with questionable value (because the commit history is still
distracted by all those "Merge pull request #... from .../..." noise)

Now be careful, because I am not against asking the pull request
author to tidy up their PR before we merge it... there is a big
difference between asking for a tidy-up and mandating a squash.

If you tidy-up a pull request you might probably:
* reorganise the history into a logical flow.
* clean up the commit messages to better reflect your thinking
* remove noise commits from formatting changes and reverts of those
formatting changes
* combine some commits together as they form a single logical change
* split some commits apart as they co-mix two logically separate changes

I have some concerns about mandating a tidy-up (I note that Jesse is
against rewriting the history of a PR branch as that means that GitHub
hides the code review comments) but I see nothing wrong with asking
for a tidy-up as being a choice available to the merger...

Aside: There is also a possible small question of the copyright of
commit messages and whether those commit messages are covered by the
MIT license that the code diff is covered by... after all, at the time
the MIT license was drafted there was no such thing as a pull request
containing multiple commits, so the concept of a license applying to
the patch metadata would not even have occurred. It's probably not too
important but it may mean that even under the current CLA for core we
do not have the legal right to modify the original author's commit
messages for the initial merge to master.... never mind the drive-by
pull requests we get from people without a CLA.

But really what do others think?

How do people want to merge pull requests against core?
Do we want to use the GitHub UI to merge the pull requests?
Do we want to mandate squashing commits (which will prevent us from
using the GitHub UI to merge pull requests and force manual closing of
the PRs)?
Do we want to mandate tidying up commits (which will hide code review comments)?

James Nord

unread,
Oct 20, 2015, 7:06:43 AM10/20/15
to Jenkins Developers
I have some concerns about mandating a tidy-up (I note that Jesse is 
against rewriting the history of a PR branch as that means that GitHub 
hides the code review comments) 

Can you explain this.  If I pushed a rebased commit the comments are still there with "commented on an outdated diff" (and if the comment is still applicable to the LOC is still shows) [1]
This is no different to comments on a commit being addressed in a future commit and as such are still visible?

/James 

[1] or so it seemed when I have done this before...

Stephen Connolly

unread,
Oct 20, 2015, 7:26:14 AM10/20/15
to jenkin...@googlegroups.com
On 20 October 2015 at 12:06, James Nord <jn...@cloudbees.com> wrote:
>> I have some concerns about mandating a tidy-up (I note that Jesse is
>> against rewriting the history of a PR branch as that means that GitHub
>> hides the code review comments)
>
>
> Can you explain this. If I pushed a rebased commit the comments are still
> there with "commented on an outdated diff" (and if the comment is still
> applicable to the LOC is still shows) [1]
> This is no different to comments on a commit being addressed in a future
> commit and as such are still visible?

Well I will not speak for Jesse, so you would want to check with him
as to his logic.

I have seen that the GitHub comment tracking feature can be
unreliable... while it does the right thing and keeps the comments
with the new rewritten history *most* (say 8 out of 10 times) of the
time, there are times when it doesn't... so if you rewrite the history
I cannot trust that all my comments will have been tracked correctly
(especially if I have more than 10 of them... then there's a good
chance that 2 of them were mis-tracked... and it's oh so fun trying to
manually track comments, esp if they get "deleted" and you have to
switch back to email threads) and I have to re-review all the changes
to ensure that in rewriting the history you didn't inadvertently
introduce some other side change

>
> /James
>
> [1] or so it seemed when I have done this before...
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to jenkinsci-de...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/43ed9f06-924d-4912-9389-200590b78ac8%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Christopher Orr

unread,
Oct 20, 2015, 10:14:19 AM10/20/15
to jenkin...@googlegroups.com
On 20/10/15 13:26, Stephen Connolly wrote:
> On 20 October 2015 at 12:06, James Nord <jn...@cloudbees.com> wrote:
>>> I have some concerns about mandating a tidy-up (I note that Jesse is
>>> against rewriting the history of a PR branch as that means that GitHub
>>> hides the code review comments)
>>
>>
>> Can you explain this. If I pushed a rebased commit the comments are still
>> there with "commented on an outdated diff" (and if the comment is still
>> applicable to the LOC is still shows) [1]
>> This is no different to comments on a commit being addressed in a future
>> commit and as such are still visible?
>
> Well I will not speak for Jesse, so you would want to check with him
> as to his logic.
>
> I have seen that the GitHub comment tracking feature can be
> unreliable... while it does the right thing and keeps the comments
> with the new rewritten history *most* (say 8 out of 10 times) of the
> time, there are times when it doesn't... so if you rewrite the history
> I cannot trust that all my comments will have been tracked correctly
> (especially if I have more than 10 of them... then there's a good
> chance that 2 of them were mis-tracked... and it's oh so fun trying to
> manually track comments, esp if they get "deleted" and you have to
> switch back to email threads) and I have to re-review all the changes
> to ensure that in rewriting the history you didn't inadvertently
> introduce some other side change

FWIW (I'm not involved in core dev), what I usually do with GitHub pull
requests is to squash them after the review is complete — I see no need
to pollute git history with ["wip", "cleanup", "cleanup", "address
review comments", "fix nitpicks"] — but I do this entirely locally.

i.e. I squash commits but don't then force-push that feature branch to
the remote, so I avoid wiping out the "who-committed-what-when"
notifications in the PR, and I avoid screwing up the GitHub comment
tracking.

That is, I do something like:

1. git checkout feature/whatever
2. git rebase -i HEAD~n
3. <squash to one or more logical commits>
4. git checkout master
5. git merge feature/whatever
6. git push
7. Enter "Merged" and click "Comment & close" on the GitHub PR (if the
PR hasn't auto-closed)

In this case, the commit history and PR comments on GitHub should remain
as they were while the review was in progress.

Regards,
Chris

Oleg Nenashev

unread,
Oct 20, 2015, 10:42:06 AM10/20/15
to Jenkins Developers
Since Stephen mostly addresses me, I need to explain my position. We had a discussion with him in https://github.com/jenkinsci/jenkins/pull/1815 and https://github.com/jenkinsci/jenkins/pull/1804

I'm not against multi-commit changes when these commits are really reasonable on the atomic level (E.g. "Added new API method" + "Added unit tests for new API" + "Migrated stuff to the new API"). I'm against dozens of commits happening during the PR polishing: typo fixes, minor regressions, etc. I doubt this info will be ever required, but it makes the analysis very difficult when you want to investigate the issue without checking out the entire repo.

Since Jenkins project is being hosted on GitHub, we should rely on its features and make the project more comfortable to all types of contributors including developers and mostly silent Jenkins users and instance maintainers. They are may not have to have much experience with Git or to have local clones of repositories, so I'm against referring to advanced-level Git features allowing to work with unsquashed commit histories.

In order to simplify my work, I've developed the script, which allows to easily perform squashed merges of pull requests, which require only one logically atomic commit. This script mostly follows the approach being used by Cristopher.

From my side, I would propose the following approach:
  • Jenkins contributors are eligible to do whatever they want in PRs => we don't require contributors to squash commits
  • When we merge PRs, into master, we squash commits and reference the original pull requests there
    • => We keep the master branch clean
    • => We allow to dig into the history by going to a pull request if anybody wants it
Would such approach be convenient enough?
If yes, I can convert the script to a more generic and reliable Java application, which will also pull GitHub for data and probably make references autmatically







вторник, 20 октября 2015 г., 13:09:07 UTC+3 пользователь Stephen Connolly написал:

Stephen Connolly

unread,
Oct 20, 2015, 10:42:45 AM10/20/15
to jenkin...@googlegroups.com
Yes so that fixes one issue but complicates post-hoc analysis of PRs.

For instance, suppose I wanted to determine which areas of Jenkins
core are the most tricky for contributors to help out with...

My initial guess would be to look at the master branch for merge
commits, trace each one and count how long the commits being merged
were.

With the above scheme I no longer have GIT tooling to help and I have
to cobble together something involving the GitHub APi as well as
GIT... frankenstein's history analyser if you like.

If we avoid squashing then this analysis is much easier and just an
exercise in command line scripting
>
> Regards,
> Chris
>
> --
> You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/56264C35.9070705%40orr.me.uk.

Oleg Nenashev

unread,
Oct 28, 2015, 8:29:08 AM10/28/15
to Jenkins Developers
Added the topic to the governance meeting agenda.


For instance, suppose I wanted to determine which areas of Jenkins
core are the most tricky for contributors to help out with...

Well, I suppose we know several hotspots w/o any commit archeology :)
I understand this idea, but IMO there much more Jenkins users looking into GitHub to analyze their regressions, etc. Non-squashed commits impact this use-case especially for users, who are not very familiar with Git.

вторник, 20 октября 2015 г., 17:42:45 UTC+3 пользователь Stephen Connolly написал:

Baptiste Mathus

unread,
Oct 28, 2015, 12:11:01 PM10/28/15
to jenkin...@googlegroups.com
<life>won't be able to attend the gov meeting since now DST is finished, meeting starts at 7PM, exactly while I'm feeding kids</life>
Anyway, I'm personnally for, IIUC, what Oleg somehow expressed here.

That is:
  • Like when using Git normally, squashing (well the difference is indeed you still haven't pushed in that case, anyway) should be encouraged to provide meaningful commits in the long term.

    Meaning: having an initial commit like "Fixes the issue bla blablab...", then followed by "hack", "fix typo in hack" should really be encouraged to squash. Cause it doesn't provide any value in the long term and makes it harder to understand intention.

    • IIUC, in that situation, Jesse says that it might be preferrable to open a new squashed PR superceding the previous one which you'd keep unsquashed for historical references (?). About that, why not, but as it makes it more complicated to contribute, and there are risks to have yet another comments leading to squashing... Well, back to a new one? Not sure it would work?

  • There can absolutely be many commits in a PR as long as it's a consistent set related to the PR subject (that is: you shouldn't have formatting fixes in the middle of commits about some feature/fix related PR).
My 2 cents.

Cheers


For more options, visit https://groups.google.com/d/optout.



--
Baptiste <Batmat> MATHUS - http://batmat.net
Sauvez un arbre,
Mangez un castor !

Jesse Glick

unread,
Oct 28, 2015, 12:22:55 PM10/28/15
to Jenkins Dev
On Wed, Oct 28, 2015 at 12:10 PM, Baptiste Mathus <m...@batmat.net> wrote:
> Jesse says that it might be preferrable to open a new squashed PR

I never said anything of the sort. My preference remains to use plain
old Git merge and never destroy history, since simplified views of
history can always be produced on demand by using simple Git options.
But if people insist on destroying history, at least leave the PR
intact so the information is not totally lost (so long as we continue
to host on GitHub), and have the maintainer use `git merge --squash`
to produce a commit to be pushed to `master`. (Can use a magic commit
message to tell GH that this commit closes the PR.)

Oleg Nenashev

unread,
Feb 4, 2018, 7:07:54 AM2/4/18
to Jenkins Developers
Bumping this thread to the top.
Since the number of contributions to the core grows a lot, I think we need to finally discuss the merge policy and JEP it.
Will discuss it with Daniel/Oliver so that we can sync-up and come up with a proposal.

BR, Oleg

среда, 28 октября 2015 г., 17:22:55 UTC+1 пользователь Jesse Glick написал:
Reply all
Reply to author
Forward
0 new messages