Rakudo's fork queue on GitHub

Patrick R. Michaud

unread,

Feb 28, 2009, 11:29:31 AM2/28/09

to perl6-c...@perl.org

A few people have asked about submitting patches via the
"fork queue" on GitHub; after working with the queue this
morning and largely being frustrated by it, I'm going to request
that patches continue to be submitted as patches through RT.

The biggest difficulty that I'm encounter with the fork queue
is that developers (understandably) will develop something in their
local repository as a series of small commits, but it's hard for me
review and evaluate the end result as a unit. In other words, the
commit list in the queue contains all of the false starts and
retractions, making it very difficult to see what was done.

It's also more difficult to comment on any particular set of
commits, e.g. to ask that something be revised or factored in
a different manner.

So, for the time being Rakudo's official policy will be to
accept patch submissions via RT. I've now cleared the fork queue
on GitHub; any patches that were previously there need to be
resubmitted.

Hopefully someone can update the README and github wiki pages
with a good procedure for generating patch files for Rakudo.
(The current README suggests git-format-patch, but this also
apparently produces a series of small patch files which are
equally difficult to review.)

Or, if someone can point me to a guide or sequence that describes
how to make handling the fork queue or multiple patch submissions
easier for the maintainers to review, I'll reconsider the policy.

Thanks,

Pm

Andy Lester

unread,

Feb 28, 2009, 11:46:10 AM2/28/09

to Patrick R. Michaud, perl6-c...@perl.org

On Feb 28, 2009, at 10:29 AM, Patrick R. Michaud wrote:

> So, for the time being Rakudo's official policy will be to
> accept patch submissions via RT. I've now cleared the fork queue
> on GitHub; any patches that were previously there need to be
> resubmitted.

My understanding is that this is bad for everyone else, because it
loses the changeset history by making your committed patch different
than the accumulated changesets that you're applying.

I suspect there's a way of working the workflow that lets you easily
view them in aggregate. Perhaps the RT ticket would include not a
patch but the commit IDs that the patch would include. The submitter
says <<END_OF_RT

"Hey, Patrick, I finished the work on the wangdoodle. Here are the
commits for it:

88b9da2ce43c4e8583cda28ace5144e0a618d70b
4a03ba2fe2e94895881d301de0446158424e7c1d
9e8c18ff0093125830a4bd06a3daee2e20cd8faa
0240f7776a9a01443567dfeaa98bba91b734affd
END_OF_RT

I'm sorry this is poorly researched. I'm heading out for the day, or
I'd have spent more time on it. I'm just concerned that using patches
is very anti-git and will make things worse.

xoxo,
Andy

--
Andy Lester => an...@petdance.com => www.petdance.com => AIM:petdance

Patrick R. Michaud

unread,

Feb 28, 2009, 12:07:49 PM2/28/09

to Andy Lester, perl6-c...@perl.org

On Sat, Feb 28, 2009 at 10:46:10AM -0600, Andy Lester wrote:
>
> On Feb 28, 2009, at 10:29 AM, Patrick R. Michaud wrote:
>
>> So, for the time being Rakudo's official policy will be to
>> accept patch submissions via RT. I've now cleared the fork queue
>> on GitHub; any patches that were previously there need to be
>> resubmitted.
>
> My understanding is that this is bad for everyone else, because it loses
> the changeset history by making your committed patch different than the
> accumulated changesets that you're applying.

I suspect that "everyone else" should be making their changes in
personal branches as well, rather than expecting the rakudo/master
repository to exactly follow the commits in their forks.

> I suspect there's a way of working the workflow that lets you easily
> view them in aggregate. Perhaps the RT ticket would include not a patch
> but the commit IDs that the patch would include. The submitter says
> <<END_OF_RT
>
> "Hey, Patrick, I finished the work on the wangdoodle. Here are the
> commits for it:
>
> 88b9da2ce43c4e8583cda28ace5144e0a618d70b
> 4a03ba2fe2e94895881d301de0446158424e7c1d
> 9e8c18ff0093125830a4bd06a3daee2e20cd8faa
> 0240f7776a9a01443567dfeaa98bba91b734affd
> END_OF_RT

This approach goes totally the wrong way -- the submitter has to
somehow produce a (long) list of commits, which the maintainer then
has to manually replay in the review and test.

> I'm sorry this is poorly researched. I'm heading out for the day, or
> I'd have spent more time on it. I'm just concerned that using patches
> is very anti-git and will make things worse.

I agree we probably need more research here to potentially use
git/github more effectively, but what little I understand of the
fork queue absolutely doesn't work for me as a maintainer, and
turns the maintainers into a bottleneck.

The "send a patch via RT" approach is well understood, and has
worked fine for us for years. So, we either somehow improve our
understanding of git patch management (and believe me, I've
unsuccessfully looked for guides on "fork queue management for
maintainers"), or we stick with RT.

Pm

Chris Dolan

unread,

Feb 28, 2009, 5:56:55 PM2/28/09

to Patrick R.Michaud, Andy Lester, perl6-c...@perl.org

Based on my experience watching the Linux kernel development process
via git, I think both Andy and Patrick are partially right. An
important point is that git optimizes for scalability of the project
leadership.

Andy is right that it's enormously useful to preserve history. Git
has remarkable features where lieutenants can review and digitally
sign commits so the benevolent dictator can slurp them into HEAD
without needing to personally review the commits if he trusts his
lieutenants. Git also tracks the owner of a patch quite well, so
copyright concerns become more automatable. Linux cares deeply about
this because of cases like SCO where the project leaders want to know
who may have committed code of questionable origin.

But Patrick is right that reviewing false starts is an big waste of
time for project leaders, who may already be bottlenecks for a
project. The Linux dev process insists that submitters do this:
1) fork a trunk point
2) do work on a branch, which may include false starts
3) if the work is more than one commit:
a) make a new clean branch
b) manually rebuild the patch as a series of commits handling
separate concerns
c) ask upstream to pull that optimized branch
4) submitter deletes dev branch which included the false starts

This process means the submitted branch/patches represents a
fictitious optimal development, as if the programmer were perfect and
made no false starts along the way. That means more work for the
submitter of a complicated patch, but sometimes raising the bar is not
a bad thing.

On the other hand, if the submitter thinks the patch is awesome but
the reviewer thinks it needs a little work, that's a bit harder
because the reviewer needs to do one of the following:
1) branch and fix
a) make a branch
b) pull the submission
c) edit
d) commit
2) edit the patch as text and commit
3) reject the patch back to the submitter
Option #2 is the easiest but the worst because it loses attribution.
#3 happens the quite often in the Linux kernel process. We should not
view rejection as rude, but as necessary for a good code base.

The ffmpeg dev team takes a similar view on patch rejection -- because
video codecs are so dangerously patent-encumbered, the team won't
accept any patch until it's really clean and deeply understood. And
they have ended up with some surprisingly readable code as a result.

All that said, I have not studied the git best practices for reviewing
submissions. I strongly suspect that there's lots more we can learn
from Linux.

Chris

Andy Lester

unread,

Mar 1, 2009, 12:06:53 PM3/1/09

to Chris Dolan, Patrick R.Michaud, perl6-c...@perl.org

So how about this for the workflow, Patrick:

1) Developer creates a branch to work on a feature or cleanup or
whatever

2) Developer commits to dev/rakudo work branch however much she wants.

3) Developer merges back to dev/rakudo master branch with a "squash"
option.

4) Developer posts merge request to pmichaud (or whoever)

5) Patrick does a cherry-pick with the --no-commit flag on rakudo/
rakudo.

6) Patrick tests out the patch, and commits when happy.

This way, Patrick is still getting one big patch, but without the mess
of creating a patch, and we still get to take advantage of git's
tracking.

This means two big rules:

1) All development work of any size has to be done on a branch in dev/
rakudo, because...

2) Any given commit on a developer's master has to be able to stand on
its own, and not require siblings.

I think if we just get past the way we've done things where we shove
things on to our master branches willy-nilly, the fork queue will be
MUCH smaller, and Patrick will have a much easier time of things.

Sound reasonable? I'll go throw it on a "how to be a rakudo
developer" page.

Chris Dolan

unread,

Mar 1, 2009, 4:30:59 PM3/1/09

to Andy Lester, Patrick R.Michaud, perl6-c...@perl.org

On Mar 1, 2009, at 11:06 AM, Andy Lester wrote:

> So how about this for the workflow, Patrick:
>
> 1) Developer creates a branch to work on a feature or cleanup or
> whatever
> 2) Developer commits to dev/rakudo work branch however much she wants.
> 3) Developer merges back to dev/rakudo master branch with a "squash"
> option.
> 4) Developer posts merge request to pmichaud (or whoever)
> 5) Patrick does a cherry-pick with the --no-commit flag on rakudo/
> rakudo.
> 6) Patrick tests out the patch, and commits when happy.
>

> ...

> This means two big rules:
>
> 1) All development work of any size has to be done on a branch in

> dev/rakudo, because...

> 2) Any given commit on a developer's master has to be able to stand
> on its own, and not require siblings.
>

> ...

[Note that I already started developer instructions at:
http://wiki.github.com/rakudo/rakudo/steps-to-create-a-patch]

Your steps sound right to me (but I haven't actually tested this
workflow myself).

For larger features which are too complex to be squashed into a single
patch, non-core developers should perhaps use one more level of branch
indirection and squash subfeatures, but leave each logical change as a
separate commit on the intermediate branch.

To pick a concrete example, say I'm porting PCT::HLLCompiler from PIR
to Perl 6. I'll do most of my work on chrisdolan/rakudo/hllcompiler-
perl6-dev. As I complete major steps (say 3 parts: a straight port,
then idiomizing, then enhancing), I'll squash them to my intermediate
branch, chrisdolan/rakudo/hllcompiler-perl6. So, when I send a pull
request for that intermediate branch, @patrick can review as three
commits and perhaps take the first two and reject the third one.

Another (likely better) approach is for developers to just build one
part at a time and get feedback at earlier stages rather than risk
getting the whole body of work rejected because of an early design
flaw (like, it's silly to write PCT::HLLCompiler in Perl 6 because
other languages wouldn't want to use it that way and Rakudo couldn't
bootstrap with it anyway).

Chris

Patrick R. Michaud

unread,

Mar 1, 2009, 4:11:59 PM3/1/09

to Andy Lester, Chris Dolan, perl6-c...@perl.org

On Sun, Mar 01, 2009 at 11:06:53AM -0600, Andy Lester wrote:
> So how about this for the workflow, Patrick:
>
> 1) Developer creates a branch to work on a feature or cleanup or
> whatever
> 2) Developer commits to dev/rakudo work branch however much she wants.
> 3) Developer merges back to dev/rakudo master branch with a "squash"
> option.
> 4) Developer posts merge request to pmichaud (or whoever)
> 5) Patrick does a cherry-pick with the --no-commit flag on rakudo/
> rakudo.
> 6) Patrick tests out the patch, and commits when happy.

Works for me.

> Sound reasonable? I'll go throw it on a "how to be a rakudo developer"
> page.

Please do.

Pm

Patrick R. Michaud

unread,

Mar 1, 2009, 12:57:24 PM3/1/09

to Chris Dolan, Andy Lester, perl6-c...@perl.org

On Sat, Feb 28, 2009 at 04:56:55PM -0600, Chris Dolan wrote:
> On Feb 28, 2009, at 11:07 AM, Patrick R. Michaud wrote:
>> On Sat, Feb 28, 2009 at 10:46:10AM -0600, Andy Lester wrote:
>>> On Feb 28, 2009, at 10:29 AM, Patrick R. Michaud wrote:
>>>> So, for the time being Rakudo's official policy will be to
>>>> accept patch submissions via RT. I've now cleared the fork queue
>>>> on GitHub; any patches that were previously there need to be
>>>> resubmitted.
>>>
>>> My understanding is that this is bad for everyone else, because it
>>> loses the changeset history by making your committed patch
>>> different than the accumulated changesets that you're applying.

>>> [...]

>
> Based on my experience watching the Linux kernel development process via
> git, I think both Andy and Patrick are partially right. An important
> point is that git optimizes for scalability of the project leadership.

Agreed; this was one of the (many) reasons I also decided to make the
jump to git.

> [...] reviewing false starts is an big waste of time

> for project leaders, who may already be bottlenecks for a project. The
> Linux dev process insists that submitters do this:
> 1) fork a trunk point
> 2) do work on a branch, which may include false starts
> 3) if the work is more than one commit:
> a) make a new clean branch
> b) manually rebuild the patch as a series of commits handling
> separate concerns
> c) ask upstream to pull that optimized branch
> 4) submitter deletes dev branch which included the false starts
>
> This process means the submitted branch/patches represents a fictitious
> optimal development, as if the programmer were perfect and made no false
> starts along the way. That means more work for the submitter of a
> complicated patch, but sometimes raising the bar is not a bad thing.

Agreed. And I don't think it's important that Rakudo's patch history
contain _all_ of the false starts made and subsequently corrected by
an individual developer. I'd much prefer to see it as fictitious
optimal development.

So, I think what we really need is a good guide describing how
submitters can create the "optimized" branches for the maintainers
to pull from.

> On the other hand, if the submitter thinks the patch is awesome but the
> reviewer thinks it needs a little work, that's a bit harder because the
> reviewer needs to do one of the following:
> 1) branch and fix
> a) make a branch
> b) pull the submission
> c) edit
> d) commit
> 2) edit the patch as text and commit
> 3) reject the patch back to the submitter
> Option #2 is the easiest but the worst because it loses attribution. #3
> happens the quite often in the Linux kernel process. We should not view
> rejection as rude, but as necessary for a good code base.

In general I'm in favor of #3 as well; I think that rejecting patches
(with explanation and suggestions for improvement) is another really
good mechanism for raising the quality of code and process. I don't
find #1 to be too much work either, since it's easy to do local edits
and commits before pushing them to rakudo/master.

> All that said, I have not studied the git best practices for reviewing
> submissions. I strongly suspect that there's lots more we can learn
> from Linux.

Agreed. And thank you very much for this thoughtful and helpful
message, it really helps to clarify some of my thinking on the topic.

Pm