Tightening up branch releases

Michael Connor

Oct 13, 2008, 2:57:50 PM

to dev-pl...@lists.mozilla.org

In an effort to improve our stability and security release process,
there is a proposal to make some significant changes in how we drive
and ship releases on stable branches. The primary goals are to
improve quality and reduce the number of release candidates needed to
ship these releases. We have discussed this directly with key release
drivers and QA leads, as well as the Wednesday product delivery
meeting, and the response has been generally quite positive. Now
we're bringing that proposal to dev-planning to get wider feedback
before finalizing the new setup.

The high-level elements of the plan are:

1) Aggressively mitigate risk by setting a much higher bar for patch
acceptance on the branch.
2) Ship more frequently (typically four or five weeks) with smaller
scope for each release.
3) Change the structure of the release cycle to create more time for
QA to verify and analyze fixes and test plans.

The full details are at: https://wiki.mozilla.org/User:Mconnor/SecurityUpdates

Please feel free to either comment here or ask questions at
Wednesday's product delivery meeting.

-- Mike

Boris Zbarsky

Oct 13, 2008, 3:11:49 PM

Michael Connor wrote:
> The full details are at:
> https://wiki.mozilla.org/User:Mconnor/SecurityUpdates

This looks like a pretty good plan overall, but I have a few concerns:

1) Under this plan major-release regressions need explicit approval to
land on the branch, right? How does one request said approval?

2) Historically, we haven't even started triaging branch
blocker/approval stuff until a week or so after shipping the previous
branch release (at least as far as I could tell from my bugmail),
because the branch drivers were all burned out from releasing. Do we
expect that to change now, in that they will be able to start triaging
the next release before the current one ships? Ideally, we would start
triaging the next release immediately upon freezing and handoff to QA,
so that when we reopen for checkins there is no idle time. It might
even be good to reopen before QA signoff, if we relbranch earlier. That
should also prevent the "rush to checkin" issue we have right now.

-Boris

Michael Connor

Oct 13, 2008, 3:24:32 PM

to Boris Zbarsky, dev-pl...@lists.mozilla.org

On 13-Oct-08, at 3:11 PM, Boris Zbarsky wrote:

> Michael Connor wrote:
>> The full details are at: https://wiki.mozilla.org/User:Mconnor/SecurityUpdates
>
> This looks like a pretty good plan overall, but I have a few concerns:
>
> 1) Under this plan major-release regressions need explicit approval
> to land on the branch, right? How does one request said approval?

Yeah, we need a plan for this. We're thinking of having a
stable-release-drivers mailing list or something similar for people to
mail, but we still need to sort out the exact details.

> 2) Historically, we haven't even started triaging branch blocker/
> approval stuff until a week or so after shipping the previous branch
> release (at least as far as I could tell from my bugmail), because
> the branch drivers were all burned out from releasing. Do we expect
> that to change now, in that they will be able to start triaging the
> next release before the current one ships? Ideally, we would start
> triaging the next release immediately upon freezing and handoff to
> QA, so that when we reopen for checkins there is no idle time. It
> might even be good to reopen before QA signoff, if we relbranch
> earlier. That should also prevent the "rush to checkin" issue we
> have right now.

That's the idea, yes. There's a two week window between code freeze
and tagging/building that's pretty much all QA, so it seems like
that's the optimal time to plan and assess bugs.

-- Mike

Johnathan Nightingale

Oct 14, 2008, 10:13:16 AM

to Michael Connor, dev-pl...@lists.mozilla.org

I'm a big fan of the tightened branch criteria (point 1), and of
allocating a larger share of the cycle time to QA by moving up freeze
dates (point 3). I think your observations on the Wednesday call were
right on, that the lion's share of code landings are in the week
before freeze anyhow, so taking some of the time when the tree is
slack and moving it over to the QA side of the freeze makes sense.

Is it necessary to the plan that we also shrink the cycle time between
releases (point 2)? It's entirely possible that, after running with
points 1 & 3 for a few cycles, we might come to the conclusion that a
shorter release cycle is containable; and a direct effect of the
tighter criteria should be fewer re-spins and more on-time delivery,
no question. But I am not clear on why we would make the timetable
change in tandem with the other changes.

Put another way: We know that our current system has room for
improvement - too much work in not enough time to be safe - that's why
we're proposing to change things. But if we reduce the work AND the
time, it's not clear to me from inspection that the situation is
materially better off.

Have we considered an approach like:

- Implement tighter branch rules and QA time-allotment changes
- Watch the metrics we care about (# of re-spins, # of firedrill
regressions, &c) for a few branch cycles
- Once we have a baseline on the new system, and assuming there is
still general agreement on the desirability of a shorter cycle, roll
that change in as well and see if the metrics suffer.

If we have, can I ask for clarification on why this approach would be
worse?

Cheers,

Johnathan

---
Johnathan Nightingale
Human Shield
joh...@mozilla.com

Michael Connor

Oct 15, 2008, 1:30:55 AM

to Johnathan Nightingale, dev-pl...@lists.mozilla.org

On 14-Oct-08, at 10:13 AM, Johnathan Nightingale wrote:

> Is it necessary to the plan that we also shrink the cycle time
> between releases (point 2)? It's entirely possible that, after
> running with points 1 & 3 for a few cycles, we might come to the
> conclusion that a shorter release cycle is containable; and a direct
> effect of the tighter criteria should be fewer re-spins and more on-
> time delivery, no question. But I am not clear on why we would make
> the timetable change in tandem with the other changes.

Because otherwise we end up with a very long QA cycle. We can't add
development time (and thus volume of bugs) without changing the
schedule even more and ending up with a much longer QA test period,
more like four weeks. That feels like a very long cycle, and it's not
a straight-line difficulty curve. The basic proportions seem rather
sound, and the current 8-10 week cycle is not flexible enough to
handle rapidly breaking issues, necessitating more firedrills.

> Put another way: We know that our current system has room for
> improvement - too much work in not enough time to be safe - that's
> why we're proposing to change things. But if we reduce the work AND
> the time, it's not clear to me from inspection that the situation is
> materially better off.

As a word problem, I think that makes sense. But if you measure the
work for QA as directly proportional to the volume of fixes, then the
workload doesn't change in major ways if you do twice as many releases
with half the number of bugs each. That proportionality is the new
reality I want to see: if we're doing solid verification of all bugs
we take on branch, that verification is the bulk of the time QA will
spend on the branches. Unless we're upping to four weeks of QA for an
8-10 week cycle, we aren't in a good place.
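
As a rough illustration of the proportions being discussed (a sketch
only: the helper name qa_share is made up for the example, and the week
counts are the approximate figures mentioned in this thread, not exact
schedule data), the QA share of the cycle can be compared like this:

```python
from fractions import Fraction


def qa_share(qa_weeks: int, cycle_weeks: int) -> Fraction:
    """Return the fraction of a release cycle spent in the QA window."""
    return Fraction(qa_weeks, cycle_weeks)


# Proposed plan: roughly two weeks of QA in a four- or five-week cycle.
proposed = [qa_share(2, weeks) for weeks in (4, 5)]

# Current cycle length with QA doubled to four weeks (8-10 week cycle).
stretched = [qa_share(4, weeks) for weeks in (8, 10)]

for label, shares in (("proposed", proposed), ("stretched", stretched)):
    print(label, [str(s) for s in shares])
```

Under these assumed numbers both schedules give QA the same share of
the cycle (between two fifths and one half), so keeping the proportions
on the current cycle length would mean a four-week QA period.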

Additionally, shorter cycles:

* reduce the pressure to take a risky fix for release N, as N+1 is
four weeks further out, not eight or more.
* decrease our exposure to firedrill situations, as, especially during
the QA period, we can take specific fixes and roll them into the
current release without significantly increasing risk.
* spread the load into more cycles of lesser intensity (steady is
better than bursty).
* keep the ball rolling on branch releases, rather than the current
lulls we hit.
* give us a better story for quicker responses to security issues
disclosed to us by researchers.

> Have we considered an approach like:
>
> - Implement tighter branch rules and QA time-allotment changes
> - Watch the metrics we care about (# of re-spins, # of firedrill
> regressions, &c) for a few branch cycles
> - Once we have a baseline on the new system, and assuming there is
> still general agreement on the desirability of a shorter cycle, roll
> that change in as well and see if the metrics suffer.
>
> If we have, can I ask for clarification on why this approach would
> be worse?

I think the problem is that we haven't done any releases over 70 bugs
that didn't hurt us one way or another. The smallest approximation I
can come up with for 3.0.2 is ~80 bugs. Splitting that into two
releases of ~40 bugs each seems like it would give QA much more
digestible pieces to tackle. Also, assuming we're not going to fix
more or fewer of the "important" bugs based on release cycle, getting
half of those fixes a month earlier seems like a win all around.

-- Mike

Johnathan Nightingale

Oct 15, 2008, 4:47:18 PM

to Michael Connor, dev-pl...@lists.mozilla.org

On 15-Oct-08, at 1:30 AM, Michael Connor wrote:

> I think the problem is that we haven't done any releases over 70
> bugs that didn't hurt us one way or another. The smallest
> approximation I can come up with for 3.0.2 is ~80 bugs. Splitting
> that into two releases of ~40 bugs each seems like it would give QA
> much more digestible pieces to tackle. Also, assuming we're not
> going to fix more or fewer of the "important" bugs based on release
> cycle, getting half of those fixes a month earlier seems like a win
> all around.

So, to be clear, I definitely agree that the likelihood of regressions
grows faster than linearly with the number of bugs taken, and I didn't
mean to suggest that I was advocating for taking more bugs. Absolutely
that should be kept sane.
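
To make the arithmetic concrete, here is a minimal sketch comparing one
~80-bug release with two ~40-bug releases. It assumes verification work
is linear in bug count and, purely for illustration, that regression
risk scales as bugs ** 1.5. The exponent, the function names, and the
unit "hours per bug" are all hypothetical; nothing here is measured
data.

```python
import math


def verification_work(bugs: int, hours_per_bug: float = 1.0) -> float:
    """QA verification effort, assumed directly proportional to bug count."""
    return bugs * hours_per_bug


def regression_risk(bugs: int, exponent: float = 1.5) -> float:
    """Hypothetical risk score that grows faster than linearly (exponent > 1)."""
    return bugs ** exponent


one_big = 80          # roughly the size quoted for 3.0.2
two_small = (40, 40)  # the same fixes split across two releases

work_big = verification_work(one_big)
work_split = sum(verification_work(n) for n in two_small)

risk_big = regression_risk(one_big)
risk_split = sum(regression_risk(n) for n in two_small)

print(f"verification work: {work_big:.0f} vs {work_split:.0f}")
print(f"risk score ratio (one big / two small): {risk_big / risk_split:.2f}")
```

With these assumptions the total verification work is identical either
way, while the single large release carries a higher total risk score
than the two small ones combined. Any exponent above 1 gives the same
direction; only the size of the gap changes.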

Having said that...

> Additionally, shorter cycles:
>
> * reduce the pressure to take a risky fix for release N, as N+1 is
> four weeks further out, not eight or more.
> * decrease our exposure to firedrill situations, as, especially
> during the QA period, we can take specific fixes and roll them into
> the current release without significantly increasing risk.
> * spread the load into more cycles of lesser intensity (steady is
> better than bursty).
> * keep the ball rolling on branch releases, rather than the current
> lulls we hit.
> * give us a better story for quicker responses to security issues
> disclosed to us by researchers.

... *this* is the kind of rationale I was looking for. If we can do
what you describe, and do it on a tighter schedule, then by all
means. My concern was the sense I had that we were assuming that
smaller bug counts and shorter development time would naturally mean
we could do it on a tighter schedule, and it didn't feel like that was
a good thing to assume, at least not a good thing to commit to, until
we'd done it once or twice. But if you're saying that shrinking the
cycle time (beyond just reducing respin-induced-delays) is its own,
explicit end-goal (for the reasons you give above), that's a different
story for me. I agree that those wins (particularly numbers 1, 2, and
5) are worth reaching for and that with those in mind, shorter cycles
aren't just something we're trying to pick up "for free" with the
tighter branch requirements.

Cheers,

Johnathan