
Re: Reining in the release process

J. Paul Reed

Oct 6, 2006, 5:19:49 AM
to dev-pl...@lists.mozilla.org
L. David Baron wrote:
> One thing I've noticed over the past few years is that the build and
> test processes needed to get a release out the door seem to keep getting
> longer and longer. In other words, the time between deciding that we're
> code-complete and being ready to ship the release has been increasing.

As always, I'm interested in feedback on the release process, and in
any suggestions anyone has.

> I'd like to propose a new piece of policy that I hope will help (along
> with ongoing work to automate some parts of this process) to remedy
> this:
>
> We should stop accepting changes that add additional manual steps to
> our release processes (the building, testing, and shipping from when
> we decide we're code-complete to when users get our software).

Do you have examples of changes that have been accepted that have added
manual steps to the release process?

As one of the "three musketeers" responsible for running the manual
release processes, I whole-heartedly agree with such a policy. But I
can't think of any code changes in recent memory that we've accepted
that have added manual steps to the release process.

We have started running some release verification steps (namely l10n
metadiff testing and update-advertisement and -application testing) that
we've worked with QA on developing, and those tests have proven
extremely valuable in pointing out mistakes/oversights in the release
process that would otherwise have gone undetected (and in previous
releases, did).

> One area of concern with such a policy is localization: if adding new
> languages meant adding manual build work or testing, this policy would
> dictate that we not add new languages. (The main concern here is really
> testing; I believe the build process already meets this standard.) So
> I'm not sure we'd be ready to enforce this policy on adding languages
> quite yet -- but I think we should be at that point with localizations
> soon.

We've become more organized with locales, but I think that's mostly a
function of maintaining the ship schedule that we do, which is also a
recentish development.

I know Chase used to do multiple "batches" of locales for a release.
Now, for the 1.5.0.x series, we only accept new locales at version
boundaries, thus reducing work. We have enough automation in this arena
that adding a locale during a release isn't difficult. Adding it out of
band is time consuming, so we've had to make some tough calls (lt,
anyone?).

Axel has been working hard to coordinate localizers, so they know when
the code-complete dates for l10n are, and so that those dates match the
release code-complete dates if at all possible. For the maintenance
releases, this is often possible. For major releases, like 2.0, it's
been more difficult (mostly due to string changes made right up to the
code-complete date).

For people who've been around a long time, I can understand the
curiosity about why build/release times seem to be increasing over the
past few months. Personally, I think there are two major reasons for this:

1. We have a complex, multi-platform release process, built upon an
aging, brittle infrastructure. Historically, there hasn't been
engineer-time to improve this. That's now changed, and rhelmer has
actively been working on build/release automation in bug 352230.

2. We've added some verification steps to the release process. This does
add time, but it ensures that we ship a high quality release to the
millions of people that use our software. As more people adopt Firefox
and Thunderbird, we need to ensure that their user experience,
especially in the automatic update department, is a good one. Running
the tools to verify l10n and updates does take extra time, but the cost
of *not* running them is significantly higher in terms of "bad taste"
left in people's mouths.

If there are places where we can optimize processes or steps, I'd be
interested in hearing about them. I think the new build automation
harness will reduce the time somewhat, but it won't take us back to the
days of Firefox 1.0.

We have more users, more deliverables, and more locales now, and to
manage all of those, we need more testing, verification, and confidence
that we're releasing a high quality product.

Later,
preed
--
J. Paul Reed
Build/Release Engineer - The Mozilla Corporation
smtp://pr...@mozilla.com
irc://irc.mozilla.org/preed
pots://650.903.0800/x256

Gervase Markham

Oct 6, 2006, 11:54:05 AM
J. Paul Reed wrote:
> If there are places where we can optimize processes or steps, I'd be
> interested in hearing about them. I think the new build automation
> harness will reduce the time somewhat, but it won't take us back to
> the days of Firefox 1.0.

As I understand his original post, dbaron is concerned about the release
process taking up the time of coders who would otherwise be working on
the next release. (dbaron: am I right?)

So, in a world where the release process took two weeks, but it involved
only the build and release team (and assuming, for this hypothetical
situation, that we only ever made one release at once), then that would
be fine.

So, it seems to me that the goals here are:

1) Have as short a time as possible between "code complete" and "no
coders need worry about this release any more".

2) Also, keep the build and release time short - but that's because we
have limited build and release resources, which is a different problem.

The discussion so far seems to have centered on optimising 2); perhaps
we should think more about 1)?

Gerv

L. David Baron

Oct 6, 2006, 1:13:48 PM
to dev-pl...@lists.mozilla.org
On Friday 2006-10-06 16:54 +0100, Gervase Markham wrote:
> As I understand his original post, dbaron is concerned about the release
> process taking up the time of coders who would otherwise be working on
> the next release. (dbaron: am I right?)

No. I'm also (in fact, primarily) concerned about testers. And also
about whoever does things like Web page changes, since I'm worried that
process might get significantly more complicated.

> So, in a world where the release process took two weeks, but it involved
> only the build and release team (and assuming, for this hypothetical
> situation, that we only ever made one release at once), then that would
> be fine.

No, it wouldn't. We need to be able to do security firedrills much
faster than that.

-David

--
L. David Baron <URL: http://dbaron.org/ >
Technical Lead, Layout & CSS, Mozilla Corporation

Adam Guthrie

Oct 6, 2006, 3:22:48 PM
to dev-pl...@lists.mozilla.org
L. David Baron wrote:
> No. I'm also (in fact, primarily) concerned about testers. And also
> about whoever does things like Web page changes, since I'm worried that
> process might get significantly more complicated.

I think that the solution here is for QA to do what build has done for
the release process: automate it.

I spent a lot of time this summer testing updates and spot checking
various locales for every release we did. (And I mostly tested Linux; QA
has to test on all three platforms, with various update paths and at
least eight locales.) Each time I did this I thought, "This is so easy
and monotonous. I would much rather be verifying bugs or testing other
things that really need a human being to do."

Most of the time that's spent testing releases is spent on stuff that
could be easily automated, e.g. locale and update checking. Granted,
smoketests and basic and full functional tests will probably still need
to be run by people (until we can get Eggplant to do this). I think
that if QA could somehow automate this, then we wouldn't have to
sacrifice things such as adding a new language.

Just imagine if QA could spend all the time they spend verifying updates
and spot checking locales on QAing bugs...

-Adam

Tim Riley

Oct 6, 2006, 5:06:23 PM
to dev-pl...@lists.mozilla.org
L. David Baron wrote:
> One thing I've noticed over the past few years is that the build and
> test processes needed to get a release out the door seem to keep getting
> longer and longer. In other words, the time between deciding that we're
> code-complete and being ready to ship the release has been increasing.
> This is a problem because:
>

I agree with dbaron's concept: that we want to make the release
process, and the testing process in particular, as short as possible.

Keep in mind that we didn't add these steps randomly, and we don't
enjoy the monotonous testing (in case anyone had any doubts about that
;) ).

Personally, when I arrived last year, I found a lot of test effort
going into rushing around plugging holes in our release process: for
example, inconsistent verification of bug fixes, and of which locales,
platforms, and software update paths got tested. In every build we were
finding serious bugs here and there. So we had a choice: stop the
testing, or make it more consistent--as in, test more to ensure that
the area that broke before is now working _plus_ test other things like
it to ensure they do not break. We found, for instance, that certain
complex dialogs commonly break, so we need to test other complex
dialogs.

The next approach was to add automation. We added an L10n checker, an
update checker, an all-html download checker, and Eggplant automated
GUI testing. These reduced our manual testing by about 90%. But then
there are other things we have been asked to test, such as the top 50
extensions in various categories, large numbers of top sites, more
sites using plug-ins, and more locales. So the manual task becomes
bigger.

Keep in mind that this is a very graphical product, and there are
hundreds of ways it can fail that automated tests can't catch,
particularly when the product has changed dramatically, as with the
visual refresh for FF2. Also, when we do re-spins we cannot count on
community testing or baking.

So to wrap up, we need to be very judicious about where we add manual
tests, but we can't put in place procedures that handicap the release
team or test team. Sometimes smarter testing means adding manual tests
first and then converting to automated tests later.

The release team (dev, qa, build, mktg) should discuss these things in
the context of each release. A different approach makes sense for
security releases, which might dictate more manual bug verification and
more manual user-based testing, given less bake time in general, as
compared to FF2, with its greater bake time and more betas and RCs. And
the release team should make pragmatic decisions rather than rely on a
global process.

--Tim

Tim Riley

Oct 6, 2006, 5:33:58 PM
to Adam Guthrie, dev-pl...@lists.mozilla.org
Adam,

I agree with the concept, but we have to watch the rhetoric. See
details below. The devil is in the details.

--Tim

Adam Guthrie wrote:
> L. David Baron wrote:
>> No. I'm also (in fact, primarily) concerned about testers. And also
>> about whoever does things like Web page changes, since I'm worried that
>> process might get significantly more complicated.
>
> I think that the solution here is for QA to do what build has done for
> the release process: automate it.

Automation is definitely the way to go, but it is not a silver bullet.
Automating a test typically takes 3-10 times as long as running it
manually. We need to do it, but it takes time.

This has been happening: the L10n metadiff, the L10n validator, a whole
series of checkers by Axel, the update checker, and the Eggplant GUI
tests.

One of the big challenges is that we can't stop production to do this.
We still have to test FF2, FF 1.5.0.x, and all the distros. These are
all high priority for the success of Mozilla. The good news is we have
more people on board, like robcee, alice, and martijn, who can create
the tools we need. This _will_ bear fruit!

>
> I spent a lot of time this summer testing updates and spot checking
> various locales for every release we did. (And I mostly tested Linux; QA
> has to test on all three platforms, with various update paths and at
> least eight locales.) Each time I did this I thought, "This is so easy
> and monotonous. I would much rather be verifying bugs or testing other
> things that really need a human being to do."

Yes, some of this is monotonous. But it is important. The locale
testing is a classic example of something very hard to automate. How do
you tell a computer what every possible layout or font error might look
like? How does the computer know to check for an XML parser error?
These are things that have been found by people, and if they happen in
a major locale we would look foolish to hundreds of thousands of users
in that community and to the press.

We have brought this down to about 14 combinations of locale and
platform. It takes 2-3 people about 3 hours. Is it worth 6-9
person-hours of effort to ensure we don't look foolish in a major
locale?

One key here is we can move some of the testing out of the critical
release testing window which is from code freeze to public release. We
can do some pre-testing and some post-testing. We have been working out
these details with mconnor and schrep.

>
> Most of the time that's spent testing releases is on stuff that could be
> easily automated; e.g. locale and update checking. Granted, smoketests,
> basic and full functional tests will probably still need to be run by
> people (until we can get Eggplant to do this). I think that if somehow
> QA could automate this then we wouldn't have to sacrifice things such as
> adding a new language.

We have automated away about 70% of the locale and update checking. We
used to test 30-40 combos of locales for 1.5; now we test about 14. We
still find layout and font issues, though. These are not caught by the
automated tools and would be very difficult to catch automatically.

Benjamin Smedberg

Oct 6, 2006, 6:33:47 PM
Tim Riley wrote:

> Yes, some of this is monotonous. But it is important. The locale
> testing is a classic example of something very hard to automate. How
> do you tell a computer what every possible layout or font error might
> look like? How does the computer know to check for an XML parser
> error? These are things that have been found by people, and if they
> happen in a major locale we would look foolish to hundreds of
> thousands of users in that community and to the press.

Not to detract from the gist of your post, but XML parsing errors are
reported to the JS console, so it should be very simple to automate
checking for them.
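
For example, something along these lines could watch the console while
a test harness drives the UI (a rough sketch in privileged JS; the
exact error-message text I'm matching on is an assumption):

  // Rough sketch: register a console listener and collect anything
  // that looks like an XML parsing error while the harness exercises
  // the UI. (Assumption: the "XML Parsing Error" message text.)
  const Cc = Components.classes;
  const Ci = Components.interfaces;

  var xmlErrors = [];
  var errorListener = {
    observe: function (aMessage) {
      if (/XML Parsing Error/.test(aMessage.message))
        xmlErrors.push(aMessage.message);
    },
    QueryInterface: function (aIID) {
      if (aIID.equals(Ci.nsIConsoleListener) ||
          aIID.equals(Ci.nsISupports))
        return this;
      throw Components.results.NS_NOINTERFACE;
    }
  };

  var consoleService = Cc["@mozilla.org/consoleservice;1"]
                         .getService(Ci.nsIConsoleService);
  consoleService.registerListener(errorListener);
  // ... drive the UI here (eggplant, jssh, etc.) ...
  consoleService.unregisterListener(errorListener);
  // Fail the run if xmlErrors is non-empty.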


--BDS

Mike Connor

Oct 7, 2006, 1:36:47 PM
to dev-pl...@lists.mozilla.org

On 6-Oct-06, at 1:13 PM, L. David Baron wrote:

> On Friday 2006-10-06 16:54 +0100, Gervase Markham wrote:

>> As I understand his original post, dbaron is concerned about the
>> release process taking up the time of coders who would otherwise be
>> working on the next release. (dbaron: am I right?)
>
> No. I'm also (in fact, primarily) concerned about testers. And also
> about whoever does things like Web page changes, since I'm worried
> that process might get significantly more complicated.
>
>> So, in a world where the release process took two weeks, but it
>> involved only the build and release team (and assuming, for this
>> hypothetical situation, that we only ever made one release at once),
>> then that would be fine.
>
> No, it wouldn't. We need to be able to do security firedrills much
> faster than that.

Security firedrills, IMO, should not be the basis on which we design
our release quality process, especially for final releases. We should
build a separate firedrill process to handle those cases, one that
carries a different quality burden and strikes a different balance
between user need and quality expectations.

To go from patch to release in two or three days for all locales is
possible, but results in a much greater chance of needing to do
something like the 1.5.0.5->1.5.0.6 bump. In firedrill situations,
we might be willing to take that risk. For regular updates, that
shouldn't be the case, and our quality guidelines should be much
tighter.

-- Mike

Mike Schroepfer

Oct 8, 2006, 1:55:50 AM
to Mike Connor, dev-pl...@lists.mozilla.org
Yep, that's exactly right. For 1.5.0.x releases we are doing all of the
following, which we didn't use to do:
* Require sim-ship for all locales (and we have increased the number of
locales)
* Sim-ship Firefox and Thunderbird
* Ship partial and full updates for all previous releases (i.e.
1.5.0.*->1.5.0.7)
* Do a much better job (mostly automated) of validating both locales
and updates
* Document the product release process as we go
* Leave additional time to ensure bits are properly mirrored before we
release

We've also been much better about tagging during the early part of this
process and opening the tree back up for parallel development. So in
the 2.0 releases, at least, this process did not block landing of new
patches.

Each release is a risk/reward tradeoff. If there is a zero-day
vulnerability then days matter and we'll take additional risk by
reducing the validation steps and getting the release out sooner. If
there isn't immediate pressure to release then it makes sense to do a
little bit of extra validation to make sure we get everything right.
Publishing a bad update is a pretty catastrophic failure and thus
warrants vigilance.

It's also useful to understand your frame of reference. Do you know
what the QA/build cycle is for comparable products shipped on 3
platforms, in 40 locales, with automatic updates to an existing user
base north of 50 million users? For many commercial software companies,
the "back end" for products like this is measured in weeks to months.

This is not to say that we can't make the process faster, more
repeatable, and less painful through process and tools. Check out:

https://bugzilla.mozilla.org/show_bug.cgi?id=355309 (tracking bugs for
more automated builds)
http://wiki.mozilla.org/Firefox:1.5.0.7:Community (example of how much
better documented our dot release process is)
https://intranet.mozilla.org/Build:Unified_Release_Process (the
incredibly well documented release process we now have)

The good news is we understand (and have documented) the release process
- so everyone can help make it better.

Cheers,

Schrep

Tim Riley

Oct 11, 2006, 3:26:42 PM
to Benjamin Smedberg

But you have to _trigger_ the parsing error. Automation includes
invoking and checking. The invoking part is more challenging.

We could use eggplant or jssh to automate this.

--Tim

Benjamin Smedberg

Oct 11, 2006, 3:38:12 PM
Tim Riley wrote:

> But you have to _trigger_ the parsing error. Automation includes
> invoking and checking. The invoking part is more challenging.

Note that you don't actually have to open the window to test the XML
parsing. You can simply feed the URI stream to the XML parser (using
saxxmlreader, for example). The tricky part would be compiling a list of XUL
pages which need testing. You could generate this list automatically by
listing the contents of the chrome JARs and mapping that back to chrome URIs.
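
As a rough sketch of the idea (using DOMParser rather than the SAX
reader, and assuming a privileged context that can fetch chrome: URIs;
note that external DTD entities, the localization case, would still
need handling, as comes up downthread):

  // Sketch: fetch a chrome file and check well-formedness with
  // DOMParser; a parse failure yields a document whose root element
  // is <parsererror>. Caveat: DOMParser won't load the external DTDs
  // that XUL uses for localized entities, so files with &foo;
  // references would need those DTDs resolved first.
  const PARSERERROR_NS =
    "http://www.mozilla.org/newlayout/xml/parsererror.xml";

  function isWellFormed(aChromeURI) {
    var req = new XMLHttpRequest();
    req.open("GET", aChromeURI, false);  // sync is fine in a harness
    req.send(null);
    var doc = new DOMParser()
                .parseFromString(req.responseText, "application/xml");
    var root = doc.documentElement;
    return !(root.localName == "parsererror" &&
             root.namespaceURI == PARSERERROR_NS);
  }

  // e.g. isWellFormed("chrome://browser/content/browser.xul")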

--BDS

Tim Riley

Oct 11, 2006, 9:05:59 PM
to Benjamin Smedberg

Thanks, Benjamin. I was wondering if there was a simple parser we could
pass these pages through just to check for errors, and how we could
identify the pages. So there would be some work to create the list of
pages, and ideally some work to detect when new pages are added.
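
For building the list, something like this might work (a rough sketch;
I'm assuming the init-then-open nsIZipReader interface and that
findEntries hands back nsIZipEntry objects with a .name attribute,
which may not match the exact IDL in our tree):

  // Sketch: list the XML-ish entries inside one chrome JAR. Mapping
  // entry paths back to chrome:// URIs via the chrome registry is
  // left out here, and is the fiddly part.
  const Cc = Components.classes;
  const Ci = Components.interfaces;

  function listXMLEntries(aJarFile /* nsIFile */) {
    var reader = Cc["@mozilla.org/libjar/zip-reader;1"]
                   .createInstance(Ci.nsIZipReader);
    reader.init(aJarFile);  // assumption: init-then-open interface
    reader.open();
    var names = [];
    var entries = reader.findEntries("*");
    while (entries.hasMoreElements()) {
      var entry = entries.getNext().QueryInterface(Ci.nsIZipEntry);
      // XBL, XHTML, and RDF can carry entities too
      if (/\.(xul|xml|xbl|xhtml|rdf)$/.test(entry.name))
        names.push(entry.name);
    }
    reader.close();
    return names;
  }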

--Tim

Robert Kaiser

Oct 12, 2006, 8:42:01 AM
Benjamin Smedberg wrote:

Would this take into account all that XBL stuff we pick up and the
overlays we're using all over the place? Especially for detecting
unmatched entities correctly, this may be needed.

Robert Kaiser

Benjamin Smedberg

Oct 12, 2006, 9:21:05 AM

You would have to parse the XBL files also (and XHTML, and any other
XML I'm not thinking of... maybe the help RDF uses entities?).

--BDS

Axel Hecht

Oct 12, 2006, 5:45:21 PM

Yep, they do.

Is there a way to enumerate packages and files? (Just didn't bother to
look right now.)

Axel
