
Firefox 3 PRD should include Quality requirements


Justin Dolske

Feb 14, 2007, 12:32:56 AM
I was looking at the PRD and meeting notes, and the thought struck me
that *quality* isn't really mentioned in the PRD. There's a couple of
vague references to it in the "Crosscutting Concerns" section, but I
feel it should really be called out as a peer to other major requirement
sections.

Some of the most common criticisms of FF2 are related to high memory
usage/leakage, and crashiness. That these two things are mentioned so
often ought to be enough to merit specific attention in FF3. We're
obviously aware of the issues, but these should be formally tracked as
FF3 requirements.

How should we track these things? We could be lazy and simply add a
couple of NFRs to "be stable" and "use less memory". But I think we
should try to nail them down as FRs in a quantitative way.

As a brainstorming example:
- FR: reduce browser MTBF by X%, as compared to Firefox 2.0.0.x
- FR: MTBF > X hours for some particular use case or test.
- FR: total FF memory usage < X MB after completing some test
- FR: reduce total FF memory usage by X%, compared to FF2.
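
(To make a couple of those concrete: below is a rough sketch, in Python,
of how such FRs could be turned into automated pass/fail checks. The
thresholds, baseline figure and inputs are placeholders I made up, not
proposals -- the real numbers are exactly what the FRs would need to pin
down.)

    # Placeholder targets -- the real values would come from whatever
    # instrumentation and baselines we agree on.
    FF2_BASELINE_MB = 180.0   # hypothetical FF2 figure from the same test
    MAX_MEMORY_MB   = 150.0   # "total FF memory usage < X MB after some test"
    MIN_MTBF_HOURS  = 40.0    # "MTBF > X hours for some use case"

    def check_memory(measured_mb):
        """FR: absolute memory budget after completing the test scenario."""
        return measured_mb < MAX_MEMORY_MB

    def check_memory_relative(measured_mb, reduction_pct=20.0):
        """FR: reduce memory usage by X% relative to the FF2 baseline."""
        return measured_mb <= FF2_BASELINE_MB * (1 - reduction_pct / 100.0)

    def check_mtbf(total_uptime_hours, crash_count):
        """FR: mean time between failures above a floor for the test run."""
        if crash_count == 0:
            return True
        return (total_uptime_hours / crash_count) > MIN_MTBF_HOURS

    print(check_memory(142.5))            # True
    print(check_memory_relative(142.5))   # True (at least 20% below 180 MB)
    print(check_mtbf(5000.0, 90))         # True (~55.6 hours between crashes)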

We would almost certainly need new (or modified) tests to generate
whatever measurements we decide are needed. Crash-related data in
particular, which is unobtainable via Talkback, might be obtainable with
Airbag (which has implications for the Airbag schedule). And while getting
FF3 figures in absolute form would be good, also having figures for FF2
to enable relative comparisons would be great. [And as a blue-sky
thought, imagine being able to compare metrics against other browsers.]


In addition to having Quality requirements for plain-vanilla Firefox, I
think we should also have similar requirements for Firefox with some set
add-ons installed (extensions, themes, plugins). Achieving these
requirements would mean more than just fixing browser bugs -- we would
also need to make an active effort to engage the add-on community...
This might mean providing better documentation and tools for addressing
the problem, and even having Mozilla contribute or suggest code changes
to some add-ons wouldn't be entirely out of the question, IMO.

Justin
"Now wiht 47% less typos!"

Gervase Markham

Feb 14, 2007, 10:03:58 AM
Justin Dolske wrote:
> I was looking at the PRD and meeting notes, and the thought struck me
> that *quality* isn't really mentioned in the PRD. There's a couple of
> vague references to it in the "Crosscutting Concerns" section, but I
> feel it should really be called out as a peer to other major requirement
> sections.

A fair point IMO.

> How should we track these things? We could be lazy and simply add a
> couple of NFRs to "be stable" and "use less memory". But I think we
> should try to nail them down as FRs in a quantitative way.
>
> As a brainstorming example:
> - FR: reduce browser MTBF by X%, as compared to Firefox 2.0.0.x

Do we have MTBF numbers, from Talkback or elsewhere?

> - FR: total FF memory usage < X MB after completing some test
> - FR: reduce total FF memory usage by X%, compared to FF2.

That would require a usage scenario, of course.

> In addition to having Quality requirements for plain-vanilla Firefox, I
> think we should also have similar requirements for Firefox with some set
> add-ons installed (extensions, themes, plugins).

But which set? Pulling together a set of "common" addons is a difficult
thing, because in the grand scheme of things, people's addon preferences
are so varied that no addon is all that common.

I'd be concerned that this would be an enormous can of worms.

Gerv

Mike Beltzner

Feb 14, 2007, 12:18:47 PM
to Justin Dolske, dev-pl...@lists.mozilla.org
On 14-Feb-07, at 12:32 AM, Justin Dolske wrote:

> I was looking at the PRD and meeting notes, and the thought struck
> me that *quality* isn't really mentioned in the PRD. There's a
> couple of vague references to it in the "Crosscutting Concerns"
> section, but I feel it should really be called out as a peer to
> other major requirement sections.

First, the cross-cutting concerns are of equal or indeed higher
importance than the FRs and NFRs, as I see things. They reflect
concerns and requirements which *must* be taken into account across
all work done on the product. They're extracted like that so that
every "feature" doesn't have an additional P1 for things like "code
is secure" or "feature doesn't crash." They represent the non-
negotiable basic set of expectations for any changes to the product.
Perhaps that needs to be made clearer in the structure of the
document or in the way they are introduced, or perhaps every
functional area needs to have a P1 FR of "satisfies all cross-cutting
concerns"?

Second, how does one measure "quality"? The issues you raise below
imply "performance" (as measured by speed & memory usage) and
"stability" (as measured by crashiness) which are captured in the
cross-cutting concerns directly as "Performance" and "Reliability".

I do think, however, that there should be a separate part of the PRD
devoted to scoping and assigning resources to the task of eliminating
cruft within our codebase. Someone might Joel on Software me here,
but I think it might be worth our time to try and figure out if
there are pieces of code that can be refactored or made more efficient,
or outright removed in the service of performance and download size.

> Some of the most common criticisms of FF2 are related to high
> memory usage/leakage, and crashiness. That these two things are
> mentioned so often ought to be enough to merit specific attention
> in FF3. We're obviously aware of the issues, but these should be
> formally tracked as FF3 requirements.
>
> How should we track these things? We could be lazy and simply
> add a couple of NFRs to "be stable" and "use less memory". But I
> think we should try to nail them down as FRs in a quantitative way.

I agree entirely. We tried to get this nailed down for Fx2, but kept
pushing off meetings scheduled for "figuring out what metrics to use
and what reasonable targets are" until it became too late. We
shouldn't repeat that mistake.

You'll note that the cross-cutting concern for performance states:

* Performance metrics have been discussed, agreed upon, and recorded.
* Does not have a negative impact on overall Firefox performance.
* If performance impact is unavoidable, that impact is within a set
and agreed upon range.
* Optimized as much as possible to improve overall performance.
* Performance tests have been run and verified by QA.

This applies to each feature area in the PRD, meaning that we'll have
to come up with metrics for each. This should be done in conjunction
with QA to ensure that the required test suites can be run frequently
to checkpoint against those metrics, really.

> As a brainstorming example:
> - FR: reduce browser MTBF by X%, as compared to Firefox 2.0.0.x
> - FR: MTBF > X hours for some particular use case or test.
> - FR: total FF memory usage < X MB after completing some test
> - FR: reduce total FF memory usage by X%, compared to FF2.
>
> We would almost certainly need new (or modified) tests to generate
> whatever measurements we decide are needed. Crash-related data in
> particular, which is unobtainable via Talkback, might be obtainable
> with Airbag (which has implications for the Airbag schedule). And
> while getting FF3 figures in absolute form would be good, also
> having figures for FF2 to enable relative comparisons would be
> great. [And as a blue-sky thought, imagine being able to compare
> metrics against other browsers.]

Agreed; also, I know that QA is looking to figure out what sort of
comparisons would be useful. A rough metric of crashiness could be
assembled by taking the number of crash reports and dividing it by
the number of AUS pings.
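
(For illustration, a tiny Python sketch of that calculation; the daily
counts are invented, and the real ones would come from the crash-report
and AUS logs:)

    # Hypothetical daily totals from the crash-report and AUS pipelines.
    crash_reports_per_day = [5200, 4800, 5100]
    aus_pings_per_day     = [900000, 870000, 910000]

    total_crashes = sum(crash_reports_per_day)
    total_pings   = sum(aus_pings_per_day)

    # Crashes per 1000 active-user pings, as a rough comparable figure.
    crashes_per_1000_pings = 1000.0 * total_crashes / total_pings
    print("crashes per 1000 AUS pings: %.2f" % crashes_per_1000_pings)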

What other tests would be good for measuring performance? I know that
Alice is working on some of these already, too ...

> In addition to having Quality requirements for plain-vanilla
> Firefox, I think we should also have similar requirements for Firefox
> with some set of add-ons installed (extensions, themes, plugins).
> Achieving these requirements would mean more than just fixing
> browser bugs -- we would also need to make an active effort to
> engage the add-on community... This might mean providing better
> documentation and tools for addressing the problem, and even having
> Mozilla contribute or suggest code changes to some add-ons wouldn't
> be entirely out of the question, IMO.

The AMO team already has some Great Plans Afoot here, as I understand
things, but definitely the idea of running our performance tests
against a set of common plugins (ABP, FlashBlock, Forecast Fox, the
ones that get mentioned all the time in articles, etc) seems like a
great idea.

cheers,
mike

Chris Hofmann

Feb 14, 2007, 11:53:46 AM
to Gervase Markham, dev-pl...@lists.mozilla.org

Gervase Markham wrote:
> Justin Dolske wrote:
>>
>> As a brainstorming example:
>> - FR: reduce browser MTBF by X%, as compared to Firefox 2.0.0.x
>
> Do we have MTBF numbers, from Talkback or elsewhere?
>

http://talkback-public.mozilla.org/reports/firefox/FF2001/smart-analysis.all

The problem with trying to set this as a criterion for release is that
it's a circular dependency.

You need a population that has similar usage patterns and amounts of
usage to compare 2.0.0.x to 3.0, which means you have to have shipped
3.0 to collect meaningful MTBF data, and it has to have shipped for a
while to allow the MTBF number to "stabilize". With Talkback we only
get reports from users that have crashed, so the MTBF number is not
perfect, but it shows some interesting release-to-release patterns that
are helpful in understanding if we are getting better or worse. The
number starts low on the first day, then grows as the possible number
of uptime hours grows. After about 4-6 weeks the MTBF number usually
stabilizes. We used to plot those graphs but I don't think that is
turned on any more. There will also be some problems in trying to
compare MTBF numbers between 2.x and 3.0 if we switch over to Airbag
and a new mechanism, with its own quirks, for computing MTBF.
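
(Roughly, the calculation is the sketch below; illustrative Python with
made-up uptime figures, and with the same caveat that we only hear from
users who actually crashed:)

    def mtbf_hours(uptime_seconds_per_report):
        """Mean uptime between crashes across all crash reports, in hours."""
        if not uptime_seconds_per_report:
            return None
        total_s = float(sum(uptime_seconds_per_report))
        return total_s / len(uptime_seconds_per_report) / 3600.0

    # Hypothetical reports: uptime (seconds) accumulated before each crash.
    reports = [3600, 7200, 86400, 1800, 43200]
    print("MTBF estimate: %.1f hours" % mtbf_hours(reports))   # ~7.9 hours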

The best kind of goal for stability and other quality criteria is to
specify that betas and release candidates are shipped to a significant
number of users. We should set a goal of 2-3% market share (from many
accounts IE 7 had about 2.3% market share when it shipped, and Firefox
1.0 had about 3%).

If you can get that many users happily using a pre-release it helps to
get the right amount of volume for all kinds of feedback and it
significantly reduces risk.

For more specific stability goals/actions you can specify that the top
crash lists for betas and release candidates show no significant
regressions from previous releases.
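
(A sketch of what that check could look like; the signatures, counts and
threshold below are invented, just to show the shape of the comparison:)

    # Invented top-crash signature counts for two releases.
    prev_top = {"nsFoo::Bar": 1200, "js_SomeFunc": 950, "gfxQuux::Paint": 400}
    beta_top = {"js_SomeFunc": 1000, "nsNewThing::Init": 800, "nsFoo::Bar": 300}

    def new_top_crashers(previous, current, threshold=500):
        """Signatures prominent in the new list but absent from the old one."""
        return [sig for sig, count in current.items()
                if count >= threshold and sig not in previous]

    print(new_top_crashers(prev_top, beta_top))   # ['nsNewThing::Init']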

chris h.

Justin Dolske

Feb 14, 2007, 5:28:55 PM
Gervase Markham wrote:

>> As a brainstorming example:
>> - FR: reduce browser MTBF by X%, as compared to Firefox 2.0.0.x
>
> Do we have MTBF numbers, from Talkback or elsewhere?

My understanding is that it's hard to get a sound number from Talkback,
mainly because (IIRC) a report says when the last crash was, but not how
much usage happened since then.

But I think the main point here is that a discussion about what metric
to use here will need to balance what we *should* measure with what we
*can* measure.

>> In addition to having Quality requirements for plain-vanilla Firefox,
>> I think we should also have similar requirements for Firefox with some set
>> of add-ons installed (extensions, themes, plugins).
>
> But which set? Pulling together a set of "common" addons is a difficult
> thing, because in the grand scheme of things, people's addon preferences
> are so varied that no addon is all that common.

Good question. We obviously can't test all extensions, and even
determining a set of typical configurations might be impractical. But we
do have data from AMO to identify the most popular add-ons, and so a
simple starting point might be to test with the top-10 addons installed.
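
(As a sketch of what the comparison might boil down to, with made-up
metric names and numbers, and "vanilla" meaning a clean profile:)

    # Invented results; the real ones would come from the Ts/Tp-style suites.
    vanilla    = {"ts_ms": 1400, "tp_ms": 900,  "mem_mb": 120}
    with_addon = {"ts_ms": 1500, "tp_ms": 1300, "mem_mb": 190}

    def regressions(baseline, results, tolerance=0.15):
        """Metrics more than `tolerance` worse than the vanilla baseline."""
        bad = {}
        for name, value in results.items():
            if name in baseline and value > baseline[name] * (1 + tolerance):
                bad[name] = value
        return bad

    print(regressions(vanilla, with_addon))
    # -> {'tp_ms': 1300, 'mem_mb': 190}; ts_ms stays within 15% of baseline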

Another possibility might be to make test suites accessible to extension
developers, and allow them to do their own testing.

> I'd be concerned that this would be an enormous can of worms.

It could be; we just don't want to open the can too much. :-)

Justin

Rafael Ebron

Feb 14, 2007, 6:42:56 PM
to dev-pl...@lists.mozilla.org
Well, there was a lot of work done around quality in the past and there
were fairly clear metrics associated with them. I hate to think that
we're starting from scratch here with definition and data/benchmarks.
chofmann, dbaron, jay should know more. Most of this info should be
somewhere in that black hole called mozilla.org.

MTBF, back and forward button performance, open new window (cold and
warm start), page load tests with the top x number of sites. Should be
testing all these against IE 7 on Windows Vista, Firefox 2.0, Safari.
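
(For the page load side, the raw timings would need boiling down into
numbers you can compare across browsers; a rough Python sketch, with
made-up samples:)

    # Hypothetical page-load times (ms) from one run over a list of top sites.
    load_times_ms = [310, 290, 450, 1200, 330, 980, 305, 620]

    def summarize(samples):
        ordered = sorted(samples)
        n = len(ordered)
        median = (ordered[n // 2] if n % 2
                  else (ordered[n // 2 - 1] + ordered[n // 2]) / 2.0)
        p95 = ordered[int(0.95 * (n - 1))]   # crude 95th percentile
        return {"mean": sum(ordered) / float(n),
                "median": median, "p95": p95}

    print(summarize(load_times_ms))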

There's new ones to add for sure, e.g. Zimbra seems to have performance
metrics around their apps and I would bet Google does too. Music and
video sites should be included in some of these measures too, watching a
video on YouTube, listening to music on someone's profile on MySpace.

So that's the set of objective quality measures around stability and
performance. The more subjective ones around visual and user experience
(e.g. pixels are off, or the UI in preferences on Mac OS X shouldn't be
choppy) have to be defined separately.

But yes, quality (performance, stability, user experience) should be
defined in the PRD, not in full detail but in enough detail. Performance
and stability should have their own section in the PRD, and each feature
won't necessarily have performance or stability requirements attached to
it; that's just overkill.

Also, where it would be good to see more testing is with Flash, WMP,
QuickTime, some of the more original and more widely used "add-ons".

-Rafael


rki...@mozilla.com

Feb 15, 2007, 12:33:41 AM
Rafael Ebron wrote:
> Well, there was a lot of work done around quality in the past and there
> were fairly clear metrics associated with them. I hate to think that
> we're starting from scratch here with definition and data/benchmarks.
> chofmann, dbaron, jay should know more. Most of this info should be
> somewhere in that black hole called mozilla.org.

As someone who has recently been given the task of spelunking into that
black hole to find tests and test harnesses, I can tell you that those
people and many more have a lot of information about what has been done
and what is available. But a lot of knowledge at Mozilla is in the form
of "tribal knowledge" or in bits and pieces around the distributed
information space that is "Mozilla" and I think that most people do not
assume their view is comprehensive.

> MTBF, back and forward button performance, open new window (cold and
> warm start), page load tests with the top x number of sites. Should be
> testing all these against IE 7 on Windows Vista, Firefox 2.0, Safari.
> There's new ones to add for sure, e.g. Zimbra seems to have performance
> metrics around their apps and I would bet Google does too. Music and
> video sites should be included in some of these measures too, watching a
> video on YouTube, listening to music on someone's profile on MySpace.

People are testing many of these things and we are recording the results
of this testing. When some of the automation testing is launched, and it
is fairly near to being launched, the testing information will be much
more current and visible.

The expectations you are expressing are exactly what I have been hearing
from everybody at Mozilla that I have talked to. So do not let yourself
feel that you are alone. People know that all this needs to get done.

> But yes, quality (performance, stability, user experience) should be
> defined in the PRD, not in full detail but in enough detail. Performance
> and stability should have their own section in the PRD, and each feature
> won't necessarily have performance or stability requirements attached to
> it; that's just overkill.

I have a feeling this kind of thing will be happening. People know there
is work to be done for things like performance and stability. There was
work done that tightened up memory leakage, but I check for leaks and I
know there is still work to do.

http://xoatlicue.blogspot.com/2007/02/quick-and-easy-leaks-detection-on-mac.html

> Also, where it would be good to see more testing is with Flash, WMP,
> QuickTime, some of the more original and more widely used "add-ons".
> -Rafael

I think the inventory of testing that I have done brings me to say you
are completely right. Frankly, I want to try to find a way to
characterize the behavior of some of the add-ons, but I want to be sure
of what I am doing first. I am new to the Mozilla community and I do not
want to cause a back-and-forth of blame for quality issues by looking at
these issues in a simplistic way.

If you have suggestions for testing add-ons, please let us know what
they are.

I am often on the IRC channels (nick: "ray", lurker in: #qa, #devmo,
#litmus).

thanx - ray


Gervase Markham

Feb 15, 2007, 8:41:19 AM
Justin Dolske wrote:
> My understanding is that it's hard to get a sound number from Talkback,
> mainly because (IIRC) a report says when the last crash was, but not how
> much usage happened since then.

Yes; and I'm not sure we can ever get that figure without being too
intrusive. :-|

> Good question. We obviously can't test all extensions, and even
> determining a set of typical configurations might be impractical. But we
> do have data from AMO to identify the most popular add-ons, and so a
> simple starting point might be to test with the top-10 addons installed.

But if we isolated a problematic add-on, would we end up de facto
joining its development team? Surely part of the point of addons is that
they are addons, maintained and worried about by someone else? :-)

Gerv

Gijs Kruitbosch

Feb 15, 2007, 9:21:13 AM
to Gervase Markham
Gervase Markham wrote:
>> Good question. We obviously can't test all extensions, and even
>> determining a set of typical configurations might be impractical. But
>> we do have data from AMO to identify the most popular add-ons, and so
>> a simple starting point might be to test with the top-10 addons
>> installed.
>
> But if we isolated a problematic add-on, would we end up de facto
> joining its development team? Surely part of the point of addons is that
> they are addons, maintained and worried about by someone else? :-)
>
> Gerv

Yes and no, respectively, I believe.

Yes in that you're right as is: if we fix add-on problems ourselves then
we join their development team. Whether that's a really bad thing is
questionable, but I don't think Justin said we would have to develop the
fix ourselves. Surely you wouldn't consider everyone who has ever filed
a bug in bugzilla "part of the Firefox development team"?

More importantly, I think the answer to your second question is "no" in
the sense that (for instance) the Windows development team contacted
Mozilla some time before releasing Vista and offered them help in
adapting Firefox and their other software so it would run well on it. In
the same way, I could imagine the Firefox development team would want to
make sure that at least some of the well-known add-ons or applications
that use Firefox / the "Mozilla Platform" continue running well in new
versions. This is wise both from a developer relations point of view
(try not to have the extension developers find out about problems
themselves at the last minute) and a user experience point of view -
users will want their add-ons to continue to work, and if they don't
then they will first go to the add-on authors, who will point the finger
back at Mozilla if it's their fault.

~ Gijs

Gervase Markham

Feb 16, 2007, 9:45:55 AM
Gijs Kruitbosch wrote:
> More importantly, I think the answer to your second question is "no" in
> the sense that (for instance) the Windows development team contacted
> Mozilla some time before releasing Vista and offered them help in
> adapting Firefox and their other software so it would run well on it.

True.

> In
> the same way, I could imagine the Firefox development team would want to
> make sure that at least some of the well-known add-ons or applications
> that use Firefox / the "Mozilla Platform" continue running well in new
> versions. This is wise both from a developer relations point of view
> (try not to have the extension developers find out about problems
> themselves at the last minute) and a user experience point of view -
> users will want their add-ons to continue to work, and if they don't
> then they will first go to the add-on authors, who will point the finger
> back at Mozilla if it's their fault.

Well, I would rather expect extension developers to be testing and
giving feedback throughout the development cycle anyway. The reason
Microsoft needs a Vista lab is because you can't download nightly builds
of Vista during the development and test against it, and you can't
propose patches to fix things you need fixed.

But maybe you are right. We should make some effort to ensure the X most
popular extensions work. However, what is the value of X?

It would be great if someone could take amo download data and produce a
graph of number of extensions vs. percentage of downloads, so we could
see if the top 50 made up 10% of downloads, or 90%. In other words, how
long is the long tail?
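
(Something along these lines is all it would take, once the data is in
hand; a sketch with invented per-extension counts standing in for the
real AMO numbers:)

    # Invented weekly download counts, one entry per extension.
    downloads = [250000, 180000, 120000, 90000, 60000, 40000, 25000,
                 15000, 10000, 8000]

    def top_n_share(counts, n):
        """Fraction of all downloads accounted for by the top n extensions."""
        ordered = sorted(counts, reverse=True)
        total = float(sum(ordered))
        return sum(ordered[:n]) / total if total else 0.0

    for n in (5, 10, 50):
        share = 100 * top_n_share(downloads, n)
        print("top %d extensions: %.0f%% of downloads" % (n, share))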

Gerv

James Ross

Feb 16, 2007, 12:04:19 PM
Gervase Markham wrote:
> It would be great if someone could take amo download data and produce a
> graph of number of extensions vs. percentage of downloads, so we could
> see if the top 50 made up 10% of downloads, or 90%. In other words, how
> long is the long tail?

I've got graphs of the downloads/week for the top 30 extensions (and
some others) at http://twpol.dyndns.org/mozilla/extensions/stats/, and
just for kicks I worked out the sum of downloads/week for them [1].

So, the total downloads/week for the current top N AMO extensions is:

  N       D/week
  5      630,000
 10      980,000
 20    1,425,000
 30    1,675,000

Anyone know what the total downloads for all extensions is like?

--
James Ross <sil...@warwickcompsoc.co.uk>
ChatZilla Developer

[1] I could have done this an easier way, actually, but doing graphs is
way more fun. See:
http://twpol.dyndns.org/mozilla/extensions/stats/?group=1;m=dlsum and
http://twpol.dyndns.org/mozilla/extensions/stats/?group=2;m=dlsum for
top 5/10 respectively (they take a while to load the image, need to fix
that).

rki...@mozilla.com

Feb 16, 2007, 3:29:59 PM
<snip>

Gervase Markham wrote:
>> In the same way, I could imagine the Firefox development team would
>> want to make sure that at least some of the well-known add-ons or
>> applications that use Firefox / the "Mozilla Platform" continue
>> running well in new versions. This is wise both from a developer
>> relations point of view (try not to have the extension developers find
>> out about problems themselves at the last minute) and a user
>> experience point of view - users will want their add-ons to continue
>> to work, and if they don't then they will first go to the add-on
>> authors, who will point the finger back at Mozilla if it's their fault.
>
> Well, I would rather expect extension developers to be testing and
> giving feedback throughout the development cycle anyway. The reason
> Microsoft needs a Vista lab is because you can't download nightly builds
> of Vista during the development and test against it, and you can't
> propose patches to fix things you need fixed.
>

FYI, I proposed that, in addition to doing the nightly builds that the
build team does now, they also do builds that have tests enabled.

https://bugzilla.mozilla.org/show_bug.cgi?id=369809

This should make it easier for extension developers, and other members
of the community, to run the tests that can be made available.

<snip>

Justin Dolske

Feb 17, 2007, 1:53:02 AM
Gervase Markham wrote:

> Well, I would rather expect extension developers to be testing and
> giving feedback throughout the development cycle anyway.

True. But that's placing the bar fairly high, as I don't think you can
necessarily rely on extension developers to test things beyond "does my
extension work" and "did it break anything obvious". I'd be surprised if
more than a handful of developers had any kind of test suite, or checks
for things like memory leaks and page load performance.

Justin

Justin Dolske

Feb 17, 2007, 3:02:00 AM
Justin Dolske wrote:

> True. But that's placing the bar fairly high

Err, I mean "low". For some reason I had limbo stuck in my head. :-)

Justin

Gervase Markham

Feb 20, 2007, 5:31:25 AM
James Ross wrote:
> Anyone know what the total downloads for all extensions is like?

Yes - we need to know this to know how long the tail is.

Gerv

Mike Shaver

Feb 20, 2007, 8:43:02 AM
to Gervase Markham, dev-pl...@lists.mozilla.org

Now is a _particularly_ bad time for me to ask people to dive into
mining that data, but -- to the extent that it's reliable -- I'll try
to extract that this week.

Update-pings would be a more interesting metric, perhaps, since they
reflect continuing use and not just new installs, but they have their
own problems! (For one thing, the sheer size of the dataset makes
some of our tools sad-faces.)

Mike

alice

Feb 20, 2007, 2:54:51 PM

> Agreed; also, I know that QA is looking to figure out what sort of
> comparisons would be useful. A rough metric of crashiness could be
> assembled by taking the number of crash reports and dividing it by the
> number of AUS pings.
>
> What other tests would be good for measuring performance? I know that
> Alice is working on some of these already, too ...

The discussion that QA recently had was about the ability for QA to
apply some sort of quality number to a given release. The idea being
that we do a lot of testing on the front end, but don't really have a
measure to say if a release could be considered good or bad - other than
if we had to push out a rapid fix in short order.

We threw around a bunch of numbers (top crashes, bug filings, bug
priorities in filings, tp/ts numbers etc). We mostly wanted to get to a
point where we could create some metric which would then be calculated
and posted after a reasonable time period post release.

Having such a metric would also help in the creation of future goals.
If we could say that a release was considered 80% stable (or some such)
we could then aim to create something with a higher number the next go
around. It would also help to isolate areas of improvement (i.e., the
perf component of the metric caused a release to be 'bad', so we need to
work on that number to improve the overall score).
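
(Purely as a sketch of the kind of rollup I mean; the sub-scores and
weights here are invented:)

    def composite_quality(scores, weights):
        """Weighted average of per-area scores, each normalized to [0, 1]."""
        total_weight = float(sum(weights[k] for k in scores))
        return sum(scores[k] * weights[k] for k in scores) / total_weight

    # Invented sub-scores (1.0 = perfect) and weights.
    scores  = {"stability": 0.85, "performance": 0.70, "bug_volume": 0.90}
    weights = {"stability": 3.0,  "performance": 2.0,  "bug_volume": 1.0}

    quality = composite_quality(scores, weights)
    print("release quality: %.0f%%" % (100 * quality))   # -> 81%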

For me, it was an attempt to get all the different information that we
gather into a tidy package. I realize that this may be a pipe dream,
but it could go a long way to providing us with the information we need
for properly directing our efforts.

alice.
