Milestone Scheduling

Mike Schroepfer

Jul 8, 2007, 12:37:50 PM
We discussed this at the last Gecko and Firefox meetings - but I wanted
to get some notes on the plan for scheduling of future milestones down
in print.

Here's the context we are using to evaluate scheduling of milestones:

1) We are driven by quality, not time. We want Firefox 3 to be
something that we are all proud of. This means features that delight
users and the same or higher quality than previous releases. "Quality"
includes performance (Tp/Ts/TDHTML/etc), footprint, web compatibility,
regressions, and general fit and finish. Having said that, we want to
move the web forward and are in a competitive market. So we should
converge on a release as fast as possible.

2) There have been almost 2 years of development on the 1.9 platform
incorporating major changes: Reflow, Cairo, Cycle Collector, Native Mac
Widgets, contenteditable, many parts of the Web Apps 1.0 Spec, etc. We
need to have enough "bake time", public milestones, and regression fix
time to ensure we meet our quality goal. We should also endeavor to get
this functionality into the hands of users and web developers as soon as
possible. The sooner we ship this the sooner web authors can count on
>15% of their users supporting the latest capabilities and standards.

3) The Firefox front-end has had significantly less development time
than the platform and has yet to have the opportunity to innovate on top
of infrastructure built for places, password manager, and others. So
we'd like to give them until M8 to continue to develop user-visible
features on top of the core infrastructure.

4) A milestone schedule with a release every 6 weeks (4 weeks till code
freeze from last milestone, 2 weeks of stabilization/build work) seems
to work the best. Note that actual tree closures will in practice
likely be shorter than 2 weeks if there are not multiple re-spins.

Based on this context the proposed schedule is:

* M7: Freeze on July 25
* Platform feature freeze
* This is the "web developer preview release" since it is
platform complete. This will be marketed at a higher volume
than other alphas to help get wider-scale testing.
* M8: Freeze on Sept 5
* Firefox feature freeze
* M9: Freeze on Oct 16
* M*: Ongoing as needed
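
As a rough illustration of the cadence in point 4, here is a minimal Python sketch that steps forward from the M7 freeze in 6-week increments. The helper name and structure are hypothetical and not part of any Mozilla tooling. Note that strict 42-day steps from July 25 give Sept 5 and then Oct 17, so the Oct 16 date listed for M9 is one day earlier than a pure 6-week step.

```python
from datetime import date, timedelta

# Hypothetical helper, not part of any Mozilla tooling.
CADENCE = timedelta(weeks=6)        # 4 weeks to code freeze + 2 weeks of stabilization
STABILIZATION = timedelta(weeks=2)  # upper bound; real tree closures may be shorter


def project_freezes(first_freeze, first_number, count):
    """Return (milestone name, freeze date) pairs on a fixed 6-week cadence."""
    return [
        (f"M{first_number + i}", first_freeze + i * CADENCE)
        for i in range(count)
    ]


for name, freeze in project_freezes(date(2007, 7, 25), 7, 3):
    # Assumption: the release follows the freeze by at most the stabilization window.
    print(name, "freeze:", freeze.isoformat(),
          "release by:", (freeze + STABILIZATION).isoformat())
```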

Feature Freeze = all planned features are implemented and exposed
(through APIs and user interface elements) in ways that are usable, but
not necessarily polished. After freeze, landings will be restricted to
regressions (from 1.8), performance and footprint fixes, as well as
additional functional or unit test coverage and changes to APIs and user
interface elements based on feedback from the beta cycle.

In order to hit our goals above we are going to do the following:

1) Only explicitly named platform features are available for landing
before M7 (with exceptions heard by the release drivers). At the time
of this writing the only platform features remaining to land before M7
that I'm aware of are Anti-malware, Secure wrappers, and some Offline
work. This means if you are working on a platform feature for 1.9
that's not on this list you should help close out the long blocker list.

2) The trunk will go under release driver control after M7. This means
all check-ins will require release driver approval after July 25.
Release drivers currently include MConnor, CBeard, Beltzner, Basil,
Schrep, Damon, Vlad. Additional volunteers welcomed :-). As always
these folks will do frequent triage and will rely heavily on the
judgment and assistance of module owners and experts in each major
functional area.

3) We'll switch from Alphas to Betas as soon as we believe Firefox is
stable and usable enough for daily browsing for a large number of
people. Until we hit these criteria we'll continue to release Alphas on
the 6 week cadence above. Criteria:
a) Footprint at or below that of 1.8. This is being measured regularly
through Talos working set size (http://tinyurl.com/252ka3) and through
informal dogfooding.
b) Most sites should display properly and regression free (from
previous major release)
c) No known common dataloss bugs
d) No common hangs or crashes
e) No problems with major features in common use cases

"Common" is defined as usage of the browser with any popular websites
or frequent occurrence in daily browsing for our dogfood or beta
population. We'll measure this through frequency of bug reports and
direct feedback from users.

Based on these criteria it does not appear that M7 will be ready to be
called a beta. Talos is showing a ~18% increase in Footprint and
informal dogfooding confirms things are currently worse on the trunk.
Search for keyword mlk in bugzilla to find plenty of known bugs here.

4) We'll release betas until we complete our regression work and
incorporate feedback from wider-scale testing. Before we release the
final beta Performance (specifically Ts, Tp, Tdhtml, Txul, and any other
benchmarks we add to the main tinderboxes) will be as good or better
than 1.8. We should strive for improved Tp and Tdhtml scores
vs. 1.8.

5) After the last beta we'll release a Release Candidate. The Release
Candidate is meant to be bit-for-bit the final release. Only new
problems found after the RC is released will cause additional RCs to be
published. Once we are confident there are no new issues we'll release
the final release.
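
The beta criteria in item 3 can be read as a simple checklist. The sketch below is only an illustration: the `BetaStatus` class, its field names, and the sample footprint numbers are hypothetical, chosen so that the footprint delta matches the ~18% increase mentioned in that item. It is not real Talos data.

```python
from dataclasses import dataclass


@dataclass
class BetaStatus:
    # All values are hypothetical inputs, not real measurements.
    footprint_baseline: float      # working set size on 1.8
    footprint_trunk: float         # working set size on trunk
    sites_regression_free: bool    # criterion (b)
    common_dataloss_bugs: int      # criterion (c)
    common_hangs_or_crashes: int   # criterion (d)
    major_feature_problems: int    # criterion (e)


def footprint_change_pct(status):
    """Percent change in footprint relative to the 1.8 baseline."""
    delta = status.footprint_trunk - status.footprint_baseline
    return 100.0 * delta / status.footprint_baseline


def unmet_criteria(status):
    """Return the letters (a-e) of the beta criteria that are not yet met."""
    unmet = []
    if status.footprint_trunk > status.footprint_baseline:
        unmet.append("a")
    if not status.sites_regression_free:
        unmet.append("b")
    if status.common_dataloss_bugs > 0:
        unmet.append("c")
    if status.common_hangs_or_crashes > 0:
        unmet.append("d")
    if status.major_feature_problems > 0:
        unmet.append("e")
    return unmet


trunk = BetaStatus(100.0, 118.0, True, 0, 0, 0)
print(round(footprint_change_pct(trunk)), unmet_criteria(trunk))
```

With these made-up inputs only criterion (a) fails, which mirrors the reasoning for not calling M7 a beta.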

So in summary:

* Can I land platform feature or old bug fix X?
* In general no, but read above carefully
* When will Beta 1 Ship?
* As soon as it is ready (see #3 above)
* When is the next Milestone?
* 6 weeks from the last one.
* When will the last Beta ship?
* As soon as it is ready (see #4 above)
* What can I do to help?
* Platform folks let's sprint to the finish. Footprint, performance,
regressions, unit tests! Everyone involved wants to get a beta into
people's hands asap. We could also use your help getting the blocker
lists managed. If a bug doesn't fit the criteria, please minus it.
* Firefox - you've got a little bit of time left to crank. Delight us!
* Everyone else - plenty of help needed reproducing, filing, and
confirming bugs. Dogfood. Run the nightly tester tools + leak gauge,
help us hammer this thing into shape.

Questions or Thoughts?

Simon Paquet

Jul 8, 2007, 12:49:42 PM
And on the seventh day Mike Schroepfer spoke:

>Questions or Thoughts?

Is an M-release the same as an alpha release?

If yes, could you please call it an alpha as before, because the
M-releases are still widely associated by people with releases before
Mozilla 0.6.

Simon
--
Calendar l10n coordinator
Calendar Website Maintainer: http://www.mozilla.org/projects/calendar
Calendar developer blog: http://weblogs.mozillazine.org/calendar

Mike Connor

Jul 8, 2007, 1:05:20 PM
to Simon Paquet, dev-pl...@lists.mozilla.org
Simon Paquet wrote:
> And on the seventh day Mike Schroepfer spoke:
>
>> Questions or Thoughts?
>
> Is an M-release the same as an alpha release?
>
> If yes, could you please call it an alpha as before, because the
> M-releases are still widely associated by people with releases before
> Mozilla 0.6.
>
They're not the same. It is not clear at this time whether those
releases will be alphas or betas beyond M7, so calling them alphas (i.e.
alpha 8/9) is possibly inaccurate/misleading. I think there's very
little likelihood of confusion here. I suggested, for the purpose of
scheduling, to resurrect the M* numbering convention, though we will
publicly use alpha or beta versioning as appropriate.

-- Mike

Boris Zbarsky

Jul 8, 2007, 1:42:02 PM
Mike Schroepfer wrote:
> Questions or Thoughts?

1) Where do wanted-1.9 bugs fit into this setup? Especially regressions
since 1.8?

2) Where do long-standing patches that have been waiting on reviews
for months that are neither blocking1.9 nor marked wanted-1.9 fit in
at this point?

3) Does "platform features" (the things that should no longer be worked
on) include platform bug and regression fixes that are not blockers,
or only new functionality?

4) How does one request approval for patches?

I assume the answers to the above are:

1) OK to land before M7, need approval after M7 like everything else. The
notation is meaningless in terms of release scheduling, and only there
to indicate to people who want something to do what to work on (outside
the blockers).
2) As #1, unless they're feature additions.
3) New functionality.
4) We'll add a flag before M7 ships.

Let me know if I'm wrong?

-Boris

Schrep

Jul 8, 2007, 2:33:39 PM
> 1) Where do wanted-1.9 bugs fit into this setup? Especially regressions
> since 1.8?

> OK to land before M7, need approval after M7 like everything else. The
> notation is meaningless in terms of release scheduling, and only there
> to indicate to people who want something to do what to work on (outside
> the blockers).

That's correct. But more generally regression fixes since 1.8 are
encouraged and welcomed throughout the schedule.

> 2) Where do long-standing patches that have been waiting on reviews
> for months that are neither blocking1.9 nor marked wanted-1.9 fit in
> at this point?

> 2) As #1, unless they're feature additions.

Correct

> 3) Does "platform features" (the things that should no longer be worked
> on) include platform bug and regression fixes that are not blockers,
> or only new functionality?

> 3) New functionality.

Correct. We are trying to start reducing the total amount of code
churn by closing the gate to new stuff and focusing on regression
fixes.

> 4) How does one request approval for patches?

> 4) We'll add a flag before M7 ships.

Correct.

Axel Hecht

Jul 11, 2007, 5:49:35 AM
To recap what we talked about in the Firefox meeting yesterday and to
broadcast this to .l10n:

We'll push back the string freeze along with additional milestones. We
know that that's unfortunate as it's making it harder to plan for the
hot localization phase, but we don't have any other realistic choice. We
don't intend to change the amount of time we plan for localization, but
we may have to shift that time window.

We will start to require l10n-swags (*) by the time we require per-patch
approvals in general, that should be after the Firefox feature freeze, IIRC.

Questions?

Axel

(*) l10n-swag is the number of lines added to the localization files,
excluding comments, of course. That's not supposed to be a precise
number, but 1, 10, or 100 is important to know.
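
For illustration, an l10n-swag along the lines of the footnote could be estimated from a unified diff of the localization files. The function below is a hypothetical sketch, not an existing tool: it counts added lines and skips blank lines and lines that start a comment in `.properties` (`#`) or `.dtd` (`<!--`) files. It does not track multi-line DTD comments, which is acceptable since only the order of magnitude matters.

```python
def l10n_swag(diff_text):
    """Roughly count non-comment lines added to localization files in a unified diff."""
    count = 0
    for line in diff_text.splitlines():
        # Only added lines count; "+++" is the diff's file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        content = line[1:].strip()
        # Skip blank lines and lines that begin a comment.
        if not content or content.startswith("#") or content.startswith("<!--"):
            continue
        count += 1
    return count


# Hypothetical sample diff for a .properties file.
sample = """\
--- a/browser.properties
+++ b/browser.properties
@@ -1,2 +1,5 @@
 existing.label=Existing
+# Comment for localizers
+new.label=New label
+
+other.label=Other label
-removed.label=Removed
"""

print(l10n_swag(sample))
```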

fantasai

Jul 19, 2007, 12:36:24 PM
to Mike Connor, Simon Paquet, dev-pl...@lists.mozilla.org

Can we use lower-case 'm's, then? The early M* scheme used capital Ms.
This is also consistent with how we use lower-case for alphas and betas.
An even clearer abbreviation would be e.g. 1.9m7.

~fantasai

stu...@gmail.com

Jul 19, 2007, 2:46:20 PM
On Jul 8, 9:37 am, Mike Schroepfer <sch...@mozilla.com> wrote:
> 2) The trunk will go under release driver control after M7. This means
> all check-ins will require release driver approval after July 25.
> Release drivers currently include MConnor, CBeard, Beltzner, Basil,
> Schrep, Damon, Vlad. Additional volunteers welcomed :-). As always
> these folks will do frequent triage and will rely heavily on the
> judgment and assistance of module owners and experts in each major
> functional area.
>

Going under release driver control for non-blockers at M7 seems like a
good step, but I don't think that should require approval for bugs
already marked blocking+. Requiring approval for blockers will slow
down their rate of fix and they've already gotten one level of
approval to be blocking+. I wouldn't start throttling blocker bugs
until much closer to shipping -- M9?

stuart

Mike Connor

Jul 19, 2007, 2:58:43 PM
to stu...@gmail.com, dev-pl...@lists.mozilla.org

Agreed. I thought the plan was to do just that, though it wasn't stated
here.

We'll need to post a plan for approvals going forward, since we're not
going to require approvals for the front end until after M8, given how
much work is going to be going on with rapid iteration.

-- Mike

Jonas Sicking

Jul 19, 2007, 4:24:42 PM

This sounds like an excellent idea to me.

/ Jonas

Mike Connor

Jul 19, 2007, 4:32:56 PM
to dev-pl...@lists.mozilla.org
I don't think it matters enough to redo milestones and the dev calendar
at this point, and since we're not using this in UA strings or real
version numbers, I don't think it's a big deal. (And we can't use them
in version numbers, because m7 > b1 in our version comparison scheme,
and we're not changing that again). Other than aesthetic reasons, I
don't think it matters whether it's 1.9m7 or 1.9 M7 on the schedule...
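
To illustrate why m7 would sort above b1, here is a simplified sketch of a version comparison. It assumes each dot-separated part is split into a number, a string, a second number and a trailing string, and that the strings are compared lexically, with an absent string ranking above any present one. The function names are hypothetical and the real comparator handles more cases than this.

```python
import re

# Simplified, hypothetical sketch; not the actual toolkit implementation.
_PART = re.compile(r"(\d*)(\D*)(\d*)(.*)")


def parse_part(part):
    """Split one dot-separated part into (number, string, number, string)."""
    num_a, str_b, num_c, str_d = _PART.fullmatch(part).groups()
    return int(num_a or 0), str_b, int(num_c or 0), str_d


def cmp_int(a, b):
    return (a > b) - (a < b)


def cmp_str(a, b):
    """Compare strings; an absent string ranks above any present one."""
    if a == b:
        return 0
    if a == "":
        return 1
    if b == "":
        return -1
    return 1 if a > b else -1


def compare_versions(v1, v2):
    """Return 1 if v1 > v2, -1 if v1 < v2, 0 if they are equal."""
    parts1, parts2 = v1.split("."), v2.split(".")
    for i in range(max(len(parts1), len(parts2))):
        # Missing parts are treated as "0".
        a = parse_part(parts1[i] if i < len(parts1) else "0")
        b = parse_part(parts2[i] if i < len(parts2) else "0")
        result = (cmp_int(a[0], b[0]) or cmp_str(a[1], b[1])
                  or cmp_int(a[2], b[2]) or cmp_str(a[3], b[3]))
        if result:
            return result
    return 0


print(compare_versions("1.9m7", "1.9b1"))
```

Under these assumptions "1.9m7" compares greater than "1.9b1" because "m" sorts after "b", so a build labelled m7 would look newer than beta 1.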

-- Mike

Schrep

Jul 19, 2007, 10:45:28 PM
Hey Folks,

I wanted to follow-up to make sure that everyone has thought this
through and given feedback. I wrote this with a very decisive tone in
order to have something specific for everyone to discuss. It was not
intended to stifle feedback or be written in stone. Getting us to FF3
is going to take a lot of hard work from everyone here and it requires
that we all understand and agree on the game plan. So please do jump
in here, at the Gecko 1.9 meeting, on irc, or via email if you have
any thoughts.

Best,

Schrep

fantasai

Jul 20, 2007, 12:10:33 AM
Mike Connor wrote:
> Jonas Sicking wrote:
>> fantasai wrote:
>>
>>> Can we use lower-case 'm's, then? The early M* scheme used capital Ms.
>>> This is also consistent with how we use lower-case for alphas and betas.
>>> An even clearer abbreviation would be e.g. 1.9m7.
>>
>> This sounds like an excellent idea to me.
>>
> I don't think it matters enough to redo milestones and the dev calendar
> at this point, and since we're not using this in UA strings or real
> version numbers,

No, it doesn't matter enough to redo the dev calendar etc, but we
can still use the lower-case convention from this point forward.

~fantasai

Robert Sayre

Jul 20, 2007, 10:54:48 PM
to Schrep
Schrep wrote:
> Hey Folks,
>
> I wanted to follow-up to make sure that everyone has thought this
> through and given feedback.

It looks like NSS 3.12 will add a very, very large codesize hit. This is
a concrete regression, so there should be some concrete benefits if we
take it.

- Rob

RyanVM

Jul 21, 2007, 10:57:31 AM

Is that really the case now that we're building sqlite3 as a separate
DLL that NSS can link to? As I understand it, NSS building their own
copy of sqlite3 was the main reason it led to such a huge codesize
increase last time.

Robert Sayre

Jul 21, 2007, 12:46:06 PM
to RyanVM
RyanVM wrote:
>
> Is that really the case now that we're building sqlite3 as a separate
> DLL that NSS can link to?

Yes. See

<https://bugzilla.mozilla.org/show_bug.cgi?id=388403#c11>

- Rob

Mike Connor

Jul 21, 2007, 2:27:08 PM
to dev-pl...@lists.mozilla.org

As a note, the codesize hit is the only visible problem, Tp/Ts/etc seem
generally unaffected.

Almost by definition, any major new feature adds code, the question is
how much new code is acceptable for a given feature. And the answer
will vary in direct proportion to how much you personally think the
feature is worth, so that's unlikely to be a real number. I think we've
decided we want EV cert support as part of our security UI strategy, and
there's other pieces that we might use in 1.9.1.

That said, there's clearly a ton of work that should be done to optimize
a lot of this codesize pain (bz has made some concrete suggestions in
the bug), and we'll have to discuss separately how to deal with those,
but I think we're very very unlikely to stay on NSS 3.11.x for Firefox
3. Probably the biggest reason is maintenance for security releases (we
migrated the branches to use the current stable NSS tag during the
winter, because NSS is not going to spot fix older versions anymore).
AIUI, 3.11 will be replaced by 3.12, and 3.11 will no longer be updated,
long before the Firefox 3 end of life. It is not viable for us to lock
into a to-be-unsupported version of NSS for the next 18-24 months for
Firefox 3, so we need to help make NSS 3.12 as performant as possible
sooner or later. Unless we're prepared to maintain our own fork for
NSS until libpkix meets some relatively arbitrary codesize target, and I
don't think we're at all prepared to do that.

I'm not saying a 9% Z hit is shippable (I'm going to ignore mZ, since it
doesn't include libxul or thebes, and is therefore broken right now),
but I think we will take some sort of nontrivial hit, and I think we
need to be prepared for that in order to get onto the new NSS version.
That hit should be as small as possible, but I see no situation where
we'll throw away EV cert support over a codesize hit.

-- Mike

Robert Sayre

Jul 21, 2007, 2:52:18 PM
to Mike Connor, dev-pl...@lists.mozilla.org
Mike Connor wrote:
>
> I'm not saying a 9% Z hit is shippable (I'm going to ignore mZ, since it
> doesn't include libxul or thebes, and is therefore broken right now),
> but I think we will take some sort of nontrivial hit, and I think we
> need to be prepared for that in order to get onto the new NSS version.
> That hit should be as small as possible, but I see no situation where
> we'll throw away EV cert support over a codesize hit.

OK. So where are we going to compromise? Performance? Fit and finish?
Ship date?

It's pretty late in the game, and I don't see how taking a megabyte of
PKI code and adding front-end features for whatever it does is
compatible with our other goals.

- Rob

L. David Baron

Jul 21, 2007, 3:00:34 PM
to dev-pl...@lists.mozilla.org
On Saturday 2007-07-21 14:27 -0400, Mike Connor wrote:
> sooner or later. Unless we're prepared to maintain our own fork for
> NSS until libpkix meets some relatively arbitrary codesize target, and I
> don't think we're at all prepared to do that.

What is libpkix and why do we want it? Can we build NSS without it?

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

Mike Connor

Jul 21, 2007, 3:05:04 PM
to dev-pl...@lists.mozilla.org
L. David Baron wrote:
> On Saturday 2007-07-21 14:27 -0400, Mike Connor wrote:
>
>> sooner or later. Unless we're prepared to maintain our own fork for
>> NSS until libpkix meets some relatively arbitrary codesize target, and I
>> don't think we're at all prepared to do that.
>>
>
> What is libpkix and why do we want it? Can we build NSS without it?
>
> -David

Quoting from the NSS team:

"Libpkix provides a much more complete and modern parsing of
certificates, most importantly policy parsing and handling cross
certificate environments correctly. Both of these are needed for EV (the
primary driver of getting libpkix in). (It also includes such things as
on-the-fly fetching of intermediate certs.)"

I'm not sure whether we can build without it in the future. In the
immediate short term we won't use it, but I believe the coming NSS
changes will depend on it.

-- Mike

Robert Sayre

Jul 22, 2007, 2:59:05 AM
to Mike Connor, dev-pl...@lists.mozilla.org
Mike Connor wrote:

>
> On 21-Jul-07, at 2:52 PM, Robert Sayre wrote:
>
>> Mike Connor wrote:
>>> I'm not saying a 9% Z hit is shippable (I'm going to ignore mZ, since
>>> it doesn't include libxul or thebes, and is therefore broken right
>>> now), but I think we will take some sort of nontrivial hit, and I
>>> think we need to be prepared for that in order to get onto the new
>>> NSS version. That hit should be as small as possible, but I see no
>>> situation where we'll throw away EV cert support over a codesize hit.
>>
>> OK. So where are we going to compromise? Performance? Fit and finish?
>> Ship date?
>
> I'm not saying any of those should be affected, they should not be, to
> my knowledge.

That doesn't sound reasonable. We are going to accept a very large body
of code with known quality control problems.

> If you think any of these will be affected by the NSS
> 3.12 work, please speak up. As it stands I don't believe there's any
> unnecessary hit to any of them, do you have data suggesting otherwise?

I'm not the one claiming we should accept unknown risk for unknown
benefit, so the burden of proof is not on me.

> That said, EV cert support is listed as a P1 (release blocker)
> requirement for Firefox 3, so we intend to ship it, and we'll take a
> ship delay to get it. The decision was made, and nothing I've heard or
> seen has caused me to change my own perspective on that requirement.

We are going to support EV certs in Firefox 3. Thus far, they don't have
any measurable benefits, but it turns out we invented them, and it would
be too easy for our competitors to depict us as insecure if we dropped
them. So, here we are. We should at least assess the cost.

>
>> It's pretty late in the game, and I don't see how taking a megabyte of
>> PKI code and adding front-end features for whatever it does is
>> compatible with our other goals.
>

> Who said we're going to take a megabyte of codesize hit?

We can reduce the codesize by diverting time and effort from other things.

-Rob

Mike Connor

Jul 22, 2007, 4:40:19 AM
to Robert Sayre, dev-pl...@lists.mozilla.org

On 22-Jul-07, at 2:59 AM, Robert Sayre wrote:

> Mike Connor wrote:
>> On 21-Jul-07, at 2:52 PM, Robert Sayre wrote:
>>> Mike Connor wrote:
>>>> I'm not saying a 9% Z hit is shippable (I'm going to ignore mZ,
>>>> since it doesn't include libxul or thebes, and is therefore
>>>> broken right now), but I think we will take some sort of
>>>> nontrivial hit, and I think we need to be prepared for that in
>>>> order to get onto the new NSS version. That hit should be as
>>>> small as possible, but I see no situation where we'll throw away
>>>> EV cert support over a codesize hit.
>>>
>>> OK. So where are we going to compromise? Performance? Fit and
>>> finish? Ship date?
>> I'm not saying any of those should be affected, they should not
>> be, to my knowledge.
>
> That doesn't sound reasonable. We are going to accept a very large
> body of code with known quality control problems.

If you have evidence that libpkix has known quality control problems,
take it up with the NSS maintainers. I'm not going to assert
anything either way, other than to say that codesize is the only
metric I'm willing to take a hit on. Anything else is a bug.

>> If you think any of these will be affected by the NSS 3.12 work,
>> please speak up. As it stands I don't believe there's any
>> unnecessary hit to any of them, do you have data suggesting
>> otherwise?
>
> I'm not the one claiming we should accept unknown risk for unknown
> benefit, so the burden of proof is not on me.

I'm not the one claiming that either. I believe the risks are well
understood, and the NSS team has a solid track record. IMO, not
taking 3.12 is the risky play, since we'll either need to find our
own NSS hackers to maintain a fork of NSS 3.11, or take the hit
anyway when we need security fixes that 3.12.x will get. I'd much
rather do that in alpha/beta than in security releases.

>> That said, EV cert support is listed as a P1 (release blocker)
>> requirement for Firefox 3, so we intend to ship it, and we'll take
>> a ship delay to get it. The decision was made, and nothing I've
>> heard or seen has caused me to change my own perspective on that
>> requirement.
>
> We are going to support EV certs in Firefox 3. Thus far, they don't
> have any measurable benefits, but it turns out we invented them,
> and it would be too easy for our competitors to depict us as
> insecure if we dropped them. So, here we are. We should at least
> assess the cost.

If you're saying we should cut EV, say it, don't use codesize as an
excuse.

-- Mike

Robert Sayre

Jul 22, 2007, 10:56:20 AM
to Mike Connor, dev-pl...@lists.mozilla.org
Mike Connor wrote:
>
>>
>> That doesn't sound reasonable. We are going to accept a very large
>> body of code with known quality control problems.
>
> If you have evidence that libpkix has known quality control problems,

We do. That is why it is so big. But, I agree it's likely that it
performs whatever PKI incantations it does correctly, in spite of that.


> IMO, not taking
> 3.12 is the risky play, since we'll either need to find our own NSS
> hackers to maintain a fork of NSS 3.11, or take the hit anyway when we
> need security fixes that 3.12.x will get. I'd much rather do that in
> alpha/beta than in security releases.

Well, I agree that we have to take it.

>
>>> That said, EV cert support is listed as a P1 (release blocker)
>>> requirement for Firefox 3, so we intend to ship it, and we'll take a
>>> ship delay to get it. The decision was made, and nothing I've heard
>>> or seen has caused me to change my own perspective on that requirement.
>>
>> We are going to support EV certs in Firefox 3. Thus far, they don't
>> have any measurable benefits, but it turns out we invented them, and
>> it would be too easy for our competitors to depict us as insecure if
>> we dropped them. So, here we are. We should at least assess the cost.
>
> If you're saying we should cut EV, say it, don't use codesize as an excuse.

I wasn't being facetious. I think EV is mystery meat, but we have to
ship it for the reasons I listed.

- Rob

Boris Zbarsky

Jul 22, 2007, 12:28:44 PM
Mike Connor wrote:
> As a note, the codesize hit is the only visible problem, Tp/Ts/etc seem
> generally unaffected.

I very much doubt Tp exercises any of this code, since none of it is over https.
Ts exercises some parts of PSM/NSS, I think (due to creating principals for
the stylesheets coming from jars). I don't know whether it actually ends up
loading this .so, though.

> Almost by definition, any major new feature adds code, the question is
> how much new code is acceptable for a given feature.

Yes. So let's put this in perspective. Is a codesize that is 20% of gklayout
(or double that of cairo + thebes if you prefer to look at it that way)
acceptable for EV support?

I agree that the actual amount of code in terms of code complexity is not really
that big; the code is large because it's written with so much logic inlined, not
because there is so much logic. At least the parts I saw. So I'm not worried
about this destabilizing the app or anything, though I _am_ worried about
potential security issues in what is a large glob of code no matter what. But
not much we can do about that, as you say.

> I think we've decided we want EV cert support as part of our security UI strategy

Given the limited real value of EV certs, a number of people (myself included)
were fine to include them as (a small) part of a more comprehensive approach to
the problem of phishing. But if there's a high enough price to pay for EV
support, perhaps we need to revisit that decision.

Put another way, at the time it was nonobvious that EV support involved a 60%
increase in the size of the NSS libraries.

> That said, there's clearly a ton of work that should be done to optimize
> a lot of this codesize pain (bz has made some concrete suggestions in
> the bug)

Right. I don't think anyone is arguing we shouldn't take this, offhand. What
we need to figure out are:

1) What can we do to improve things?
2) Who will do that work?
3) How we can get them started on it yesterday.

The closer we get to release, the less willing we should be to take the sort of
refactoring it will take to make this code smaller... I really wish this
landing had taken place six months ago or so. Not much use crying over that,
though.

> but I think we will take some sort of nontrivial hit, and I think we
> need to be prepared for that in order to get onto the new NSS version.

I think sayrer's suggestion of a hit that's no bigger than the win we got from
turning off webservices is a good starting point. If the code I looked at is
representative, I think this should be achievable.

> That hit should be as small as possible, but I see no situation where
> we'll throw away EV cert support over a codesize hit.

That contradicts the "I'm not saying a 9% Z hit is shippable" statement you make
earlier, for what it's worth.

-Boris

Jean-Marc Desperrier

Jul 23, 2007, 7:52:54 AM
Mike Connor wrote:
> "Libpkix provides a much more complete and modern parsing of
> certificates, most importantly policy parsing and handling cross
> certificate environments correctly. Both of these are needed for EV (the
> primary driver of getting libpkix in). (It also includes such things as
> on-the-fly fetching of intermediate certs.)"

I am not so convinced those elements are so absolutely required to
support EV certificates. After all, verisign did an EV extension that
works with the current Firefox, even if it's very certainly taking some
ugly short-cuts.

The NSS team also says that most of the support for EV cert should be
inside PSM and not NSS (bug 374336, 375666,
news://news.mozilla.org:23/fM2dnQ0AXqlgvWbY...@mozilla.org ),
and by extending the part that's inside PSM it might be possible to
support EV certs without changing NSS. I'm sure the required policy
checking can be done outside of NSS (only a small part of what libpkix
supports is really required). The cross-certificates part also seems
solvable from what I've understood about what is really done by CAs in
practice (by reading http://alwayson.goingon.com/permalink/post/7871).
If we give PSM knowledge of both the self-signed EV cert and the
cross-signed one, then it doesn't really matter what way NSS handles the
cross-cert path.

Of course, it would be much nicer to just use NSS 3.12, which brings many
other long awaited features (shared db !), but that code changes a lot of
things and still seems very alpha.
http://wiki.mozilla.org/NSS_Shared_DB_Samples
"prealpha shared database code" (this is the description as of 8 June)

Gervase Markham

Jul 23, 2007, 6:00:27 PM
Jean-Marc Desperrier wrote:
> I am not so convinced those elements are so absolutely required to
> support EV certificates. After all, verisign did an EV extension that
> works with the current Firefox, even if it's very certainly taking some
> ugly short-cuts.

FYI, as I understand it, they did it by reimplementing certificate
decoding in JavaScript.

Gerv

us...@domain.invalid

Jul 23, 2007, 9:23:12 PM
We have https://bugzilla.mozilla.org/show_bug.cgi?id=389343 to track the NSS
codesize hit now.

-Boris


Nelson Bolyard

Jul 24, 2007, 4:40:39 AM
L. David Baron wrote:
> On Saturday 2007-07-21 14:27 -0400, Mike Connor wrote:
>> sooner or later. Unless we're prepared to maintain our own fork for
>> NSS until libpkix meets some relatively arbitrary codesize target, and I
>> don't think we're at all prepared to do that.
>
> What is libpkix and why do we want it? Can we build NSS without it?

libPKIX supports EV, but EV is not the only reason for libPKIX.
EV is just one reason of many to go to a certificate library that
implements the whole PKIX standard. EV is one reason to integrate
libPKIX into NSS and Firefox at this time.

Here's some background on libPKIX.

The existing certificate handling code in NSS is firmly stuck in time,
at around the year 2000 or 2001. The world of PKI standards has evolved
a LOT since then, and NSS is now catching up. Vista is already caught up.
libPKIX will bring feature parity for the new Extended PKI (PKIX)
standards to Mozilla products.

Certificates once were issued in simple hierarchies, or trees, with
the highest level CA in any tree being the "root" CA, and the End Entity
(EE) certs (e.g. server certs, end user certs) were called "leaf" certs.
Any EE cert was a member of at most one tree. Starting from any EE
cert, a trivial straight line walk took you straight to the root CA,
with no particularly challenging decisions to make. That's the world
for which NSS's current (pre-3.12) cert code was written.

But today's world allows certificates to have multiple issuers, multiple
parent CAs. Multiple trees of certs that were formerly separate may now
be combined through intermediate CAs that have parents in multiple trees.
Those are the so-called "bridge" CAs.

In a world where bridge CAs exist, the act of constructing a "chain" or
"path" of certificates to be validated is no longer a trivial walk.
It is now possible to start with an EE cert and construct multiple
chains, each leading up to a different top-level CA cert. In this new
world, those top level CAs are called "trust anchors" rather than "roots"
because the process of cert path building creates a tree with the EE
cert as its root, and the (formerly "root") CAs as leaves.

The challenge for cert verification software in this new world is to find
the right path to the right trust anchor, with which to do the validation.
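To make the "multiple chains" point concrete, here is a minimal sketch of path building over a tiny made-up PKI. The cert names, the ISSUERS table, and build_paths are all hypothetical illustrations, not NSS or libPKIX APIs, and real path building also has to check signatures, validity, and policies, which this ignores.

```python
# Hypothetical PKI: each cert name maps to the names of its issuers.
# "Bridge" is cross-certified by two hierarchies, so it has two parents.
ISSUERS = {
    "server.example": ["IntermediateCA"],
    "IntermediateCA": ["Bridge"],
    "Bridge": ["AnchorA", "AnchorB"],
    "AnchorA": [],
    "AnchorB": [],
}
TRUST_ANCHORS = {"AnchorA", "AnchorB"}


def build_paths(cert, issuers, anchors, path=None):
    """Return every loop-free chain from cert up to any trust anchor."""
    path = (path or []) + [cert]
    if cert in anchors:
        return [path]
    chains = []
    for parent in issuers.get(cert, []):
        if parent in path:  # cross-certification can create cycles
            continue
        chains.extend(build_paths(parent, issuers, anchors, path))
    return chains


paths = build_paths("server.example", ISSUERS, TRUST_ANCHORS)
for p in paths:
    print(" -> ".join(p))
```

With one bridge CA in the picture, the single EE cert already yields two candidate chains, one per trust anchor, which is the tree-rooted-at-the-EE-cert shape described above.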

In the old world, any server with a cert for encryption, or any email user
with a cert for signature verification, sent out its entire cert chain,
with all the certs from its EE cert up to the "trust anchor". This was
easy to do because there was only one such chain possible, since each EE
cert chains up to just one root, and the sender always had all the certs
in the chain.

In the new world, each possible recipient of that EE cert (each "relying
party") may trust a different trust anchor, and may need to have a different
chain, leading up to a trust anchor that the recipient trusts. No one
chain constructed by the sender can satisfy all the recipients. So, the
recipient of an EE cert now needs to be able to construct cert chains,
starting with just the sender's EE cert. This means that the relying party
needs to be able to go out and fetch CA certificates on the Internet,
via HTTP or LDAP. Constructing the right cert chain may literally mean
going out and getting all the necessary certs over the Internet.
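A rough sketch of the relying party's side of this, assuming a made-up REMOTE_STORE dictionary standing in for certs published over HTTP or LDAP. The names fetch_issuers and chain_for are hypothetical and are not libPKIX functions. Each recipient starts from only the sender's EE cert and searches for a chain ending at an anchor it trusts itself.

```python
# Stand-in for certs published on the network (HTTP or LDAP in practice).
# Maps a cert name to the names of the CAs that issued it.
REMOTE_STORE = {
    "alice@example": ["DeptCA"],
    "DeptCA": ["Bridge"],
    "Bridge": ["GovRoot", "CorpRoot"],
}


def fetch_issuers(name, fetch_log):
    """Hypothetical fetch: look up a cert's issuers, recording the lookup."""
    fetch_log.append(name)
    return REMOTE_STORE.get(name, [])


def chain_for(ee_cert, trusted, fetch_log):
    """Depth-first search from ee_cert to the first anchor in `trusted`."""
    stack = [[ee_cert]]
    while stack:
        path = stack.pop()
        tip = path[-1]
        if tip in trusted:
            return path
        for issuer in fetch_issuers(tip, fetch_log):
            if issuer not in path:  # avoid looping through cross-certs
                stack.append(path + [issuer])
    return None  # no chain ends at an anchor this party trusts


log = []
gov_chain = chain_for("alice@example", {"GovRoot"}, log)
corp_chain = chain_for("alice@example", {"CorpRoot"}, log)
nobody = chain_for("alice@example", {"OtherRoot"}, log)
```

The same EE cert produces a different chain for each recipient, and none at all for a recipient whose trust anchor is not reachable, which is why no single sender-built chain can serve everyone.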

This new world of "eXtended PKI" (PKIX) is defined in a 5-year-old RFC,
RFC 3280. There are already certain parts of the world (Japan and South
Korea) where use of these new features is now normal and commonplace.
NSS doesn't always play well in those worlds. IIRC, Mozilla Foundation
has wanted to get caught up in this area in order to better penetrate
markets like South Korea for quite some time.

So for 3 years now, the NSS team has been working on writing a new
certificate library that fully conforms to the new PKIX standards.
libPKIX is 60K+ lines of code, and it adds all the missing new features
that are not patent encumbered.

Here's a question: would any of you be interested in a presentation on
the world of PKIX? Maybe we could put something together to present at
(say) Mozilla HQ?

/Nelson

Nelson Bolyard

Jul 24, 2007, 4:40:57 AM
Jean-Marc Desperrier wrote:
> Mike Connor wrote:
>> "Libpkix provides a much more complete and modern parsing of
>> certificates, most importantly policy parsing and handling cross
>> certificate environments correctly. Both of these are needed for EV
>> (the primary driver of getting libpkix in). (It also includes such
>> things as on-the-fly fetching of intermediate certs.)"
>
> I am not so convinced those elements are so absolutely required to
> support EV certificates. After all, verisign did an EV extension that
> works with the current Firefox, even if it's very certainly taking some
> ugly short-cuts.

It *appears* to work, with Verisign's certs only. But it does not do
all the right things to work with all the other CAs' EV certs.

Limiting an implementation to a single root CA simplifies the problem
dramatically. Once you have constructed the right cert chain, from the
EE cert (End User or Server cert) to the right trust anchor ("root"), the
actual chain validation is not too difficult. By assuming there is only
one trust anchor of interest (e.g. Verisign's) you can even code the path
validation code in JavaScript.

The hard part is constructing the right chain to validate. That's where
the vast majority of new code is used. If you want Mozilla products to
work in a new world of bridge CAs (including bridges over bridges) and
incomplete cert chains, you need a real PKIX implementation.

/Nelson

Benjamin Smedberg

Jul 26, 2007, 10:21:42 AM
Mike Connor wrote:

> We'll need to post a plan for approvals going forward, since we're not
> going to require approvals for the front end until after M8, given how
> much work is going to be going on with rapid iteration.

I suggest we do that soon ;-)

Also, I'd like to suggest that all bugfixes (not new features) be allowed in
the 1.9b1 timeframe with module owner approval. This gives module owners the
responsibility of deciding whether a particular fix is too risky for the 1.9
branch.

--BDS

L. David Baron

Aug 10, 2007, 1:48:04 AM
to dev-pl...@lists.mozilla.org
On Sunday 2007-07-22 04:40 -0400, Mike Connor wrote:
> On 22-Jul-07, at 2:59 AM, Robert Sayre wrote:
> > That doesn't sound reasonable. We are going to accept a very large
> > body of code with known quality control problems.
>
> If you have evidence that libpkix has known quality control problems,

For what it's worth, I do.

I just got trace-malloc working on Windows, so I was looking at the
list of what we leak when starting and shutting down Firefox. (It's
hard to do that on Linux because GNOME and GTK don't bother freeing
things on shutdown.)

Currently, all we do with libpkix when starting and shutting down
the browser is initialize it. That is, we call PKIX_Initialize:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix/top/pkix_lifecycle.c&rev=1.2#84
This creates 7 hash tables and a lock. NSS doesn't currently call
PKIX_Shutdown. But if we had, it wouldn't have helped us free the
memory allocated when creating these 7 hash tables and a lock.

Why not?

PKIX has its own object system. Every object has, among other
things, a reference count, a hash code, and a lock used to protect
the reference count and the hash code:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_object.h&rev=1.2&mark=74-80#72
(I'd note that using atomic operations for reference counts avoids
the rather significant size overhead of the lock. And I'm skeptical
that the average mutex needs to be put in a hash table.)

So, for example, this lock that we create is created in
PKIX_PL_MonitorLock_Create:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_monitorlock.c&rev=1.2&mark=119#109
which in turn calls PKIX_PL_Object_Alloc:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_object.c&rev=1.2&mark=532#485
which makes sure to allocate the object header (above) in addition
to the (one word) PKIX_PL_MonitorLockStruct. So this lock object is
an 8 word struct that owns a PRLock (to protect its reference count
and hash code) and a PRMonitor (for the locking exposed to the
caller), which are not themselves small or simple objects (a PRLock
is 132 bytes on Windows; a PRMonitor is three allocations, a 132
byte PRLock, a 132 byte PRCondVar, and a 12 byte PRMonitor).
(PKIX_PL_Mutex_Create gives you something similar, except with two
PRLocks instead of a PRMonitor and a PRLock.)

The hash tables, in turn, have their own PRLock to protect their
reference count (like all objects) and a table lock to protect the
table (an object created by PKIX_PL_Mutex_Create, which in turn has
two PRLocks, one to protect its reference count and one to expose to
the caller).


When we initialize libpkix, we tell it to use arenas for allocation,
by passing PR_TRUE as the second parameter to PKIX_Initialize:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/nss/nssinit.c&rev=1.80&mark=520-523#520
This means, through a bit of indirection, that the PKIX reference
counting functions are no-ops:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_object.c&rev=1.2&mark=736-737,805-806#718
This means that we'll never free all the locks that we allocate,
something that I imagine could be a significant memory leak if we
did more than create 7 hash tables and a lock. (For just that, it's
3312 bytes of leaked lock structures, excluding the
1 PKIX_PL_MonitorLockStruct, 7 PKIX_PL_MutexStructs, and their
header (320 bytes) that were allocated in the arena.)
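As a rough cross-check of these figures, the tally below adds up only the NSPR structures itemized in this post, using the Windows sizes quoted above. The variable names are illustrative only. It comes to 3180 bytes, exactly one PRLock (132 bytes) short of the 3312 reported, so presumably initialization allocates one more lock-sized structure than this simplified model counts; that reading is a guess and has not been checked against the source.

```python
# Sizes quoted in the post (Windows build); all names here are illustrative.
PRLOCK = 132
PRCONDVAR = 132
PRMONITOR_STRUCT = 12

# A PRMonitor is three allocations: a PRLock, a PRCondVar, and its own struct.
prmonitor = PRLOCK + PRCONDVAR + PRMONITOR_STRUCT

# PKIX monitor lock object: one PRLock for the header plus a PRMonitor.
monitor_lock_obj = PRLOCK + prmonitor

# PKIX mutex object: one PRLock for the header plus one exposed to callers.
mutex_obj = 2 * PRLOCK

# PKIX hash table: its own header PRLock plus a mutex object as table lock.
hash_table = PRLOCK + mutex_obj

itemized_total = 7 * hash_table + monitor_lock_obj
reported_total = 3312  # figure given in the post

print(prmonitor, monitor_lock_obj, mutex_obj, hash_table)
print(itemized_total, reported_total - itemized_total)
```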

It also means that we'll never return any of these arena-allocated
objects to a freelist to be reallocated. Not that we would return
them to a freelist if we removed the reference counting, anyway:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_mem.c&rev=1.2&mark=154,194#180

(And while I'm in that file, I'd note that if we're using arenas,
then its version of memcpy returns an *allocation error*:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_mem.c&rev=1.2&mark=220-222#206
Not that memcpy actually allocates any memory.)

bre...@mozilla.org

Aug 15, 2007, 3:10:26 PM
On Aug 9, 10:48 pm, "L. David Baron" <dba...@dbaron.org> wrote:
> On Sunday 2007-07-22 04:40 -0400, Mike Connor wrote:
>
> > On 22-Jul-07, at 2:59 AM, Robert Sayre wrote:
> > > That doesn't sound reasonable. We are going to accept a very large
> > > body of code with known quality control problems.
>
> > If you have evidence that libpkix has known quality control problems,
>
> For what it's worth, I do.
>
> I just got trace-malloc working on Windows, so I was looking at the
> list of what we leak when starting and shutting down Firefox. (It's
> hard to do that on Linux because GNOME and GTK don't bother freeing
> things on shutdown.)
>
> Currently, all we do with libpkix when starting and shutting down
> the browser is initialize it. That is, we call PKIX_Initialize:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

> This creates 7 hash tables and a lock. NSS doesn't currently call
> PKIX_Shutdown. But if we had, it wouldn't have helped us free the
> memory allocated when creating these 7 hash tables and a lock.
>
> Why not?
>
> PKIX has its own object system. Every object has, among other
> things, a reference count, a hash code, and a lock used to protect
> the reference count and the hash code:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

> (I'd note that using atomic operations for reference counts avoids
> the rather significant size overhead of the lock. And I'm skeptical
> that the average mutex needs to be put in a hash table.)
>
> So, for example, this lock that we create is created in
> PKIX_PL_MonitorLock_Create:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...
> which in turn calls PKIX_PL_Object_Alloc:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

> which makes sure to allocate the object header (above) in addition
> to the (one word) PKIX_PL_MonitorLockStruct. So this lock object is
> an 8 word struct that owns a PRLock (to protect its reference count
> and hash code) and a PRMonitor (for the locking exposed to the
> caller), which are not themselves small or simple objects (a PRLock
> is 132 bytes on Windows; a PRMonitor is three allocations, a 132
> byte PRLock, a 132 byte PRCondVar, and a 12 byte PRMonitor).
> (PKIX_PL_Mutex_Create gives you something similar, except with two
> PRLocks instead of a PRMonitor and a PRLock.)
>
> The hash tables, in turn, have their own PRLock to protect their
> reference count (like all objects) and a table lock to protect the
> table (an object created by PKIX_PL_Mutex_Create, which in turn has
> two PRLocks, one to protect its reference count and one to expose to
> the caller).
>
> When we initialize libpkix, we tell it to use arenas for allocation,
> by passing PR_TRUE as the second parameter to PKIX_Initialize:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

> This means, through a bit of indirection, that the PKIX reference
> counting functions are no-ops:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

> This means that we'll never free all the locks that we allocate,
> something that I imagine could be a significant memory leak if we
> did more than create 7 hash tables and a lock. (For just that, it's
> 3312 bytes of leaked lock structures, excluding the
> 1 PKIX_PL_MonitorLockStruct, 7 PKIX_PL_MutexStructs, and their
> header (320 bytes) that were allocated in the arena.)
>
> It also means that we'll never return any of these arena-allocated
> objects to a freelist to be reallocated. Not that we would return
> them to a freelist if we removed the reference counting, anyway:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

>
> ( And while I'm in that file, I'd note that if we're using arenas,
> then its version of memcpy returns an *allocation error*:http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/nss/lib...

> . Not that memcpy actually allocates any memory.)
>
> -David
>
> --
> L. David Baron http://dbaron.org/
> Mozilla Corporation http://www.mozilla.com/
>

This message posted by dbaron seems to have been lost in the thread.
I'm raising it to its own new thread header position.

I didn't see it in David's post, but he mentioned to me that the arena-
based allocation of objects with non-LIFO allocation patterns, without
a freelist to recycle arena-allocated objects who die in non-LIFO
order, seems to be a problem too. That is, without LIFO allocation
order, this code will bloat its arenas with uncollected garbage,
potentially badly.
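A toy model of that concern, under loud assumptions: BumpArena below is a made-up simplification (it is not NSPR's PLArena or anything in libpkix), in which space is reclaimed only by rolling the allocation pointer back past dead objects at the top. The sketch compares objects that die newest-first with objects that die out of order.

```python
class BumpArena:
    """Toy arena: allocation bumps a pointer; space is only reclaimed
    by rolling the pointer back past objects freed at the top (LIFO)."""

    def __init__(self):
        self.top = 0    # bytes currently claimed from the arena
        self.live = []  # [offset, size, alive] in allocation order

    def alloc(self, size):
        block = [self.top, size, True]
        self.live.append(block)
        self.top += size
        return block

    def free(self, block):
        block[2] = False
        # Roll back only while the most recent allocation is dead.
        while self.live and not self.live[-1][2]:
            dead = self.live.pop()
            self.top = dead[0]


def arena_after(rounds, lifo):
    """Return (bytes claimed, bytes actually live) after `rounds` rounds."""
    arena = BumpArena()
    previous = None
    for _ in range(rounds):
        temp = arena.alloc(40)
        kept = arena.alloc(40)
        if lifo:
            arena.free(kept)  # newest first: pointer rolls back
            arena.free(temp)
        else:
            arena.free(temp)  # dies while `kept` still sits above it
            if previous is not None:
                arena.free(previous)
            previous = kept
    live_bytes = sum(size for _, size, alive in arena.live if alive)
    return arena.top, live_bytes


print(arena_after(100, lifo=True))
print(arena_after(100, lifo=False))
```

In this model the LIFO run ends with nothing claimed, while the out-of-order run ends with 8000 bytes claimed to hold a single 40-byte live object; a freelist that recycled the dead 40-byte slots would keep the second case bounded.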

From what I can see and what David demonstrates, libpkix has
wrong-headedly cloned Java's object model into C, in the most inefficient,
not-seen-in-a-JVM-since-1995, unjustified for C code, and manifestly
bloaty and leaky manner possible.

This is not code I want in Firefox 3. Something fairly radical needs
to be done to fix this problem. Nothing, certainly not EV cert
promises, justifies this quality problem.

Followups to m.d.planning.

/be

Mike Beltzner

Aug 15, 2007, 5:40:14 PM
to bre...@mozilla.org, dev-pl...@lists.mozilla.org
----- bre...@mozilla.org wrote:
> This message posted by dbaron seems to have been lost in the thread.
> I'm raising it to its own new thread header position.

Thanks, Brendan. This definitely shouldn't be getting lost, and should be actively tracked. Is status being monitored as part of the Gecko 1.9 meetings?

> I didn't see it in David's post, but he mentioned to me that the
> arena-
> based allocation of objects with non-LIFO allocation patterns,
> without
> a freelist to recycle arena-allocated objects who die in non-LIFO
> order, seems to be a problem too. That is, without LIFO allocation
> order, this code will bloat its arenas with uncollected garbage,
> potentially badly.
>
> From what I can see and what David demonstrates, libpkix has
> wrong-headedly cloned Java's object model into C, in the most inefficient,
> not-seen-in-a-JVM-since-1995, unjustified for C code, and manifestly
> bloaty and leaky manner possible.
>
> This is not code I want in Firefox 3. Something fairly radical needs
> to be done to fix this problem. Nothing, certainly not EV cert
> promises, justifies this quality problem.
>
> Followups to m.d.planning.

It's pretty obvious that we shouldn't be taking code which is at best leaky and inefficient, and at worst potentially exploitable. What isn't as clear to me is:

1. What can be done to fix it, and what the timeline for fixing it would look like,
2. How badly we need the functionality it provides (not just EV!) in Firefox 3.

A previous message in this thread (http://groups.google.com/group/mozilla.dev.planning/msg/e71afad655a0265a) contains Nelson's overview on what the code does. Pulling it out into a high level list gives me:

- updating NSS to catch up with PKI standards (hasn't been updated since 2001)
- support for Extended PKI (PKIX, RFC 3280)
--- multiple parent CAs
--- already being used in Japan and Korea
--- feature parity with Windows Vista
- EV support

Assuming the next product delivery isn't for 6 months after Firefox 3, what's the cost to us for not supporting PKIX?

cheers,
mike

Justin Wood (Callek)

Aug 15, 2007, 6:27:47 PM
Mike Beltzner wrote:

> Assuming the next product delivery isn't for 6 months after Firefox 3, what's the cost to us for not supporting PKIX?

Obviously I'm no expert here :-) but would there be any harm in not
pushing back FF3 for this, and instead issuing a "feature security upgrade"
between FF3 and FF4 if we get this situation ironed out in that timeframe?

I am surely not even close to aware of the amount of work that would
entail, just throwing the idea out there.

~Justin Wood (Callek)

Philip Chee

Aug 16, 2007, 7:58:33 AM
On Wed, 15 Aug 2007 19:10:26 -0000, bre...@mozilla.org wrote:

> I didn't see it in David's post, but he mentioned to me that the arena-
> based allocation of objects with non-LIFO allocation patterns, without
> a freelist to recycle arena-allocated objects who die in non-LIFO

Nit: "arena-allocated objects s/who/which/ die"

Phil

--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.
[ ]Hello.. Incontinence Hotline.. Can you hold?
* TagZilla 0.066.6

bre...@mozilla.org

Aug 17, 2007, 1:06:52 AM
On Aug 15, 2:40 pm, Mike Beltzner <beltz...@mozilla.com> wrote:
> Assuming the next product delivery isn't for 6 months after Firefox 3, what's the cost to us for not supporting PKIX?

I'm not convinced that the owners should not just fix it. The changes
would be mostly mechanical. Possibly Taras's Elsa-based patch
generating tools could help, or maybe it would just take a pot of
coffee and a good keyboard.

/be

Mike Beltzner

Aug 17, 2007, 3:43:15 AM
to bre...@mozilla.org, dev-pl...@lists.mozilla.org

I'd be willing to throw in a few bucks for one of the new Apple
keyboards if that's all it takes. Do we have a point of contact who's
engaging with the NSS team here?

cheers,
mike

Gervase Markham

Aug 17, 2007, 5:06:22 AM
Mike Beltzner wrote:
> I'd be willing to throw in a few bucks for one of the new Apple
> keyboards if that's all it takes. Do we have a point of contact who's
> engaging with the NSS team here?

There are status meetings with the Firefox team and the NSS team. The
intention was that they happen regularly, although I think we've only
had one so far. The NSS team just sent a mail round wanting another one
ASAP, but there was pushback because it's all-hands this week.

Gerv
