Shipping Type Inference and Other "Irreversible" Changes

David Mandelin

Aug 9, 2011, 6:48:58 PM

to dev-pl...@lists.mozilla.org, Brian Hackett

* Background: TI

Brian Hackett, of the JavaScript team, has been doing great applied
research on using a hybrid static/dynamic analysis to optimize compiled
JS. The project is called TI or JM+TI, for "type inference". Brian now
has it to the point where it runs in the browser and is green on
Tinderbox. I just measured this morning and on Windows it improves our
V8 score from 4980 to 5440 (1.1x) and our Kraken score from 4880 to 3365
(1.45x). Some V8 subscores are particularly big; e.g., crypto goes from
7110 to 15000. TI also gets the top score of any engine on some workloads.
So it's a big boost for our performance, pushes forward JS optimization
technology, and will be even more effective once we have IonMonkey, so
it's good to start using it and testing it now. There are a few
regressions still to be worked out, but Brian is close to being ready to
land.
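
As a rough check on the ratios quoted above, the arithmetic can be sketched in Python. This is only an illustration: the helper name `speedup` is made up, and it assumes (as the text implies) that V8 scores are higher-is-better while Kraken numbers are lower-is-better.

```python
def speedup(before, after, higher_is_better=True):
    """Return how many times better `after` is than `before`."""
    return after / before if higher_is_better else before / after

# Numbers are the ones quoted in the message above.
v8 = speedup(4980, 5440)                               # V8 score, higher is better
kraken = speedup(4880, 3365, higher_is_better=False)   # Kraken, assumed lower is better
crypto = speedup(7110, 15000)                          # V8 crypto subscore

print(f"V8 {v8:.2f}x, Kraken {kraken:.2f}x, crypto {crypto:.2f}x")
```

So the crypto subscore roughly doubles, while the overall V8 gain is closer to 9%.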

* Background: Irreversible Changes

The difficulty of landing TI in the train model is that the landing is
"irreversible". By that, I mean that neither pref'ing it off nor backing
it out is practical.
- Pref'ing off doesn't work because the project changes many things
throughout the JS engine, so some of those changes cannot be disabled
at run time. (The TI optimizations themselves can be pref'd off, of
course.)
- Backing out is also hard because it touches so many areas of the JS
engine. This means a standard backout would generate a big merge that
might conflict with changes that landed on top of it. Doing this a few
weeks before an Aurora or Beta merge sounds risky. Alternatively, we
could back out the following changes, then back out TI, then rebase and
reland those changes. But that's a ton of work, and also risky because
of the rebasing.

We've made other necessary irreversible changes, including fatvals and
compartments last year. I think Azure might be another case. So it seems
very likely to happen again, which means we need a general solution to
the problem. The proposal below is intended to set up a general scheme
we can use for any future irreversible change.

* Proposal for Shipping Irreversible Changes

In the train model, changes can only land if they are reversible, so my
proposed solution to irreversible changes is not to land them, but
rather to hold them in separate repos and ship from there. So instead of
"landing", we "switch users to an alternate repo". And instead of
"backing out", we "switch users back to the canonical repo". Once the
product ships as a final release, we land to the canonical repo. In more
detail:

In the standard train model, we have these repos:

- mozilla-central
- mozilla-aurora
- mozilla-beta
- mozilla-release

Changes propagate through these repos on a schedule:

- at Aurora cut time, m-c merges to mozilla-aurora
- at Beta cut time, m-a merges to mozilla-beta
- at Release time, m-b merges to mozilla-release

And each channel is linked to exactly one repo:

- Nightly builds from mozilla-central
- Aurora builds from mozilla-aurora
- Beta builds from mozilla-beta
- Release builds from mozilla-release

With irreversible changes, we add an alternate repo for each standard
one except release, so we have

- mozilla-central
- [project repo; TI in this case]: merges from mozilla-central
automatically every day
- mozilla-aurora
- mozilla-aurora-alt: merges from mozilla-aurora automatically every day
- mozilla-beta
- mozilla-beta-alt: merges from mozilla-beta automatically every day
- mozilla-release

Most fixes land to the same repo they always did. Fixes to TI land to
the alt repos.

Because of the automatic daily merges, mozilla-x-alt is always a copy of
mozilla-x with the addition of the project changes; changes landed to
the normal repo quickly propagate to the "alt" repo.
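
The "alt is always base plus project changes" invariant can be sketched with sets of changeset IDs. This is a toy model, not a real repo operation: the changeset names and the `daily_merge` helper are hypothetical, and a set union cannot represent merge conflicts.

```python
def daily_merge(base, alt):
    """Bring every changeset from the base repo into the alt repo (returns a new set)."""
    return alt | base

# Changesets are modeled as plain strings; all IDs here are made up.
mozilla_aurora = {"a1", "a2"}
project_changes = {"ti-1", "ti-2"}
aurora_alt = mozilla_aurora | project_changes

# A normal fix lands on mozilla-aurora; a TI fix lands on the alt repo.
mozilla_aurora.add("a3")
aurora_alt.add("ti-3")
project_changes.add("ti-3")

# The automatic daily merge propagates the normal fix to the alt repo.
aurora_alt = daily_merge(mozilla_aurora, aurora_alt)

# Invariant from the proposal: alt == base + project changes.
assert aurora_alt == mozilla_aurora | project_changes
assert aurora_alt - mozilla_aurora == project_changes
```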

Propagation is similar to before:

- At Aurora cut time, m-c merges to mozilla-aurora and TI merges to
mozilla-aurora-alt
- At Beta cut time, m-a merges to m-b and mozilla-aurora-alt merges to
mozilla-beta-alt
- At Release cut time, mozilla-beta-alt merges to mozilla-release
- One extra thing is that after release, the alt branches must become
the normal branches. We can do this by merging TI to m-c and each alt
branch to its corresponding normal branch.

The other big change is that we use channel switching to activate the
alt repos. The schedule would be like this:

- When TI is ready (should be very early (first week) in Nightly
cycle), point Nightly to TI repo.
- If TI is too unstable, point Nightly back to m-c. Everything is fully
back to normal!
- If TI restabilizes quickly (say one killer bug got fixed), point
Nightly back to TI.
- At Aurora cut, point Aurora to mozilla-aurora-alt
- At Beta cut, point Beta to mozilla-beta-alt
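
The switching schedule above amounts to changing which repo each channel builds from. A minimal sketch, assuming a simple channel-to-repo table; the function names and the "ti" label for the project repo are placeholders, not existing tooling:

```python
# Which repo each channel currently builds from (the standard train model).
channels = {
    "Nightly": "mozilla-central",
    "Aurora": "mozilla-aurora",
    "Beta": "mozilla-beta",
    "Release": "mozilla-release",
}
CANONICAL = dict(channels)

# Alternate repo per channel; Release deliberately has none.
ALT_REPO = {
    "Nightly": "ti",  # stands in for the project repo
    "Aurora": "mozilla-aurora-alt",
    "Beta": "mozilla-beta-alt",
}

def point_to_alt(channel):
    """The replacement for 'landing': build the channel from its alt repo."""
    channels[channel] = ALT_REPO[channel]

def point_to_canonical(channel):
    """The replacement for 'backing out': build from the canonical repo again."""
    channels[channel] = CANONICAL[channel]

point_to_alt("Nightly")        # TI is ready
point_to_canonical("Nightly")  # TI is too unstable
point_to_alt("Nightly")        # the killer bug got fixed
point_to_alt("Aurora")         # Aurora cut
print(channels)
```

In this model no repo contents change during a switch, which is the point of the proposal.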

The main work items for the proposal are:

- Set up the 2 alt repos
- Set up scripts to automatically merge normal to alt repos
- Redirect Nightly/Aurora/Beta users according to the schedule above

Steve Wendt

Aug 9, 2011, 7:07:33 PM

to

On 8/9/2011 3:48 PM, David Mandelin wrote:

> With irreversible changes, we add an alternate repo for each standard
> one except release
>

> - When TI is ready (should be very early (first week) in Nightly
> cycle), point Nightly to TI repo.
> - If TI is too unstable, point Nightly back to m-c. Everything is fully
> back to normal!
> - If TI restabilizes quickly (say one killer bug got fixes), point
> Nightly back to TI.
> - At Aurora cut, point Aurora to mozilla-aurora-alt
> - At Beta cut, point Beta to mozilla-beta-alt

So you only get one "irreversible change" per 18-week cycle? Perhaps
there should not be a beta-alt at all; either the feature makes it out
of aurora-alt, or it waits for the next cycle?

Jeff Muizelaar

Aug 9, 2011, 7:16:03 PM

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On 2011-08-09, at 6:48 PM, David Mandelin wrote:

> We've made other necessary irreversible changes, including fatvals and
> compartments last year. I think Azure might be another case.

Just for the record, Azure is currently still pref-able, and I think the pain of having to handle an irreversible change like this is enough to make us try really hard to avoid a similar situation.

Just the same, it does seem like a problem we're going to have to deal with, and your proposal, while painful, doesn't sound totally insane to me.

-Jeff

Christian Legnitto

Aug 9, 2011, 7:25:51 PM

to Steve Wendt, dev-pl...@lists.mozilla.org

There are some problems we only find when we have millions of users, so we still need the go/no-go option there as well.

Thanks,
Christian

Jonas Sicking

Aug 9, 2011, 7:48:13 PM

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On Tue, Aug 9, 2011 at 3:48 PM, David Mandelin <dman...@mozilla.com> wrote:
> The main work items for the proposal are:
>
> - Set up the 2 alt repos
> - Set up scripts to automatically merge normal to alt repos
> - Redirect Nightly/Aurora/Beta users according to the schedule above

Another approach to all this is to simply go with the "normal"
approach, keeping in mind that we have the no-go option available for
each release.

In other words, if this lands on mozilla-central right after a merge,
that leaves 6 weeks of nightlies and 6 weeks of aurora before we reach
beta audience. That's almost 3 months of time to fix any bugs bad
enough to prevent a beta. After that there's another 6 weeks of beta
before release.

If at any time after that we realize it's not of high enough quality, we
can always make a no-go decision for that train. Making just one
no-go decision brings the total time between landing and shipping to
almost 6 months.
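
The timeline arithmetic can be spelled out as a quick sketch. It assumes every stage is exactly 6 weeks and that a single no-go pushes the change back by exactly one 6-week train; the variable names are illustrative only.

```python
WEEKS_PER_STAGE = 6              # nightly, aurora and beta each last 6 weeks
AVG_WEEKS_PER_MONTH = 52 / 12    # average month length in weeks

to_beta = 2 * WEEKS_PER_STAGE                   # nightly + aurora
to_release = 3 * WEEKS_PER_STAGE                # nightly + aurora + beta
with_one_no_go = to_release + WEEKS_PER_STAGE   # wait for the next train

for label, weeks in [
    ("landing to beta", to_beta),
    ("landing to release", to_release),
    ("landing to release after one no-go", with_one_no_go),
]:
    print(f"{label}: {weeks} weeks (~{weeks / AVG_WEEKS_PER_MONTH:.1f} months)")
```

That gives 12 weeks before the beta audience and 24 weeks with one skipped train, which matches "almost 3 months" and "almost 6 months".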

Granted, if after 6 months we realize that there are simply too many
bugs, or that the architecture is too wrong, we are in a pretty bad place since
the patch at that point lives in all branches and so backing it out is
going to introduce a lot of risk even on aurora and beta channels. But
it seems pretty unlikely to me that that would be the case.

/ Jonas

Scott Johnson

Aug 9, 2011, 8:25:47 PM

to dev-pl...@lists.mozilla.org

On 08/10/2011 11:25 AM, thus spoke Christian Legnitto:

> On Aug 9, 2011, at 4:07 PM, Steve Wendt wrote:
>

> There are some problems we only find when have millions of users so we still need the go/no-go option there as well.

Look on the bright side, though... we need names for these new channels.
Might I suggest 'mozilla-hyperion'? :)

J/K. It sounds like a good plan, but I do agree that the changes should
either make it out of aurora-alt, and be placed into beta, or not make
it into beta at all (i.e. my opinion would be to scrap the
mozilla-beta-alt). That said, I think it's going to be a large amount of
work, and it seems to me to be much more work than utilizing the 'no go'
alternative.

~Scott

Nicholas Nethercote

Aug 9, 2011, 8:36:45 PM

to Jonas Sicking, dev-pl...@lists.mozilla.org, David Mandelin, Brian Hackett

On Tue, Aug 9, 2011 at 4:48 PM, Jonas Sicking <jo...@sicking.cc> wrote:
>
> Granted, if we after 6 months realize that there's simply too many
> bugs, or too wrong architecture, we are in a pretty bad place since
> the patch at that point lives in all branches and so backing it out is
> going to introduce a lot of risk even on aurora and beta channels. But
> it seems pretty unlikely to me that that would be the case.

Let's not bet the farm on that.

Nick

Robert O'Callahan

Aug 9, 2011, 10:12:14 PM

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On Wed, Aug 10, 2011 at 10:48 AM, David Mandelin <dman...@mozilla.com> wrote:

> - Pref'ing off doesn't work because the project changes many things
> throughout the JS engine, so some of those changes can not be disabled
> at run time. (The TI optimizations themselves can be pref'd off, of
> course.)
>

How hard have you looked at breaking up those changes into independent
pieces for landing?

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not in
us. If we confess our sins, he is faithful and just and will forgive us our
sins and purify us from all unrighteousness. If we claim we have not sinned,
we make him out to be a liar and his word is not in us." [1 John 1:8-10]

Matt Brubeck

Aug 9, 2011, 11:09:28 PM

to

On 08/09/2011 03:48 PM, David Mandelin wrote:
> - Backing out is also hard because it touches so many areas of the JS
> engine. This means a standard backout would generate a big merge that

> might conflict with changes that landed on top of it. [...]

>
> - mozilla-aurora-alt: merges from mozilla-aurora automatically every day

> - mozilla-beta-alt: merges from mozilla-beta automatically every day

Won't the merges, over the course of the release cycle, involve about as
many conflicts as an eventual backout would? Essentially it is
spreading the work of the backout over the 18-week cycle.

This is still useful, because it spreads out the work and also lets us
test continuously and find problems sooner. But it seems wrong to
assume these merges can be "automatic" when we've also said that the
delta between repos is likely to cause conflicts with subsequent changes.

Sometimes the automatic merge will fail, and other times it will succeed
but produce broken code. I'm sure the JS team understands this since
they have plenty of experience with long-lived branches. But for the
sake of this discussion it should be explicit that someone will need to
own the manual work of dealing with occasional merge conflicts.
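
The two failure modes can be illustrated with a toy file-level three-way merge. This is a simplified sketch, not how Mercurial merges: real merges work on lines within files, and the file names, version strings, and `tests_pass` callback are all hypothetical.

```python
def merge_files(ancestor, ours, theirs):
    """File-level three-way merge. Returns (merged_tree, conflicting_paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(ancestor) | set(ours) | set(theirs)):
        a, o, t = ancestor.get(path), ours.get(path), theirs.get(path)
        if o == t or t == a:
            result = o              # both sides agree, or only our side changed it
        elif o == a:
            result = t              # only their side changed it
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts

def daily_merge(ancestor, alt, base, tests_pass):
    """Classify one automatic merge of the normal repo into the alt repo."""
    merged, conflicts = merge_files(ancestor, alt, base)
    if conflicts:
        return "needs manual merge", conflicts
    if not tests_pass(merged):
        return "merged cleanly but broken", []
    return "ok", []

ancestor = {"engine.cpp": "v1", "api.h": "v1"}
alt = {"engine.cpp": "v1+ti", "api.h": "v1"}   # project changes

# Case 1: the normal repo touched the same file as the project.
conflict_case = daily_merge(
    ancestor, alt, {"engine.cpp": "v1+fix", "api.h": "v1"}, lambda tree: True)

# Case 2: different files changed, so the merge is clean, but the
# combination does not work together.
incompatible = lambda tree: not (tree["api.h"] == "v2" and tree["engine.cpp"] == "v1+ti")
broken_case = daily_merge(
    ancestor, alt, {"engine.cpp": "v1", "api.h": "v2"}, incompatible)

print(conflict_case)
print(broken_case)
```

The second case is the one an unattended script cannot detect without running tests, which is why the manual work needs an owner.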

Karl Tomlinson

Aug 10, 2011, 1:49:40 AM

to

Matt Brubeck writes:

> Won't the merges, over the course of the release cycle, involve
> about as many conflicts as an eventual backout would?
> Essentially it is spreading the work of the backout over the
> 18-week cycle.

Yes.
And ensuring time is allocated for / spent on the work,
even if it turns out to be unneeded.

> This is still useful, because [...] and also

> lets us test continuously and find problems sooner.

Lets us test continuously on tinderboxen, which is more than the
standard train model provides for back-out paths.

IIUC it doesn't give us user testing of the back-out path.
User-testing issues take the longest to show up, and so we may not
have the confidence to go with the back-out path.

Axel Hecht

Aug 10, 2011, 5:26:02 AM

to

Technical detail, we would need the feature repos and the main repos to
be the same l10n-wise. Not much of an issue in this case, but in the
general case, we should keep this on the radar.

Axel

Ben Hearsum

Aug 10, 2011, 8:44:40 AM

to

On 08/09/11 06:48 PM, David Mandelin wrote:
> With irreversible changes, we add an alternate repo for each standard
> one except release, so we have
>
> - mozilla-central
> - [project repo; TI in this case]: merges from mozilla-central
> automatically every day
> - mozilla-aurora
> - mozilla-aurora-alt: merges from mozilla-aurora automatically every day
> - mozilla-beta
> - mozilla-beta-alt: merges from mozilla-beta automatically every day
> - mozilla-release

So, under this plan, we would have all of our Aurora and Beta users on
the -alt variants?

If that's the case, I don't see the benefit to having the repositories.

If we end up shipping the -alt code, the original aurora and beta
repositories are not useful as anything except a stepping stone to the
"real" code we're shipping.

If we lose confidence in the -alt code, we can't ship the original code,
because we've had no Aurora or Beta users testing it(1). On this
assumption, we may as well just have the irreversible changes in plain
-aurora and -beta repositories because we can't ship their contents with
your plan.

- Ben

(1) Given that changes like this are large and invasive, I don't think
we can say that testing done from -alt repositories can carry over.

Ted Mielczarek

Aug 10, 2011, 9:12:05 AM

to dev-pl...@lists.mozilla.org, David Mandelin

Yeah. I think as painful as it might be, we need to think about how to
land large changes like this in a way that we can also disable or back
them out if need be. Look at the Firefox 4 cycle. We were stalled on
fallout from compartment changes that were necessary for JaegerMonkey
work, and that stretched our beta cycle out indefinitely. What's to
say that we won't find a series of hard-to-fix regressions from the TI
work that cause us to be unable to ship, and delay hundreds or
thousands of other fixes from reaching our users?

In short, if we're going to live on the faster release cycle, we have
to really believe it and not break the rules for anything, otherwise
the whole thing will blow up on us.

-Ted

Brian William Hackett

Aug 10, 2011, 9:31:07 AM

to rob...@ocallahan.org, dev-pl...@lists.mozilla.org, David Mandelin, Brian Hackett

----- Original Message -----
> How hard have you looked at breaking up those changes into independent
> pieces for landing?
>
> Rob

The project can be broken up into a few (still large) pieces, but the problem is that the actual optimizations depend on all the rest being in place, and that remainder is mainly bookkeeping which gives no benefit and adds up to significant overhead on JS heavy pages (including 3-5% on the V8 and SunSpider benchmarks).

Brian

Brian William Hackett

Aug 10, 2011, 9:40:42 AM

to Matt Brubeck, dev-pl...@lists.mozilla.org

----- Original Message -----
> Sometimes the automatic merge will fail, and other times it will
> succeed
> but produce broken code. I'm sure the JS team understands this since
> they have plenty of experience with long-lived branches. But for the
> sake of this discussion it should be explicit that someone will need
> to
> own the manual work of dealing with occasional merge conflicts.

I would be managing conflicts when merging into the alternate branch; I've been doing this since the project's inception.

Brian

Robert Kaiser

Aug 10, 2011, 10:40:26 AM

to

David Mandelin schrieb:

> With irreversible changes, we add an alternate repo for each standard
> one except release

We have no reasonable plan for analyzing crashes/stability on an
alternate branch, and currently no good tooling to separate different
aurora or beta repos when counting significant crashes. It's hard enough
already to get tooling in place to be able to watch different beta
releases, and our user volumes on Aurora, and probably also Beta, are
already too low to get really good crash data.
Further splitting the audience would only lead to needing a lot more
manpower to create tooling and watch the differences, as well as less
reliable numbers due to low audience numbers on those different builds.

From that point of view, I don't really like that proposal and would
like it better to just land that work at the beginning of a Nightly
cycle so we can work out as many problems as possible in the 5-6 weeks
until this hits aurora.

Robert Kaiser

--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)

Johnathan Nightingale

Aug 10, 2011, 11:06:16 AM

to Ted Mielczarek, dev-pl...@lists.mozilla.org, David Mandelin

On 2011-08-10, at 9:12 AM, Ted Mielczarek wrote:
> Yeah. I think as painful as it might be, we need to think about how to
> land large changes like this in a way that we can also disable or back
> them out if need be. Look at the Firefox 4 cycle. We were stalled on
> fallout from compartment changes that were necessary for JaegerMonkey
> work, and that stretched our beta cycle out indefinitely. What's to
> say that we won't find a series of hard-to-fix regressions from the TI
> work that cause us to be unable to ship, and delay hundreds or
> thousands of other fixes from reaching our users?

I'm terrified that your first paragraph here will spin this thread off into a debate about the sources of FF4 schedule slippage. If you, gentle reader, are tempted to point out that there were other causes for FF4's schedule being what it was, my fervent hope is that you will let it sit. The reasons for FF4's elongated delivery schedule are manifest and complex. Monolithic development has schedule slip in its core. It's true that compartments landed late, and needed follow up work, but so did a couple of other big pieces.

Please, god, let's avoid that rathole that I'm sure Ted didn't intend.

> In short, if we're going to live on the faster release cycle, we have
> to really believe it and not break the rules for anything, otherwise
> the whole thing will blow up on us.

Modulo a tiny bit of human judgement in that "not break the rules for anything" clause, I agree. We took a post-aurora merge for JS last release because of a communication failure that resulted in them missing the cutoff. That won't be a habit, but we broke the rules in a minor way for the right reasons.

Still, though - I agree. Rapid release means you have more chances to get it right, that we never need to ship code we're not happy with (or hold back a release) because otherwise we'll have to wait a year. If TI is so invasive that a killswitch is impossible, then backing out is the killswitch we have left. If other code builds on top of that in ways that are harmed by a backout, that code may *also* have to wait 6 more weeks, or there might be some hand-merging needed to strain the other bugs through the TI mesh.

Either way, though, I think that for code *on a release train*, it either works well enough to ship it, or it gets killed by whatever means we have available.

There is still an important question here around how to get widespread testing for code not yet on a release train. Ideas like portioning off 20% of our nightly users to hammer on a TI-Nightly somewhere have been floated in the past, and might accomplish much of what Dave and Brian want here (finding TI bugs early) without needing to fight against our tree rules on release trains. If I'm right in thinking that might address part of the concern here, then we should revisit those conversations that were scoped out of the initial rapid release discussion, ideally in a separate thread.
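
One way "portioning off 20% of our nightly users" could work is deterministic hash bucketing. The sketch below is purely hypothetical: the function name, salt, and user IDs are invented for illustration, and it does not describe any existing update mechanism.

```python
import hashlib

def in_test_group(user_id: str, percent: int = 20, salt: str = "ti-nightly") -> bool:
    """Deterministically place roughly `percent`% of users in the test group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # stable bucket in 0..99
    return bucket < percent

# Simulated population; real IDs would come from wherever builds are assigned.
users = [f"user-{i}" for i in range(10_000)]
share = sum(in_test_group(u) for u in users) / len(users)
print(f"{share:.1%} of simulated users would get the TI build")
```

Because the bucket depends only on the salt and the ID, a given user stays in the same group from one day to the next.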

I don't believe we should accept irreversible changes. I do believe that some changes are very expensive to reverse, though, and we should find ways to assess and reduce the risk of having to pay that cost.

J

---
Johnathan Nightingale
Director of Firefox Engineering
joh...@mozilla.com

Ted Mielczarek

Aug 10, 2011, 11:13:37 AM

to Johnathan Nightingale, dev-pl...@lists.mozilla.org, David Mandelin

On Wed, Aug 10, 2011 at 11:06 AM, Johnathan Nightingale
<joh...@mozilla.com> wrote:
> I'm terrified that your first paragraph here will spin this thread off into a debate about the sources of FF4 schedule slippage. If you, gentle reader, are tempted to point out that there were other causes for FF4's schedule being what it was, my fervent hope is that you will let it sit. The reasons for FF4's elongated delivery schedule are manifest and complex. Monolithic development has schedule slip in its core. It's true that compartments landed late, and needed follow up work, but so did a couple of other big pieces.

Indeed, it was a reductionist view of things, I just remember the
compartment fallout very vividly. Don't take it as gospel. I believe
the point stands. Holding releases hostage for any one feature, no
matter how valuable, will just lead us back to the pain of our old
release process.

-Ted

Benjamin Smedberg

Aug 10, 2011, 11:16:17 AM

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On 8/9/2011 6:48 PM, David Mandelin wrote:
>
> With irreversible changes, we add an alternate repo for each standard

> one except release, so we have
>
> - mozilla-central
> - [project repo; TI in this case]: merges from mozilla-central
> automatically every day
> - mozilla-aurora
> - mozilla-aurora-alt: merges from mozilla-aurora automatically every day
> - mozilla-beta
> - mozilla-beta-alt: merges from mozilla-beta automatically every day
> - mozilla-release
>

> Most fixes land to the same repo they always did. Fixes to TI land to
> the alt repos.
>
> Because of the automatic daily merges, mozilla-x-alt is always a copy of
> mozilla-x with the addition of the project changes; changes landed to
> the normal repo quickly propagate to the "alt" repo.

Having read lots of this discussion, I think there is probably an
alternate way to achieve these goals. The basic principle of your
proposal is to maintain a codebase without the TI code. I think we can
do this today:

When we believe it is ready for nightly users, merge TI into nightly.
Maintain a separate project branch "nightly-without-ti" where TI is
immediately "backed out" again.

The TI team will be responsible for maintaining the nightly-without-ti
branch. This probably means that other large JS landings which might
conflict should be avoided during this time period.

When we get to Aurora, release drivers and the TI team decide whether we
believe the feature is ready for the aurora train. If it is, we keep an
aurora-without-ti branch active so that we can flip TI off if blocking
issues are found. Similar for the beta branch.

This avoids a fair bit of release engineering headache switching users
between release channels and dealing with mechanics issues such as
download links, localization repos, etc, while keeping options open for
disabling TI throughout the release cycle.

The downside is that the TI team will need to deal with merge conflicts
on the *-without-ti branches, but it seems to me that this is roughly
the same amount of work as in your proposal, where you merge into the
-alt branch.

--BDS

Asa Dotzler

Aug 10, 2011, 11:20:35 AM

to

Benjamin Smedberg wrote:

> The downside is that the TI team will need to deal with merge conflicts
> on the *-without-ti branches, but it seems to me that this is roughly
> the same amount of work as in your proposal, where you merge into the
> -alt branch.
>
> --BDS

The other downside is we won't have our 60K nightly testers making sure
that the not-TI branch is actually working as well as the TI branch
should there be any differences that manifest as a result of merges or
TI-dependent differences.

- A

Benjamin Smedberg

Aug 10, 2011, 11:33:52 AM

to Asa Dotzler, dev-pl...@lists.mozilla.org

That problem doesn't change with either proposal, unless we decide to
somehow split up the audiences. As I've argued before, I don't think
splitting up our audiences would ever be a good idea because of the
confusion it would cause, and it's an even worse idea because we
just don't have enough testers to go around.

--BDS

Mike Shaver

Aug 10, 2011, 11:41:56 AM

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On Tue, Aug 9, 2011 at 6:48 PM, David Mandelin <dman...@mozilla.com> wrote:
> - Backing out is also hard because it touches so many areas of the JS
> engine. This means a standard backout would generate a big merge that
> might conflict with changes that landed on top of it.

One thing we discussed in the early days of the rapid-release process
was what to do with some of these cases, with fatvals being the
thought experiment IIRC. This was in the context of tracemonkey
merges, but I think the model holds here.

An option that was mooted was the "nuclear option" of just rolling
back to a previous version of the JS engine. How much does the TI
work affect the rest of the browser, and how much other line-crossing
work are we expecting to take after TI lands?

If TI explodes, we could back out the JS work to the changeset before
we landed it, and then reland only the specific security/etc fixes
that are critical to the release. It would delay a bunch of JS work
for 6 weeks, but that seems like a better option than delaying
everything by skipping a ship.

(Backing out that way wouldn't be trivial, since you would have
interleaved changesets from different parts of Gecko, but just
stomping over the files with old versions and taking that as a
changeset would probably work without inordinate pain.)

We would need a squeal point that wasn't too close to the aurora or
beta branch day, so that we had time to reintegrate the critical
changes, but a week seems like it would suffice for that.

Mike

Ehsan Akhgari

Aug 10, 2011, 1:05:20 PM

to Mike Shaver, dev-pl...@lists.mozilla.org, David Mandelin, Brian Hackett

First of all, let me voice my opinion. I think the original proposal is
a bad one, for reasons mentioned elsewhere in the thread: it imposes
on other teams a risk for which the JS team is responsible, creates
additional developer/releng headache in maintaining new repositories,
and causes confusion when we switch from *-alt to * or vice versa (with
a lack of proper testing on the alternative repository), etc.

On 11-08-10 11:41 AM, Mike Shaver wrote:
> An option that was mooted was the "nuclear option" of just rolling
> back to a previous version of the JS engine. How much does the TI
> work affect the rest of the browser, and how much other line-crossing
> work are we expecting to take after TI lands?
>
> If TI explodes, we could back out the JS work to the changeset before
> we landed it, and then reland only the specific security/etc fixes
> that are critical to the release. It would delay a bunch of JS work
> for 6 weeks, but that seems like a better option than delaying
> everything by skipping a ship.

I think this makes a lot of sense. The only problem with this is that
the JS team might have some other fixes (security, regression, etc) that
they want to take even if we take the "nuclear option". That would be
fairly easy for them to do by just maintaining a named branch off of the
changeset before the TI landing (let's call that LAST_KNOWN_GOOD_JS),
and they could keep those fixes both on the default and the named branch
as they move forward.

> (Backing out that way wouldn't be trivial, since you would have
> interleaved changesets from different parts of Gecko, but just
> stomping over the files with old versions and taking that as a
> changeset would probably work without inordinate pain.)

Actually, it is very easy!

hg revert -r LAST_KNOWN_GOOD_JS js/
hg commit

(Yes, there are probably a bunch of other details to take care of, and a
bunch of testing to perform, but the actual backout process is very easy!)
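
The intended effect of that revert, plus relanding only the critical fixes, can be modeled on a toy tree that maps paths to contents. This sketch only models the outcome described above; the paths, version strings, and the `revert_subtree` helper are made up and say nothing about how Mercurial implements it.

```python
def revert_subtree(current, known_good, prefix):
    """Return a tree where every path under `prefix` matches `known_good`
    and every other path is left as in `current`."""
    result = {p: v for p, v in current.items() if not p.startswith(prefix)}
    result.update({p: v for p, v in known_good.items() if p.startswith(prefix)})
    return result

# Tree at the hypothetical LAST_KNOWN_GOOD_JS changeset.
known_good = {"js/src/engine.cpp": "old", "layout/frame.cpp": "r1"}

# Tree today: TI landed, a security fix landed on top, layout moved on.
current = {
    "js/src/engine.cpp": "old+ti+secfix",
    "js/src/ti.cpp": "ti",
    "layout/frame.cpp": "r2",
}

reverted = revert_subtree(current, known_good, "js/")
# js/ is back to the old engine; layout/ keeps its newer change.

reverted["js/src/engine.cpp"] = "old+secfix"   # reland only the critical fix
print(reverted)
```

The security fix has to be reapplied by hand against the old engine, which is the part that needs the week of lead time mentioned above.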

> We would need a squeal point that wasn't too close to the aurora or
> beta branch day, so that we had time to reintegrate the critical
> changes, but a week seems like it would suffice for that.

Indeed.

Ehsan

David Mandelin

Aug 10, 2011, 10:11:02 PM

to dev-pl...@lists.mozilla.org, Benjamin Smedberg, Brian Hackett

On 8/10/2011 8:16 AM, Benjamin Smedberg wrote:
> On 8/9/2011 6:48 PM, David Mandelin wrote:
>>
>> With irreversible changes, we add an alternate repo for each standard
>> one except release, so we have
>>
>> - mozilla-central
>> - [project repo; TI in this case]: merges from mozilla-central
>> automatically every day
>> - mozilla-aurora
>> - mozilla-aurora-alt: merges from mozilla-aurora automatically
>> every day
>> - mozilla-beta
>> - mozilla-beta-alt: merges from mozilla-beta automatically every day
>> - mozilla-release
>>
>> Most fixes land to the same repo they always did. Fixes to TI land to
>> the alt repos.
>>
>> Because of the automatic daily merges, mozilla-x-alt is always a copy of
>> mozilla-x with the addition of the project changes; changes landed to
>> the normal repo quickly propagate to the "alt" repo.
> Having read lots of this discussion, I think there is probably an
> alternate way to achieve these goals. The basic principle of your
> proposal is to maintain a codebase without the TI code.

Yes.

What it leaves out (and what I think all proposals so far leave out) is
user testing of the without-TI code base. Personally, I think it would
be great if we could enhance our capabilities so that we could test two
different things at once and understand the results. I think that would
have all sorts of applications in comparing features, UI elements, or
tuning parameters. It seems that people who have commented on that issue
so far think it's too complicated to test multiple browser variants
simultaneously, but I do hope that even if not now, we consider doing
that at some point.

> I think we can do this today:
>
> When we believe it is ready for nightly users, merge TI into nightly.
> Maintain a separate project branch "nightly-without-ti" where TI is
> immediately "backed out" again.
>
> The TI team will be responsible for maintaining the nightly-without-ti
> branch.

I think this means merging from nightly into nightly-without-ti from
time to time. Is that it, or is there more?

> This probably means that other large JS landings which might conflict
> should be avoided during this time period.

Sounds reasonable.

> When we get to Aurora, release drivers and the TI team decide whether
> we believe the feature is ready for the aurora train. If it is, we
> keep an aurora-without-ti branch active so that we can flip TI off if
> blocking issues are found. Similar for the beta branch.

How does the flip-off work? Is it a merge from nightly-without-ti to
nightly? Which would bring in the "undo" changeset and otherwise not
really do anything, because all the other changesets in it came from
nightly in the first place? Can merge conflicts happen here, or not?

> This avoids a fair bit of release engineering headache switching users
> between release channels and dealing with mechanics issues such as
> download links, localization repos, etc, while keeping options open
> for disabling TI throughout the release cycle.
>

> The downside is that the TI team will need to deal with merge
> conflicts on the *-without-ti branches, but it seems to me that this
> is roughly the same amount of work as in your proposal, where you
> merge into the -alt branch.

I think it's the same, except in your version the TI team can merge at
various times, where in the original it kind of has to be daily. So
that's an improvement.

Overall, AFAICT, this proposal has the same top-level features as the
original, except that it's much simpler mechanically. There is at least
one disadvantage, but it is very minor or even null: the switch back
requires some kind of repo action rather than just a redirect (although
frobulating the repos may actually be easier now). So it's practically a
strict improvement.

The only real problem I see is the aforementioned lack of user testing
on the without-TI branch. But if we aren't going to send out two
versions, then I think all schemes where we allow an effective backout
have that property. (It seems like an inherent risk of the train model.
We start a nightly cycle with browser in state Z. We then add features
to get Z + A + B + C. If B has problems, we back out to get Z + A + C,
which was never tested before.)
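
A minimal sketch of the parenthetical above, treating a browser state as the set of features it contains. The labels Z, A, B and C come from the example; the helper name and the idea of recording "tested" states this way are assumptions made only for illustration.

```python
def nightly_states(base, landings):
    """Return the sequence of states that nightly users actually ran."""
    states = [frozenset(base)]
    for feature in landings:
        states.append(states[-1] | {feature})
    return states

# States seen during the cycle: Z, Z+A, Z+A+B, Z+A+B+C.
tested = nightly_states({"Z"}, ["A", "B", "C"])

# Back B out of the final state.
after_backout = tested[-1] - {"B"}

# Was this exact combination ever run by users?
was_tested = after_backout in tested
print(sorted(after_backout), "tested before:", was_tested)
```

The backed-out state Z + A + C is not among the states users ran, which is the inherent risk being described.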

Dave

David Mandelin

Aug 10, 2011, 10:17:11 PM8/10/11

to Jonas Sicking, dev-pl...@lists.mozilla.org, Brian Hackett

On 8/9/2011 4:48 PM, Jonas Sicking wrote:

> On Tue, Aug 9, 2011 at 3:48 PM, David Mandelin <dman...@mozilla.com> wrote:
>> The main work items for the proposal are:
>>
>> - Set up the 2 alt repos
>> - Set up scripts to automatically merge normal to alt repos
>> - Redirect Nightly/Aurora/Beta users according to the schedule above

> Another approach to all this is to simply go with the "normal"
> approach, keeping in mind that we have the no-go option available for
> each release.
>
> In other words, if this lands on mozilla-central right after a merge,
> that leaves 6 weeks of nightlies and 6 weeks of aurora before we reach
> beta audience. That's almost 3 months of time to fix any bugs bad
> enough to prevent a beta. After that there's another 6 weeks of beta
> before release.

Some more data: In the case of TI, it's likely we'd be able to fix the
bugs and make it work in time. I think the likeliest no-go scenario
would be large perf regressions on important stuff, which could be fixed
by pref'ing it off. In that case, we'd be left with a 1-5% regression on
benchmarks, which would be pretty painful but not a disaster.

> If we at any time after that realize it's not of high enough quality, we
> can always make a no-go decision for that train. By making just one
> no-go decision we extend the total time between landing and shipping to
> almost 6 months.

My sense is that skipping a release is an option, but that we'd really
like to avoid it and only want to take that risk to land something that
we could not live without. I have no idea if that's generally agreed
upon, though.
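
The schedule arithmetic in the quoted text can be checked with a short sketch. It assumes, as the thread states, that nightly, aurora and beta each last 6 weeks, and it further assumes that one skipped train adds exactly one more 6-week cycle; the function name is hypothetical.

```python
WEEKS_PER_STAGE = 6  # nightly, aurora and beta each last 6 weeks per the thread

def weeks_until(stage, skipped_trains=0):
    """Weeks from landing right after a merge until `stage` is reached."""
    order = ["nightly", "aurora", "beta", "release"]
    return (order.index(stage) + skipped_trains) * WEEKS_PER_STAGE

weeks_to_beta = weeks_until("beta")                             # "almost 3 months"
weeks_to_release = weeks_until("release")
weeks_with_one_skip = weeks_until("release", skipped_trains=1)  # "almost 6 months"

print(weeks_to_beta, weeks_to_release, weeks_with_one_skip)
```

Under these assumptions the three figures are 12, 18 and 24 weeks, consistent with the "almost 3 months" and "almost 6 months" estimates.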

> Granted, if we after 6 months realize that there are simply too many
> bugs, or the architecture is too wrong, we are in a pretty bad place since
> the patch at that point lives in all branches and so backing it out is
> going to introduce a lot of risk even on aurora and beta channels. But
> it seems pretty unlikely to me that that would be the case.
>

Nick said not to bet the farm on that. I'm not sure how to evaluate that
risk myself: I do think it's a low-probability scenario, but it would
also be a giant mess. We could fix it, though--it would just introduce
some annoying delay into our entire cycle.

Dave

David Mandelin

Aug 10, 2011, 10:24:32 PM8/10/11

to Mike Shaver, dev-pl...@lists.mozilla.org, Brian Hackett

On 8/10/2011 8:41 AM, Mike Shaver wrote:
> On Tue, Aug 9, 2011 at 6:48 PM, David Mandelin <dman...@mozilla.com> wrote:
>> - Backing out is also hard because it touches so many areas of the JS
>> engine. This means a standard backout would generate a big merge that
>> might conflict with changes that landed on top of it.
> One thing we discussed in the early days of the rapid-release process
> was what to do with some of these cases, with fatvals being the
> thought experiment IIRC. This was in the context of tracemonkey
> merges, but I think the model holds here.
>

> An option that was mooted was the "nuclear option" of just rolling
> back to a previous version of the JS engine.

I kind of "pre-rejected" this option in my original proposal, but you
and Ehsan have re-raised it so it should get further discussion.

> How much does the TI
> work affect the rest of the browser, and how much other line-crossing
> work are we expecting to take after TI lands?

I don't think it affects the rest of the browser much at all. Brian
would have the real answer there. What's "line-crossing work"? Does that
mean other JS changes that affect the browser? It's hard to predict what
we might want to land of that sort. I think incremental GC is pretty
close, and that's probably in that category. But we might be able to
delay certain landings (although it runs the risk of bitrot).

> If TI explodes, we could back out the JS work to the changeset before
> we landed it, and then reland only the specific security/etc fixes
> that are critical to the release. It would delay a bunch of JS work
> for 6 weeks, but that seems like a better option than delaying
> everything by skipping a ship.
>

> (Backing out that way wouldn't be trivial, since you would have
> interleaved changesets from different parts of Gecko, but just
> stomping over the files with old versions and taking that as a
> changeset would probably work without inordinate pain.)

The backout is probably not that bad--Ehsan showed how to do it in his
post. But the relanding scares me: first, relanding the critical fixes
could get tricky if there are conflicts, although I don't actually
expect anything too serious there.

More problematically, wouldn't we have to reland everything that touched
JS that landed after TI, once nightly merged over? It seems like a huge
pile of land-and-merge work, and we might lose changesets. That's the
main thing that concerned me here.

>
> We would need a squeal point that wasn't too close to the aurora or
> beta branch day, so that we had time to reintegrate the critical
> changes, but a week seems like it would suffice for that.
>

> Mike

Dave

Jonas Sicking

Aug 11, 2011, 12:23:48 AM8/11/11

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On Aug 10, 2011 7:17 PM, "David Mandelin" <dman...@mozilla.com> wrote:
>
> On 8/9/2011 4:48 PM, Jonas Sicking wrote:
> > On Tue, Aug 9, 2011 at 3:48 PM, David Mandelin <dman...@mozilla.com> wrote:

> >> The main work items for the proposal are:
> >>
> >> - Set up the 2 alt repos
> >> - Set up scripts to automatically merge normal to alt repos
> >> - Redirect Nightly/Aurora/Beta users according to the schedule above
> > Another approach to all this is to simply go with the "normal"
> > approach, keeping in mind that we have the no-go option available for
> > each release.
> >
> > In other words, if this lands on mozilla-central right after a merge,
> > that leaves 6 weeks of nightlies and 6 weeks of aurora before we reach
> > beta audience. That's almost 3 months of time to fix any bugs bad
> > enough to prevent a beta. After that there's another 6 weeks of beta
> > before release.
>
> Some more data: In the case of TI, it's likely we'd be able to fix the
> bugs and make it work in time. I think the likeliest no-go scenario
> would be large perf regressions on important stuff, which could be fixed
> by pref'ing it off. In that case, we'd be left with a 1-5% regression on
> benchmarks, which would be pretty painful but not a disaster.

A worst-case scenario of a 1-5% regression really does not sound too bad to
me. Of course, it depends on how likely such a scenario would be.

If performance is the big concern, then I would be a lot more worried about
some high-profile website becoming unusably slow. But I don't know if that is
a realistic scenario.

>
> > If we at any time after that realize it's not of high enough quality, we
> > can always make a no-go decision for that train. By making just one
> > no-go decision we extend the total time between landing and shipping to
> > almost 6 months.
>
> My sense is that skipping a release is an option, but that we'd really
> like to avoid it and only want to take that risk to land something that
> we could not live without. I have no idea if that's generally agreed
> upon, though.

Given the potential that it sounds like TI has, I would say that it's an
opportunity we could not live without.

/ Jonas

Mike Shaver

Aug 11, 2011, 9:55:32 AM8/11/11

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On Wed, Aug 10, 2011 at 10:24 PM, David Mandelin <dman...@mozilla.com> wrote:
> I don't think it affects the rest of the browser much at all. Brian
> would have the real answer there. What's "line-crossing work"? Does that
> mean other JS changes that affect the browser?

Things that need (non-trivial?) API use changes in the browser, yeah.
Ones where the new behaviour is problematic, at least (adding some
extra roots that became necessary would probably be fine to leave in
place, f.e.).

> It's hard to predict what
> we might want to land of that sort. I think incremental GC is pretty
> close, and that's probably in that category. But we might be able to
> delay certain landings (although it runs the risk of bitrot).

I have no doubt whatsoever that landing TI with our breath held will
require some other JS work to be held back until we know TI is going
to stick. Maintaining a tracemonkey-like branch in the wings that's
tracked daily à la inbound seems a reasonably low burden for the
handful of weeks we're talking about.

(And believe me, I am at LEAST as excited about incremental GC as any
other human alive.)

> More problematically, wouldn't we have to reland everything that touched
> JS that landed after TI, once nightly merged over? It seems like a huge
> pile of land-and-merge work, and we might lose changesets. That's the
> main thing that concerned me here.

I wouldn't reland everything; most of the stuff that was landed in
JS-land after TI would disappear and have to come back in after the
aurora cutover, along with TI. In my mental model that's about a
week:

- a week before aurora cutover, we decide whether to pull the cord on TI backout
- if yes
= revert js/src to pre-TI-landing
= reland only the critical (security/stability) fixes
= wait a week until aurora cutover
= re-revert js/src back to the pre-backout changeset
= merge the week's work in
= finish TI up
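
The steps above can be sketched as a small model of what ends up where. This is only an illustration under stated assumptions: js/src history is reduced to an ordered list of changeset labels, all labels are invented, and the function name is hypothetical. It says nothing about how hard the relanding is in practice.

```python
def backout_plan(post_ti, critical, week_of_work):
    """Model the revert / reland / re-revert sequence.

    post_ti: changesets landed in js/src from the TI landing onward, in order.
    critical: the subset that must still ship (security/stability fixes).
    week_of_work: changesets written while waiting for the aurora cutover.
    Returns (what ships on the train, what nightly holds after the cutover).
    """
    # Revert js/src to pre-TI-landing, then reland only the critical fixes.
    shipped = [c for c in post_ti if c in critical]
    # After the cutover: re-revert to the pre-backout state, merge the week's work.
    nightly = list(post_ti) + [c for c in week_of_work if c not in post_ti]
    return shipped, nightly

shipped, nightly = backout_plan(
    post_ti=["TI", "refactor", "sec-fix", "perf-tweak"],  # hypothetical labels
    critical={"sec-fix"},
    week_of_work=["ti-fix-1"],
)
print(shipped, nightly)
```

In this model the train carries only the relanded critical fix, while nightly gets everything back after the cutover plus the week of follow-up work.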

Mike

Brian William Hackett

Aug 11, 2011, 10:04:40 AM8/11/11

to David Mandelin, dev-pl...@lists.mozilla.org, Mike Shaver, Brian Hackett

----- Original Message -----
> I don't think it affects the rest of the browser much at all. Brian
> would have the real answer there.

TI changes outside of js/src are pretty minimal. The public API does not change at all, other than to introduce an option to enable TI, and a few new friend API functions are called in a few places in XPConnect (to ensure we can maintain precise types for properties of Window objects).

Brian

beltzner

Aug 11, 2011, 11:12:07 AM8/11/11

to Mike Shaver, dev-pl...@lists.mozilla.org, David Mandelin, Brian Hackett

> aurora cutover, along with TI. In my mental model that's about a
> week:
>
> - a week before aurora cutover, we decide whether to pull the cord on TI backout
> - if yes
> = revert js/src to pre-TI-landing
> = reland only the critical (security/stability) fixes
> = wait a week until aurora cutover
> = re-revert js/src back to the pre-backout changeset
> = merge the week's work in
> = finish TI up

I really like this model and plan, and moreover, like the idea of
feature implementation teams building timelines that have decision
checkpoints 1 or more weeks in advance of a repository cutover date
such that "crash-landings" of features are reduced.

cheers,
mike

Andrew McCreight

Aug 11, 2011, 11:55:09 AM8/11/11

to dev-pl...@lists.mozilla.org

----- Original Message -----

> I wouldn't reland everything; most of the stuff that was landed in
> JS-land after TI would disappear and have to come back in after the

> aurora cutover, along with TI. In my mental model that's about a
> week:
>
> - a week before aurora cutover, we decide whether to pull the cord on
> TI backout
> - if yes
> = revert js/src to pre-TI-landing
> = reland only the critical (security/stability) fixes
> = wait a week until aurora cutover
> = re-revert js/src back to the pre-backout changeset
> = merge the week's work in
> = finish TI up

One thing to keep in mind is that there is the occasional fix that changes both js/src and the main part of the browser in the same patch. Some cycle collector patches are like that. So in addition to relanding critical fixes, you'd have to either reland the js/src parts of those patches or back out the main browser part of them.
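
A minimal sketch of how such mixed patches could be flagged, assuming each changeset is available as a list of repository-relative file paths. The function name and the example paths are hypothetical; this is not an existing tool.

```python
JS_PREFIX = "js/src/"

def touches_js_and_browser(paths):
    """True if a patch changes files both inside and outside js/src."""
    in_js = any(p.startswith(JS_PREFIX) for p in paths)
    outside = any(not p.startswith(JS_PREFIX) for p in paths)
    return in_js and outside

# Hypothetical changesets, keyed by an invented label.
changesets = {
    "cc-fix": ["js/src/example_gc.cpp", "xpcom/base/example_cc.cpp"],
    "js-only": ["js/src/example_interp.cpp"],
    "browser-only": ["content/base/example.cpp"],
}

needs_attention = sorted(
    name for name, paths in changesets.items() if touches_js_and_browser(paths)
)
print(needs_attention)
```

Only patches in `needs_attention` would need the extra decision between relanding their js/src half and backing out their browser half.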

Andrew

>
> Mike
> _______________________________________________
> dev-planning mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-planning

Jean-Marc Desperrier

Aug 11, 2011, 1:05:45 PM8/11/11

to

Johnathan Nightingale wrote:
> I don't believe we should accept irreversible changes. I do believe
> that some changes are very expensive to reverse, though, and we
> should find ways to assess and reduce the risk of having to pay that
> cost.

Reading what you say, I feel that it's indeed very important to try to
get only small chunks in, so that it's easier both to reverse them and
to test them fully.

However, if it's too hard to cut TI into small bits, there's a viable
alternative: do just as was done initially with JIT/TraceMonkey.
Have an option so that TI is not applied to everything and you can
select what will run with TI.

In the JIT/TraceMonkey case, the separation was between content and chrome
JIT, but it could also be a whitelist of sites that opt in to TI.
This means that in case TI doesn't live up to what is expected, you can
disable it by reducing the whitelist to zero in the final version.
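
A rough sketch of the whitelist idea, assuming the decision is made per host name. The function name and the hosts are hypothetical; nothing here reflects an actual preference or API in the engine.

```python
def ti_enabled_for(host, whitelist):
    """Run TI only for hosts that have opted in (case-insensitive match)."""
    return host.lower() in whitelist

# Hypothetical opt-in list.
whitelist = {"example.com", "example.org"}

enabled_for_opt_in = ti_enabled_for("Example.com", whitelist)
enabled_for_other = ti_enabled_for("other.example", whitelist)

# Shipping with an empty whitelist turns TI off everywhere.
enabled_when_empty = ti_enabled_for("example.com", frozenset())
print(enabled_for_opt_in, enabled_for_other, enabled_when_empty)
```

Reducing the whitelist to the empty set is the "disable in the final version" case described above.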

Mike Shaver

Aug 11, 2011, 1:18:32 PM8/11/11

to Jean-Marc Desperrier, dev-pl...@lists.mozilla.org

On Thu, Aug 11, 2011 at 1:05 PM, Jean-Marc Desperrier <jmd...@gmail.com> wrote:
> However if it's too hard to cut TI into small bits there's a viable
> alternative : Do just like was done initially with JIT/TraceMonkey.
> Have an option so that TI is not applied to everything and you can select
> what will run with TI.

That's already in place. The issue is the compile-time changes (like
structure layout or relationships) that introduce risk, not the actual
inference engine, which can be pref controlled.

Mike

Benjamin Smedberg

Aug 11, 2011, 1:51:16 PM8/11/11

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett

On 8/10/2011 10:11 PM, David Mandelin wrote:
> Yes.
>
> What it leaves out (and what I think all proposals so far leave out) is
> user testing of the without-TI code base. Personally, I think it would
> be great if we could enhance our capabilities so that we could test two
> different things at once and understand the results. I think that would
> have all sorts of applications in comparing features, UI elements, or
> tuning parameters. It seems that people who have commented on that issue
> so far think it's too complicated to test multiple browser variants
> simultaneously, but I do hope that even if not now, we consider doing
> that at some point.

Yes. There are cost/benefit tradeoffs here, and at least now I think
that we're not anywhere near the point where we can effectively split
our audiences.

>
>> I think we can do this today:
>>
>> When we believe it is ready for nightly users, merge TI into nightly.
>> Maintain a separate project branch "nightly-without-ti" where TI is
>> immediately "backed out" again.
>>
>> The TI team will be responsible for maintaining the nightly-without-ti
>> branch.
> I think this means merging from nightly into nightly-without-ti from
> time to time. Is that it, or is there more?

Yes, and treat nightly-without-ti as a project branch to make sure that
it gets full automated testing.

--BDS

Ehsan Akhgari

Aug 11, 2011, 5:19:29 PM8/11/11

to David Mandelin, dev-pl...@lists.mozilla.org, Benjamin Smedberg, Brian Hackett

On 11-08-10 10:11 PM, David Mandelin wrote:
> What it leaves out (and what I think all proposals so far leave out) is
> user testing of the without-TI code base. Personally, I think it would
> be great if we could enhance our capabilities so that we could test two
> different things at once and understand the results. I think that would
> have all sorts of applications in comparing features, UI elements, or
> tuning parameters. It seems that people who have commented on that issue
> so far think it's too complicated to test multiple browser variants
> simultaneously, but I do hope that even if not now, we consider doing
> that at some point.

Does it make sense for us to get TI builds hosted somewhere and blog
about them and make some noise and hope to get feedback from the
volunteer testers before we land TI on trunk? I know that the UX and
DevTools teams are doing that right now, and I think the Graphics team
has tried this before. But I don't know if it's the right thing to do
in case of TI...

Ehsan

Mark Banner

Aug 11, 2011, 6:55:50 PM8/11/11

to

I'd be happy to point the Thunderbird try server at an appropriate repo
and run it through the Thunderbird unit tests (which have tended to find
JS issues in the past).

Standard8

David Mandelin

Aug 11, 2011, 7:02:37 PM8/11/11

to dev-pl...@lists.mozilla.org, Ehsan Akhgari, Christian Legnitto, Brian Hackett

On 8/11/2011 2:19 PM, Ehsan Akhgari wrote:
> On 11-08-10 10:11 PM, David Mandelin wrote:
>> What it leaves out (and what I think all proposals so far leave out) is
>> user testing of the without-TI code base. Personally, I think it would
>> be great if we could enhance our capabilities so that we could test two
>> different things at once and understand the results. I think that would
>> have all sorts of applications in comparing features, UI elements, or
>> tuning parameters. It seems that people who have commented on that issue
>> so far think it's too complicated to test multiple browser variants
>> simultaneously, but I do hope that even if not now, we consider doing
>> that at some point.
>
> Does it make sense for us to get TI builds hosted somewhere and blog
> about them and make some noise and hope to get feedback from the
> volunteer testers before we land TI on trunk? I know that the UX and
> DevTools teams are doing that right now, and I think the Graphics team
> has tried this before. But I don't know if it's the right thing to do
> in case of TI...

It kind of makes sense, but I think the volume of testing we would get
from that is very low.

Anyway, I think we have 3 live proposals at this point (#3 being a
better version of the original):

1. Land to nightly, skip a 6-week release if needed. (jonas)

+ very simple
- may have to skip a release
- may have to take regressions we ordinarily wouldn't

2. Land to nightly. If we decide no-go, roll back JS changes, undo any
Gecko changes that go along with them, reland critical JS changes. (shaver)

- may be a lot of work to reland critical fixes (this problem gets
worse as we go through Aurora, Beta, etc.) and un-undo other work

3. Land to nightly. Maintain without-ti branch on the side with
Tinderbox coverage. (bsmedberg)

+ automated testing of without-ti version
- may be a lot of work to maintain without-ti branch

I still prefer #3, because it's the one that seems to have the least
downside risk (in exchange for more work spread out over the landing
period). But all 3 seem viable, and all 3 seem to have some level of
support. Now, how do we make a decision via email thread?

Dave

Mike Shaver

Aug 12, 2011, 1:38:49 PM8/12/11

to David Mandelin, dev-pl...@lists.mozilla.org, Brian Hackett, Ehsan Akhgari, Christian Legnitto

On Thu, Aug 11, 2011 at 7:02 PM, David Mandelin <dman...@mozilla.com> wrote:
> Now, how do we make a decision via email thread?

You're the module owner, and you're on the hook for the results, so
you make the call. Just let people know what it is; you've certainly
gone out of your way to get feedback and advice.

Mike

David Mandelin

Aug 12, 2011, 9:32:44 PM8/12/11

to dev-pl...@lists.mozilla.org, Brian Hackett

OK. Brian and I have decided to go with #3: land to mozilla-central, and
Brian will maintain a repo mozilla-central-alternate (or similar name)
without type inference as a backup plan.

A couple of predictions, to test myself:

- merge work will be relatively light, with few conflicts: 1-2 hours of
work per week if done manually, less if automated.
- TI will stick and we'll ship it in Fx9.

We should do a debrief afterward to see how things went and provide
experience for future big landings.

Dave