Talos Regression: Ts, MED Dirty Profile increase 11.40% on Leopard Firefox

Nobody

unread,

Nov 10, 2009, 2:51:02 AM11/10/09

to dev-tree-...@lists.mozilla.org

Regression: Ts, MED Dirty Profile increase 11.40% on Leopard Firefox

Previous results: 972.632 from build 20091109170344 of revision 86a93c8c6a87 at 2009-11-09 17:35:00 on talos-rev2-leopard02 run # 0

New results: 1083.53 from build 20091109170934 of revision 53689357009f at 2009-11-09 19:01:00 on talos-rev2-leopard05 run # 0

Suspected checkin range: from revision 86a93c8c6a87 to revision 53689357009f

Johnathan Nightingale

unread,

Nov 10, 2009, 9:30:31 AM11/10/09

to Robert O'Callahan, Jonas Sicking, dev-tree-...@lists.mozilla.org

So, this window has one Jonas check-in and a bunch of Roc.

It is a ~100ms regression on Ts, but there is a ~100ms Ts Shutdown win right alongside it, which suggests to me that something was just moved to the other side of the goalposts?

For those following along at home, my understanding of our Ts/shutdown flow is:

[start] <--- Ts ---> [fire load event][shut down] <-- Ts shutdown -->[browser exits]

So given the numbers involved here, it looks to me like something was just moved earlier in the sequence, so that it's getting scored against one metric instead of the other. Roc, Jonas, does that seem likely?

Aside: this is a bad analysis situation to be in. It means we don't respond to regressions as quickly as we should, because we have to wait to see if it's "real" or just moving the goal posts. Any clever ideas on fixing this? I suspect the "real" answer is to make Ts more accurately reflect total startup time - my guess is that some things we would intuitively count as startup are happening after that load event fires, and that if Ts measured the actual end of startup churn, there wouldn't be this problem. But I know it's been discussed before, so I'd welcome any insights that have been generated there.

J

_______________________________________________
dev-tree-management mailing list
dev-tree-...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tree-management

---

Johnathan Nightingale

Human Shield

joh...@mozilla.com

Boris Zbarsky

unread,

Nov 10, 2009, 9:41:33 AM11/10/09

to

On 11/10/09 9:30 AM, Johnathan Nightingale wrote:
> I suspect the "real" answer is to make Ts more accurately
> reflect total startup time - my guess is that some things we would
> intuitively count as startup are happening after that load event fires,

Quite on purpose, in fact; see the delayedStartup function in
browser.js. I've never quite understood what the purpose of this was,
other than to effectively game Ts.

Also note that some things are started with a several-second lag after
delayedStartup. This makes a lot more sense: lets us get the window up
there so the user can interact with it and then start things that aren't
immediately needed as we go. But these can run into Ts shutdown,
depending on how that's measured and how fast the machine is in
comparison to the timeouts set here.

-Boris

Mike Beltzner

unread,

Nov 10, 2009, 9:57:26 AM11/10/09

to Boris Zbarsky, dev-tree-...@lists.mozilla.org

On 2009-11-10, at 9:41 AM, Boris Zbarsky wrote:

> On 11/10/09 9:30 AM, Johnathan Nightingale wrote:
>> I suspect the "real" answer is to make Ts more accurately
>> reflect total startup time - my guess is that some things we would
>> intuitively count as startup are happening after that load event
>> fires,
>
> Quite on purpose, in fact; see the delayedStartup function in
> browser.js. I've never quite understood what the purpose of this
> was, other than to effectively game Ts.

I feel like we've been here before, but it really depends on what you
want Ts to measure.

If you want to measure the full time it takes for the browser to
initialize all of its services and rest in an idle state, then you
should include the items included in delayedStartup, etc. This is a
valuable measurement, as it will encourage us to optimize and
streamline those functions and has an obvious effect on browser and
system performance.

If you want to measure the time that a human perceives as "between
clicking on the icon and being able to browse the web" then you want
to stop measuring at the point where the UI is rendered, available and
fully responsive to user input. That's *before* delayedStartup, as I
understand it.

At the risk of being lynched, I think we should be measuring both
numbers. :)

> Also note that some things are started with a several-second lag
> after delayedStartup. This makes a lot more sense: lets us get the
> window up there so the user can interact with it and then start
> things that aren't immediately needed as we go. But these can run
> into Ts shutdown, depending on how that's measured and how fast the
> machine is in comparison to the timeouts set here.

Good point.

cheers,
mike

Boris Zbarsky

unread,

Nov 10, 2009, 10:04:38 AM11/10/09

to

On 11/10/09 9:57 AM, Mike Beltzner wrote:
> If you want to measure the time that a human perceives as "between
> clicking on the icon and being able to browse the web" then you want to
> stop measuring at the point where the UI is rendered, available and
> fully responsive to user input. That's *before* delayedStartup, as I
> understand it.

Is it also before BrowserStartup()? This last ends with:

1060 setTimeout(delayedStartup, 0, isLoadingBlank, mustLoadSidebar);

and I can assure you that if the browser is not fully responsive to user
input before BrowserStartup is entered it will complete delayedStartup
before becoming so, unless the user is superhuman and can get their
input in that 10ms window.

BrowserStartup() is the load event handler for the window in browser.xul.

If I remember the window opening sequence correctly (and there's no
guarantee of that by any means), there's no way a user can be
interacting with anything there before the load event is fired.

> At the risk of being lynched, I think we should be measuring both
> numbers. :)

Agreed.

-Boris

Jonas Sicking

unread,

Nov 10, 2009, 10:51:44 AM11/10/09

to Johnathan Nightingale, Robert O'Callahan, dev-tree-...@lists.mozilla.org

My patch should only affect content that sets the 'async' attribute on
<script> elements. This is a new feature from HTML5 so there really
shouldn't be a lot of existing content out there that's affected. And
definitely not during Ts/Ts shutdown as none of our internal <script>s
have an async attribute set.

On the matter of how to deal with this type of thing more generally:
It's hard to say until we know specifically what happened here. One
thing that could help is to have a delay between the startup and
shutdown to let everything stabalize. I did something similar for Tp
where we force a couple of cycle collections between each pageload in
order to prevent cycle collection to count towards pageload time (IIRC
this also made the numbers more stable, since the cycle collection
would happen randomly). During a normal page-load we avoid cycle
collecting, however since the test harness loaded so many pages back
to back the cycle collector eventually forced a cycle collection even
during page load.

Simply counting post-startup-churn towards startup doesn't seem right
either since I thought we wanted to encourage getting something in the
users face as soon as possible. At least if the UI is somewhat
responsive during the churn.

/ Jonas

On Tue, Nov 10, 2009 at 6:30 AM, Johnathan Nightingale
<joh...@mozilla.com> wrote:
> So, this window has one Jonas check-in and a bunch of Roc.
> It is a ~100ms regression on Ts, but there is a ~100ms Ts Shutdown win right
> alongside it, which suggests to me that something was just moved to the
> other side of the goalposts?
> For those following along at home, my understanding of our Ts/shutdown flow
> is:
> [start] <--- Ts ---> [fire load event][shut down] <-- Ts shutdown
> -->[browser exits]
> So given the numbers involved here, it looks to me like something was just
> moved earlier in the sequence, so that it's getting scored against one
> metric instead of the other. Roc, Jonas, does that seem likely?
> Aside: this is a bad analysis situation to be in. It means we don't respond
> to regressions as quickly as we should, because we have to wait to see if
> it's "real" or just moving the goal posts. Any clever ideas on fixing this?

> I suspect the "real" answer is to make Ts more accurately reflect total
> startup time - my guess is that some things we would intuitively count as

Robert O'Callahan

unread,

Nov 10, 2009, 1:13:04 PM11/10/09

to

On 11/11/09 3:30 AM, Johnathan Nightingale wrote:
> It is a ~100ms regression on Ts, but there is a ~100ms Ts Shutdown win
> right alongside it, which suggests to me that something was just moved
> to the other side of the goalposts?

This looks very much like bug 519893 biting us again: Mac machines
(Leopard only?) with no screen attached periodically pause for 100ms
when you draw. Any change in when drawing happens can move that 100ms
from one place to another in our performance tests.

> Aside: this is a bad analysis situation to be in. It means we don't
> respond to regressions as quickly as we should, because we have to wait
> to see if it's "real" or just moving the goal posts. Any clever ideas on
> fixing this?

In this case the fix is fairly straightforward, we just have to attach
dongles to all our performance test machines to make them think there's
a screen attached. That's bug 519893.

Rob

Robert O'Callahan

unread,

Nov 10, 2009, 1:24:09 PM11/10/09

to

On 11/11/09 7:13 AM, Robert O'Callahan wrote:
> This looks very much like bug 519893 biting us again: Mac machines
> (Leopard only?) with no screen attached periodically pause for 100ms
> when you draw. Any change in when drawing happens can move that 100ms
> from one place to another in our performance tests.

And by the way, let me again thank Markus Stange for figuring this out.
He was in the office with us during the process and it was a heroic effort.

Rob