
status of performance regressions on mozilla-central


L. David Baron

Sep 27, 2008, 1:42:55 AM
to dev-pl...@lists.mozilla.org
So yesterday morning there were substantial Tp regressions on all
platforms (barely noticeable on Linux, but significant elsewhere)
coinciding with a landing of sdwilsh's (bug 456910), and an even
more substantial Txul regression on Windows Vista only. (Talos
tests on trunk are running on Linux, Mac OS X 10.4, Mac OS X 10.5,
Windows XP, and Windows Vista.) This went unnoticed yesterday.

Later yesterday, sdwilsh landed the sqlite upgrade on top of that
patch.

This morning sdwilsh test-landed 4 changesets to reduce places fsync
calls, and backed them out due to a performance increase (I'm not
sure for what test).

Around the time of that first backout, I pointed out yesterday's
regression, and sdwilsh then backed out bug 456910, bug 417037, and
the sqlite upgrade.

(As an extra potentially-confounding factor, the first backout
created a problem where extra libraries present in the components
directory from the backed-out landing would cause leaks by their
presence. sdwilsh landed a patch to clobber dist/bin, but the patch
was wrong the first time around (removing dist/dist/bin instead),
and broke Windows the second time around, so the third time I
changed it to just remove dist/bin/components/ to fix the tree.
However, this could have clobbered *other* libraries that have been
hanging around for arbitrarily long periods of time.)


Now that the performance numbers for this are all in, I've observed
the following, assuming that the changeset printed by talos is
accurate and I've kept track of everything I was looking at in the
tinderbox display correctly:

* sdwilsh's backout of 456910 and the sqlite changes fixed the
performance regression on Mac OS X 10.4 and 10.5 in the one build
generated right after he landed the backout, and then in the next
cycle (which only added a trivial reftest.list change), the
performance numbers were *even better*, more than erasing the
increases from yesterday.

* on Windows XP, the builds following the landing of the places
fsync work got us back the 20ms that we lost yesterday, but when
that was backed out the needle didn't move. Then when sdwilsh
backed out the sqlite upgrade and 456910, we got 20ms back again,
so again, we're a good bit ahead of where we were yesterday
morning. In other words, we're also ahead of where we started
yesterday morning, but the steps happened at different times.

* on Vista, however, the news is not so good. On Vista, there were
significant regressions yesterday not only in Tp (Tp3, to the
graph server) -- around 10-15%, but also in Txul -- around 35%
regression. Today, on Vista, the needle didn't move. Not at
all. We didn't get back anything from any of the things that
affected the other platforms.

At this point, I'm not sure how to proceed, or which numbers to
trust. (I'm *highly* suspicious of the changeset IDs printed by
talos on the tinderbox waterfall.) The tree is currently in pretty
bad shape, but listed as open for when it recovers. I'm not sure
whether it should be, given the Vista regression.

(This is the first time in a while that I found sheriffing to be a
full-time job. Fridays usually aren't so bad.)

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

Frank Wein

Sep 27, 2008, 6:30:18 AM
L. David Baron wrote:
[...]

> Now that the performance numbers for this are all in, I've observed
> the following, assuming that the changeset printed by talos is
> accurate and I've kept track of everything I was looking at in the
> tinderbox display correctly:
[...]

> * on Vista, however, the news is not so good. On Vista, there were
> significant regressions yesterday not only in Tp (Tp3, to the
> graph server) -- around 10-15%, but also in Txul -- around 35%
> regression. Today, on Vista, the needle didn't move. Not at
> all. We didn't get back anything from any of the things that
> affected the other platforms.
[...]

What about the other check-ins in that timeframe? I'm looking for
example at WINNT 6.0 talos mozilla-central nochrome qm-pvista-trunk04
now. The build with the lower (tp: 395.55) tp was id:20080925110153
rev:7b75a1f34434, the build following this id:20080925124346
rev:92cc665e9b84 had a higher tp (tp: 480.02). There was a security bug
check-in, Bug 456896, how much impact could this bug have on Tp (I do
not know that much about layout perf...)?
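For scale, the two Vista Tp readings above can be expressed as a relative change. Here is a minimal Python sketch, assuming lower Tp is better because Tp is a page-load time. `percent_change` is a hypothetical helper and not part of Talos:

```python
# Hypothetical helper (not part of Talos): relative change between two
# Tp readings, in percent. Positive means the later build was slower.
def percent_change(before: float, after: float) -> float:
    return (after - before) / before * 100.0

# Tp values quoted above for qm-pvista-trunk04:
# rev 7b75a1f34434 (tp: 395.55) -> rev 92cc665e9b84 (tp: 480.02)
vista_tp_change = percent_change(395.55, 480.02)
print(f"{vista_tp_change:.1f}%")  # 21.4%
```

By this arithmetic, the jump between those two builds is about 21%, compared with the 10-15% Vista Tp regression estimated in the first message.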

Frank

Frank Wein

Sep 27, 2008, 8:05:26 AM
Frank Wein wrote:
[...]
> What about the other check-ins in that timeframe? I'm looking for
> example at WINNT 6.0 talos mozilla-central nochrome qm-pvista-trunk04
> now. The build with the lower (tp: 395.55) tp was id:20080925110153
> rev:7b75a1f34434, the build following this id:20080925124346
> rev:92cc665e9b84 had a higher tp (tp: 480.02). There was a security bug
> check-in, Bug 456896, how much impact could this bug have on Tp (I do
> not know that much about layout perf...)?
>
> Frank

Actually, it looks like this only has an effect when an element gets
focused, so it's not really something that would affect Tp.

Frank

Shawn Wilsher

Sep 27, 2008, 10:24:29 AM
to L. David Baron
On 9/26/08 10:42 PM, L. David Baron wrote:
> This morning sdwilsh test-landed 4 changesets to reduce places fsync
> calls, and backed them out due to a performance increase (I'm not
> sure for what test).
One unit test appeared to have hung on Windows (and only on Windows).
We aren't really sure why, since it doesn't seem to do that locally.

/sdwilsh

L. David Baron

Sep 27, 2008, 11:58:54 AM
to dev-pl...@lists.mozilla.org
On Friday 2008-09-26 22:42 -0700, L. David Baron wrote:
> * on Vista, however, the news is not so good. On Vista, there were
> significant regressions yesterday not only in Tp (Tp3, to the
> graph server) -- around 10-15%, but also in Txul -- around 35%
> regression. Today, on Vista, the needle didn't move. Not at
> all. We didn't get back anything from any of the things that
> affected the other platforms.

However, given the overall graph of performance on the Windows Vista
machines over the past few months, I'm not sure that the numbers are
reliable enough to be confident there was a real regression. The
machines have gone through a number of unexplained state changes,
and also seem to go through gradual continuous increase, which
suggests that we might (still?) not be clobbering the profile for
each test run:
http://graphs.mozilla.org/graph.html#show=787087,787109,787113,1431842
http://graphs.mozilla.org/graph.html#show=787095,787118,787129,1431848

Does anybody know why these state changes occur?

(The first of these graphs isn't quite as scary as the Linux graph,
though:
http://graphs.mozilla.org/graph.html#show=395125,395135,395166,911694,1431032

Karl Tomlinson

Sep 28, 2008, 5:36:51 PM
L. David Baron writes:

> * sdwilsh's backout of 456910 and the sqlite changes fixed the
> performance regression on Mac OS X 10.4 and 10.5 in the one build
> generated right after he landed the backout, and then in the next
> cycle (which only added a trivial reftest.list change), the
> performance numbers were *even better*, more than erasing the
> increases from yesterday.
>
> * on Windows XP, the builds following the landing of the places
> fsync work got us back the 20ms that we lost yesterday, but when
> that was backed out the needle didn't move. Then when sdwilsh
> backed out the sqlite upgrade and 456910, we got 20ms back again,
> so again, we're a good bit ahead of where we were yesterday
> morning. In other words, we're also ahead of where we started
> yesterday morning, but the steps happened at different times.

I think being "even better" and "a good bit ahead" are due to
changes in the testing infrastructure:

https://bugzilla.mozilla.org/show_bug.cgi?id=457582

(I haven't looked into Vista/Txul, but I suspect that is another issue.)
