
10/12 10% argo Ts regression?


Vladimir Vukicevic

Oct 14, 2006, 4:02:57 AM

While looking at
http://graphs.mozilla.org/#spst=range&spss=1160608348.96&spse=1160810940&spstart=1158278552&spend=1160810940&bpst=cursor&bpstart=1160608348.96&bpend=1160810940&m1tid=27&m1bl=0&m1avg=0
I noticed that there was a 10% startup regression on 10/12 -- the
tinderbox checkins in that window can be seen at
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox&hours=24&maxdate=1160682979&legend=0
Any idea why this is happening? The commits at 10:04:52 are the only
ones that seem to have landed in between, and I can't see how either
would have spiked Ts time.

- Vlad

Gavin Sharp

Oct 14, 2006, 5:26:10 PM
to Vladimir Vukicevic, Martijn
Vladimir Vukicevic wrote:
> Any idea why this is happening?

Based on Argo's raw data log, the checkin range for the regression that
I get is:
http://bonsai.mozilla.org/cvsquery.cgi?date=explicit&mindate=2006%3A10%3A12%3A10%3A04%3A38&maxdate=2006%3A10%3A12%3A10%3A46%3A12

which includes the patch for bug 248612
<https://bugzilla.mozilla.org/show_bug.cgi?id=248612>. That change
reminds me a little of bug 337516, which showed that tweaking that code
had a fairly significant impact on Txul. I think it's unlikely that it
would affect startup that much, but it probably wouldn't be a bad idea
to back it out and see, especially considering that the other checkins
in that range are pretty unlikely to have affected Firefox load time.

This all assumes that that regression range is correct, and that no
external factors affected argo, of course.

Gavin

Martijn

Oct 14, 2006, 6:29:31 PM
to Gavin Sharp, Vladimir Vukicevic, dev-per...@lists.mozilla.org
Ok, I backed out the patch from
https://bugzilla.mozilla.org/show_bug.cgi?id=248612 so let's see if
that fixes the Ts regression.

Regards,
Martijn


On 10/14/06, Martijn <martijn...@gmail.com> wrote:

> Ah, I guess the patch for bug 248612 could have caused this, indeed,
> by removing the timer code.
> I'm wondering if this is a case that falls under the question that
> dbaron asked at
> https://bugzilla.mozilla.org/show_bug.cgi?id=337516#c11 :
> "
> So is this a regression that actually affects when the new window is usable for
> the end user, or is it only a regression in what we're measuring because the
> point that we're measuring moved within the process of getting a new window
> ready? If it's not a regression that actually affects user-perceived
> performance, why should we care about fixing it? The performance tests are
> simply a tool for us to help measure what we care about; they're not actually
> what we care about.
> "
>
> Regards,
> Martijn
>
> > Gavin
> >
>

Adam Guthrie

Oct 14, 2006, 7:31:10 PM

I'm glad you posted this. I saw the regression and since it was rather
odd (only happening on argo, and Tp dropped while Ts went up) I figured
that it was a problem with the tinderbox itself.

It's really weird that it brought down Tp and made Ts go up...

On a related note, it looks like Vlad's patch for bug 356235 caused a Tp
hit on Linux. I hope there's a bug filed for that. If not, I'll probably
file one.

-Adam

Vladimir Vukicevic

Oct 15, 2006, 3:40:34 AM

On Oct 14, 4:31 pm, Adam Guthrie <ispi...@gmail.com> wrote:
> Vladimir Vukicevic wrote:
> > While looking at

> >http://graphs.mozilla.org/#spst=range&spss=1160608348.96&spse=1160810...
> > I noticed that there was a 10% startup regression on 10/12
>
> I'm glad you posted this. I saw the regression and since it was rather
> odd (only happening on argo, and Tp dropped while Ts went up) I figured
> that it was a problem with the tinderbox itself.
>
> It's really weird that it brought down Tp and made Ts go up...
>
> On a related note, it looks like Vlad's patch for bug 356235 caused a Tp
> hit on Linux. I hope there's a bug filed for that. If not, I'll probably
> file one.

Looking at it is on my todo list for Monday; it was very much not what
I expected, especially given the drops on both Windows and Mac -- my
patch should have significantly reduced the amount of duplicate work in
text rendering, so there may be some issue with the Pango code that
causes badness if a text run is reused.

However, regarding argo itself, I think the right bug to fix is 356522
-- just turn tests off on argo. The linux test machine is showing
"expected" results, that is no Ts hit or Tp improvement (even more
bizarre) from Martijn's patch, but it still shows the Tp hit from my
patch clearly (which is sensible, since my patch is directly in Tp's
path).

Realistically, I don't think anyone's going to investigate which
gremlins are eating argo -- so in this particular case I think we
should sweep argo's tests under the rug, and that Martijn should
probably re-land his patch (and cite no Ts issues on any other
tinderbox).

- Vlad

Adam Guthrie

Oct 15, 2006, 10:50:39 PM
Vladimir Vukicevic wrote:
> However, regarding argo itself, I think the right bug to fix is 356522
> -- just turn tests off on argo. The linux test machine is showing
> "expected" results, that is no Ts hit or Tp improvement (even more
> bizarre) from Martijn's patch, but it still shows the Tp hit from my
> patch clearly (which is sensible, since my patch is directly in Tp's
> path).

I don't know... I just don't see how we can write this off so easily. We
have solid proof that Martijn's patch is responsible for the Ts
regression and Tp gain.

Argo has always given very sensitive numbers, and on most other
occasions it's matched what the other tinderboxes say; e.g. when Ts
regressed after the new theme landing.

I'm not an expert in the subject, but as far as I know Martijn touched a
function that is called each time we load a new page --
onLocationChange. And as Gavin mentioned, that exact area of the code
(the setTimeout call Martijn removed) is quite sensitive (bug 337516).

As for the Ts hit, I am really not sure how to explain that. Martijn
added two <field>s near the onDragOver function and removed one near the
top of the file. Would where those were placed have any effect on how
long it took to add them?

I really think this could mean that argo is the only tinderbox that
can provide reliable and accurate Ts numbers (as Gavin mentioned on
IRC).

I just realized this, too: The Tp drop was around 30ms on argo, but it
looks really big because the range of Tp numbers is only ~5ms. Now
take bl-bldlnx01. Its Tp numbers range over ~30ms, so even if there
were a Tp gain of 30ms, it wouldn't show, because the range is so
big.

-Adam

Vladimir Vukicevic

Oct 16, 2006, 1:34:51 AM

On Oct 15, 7:50 pm, Adam Guthrie <ispi...@gmail.com> wrote:
> Vladimir Vukicevic wrote:
> > However, regarding argo itself, I think the right bug to fix is 356522
>
> > -- just turn tests off on argo. The linux test machine is showing
> > "expected" results, that is no Ts hit or Tp improvement (even more
> > bizarre) from Martijn's patch, but it still shows the Tp hit from my
> > patch clearly (which is sensible, since my patch is directly in Tp's

> > path).
>
> I don't know... I just don't see how we can write this off so easily. We
> have solid proof that Martijn's patch is responsible for the Ts
> regression and Tp gain.
>
> Argo has always given very sensitive numbers, and on most other
> occasions it's matched what the other tinderboxes say; e.g. when Ts
> regressed after the new theme landing.
>
> I'm not an expert in the subject, but as far as I know Martijn touched a
> function that is called each time we load a new page --
> onLocationChange. And as Gavin mentioned, that exact area of the code
> (the setTimeout call Martijn removed) is quite sensitive (bug 337516).
>
> As for the Ts hit, I am really not sure how to explain that. Martijn
> added two <field>s near the onDragOver function and removed one near the
> top of the file. Would where those were placed have any effect on how
> long it took to add them?

Sure, except that it should have affected all the boxes running Ts
equally -- that part wasn't in platform-specific code, as far as I can
tell, and the xul/xbl parsing code is the same everywhere.

> I really think this could mean that argo is the only tinderbox that
> can provide reliable and accurate Ts numbers (as Gavin mentioned on
> IRC).
>
> I just realized this, too: The Tp drop was around 30ms on argo, but it
> looks really big because the range of Tp numbers is only ~5ms. Now
> take bl-bldlnx01. Its Tp numbers range over ~30ms, so even if there
> were a Tp gain of 30ms, it wouldn't show, because the range is so
> big.

No, you can't look at it that way; the test machine is significantly
slower (CPU and everything-wise) than argo itself. The actual value
itself (30ms/5ms/etc.) is meaningless; only the percentage change is
significant. argo Ts went from 920-ish -> 1010-ish, which is around
9.7%. The test machine should have gone from 2190 or so to around
2400, but it didn't. argo Tp went from 229 -> 199 (with Martijn's
patch); the test machine should have done something like 750 -> 651.
Ts on both machines has a similar amount of jitter as well, so both
machines seem to be equally precise.
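Spelling out that comparison (the figures below are the approximate ones quoted above; the `pct_change` helper is just for illustration):

```python
def pct_change(before, after):
    """Relative change between two test runs, as a percentage."""
    return (after - before) / before * 100

# argo Ts: ~920 ms -> ~1010 ms, roughly a 10% regression.
argo_ts = pct_change(920, 1010)

# If the regression were real, the slower Linux test machine
# (Ts ~2190 ms) should have moved by the same relative amount,
# landing around 2400 ms -- but it didn't move.
expected_linux_ts = 2190 * (1 + argo_ts / 100)

# argo Tp: 229 -> 199, roughly a 13% improvement; the test machine
# (Tp ~750 ms) should then have dropped to roughly 651 ms.
argo_tp = pct_change(229, 199)
expected_linux_tp = 750 * (1 + argo_tp / 100)
```

The point is that only the relative change is comparable across machines; the absolute millisecond deltas are meaningless when the hardware differs this much.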

However, as I keep saying, Tp itself sucks -- there is a nonzero
component that is highly hardware dependent and almost completely out
of our control: the low level rendering. I'd like to create a pageload
test that tests just the parts that are 100% CPU bound and never touch
the gfx hardware at all, but I just don't have time. One way to do
this would be to take my recent gfxTextRun cache work, and allow you to
prepopulate the cache with the full set of text needed for a run --
this will reduce text measurement time to 0, but would still result in
accurate rendering. Then, you skip the actual painting step --
initially we can just skip actually painting anything from an expose
handler, but that wouldn't measure display list system overhead; so we
could create a null cairo surface that just ignores all operations.
Doing those things would give us a dead-accurate Tp number of just our
own code.

- vlad

Martijn

Oct 16, 2006, 6:04:25 AM
to Vladimir Vukicevic, dev-per...@lists.mozilla.org
On 15 Oct 2006 00:40:34 -0700, Vladimir Vukicevic <vlad...@gmail.com> wrote:
> should sweep argo's tests under the rug, and that Martijn should
> probably re-land his patch (and cite no Ts issues on any other
> tinderbox).

Just to be sure, I'm asking again: I can re-land the patch, right?
(knowing that it regresses Ts on argo)
Or does anybody else object?

Regards,
Martijn

> - Vlad
>
> _______________________________________________
> dev-performance mailing list
> dev-per...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-performance
>

Boris Zbarsky

Oct 16, 2006, 9:25:42 AM
Vladimir Vukicevic wrote:
> However, as I keep saying, Tp itself sucks -- there is a nonzero
> component that is highly hardware dependent and almost completely out
> of our control: the low level rendering.

In my opinion this means we should be running the test on a variety of "typical
user hardware" (at least on low-end, medium, and high-end graphics hardware).

> Doing those things would give us a dead-accurate Tp number of just our
> own code.

This would be an interesting number for academic purposes, but we do care about
things like our code calling cairo stuff that's slow, or calling cairo in a dumb
way (not precisely slowness "in our own code", but our fault anyway).
Similarly for our code doing silly things with images (anyone recall the
DIB vs DDB thing?).

In the end, we only care about the numbers insofar as they reflect what the user
sees and _maybe_ what various "performance test suites" claim about browser
comparisons. In both cases, actual hardware happens.

-Boris

Chris Hofmann

Oct 16, 2006, 11:21:50 AM
to Boris Zbarsky, dev-per...@lists.mozilla.org

When a lot of focus on performance work was getting started back in
2000-2001 page loading performance benchmarks showed the mozilla suite
betting IE on low end hardware, and IE would beat the suite on mid to
upper end hardware... IIRC, most of the suite's advantage was gained
through optimization in cache and some layout tuning decisions. I
don't think these kinds of configuration differences are completely out
of our control, and they do matter to specific sets of users.

chris h.

Boris Zbarsky

Oct 16, 2006, 12:36:37 PM
Chris Hofmann wrote:
> When a lot of focus on performance work was getting started back in
> 2000-2001 page loading performance benchmarks showed the mozilla suite
> beating IE on low end hardware, and IE would beat the suite on mid to
> upper end hardware...

Which reminds me. There are several responsiveness-vs-performance frobs
in our layout engine that use wall-clock time and were tuned back in
2000 or so on then-current hardware. We should really revisit them one
of these days...

-Boris

Vladimir Vukicevic

Oct 16, 2006, 3:13:24 PM
Boris Zbarsky wrote:
> Vladimir Vukicevic wrote:
>> However, as I keep saying, Tp itself sucks -- there is a nonzero
>> component that is highly hardware dependent and almost completely out
>> of our control: the low level rendering.
>
> In my opinion this means we should be running the test on a variety of
> "typical user hardware" (at least on low-end, medium, and high-end
> graphics hardware).

I absolutely agree, and I wasn't arguing against that.

>> Doing those things would give us a dead-accurate Tp number of just our
>> own code.
>
> This would be an interesting number for academic purposes, but we do
> care about things like our code calling cairo stuff that's slow, or
> calling cairo in a dumb way (not precisely slowness "in our own code"),
> but our fault anyway. Similar for our code doing silly things with
> images (anyone recall the DIB vs DDB thing?).
>
> In the end, we only care about the numbers insofar as they reflect what
> the user sees and _maybe_ what various "performance test suites" claim
> about browser comparisons. In both cases, actual hardware happens.

Yes, but it would be much more useful for us to have a number that is
"ideal-world" Tp, to compare to real-world Tp. We could (and should)
run the test on the same hardware even, one measuring ideal (no
painting) and one measuring real. That's really the information that I
want, and would help us better deal with issues like this 10% on argo.
If argo's ideal number moves, but no other machine's ideal number moves,
then something is very fishy, since they'd all be executing identical
code. Right now, we have no way of knowing.
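A rough sketch of what that cross-check could look like (hypothetical function and numbers; the idea is that the ideal, no-painting run executes identical code on every machine, so a machine whose ideal number moves alone is suspect):

```python
def suspect_machines(ideal_changes, threshold=5.0):
    """Flag machines whose ideal (no-painting) Tp moved by more than
    `threshold` percent while the rest of the pool stayed flat.

    ideal_changes maps machine name -> percent change in ideal Tp
    across a checkin."""
    moved = {m for m, pct in ideal_changes.items() if abs(pct) > threshold}
    # If *every* machine moved, the code itself really got slower or
    # faster; only a lone mover points at machine-specific gremlins.
    if moved and moved != set(ideal_changes):
        return sorted(moved)
    return []

# Hypothetical checkin: argo's ideal number jumps ~10%, nobody else's does.
changes = {"argo": 9.7, "bl-bldlnx01": 0.4, "winxp01": -0.2}
```

With real data, `suspect_machines(changes)` would single out argo, which is exactly the signal we don't have today.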

- Vlad

Boris Zbarsky

Oct 16, 2006, 4:01:58 PM
Vladimir Vukicevic wrote:
> We could (and should) run the test on the same hardware even, one measuring ideal (no
> painting) and one measuring real. That's really the information that I
> want, and would help us better deal with issues like this 10% on argo.

Ah, I see. Yeah, that would be awesome!

-Boris

Robert O'Callahan

Oct 18, 2006, 4:35:43 PM
If we had jprof running on some of our test machines, that would also
help us understand weird performance changes.
</broken-record>

Rob

Jonas Sicking

Oct 19, 2006, 9:19:48 PM
to Vladimir Vukicevic

I wonder how much of that would be addressed by simply having a profiler
run while we're running Tp tests. That way you can see how much time is
spent in text measurement and whatnot, versus how much is spent in
painting. This way we don't have to set up tinderboxes with different
parts of the code disabled.
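As an illustration of the idea, here is a toy version using Python's cProfile as a stand-in for a native profiler like jprof (the function names are made up):

```python
import cProfile
import io
import pstats

def measure_text():
    # Stand-in for text measurement work during a page load.
    return sum(i * i for i in range(50_000))

def paint():
    # Stand-in for the painting step.
    return sum(i * i for i in range(100_000))

def pageload():
    measure_text()
    paint()

# Profile one "page load" and capture the per-function statistics.
profiler = cProfile.Profile()
profiler.enable()
pageload()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats()
report = buf.getvalue()
```

The per-function cumulative times in `report` let you split "measurement" from "painting" on an ordinary run, without building special no-paint tinderboxes.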

/ Jonas
