Based on Argo's raw data log, the checkin range for the regression that
I get is:
http://bonsai.mozilla.org/cvsquery.cgi?date=explicit&mindate=2006%3A10%3A12%3A10%3A04%3A38&maxdate=2006%3A10%3A12%3A10%3A46%3A12
which includes the patch for bug 248612
<https://bugzilla.mozilla.org/show_bug.cgi?id=248612>. That change
reminds me a little of bug 337516, which showed that tweaking that code
had a fairly significant impact on Txul. I think it's unlikely that it
would affect startup that much, but it probably wouldn't be a bad idea
to back it out and see, especially considering that the other checkins
in that range are pretty unlikely to have affected Firefox load time.
This all assumes that that regression range is correct, and that no
external factors affected argo, of course.
Gavin
Regards,
Martijn
On 10/14/06, Martijn <martijn...@gmail.com> wrote:
> Ah, I guess the patch for bug 248612 could have caused this, indeed,
> by removing the timer code.
> I'm wondering if this is a case that falls under the question that
> dbaron asked at
> https://bugzilla.mozilla.org/show_bug.cgi?id=337516#c11 :
> "
> So is this a regression that actually affects when the new window is usable for
> the end user, or is it only a regression in what we're measuring because the
> point that we're measuring moved within the process of getting a new window
ready? If it's not a regression that actually affects user-perceived
> performance, why should we care about fixing it? The performance tests are
> simply a tool for us to help measure what we care about; they're not actually
> what we care about.
> "
>
> Regards,
> Martijn
>
> > Gavin
> >
>
I'm glad you posted this. I saw the regression, and since it was rather
odd (it only happened on argo, and Tp dropped while Ts went up), I figured
that it was a problem with the tinderbox itself.
It's really weird that it brought down Tp and made Ts go up...
On a related note, it looks like Vlad's patch for bug 356235 caused a Tp
hit on Linux. I hope there's a bug filed for that. If not, I'll probably
file one.
-Adam
On Oct 14, 4:31 pm, Adam Guthrie <ispi...@gmail.com> wrote:
> Vladimir Vukicevic wrote:
> > While looking at
> > http://graphs.mozilla.org/#spst=range&spss=1160608348.96&spse=1160810...
> > I noticed that there was a 10% startup regression on 10/12
>
> I'm glad you posted this. I saw the regression, and since it was rather
> odd (it only happened on argo, and Tp dropped while Ts went up), I figured
> that it was a problem with the tinderbox itself.
>
> It's really weird that it brought down Tp and made Ts go up...
>
> On a related note, it looks like Vlad's patch for bug 356235 caused a Tp
> hit on Linux. I hope there's a bug filed for that. If not, I'll probably
> file one.
Looking at it is on my todo list for Monday; it was very much not what
I expected, especially given the drops on both Windows and Mac -- my
patch should have significantly reduced the amount of duplicate work in
text rendering, so there may be some issue with the Pango code that
causes badness if a text run is reused.
However, regarding argo itself, I think the right bug to fix is 356522
-- just turn tests off on argo. The linux test machine is showing
"expected" results, that is no Ts hit or Tp improvement (even more
bizzarre) from Martijn's patch, but it still shows the Tp hit from my
patch clearly (which is sensible, since my patch is directly in Tp's
path).
Realistically, I don't think anyone's going to investigate which
gremlins are eating argo -- so in this particular case I think we
should sweep argo's tests under the rug, and that Martijn should
probably re-land his patch (and cite no Ts issues on any other
tinderbox).
- Vlad
I don't know... I just don't see how we can write this off so easily. We
have solid proof that Martijn's patch is responsible for the Ts
regression and Tp gain.
Argo has always given very sensitive numbers, and in most other
occasions it's matched what the other tinderboxes say; e.g. when Ts
regressed after the new theme landing.
I'm not an expert in the subject, but as far as I know Martijn touched a
function that is called each time we load a new page --
onLocationChange. And as Gavin mentioned, that exact area of the code
(the setTimeout call Martijn removed) is quite sensitive (bug 337516).
As for the Ts hit, I am really not sure how to explain that. Martijn
added two <field>s near the onDragOver function and removed one near the
top of the file. Would where they were placed have any effect on how long
it took to add them?
I really think this could mean that argo is the only
tinderbox that can provide reliable and accurate Ts numbers (as Gavin
mentioned on IRC).
I just realized this, too: the Tp drop was around 30ms on argo, but it
looks really big because the range of argo's Tp numbers is only ~5ms.
Now take bl-bldlnx01. Its Tp number range is ~30ms, so even if
there was a Tp gain of 30ms, it wouldn't show because the range is so
big.
-Adam
On Oct 15, 7:50 pm, Adam Guthrie <ispi...@gmail.com> wrote:
> Vladimir Vukicevic wrote:
>
> > However, regarding argo itself, I think the right bug to fix is 356522
>
> > -- just turn tests off on argo. The linux test machine is showing
> > "expected" results, that is no Ts hit or Tp improvement (even more
> > bizzarre) from Martijn's patch, but it still shows the Tp hit from my
> > patch clearly (which is sensible, since my patch is directly in Tp's
> > path).
>
> I don't know... I just don't see how we can write this off so easily. We
> have solid proof that Martijn's patch is responsible for the Ts
> regression and Tp gain.
>
> Argo has always given very sensitive numbers, and in most other
> occasions it's matched what the other tinderboxes say; e.g. when Ts
> regressed after the new theme landing.
>
> I'm not an expert in the subject, but as far as I know Martijn touched a
> function that is called each time we load a new page --
> onLocationChange. And as Gavin mentioned, that exact area of the code
> (the setTimeout call Martijn removed) is quite sensitive (bug 337516).
>
> As for the Ts hit, I am really not sure how to explain that. Martijn
> added two <field>s near the onDragOver function and removed one near the
> top of the file. Would where they were placed have any effect on how long
> it took to add them?
Sure, except that it should have affected all the boxes running Ts
equally -- that part wasn't in platform-specific code, as far as I can
tell, and the xul/xbl parsing code is the same everywhere.
> I really think this could mean that argo is the only
> tinderbox that can provide reliable and accurate Ts numbers (as Gavin
> mentioned on IRC).
>
> I just realized this, too: the Tp drop was around 30ms on argo, but it
> looks really big because the range of argo's Tp numbers is only ~5ms.
> Now take bl-bldlnx01. Its Tp number range is ~30ms, so even if
> there was a Tp gain of 30ms, it wouldn't show because the range is so
> big.
No, you can't look at it that way; the test machine is significantly
slower (CPU and everything-wise) than argo itself. The actual value
itself (30ms/5ms/etc.) is meaningless; only the percentage change is
significant. argo Ts went from 920-ish -> 1010-ish, which is around
9.7%. The test machine should have gone from 2190 or so to around
2400, but it didn't. argo Tp went from 229 -> 199 (with Martijn's
patch); the test machine should have done something like 750 -> 651.
Ts on both machines has a similar amount of jitter as well, so both
machines seem to be equally precise.
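The scaling argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch using the numbers quoted in the thread; the helper function is not part of any Mozilla tooling:

```python
def pct_change(before, after):
    """Relative change as a percentage of the starting value."""
    return (after - before) / before * 100.0

# argo Ts went from ~920 to ~1010: roughly a +9.8% regression.
argo_ts_pct = pct_change(920, 1010)

# If the slower test machine had regressed by the same fraction,
# its Ts baseline of ~2190 should have landed near 2400 -- but it didn't.
predicted_test_ts = 2190 * (1 + argo_ts_pct / 100.0)

# argo Tp went from 229 to 199: roughly -13%. Scaled to the test
# machine's ~750 baseline, that predicts a drop to ~652.
argo_tp_pct = pct_change(229, 199)
predicted_test_tp = 750 * (1 + argo_tp_pct / 100.0)

print(round(argo_ts_pct, 1), round(predicted_test_ts),
      round(argo_tp_pct, 1), round(predicted_test_tp))
# → 9.8 2404 -13.1 652
```

The point of the exercise: only the percentage, not the raw millisecond delta, transfers between machines of different speeds.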
However, as I keep saying, Tp itself sucks -- there is a nonzero
component that is highly hardware dependent and almost completely out
of our control: the low level rendering. I'd like to create a pageload
test that tests just the parts that are 100% CPU bound and never touch
the gfx hardware at all, but I just don't have time. One way to do
this would be to take my recent gfxTextRun cache work, and allow you to
prepopulate the cache with the full set of text needed for a run --
this will reduce text measurement time to 0, but would still result in
accurate rendering. Then, you skip the actual painting step --
initially we can just skip actually painting anything from an expose
handler, but that wouldn't measure display list system overhead; so we
could create a null cairo surface that just ignores all operations.
Doing those things would give us a dead-accurate Tp number of just our
own code.
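The null-surface idea can be sketched roughly as follows, in Python purely for illustration (the class and its methods are hypothetical stand-ins, not cairo's actual API): drawing calls are accepted and counted, so all the code above the surface still runs, but no rasterization ever happens.

```python
import time

class NullSurface:
    """Toy stand-in for a 'null' drawing surface: every operation is
    accepted (so display-list processing is still exercised) but no
    pixels are touched, so hardware-dependent paint cost drops out."""
    def __init__(self):
        self.ops = 0

    def fill_rect(self, x, y, w, h):
        self.ops += 1  # record the call, do no pixel work

    def draw_text_run(self, run, x, y):
        self.ops += 1  # text was already measured; drawing is a no-op

def timed_paint(surface, display_list):
    """Replay a display list against a surface and time it."""
    start = time.perf_counter()
    for op, args in display_list:
        getattr(surface, op)(*args)
    return time.perf_counter() - start

# Replaying the same display list against a NullSurface yields a
# "CPU-only" Tp-style number; a real surface would add raster cost.
dl = [("fill_rect", (0, 0, 100, 100)), ("draw_text_run", ("hello", 0, 0))]
null = NullSurface()
elapsed = timed_paint(null, dl)  # null.ops == 2: both calls recorded
```

Comparing `timed_paint` against a real surface and a null one would separate "our code" time from raster time on the same hardware.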
- vlad
Just to be sure, I'm asking again: I can re-land the patch, right?
(knowing that it regresses Ts on argo)
Or does anybody else object?
Regards,
Martijn
> - Vlad
>
> _______________________________________________
> dev-performance mailing list
> dev-per...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-performance
>
In my opinion this means we should be running the test on a variety of "typical
user hardware" (at least on low-end, medium, and high-end graphics hardware).
> Doing those things would give us a dead-accurate Tp number of just our
> own code.
This would be an interesting number for academic purposes, but we do care about
things like our code calling cairo stuff that's slow, or calling cairo in a dumb
way (not precisely slowness "in our own code"), but our fault anyway. Similar
for our code doing silly things with images (anyone recall the DIB vs DDB thing?).
In the end, we only care about the numbers insofar as they reflect what the user
sees and _maybe_ what various "performance test suites" claim about browser
comparisons. In both cases, actual hardware happens.
-Boris
Which reminds me. There are several responsiveness-vs-performance frobs
in our layout engine that use wall-clock time and were tuned back in
2000 or so on then-current hardware. We should really revisit them one
of these days...
-Boris
I absolutely agree, and I wasn't arguing against that.
>> Doing those things would give us a dead-accurate Tp number of just our
>> own code.
>
> This would be an interesting number for academic purposes, but we do
> care about things like our code calling cairo stuff that's slow, or
> calling cairo in a dumb way (not precisely slowness "in our own code"),
> but our fault anyway. Similar for our code doing silly things with
> images (anyone recall the DIB vs DDB thing?).
>
> In the end, we only care about the numbers insofar as they reflect what
> the user sees and _maybe_ what various "performance test suites" claim
> about browser comparisons. In both cases, actual hardware happens.
Yes, but it would be much more useful for us to have a number that is
"ideal-world" Tp, to compare to real-world Tp. We could (and should)
run the test on the same hardware even, one measuring ideal (no
painting) and one measuring real. That's really the information that I
want, and would help us better deal with issues like this 10% on argo.
If argo's ideal number moves, but no other machine's ideal number moves,
then something is very fishy, since they'd all be executing identical
code. Right now, we have no way of knowing.
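That cross-machine sanity check could be as simple as the sketch below (the machine names and threshold are made up for illustration): if one box's "ideal" number moves while every other box running the identical code stays flat, the box is suspect, not the patch.

```python
def suspect_machines(ideal_deltas, threshold=2.0):
    """ideal_deltas maps machine name -> % change in the CPU-only
    'ideal' Tp number. Since every machine executes identical code,
    a lone mover implicates the machine rather than the checkin."""
    movers = [m for m, d in ideal_deltas.items() if abs(d) > threshold]
    # If *every* machine moved, the code itself changed speed: no suspects.
    return movers if len(movers) < len(ideal_deltas) else []

print(suspect_machines({"argo": 9.7, "bl-bldlnx01": 0.3}))  # → ['argo']
```

With only the hardware-dependent real-world numbers we have today, this kind of automated triage isn't possible.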
- Vlad
Ah, I see. Yeah, that would be awesome!
-Boris
I wonder how much of that would be addressed by simply having a profiler
run while we're running Tp tests. That way you can see how much time is
spent in text measurement and whatnot, versus how much is spent in
painting. This way we don't have to set up tinderboxes with different
parts of the code disabled.
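As a toy illustration of that approach (the two workload functions below are stand-ins, not Gecko code), a tracing profiler wrapped around a simulated pageload already splits the time between a "measurement" phase and a "painting" phase:

```python
import cProfile
import io
import pstats

def measure_text(n):
    # stand-in for text measurement work
    return sum(len(str(i)) for i in range(n))

def paint(n):
    # stand-in for painting/rasterization work
    return [i * i for i in range(n)]

def pageload():
    measure_text(50_000)
    paint(50_000)

profiler = cProfile.Profile()
profiler.enable()
pageload()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# Restrict the report to the two phases we care about.
stats.sort_stats("cumulative").print_stats("measure_text|paint")
report = out.getvalue()
print(report)  # per-function cumulative times, one line per phase
```

On a real build the same split would come from a native sampling profiler, but the principle is identical: one Tp run, per-phase attribution, no specially crippled tinderboxes.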
/ Jonas