I don't know why the add-ons manager causes an OSX Ts regression, can you help?


Dave Townsend

May 3, 2010, 7:36:43 PM
So I've had an interesting weekend trying to narrow down just where the
new add-ons manager code is causing a Ts regression on OSX 10.5.8 (not
10.6, or Linux, or Windows mind you). So far my conclusions are that it
isn't the new add-ons manager code that is doing it but I need more
people to look at this and see if anything jumps out.

Let me start with some info for those not in the know. A Ts run consists
of 20 startups of Firefox: the first with a clean profile, the rest
reusing the profile that first run creates. The time measured is between
launching Firefox and it loading a mostly empty page. The largest time
is ignored and an average is taken of the rest.
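For those unfamiliar with the scoring, it can be sketched like this (an illustration of the description above, not the actual Talos harness code):

```python
def ts_score(run_times_ms):
    """Compute a Ts score: drop the single largest run time, then
    average the rest. Sketch of the scoring described above; the real
    Talos harness may differ in details."""
    if len(run_times_ms) < 2:
        raise ValueError("need at least two runs")
    trimmed = sorted(run_times_ms)[:-1]  # ignore the largest time
    return sum(trimmed) / len(trimmed)

# e.g. one long first run plus stable runs just over 650ms
print(ts_score([2400, 655, 652, 658, 653]))
```

Note that dropping only the single largest value means a first-run outlier is absorbed, but a cluster of moderately high runs still drags the average up.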

Here is a graph showing the normal times for those 20 runs (actually
only 19 are listed on the graphs server for some reason) in the most
recent Ts cycle on trunk: http://bit.ly/cp949Q

Looks fairly simple. The first run is long but that is when we register
components etc. in the new profile and do an app restart. The rest are
nice and stable at somewhere just over 650ms.

Now here is a graph from just after the new add-ons manager landed:
http://bit.ly/dl4zo6

Again, a long run on the first point (about the same as the previous
graph, in fact). Then a few runs at about the same time as before. Then
something strange happens: in just under half of the following runs the
time is nearly 100ms higher than normal, fairly consistently too. This
isn't a fluke either; the frequency and position of the longer runs
change, but every Ts test with the new extension manager (on OSX 10.5.8)
shows the regular low times mixed with higher times of a pretty
consistent value.

Now I may be barking up the wrong tree, but I don't think it is a
stretch to imagine that the regression comes from the presence of those
higher values, and that something different must be happening during
those runs. Shouldn't be too hard to figure out, right? Here is a
problem graph though: http://bit.ly/bHRzCg

That graph is what I see on the add-ons manager project branch when I
completely disable building of the add-ons manager. Gone is the high
first point, because there is no extensions database to build, but the
bimodal times remain. In fact I've run this test a bunch of times, and
with no add-ons manager the results alternate perfectly between low and
high every time in this case.
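To put a number on "bimodal" rather than eyeballing the graphs, a rough split like this could be used (purely illustrative; the 50ms threshold is an arbitrary assumption, not anything Talos does):

```python
def split_modes(times_ms, gap_ms=50):
    """Partition run times into 'low' and 'high' clusters by cutting at
    the midpoint between min and max, mirroring the ~100ms bimodal
    pattern described above. Illustrative only, not part of Talos."""
    lo, hi = min(times_ms), max(times_ms)
    if hi - lo < gap_ms:
        return times_ms, []  # spread too small: effectively unimodal
    cut = (lo + hi) / 2
    low = [t for t in times_ms if t <= cut]
    high = [t for t in times_ms if t > cut]
    return low, high

# e.g. perfectly alternating low/high runs, ~100ms apart
low, high = split_modes([655, 752, 654, 750, 656, 751])
```

Comparing `len(low)` and `len(high)` across pushes would make "just under half the runs are slow" a concrete, trackable number.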

A little more food for thought: occasionally current trunk runs will see
similar high Ts values: http://bit.ly/9qNBol. My best guess so far is
that this has been happening for a while, but rarely, and somehow some
of the small changes I made outside of the extension manager are
tickling a problem and making it more frequent.

The question is what could this be. Anybody got any ideas?

Boris Zbarsky

May 3, 2010, 9:08:35 PM
On 5/3/10 7:36 PM, Dave Townsend wrote:
> Then something strange happens. In just under half the
> following runs the time is nearly 100ms higher than normal.

100ms on OSX? Sounds like shades of bug 519893 to me!

-Boris

Justin Dolske

May 3, 2010, 10:46:36 PM
On 5/3/10 4:36 PM, Dave Townsend wrote:

> The question is what could this be. Anybody got any ideas?

Maybe look at file I/O (DTrace, dtruss/iosnoop) to see if anything
interesting differs between the fast and slow runs?

Does anything change in the profile between runs? Does it still happen
if you tar up the profile after the first run, and untar it for each
subsequent run?

Presumably you can't reproduce this on a normal trunk build? Try backing
out parts of your changes until you narrow down what's causing it? (If
it's not the code itself, perhaps it's a side effect... e.g. .jar size
blowing a cache, some CSS/image change, etc. Probably combined with
something intermittent.)

Justin

Dave Townsend

May 3, 2010, 11:01:59 PM
On 5/3/10 19:46, Justin Dolske wrote:
> On 5/3/10 4:36 PM, Dave Townsend wrote:
>
>> The question is what could this be. Anybody got any ideas?
>
> Maybe look at file io (DTrace, dtruss/iosnoop) to see if anything
> interesting differs between the fast and slow runs?

So an interesting thing I totally forgot to mention in the original post
is that the more logging I enable, the more likely it is that each run
is a high point. Basically, once I turned on the timeline logging the Ts
runs became stable again, though longer than normal; that is probably
largely down to it logging reams of data.

I also forgot to mention that I can't reproduce this on my machine, only
on the talos boxes so far.

> Does anything change in the profile between runs? Does it still happen
> if you tar up the profile after the first run, and untar it for each
> subsequent run?

The talos suite doesn't change anything between runs, to my knowledge. I
can't easily test what you suggest, though, since I can't reproduce the
problem myself.

> Presumably you can't reproduce this on a normal trunk build? Try backing
> out parts of your changes until you narrow down what's causing it? (If
> it's not the code itself, perhaps it's a side effect... EG .jar size
> blowing a cache, some CSS/image change, etc. Probably combined with
> something intermittent.)

I can try this if nothing else comes up, but the new manager is a
complete replacement for the old, so it is kind of difficult to
incrementally switch between one and the other to see at what point the
behaviour changes.

Ehsan Akhgari

May 4, 2010, 12:22:00 AM
to Dave Townsend, dev-pl...@lists.mozilla.org
On Mon, May 3, 2010 at 11:01 PM, Dave Townsend <dtow...@mozilla.com> wrote:
> On 5/3/10 19:46, Justin Dolske wrote:
>>
>> On 5/3/10 4:36 PM, Dave Townsend wrote:
>>
>>> The question is what could this be. Anybody got any ideas?
>>
>> Maybe look at file io (DTrace, dtruss/iosnoop) to see if anything
>> interesting differs between the fast and slow runs?
>
> So an interesting thing I totally forgot to mention in the original post is
> that it seems that the more logging I enable the more likely it is that each
> run is a high point. Basically once I turned on the timeline logging the Ts
> runs became stable again, though longer than normal, but probably a lot to
> do with it logging reams of data.

How did the startup timeline logs compare before and after your changes?

>> Presumably you can't reproduce this on a normal trunk build? Try backing
>> out parts of your changes until you narrow down what's causing it? (If
>> it's not the code itself, perhaps it's a side effect... EG .jar size
>> blowing a cache, some CSS/image change, etc. Probably combined with
>> something intermittent.)
>
> I can try this if nothing else comes up, but the new manager is a complete
> replacement for the old, it is kind of difficult to incrementally switch
> between one and the other to see at what point it changes.

It would be interesting to see how the file sizes change before and
after your patch.

--
Ehsan
<http://ehsanakhgari.org/>

Dave Townsend

May 4, 2010, 12:34:31 AM
On 5/3/10 21:22, Ehsan Akhgari wrote:
> On Mon, May 3, 2010 at 11:01 PM, Dave Townsend<dtow...@mozilla.com> wrote:
>> On 5/3/10 19:46, Justin Dolske wrote:
>>>
>>> On 5/3/10 4:36 PM, Dave Townsend wrote:
>>>
>>>> The question is what could this be. Anybody got any ideas?
>>>
>>> Maybe look at file io (DTrace, dtruss/iosnoop) to see if anything
>>> interesting differs between the fast and slow runs?
>>
>> So an interesting thing I totally forgot to mention in the original post is
>> that it seems that the more logging I enable the more likely it is that each
>> run is a high point. Basically once I turned on the timeline logging the Ts
>> runs became stable again, though longer than normal, but probably a lot to
>> do with it logging reams of data.
>
> How did the startup timeline logs compare before and after your changes?

With the minimal logging from trunk on, the timelines are identical, but
that really only tracks up until the main event loop starts. With more
of the logging transplanted onto the branch, the Ts results just became
near identical every time, so I didn't see any need to look for
differences; the only time I saw a different result there was about a
300ms gap in the timeline log.
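If the timeline log lines carry timestamps, a gap like that 300ms one could be located mechanically. A sketch, assuming millisecond-timestamp-prefixed lines (the real startup timeline format may differ):

```python
def largest_gap(log_lines):
    """Return (gap_ms, line_before_gap) for the biggest jump between
    consecutive timeline entries. Assumes each line starts with a
    millisecond timestamp; the real log format may differ."""
    stamps = []
    for line in log_lines:
        parts = line.split(None, 1)
        if parts and parts[0].isdigit():
            stamps.append((int(parts[0]), line))
    best = (0, None)
    for (t1, line1), (t2, _) in zip(stamps, stamps[1:]):
        if t2 - t1 > best[0]:
            best = (t2 - t1, line1)  # remember the event before the hole
    return best

# e.g. a 310ms hole after the second event (event names are made up):
gap, before = largest_gap(["100 main", "120 xpcom", "430 hiddenWindow.xul", "440 done"])
```

Diffing the event *before* the biggest gap between a fast run and a slow run would point at where the extra time goes.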

>>> Presumably you can't reproduce this on a normal trunk build? Try backing
>>> out parts of your changes until you narrow down what's causing it? (If
>>> it's not the code itself, perhaps it's a side effect... EG .jar size
>>> blowing a cache, some CSS/image change, etc. Probably combined with
>>> something intermittent.)
>>
>> I can try this if nothing else comes up, but the new manager is a complete
>> replacement for the old, it is kind of difficult to incrementally switch
>> between one and the other to see at what point it changes.
>
> It would be interesting to see how the file sizes change before and
> after your patch.

Which files? The add-ons manager files are basically different files
entirely.

Ehsan Akhgari

May 4, 2010, 12:40:48 AM
to Dave Townsend, dev-pl...@lists.mozilla.org
On Tue, May 4, 2010 at 12:34 AM, Dave Townsend <dtow...@mozilla.com> wrote:
> On 5/3/10 21:22, Ehsan Akhgari wrote:
>>
>> On Mon, May 3, 2010 at 11:01 PM, Dave Townsend<dtow...@mozilla.com>
>>  wrote:
>>>
>>> On 5/3/10 19:46, Justin Dolske wrote:
>>>>
>>>> On 5/3/10 4:36 PM, Dave Townsend wrote:
>>>>
>>>>> The question is what could this be. Anybody got any ideas?
>>>>
>>>> Maybe look at file io (DTrace, dtruss/iosnoop) to see if anything
>>>> interesting differs between the fast and slow runs?
>>>
>>> So an interesting thing I totally forgot to mention in the original post is
>>> that it seems that the more logging I enable the more likely it is that each
>>> run is a high point. Basically once I turned on the timeline logging the Ts
>>> runs became stable again, though longer than normal, but probably a lot to
>>> do with it logging reams of data.
>>
>> How did the startup timeline logs compare before and after your changes?
>
> With the minimal logging from trunk on, the timelines are identical, but
> that really only tracks up until the main event loop starts.

What we usually do is manually enter a URL into the location bar, press
enter, and wait for the page to load; then we find the entered URL
inside the log file so that we can determine at which point the browser
was actually usable.

> With more of
> the logging transplanted onto the branch the Ts results just became near
> identical every time so I didn't see any need to try to see where there were
> differences,

But the interesting thing is to compare the logs between an
instrumented build with the old add-ons manager and one with the new
add-ons manager. You could even add more instrumentation specific to
your code to determine whether there's a hot path in it during startup.

> the only one time I saw a different result there was just about
> a 300ms gap in the timeline log.

Where was this gap? If it was before the OnStartRequest call for
hiddenWindow.xul, it might be possible that we're doing too much I/O
in loading toolkit.jar.

>>>> Presumably you can't reproduce this on a normal trunk build? Try backing
>>>> out parts of your changes until you narrow down what's causing it? (If
>>>> it's not the code itself, perhaps it's a side effect... EG .jar size
>>>> blowing a cache, some CSS/image change, etc. Probably combined with
>>>> something intermittent.)
>>>
>>> I can try this if nothing else comes up, but the new manager is a
>>> complete
>>> replacement for the old, it is kind of difficult to incrementally switch
>>> between one and the other to see at what point it changes.
>>
>> It would be interesting to see how the file sizes change before and
>> after your patch.
>
> Which files? The add-ons manager files are basically different files
> entirely.

I meant the binary and jar files inside your objdir/dist after an
optimized build.

--
Ehsan
<http://ehsanakhgari.org/>

Dave Townsend

May 4, 2010, 7:17:26 PM

Here are the file differences between two builds with the old and new
add-ons manager (with some of the timeline branch in them).

-68293304 XUL
+68165916 XUL

-83815 greprefs.js
+83796 greprefs.js

-25012 removed-files
+25045 removed-files

-2286080 browser.jar
+2284539 browser.jar

-471641 en-US.jar
+463248 en-US.jar

-2600550 toolkit.jar
+2533029 toolkit.jar

-366655 browser.xpt
+363996 browser.xpt

-40191 fuelApplication.js
+38849 fuelApplication.js

-11718 nsAddonRepository.js
+11881 nsAddonRepository.js

-34389 nsBlocklistService.js
+35178 nsBlocklistService.js

-93253 nsUpdateService.js
+93386 nsUpdateService.js

-40503 firefox.js
+39089 firefox.js

-11602 LightweightThemeManager.jsm
+21015 LightweightThemeManager.jsm

-309757 nsExtensionManager.js

+5049 addonManager.js
+2107 amContentHandler.js
+5598 amWebInstallListener.js
+2890 AddonLogging.jsm
+28569 AddonManager.jsm
+21061 AddonUpdateChecker.jsm
+6980 PluginProvider.jsm
+164296 XPIProvider.jsm
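For what it's worth, summing the extension-manager files listed above gives a quick sanity check on the net change (a back-of-the-envelope calculation, not an official measurement):

```python
# Sizes in bytes, copied from the listing above.
old_em = {"nsExtensionManager.js": 309757}
new_em = {
    "addonManager.js": 5049,
    "amContentHandler.js": 2107,
    "amWebInstallListener.js": 5598,
    "AddonLogging.jsm": 2890,
    "AddonManager.jsm": 28569,
    "AddonUpdateChecker.jsm": 21061,
    "PluginProvider.jsm": 6980,
    "XPIProvider.jsm": 164296,
}
delta = sum(new_em.values()) - sum(old_em.values())
print(delta)  # -73207: the new modules total roughly 73KB less than the old file
```

So the replacement is smaller on disk overall; whatever the regression is, raw extension-manager code size looks like an unlikely culprit, though the split across many files changes how much gets loaded and when.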

Ehsan Akhgari

May 4, 2010, 8:21:50 PM
to Dave Townsend, dev-pl...@lists.mozilla.org

How much time do we spend in loading the new components/modules? How
much time did we use to spend in loading nsExtensionManager.js?

--
Ehsan
<http://ehsanakhgari.org/>

Justin Dolske

May 4, 2010, 8:55:44 PM
On 5/3/10 8:01 PM, Dave Townsend wrote:

>> Presumably you can't reproduce this on a normal trunk build? Try backing
>> out parts of your changes until you narrow down what's causing it?
>

> I can try this if nothing else comes up, but the new manager is a
> complete replacement for the old, it is kind of difficult to
> incrementally switch between one and the other to see at what point it
> changes.

Sorry, I should have been clearer. I wasn't referring to
piecewise-reverting to the old manager... Rather, start excising chunks
of the new code until you get perf runs that don't bounce between high
and low values. Stub things out, remove new files, etc. Since you said
the regression happens even with the EM disabled, these don't have to be
functional reverts, just enough to make Ts run to completion.

Justin

Boris Zbarsky

May 4, 2010, 9:05:47 PM

Which just got reopened because apparently the rev3 talos boxes don't
have the hardware bit to make it not happen.

-Boris

Joe Drew

May 4, 2010, 9:49:35 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org

That... is good news.

(!!!!!!!!!!!!!!!!)

Joe

PS: Mossop check it back in omg
