Proposal: Replace codesighs with something much simpler (sum of binary sizes in staged package)

Ted Mielczarek

May 26, 2011, 11:45:02 AM
to dev-platform
I filed a bug[1] on this and figured I would post it here as well. We
use a tool called "codesighs"[2] to attempt to measure code size
numbers. codesighs currently has a lot of issues. We don't have an
implementation for Windows and the Linux implementation seems to have
huge variance[3]. The codesighs implementation is a bunch of Perl
scripts to scrape the output of readelf or nm, plus some C(!) programs
to parse the textual output of the Perl scripts(!!). It has a lot of
neat features that we don't actually use and I bet nobody has looked
at 99% of the data it's producing anyway, since we're basically
ignoring the codesighs graph as it stands.

While investigating this, I realized that the numbers it's producing
probably aren't what we care to measure anyway. It's summing up the
total size of function + data symbols from all the binaries in
dist/bin, which includes a bunch of crap we don't ship (like tests and
utilities).

glandium proposed that we instead just sum up the on-disk file sizes
of all binaries in the package staging directory after make package
has been run (so that we have stripped binaries). This should be a
much closer approximation of a codesize number that makes sense.
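
As a rough sketch of what that measurement could look like (not an actual implementation from the bug): the Python below walks a staging directory and sums the on-disk size of every file that looks like a binary. The suffix list, the magic-number check, and the function names are all assumptions made for illustration; the thread doesn't define what counts as a "binary".

```python
import os

# Assumption: suffixes treated as binaries. The proposal does not specify a list.
BINARY_SUFFIXES = (".so", ".dll", ".dylib", ".exe")


def is_binary(path):
    """Heuristic check: known suffix, or an ELF / PE ("MZ") magic number."""
    if path.lower().endswith(BINARY_SUFFIXES):
        return True
    try:
        with open(path, "rb") as f:
            magic = f.read(4)
    except OSError:
        # Unreadable files are simply not counted.
        return False
    return magic == b"\x7fELF" or magic[:2] == b"MZ"


def total_binary_size(stage_dir):
    """Sum the on-disk sizes (in bytes) of all binaries under stage_dir."""
    total = 0
    for root, _dirs, files in os.walk(stage_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                # Skip symlinks so a linked binary is not counted twice.
                continue
            if is_binary(path):
                total += os.path.getsize(path)
    return total
```

Run against the staging directory after make package (so the binaries are already stripped), this yields one number per build, e.g. total_binary_size("path/to/stage"), where the path is a placeholder.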

The alternatives are either to have someone spend a lot of time
digging through the codesighs Perl + C code trying to figure out what
the bugs are, implementing it for Windows, etc, or just to shut it off
completely, since it's not producing useful data at the moment.

One downside is that we would lose the per-function diff reports that
codesighs produces. This data probably isn't super useful with
function inlining etc, and our implementation on tinderbox is pretty
gross right now, since builds run in parallel, so they're just
comparing vs. some arbitrary older build right now anyway. We should
just drop all of that since there's no sane way to do it in our modern
world of multiple parallel builders. We should just produce a single
data point per build.

-Ted

1. https://bugzilla.mozilla.org/show_bug.cgi?id=659950
2. http://mxr.mozilla.org/mozilla-central/source/tools/codesighs/
3. For example, see http://graphs.mozilla.org/graph.html#tests=[[31,1,597]]&sel=1305779877,1306358108

Boris Zbarsky

May 26, 2011, 12:11:37 PM
On 5/26/11 11:45 AM, Ted Mielczarek wrote:
> While investigating this, I realized that the numbers it's producing
> probably aren't what we care to measure anyway.

Well, the first question is what we're trying to measure.

Originally, the codesighs stuff was trying to measure the size of the
data that would actually have to be read off disk during startup. Note
that this is different from the on-disk binary size due to debugging
symbol sections and whatnot.

Is this still a quantity we care about measuring at all? Or are our new
better startup time tests a better idea than this proxy for them?

I'm starting to think we should just retire codesighs, period. And if
we care about download size, file a bug on reporting _that_.

-Boris

Mike Hommey

May 26, 2011, 12:54:13 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Thu, May 26, 2011 at 12:11:37PM -0400, Boris Zbarsky wrote:
> On 5/26/11 11:45 AM, Ted Mielczarek wrote:
> >While investigating this, I realized that the numbers it's producing
> >probably aren't what we care to measure anyway.
>
> Well, the first question is what we're trying to measure.
>
> Originally, the codesighs stuff was trying to measure the size of
> the data that would actually have to be read off disk during
> startup. Note that this is different from the on-disk binary size
> due to debugging symbol sections and whatnot.
>
> Is this still a quantity we care about measuring at all? Or are our
> new better startup time tests a better idea than this proxy for
> them?

This is a quantity that is interesting to measure, but one that codesighs
actually doesn't measure. While it's true that we currently load more
than 90% of code+data during startup, we expect to get that much lower
in the future. Also note that codesighs includes .bss size, which
is virtually free to access, but doesn't include e.g. relocations,
which are quite big and always accessed.

So while this is an interesting quantity to measure, we'd need a new
tool to do so. But I don't think it's interesting to have such a tool
until we actually start actively decreasing the amount of code we
load at startup.

> I'm starting to think we should just retire codesighs, period. And
> if we care about download size, file a bug on reporting _that_.

AOL.

Mike

Asa Dotzler

May 26, 2011, 1:17:48 PM

I don't know the answer to this so it's a genuine question and not a
suggestion: does this matter to Firefox for devices more than Firefox
for desktops? When trying to figure out what it is that we want to
measure, should we make sure that the mobile team are heavily involved?

- A

Boris Zbarsky

May 26, 2011, 1:21:25 PM
On 5/26/11 1:17 PM, Asa Dotzler wrote:
> I don't know the answer to this so it's a genuine question and not a
> suggestion: does this matter to Firefox for devices more than Firefox
> for desktops?

Possibly, yes. Depends on what we're measuring....

-Boris

Taras Glek

Jun 1, 2011, 2:35:44 PM
On 05/26/2011 09:54 AM, Mike Hommey wrote:
> On Thu, May 26, 2011 at 12:11:37PM -0400, Boris Zbarsky wrote:
>> On 5/26/11 11:45 AM, Ted Mielczarek wrote:
>>> While investigating this, I realized that the numbers it's producing
>>> probably aren't what we care to measure anyway.
>>
>> Well, the first question is what we're trying to measure.
>>
>> Originally, the codesighs stuff was trying to measure the size of
>> the data that would actually have to be read off disk during
>> startup. Note that this is different from the on-disk binary size
>> due to debugging symbol sections and whatnot.
>>
>> Is this still a quantity we care about measuring at all? Or are our
>> new better startup time tests a better idea than this proxy for
>> them?
>
> This is a quantity that is interesting to measure, but that codesighs
> actually doesn't measure. While it's true that we currently load more
> than 90% of code+data during startup, we expect to get that much lower
> in the future. Also note that codesighs also includes .bss size, which
> is virtually free to access, but doesn't include e.g. relocations,
> which are quite big and always accessed.

In the near future we will preread the whole libxul binary (bug 632404),
so a rough file size will be precise enough :)

Since our startup IO is spread across many executable + data files, bug
609111 will be a much more realistic way to measure how much data is
read on startup.

Taras
