Profiling nightlies on Mac - what tools are used?


Chris Cooper

Jun 19, 2017, 6:04:00 PM
Hey all,

The build peers are looking to change the way that nightlies are created on Mac as we switch to cross-compilation. Specifically, we're looking at stripping the nightlies to avoid an as-of-yet undiagnosed performance discrepancy vs native builds[1], but also to make the nightly configuration match what we ship on beta/release (stripped).

Of course, stripping removes the symbols, and while we believe we have a solution for re-acquiring symbols that works for the Gecko Profiler, we realize that people out there may be using other profiling tools.

If you profile on Mac, now is your chance to speak up. What other profiling tools do you use that we should be aware of?

cheers,
--
coop

1. https://bugzilla.mozilla.org/show_bug.cgi?id=1338651

Bobby Holley

Jun 19, 2017, 6:08:15 PM
to Chris Cooper, dev-pl...@lists.mozilla.org
Instruments is the big one that I'm aware of.

Kearwood Kip Gilbert

Jun 19, 2017, 6:40:13 PM
to Bobby Holley, Chris Cooper, dev-pl...@lists.mozilla.org
I would add to this Apple’s “OpenGL Profiler”:

https://developer.apple.com/library/content/technotes/tn2178

Cheers,
- Kip

From: Bobby Holley
Sent: June 19, 2017 3:08 PM
To: Chris Cooper
Cc: dev-pl...@lists.mozilla.org
Subject: Re: Profiling nightlies on Mac - what tools are used?

Instruments is the big one that I'm aware of.

Felipe G

Jun 19, 2017, 6:46:53 PM
to Chris Cooper, dev-pl...@lists.mozilla.org
The Activity Monitor has a built-in process sampling tool that is very
handy and I use it every now and then.

On Mon, Jun 19, 2017 at 7:40 PM, Kearwood Kip Gilbert <kgil...@mozilla.com>
wrote:

> I would add to this Apple’s “OpenGL Profiler”:
>
> https://developer.apple.com/library/content/technotes/tn2178
>
> Cheers,
> - Kip
>
> From: Bobby Holley
> Sent: June 19, 2017 3:08 PM
> To: Chris Cooper
> Cc: dev-pl...@lists.mozilla.org
> Subject: Re: Profiling nightlies on Mac - what tools are used?
>
> Instruments is the big one that I'm aware of.
>
> On Mon, Jun 19, 2017 at 3:03 PM, Chris Cooper <co...@mozilla.com> wrote:
>

Eric Rahm

Jun 19, 2017, 7:40:08 PM
to Chris Cooper, dev-platform
DMD [1], although it's a bit busted on Mac right now [2]. I'd prefer if it
didn't get more busted :)

-e

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1371397

Jeff Muizelaar

Jun 19, 2017, 10:06:53 PM
to Bobby Holley, dev-pl...@lists.mozilla.org, Chris Cooper
Yes. I use Instruments on Nightly builds extensively. It would really
be a loss to lose this functionality. I think it's important to weigh
the performance improvements that we get from easy profiling against
any advantage we get from stripping the symbols.

-Jeff

On Mon, Jun 19, 2017 at 6:07 PM, Bobby Holley <bobby...@gmail.com> wrote:
> Instruments is the big one that I'm aware of.
>
> On Mon, Jun 19, 2017 at 3:03 PM, Chris Cooper <co...@mozilla.com> wrote:
>

Boris Zbarsky

Jun 19, 2017, 10:21:19 PM
On 6/19/17 6:03 PM, Chris Cooper wrote:
> If you profile on Mac, now is your chance to speak up. What other profiling tools do you use that we should be aware of?

Instruments for targeted profiling, though I mostly do that on my own
builds, not mozilla.org nightlies. The sampling tool in Activity
Monitor for opportunistic "my browser just hung" profiling of my nightly.

The Gecko profiler doesn't work for the "my browser just hung" case,
obviously, and is so far a much worse user experience than Instruments
for normal profiling. I end up using it mostly for "something was just
slow, what was it?" profiling.

The other thing stripping would affect is debugging. And yes, I've had
to attach a debugger to a nightly a few times to figure out something
when I ran into a problem while browsing. The symbols we have are not
great for debugging, but they're a lot better than nothing.

-Boris

Gregory Szorc

Jun 19, 2017, 11:23:51 PM
to Jeff Muizelaar, dev-pl...@lists.mozilla.org, Bobby Holley, Chris Cooper
On Mon, Jun 19, 2017 at 7:06 PM, Jeff Muizelaar <jmuiz...@mozilla.com>
wrote:

> Yes. I use Instruments on Nightly builds extensively. It would really
> be a loss to lose this functionality. I think it's important to weigh
> the performance improvements that we get from easy profiling against
> any advantage we get from stripping the symbols.
>

Instruments supports configuring where the symbols live. So that's a
solvable problem (although it /may/ require a one-time manual config to set
up - not sure if it can be automated).

I'm not sure if Activity Monitor knows how to obtain symbols from a custom
source. So there's a chance an outcome is "sorry, you have to use
Instruments if you want symbols."

The decision to strip Nightly builds does not come lightly. Read 1338651
comment 111 and later for the ugly backstory.


>
> On Mon, Jun 19, 2017 at 6:07 PM, Bobby Holley <bobby...@gmail.com>
> wrote:
> > Instruments is the big one that I'm aware of.
> >
> > On Mon, Jun 19, 2017 at 3:03 PM, Chris Cooper <co...@mozilla.com> wrote:
> >

Boris Zbarsky

Jun 19, 2017, 11:58:48 PM
On 6/19/17 11:22 PM, Gregory Szorc wrote:
> The decision to strip Nightly builds does not come lightly. Read 1338651
> comment 111 and later for the ugly backstory.

It's still really confusing to me that not stripping symbols has a
significant performance impact. That's not the case in any other build
configuration I'm aware of, and is somewhat surprising from first
principles for everything except startup performance.

It really would be good to figure out what's actually going on there...

-Boris

P.S. Yes, I'd like to have my symbols cake and eat my performance cake
too. ;)

Julian Seward

Jun 20, 2017, 4:00:38 AM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On 20/06/17 05:58, Boris Zbarsky wrote:
> On 6/19/17 11:22 PM, Gregory Szorc wrote:
>> The decision to strip Nightly builds does not come lightly. Read 1338651
>> comment 111 and later for the ugly backstory.
>
> It's still really confusing to me that not stripping symbols has a significant
> performance impact. That's not the case in any other build configuration I'm
> aware of, and is somewhat surprising from first principles for everything
> except startup performance.
>
> It really would be good to figure out what's actually going on there...

I agree. Stripping the symbols as a solution makes no sense to me, given
that they are not expected to be loaded into the process image.

From my scan of 1338651 it appears that we've demonstrated that the same
preprocessed source is compiled in both cases. But IIUC (and correct me if
I'm wrong), we haven't shown either that the same code is generated or
that there is not some different interaction with the underlying machine
for the two builds.

One thing we could do is profile both with VTune, to look at both

* mispredicts, cache misses (D and I), alignment stalls, whatever .. in
short, anything that affects IPC.

* average instruction counts for some "known" hot functions, which we can
reliably compare between builds. That might show cases where the
compiler didn't generate identical code.
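The second comparison could be sketched roughly as follows. This is a minimal illustration in Python, assuming per-function average instruction counts have already been exported from a profiler for each build; the function names, the numbers, and the 1% tolerance are all hypothetical.

```python
# Hypothetical average instruction counts per call for "known" hot functions,
# one table per build. These are made-up values, not measurements.
native_build = {"hot_function_a": 1520.0, "hot_function_b": 880.0, "hot_function_c": 2310.0}
cross_build = {"hot_function_a": 1520.0, "hot_function_b": 905.0, "hot_function_c": 2310.0}


def differing_functions(a, b, tolerance=0.01):
    """Return (name, count_a, count_b) for functions present in both builds
    whose counts differ by more than `tolerance`, relative to build `a`."""
    flagged = []
    for name in sorted(a.keys() & b.keys()):
        base, other = a[name], b[name]
        if base == 0:
            # Avoid dividing by zero: any nonzero count is a difference.
            differs = other != 0
        else:
            differs = abs(other - base) / base > tolerance
        if differs:
            flagged.append((name, base, other))
    return flagged


for name, base, other in differing_functions(native_build, cross_build):
    print(f"{name}: {base:.0f} vs {other:.0f} instructions per call")
```

Any function flagged this way would be a candidate for a closer look at the generated code.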

I've used VTune on Linux and have some idea what it can and can't do.
I haven't tried it on Mac, but my impression, from the Intel web site, is
that it is at least available for Mac.

J

Ted Mielczarek

Jun 20, 2017, 6:34:23 AM
to dev-pl...@lists.mozilla.org
On Tue, Jun 20, 2017, at 03:59 AM, Julian Seward wrote:
> I've used VTune on Linux and have some idea what it can and can't do.
> I haven't tried it on Mac, but my impression, from the Intel web site, is
> that it is at least available for Mac.

Apparently the version for Mac is just a GUI for viewing results, per[1]
"An optional OS X host interface can be downloaded separately to analyze
data collected on other targets. An OS X collector to profile on OS X is
not currently available."

-Ted

1. https://software.intel.com/en-us/intel-vtune-amplifier-xe

Nathan Froyd

Jun 20, 2017, 8:34:31 AM
to Julian Seward, Boris Zbarsky, dev-platform
On Tue, Jun 20, 2017 at 3:59 AM, Julian Seward <jse...@acm.org> wrote:
> On 20/06/17 05:58, Boris Zbarsky wrote:
>> On 6/19/17 11:22 PM, Gregory Szorc wrote:
>>> The decision to strip Nightly builds does not come lightly. Read 1338651
>>> comment 111 and later for the ugly backstory.
>>
>> It's still really confusing to me that not stripping symbols has a significant
>> performance impact. That's not the case in any other build configuration I'm
>> aware of, and is somewhat surprising from first principles for everything
>> except startup performance.
>>
>> It really would be good to figure out what's actually going on there...
>
> I agree. Stripping the symbols as a solution makes no sense to me, given
> that they are not expected to be loaded into the process image.
>
> From my scan of 1338651 it appears that we've demonstrated that the same
> preprocessed source is compiled in both cases. But IIUC (and correct me if
> I'm wrong), we haven't shown that either the same code is generated, nor
> that there is not some different interaction with the underlying machine
> for the two builds.

We have demonstrated that the command lines for linking are basically
identical; there are of course differences in paths. The native Mac
build was passing a static libc++ archive for linking on the command
line, but we showed that didn't matter by passing the same archive in
the cross-compiled case, which produced no change.

We have looked at the underlying machine code. It is functionally
identical; jump tables are tagged as data-in-code in one, and there
are some small offset differences in jump instructions (which are due
to slightly different offsets in the binaries themselves), but nothing
else.

We have looked at the binaries themselves (e.g. sections and so
forth). They are functionally identical; there are some small
differences between them which I think amount to path differences
being baked into the binary.

The native builds are codesigned while the cross ones are not. This
too makes no difference.

There is some kind of interaction with the underlying machine (see
comment 104 in said bug, where the binaries perform identically on a
local machine, but differently on infrastructure), but we haven't
tracked that down yet.

Your theories are most welcome at this point. :)

-Nathan

Ehsan Akhgari

Jun 20, 2017, 12:19:38 PM
to Nathan Froyd, Julian Seward, Boris Zbarsky, dev-platform
On 06/20/2017 08:34 AM, Nathan Froyd wrote:
> There is some kind of interaction with the underlying machine (see
> comment 104 in said bug, where the binaries perform identically on a
> local machine, but differently on infrastructure), but we haven't
> tracked that down yet.
From comment 104 it seems that it is possible to reproduce the slowdown
from the unstripped cross builds locally. Has anyone profiled one of
these builds comparing them to an unstripped non-cross build to see
where the additional time is being spent? I couldn't tell from the bug
if this investigation has happened.

Nathan Froyd

Jun 20, 2017, 12:28:17 PM
to Ehsan Akhgari, Julian Seward, dev-platform, Boris Zbarsky
My understanding is that the slowdown cannot be reproduced on local
developer machines, but can be reproduced on loaner machines from
infra. I don't think anybody has tried profiling on infra to see
where time differences are.

-Nathan

Ehsan Akhgari

Jun 20, 2017, 1:28:50 PM
to Nathan Froyd, Julian Seward, dev-platform, Boris Zbarsky
On 06/20/2017 12:28 PM, Nathan Froyd wrote:
> On Tue, Jun 20, 2017 at 12:19 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> On 06/20/2017 08:34 AM, Nathan Froyd wrote:
>>> There is some kind of interaction with the underlying machine (see
>>> comment 104 in said bug, where the binaries perform identically on a
>>> local machine, but differently on infrastructure), but we haven't
>>> tracked that down yet.
>> From comment 104 it seems that it is possible to reproduce the slowdown from
>> the unstripped cross builds locally. Has anyone profiled one of these
>> builds comparing them to an unstripped non-cross build to see where the
>> additional time is being spent? I couldn't tell from the bug if this
>> investigation has happened.
> My understanding is that the slowdown cannot be reproduced on local
> developer machines, but can be reproduced on loaner machines from
> infra.
Huh. That's interesting and even more puzzling...
> I don't think anybody has tried profiling on infra to see
> where time differences are.
That seems like the obvious next step to investigate to me. We should
*really* only talk about stripping builds as the last resort IMO, since
we have way too many developers using OSX every day...

Chris Peterson

Jun 20, 2017, 2:09:39 PM
On 6/20/17 10:28 AM, Ehsan Akhgari wrote:
> That seems like the obvious next step to investigate to me. We should
> *really* only talk about stripping builds as the last resort IMO, since
> we have way too many developers using OSX every day...

Does profiling an unstripped Mac build still produce useful results if
the unstripped builds are slower than the stripped builds we ship to users?

Jeff Muizelaar

Jun 20, 2017, 3:12:54 PM
to Chris Peterson, Mozilla
Very much so, yes. Even if unstripped builds were universally slower
(they seem to be slower only on the CI machines), any performance impact
is unlikely to change the distribution of samples substantially.
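
To make that concrete, here is a minimal Python sketch with made-up numbers: if every function gets slower by the same factor, the share of samples a sampling profiler attributes to each one does not move. The function names, timings, and the 1.3x factor are hypothetical, and a real slowdown need not be uniform.

```python
# Hypothetical per-function CPU time in milliseconds; not measured data.
baseline_ms = {"layout": 400.0, "style": 250.0, "paint": 350.0}


def sample_shares(times_ms):
    """Fraction of samples a sampling profiler would attribute to each function."""
    total = sum(times_ms.values())
    return {name: t / total for name, t in times_ms.items()}


def slow_down(times_ms, factor):
    """Model a uniform slowdown: every function takes `factor` times longer."""
    return {name: t * factor for name, t in times_ms.items()}


fast = sample_shares(baseline_ms)
slow = sample_shares(slow_down(baseline_ms, 1.3))

for name in baseline_ms:
    print(f"{name}: {fast[name]:.2%} of samples before, {slow[name]:.2%} after")
```

The absolute times change, but the ranking of hot spots, which is what a profile is usually read for, stays the same under this assumption.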

Gregory Szorc

Jun 20, 2017, 4:43:21 PM
to Ehsan Akhgari, Julian Seward, dev-platform, Nathan Froyd, Boris Zbarsky
On Tue, Jun 20, 2017 at 10:28 AM, Ehsan Akhgari <ehsan....@gmail.com>
wrote:

> On 06/20/2017 12:28 PM, Nathan Froyd wrote:
>
>> On Tue, Jun 20, 2017 at 12:19 PM, Ehsan Akhgari <ehsan....@gmail.com>
>> wrote:
>>
>>> On 06/20/2017 08:34 AM, Nathan Froyd wrote:
>>>
>>>> There is some kind of interaction with the underlying machine (see
>>>> comment 104 in said bug, where the binaries perform identically on a
>>>> local machine, but differently on infrastructure), but we haven't
>>>> tracked that down yet.
>>>>
>>> From comment 104 it seems that it is possible to reproduce the slowdown
>>> from
>>> the unstripped cross builds locally. Has anyone profiled one of these
>>> builds comparing them to an unstripped non-cross build to see where the
>>> additional time is being spent? I couldn't tell from the bug if this
>>> investigation has happened.
>>>
>> My understanding is that the slowdown cannot be reproduced on local
>> developer machines, but can be reproduced on loaner machines from
>> infra.
>>
> Huh. That's interesting and even more puzzling...
>
>> I don't think anybody has tried profiling on infra to see
>> where time differences are.
>>
> That seems like the obvious next step to investigate to me. We should
> *really* only talk about stripping builds as the last resort IMO, since we
> have way too many developers using OSX every day...
>

I would argue it is in our best interest to have as little divergence
between Firefox release channels as possible. Today, Nightly is distributed
with symbols and Beta/Release/ESR all ship without symbols. There are other
variations between the build configurations as well. Every variation
between channels increases the risk of introducing bugs or other undesired
behavior and that said behavior won't be detected until weeks after it has
landed on central (we can run CI for these variations, sure, but that's no
substitute for a user base). It should not be a controversial statement to
say that variation between build configurations / channels is not ideal.

In the case of debug symbols, we've historically made the trade-off that
symbols are useful to developers using Nightly, they don't appear to have a
significant downside (other than package/download size), so why not ship
them. We've accepted the risk of this variation in the Nightly channel in
return for not inconveniencing developers. What's happening now is that we
have a new problem caused by debug symbols for cross builds and we are
re-assessing whether the trade-off to ship debug symbols on Nightly is
still justified. If it is, then there may be considerable work to preserve
the status quo. I understand stripping Nightly would be inconvenient and I
personally don't want to do it for this reason. But the flip side is we
converge the configurations for Nightly and Beta, which I argue is a good
thing.

FWIW, I'd like to point out that Chrome Canary (Nightly equivalent) doesn't
ship debug symbols for macOS (assuming my methodology of running
`nm --debug-syms` is correct). While this is quite possibly due to Chrome
having closed source components, their developers have obviously found a
way to work without debug symbols on shipped builds. I'd like to think that
if Chrome has figured out how to make it work, we can too.
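
For anyone repeating that check, a minimal Python sketch follows. It assumes `nm` is on the PATH and accepts `--debug-syms`, and that debugger (stab) entries are printed with `-` in the type column, which is how nm conventionally shows them for Mach-O files. The helper names and the sample output are made up for illustration.

```python
import subprocess


def count_debug_entries(nm_output):
    """Count lines of `nm` output whose type column is '-'.

    Assumption: debugger (stab) entries are printed with '-' as the type
    character, after the address column.
    """
    count = 0
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "-":
            count += 1
    return count


def has_debug_symbols(binary_path):
    """Run `nm --debug-syms` on a binary and report whether any debugger
    entries show up. Raises CalledProcessError if nm fails."""
    result = subprocess.run(
        ["nm", "--debug-syms", binary_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_debug_entries(result.stdout) > 0


# Made-up sample resembling nm output: one ordinary symbol, one debugger
# entry, and one undefined symbol.
SAMPLE_OUTPUT = """\
0000000100000f50 T _main
0000000000000000 - 00 0000    SO /src/
                 U _printf
"""
print(count_debug_entries(SAMPLE_OUTPUT))
```

A count of zero from a shipped binary would be consistent with it having been stripped, though it says nothing about whether symbols are published separately.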

Chris Cooper

Jun 21, 2017, 6:19:41 AM
On Tuesday, June 20, 2017 at 1:28:50 PM UTC-4, Ehsan Akhgari wrote:
> > My understanding is that the slowdown cannot be reproduced on local
> > developer machines, but can be reproduced on loaner machines from
> > infra.
> Huh. That's interesting and even more puzzling...
> > I don't think anybody has tried profiling on infra to see
> > where time differences are.
> That seems like the obvious next step to investigate to me. We should
> *really* only talk about stripping builds as the last resort IMO, since
> we have way too many developers using OSX every day...

One possible avenue we're considering is shipping the unstripped cross-compiled builds to the nightly audience (or some subset) and seeing if the performance issue manifests in the wild. We're confident in our ability to repatriate these users via nightly updates.

Ehsan Akhgari

Jun 21, 2017, 10:10:20 AM
to Gregory Szorc, Julian Seward, dev-platform, Nathan Froyd, Boris Zbarsky
On 06/20/2017 04:42 PM, Gregory Szorc wrote:
> I would argue it is in our best interest to have as little divergence
> between Firefox release channels as possible.
I don't think that's what this debate is about, though. We should think
about it in terms of keeping Nightly on OSX, where the majority of
developers are (for better or worse, and myself excluded, since I
recently switched away from it!), as useful for them as it currently is,
which means keeping it profilable with Instruments, Activity Monitor and
other less popular tools, debuggable with lldb, etc. out of the box.

I'll also note that as far as performance testing differences between
Nightly vs other channels go, we have all sorts of extra checking macros
that turn themselves on for Nightly only and turn themselves back off
for Beta and onwards, so when profiling anything on Nightly you always
know that Beta and Release in general will be a bit faster, and when
discovering issues sometimes you will see that the root cause will go
away once a channel switch happens. So even by stripping symbols from
Nightly it still wouldn't be very close to Beta and Release for the
purpose of performance testing, and that's a fact of life we'll have to
live with. :-)

Cheers,
Ehsan

Ehsan Akhgari

Jun 21, 2017, 10:44:54 AM
to Chris Cooper, dev-pl...@lists.mozilla.org
It seems that we have an answer now in the bug!
https://bugzilla.mozilla.org/show_bug.cgi?id=1338651#c129

FWIW these questions aren't really all that easy to answer on telemetry,
especially when you have ways of reproducing the issue locally and you
don't necessarily know which probe if any is expected to show a change
on telemetry when this hits real users. In practice, deciding to ship
the unstripped build to Nightly and seeing if a change would be observed
in the wild wouldn't have been all that different from deciding to take
the regression. :-)

Cheers,
Ehsan

Boris Zbarsky

Jun 21, 2017, 11:07:04 AM
On 6/21/17 10:44 AM, Ehsan Akhgari wrote:
> It seems like that we have an answer now in the bug!
> https://bugzilla.mozilla.org/show_bug.cgi?id=1338651#c129

Just for clarity, so people don't have to read the whole bug, changing
the _path_ the build is at when it's compiled/linked results in the huge
observed performance difference. At least if I understand the comments
in the bug correctly.

-Boris

Chris Peterson

Jun 21, 2017, 12:56:10 PM
i.e. Mike Shal's patch here fixed multiple 30% Talos regressions!

-: WORKSPACE ${WORKSPACE:=/home/worker/workspace}
+: WORKSPACE ${WORKSPACE:=/builds/slave/try-m64-0000000000000000000000}
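
For readers less familiar with the shell idiom in that patch: `${WORKSPACE:=default}` keeps an existing non-empty WORKSPACE and otherwise assigns the default, and the leading `:` is a no-op command that just forces the expansion. A rough Python equivalent, with a hypothetical helper name, might look like this:

```python
def resolve_workspace(env, default):
    """Mimic the shell's ${WORKSPACE:=default}: keep the current value unless
    it is unset or empty, in which case assign the default and return it."""
    if not env.get("WORKSPACE"):
        env["WORKSPACE"] = default
    return env["WORKSPACE"]


# The default path is the one from the patch; the env dict stands in for the
# process environment (os.environ could be passed instead).
env = {}
print(resolve_workspace(env, "/builds/slave/try-m64-0000000000000000000000"))
```

So the patch only changes the path used when nothing else has set WORKSPACE already.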

Steve Fink

Jun 21, 2017, 1:20:56 PM
to dev-pl...@lists.mozilla.org
Clearly we should try adding more zeroes!

Slightly more seriously, 20-30% is a rather big deal. If this *is* a
result of changing section offsets or whatever it was that glandium was
saying, it seems worth a look to see if we can squeeze any more of this
magic speedup sauce out of it by reordering things (or... something?).

At the very least, it would be good to detect when we're going back to
doing the bad thing, whatever that was, so that we don't have to infer
it from a Talos regression (that might not be quite as clear-cut next time.)

Mike Hommey

Jun 23, 2017, 7:10:19 AM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
You can actually download the full symbols (and that won't change). Use
https://hg.mozilla.org/users/jwatt_jwatt.org/fetch-symbols/raw-file/27a61dd0bcab/fetch-symbols.py
to get them.

Mike

Chris Cooper

Jun 23, 2017, 10:13:51 AM
On Wednesday, June 21, 2017 at 12:56:10 PM UTC-4, Chris Peterson wrote:
> i.e. Mike Shal's patch here fixed multiple 30% Talos regressions!
>
> -: WORKSPACE ${WORKSPACE:=/home/worker/workspace}
> +: WORKSPACE ${WORKSPACE:=/builds/slave/try-m64-0000000000000000000000}

We're still working to get that change into the task and/or TaskCluster worker, but yes, it's now an implementation detail. :)

To close the loop, that also means we won't need to strip the Mac nightlies by default. If we choose to start stripping nightlies to bring them in line with beta/release, that can be a separate discussion.

cheers,
--
coop