
Faster gecko builds with IceCC on Mac and Linux


Michael Layzell

Jul 4, 2016, 3:09:35 PM
to dev-platform
If you saw the platform lightning talk by Jeff and Ehsan in London, you
will know that in the Toronto office, we have set up a distributed compiler
called `icecc`, which allows us to perform a clobber build of
mozilla-central in around 3:45. After some work, we have managed to get it
so that macOS computers can also dispatch cross-compiled jobs to the
network, have streamlined the macOS install process, and have refined the
documentation some more.

If you are in the Toronto office, and running a macOS or Linux machine,
getting started using icecream is as easy as following the instructions on
the wiki:
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Using_Icecream

If you are in another office, then I suggest that your office start an
icecream cluster! Simply choose one Linux desktop in the office, run the
scheduler on it, and put its IP on the wiki; then everyone can connect to
the network and get fast builds!
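A minimal sketch of what that setup looks like, assuming a
Debian/Ubuntu-style icecream package (adjust package names and paths for
your distro):

  # on the machine chosen as the scheduler
  sudo apt install icecc
  icecc-scheduler -d

  # on every machine joining the cluster (including the scheduler box)
  sudo apt install icecc
  iceccd -d -s <scheduler-ip>

  # in your build shell: route compiler invocations through icecc
  export PATH=/usr/lib/icecc/bin:$PATH
  ./mach build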

If you have questions, BenWa, Jeff, and I are probably the ones to talk
to.

Gijs Kruitbosch

Jul 4, 2016, 4:39:36 PM
What about people not lucky enough to (regularly) work in an office,
including but not limited to our large number of volunteers? Do we
intend to set up something public for people to use?

~ Gijs

Ralph Giles

Jul 4, 2016, 4:47:26 PM
to Gijs Kruitbosch, dev-platform
On Mon, Jul 4, 2016 at 1:39 PM, Gijs Kruitbosch
<gijskru...@gmail.com> wrote:

> What about people not lucky enough to (regularly) work in an office,
> including but not limited to our large number of volunteers? Do we intend to
> set up something public for people to use?

By all accounts, the available distributed compilers aren't very good
at hiding latency. The build server needs to be on the local LAN to
help much.

More generally, we have artifact builds for developers who don't need
to change C++ code, and there are some experiments happening to see if
the build can pull smaller pieces from the S3 build cache for those
who do.
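For reference, artifact builds are a one-line mozconfig change; a minimal
sketch:

  # mozconfig: download prebuilt binaries from automation instead of compiling
  ac_add_options --enable-artifact-builds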

-r

David Burns

Jul 4, 2016, 4:51:07 PM
to Gijs Kruitbosch, dev-platform
Yes!

As part of the build project work that I regularly email this list
about[1], we have it on our roadmap to make the same distributed cache that
we use in automation available to engineers who are working on C++ code. We
have completed our rewrite and will be putting the initial work through try
over the next fortnight to make sure we haven't regressed anything. After
that we will be working towards making it available to engineers before the
end of Q3 (at least on one platform).

David


[1]
https://groups.google.com/forum/#!topicsearchin/mozilla.dev.platform/Build$20System$20Project$20

Benoit Girard

Jul 4, 2016, 5:06:58 PM
to Gijs Kruitbosch, dev-platform
This barely works in an office with a 10 MB/s wireless uplink. Ideally you
want machines to be accessible on a gigabit LAN. It's more about bandwidth
throughput than latency AFAIK, i.e. can you *upload* dozens of 2-4 MB
compressed preprocessed files faster than you can compile them locally? I'd
imagine unless you can get reliable 50 MB/s upload throughput you probably
won't benefit from connecting to a remote cluster.

However, the good news is you can see a lot of benefits from having a
network of just one machine! In my case my Linux desktop can compile a Mac
build faster than my top-of-the-line 2013 MacBook Pro, and with a network
of 2 machines it's drastically faster. A cluster of 12 machines is nice,
but you're getting diminishing returns on that until the build system gets
better.

I'd imagine distributed object caching will have a similar bandwidth
problem; however, users tend to have better download speeds than upload
speeds.

So to emphasize, if you compile a lot and only have one or two machines on
your 100 Mbps or 1 Gbps LAN, you'll still see big benefits.


Gijs Kruitbosch

Jul 4, 2016, 5:12:40 PM
On 04/07/2016 22:06, Benoit Girard wrote:
> So to emphasize, if you compile a lot and only have one or two machines on your 100 Mbps or 1 Gbps LAN, you'll still see big benefits.

I don't understand how this benefits anyone with just one machine
(that's compatible...) - there are no other machines to delegate compile
tasks to (or to fetch prebuilt blobs from). Can you clarify? Do you just
mean "one extra machine"? Am I misunderstanding how this works?

~ Gijs

Michael Layzell

Jul 4, 2016, 5:34:26 PM
to dev-platform
I'm pretty sure he means one extra machine. For example, if you have a
laptop and a desktop, just adding the desktop into the network at home will
still dramatically improve build times (I think).


Benoit Girard

Jul 4, 2016, 5:36:24 PM
to Gijs Kruitbosch, dev-platform
In my case I'm noticing an improvement with my Mac distributing jobs to a
single Ubuntu machine while not compiling anything itself. (Right now we
don't support distributing Mac jobs to other Macs, primarily because we
just want to maintain one homogeneous cluster.)


Xidorn Quan

Jul 4, 2016, 7:26:40 PM
to dev-pl...@lists.mozilla.org
I hope it can support MSVC one day as well, and support distributing any
job to macOS machines as well.

In my case, I use Windows as my main development environment, and I have
a personal MacBook Pro that is powerful enough. (I actually also have a
retired MBP which should still work.) And if it were possible to
distribute Windows builds to Linux machines, I would probably consider
purchasing another machine for Linux.

I would expect MSVC to be not too hard to run under Wine. When I was at
university, I ran the VC6 compiler on Linux to test my homework without
much effort. I guess the situation shouldn't be much worse with VS2015.
Creating the environment tarball may need some work, though.

- Xidorn


Michael Layzell

Jul 5, 2016, 10:07:42 AM
to dev-platform
I'm certain it's possible to get a Windows build working, the problem is
that:

a) We would need to modify the client to understand cl-style flags (I don't
think it does right now)
b) We would need to create the environment tarball
c) We would need to make sure everything runs on Windows

None of those are insurmountable problems, but this has been a small side
project which hasn't taken too much of our time. The work to get MSVC
working is much more substantial than the work to get macOS and Linux
working.

Getting it such that Linux distributes to Darwin machines, and Darwin
distributes to Darwin machines, is much easier. It wasn't done by us
because distributing jobs to people's laptops seems kinda silly, especially
because they may be on a Wi-Fi connection, and as far as I know, basically
every Mac in this office is a MacBook.

The Darwin machines simply need an `icecc` user to run the build jobs in,
and then Darwin-compatible toolchains need to be distributed to all
building machines.
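A rough sketch of the toolchain-distribution half, assuming the stock
icecream tools (the exact invocation varies by icecream version, and the
tarball name is a placeholder for whatever icecc-create-env prints):

  # after creating the unprivileged 'icecc' user by your preferred means:
  # package the local toolchain so remote builders can run it
  icecc-create-env --clang /usr/bin/clang
  # tell the icecc client which toolchain tarball to ship with jobs
  export ICECC_VERSION=$PWD/<hash>.tar.gz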


Gregory Szorc

Jul 5, 2016, 2:06:44 PM
to Michael Layzell, dev-platform
On Tue, Jul 5, 2016 at 7:07 AM, Michael Layzell <mic...@thelayzells.com>
wrote:

> I'm certain it's possible to get a Windows build working, the problem is
> that:
>
> a) We would need to modify the client to understand cl-style flags (I don't
> think it does right now)
> b) We would need to create the environment tarball
>

There is a script in-tree to create a self-contained archive containing
MSVC and the Windows SDK. Instructions at
https://gecko.readthedocs.io/en/latest/build/buildsystem/toolchains.html#windows.
You only need MozillaBuild and the resulting archive to build Firefox on a
fresh Windows install.

Steve Fink

Jul 5, 2016, 2:08:51 PM
to dev-pl...@lists.mozilla.org
I work remotely, normally from my laptop, and I have a single (fairly
slow) desktop usable as a compile server. (Which I normally leave off,
but when I'm doing a lot of compiling I'll turn it on. It's old and
power-hungry.)

I used distcc for a long time, but more recently have switched to icecream.

With distcc: time to build standalone on the laptop > time to build on the
laptop using distcc with the compile server > time to build standalone
locally on the compile server. (So if I wanted the fastest builds, I'd
ditch the laptop and just do everything on the compile server.)

I haven't checked, but I would guess it's about the same story with icecc.

Both have given me numerous problems. distcc would fairly often get into
a state where it would spend far more time sending and receiving data
than it saved on compiling. I suspect it was some sort of
bufferbloat-type problem. I poked at it a little, setting queue sizes
and things, but never satisfactorily resolved it. I would just leave the
graphical distcc monitor open, and notice when things started to go south.

With icecream, it's much more common to get complete failure -- every
compile command starts returning weird icecc error messages, and the
build slows way down because everything has to fail the icecc attempt
before it falls back to building locally. I've tried digging into it on
multiple occasions, to no avail, and with some amount of restarting it
magically resolves itself.

At least mostly -- I still get an occasional failure message here and
there, but it retries the build locally so it doesn't mess anything up.

I've also attempted to use a machine in the MTV office as an additional
lower priority compile server, with fairly disastrous results. This was
with distcc and a much older version of the build system, but it ended
up slowing down the build substantially.

I've long thought it would be nice to have some magical integration
between some combination of a distributed compiler, mercurial, and
ccache. You'd kick off a build, and it would predict object files that
you'd be needing in the future and download them into your local cache.
Then when the build got to that part, it would already have that build
in its cache and use it. If the network transfer were too slow, the
build would just see a cache miss and rebuild it instead. (The optional
mercurial portion would be to accelerate knowing which files have and
have not changed, without needing to checksum them.)

All of that is just for gaining some use of remote infrastructure over a
high latency/low throughput network.

On a related note, I wonder how much of a gain it would be to compile to
separate debug info files, and then transfer them using a binary diff (a
la rsync against some older local version) and/or (crazytalk here)
transfer them in a post-build step that you don't necessarily have to
wait for before running the binary. Think of it as a remote symbol
server, locally cached and eagerly populated but in the background.

Gregory Szorc

Jul 5, 2016, 3:12:36 PM
to Steve Fink, Lawrence Mandel, dev-platform
On Tue, Jul 5, 2016 at 11:08 AM, Steve Fink <sf...@mozilla.com> wrote:

> I work remotely, normally from my laptop, and I have a single (fairly
> slow) desktop usable as a compile server.
>

Gecko developers should have access to 8+ modern cores to compile Gecko.
Full stop. The cores can be local (from a home office), on a machine in a
data center you SSH or remote desktop into, or via a compiler farm (like
IceCC running in an office).

If you work from home full time, you should probably have a modern and
beefy desktop at home. I recommend 2x Xeon E5-2637v4 or E5-2643v4. Go with
the E5 v4's, as the v3's are already obsolete. If you go with the higher
core count Xeons, watch out for clock speed: parts of the build like
linking libxul are still bound by the speed of a single core and the Xeons
with higher core counts tend to drop off in CPU frequency pretty fast. That
means slower libxul links and slower builds.

Yes, dual socket Xeons will be expensive and more than you would pay for a
personal machine. But the cost is insignificant compared to your cost as an
employee paid to work on Gecko. So don't let the cost of something that
would allow you to do your job better discourage you from asking for it!
If you hit resistance buying a dual socket Xeon machine, ping Lawrence
Mandel, as he possesses jars of developer productivity lubrication that
have the magic power of unblocking purchase requests.

Chris Pearce

Jul 5, 2016, 5:27:20 PM
It would be cool if, once distributed compilation is reliable, `./mach mercurial-setup` could 1. prompt you to enable using the local network's infrastructure for compilation, and 2. prompt you to enable sharing your CPUs with the local network for compilation.

Distributing a Windows-friendly version inside the MozillaBuild package would be nice too.

Masatoshi Kimura

Jul 5, 2016, 5:34:35 PM
to dev-pl...@lists.mozilla.org
Oh, my laptop has only 4 cores, and I won't buy a machine or a compiler
farm account only to develop Gecko, because my machine works perfectly
for all my other purposes.

This is not the first time you have blamed my poor hardware. Mozilla (you
are a Mozilla employee, aren't you?) does not want my contribution? Thank
you very much!

Ralph Giles

Jul 5, 2016, 5:37:42 PM
to Gregory Szorc, Steve Fink, dev-platform, Lawrence Mandel
On Tue, Jul 5, 2016 at 12:12 PM, Gregory Szorc <g...@mozilla.com> wrote:

> I recommend 2x Xeon E5-2637v4 or E5-2643v4.

For comparison's sake, what kind of routine and clobber build times do
you see on a system like this? How much does the extra cache on Xeon
help vs something like a 4 GHz i7?

My desktop machine is five years old, but it's still faster than my
MacBook Pro, so I've never bothered upgrading beyond newer SSDs. If
there's a substantial improvement available in build times it would be
easier to justify new hardware.

A no-op build on my desktop is 22s currently. Touching a .cpp file (so
re-linking xul) is 46s. A clobber build is something like 17 minutes.

-r

Gregory Szorc

Jul 5, 2016, 5:45:38 PM
to Masatoshi Kimura, dev-platform
On Tue, Jul 5, 2016 at 2:33 PM, Masatoshi Kimura <VYV0...@nifty.ne.jp>
wrote:

> Oh, my laptop has only 4 cores, and I won't buy a machine or a compiler
> farm account only to develop Gecko, because my machine works perfectly
> for all my other purposes.
>
> This is not the first time you have blamed my poor hardware. Mozilla (you
> are a Mozilla employee, aren't you?) does not want my contribution? Thank
> you very much!
>

My last comment was aimed mostly at Mozilla employees. We still support
building Firefox/Gecko on older machines. Of course, it takes longer unless
you have fast internet to access caches or a modern machine. That's the sad
reality of large software projects. Your contributions are welcome no
matter what machine you use. But having a faster machine should allow you
to contribute more/faster, which is why Mozilla (the company) wants its
employees to have fast machines.

FWIW, Mozilla has been known to send community contributors hardware so
they can have a better development experience. Send an email to
mh...@mozilla.com to inquire.

Gregory Szorc

Jul 5, 2016, 5:49:01 PM
to Chris Pearce, dev-platform
On Tue, Jul 5, 2016 at 2:27 PM, Chris Pearce <cpe...@mozilla.com> wrote:

> It would be cool if, once distributed compilation is reliable, `./mach
> mercurial-setup` could 1. prompt you to enable using the local network's
> infrastructure for compilation, and 2. prompt you to enable sharing your
> CPUs with the local network for compilation.
>
>
We've already discussed this in build system meetings. There are a number
of optimizations around detection of your build environment that can be
made. Unfortunately I don't think we have any bugs on file yet.


> Distributing a Windows-friendly version inside the MozillaBuild package
> would be nice too.

Gregory Szorc

Jul 5, 2016, 6:36:29 PM
to Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel, Gregory Szorc
On Tue, Jul 5, 2016 at 2:37 PM, Ralph Giles <gi...@mozilla.com> wrote:

> On Tue, Jul 5, 2016 at 12:12 PM, Gregory Szorc <g...@mozilla.com> wrote:
>
> > I recommend 2x Xeon E5-2637v4 or E5-2643v4.
>
> For comparison's sake, what kind of routine and clobber build times do
> you see on a system like this? How much does the extra cache on Xeon
> help vs something like a 4 GHz i7?
>
> My desktop machine is five years old, but it's still faster than my
> MacBook Pro, so I've never bothered upgrading beyond newer SSDs. If
> there's a substantial improvement available in build times it would be
> easier to justify new hardware.
>
> A nop build on my desktop is 22s currently. Touching a cpp file (so
> re-linking xul) is 46s. A clobber build is something like 17 minutes.
>

Let's put it this way: I've built on AWS c4.8xlarge instances (Xeon E5-2666
v3 with 36 vCPUs) and achieved clobber build times comparable to the best
numbers the Toronto office has reported with icecc (between 3.5 and 4
minutes). That's 36 vCPUs @ 2.9/3.2/3.5 GHz (base / all-core turbo /
single-core turbo).

I don't have access to a 2xE5-2643v4 machine, but I do have access to a 2 x
E5-2637v4 with 32 GB RAM and an SSD running CentOS 7 (Clang 3.4.2 + gold
linker):

* clobber (minus configure): 368s (6:08)
* `mach build` (no-op): 24s
* `mach build binaries` (no-op): 3.4s
* `mach build binaries` (touch network/dns/DNS.cpp): 14.1s

I'm pretty sure the clobber time would be a little faster with a newer
Clang (also, GCC is generally faster than Clang).

That's 8 physical cores + hyperthreading (16 reported CPUs) @ 3.5 GHz. A
2 x 2643v4 would be 12 physical cores @ 3.4 GHz. So 28 GHz vs 40.8 GHz of
aggregate frequency. That should translate to at least 90s of clobber build
time savings, so 4-4.5 minutes. Not too shabby. And I'm sure they make good
space heaters too.

FWIW, my i7-6700K (4+4 cores @ 4.0 GHz) is currently taking ~840s (~14:00)
for clobber builds (with Ubuntu 16.04 and a different toolchain however).
Those extra cores (even at lower clock speeds) really do matter.

Ralph Giles

Jul 5, 2016, 6:58:11 PM
to Gregory Szorc, Steve Fink, dev-platform, Lawrence Mandel
On Tue, Jul 5, 2016 at 3:36 PM, Gregory Szorc <g...@mozilla.com> wrote:

> * `mach build binaries` (touch network/dns/DNS.cpp): 14.1s

24s here. So faster link times and significantly faster clobber times. I'm sold!

Any motherboard recommendations? If we want developers to use machines
like this, maintaining a current config in ServiceNow would probably
help.

-r

Xidorn Quan

Jul 5, 2016, 7:06:20 PM
to dev-pl...@lists.mozilla.org
On Wed, Jul 6, 2016, at 05:12 AM, Gregory Szorc wrote:
> On Tue, Jul 5, 2016 at 11:08 AM, Steve Fink <sf...@mozilla.com> wrote:
>
> > I work remotely, normally from my laptop, and I have a single (fairly
> > slow) desktop usable as a compile server.
>
> Gecko developers should have access to 8+ modern cores to compile Gecko.
> Full stop. The cores can be local (from a home office), on a machine in a
> data center you SSH or remote desktop into, or via a compiler farm (like
> IceCC running in an office).

I use my 4-core laptop for building as well... mainly because I found it
inconvenient to maintain a development environment on multiple machines.
I've almost stopped writing patches on my personal MBP because of that.

That said, if I can distribute the build to other machines, I'll happily
buy a new desktop machine and use it as a compiler farm to boost the
build.

- Xidorn

Lawrence Mandel

Jul 5, 2016, 7:33:03 PM
to Ralph Giles, Steve Fink, dev-platform, Gregory Szorc
Completely agree. You should not have to figure this out for yourself. We
should provide good recommendations in ServiceNow. I'm looking into
updating the ServiceNow listings with gps.

Lawrence

Gregory Szorc

Jul 5, 2016, 7:42:21 PM
to Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel, Gregory Szorc
On Tue, Jul 5, 2016 at 3:58 PM, Ralph Giles <gi...@mozilla.com> wrote:

> On Tue, Jul 5, 2016 at 3:36 PM, Gregory Szorc <g...@mozilla.com> wrote:
>
> > * `mach build binaries` (touch network/dns/DNS.cpp): 14.1s
>
> 24s here. So faster link times and significantly faster clobber times. I'm
> sold!
>
> Any motherboard recommendations? If we want developers to use machines
> like this, maintaining a current config in ServiceNow would probably
> help.


Until the ServiceNow catalog is updated...

The Lenovo ThinkStation P710 is a good starting point (
http://shop.lenovo.com/us/en/workstations/thinkstation/p-series/p710/).
From the default config:

* Choose a 2 x E5-2637v4 or a 2 x E5-2643v4
* Select at least 4 x 8 GB ECC memory sticks (for at least 32 GB)
* Under "Non-RAID Hard Drives" select whatever works for you. I recommend a
512 GB SSD as the primary HD. Throw in more drives if you need them.

Should be ~$4400 for the 2xE5-2637v4 and ~$5600 for the 2xE5-2643v4
(plus/minus a few hundred depending on configuration specifics).

FWIW, I priced out similar specs for an HP Z640 and the markup on the CPUs
is absurd (it costs >$2000 more when fully configured). Lenovo's
markup/pricing seems reasonable by comparison. Although I'm sure someone
somewhere will sell the same thing for cheaper.

If you don't need the dual socket Xeons, go for an i7-6700K at the least. I
got the
http://store.hp.com/us/en/pdp/cto-dynamic-kits--1/hp-envy-750se-windows-7-desktop-p5q80av-aba-1
a few months ago and like it. At ~$1500 for an i7-6700K, 32 GB RAM, and a
512 GB SSD, the price was very reasonable compared to similar
configurations at Dell, HP, others.

The just-released Broadwell-E processors with 6-10 cores are also nice
(i7-6850K, i7-6900K). Although I haven't yet priced any of these out so I
have no links to share. They should be <$2600 fully configured. That's a
good price point between the i7-6700K and a dual socket Xeon. Although if
you do lots of C++ compiling, you should get the dual socket Xeons (unless
you have access to more cores in an office or a remote machine).

If you buy a machine today, watch out for Windows 7. The free Windows 10
upgrade from Microsoft is ending soon. Try to get a Windows 10 Pro license
out of the box. And, yes, you should use Windows 10 as your primary OS
because that's what our users mostly use. I run Hyper-V under Windows 10
and have at least 1 Linux VM running at all times. With 32 GB in the
system, there's plenty of RAM to go around and Linux performance under the
VM is excellent. It feels like I'm dual booting without the rebooting part.

Chris H-C

Jul 6, 2016, 12:01:12 PM
to Gregory Szorc, Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel
Are there any scripts for reporting and analysing the build times reported
by mach? I think this would be really useful data to have, especially to
track build system improvements (and regressions) as well as
poorly-supported configurations.
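A minimal sketch of the sort of collection that could feed such analysis,
assuming GNU time on Linux and a made-up log format:

  # append one line per clobber build: short revision hash + wall-clock seconds
  rev=$(hg log -r . -T '{node|short}')
  ./mach clobber
  /usr/bin/time -a -o build-times.log -f "$rev %e" ./mach build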

Chris

Gregory Szorc

Jul 6, 2016, 2:41:57 PM
to Chris H-C, Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel, Gregory Szorc
We're actively looking into a Telemetry-like system for mach and the build
system.

Chris H-C

Jul 6, 2016, 3:01:14 PM
to Gregory Szorc, Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel
> We're actively looking into a Telemetry-like system for mach and the
build system.

I heartily endorse this event or product and would like to subscribe to
your newsletter.

Chris

Trevor Saunders

Jul 6, 2016, 3:03:39 PM
to Gregory Szorc, Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel
The other week I built a machine with a 6800K, 32 GB of RAM, and a 2 TB
HDD for $1525 CAD, so probably just under $1000 USD. With just that
machine I can do a 10 minute Linux debug build. For less than the price
of the E5 machine quoted above I can buy 4 of those machines, which I
expect would produce build times under 5:00.

I believe with 32 GB of RAM there's enough FS cache that disk performance
doesn't actually matter, but it might be worth investigating moving an
SSD to that machine at some point.

So I would tend to conclude Xeons are not a great deal unless you really
need to build for Windows a lot before someone gets icecc working there.

Trev

Henri Sivonen

Mar 23, 2017, 9:44:21 AM
to dev-platform
On Wed, Jul 6, 2016 at 2:42 AM, Gregory Szorc <g...@mozilla.com> wrote:
> The Lenovo ThinkStation P710 is a good starting point (
> http://shop.lenovo.com/us/en/workstations/thinkstation/p-series/p710/).

To help others who follow the above advice save some time:

Xeons don't have Intel integrated GPUs, so one has to figure out how to
get this up and running with a discrete GPU. In the case of an Nvidia
Quadro M2000, the latest Ubuntu and Fedora install images don't work.

This works:

1. Disable or enable the TPM. (By default, it's in a mode where the
   kernel can see it but it doesn't work. It should either be hidden or
   be allowed to work.)
2. Disable secure boot. (Nvidia's proprietary drivers don't work with
   secure boot enabled.)
3. Use the Ubuntu 16.04.1 install image (i.e. an intentionally old
   image--you can upgrade later).
4. After installing, edit /etc/default/grub and set
   GRUB_CMDLINE_LINUX_DEFAULT="" (i.e. make the string empty; without
   this, the nvidia proprietary driver conflicts with LUKS pass phrase
   input). Then run:
   update-initramfs -u
   update-grub
   apt install nvidia-375
5. Then upgrade the rest. Even rolling forward the HWE stack works
   *after* the above steps.

(For a Free Software alternative, install Ubuntu 16.04.1, stick to 2D
graphics from nouveau with llvmpipe for 3D and be sure never to roll
the HWE stack forward.)

--
Henri Sivonen
hsiv...@hsivonen.fi
https://hsivonen.fi/

Jeff Gilbert

Mar 23, 2017, 7:51:32 PM
to Trevor Saunders, Ralph Giles, Steve Fink, dev-platform, Lawrence Mandel, Gregory Szorc
They're basically out of stock now, but if you can find them, old
refurbished 2x Intel Xeon E5-2670 (2.6 GHz, eight cores each) machines were
bottoming out under $1000/ea. Such a machine happily does GCC builds in 8
minutes, and I have Clang builds down to 5.5. As the v2s leave warranty,
similar machines may hit the market again.

I'm interested to find out how the new Ryzen chips do. It should fit
their niche well. I have one at home now, so I'll test when I get a
chance.


Ehsan Akhgari

Mar 23, 2017, 8:13:26 PM
to Jeff Gilbert, Ralph Giles, Steve Fink, Gregory Szorc, Lawrence Mandel, dev-platform, Trevor Saunders
On Thu, Mar 23, 2017 at 7:51 PM, Jeff Gilbert <jgil...@mozilla.com> wrote:

> I'm interested to find out how the new Ryzen chips do. It should fit
> their niche well. I have one at home now, so I'll test when I get a
> chance.
>

Ryzen currently on Linux implies no rr, so beware of that.
--
Ehsan

Robert O'Callahan

Mar 23, 2017, 11:43:04 PM
to Ehsan Akhgari, Ralph Giles, Steve Fink, Gregory Szorc, Jeff Gilbert, Lawrence Mandel, dev-platform, Trevor Saunders
On Fri, Mar 24, 2017 at 1:12 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 7:51 PM, Jeff Gilbert <jgil...@mozilla.com> wrote:
>
>> I'm interested to find out how the new Ryzen chips do. It should fit
>> their niche well. I have one at home now, so I'll test when I get a
>> chance.
>>
>
> Ryzen currently on Linux implies no rr, so beware of that.

A contributor almost got Piledriver working with rr, but that was
based on "LWP" features that apparently are not in Ryzen. If anyone
finds any detailed documentation of the hardware performance counters
in Ryzen, let us know! All I can find is PR material.

Rob
--
lbir ye,ea yer.tnietoehr rdn rdsme,anea lurpr edna e hnysnenh hhe uresyf toD
selthor stor edna siewaoeodm or v sstvr esBa kbvted,t rdsme,aoreseoouoto
o l euetiuruewFa kbn e hnystoivateweh uresyf tulsa rehr rdm or rnea lurpr
.a war hsrer holsa rodvted,t nenh hneireseoouot.tniesiewaoeivatewt sstvr esn

Jeff Muizelaar

Mar 24, 2017, 12:11:09 AM
to Jeff Gilbert, Ralph Giles, Steve Fink, Gregory Szorc, Lawrence Mandel, dev-platform, Trevor Saunders
I have a Ryzen 7 1800X and it does Windows clobber builds in ~20 min
(3 min of that is configure, which seems higher than what I've seen on
other machines). This compares pretty favorably to the Lenovo P710
machines that people are getting, which do 18 min clobber builds and
cost more than twice the price.

-Jeff


Jeff Muizelaar

Mar 24, 2017, 12:17:19 AM
to robert@ocallahan.org O'Callahan, Ralph Giles, Steve Fink, Ehsan Akhgari, Gregory Szorc, Jeff Gilbert, Lawrence Mandel, dev-platform, Trevor Saunders
On Thu, Mar 23, 2017 at 11:42 PM, Robert O'Callahan
<rob...@ocallahan.org> wrote:
> On Fri, Mar 24, 2017 at 1:12 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> On Thu, Mar 23, 2017 at 7:51 PM, Jeff Gilbert <jgil...@mozilla.com> wrote:
>>
>>> I'm interested to find out how the new Ryzen chips do. It should fit
>>> their niche well. I have one at home now, so I'll test when I get a
>>> chance.
>>>
>>
>> Ryzen currently on Linux implies no rr, so beware of that.
>
> A contributor almost got Piledriver working with rr, but that was
> based on "LWP" features that apparently are not in Ryzen. If anyone
> finds any detailed documentation of the hardware performance counters
> in Ryzen, let us know! All I can find is PR material.

I have NDA access to at least some of the Ryzen documentation and I
haven't been able to find anything more on the performance counters
other than:

AMD64 Architecture Programmer’s Manual
Volume 2: System Programming
3.27 December 2016

This document is already publicly available.

I also have one of the chips so I can test code. If there are specific
questions I can also forward them through our AMD contacts.

-Jeff

Gregory Szorc

Mar 24, 2017, 12:40:04 AM
to Jeff Muizelaar, Ralph Giles, Steve Fink, Gregory Szorc, Jeff Gilbert, Lawrence Mandel, dev-platform, Trevor Saunders
On Thu, Mar 23, 2017 at 9:10 PM, Jeff Muizelaar <jmuiz...@mozilla.com>
wrote:

> I have a Ryzen 7 1800 X and it does a Windows clobber builds in ~20min
> (3 min of that is configure which seems higher than what I've seen on
> other machines).


Make sure your power settings are aggressive. Configure and its single-core
usage is where Xeons and their conservative clocking really slowed down
compared to consumer CPUs (bug 1323106). Also, configure time can vary
significantly depending on page cache hits. So please run multiple times.

On Windows, I measure `mach configure` separately from `mach build` because
configure on Windows is just so slow and skews results. For Gecko
developers, I feel we want to optimize for compile time, so I tend to give
less weight to configure performance.


> This compares pretty favorably to the Lenovo p710
> machines that people are getting which do 18min clobber builds and
> cost more than twice the price.
>

I assume this is Windows and VS2015?

FWIW, I've been very interested in getting my hands on a Ryzen. I wouldn't
at all be surprised if the Ryzens have better value than dual socket
Xeons. The big question is whether they are unquestionably better. For some
people (like remote employees who don't have access to an icecream
cluster), you can probably justify the extreme cost of a dual socket Xeon
over a Ryzen, even if the difference is only something like 20%. Of course,
the counterargument is you can probably buy 2 Ryzen machines in place of a
dual socket Xeon. The introduction of Ryzen has changed the landscape
and the calculus that determines what hardware engineers should have.
Before I disappeared for ~1 month, I was working with IT and management to
define an optimal hardware loadout for Firefox engineers. I need to resume
that work and fully evaluate Ryzen...




Gabriele Svelto

Mar 24, 2017, 5:30:33 AM
to Gregory Szorc, Jeff Muizelaar, Ralph Giles, Steve Fink, Jeff Gilbert, Lawrence Mandel, dev-platform, Trevor Saunders
On 24/03/2017 05:39, Gregory Szorc wrote:
> The introduction of Ryzen has literally changed the landscape
> and the calculus that determines what hardware engineers should have.
> Before I disappeared for ~1 month, I was working with IT and management to
> define an optimal hardware load out for Firefox engineers. I need to resume
> that work and fully evaluate Ryzen...

The fact that with the appropriate motherboard they also support ECC
memory (*) makes a lot of Xeon offerings a lot less appealing, especially
the workstation-oriented ones.

Gabriele

*) Which is useful to those of us who keep their machines on for weeks
w/o rebooting or just want to have a more reliable setup


Ted Mielczarek

Mar 24, 2017, 6:33:06 AM
to Jeff Muizelaar, dev-platform
On Fri, Mar 24, 2017, at 12:10 AM, Jeff Muizelaar wrote:
> I have a Ryzen 7 1800 X and it does a Windows clobber builds in ~20min
> (3 min of that is configure which seems higher than what I've seen on
> other machines). This compares pretty favorably to the Lenovo p710
> machines that people are getting which do 18min clobber builds and
> cost more than twice the price.

Just as a data point, I have one of those Lenovo P710 machines and I get
14-15 minute clobber builds on Windows.

-Ted

Jean-Yves Avenard

Jan 16, 2018, 10:51:51 AM
to Ted Mielczarek, Jeff Muizelaar, dev-platform
Sorry for resuming an old thread, but I would be interested in knowing how
long that same Lenovo P710 takes to compile *today*…. In the past 6 months,
compilation times have certainly increased massively.

Anyhow, yesterday I received the iMac Pro I ordered early December. It's a
10-core Xeon W (W-2150B) with 64 GB RAM.

Here are the timings I measured, in comparison with the Mac Pro 2013 I have
(which until today was the fastest machine I had ever used):

macOS 10.13.2:
Mac Pro late 2013 : 13m25s
iMac Pro : 7m20s

Windows 10 Fall Creators Update:
Mac Pro late 2013 : 24m32s (less than a year ago it was 16 minutes!)
iMac Pro : 14m07s (16m10s with Windows Defender running)

Interestingly, I can almost no longer get any benefit from using icecream:
with 36 cores it saves 11s, and with 52 cores it saves only 50s…

It's a very sweet machine indeed.

Jean-Yves

Jean-Yves Avenard

Jan 16, 2018, 2:20:12 PM
to Ralph Giles, Jeff Muizelaar, Ted Mielczarek, dev-platform


> On 16 Jan 2018, at 7:02 pm, Ralph Giles <gi...@mozilla.com> wrote:
>
> On my Lenovo P710 (2x2x6 core Xeon E5-2643 v4), Fedora 27 Linux
>
> debug -Og build with gcc: 12:34
> debug -Og build with clang: 12:55
> opt build with clang: 11:51

I didn't succeed in booting Linux, unfortunately, so I can't compare…
12 minutes sounds rather long; it's about what the Mac Pro is currently doing. I typically get compilation times similar to the Mac...

>
> Interestingly, I can almost no longer get any benefit from using icecream: with 36 cores it saves 11s, and with 52 cores it saves only 50s…
>
> Are you saturating all 52 cores during the builds? Most of the increase in build time is new Rust code, and icecream doesn't distribute Rust. So in addition to some long compile times for final crates limiting the minimum build time, icecream doesn't help much in the run-up either. This is why I'm excited about the distributed build feature we're adding to sccache.

icemon certainly shows all machines to be running (I ran it with -j36 and -j52)


>
> I'd still expect some improvement from the C++ compilation though.
>
> It’s a very sweet machine indeed
>
> Glad you finally got one! :)
>

I'll probably return it though; I prefer to wait for the next Mac Pro.


Ralph Giles

Jan 16, 2018, 2:35:55 PM
to Jean-Yves Avenard, Jeff Muizelaar, Ted Mielczarek, dev-platform
On Tue, Jan 16, 2018 at 11:19 AM, Jean-Yves Avenard <jyav...@mozilla.com>
wrote:

> 12 minutes sounds rather long; it's about what the Mac Pro is currently
> doing. I typically get compilation times similar to the Mac...
>

Yes, I'd like to see 7 minute build times again too! The E5-2643 has a
higher clock speed than the Xeon W in the iMac Pro (3.4 vs 3.0 GHz) but a
much lower peak frequency (3.7 vs 4.5 GHz) so maybe the iMac catches up
during the single-process bottlenecks. Or it could be memory bandwidth.

-r

Gregory Szorc

Jan 16, 2018, 3:10:02 PM
to Ralph Giles, Sophana Soap Aik, Jeff Muizelaar, Jean-Yves Avenard, Ted Mielczarek, dev-platform
Yes, most of the build time regressions in 2017 came from Rust. Leaning
more heavily on C++ features that require more processing, or that haven't
been optimized as much as features that have been around for years, is
likely also contributing.

Enabling sccache allows Rust compilations to be cached, which makes things
much faster on subsequent builds (since many Rust crates don't change that
often - but a few "large" crates like style do need to rebuild
semi-frequently).

We'll be transitioning workstations to the i9's because they are faster,
cheaper, and have more cores than the Xeons. But if you insist on having
ECC memory, you can still get the dual socket Xeons.

Last I heard Sophana was having trouble finding an OEM supplier for the
i9's (they are still relatively new). But if you want to put in a order for
the i9 before it is listed in the hardware catalog, contact Sophana (CCd)
and you can get the hook up.

While I'm here, we also have a contractor slated to add distributed
compilation to sccache [to replace icecream]. The contractor should start
in ~days. You can send questions, feature requests, etc through Ted for
now. We also had a meeting with IT and security last Friday about more
officially supporting distributed compilation in offices. We want people to
walk into any Mozilla office in the world and have distributed compilation
"just work." Hopefully we can deliver that in 2018.

Jean-Yves Avenard

Jan 16, 2018, 3:54:14 PM
to Ralph Giles, Jeff Muizelaar, Ted Mielczarek, dev-platform


> On 16 Jan 2018, at 8:19 pm, Jean-Yves Avenard <jyav...@mozilla.com> wrote:
>
>
>
>> On 16 Jan 2018, at 7:02 pm, Ralph Giles <gi...@mozilla.com <mailto:gi...@mozilla.com>> wrote:
>>
>> On my Lenovo P710 (2x2x6 core Xeon E5-2643 v4), Fedora 27 Linux
>>
>> debug -Og build with gcc: 12:34
>> debug -Og build with clang: 12:55
>> opt build with clang: 11:51
>
> I didn't succeed in booting Linux, unfortunately, so I can't compare…
> 12 minutes sounds rather long; it's about what the Mac Pro is currently doing. I typically get compilation times similar to the Mac…

So I didn't manage to get Linux to boot (I tried all the main
distributions).

But I ran a compilation inside VMware on the Mac, allocating "only" 16
cores (the maximum) and 32 GB of RAM; it took 13m51s.

No doubt it would go much lower once I manage to boot Linux.

Damn fast machine!

JY

Mike Hommey

Jan 16, 2018, 4:42:38 PM
to Ralph Giles, Jeff Muizelaar, Jean-Yves Avenard, Ted Mielczarek, dev-platform
On Tue, Jan 16, 2018 at 10:02:12AM -0800, Ralph Giles wrote:
> On Tue, Jan 16, 2018 at 7:51 AM, Jean-Yves Avenard <jyav...@mozilla.com>
> wrote:
>
> > But I would be interested in knowing how long that same Lenovo P710 takes
> > to compile *today*….
>
> On my Lenovo P710 (2x2x6 core Xeon E5-2643 v4), Fedora 27 Linux
>
> debug -Og build with gcc: 12:34
> debug -Og build with clang: 12:55
> opt build with clang: 11:51
>
> > Interestingly, I can almost no longer get any benefit from using icecream:
> > with 36 cores it saves 11s, and with 52 cores it saves only 50s…
>
> Are you saturating all 52 cores during the builds? Most of the increase in
> build time is new Rust code, and icecream doesn't distribute Rust. So in
> addition to some long compile times for final crates limiting the minimum
> build time, icecream doesn't help much in the run-up either. This is why
> I'm excited about the distributed build feature we're adding to sccache.

Distributed compilation of Rust won't help, unfortunately. It won't
solve the fact that the long pole of Rust compilation is a series of
long single-threaded processes that can't happen in parallel
because each of them depends on the output of the previous one.

Mike

Ted Mielczarek

Jan 16, 2018, 4:44:00 PM
to Jean-Yves Avenard, dev-platform
On Tue, Jan 16, 2018, at 10:51 AM, Jean-Yves Avenard wrote:
> Sorry for resuming an old thread.
>
> But I would be interested in knowing how long that same Lenovo P710
> takes to compile *today*…. In the past 6 months, compilation times have
> certainly increased massively.
>
> Anyhow, yesterday I received the iMac Pro I ordered early December.
> It's a 10-core Xeon W (W-2150B) with 64 GB RAM.
>
> Here are the timings I measured, in comparison with the Mac Pro 2013 I
> have (which until today was the fastest machine I had ever used):
>
> macOS 10.13.2:
> Mac Pro late 2013 : 13m25s
> iMac Pro : 7m20s
>
> Windows 10 Fall Creators Update:
> Mac Pro late 2013 : 24m32s (less than a year ago it was 16 minutes!)
> iMac Pro : 14m07s (16m10s with Windows Defender running)
>
> Interestingly, I can almost no longer get any benefit from using
> icecream: with 36 cores it saves 11s, and with 52 cores it saves only 50s…
>
> It's a very sweet machine indeed.

I just did a couple of clobber builds against the tip of central
(9be7249e74fd) on my P710 running Windows 10 Fall Creators Update
and they took about 22 minutes each. Definitely slower than it
used to be :-/
-Ted


smaug

Jan 16, 2018, 5:59:51 PM
to Mike Hommey, Ralph Giles, Jeff Muizelaar, Jean-Yves Avenard, Ted Mielczarek
Distributed compilation also won't help those remotees who may not have
machines to set up icecream or distributed sccache.
(I just got a new laptop because of Rust compilation being so slow.)
I'm hoping the Rust compiler gets some heavy optimizations itself.


-Olli

Gregory Szorc

Jan 16, 2018, 6:39:07 PM
to Ted Mielczarek, Jean-Yves Avenard, dev-platform
On Tue, Jan 16, 2018 at 1:42 PM, Ted Mielczarek <t...@mielczarek.org> wrote:

> On Tue, Jan 16, 2018, at 10:51 AM, Jean-Yves Avenard wrote:
> > Sorry for resuming an old thread.
> >
> > <snip>
> >
> > Interestingly, I can almost no longer get any benefit from using
> > icecream: with 36 cores it saves 11s, and with 52 cores it saves only 50s…
> >
> > It's a very sweet machine indeed.
>
> I just did a couple of clobber builds against the tip of central
> (9be7249e74fd) on my P710 running Windows 10 Fall Creators Update
> and they took about 22 minutes each. Definitely slower than it
> used to be :-/
>
>
On an EC2 c5.17xlarge (36+36 CPUs) running Ubuntu 17.10 and using Clang
5.0, 9be7249e74fd does a clobber (but already configured) `mach build` in
7:34. Rust is very obviously the long pole in this build, with C++
compilation (not linking) completing in ~2 minutes.

If I enable sccache for just Rust by setting mk_add_options "export
RUSTC_WRAPPER=sccache" in my mozconfig, a clobber build with a populated
cache for Rust completes in 3:18. And Rust is still the long pole -
although only by a few seconds. It's worth noting that CPU time for this
build remains in the same ballpark, but overall CPU utilization increases
from ~28% to ~64%. There's still work to do improving the efficiency of the
overall build system, but that's mostly in parts only touched by clobber
builds. If you do `mach build binaries` after touching compiled code, our
CPU utilization is terrific.
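For reference, a minimal mozconfig sketch of that setup (the second line
is an optional extra that additionally routes C/C++ through sccache; it is
not part of the measurement above):

  # route rustc invocations through sccache
  mk_add_options "export RUSTC_WRAPPER=sccache"
  # optionally cache C/C++ compilations with sccache as well
  ac_add_options --with-ccache=sccache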

From a build system perspective, C/C++ scales up to dozens of cores just
fine (it's been this way for a few years). Rust is becoming a longer and
longer long tail (assuming you have enough CPU cores that the vast amount
of C/C++ completes before Rust does).

Steve Fink

Jan 17, 2018, 1:27:34 PM
to dev-pl...@lists.mozilla.org
I'm in the same situation, which reminds me of something I wrote long
ago, shortly after joining Mozilla:
https://wiki.mozilla.org/Sfink/Thought_Experiment_-_One_Minute_Builds
(no need to read it, it's ancient history now. It's kind of a fun read
IMO, though you have to remember that it long predates mozilla-inbound,
autoland, linux64, and sccache, and was in the dawn of the Era of
Sheriffing so build breakages were more frequent and more damaging.) But
in there, I speculated about ways to get other machines' built object
files into a local ccache. So here's my latest handwaving:

Would it be possible that when I do an hg pull of mozilla-central or
mozilla-inbound, I can also choose to download the object files from the
most recent ancestor that had an automation build? (It could be a
separate command, or ./mach pull.) They would go into a local ccache (or
probably sccache?) directory. The files would need to be atomically
updated with respect to my own builds, so I could race my build against
the download. And preferably the download would go roughly in the
reverse order as my own build, so they would meet in the middle at some
point, after which only the modified files would need to be compiled. It
might require splitting debug info out of the object files for this to
be practical, where the debug info could be downloaded asynchronously in
the background after the main build is complete.
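Purely as a hypothetical sketch of the shape of the idea (the object-cache
endpoint below is made up; the hg revset and the background race are the
point):

  # nearest public ancestor, which automation will have built
  rev=$(hg log -r 'last(ancestors(.) and public())' -T '{node}')
  # hypothetical object-cache service; prefetch in the background...
  curl -s https://objcache.example.org/$rev.tar | tar -xf - -C ~/.objcache &
  # ...and race it against the local build, which sees cache hits as files land
  ./mach build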

Or, a different idea: have Rust "artifact builds", where I can download
prebuilt Rust bits when I'm only recompiling C++ code. (Tricky, I know,
when we have code generation that communicates between Rust and C++.)
This isn't fundamentally different from the previous idea, or
distributed compilation in general, if you start to take the exact
interdependencies into account.


Simon Sapin

Jan 17, 2018, 2:11:40 PM
to dev-pl...@lists.mozilla.org
On 17/01/18 19:27, Steve Fink wrote:
> Would it be possible that when I do an hg pull of mozilla-central or
> mozilla-inbound, I can also choose to download the object files from the
> most recent ancestor that had an automation build? (It could be a
> separate command, or ./mach pull.) They would go into a local ccache (or
> probably sccache?) directory.

I believe that sccache already has support for Amazon S3. I don’t know
if we already enable that for our CI infra. Once we do, I imagine we
could make that store world-readable and configure local builds to use it.
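A sketch of what pointing a local build at such a store might look like
(the bucket name is hypothetical; sccache reads its S3 configuration from
the environment):

  export SCCACHE_BUCKET=mozilla-sccache   # hypothetical bucket name
  export SCCACHE_REGION=us-east-1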

--
Simon Sapin

Ralph Giles

Jan 17, 2018, 2:14:20 PM
to Steve Fink, dev-platform
On Wed, Jan 17, 2018 at 10:27 AM, Steve Fink <sf...@mozilla.com> wrote:


> Would it be possible that when I do an hg pull of mozilla-central or
> mozilla-inbound, I can also choose to download the object files from the
> most recent ancestor that had an automation build?


You mention 'artifact builds' so I assume you know about `ac_add_options
--enable-artifact-builds`, which does this for the final libXUL target,
greatly speeding up the first build for people working on the parts of
Firefox outside Gecko.

In the build team we've been discussing for a while if there's a way to
make this more granular. The most concrete plan is to use sccache again.
This tool already supports multi-level (local and remote) caches, so it
could certainly pull the latest object files from a CI build; it already
does this when running in automation. There are still some 'reproducible
build' issues which block general use of this: source directory prefixes
not matching, __FILE__ and __DATE__, different build flags between
automation and the default developer builds, that sort of thing. These
prevent cache hits when compiling the same code. There aren't too many
left; help would be welcome working out the last few if you're interested.
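As a toy illustration of the source-prefix problem (paths and file names
made up):

  # identical file contents under two different checkouts
  gcc -c /home/alice/src/foo.c -o foo.o   # __FILE__ bakes in the alice path
  gcc -c /home/bob/src/foo.c -o foo.o     # ...and here the bob path
  # the preprocessed inputs differ, so a content-hashed cache cannot share hits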

We've also discussed having sccache race the local build against the remote
cache fetch as you suggest, but not the kind of global scheduling you talk
about. Something simple with the jobserver logic might work here, but I
think we want to complete the long-term project of getting a complete
dependency graph available before looking at that kind of optimization.

FWIW,
-r

Jean-Yves Avenard

Jan 17, 2018, 2:22:23 PM
to Ralph Giles, Steve Fink, dev-platform


> On 17 Jan 2018, at 8:14 pm, Ralph Giles <gi...@mozilla.com> wrote:
>
> Something simple with the jobserver logic might work here, but I think we
> want to complete the long-term project of getting a complete dependency
> graph available before looking at that kind of optimization.

Just get every person needing to work on Mac an iMac Pro, and those on Windows/Linux a P710 or better, and off we go.

Jeff Gilbert

Jan 17, 2018, 3:36:42 PM
to Jean-Yves Avenard, Ralph Giles, Steve Fink, dev-platform
It's way cheaper to get build clusters rolling than to get beefy
hardware for every desk.
Distributed compilation or other direct build optimizations also allow
continued use of laptops for most devs, which definitely has value.

Nicholas Alexander

Jan 25, 2018, 12:14:53 AM
to Steve Fink, dev-platform
<snip>


> Would it be possible that when I do an hg pull of mozilla-central or
> mozilla-inbound, I can also choose to download the object files from the
> most recent ancestor that had an automation build? (It could be a separate
> command, or ./mach pull.) They would go into a local ccache (or probably
> sccache?) directory. The files would need to be atomically updated with
> respect to my own builds, so I could race my build against the download.
> And preferably the download would go roughly in the reverse order as my own
> build, so they would meet in the middle at some point, after which only the
> modified files would need to be compiled. It might require splitting debug
> info out of the object files for this to be practical, where the debug info
> could be downloaded asynchronously in the background after the main build
> is complete.
>

Just FYI, in Austin (December 2017, for the archives) the build peers
discussed something like this. The idea would be to figure out how to
slurp (some part of) an object directory produced in automation, in order
to get cache hits locally. We really don't have a sense for how much of an
improvement this might be in practice, and it's a non-trivial effort to
investigate enough to find out. (I wanted to work on it but it doesn't fit
my current hats.)

My personal concern is that our current build system doesn't have a single
place that can encode policy about our build. That is, there's nothing to
control the caching layers and to schedule jobs intelligently (i.e., push
Rust and SpiderMonkey forward, and work harder to get them from a remote
cache). That could be a distributed job server, but it doesn't have to be:
it just needs to be able to control our build process. None of the current
build infrastructure (sccache, the recursive make build backend, the
in-progress Tup build backend) is a good home for those kind of policy
choices. So I'm concerned that we'd find that an object directory caching
strategy is a good idea... and then have a chasm when it comes to
implementing it and fine-tuning it. (The chasm from artifact builds to a
compile environment build is a huge pain point, and we don't want to
replicate that.)

Or, a different idea: have Rust "artifact builds", where I can download
> prebuilt Rust bits when I'm only recompiling C++ code. (Tricky, I know,
> when we have code generation that communicates between Rust and C++.) This
> isn't fundamentally different from the previous idea, or distributed
> compilation in general, if you start to take the exact interdependencies
> into account.


In theory, caching Rust crate artifacts is easier than caching C++ object
files. (At least, so I'm told.) In practice, nobody has tried to push
through the issues we might see in the wild. I'd love to see investigation
into this area, since it seems likely to be fruitful on a short time
scale. In a different direction, I am aware of some work (cited in this
thread?) towards an icecream-like job server for distributed Rust
compilation. Doesn't hit the artifact build style caching, but related.

Best,
Nick

Randell Jesup

Feb 1, 2018, 1:20:56 AM
>On 1/17/18, Steve Fink wrote:

>Would it be possible that when I do an hg pull of mozilla-central or
>mozilla-inbound, I can also choose to download the object files from the
>most recent ancestor that had an automation build? (It could be a separate
>command, or ./mach pull.) They would go into a local ccache (or probably
>sccache?) directory. The files would need to be atomically updated with
>respect to my own builds, so I could race my build against the
>download. And preferably the download would go roughly in the reverse order
>as my own build, so they would meet in the middle at some point, after
>which only the modified files would need to be compiled. It might require
>splitting debug info out of the object files for this to be practical,
>where the debug info could be downloaded asynchronously in the background
>after the main build is complete.

Stolen from a document on Workflow Efficiencies I worked on:

Some type of aggressive pull-and-rebuild in the background may help
by providing a ‘hot’ objdir that can be switched to in place of the
normal “hg pull -u; ./mach build” sequence.

Users would need to deal with reloading editor buffers after
switching, but that’s normal after a pull. If the path changes it
might require more magic; Emacs could deal with that easily with an
elisp macro; not sure about other editors people use. Keeping paths
to source the same after a pull is a win, though.

Opportunistic rebuilds as you edit source might help, but the win is
much smaller and would be more work. Still worth looking at,
especially if you happen to touch something central.

We'd need to be careful how it interacts with things like hg pull,
switching branches, etc. (defer starting builds slightly until the
source has been unchanged for N seconds?)

I talked a fair bit about this with ted and others. The main trick here
would be in dealing with cache directories, and with sccache we could
make it support a form of hierarchy for caches (local and remote), so
you could leverage either local rebuilds-in-background (triggered by
automatic pulls on repo updates), or remote build resources (such as
from the m-c build machines).

Note that *any* remote-cache utilization depends on a fixed (or at least
identical-and-checked) configuration *and* compiler and system
includes. The easiest way to achieve this might be to leverage a local
VM instance of taskcluster, since system includes vary
machine-to-machine, even for the same OS version. (Perhaps this is less
of an issue on Mac or Windows...)

This requirement greatly complicates things (and requires building a
"standard" config, which many do not). Leveraging local background
builds would be much easier in many ways, though also less of a win.

--
Randell Jesup, Mozilla Corp
remove "news" for personal email

Jean-Yves Avenard

Feb 2, 2018, 1:55:03 PM
to Gregory Szorc, dev-platform, Ted Mielczarek
Hi

> On 17 Jan 2018, at 12:38 am, Gregory Szorc <g...@mozilla.com> wrote:
>
> On an EC2 c5.17xlarge (36+36 CPUs) running Ubuntu 17.10 and using Clang 5.0, 9be7249e74fd does a clobber but configured `mach build` in 7:34. Rust is very obviously the long pole in this build, with C++ compilation (not linking) completing in ~2 minutes.
>
> If I enable sccache for just Rust by setting "mk_add_options "export RUSTC_WRAPPER=sccache" in my mozconfig, a clobber build with populated cache for Rust completes in 3:18. And Rust is still the long pole - although only by a few seconds. It's worth noting that CPU time for this build remains in the same ballpark. But overall CPU utilization increases from ~28% to ~64%. There's still work to do improving the efficiency of the overall build system. But these are mostly in parts only touched by clobber builds. If you do `mach build binaries` after touching compiled code, our CPU utilization is terrific.
>
> From a build system perspective, C/C++ scales up to dozens of cores just fine (it's been this way for a few years). Rust is becoming a longer and longer long tail (assuming you have enough CPU cores that the vast amount of C/C++ completes before Rust does).

After playing with the iMac Pro and loving its performance (though I’ve returned it now)

I was thinking of testing this configuration

Intel i9-7980XE
Asus Prime X299-Deluxe
Samsung 960 Pro SSD
G.Skill F4-3200OC16Q-32GTZR x 2 (allowing 64GB in quad channels)
Corsair AX1200i PSU
Corsair H100i water cooler
Cooler Master Silencio 652S

The aim is the fastest and most silent PC possible (if such a thing
exists). The price on Amazon is 4400 euros, which is well below the cost of
the iMac Pro (less than half, for a similar core count) or the Lenovo P710.

The motherboard was chosen because there are successful reports on the
Hackintosh forums of running macOS High Sierra on it (though with no Wi-Fi
support).

Any ideas when the updated Lenovo P710 will come out?

AnandTech had a nice article about the i9-7980XE's clock speed according to
the number of cores in use… It clearly shows that base frequency matters
very little, as the turbo frequencies almost make them all equal.

JY

Jean-Yves Avenard

Feb 18, 2018, 8:09:37 AM
to dev-platform, Ted Mielczarek
Hi

So I got this to work on all platforms (OS X, Ubuntu 17.10, and Windows
10). Stock speeds, no overclocking of any kind.

macOS: 7m32s
Windows 10: 12m20s
Ubuntu 17.10 Linux (had to install kernel 4.15): 6m04s

So not much better than the 10-core iMac Pro…