One minute builds

Steve Fink

Aug 31, 2010, 8:41:01 PM

to dev-pl...@lists.mozilla.org, dev-b...@lists.mozilla.org

Recently in a thread that I can't find anymore, someone asserted that
most of our builds should complete within a minute. People came up with
various reasons why this isn't the case in reality, but it got me
thinking: what would it take to get tinderbox results within one minute?

It makes for an interesting thought experiment, if nothing else. I wrote
up my thoughts at
https://wiki.mozilla.org/Sfink/Thought_Experiment_-_One_Minute_Builds

I'm curious to hear other people's opinions, here or on the wiki. Most
of what I came up with is probably neither practical nor useful, but
maybe there's something to be had from it.

Mark Banner

Sep 1, 2010, 5:06:06 AM

On 01/09/2010 01:41, Steve Fink wrote:
> It makes for an interesting thought experiment, if nothing else. I wrote
> up my thoughts at
> https://wiki.mozilla.org/Sfink/Thought_Experiment_-_One_Minute_Builds
>
> I'm curious to hear other people's opinions, here or on the wiki. Most
> of what I came up with is probably neither practical nor useful, but
> maybe there's something to be had from it.

"Build Slave Pull"

You mention the try server taking a while to pull - I'm assuming you
mean pulling the try server repository. If so, the main reason for this
is that all try builders clobber on every build - this is a full source
clobber.

I'm not sure you could get away without that, but an idea we had on
Thunderbird a while back (which we haven't had a real need to
implement), was this:

- Someone pushes.
- A single builder (probably Linux) gets triggered to pull the source.
- It zips (tarball, whatever you prefer) up the mozilla-central
directory and uploads it to ftp
- The rest of the builders download the source, unpack and build from that.

I think we did some experiments and found doing a full mozilla-central
clone on Windows was longer than the linux single clone +
upload/download, though it was a while ago, so I may have got that wrong.

Whilst this might not quite fit in the one-minute-builds, it could cut
down on the amount of time that all builders spend getting the source
code and cut down the load on the server.
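
A minimal sketch of the pack and unpack halves of that idea, assuming
Python's standard tarfile module. The function names and directory
layout are hypothetical, and the ftp upload/download step in between is
left out:

```python
import tarfile
from pathlib import Path


def pack_source(source_dir: str, archive_path: str) -> None:
    """Pack a checked-out source tree into a gzip-compressed tarball."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the top-level directory name in the archive,
        # so unpacking does not recreate the first builder's absolute path.
        archive.add(str(source), arcname=source.name)


def unpack_source(archive_path: str, build_root: str) -> Path:
    """Unpack the tarball under build_root and return the unpacked tree."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # The first member is the top-level directory added by pack_source.
        top_level = archive.getnames()[0].split("/")[0]
        archive.extractall(build_root)
    return Path(build_root) / top_level
```

The first builder would call pack_source() after its pull; every other
builder would call unpack_source() on the downloaded archive instead of
cloning.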

Oh, and the other question about pulling is why we clone
buildbot-configs on every build, rather than just updating the existing
clone.

Standard8

Chris AtLee

Sep 1, 2010, 10:40:10 AM

to Steve Fink, dev-b...@lists.mozilla.org, dev-pl...@lists.mozilla.org

On 31/08/10 08:41 PM, Steve Fink wrote:
> Recently in a thread that I can't find anymore, someone asserted that
> most of our builds should complete within a minute. People came up with
> various reasons why this isn't the case in reality, but it got me
> thinking: what would it take to get tinderbox results within one minute?
>

> It makes for an interesting thought experiment, if nothing else. I wrote
> up my thoughts at
> https://wiki.mozilla.org/Sfink/Thought_Experiment_-_One_Minute_Builds
>
> I'm curious to hear other people's opinions, here or on the wiki. Most
> of what I came up with is probably neither practical nor useful, but
> maybe there's something to be had from it.

Thanks for this, it's really interesting!

A few quick thoughts to items on the wiki...

We currently poll hg.mozilla.org for new changes once every minute. So
right there you've got a mean delay of 30s. We could investigate using
a push model, but it sounds like that could wait a while. Ted suggested
having hg hooks to notify pulse about new pushes, and then we could
listen for the pulse events to trigger builds. This sounds like a
better way forward than having special hooks in hg for buildbot.
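
The 30s figure can be checked with a few lines of arithmetic. This is
only a sketch, assuming pushes arrive evenly spread over a polling
interval; the function name is made up for illustration:

```python
def mean_poll_delay(poll_interval_s: float, samples: int = 1000) -> float:
    """Average wait until the next poll, for pushes spread evenly over one interval."""
    # A push landing at offset t into the interval waits (poll_interval_s - t)
    # until the poller next looks at the repository.
    offsets = [(i + 0.5) * poll_interval_s / samples for i in range(samples)]
    return sum(poll_interval_s - t for t in offsets) / samples


# With a 60 second polling interval the mean comes out at about half of it.
print(mean_poll_delay(60.0))
```

A push notification would remove this term entirely, since the delay
scales with the polling interval.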

For mozilla-central we also have a 3 minute "tree stable timer". When
we detect a new push, we start the timer. If a new push is detected,
the timer is reset. Once the timer expires, we then schedule builds.
The reason for this is to give people a small window to push bustage
fixes if they realize they missed something in the original push. If
this isn't used, we can set this to 0, and builds will start as soon as
we know about the push. The 'try' branch does not share this behaviour
- builds are started as soon as we know about the push.
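
The timer behaviour described above can be sketched as follows. This is
an illustration of the rule as stated here, not the actual buildbot
scheduler code, and the function name is hypothetical:

```python
from typing import List


def schedule_times(push_times: List[float], stable_timer_s: float) -> List[float]:
    """Return the times (in seconds) at which builds get scheduled.

    Each detected push (re)starts the timer; builds are scheduled once the
    timer expires without another push having arrived.
    """
    scheduled = []
    deadline = None
    for push in sorted(push_times):
        if deadline is not None and push >= deadline:
            # The timer expired before this push arrived, so builds
            # were scheduled at the old deadline.
            scheduled.append(deadline)
        # Start (or reset) the timer for this push.
        deadline = push + stable_timer_s
    if deadline is not None:
        scheduled.append(deadline)
    return scheduled
```

With a 180 second timer, pushes detected at 0s and 100s end up in a
single build scheduled at 280s; with the timer set to 0, each push is
scheduled as soon as it is detected.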

For pulling the hg changes, we have lots of room for improvement here.
The main problem (especially for try builds) is that we have separate
clones of hg repositories per build type. So the directory on the build
slave for a mozilla-central optimized build would have its own copy of
the repo, as would the directory for the mozilla-central debug build.
These build directories get deleted very regularly to make room for
other jobs, taking the hg clones with them. See e.g. bug 589885 for one
thing we're working towards for speeding this up.

We are already using ccache on linux to speed up builds (look for
"ccache -s" in the build logs to see how we're doing). We're going to
be revisiting it for OSX builds as well as looking at ccache version 3
(bug 588150). We did experiment using a shared network drive for
ccache, but it performed horribly.

Anybody want to get a version of ccache working for cl.exe?

Mike Shaver

Sep 1, 2010, 10:54:44 AM

to Mark Banner, dev-b...@lists.mozilla.org

On Wed, Sep 1, 2010 at 2:06 AM, Mark Banner
<bugz...@invalid.standard8.plus.com> wrote:
> You mention the try server taking a while to pull - I'm assuming you mean
> pulling the try server repository. If so, the main reason for this is that
> all try builders clobber on every build - this is a full source clobber.

Yeah, we need to get a lot more efficient about how we pull in these
cases, because it's an enormous load on the server as well as wait
time we don't need to spend.

Mike

Mike Shaver

Sep 1, 2010, 11:19:28 AM

to Chris AtLee, dev-b...@lists.mozilla.org, Steve Fink, dev-pl...@lists.mozilla.org

On Wed, Sep 1, 2010 at 7:40 AM, Chris AtLee <cat...@mozilla.com> wrote:
> We currently poll hg.mozilla.org for new changes once every minute. So
> right there you've got a mean delay of 30s. We could investigate using a
> push model, but it sounds like that could wait a while. Ted suggested
> having hg hooks to notify pulse about new pushes, and then we could listen
> for the pulse events to trigger builds. This sounds like a better way
> forward than having special hooks in hg for buildbot.

That'd be fine, though it'd raise the operational requirements for
pulse a fair bit. You might want to use your own broker, but then
isn't that what scheduledb is supposed to do?

> For mozilla-central we also have a 3 minute "tree stable timer". When we
> detect a new push, we start the timer. If a new push is detected, the timer
> is reset. Once the timer expires, we then schedule builds. The reason for
> this is to give people a small window to push bustage fixes if they realize
> they missed something in the original push. If this isn't used, we can set
> this to 0, and builds will start as soon as we know about the push. The
> 'try' branch does not share this behaviour - builds are started as soon as
> we know about the push.

I don't think the 3 min timer is a big deal timing-wise, but we could
start the pull on initial trigger-push so that it can be ready when
the 3 min timer expires.

The 3-min timer is probably a bigger issue for inadvertent
change-bundling than for the extra few minutes on builds.

> Anybody want to get a version of ccache working for cl.exe?

It worked for OpenOffice in 2007, not sure if the patch still applies
to modern ccache. Might be worth trying.

http://artax.karlin.mff.cuni.cz/~kendy/ccache/

Some results from that era showed that PCH is about as big a win as
ccache, for the OOo build.

Mike

Kyle Huey

Sep 1, 2010, 11:38:14 AM

to dev-b...@lists.mozilla.org, dev-pl...@lists.mozilla.org

Early this summer I wrote a patch to use precompiled headers in our
build system. Some basic work putting XPCOM headers into a PCH was
able to get me a 10% speedup in layout/base. The results were
encouraging, but the patch was pretty invasive so I put it on the back
burner. It deserves more investigation after we branch.

- Kyle

Kyle Huey

Sep 1, 2010, 6:05:16 PM

to Steve Fink, dev-b...@lists.mozilla.org, dev-pl...@lists.mozilla.org

cl

- Kyle

Steve Fink

Sep 1, 2010, 6:28:33 PM

to Mark Banner, dev-b...@lists.mozilla.org

On 09/01/2010 02:06 AM, Mark Banner wrote:
> I'm not sure you could get away without that, but an idea we had on
> Thunderbird a while back (which we haven't had a real need to
> implement), was this:
>
> - Someone pushes.
> - A single builder (probably Linux) gets triggered to pull the source.
> - It zips (tarball, whatever you prefer) up the mozilla-central
> directory and uploads it to ftp
> - The rest of the builders download the source, unpack and build from that.
>
> I think we did some experiments and found doing a full mozilla-central
> clone on Windows was longer than the linux single clone +
> upload/download, though it was a while ago, so I may have got that wrong.
>
> Whilst this might not quite fit in the one-minute-builds, it could cut
> down on the amount of time that all builders spend getting the source
> code and cut down the load on the server.

If the build slaves can see each other, it might be faster and simpler
to rsync off of the initial builder. (Or do it in an rsync tree, if you
don't like all the contention on the initial builder, but that smacks of
overengineering and might be counterproductive anyway.)

It's nice having a full hg repo lying around, though. I'd be more
tempted to have slaves rsync that and then update. Although I keep
forgetting that hg has this obnoxious desire to keep a full repo per clone.

Steve Fink

Sep 1, 2010, 6:45:02 PM

to Chris AtLee, dev-b...@lists.mozilla.org, dev-pl...@lists.mozilla.org

On 09/01/2010 07:40 AM, Chris AtLee wrote:
> For pulling the hg changes, we have lots of room for improvement here.
> The main problem (especially for try builds) is that we have separate
> clones of hg repositories per build type. So the directory on the build
> slave for a mozilla-central optimized build would have its own copy of
> the repo, as would the directory for the mozilla-central debug build.
> These build directories get deleted very regularly to make room for
> other jobs, taking the hg clones with them. See e.g. bug 589885 for one
> thing we're working towards for speeding this up.

Ugh. I always forget about hg's "one object repo per checkout" model.
Though I don't understand why you can't just use a single repo per slave
and run hg update -C -r <rev> for each build. You talk about that some in
bug 589885, but I didn't follow. Is that the same as what the bug is
talking about?

Alternatively, you could use the share extension
<http://mercurial.selenic.com/wiki/ShareExtension>. It seems like it
ought to be safe since slaves are read-only (or is that a bad assumption?)
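
Roughly what I have in mind, as a sketch only: it builds the hg command
lines without running them. The paths and the function name are
hypothetical, and whether this is what bug 589885 intends is an open
question:

```python
from typing import List


def shared_checkout_commands(shared_repo: str, build_dir: str, revision: str) -> List[List[str]]:
    """Commands for a build directory that reuses one repo store per slave."""
    # The share extension ships with Mercurial but has to be enabled.
    share_ext = ["--config", "extensions.share="]
    return [
        # Bring the slave's single shared repo up to date.
        ["hg", "-R", shared_repo, "pull"],
        # Create a working directory backed by the shared repo's store
        # (only needed when build_dir does not exist yet); -U skips the
        # initial checkout.
        ["hg"] + share_ext + ["share", "-U", shared_repo, build_dir],
        # Check out the requested revision, discarding local changes.
        ["hg"] + share_ext + ["-R", build_dir, "update", "-C", "-r", revision],
    ]
```

Each list could be handed to subprocess.run(cmd, check=True). Deleting
the build directory then only throws away a working copy, not the
repository history.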

> We are already using ccache on linux to speed up builds (look for
> "ccache -s" in the build logs to see how we're doing). We're going to be
> revisiting it for OSX builds as well as looking at ccache version 3 (bug
> 588150). We did experiment using a shared network drive for ccache, but
> it performed horribly.

Yes, I knew the Linux builds were using ccache, but I was chasing after
the cases that ccache couldn't help because it hadn't built the exact
sources yet. That's why I went all crazy with stealing the ccache
outputs from someone else who had built from the same thing already.
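
To make the "stealing" idea concrete, here is a toy content-keyed cache
in the spirit of ccache. It is not ccache's actual implementation; the
class and function names are invented for illustration. The point is
that the key depends only on the compile inputs, so an entry produced on
one machine is valid on any other machine with the same inputs:

```python
import hashlib
from typing import Dict, List, Optional


def cache_key(compiler_args: List[str], preprocessed_source: str) -> str:
    """Hash the inputs that determine the resulting object file."""
    digest = hashlib.sha256()
    for arg in compiler_args:
        # A separator keeps ["-O", "2"] distinct from ["-O2"].
        digest.update(arg.encode("utf-8") + b"\0")
    digest.update(preprocessed_source.encode("utf-8"))
    return digest.hexdigest()


class ObjectCache:
    """In-memory stand-in for one builder's compiler cache."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def store(self, key: str, obj: bytes) -> None:
        self._objects[key] = obj

    def lookup(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def import_from(self, other: "ObjectCache") -> int:
        """Copy entries this cache lacks from another builder's cache."""
        copied = 0
        for key, obj in other._objects.items():
            if key not in self._objects:
                self._objects[key] = obj
                copied += 1
        return copied
```

A builder that has never compiled a given file still gets a hit after
import_from(), as long as some other builder compiled the same
preprocessed source with the same arguments.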

Nicholas Nethercote

Sep 1, 2010, 7:09:49 PM

to Steve Fink, dev-b...@lists.mozilla.org, Chris AtLee, dev-pl...@lists.mozilla.org, Mike Shaver

On Wed, Sep 1, 2010 at 3:03 PM, Steve Fink <sf...@mozilla.com> wrote:
>
> The opinion on IRC is that bustage fixes within this window are rare in
> practice. Bustage fixes almost always happen immediately after a build
> starts burning.
>
> Shaving a 30-minute first report down to 27 minutes isn't all that tempting,
> though. It matters more when the first build comes back a lot faster than it
> does now.

Don't diss a 10% improvement!

Nick