create second set of mozilla-central based builders that latch on 'full green'?


Andrew Sutherland

May 11, 2010, 12:55:40 AM
to tb-pl...@mozilla.org
Now that active development is starting to happen on comm-central and
it's not just a canary for 'future problems', perhaps we should consider
actually having two sets of Thunderbird comm-central builders:

1) Thunderbird-canary. Always builds latest mozilla-central.
2) Thunderbird. Always builds the most recent revision of
mozilla-central that got greens on all platforms in a Thunderbird-canary
build.

We are basically just doing #1 right now. I think this creates an
awkward situation for us and for mozilla-central developers. They
should not need to care that their changes are going to cause breakage
for us, and we should not:

a) have to run around doing heroic things to un-break the build when
they do cause comm-central-only breakage, or worse yet, request that
they back things out because of our problems.

b) have the comm-central tree in a questionable state where we cannot
tell mozilla-central breakage from our own breakage and hence need to
close the tree.

I would suggest that we aim for Thunderbird to stay within a few
commits of Thunderbird-canary most of the time.
Intermittent oranges do happen, and they would likely cause somewhat
jerky motion of the revision we use, but I would expect it to be
manageable. When a comm-central-affecting commit does land in
mozilla-central, I expect we would try to resolve it on a timescale of
~3 real days, with larger problems taking up to a week before we
should start getting concerned. The goal is to avoid requiring heroics
or the introduction of sloppy fixes that are created and reviewed under
duress.

I would suggest that we do not keep the 'good' revision in revision
control, but do publish it at a public URL and that we do point
client.py at it so that random people who want to build Thunderbird do
not need to deal with breakage.
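To make the idea concrete, here is a rough sketch of how client.py might consume such a published "last known good" revision. The URL, file contents, and helper names below are all hypothetical illustrations, not existing infrastructure:

```python
# Hypothetical sketch: client.py fetches the latest mozilla-central
# revision that built green on all Thunderbird-canary platforms.
# The URL and function names are invented for illustration.
import urllib.request

GOOD_REV_URL = "https://example.mozilla.org/thunderbird/last-good-mc-rev.txt"

def parse_good_rev(payload):
    """Extract the revision id from the published one-line text file."""
    return payload.decode("ascii").strip()

def fetch_good_mc_rev(url=GOOD_REV_URL):
    """Fetch the latest mozilla-central revision that built green."""
    with urllib.request.urlopen(url) as resp:
        return parse_good_rev(resp.read())

def update_command(rev):
    """The hg command client.py would run to land on that revision."""
    return ["hg", "update", "-r", rev]
```

Since the published file is just a single revision id, the parsing and the update step stay trivial, and the "good" pointer can be changed server-side without any checkin noise.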

I would also suggest we increase the build pool size and take steps to
ensure that Thunderbird-canary cannot steal builders from Thunderbird,
as I have found it continues to do such things currently.

Andrew

PS: If we already have a bug to do this, I also suggest someone
reply with a link to the bug :)
_______________________________________________
tb-planning mailing list
tb-pl...@mozilla.org
https://mail.mozilla.org/listinfo/tb-planning

Mark Banner

May 11, 2010, 12:37:23 PM
to tb-pl...@mozilla.org
On 11/05/2010 05:55, Andrew Sutherland wrote:
 Now that active development is starting to happen on comm-central and it's not just a canary for 'future problems', perhaps we should consider actually having two sets of Thunderbird comm-central builders:
I think you're right that we need to consider both alternate methods of allowing development to continue.

1) Thunderbird-canary.  Always builds latest mozilla-central.
2) Thunderbird.  Always builds the most recent revision of mozilla-central that got greens on all platforms in a Thunderbird-canary build.
...

a) have to run around doing heroic things to un-break the build when they do cause comm-central-only breakage, or worse yet, request that they back things out because of our problems.
Ok, so first and foremost, we should *never* be requesting that they back out because of our problems. The only time I think they should consider backing out because of problems raised by our tree is when we've clearly picked up an issue in one of our test cases that they haven't covered, i.e. it's a true bustage that could affect Firefox as well (for instance, I know we've picked up js faults in the past).


b) have the comm-central tree in a questionable state where we cannot tell mozilla-central breakage from our own breakage and hence need to close the tree.
Indeed, that is the most difficult issue.

So before we go into the specifics of your proposal, I'd like to just go through some of the most frequent bustage areas that we've been seeing:
  • Build Config
    • m-c has been doing a lot of rework on build config areas and has been either breaking non-libxul builds, or changing things such that we need to port them across to the comm-central build config.
    • There are two things that I think we can consider improving:
      • Ted is currently working on a way to include app specific components within the libxul library [1]. This is for non-xulrunner builds, but would mean that we could be built like Firefox, as mailnews could be linked into libxul *and* use internal linkage. I probably need to cover this in more detail elsewhere, but this would certainly help to match us closer to FF and reduce bustage.
        • This would also mean we could go onto using packaged tests, which would further help with matching FF.
      • Reworking how comm-central works. I've had on my plate for a while (and unfortunately I've just not got round to it yet) to start a conversation on how to improve how we do comm-central. Basically looking at the ways of reducing the need to port bugs from mozilla-central all the time.
  • Code Bustage
    • I think these are generally less frequent, but would be where the two sets of builders would help. These also tend to be the ones that take longer to fix.
    • Apart from just fixing the bustages as they come along, there isn't much we can do here.

I would suggest that we aim for Thunderbird to stay within a few commits of Thunderbird-canary most of the time.  Intermittent oranges do happen, and they would likely cause somewhat jerky motion of the revision we use, but I would expect it to be manageable.  When a comm-central-affecting commit does land in mozilla-central, I expect we would try to resolve it on a timescale of ~3 real days, with larger problems taking up to a week before we should start getting concerned.  The goal is to avoid requiring heroics or the introduction of sloppy fixes that are created and reviewed under duress.
This all sounds reasonable, especially the timescales, although I'd still want to keep those reasonably short. I've found in the past that when we've had bustages, grabbing the m-c person soon after the bustage can help provide a quick solution if it's not an obvious c-c failure. This probably also falls into the category of making the m-c person feel responsible, but I've found talking to the person who wrote the bug valuable.

My other concern with m-c bustages is having multiple bustages at the same time - having the builders able to narrow down the bustages to one or two changesets certainly helps.


I would suggest that we do not keep the 'good' revision in revision control, but do publish it at a public URL and that we do point client.py at it so that random people who want to build Thunderbird do not need to deal with breakage.
Agreed, that feels like a good solution to the problem - we're not constantly checking into revision control, and we're able to have some sort of app that we can fine tune/poke occasionally.


I would also suggest we increase the build pool size and take steps to ensure that Thunderbird-canary cannot steal builders from Thunderbird, as I have found it continues to do such things currently.
Again, agreed. I think balancing the builders out is one thing we still need to work on and consider. For instance, the l10n nightly repacks frequently steal builders. Unfortunately buildbot doesn't have a good priority system at the moment, so we'd need our build guy to consider the options here, but I'm sure we can find a good solution.


Mark.

Ben Bucksch

May 11, 2010, 12:51:36 PM
to tb-pl...@mozilla.org
I think these are all great ideas. Thanks for moving forward on this.

One point, though:

On 11.05.2010 06:55, Andrew Sutherland wrote:
> I would suggest that we do not keep the 'good' revision in revision
> control, but do publish it at a public URL and that we do point
> client.py at it so that random people who want to build Thunderbird do
> not need to deal with breakage.

Nice idea, but I think it is better to commit to repo somehow. Maybe at
slower intervals.

Point being that I need to be able to check out an arbitrary revision in
the *past*, and know which version of m-c to build with. Being able to go
back into the past is one of the things that version control software
promises, and it's useful e.g. when finding regression windows, or just
doing software history.
Basically, I think it's a reasonable expectation to be able to check out
any revision of comm-central, build it in the usual way, and have it
working.

Ben Bucksch

May 11, 2010, 1:10:52 PM
to tb-pl...@mozilla.org
On 11.05.2010 18:51, Ben Bucksch wrote:
 I think these are all great ideas. Thanks for moving forward on this.

One point, though:

On 11.05.2010 06:55, Andrew Sutherland wrote:
I would suggest that we do not keep the 'good' revision in revision control, but do publish it at a public URL and that we do point client.py at it so that random people who want to build Thunderbird do not need to deal with breakage.

Nice idea, but I think it is better to commit to repo somehow. Maybe at slower intervals.

Actually, let me refine that. My proposal earlier was:
We commit the *first* (not the last known) revision of m-c that this version of c-c works with; let me call it "GECKO_VERSION_MIN". In other words, whenever we make an m-c bustage/compatibility fix, we also change the "known good" m-c revision. All subsequent checkouts and revisions of c-c then use this m-c version. Newer versions of m-c can be used at your own risk (that is what tinderbox-canary would track). But you'll always have a good, compatible version of m-c matching this version of c-c.

You could also add a second version "also known to work with", let's say "GECKO_VERSION_GOOD", which is updated in the repo every week or so, and published on a webserver (SSL, please!). That's the version you talked about.

Please put these revision numbers in a separate file, let's say build/gecko-version.txt. All of that together would give you a range of versions of m-c that a given c-c works with: anything between GECKO_VERSION_MIN and GECKO_VERSION_GOOD is sure to be fine.

(If you're looking at an old revision of c-c, you can also look at the following revisions of build/gecko-version.txt. The next change of GECKO_VERSION_MIN *after* the current revision would likely be the first revision that does not work with this c-c anymore.)

It's not much work to implement this:
  • whenever you make an m-c bustage or compatibility fix, you update GECKO_VERSION_MIN and GECKO_VERSION_GOOD in build/gecko-version.txt in the same commit.
  • the tinderbox-canary builds update GECKO_VERSION_GOOD in build/gecko-version.txt automatically, at reasonable intervals, e.g. once a week or every 3 days.
  • "client.py checkout" is modified to read build/gecko-version.txt and check out the GECKO_VERSION_GOOD revision. It has a command-line flag --mozilla=latest (or similar) which checks out m-c trunk instead.
    (Optionally, other values of --mozilla= could be: "min" (GECKO_VERSION_MIN), "good" (GECKO_VERSION_GOOD, the default), and "goodweb" (fetches the latest good version from a website, i.e. asuth's idea).)
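A minimal sketch of what that build/gecko-version.txt scheme could look like on the client.py side. The key names follow the proposal above; the file contents, sample revisions, and function names are invented for illustration:

```python
# Sketch of the proposed build/gecko-version.txt and a reader for it.
# Revision ids below are made-up placeholders.
SAMPLE = """\
GECKO_VERSION_MIN=0f3a9c1d2e4b
GECKO_VERSION_GOOD=7b8c6d5e4f3a
"""

def read_gecko_versions(text):
    """Parse KEY=value lines into a dict of revision ids."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            versions[key.strip()] = value.strip()
    return versions

def pick_revision(versions, mozilla="good"):
    """Map a --mozilla= flag value to a revision (None means m-c tip)."""
    if mozilla == "latest":
        return None  # check out mozilla-central trunk
    key = {"min": "GECKO_VERSION_MIN", "good": "GECKO_VERSION_GOOD"}[mozilla]
    return versions[key]
```

Because the file lives in the c-c repo, any old checkout would carry its own compatible m-c range with it, which is the archival property being argued for here.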

Mark Banner

May 11, 2010, 2:13:05 PM
to tb-pl...@mozilla.org
On 11/05/2010 18:10, Ben Bucksch wrote:
On 11.05.2010 18:51, Ben Bucksch wrote:
 I think these are all great ideas. Thanks for moving forward on this.

One point, though:

On 11.05.2010 06:55, Andrew Sutherland wrote:
I would suggest that we do not keep the 'good' revision in revision control, but do publish it at a public URL and that we do point client.py at it so that random people who want to build Thunderbird do not need to deal with breakage.

Nice idea, but I think it is better to commit to repo somehow. Maybe at slower intervals.
I disagree with that. Firstly, it is more non-code noise in the repo (and more changesets to pick up), secondly we should be able to do anything from client.py with a public URL/web based app that we can do from a revision in the repo. In fact, we can do more - e.g. if we'd incorrectly marked a revision as "good", we could later go back and mark it as bad.

I would support being able to have an offline copy of the file (again not in source control) that could be referenced if you're wanting to do a set of builds and regression tests whilst offline.


Actually, let me refine that. My proposal earlier was:
We commit the *first* (not the last known) revision of m-c that this version of c-c works with; let me call it "GECKO_VERSION_MIN". In other words, whenever we make an m-c bustage/compatibility fix, we also change the "known good" m-c revision. All subsequent checkouts and revisions of c-c then use this m-c version. Newer versions of m-c can be used at your own risk (that is what tinderbox-canary would track). But you'll always have a good, compatible version of m-c matching this version of c-c.
You could also add a second version "also known to work with", let's say "GECKO_VERSION_GOOD", which is updated in the repo every week or so, and published on a webserver (SSL, please!). That's the version you talked about.
I think that we should be updating the good version every time we get a green build. If a developer is going to run client.py to update, they will expect to get the latest good updates to the source, not a several-day-old version (unless, of course, there's bustage).

For instance, if I want to safely pick up the latest m-c changes, e.g. because I'm working on something that needs them, then I don't want to have to wait 3 days for the automation (or whatever) to decide that it's time to pick it up; I want to start using that set as soon as I can, i.e. as soon as there's a green tree.


It's not much work to implement this:
  • whenever you make an m-c bustage or compatibility fix, you update GECKO_VERSION_MIN and GECKO_VERSION_GOOD in build/gecko-version.txt in the same commit.
That is a step I bet will be forgotten/messed up, which is why I think this should all be automated based on Thunderbird-canary output, which can feed into the public info.
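For illustration, the automated rule could be as simple as scanning canary results for the newest revision that is green on every platform. The platform names and the shape of the results data below are invented for this sketch; the real buildbot output would look different:

```python
# Hypothetical automation: derive the "good" m-c revision from
# Thunderbird-canary results instead of manual commits.
PLATFORMS = ("linux", "macosx", "win32")

def latest_all_green(results):
    """results: list of (mc_rev, {platform: 'green'|'orange'|'red'}),
    oldest first. Return the newest revision green on every platform,
    or None if no revision qualifies."""
    for rev, statuses in reversed(results):
        if all(statuses.get(p) == "green" for p in PLATFORMS):
            return rev
    return None
```

The output of a function like this is what would get published at the public URL (or written into the repo, depending on which side of this thread wins).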

Standard8

Ben Bucksch

May 11, 2010, 5:50:15 PM
to tb-pl...@mozilla.org
On 11.05.2010 20:13, Mark Banner wrote:
> we should be able to do anything from client.py with a public URL/web
> based app that we can do from a revision in the repo.

No, a web service is a different entity than the repo - not
downloadable, not mirrored locally, not archivable. You could in theory
provide history data on which c-c revision worked with which m-c, but
even then, I'd need a network connection, and the server would need to
be around. Is the web service still going to be active in 10 years?

What about offline use?

The VCS repo is an archive. It needs to be self-contained (if you
include the other hg repos we depend on), self-sufficient to build the
source.

And for any given revision in the past. Having the build system rely on
a web service for basic functions, like creating a working build, is a
recipe for disaster in various situations.

> I would support being able to have an offline copy of the file (again
> not in source control) that could be referenced if you're wanting to
> do a set of builds and regression tests whilst offline.

Why should I have to do extra work to make my build work when I am on
the road? And do that before I leave?

Please note that the periodic commits (GECKO_VERSION_GOOD) were
*optional* in my proposal. The critical part was GECKO_VERSION_MIN, and
that doesn't need periodic commits, is simple to do, and simple to
implement in client.py. It's the part which is important for archiving.

> For instance, if I want to safely pick up the latest m-c changes, e.g.
> because I'm working on something that wants them

Then you do "client.py checkout --mozilla=latest".

If you want to know whether that will build, you check Tinderbox-canary
(just like I now have to check Tinderbox before I hg pull, if I want to
be sure that it will build).

Andrew Sutherland

May 11, 2010, 6:30:58 PM
to tb-pl...@mozilla.org
On 05/11/2010 02:50 PM, Ben Bucksch wrote:
> No, a web service is a different entity than the repo - not
> downloadable, not mirrored locally, not archivable. You could in
> theory provide history data on which c-c revision worked with which
> m-c, but even then, I'd need a network connection, and the server
> would need to be around. Is the web service still going to be active
> in 10 years?

Why would you need to build a version of Thunderbird based off an
arbitrary date stamp 10 years down the road? We cut release branches
for our alphas, betas, release candidates, and official builds.
Wouldn't you want to build one of those instead?

The only useful example that comes to my mind is after-the-fact
statistics gathering like Joshua performed in his animated coverage. If
you're going to burn that many resources, you could always recompute the
information if it is no longer available.

Our revision control system tracks changes to the code, but it is not a
time machine and we should not try to turn it into one. The code has
many implied dependencies that we do not fully control. For example, on
linux, new system library dependencies are periodically added or go
away. On OS X, the supported SDKs and platforms and such change all the
time.

>> I would support being able to have an offline copy of the file (again
>> not in source control) that could be referenced if you're wanting to
>> do a set of builds and regression tests whilst offline.
>
> Why should I have to do extra work to make my build work when I am on
> the road? And do that before I leave?

If you do "python client.py checkout" and it pulls to the latest good
rev, you wouldn't need to do any work.


Andrew

Ben Bucksch

May 11, 2010, 8:02:20 PM
to tb-pl...@mozilla.org
On 12.05.2010 00:30, Andrew Sutherland wrote:
> Our revision control system tracks changes to the code, but it is not
> a time machine

Actually, that's precisely what a VCS is.

> and we should not try and turn it into one. The code has many implied
> dependencies that we do not fully control. For example, on linux, new
> system library dependencies are periodically added or go away.

That's manageable. Banks run 40-year-old software. My previous customer
runs software which hasn't been ported to Win32 yet, on a Win3.1 or 3.0
in a VM - OS dependencies are common. The national library in Germany
keeps not just books, but software.

The main practical reason is, as I said before, regression back-tracking
using binary searches. There's git/hg bisect just for that. It's a real
pain to check out some revision and have the build fail mid-way.

Ben

Justin Wood (Callek)

May 11, 2010, 10:56:27 PM
to Mark Banner, tb-pl...@mozilla.org
On 5/11/2010 12:37 PM, Mark Banner wrote:
On 11/05/2010 05:55, Andrew Sutherland wrote:
 Now that active development is starting to happen on comm-central and it's not just a canary for 'future problems', perhaps we should consider actually having two sets of Thunderbird comm-central builders:
I think you're right that we need to consider both alternate methods of allowing development to continue.
1) Thunderbird-canary.  Always builds latest mozilla-central.
2) Thunderbird.  Always builds the most recent revision of mozilla-central that got greens on all platforms in a Thunderbird-canary build.
...
a) have to run around doing heroic things to un-break the build when they do cause comm-central-only breakage, or worse yet, request that they back things out because of our problems.
Ok, so first and foremost, we should *never* be requesting that they back out because of our problems. The only time I think they should consider backing out because of problems raised by our tree is when we've clearly picked up an issue in one of our test cases that they haven't covered, i.e. it's a true bustage that could affect Firefox as well (for instance, I know we've picked up js faults in the past).

b) have the comm-central tree in a questionable state where we cannot tell mozilla-central breakage from our own breakage and hence need to close the tree.
Indeed, that is the most difficult issue.

So before we go into the specifics of your proposal, I'd like to just go through some of the most frequent bustage areas that we've been seeing:
  • Build Config
    • m-c has been doing a lot of rework on build config areas and has been either breaking non-libxul builds, or changing things such that we need to port them across to the comm-central build config.
    • There are two things that I think we can consider improving:
      • Ted is currently working on a way to include app specific components within the libxul library [1]. This is for non-xulrunner builds, but would mean that we could be built like Firefox, as mailnews could be linked into libxul *and* use internal linkage. I probably need to cover this in more detail elsewhere, but this would certainly help to match us closer to FF and reduce bustage.
        • This would also mean we could go onto using packaged tests, which would further help with matching FF.
      • Reworking how comm-central works. I've had on my plate for a while (and unfortunately I've just not got round to it yet) to start a conversation on how to improve how we do comm-central. Basically looking at the ways of reducing the need to port bugs from mozilla-central all the time.

I have to say that reworking how comm-central works is something I've also had on my plate for a while; perhaps we (me, you, KaiRo, and Gozer; maybe even ted) should try to sketch out a block of time to discuss/brainstorm. I have some ideas myself on this front that would reduce/eliminate the need for 99% of the build-config ports! [If it all works according to plan]

--
~Justin Wood (Callek)
