Power off unused FF2.0, FF3.0 machines?

John O'Duinn

Nov 3, 2008, 1:09:05 PM

to dev-pl...@lists.mozilla.org, dev-apps-firefox

hi;

This came up in the last couple of weekly Firefox meetings; I'm now
pushing it out for wider feedback.

For an active code branch, we spin up a whole fleet of machines to
support all the active development work. Once that branch goes into
maintenance mode, there are fewer developers landing fewer security-only
patches. It got me wondering whether we still need all those machines in
use, or whether we could reallocate them somewhere else.

With mozilla1.8 and mozilla1.9.0 in maintenance mode, I've been looking
to see if there are any machines on those branches which are no longer
needed. If there are unneeded machines still running, I'd like to power
them off, in advance of the EOL for these branches. Obviously, I'm not
intending to power off any machines that *are* being used.

According to the 1.8 and 1.9 release drivers, they do not use the perf
numbers on those maintenance branches. As far as I can tell, no-one else
is using them either. Hence, I believe the following 32 machines could
be safely powered off:

FF2.0 Talos
qm-mini-ubuntu04
qm-mini-vista04
qm-mini-xp04
qm-plinux-fast02
qm-pmac-fast02
qm-pmac04
qm-pxp-fast02

FF3.0 full talos:
qm-mini-ubuntu01
qm-mini-ubuntu02
qm-mini-ubuntu03
qm-mini-ubuntu05
qm-mini-vista01
qm-mini-vista02
qm-mini-vista03
qm-mini-vista05
qm-mini-xp01
qm-mini-xp02
qm-mini-xp03
qm-mini-xp05
qm-pleopard-trunk06
qm-pleopard-trunk07
qm-pleopard-trunk08
qm-pmac01
qm-pmac02
qm-pmac03
qm-pmac05

FF3.0 fast talos
qm-plinux-fast01
qm-pmac-fast01
qm-pxp-fast01

FF3.0 jss perf
qm-pxp-jss01
qm-pxp-jss02
qm-pxp-jss03
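
As a quick cross-check of the "32 machines" figure, the lists above can be tallied per group. This is only a sketch: the machine names are copied from the lists, and the dictionary keys are just the group labels used in this message.

```python
# Machine names copied from the lists above; keys are the group labels.
machines = {
    "FF2.0 Talos": [
        "qm-mini-ubuntu04", "qm-mini-vista04", "qm-mini-xp04",
        "qm-plinux-fast02", "qm-pmac-fast02", "qm-pmac04", "qm-pxp-fast02",
    ],
    "FF3.0 full talos": [
        "qm-mini-ubuntu01", "qm-mini-ubuntu02", "qm-mini-ubuntu03", "qm-mini-ubuntu05",
        "qm-mini-vista01", "qm-mini-vista02", "qm-mini-vista03", "qm-mini-vista05",
        "qm-mini-xp01", "qm-mini-xp02", "qm-mini-xp03", "qm-mini-xp05",
        "qm-pleopard-trunk06", "qm-pleopard-trunk07", "qm-pleopard-trunk08",
        "qm-pmac01", "qm-pmac02", "qm-pmac03", "qm-pmac05",
    ],
    "FF3.0 fast talos": ["qm-plinux-fast01", "qm-pmac-fast01", "qm-pxp-fast01"],
    "FF3.0 jss perf": ["qm-pxp-jss01", "qm-pxp-jss02", "qm-pxp-jss03"],
}

# Count machines per group and overall.
counts = {group: len(names) for group, names in machines.items()}
total = sum(counts.values())

for group, count in counts.items():
    print(f"{group}: {count}")
print(f"total: {total}")
```

The per-group counts come out as 7, 19, 3 and 3, which add up to the 32 machines proposed for power-off.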

Obviously, if anyone *is* using some/all of these machines, please let
me know and we'll continue to leave them as is and continue to support
them. However, if they are really not being used by anyone, I'll happily
power these down after sending announcements. We can then clean up and
reallocate them to upcoming moz2 project branches. We can also start
simplifying the Talos code wherever it contains FF2.0 and FF3.0 specific
code... which makes future Talos development much easier. All yummy
goodness. :-)

(One possible concern which came up in the Firefox meeting was about how
developers on FF3.next would do perf comparison in graph server between
FF3.next and the most recent FF3.0 or FF2.0 values; one proposed
solution was to extend the last known FF2.0/FF3.0 value as a flatline on
the graph. This seemed ok, and could likely be done without keeping all
32 machines around!)
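
The flatline idea could look roughly like the sketch below. It is not the graph server's actual code: the `extend_flatline` name, the (timestamp, value) tuple layout, and the fixed `step` between synthetic points are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (unix timestamp, measured value); layout is an assumption


def extend_flatline(series: List[Point], until: int, step: int) -> List[Point]:
    """Return a copy of `series` with the last value repeated every `step`
    seconds, up to and including `until`."""
    if not series or step <= 0:
        # Nothing to extend, or no sensible spacing for synthetic points.
        return list(series)
    last_time, last_value = series[-1]
    extended = list(series)
    t = last_time + step
    while t <= until:
        extended.append((t, last_value))  # synthetic point, not a real test run
        t += step
    return extended


# Hypothetical values: two real data points, then a flatline out to t=30.
flat = extend_flatline([(0, 410.0), (10, 405.0)], until=30, step=10)
print(flat)
```

Only the last recorded value is repeated, so the extended line says nothing about what the old branch would score if the tests themselves changed later.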

Hope all that makes sense - let me know if you have any questions!

tc
John.
=====
If you want to see a complete list of all our systems, across all
branches, see https://wiki.mozilla.org/ReleaseEngineering:Farm.

Mike Shaver

Nov 3, 2008, 1:16:06 PM

to jod...@mozilla.com, dev-pl...@lists.mozilla.org, dev-apps-firefox

On Mon, Nov 3, 2008 at 11:09 AM, John O'Duinn <jod...@mozilla.com> wrote:
> We can then cleanup and
> reallocate them to upcoming moz2 project branches. We can also start
> simplifying the Talos code wherever it contains FF2.0 and FF3.0 specific
> code... which makes future Talos development much easier. All yummy
> goodness. :-)

Indeed, this is good stuff, and a very important part of making sure
that we have our resources allocated where they do the most good.

> (One possible concern which came up in Firefox meeting was about how
> developers on FF3.next would do perf comparison in graph server between
> FF3.next and the most recent FF3.0 or FF2.0 values; one proposed
> solution was to extend the last known FF2.0/FF3.0 value as a flatline on
> the graph. This seemed ok, and could likely be done without keeping all
> 32 machines around!)

As one person who was concerned about that comparison, I'd be fine
with that solution. Is there a bug on file for it?

Mike

Samuel Sidler

Nov 3, 2008, 1:54:04 PM

to dev. planning, dev-apps-firefox

On Nov 3, 2008, at 10:09 AM, John O'Duinn wrote:

> FF2.0 Talos
> qm-mini-ubuntu04
> qm-mini-vista04
> qm-mini-xp04
> qm-plinux-fast02
> qm-pmac-fast02
> qm-pmac04
> qm-pxp-fast02

Just to say it again, 1.8 branch drivers don't look at Talos. It was
added for the benefit of Firefox 3 developers. They used 1.8 Talos as a
baseline for comparison with 1.9 Talos.

Which is to say, I'm fine with powering these machines down.

Most developers wait for these machines to go green after checkin. I'm
not clear if they *need* to, or if these machines can be powered down.
As of right now, they aren't heavily monitored by 1.9 branch drivers.
I'd like to hear what developers think about this...

> (One possible concern which came up in Firefox meeting was about how
> developers on FF3.next would do perf comparison in graph server between
> FF3.next and the most recent FF3.0 or FF2.0 values; one proposed
> solution was to extend the last known FF2.0/FF3.0 value as a flatline on
> the graph. This seemed ok, and could likely be done without keeping all
> 32 machines around!)

That seems fine to me for the 1.8 branch (and probably for 1.9; just
want to hear from developers), but I don't think any machines should
be powered down until this is done.

-Sam

Boris Zbarsky

Nov 3, 2008, 2:59:27 PM

John O'Duinn wrote:
> According to the 1.8 and 1.9 release drivers, they do not use the perf
> numbers on those maintenance branches.

So if we land a security fix that severely regresses performance nothing
happens?

I certainly don't think we need the number of boxes we have right now,
but a single perf box per OS on our actively-maintained branch (1.9.0)
might be good, if we'll actually kinda glance at the numbers (say
weekly). If we won't, there's no point, of course.

-Boris

Mike Beltzner

Nov 3, 2008, 5:45:32 PM

to bzba...@mit.edu, dev-pl...@lists.mozilla.org

I'll admit that caught me as odd as well. Since the 1.9 boxes are used for
our shipping product, it feels like we need to ensure that checkins on that
branch are tested and checked for performance costs.

While branch drivers may not be looking at those numbers, they probably
should be. The level of redundancy may be ripe for reduction, though. I
don't have a good sense of how many of those machines are required to detect
a performance regression.

cheers,
mike


Mike Shaver

Nov 3, 2008, 5:54:22 PM

to Mike Beltzner, bzba...@mit.edu, dev-pl...@lists.mozilla.org

On Mon, Nov 3, 2008 at 3:45 PM, Mike Beltzner <belt...@mozilla.com> wrote:
> I'll admit that caught me as odd as well. Since the 1.9 boxes are used for
> our shipping product, it feels like we need to ensure that checkins on that
> branch are tested and checked for performance costs.

With few exceptions (but with some deviation for merge), these patches
are backported from the trunk, where their performance impact is
carefully monitored. I don't think that the risk of a severe
branch-only performance impact is high enough to warrant the
significant investment in hardware and (worse) maintenance,
personally.

We should probably know what the "confidence cost" is of reducing
Talos redundancy, etc., but given the low probability of a branch-only
regression severe enough for us to back out a patch and the high cost
of false positives, I would want to be very cautious about increasing
the size of our error bars here...

Mike
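
One rough way to put a number on the "confidence cost" raised above: if each machine's result is treated as an independent, equally noisy sample, the standard error of the averaged result scales as 1/sqrt(n). That independence is an assumption, the thread gives no noise figures, and the 8.0 ms below is made up purely for illustration.

```python
import math


def standard_error(per_machine_stddev: float, machines: int) -> float:
    """Standard error of the mean across `machines` independent machines
    that each report results with the same standard deviation."""
    if machines < 1:
        raise ValueError("need at least one machine")
    return per_machine_stddev / math.sqrt(machines)


# Hypothetical per-machine noise of 8.0 ms; compare 4, 2 and 1 machines per OS.
for n in (4, 2, 1):
    print(n, round(standard_error(8.0, n), 2))
```

Under these assumptions, going from four machines to one doubles the error bar on the averaged number.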

Karl Tomlinson

Nov 3, 2008, 6:17:24 PM

John O'Duinn writes:

> (One possible concern which came up in Firefox meeting was about how
> developers on FF3.next would do perf comparison in graph server between
> FF3.next and the most recent FF3.0 or FF2.0 values; one proposed
> solution was to extend the last known FF2.0/FF3.0 value as a flatline on
> the graph.

A problem with this flatline extension approach is that sometimes
performance tests themselves are changed, which changes the
reported numbers.

It would be hard to know how the extension should be adjusted
without some machines to run the new versions of the tests.

For me, I'm not interested in comparing with FF2.0, but being
able to compare with FF3.0 is useful.

Boris Zbarsky

Nov 3, 2008, 8:22:53 PM

Mike Shaver wrote:
> With few exceptions (but with some deviation for merge), these patches
> are backported from the trunk, where their performance impact is
> carefully monitored.

Yeah, it's the merges that worry me a bit. I admit it's not a huge
worry, but patches start looking more and more different between trunk
and branch as time goes on.

In particular, we've had a number of 1.8 checkins where we accepted a
perf hit for a safe fix while we did a more invasive refactoring to not
take a hit on then-trunk (now 1.9.0). I don't think any of these hits
were huge, to be honest, but they can add up. That might be ok.

-Boris

Daniel Veditz

Nov 4, 2008, 2:25:17 PM

John O'Duinn wrote:
> According to the 1.8 and 1.9 release drivers, they do not use the perf
> numbers on those maintenance branches.

I'm fine with killing the 1.8 machines but uncomfortable shutting off
all visibility on what's happening with our most current official
release. Four machines per OS and two flavors of Mac and Windows might
be overkill, but I think we need to keep some of the 1.9.0 machines
until 3.1 is well adopted (more than 33-50% of Firefox users?).

John O'Duinn

Nov 5, 2008, 11:53:58 AM

to Mike Shaver, dev-pl...@lists.mozilla.org, dev-apps-firefox

Mike Shaver wrote:
> On Mon, Nov 3, 2008 at 11:09 AM, John O'Duinn <jod...@mozilla.com> wrote:
>> We can then cleanup and
>> reallocate them to upcoming moz2 project branches. We can also start
>> simplifying the Talos code wherever it contains FF2.0 and FF3.0 specific
>> code... which makes future Talos development much easier. All yummy
>> goodness. :-)
>
> Indeed, this is good stuff, and a very important part of making sure
> that we have our resources allocated where they do the most good.

k.

>> (One possible concern which came up in Firefox meeting was about how
>> developers on FF3.next would do perf comparison in graph server between
>> FF3.next and the most recent FF3.0 or FF2.0 values; one proposed
>> solution was to extend the last known FF2.0/FF3.0 value as a flatline on
>> the graph. This seemed ok, and could likely be done without keeping all
>> 32 machines around!)
>
> As one person who was concerned about that comparison, I'd be fine
> with that solution. Is there a bug on file for it?

Not yet. If this is the path we all decide on, I'll happily file the
bug. So far, this seems ok to folks?

> Mike
tc
John.

John O'Duinn

Nov 5, 2008, 12:30:49 PM

to Boris Zbarsky, dev-pl...@lists.mozilla.org

My concern here is that I don't think we do "kinda glance at the
numbers". I just did a quick scan of talos bugs and found:

https://bugzilla.mozilla.org/show_bug.cgi?id=437468 (bogus results from
approx 07aug-17aug)
https://bugzilla.mozilla.org/show_bug.cgi?id=447681 (missing machines,
unclear for how long)
https://bugzilla.mozilla.org/show_bug.cgi?id=450773 (bogus results from
01jul-21aug.)
https://bugzilla.mozilla.org/show_bug.cgi?id=451280 (bogus results
07aug-19aug)

...all of which were noticed by Alice during housekeeping, suggesting
that the graphs are not being used. (...or that people don't file talos
bugs! :-) )

Not to say we can't change going forward, but historically it seems like
we don't look at talos numbers on the maintenance branches.

tc
John.

Karl Tomlinson

Nov 5, 2008, 2:21:43 PM

John O'Duinn writes:

> Not to say we cant change going forward, but historically it seems like
> we dont look at talos numbers on the maintenance branches.

I look at a subset of the talos numbers on the 1.9 branch
regularly.

When there is a change in performance on the 2.0 branch and no
obvious cause, checking the 1.9 branch provides a good hint as to
whether the code changed or the test changed.

Maybe there's another way to provide this control, but keeping
something alive for 1.9 also provides useful information about our
current product.
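
The control-branch check described above can be sketched as a small decision rule. This is only an illustration of the reasoning: the function name, the `threshold` parameter, and the returned labels are hypothetical and not part of Talos or the graph server.

```python
def classify_shift(dev_delta: float, control_delta: float, threshold: float) -> str:
    """Guess the cause of a perf shift by comparing the development branch
    against a mostly-frozen control branch over the same time window."""
    dev_moved = abs(dev_delta) >= threshold
    control_moved = abs(control_delta) >= threshold
    if dev_moved and control_moved:
        # Both branches moved together: the test or its infrastructure probably changed.
        return "test or infrastructure change likely"
    if dev_moved:
        # Only the development branch moved: a code change is the likelier cause.
        return "code change likely"
    if control_moved:
        return "control branch changed"
    return "no significant change"


# Hypothetical deltas in ms: both branches shift by a similar amount.
print(classify_shift(dev_delta=12.0, control_delta=11.5, threshold=5.0))
```

The rule only gives a hint, as the message says; a patch landing on both branches at the same time would look the same as a test change.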

John O'Duinn

Nov 5, 2008, 6:39:37 PM

to Mike Beltzner, bzba...@mit.edu, dev-pl...@lists.mozilla.org

hi;

For FF3.0, we're running 3 different sets of perf tests, each with their
own set of machines, on different o.s.

slow/superset talos: (19 machines on ubuntu, vista, xp, leopard, tiger)
fast/subset talos: (3 machines on linux, tiger, xp)
jss talos: (3 machines on xp)

Obviously, we don't want to shut down machines that are being used, but if
there are specific perftests, or specific o.s., which are *not* being used...

tc
John.
=====

Mike Beltzner wrote:
> I'll admit that caught me as odd as well. Since the 1.9 boxes are used for
> our shipping product, it feels like we need to ensure that checkins on that
> branch are tested and checked for performance costs.
>
> While branch drivers may not be looking at those numbers, they probably
> should be. The level of redundancy may be ripe for reduction, though. I
> don't have a good sense of how many of those machines are required to detect
> a performance regression.
>
> cheers,
> mike
>
> ----- Original Message -----
> From: dev-planning-bounces+beltzner=mozil...@lists.mozilla.org
> <dev-planning-bounces+beltzner=mozil...@lists.mozilla.org>
> To: dev-pl...@lists.mozilla.org <dev-pl...@lists.mozilla.org>
> Sent: Mon Nov 03 11:59:27 2008
> Subject: Re: Power off unused FF2.0, FF3.0 machines?
>
> John O'Duinn wrote:
>> According to the 1.8 and 1.9 release drivers, they do not use the perf
>> numbers on those maintenance branches.
>
> So if we land a security fix that severely regresses performance nothing
> happens?
>
> I certainly don't think we need the number of boxes we have right now,
> but a single perf box per OS on our actively-maintained branch (1.9.0)
> might be good, if we'll actually kinda glance at the numbers (say
> weekly). If we won't, there's no point, of course.
>
> -Boris

John O'Duinn

Nov 5, 2008, 7:06:35 PM

to Karl Tomlinson, dev-pl...@lists.mozilla.org

Karl Tomlinson wrote:
> John O'Duinn writes:
>
>> (One possible concern which came up in Firefox meeting was about how
>> developers on FF3.next would do perf comparison in graph server between
>> FF3.next and the most recent FF3.0 or FF2.0 values; one proposed
>> solution was to extend the last known FF2.0/FF3.0 value as a flatline on
>> the graph.
>
> A problem with this flatline extension approach is that sometimes
> performance tests themselves are changed, which changes the
> reported numbers.
>
> It would be hard to know how the extension should be adjusted
> without some machines to run the new versions of the tests.

hi Karl;

Thankfully, we don't change testsuites that often; afaict it's happened
just once since I joined (May 2007). Each time we change testsuites, it
requires us to manually go back and backfill for important branch points
so we have valid historical numbers. A real pain.

I'm planning on backing up all the VMs/images before deleting, so if we
did need to revive one of these VMs in order to backfill historical
data, we could do it. It would be slightly more of a pain, because of
the machine revival, but I believe less painful than keeping the
machines alive *just* for this.

Seem reasonable?

> For me, I'm not interested in comparing with FF2.0, but being
> able to compare with FF3.0 is useful.

ok.

tc
John.

John O'Duinn

Nov 5, 2008, 7:34:44 PM

to Daniel Veditz, dev-pl...@lists.mozilla.org

Daniel Veditz wrote:
> John O'Duinn wrote:
>> According to the 1.8 and 1.9 release drivers, they do not use the perf
>> numbers on those maintenance branches.
>
> I'm fine with killing the 1.8 machines but uncomfortable shutting off
> all visibility on what's happening with our most current official
> release.

Sounds like we're all agreeing there, so I've filed two bugs:

bug#463323: create flatline effect in graphs by repeating last recorded
value
bug#463325: mothball all 7 talos machines on FF2.0

> Four machines per OS and two flavors of Mac and Windows might
> be overkill, but I think we need to keep some of the 1.9.0 machines
> until 3.1 is well adopted (more than 33-50% of Firefox users?).

For FF3.0, which o.s., and which test suites would you like to keep?

tc
John.

John O'Duinn

Nov 5, 2008, 7:35:27 PM

to Karl Tomlinson, dev-pl...@lists.mozilla.org

Karl Tomlinson wrote:
> John O'Duinn writes:
>
>> Not to say we cant change going forward, but historically it seems like
>> we dont look at talos numbers on the maintenance branches.
>
> I look at a subset of the talos numbers on the 1.9 branch
> regularly.

hi Karl;

Which subset do you look at?

tc
John.
=====

> When there is a change in performance on the 2.0 branch and no
> obvious cause, checking the 1.9 branch provides a good hint as to
> whether the code changed or the test changed.
>
> Maybe there's another way to provide this control, but keeping
> something alive for 1.9 also provides useful information about our
> current product.

Karl Tomlinson

Nov 5, 2008, 9:49:23 PM

John O'Duinn writes:

> Karl Tomlinson wrote:
>
>> A problem with this flatline extension approach is that sometimes
>> performance tests themselves are changed, which changes the
>> reported numbers.
>
> Thankfully, we dont change testsuites that often, afaict its happened
> just once since I joined (may2007). Each time we change testsuites, it
> requires us to manually go back and backfill for important branch points
> so we have valid historical numbers backfilled. A real pain.

I'm aware of two cases in the last few months.

If testing infrastructure is going to be maintained and errors fixed,
there are going to be changes. These will hopefully be small, but
I'm not sure that this can be guaranteed.

Perhaps, if numbers do change significantly, a fudge factor can be
applied to square things up but then it wouldn't really feel like
comparing apples with apples.

> I'm planning on backing up all the VMs/images before deleting, so if we
> did need to revive one of these VMs in order to backfill historical
> data, we could do it. It would be slightly more of a pain, because of
> the machine revival, but I believe less painful then keeping the
> machines alive *just* for this.
>
> Seem reasonable?

I fear that this is enough of a pain that it wouldn't happen.

John O'Duinn writes:

> Karl Tomlinson wrote:
>> John O'Duinn writes:
>>
>>> Not to say we cant change going forward, but historically it seems like
>>> we dont look at talos numbers on the maintenance branches.
>>
>> I look at a subset of the talos numbers on the 1.9 branch
>> regularly.
>
> hi Karl;
>
> Which subset do you look at?

I look at Ts, Tp, and Tp_RSS (and equivalents), but I assume
others are important too. There may be a case for dropping the
fast and nochrome versions of these tests.

I used "subset" because I don't look at more than 1 result of the
same test on the same OS (but different machine).

There are far fewer checkins on the stable branches, so I expect
we'd get enough data points from a single machine per test (or
multiple tests).

Our code is different enough on different platforms that we'd need
at least Linux/Windows/Mac builds. Perhaps we can get away with
only one kind of OS for each of Windows and Mac (as we do for
Linux).

John O'Duinn

Nov 25, 2008, 8:57:46 PM

to Daniel Veditz, dev-pl...@lists.mozilla.org

John O'Duinn wrote:

> Daniel Veditz wrote:
>> John O'Duinn wrote:
>>> According to the 1.8 and 1.9 release drivers, they do not use the perf
>>> numbers on those maintenance branches.
>> I'm fine with killing the 1.8 machines but uncomfortable shutting off
>> all visibility on what's happening with our most current official
>> release.

> Sounds like we're all agreeing there, so I've filed two bugs:
>
> bug#463323: create flatline effect in graphs by repeating last recorded
> value

This is now fixed and in production.

> bug#463325: mothball all 7 talos machines on FF2.0

Now scheduling to turn off. The curious can follow along in the bug!

>> Four machines per OS and two flavors of Mac and Windows might
>> be overkill, but I think we need to keep some of the 1.9.0 machines
>> until 3.1 is well adopted (more than 33-50% of Firefox users?).

> For FF3.0, which o.s., and which test suites would you like to keep?

Dan; gentle ping... which suites would you like to keep?

tc
John.
=====
> tc
> John.

Daniel Veditz

Nov 25, 2008, 10:40:44 PM

to jod...@mozilla.com

John O'Duinn wrote:
> John O'Duinn wrote:

>> Daniel Veditz wrote:
>>> Four machines per OS and two flavors of Mac and Windows might
>>> be overkill, but I think we need to keep some of the 1.9.0 machines
>>> until 3.1 is well adopted (more than 33-50% of Firefox users?).
>> For FF3.0, which o.s., and which test suites would you like to keep?
> Dan; gentle ping... which suites would you like to keep?

I'd like Linux, WinXP and a flavor of Mac (whichever gives most reliable
results, but all things equal go for the one with the most users). For
tests we need at least Ts, Tp, Twinopen, Tsspider, Tdhtml, Tsvg and Tgfx
-- whatever is actively being used on 3.1.

Two machines per OS might save us from wasting time tracking down
"regressions" that turn out to be machine issues.

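The point about two machines per OS can be sketched as a simple agreement check: a slowdown counts as a probable regression only when every machine for that OS reports it. The function name, the machine labels, and the threshold below are hypothetical; this is not how Talos itself flags regressions.

```python
from typing import Dict


def is_probable_regression(deltas_by_machine: Dict[str, float], threshold: float) -> bool:
    """Return True only when every machine for one OS reports a slowdown
    of at least `threshold` (positive delta = slower)."""
    if not deltas_by_machine:
        return False
    return all(delta >= threshold for delta in deltas_by_machine.values())


# Hypothetical deltas in ms for two machines running the same OS.
both_slow = is_probable_regression({"machine-a": 14.0, "machine-b": 12.5}, threshold=10.0)
one_slow = is_probable_regression({"machine-a": 14.0, "machine-b": 0.3}, threshold=10.0)
print(both_slow, one_slow)
```

When only one of the two machines shows the slowdown, the machine itself is the first suspect, which is the time saving described above.
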
John O'Duinn

Nov 26, 2008, 4:21:06 PM

to Karl Tomlinson, dev-pl...@lists.mozilla.org

hi;

Karl Tomlinson wrote:
> John O'Duinn writes:
>
>> Karl Tomlinson wrote:
>>
>>> A problem with this flatline extension approach is that sometimes
>>> performance tests themselves are changed, which changes the
>>> reported numbers.
>> Thankfully, we dont change testsuites that often, afaict its happened
>> just once since I joined (may2007). Each time we change testsuites, it
>> requires us to manually go back and backfill for important branch points
>> so we have valid historical numbers backfilled. A real pain.
>
> I'm aware of two cases in the last few months.

Reviving this thread, now that bug#463323 is fixed.

I wonder if we were talking about different things?!? I was talking
about Talos perf test suites not changing, as we need to re-do
historical builds for calibration. Are you talking about changes to
unittests, maybe? hmmm... If you are talking about changes to Talos perf
tests, can you send me bug#s?

I don't think we've done recalibration since late 2007/early 2008, so
maybe we missed something?

> If testing infrastructure is going to maintained and errors fixed,
> there are going to be changes. These will hopefully be small, but
> I'm not sure that this can be guaranteed.
>
> Perhaps, if numbers do change significantly, a fudge factor can be
> applied to square things up but then it wouldn't really feel like
> comparing apples with apples.

Having accurate numbers for accurate apples-to-apples comparisons is
important. This means that whenever we create a new pageset in
talos, we've gone back and re-tested important historical
milestones/releases with the new pageset, generating new accurate data
to compare with.

(It's not fun to do, but it is important to have accurate data!)

>> I'm planning on backing up all the VMs/images before deleting, so if we
>> did need to revive one of these VMs in order to backfill historical
>> data, we could do it. It would be slightly more of a pain, because of
>> the machine revival, but I believe less painful then keeping the
>> machines alive *just* for this.
>> Seem reasonable?
> I fear that this is enough of a pain that it wouldn't happen.

This may no longer be relevant, based on discussion above? If you're
using something, let's keep it up while we continue to support that
branch. I'm looking for *unused* machines right now! :-)

tc
John.
=====

>
> John O'Duinn writes:
>
>> Karl Tomlinson wrote:
>>> John O'Duinn writes:
>>>
>>>> Not to say we cant change going forward, but historically it seems like
>>>> we dont look at talos numbers on the maintenance branches.
>>> I look at a subset of the talos numbers on the 1.9 branch
>>> regularly.
>> hi Karl;
>>
>> Which subset do you look at?
>
> I look at Ts, Tp, and Tp_RSS (and equivalents), but I assume
> others are important too. There may be a case for dropping the
> fast and nochrome versions of these tests.

Good point. I'll break this question out into a separate thread, in case
folks miss it down here.

tc
John.
=====

>
> I used "subset" because I don't look at more than 1 result of the
> same test on the same OS (but different machine).
>
> There are much fewer checkins on the stable branches, so I expect
> we'd get enough data points from a single machine per test (or
> multiple tests).
>
> Our code is different enough on different platforms that we'd need
> at least Linux/Windows/Mac builds. Perhaps we can get away with
> only one kind of OS for each of Windows and Mac (as we do for
> Linux).

Karl Tomlinson

Nov 26, 2008, 6:45:36 PM

John O'Duinn writes:

> hi;
>
> Karl Tomlinson wrote:
>> John O'Duinn writes:
>>
>>> Karl Tomlinson wrote:
>>>
>>>> A problem with this flatline extension approach is that sometimes
>>>> performance tests themselves are changed, which changes the
>>>> reported numbers.
>>> Thankfully, we dont change testsuites that often, afaict its happened
>>> just once since I joined (may2007). Each time we change testsuites, it
>>> requires us to manually go back and backfill for important branch points
>>> so we have valid historical numbers backfilled. A real pain.
>>
>> I'm aware of two cases in the last few months.
>
> Reviving this thread, now that bug#463323 is fixed.
>
> I wonder if we were talking about different things?!? I was talking
> about Talos perf test suites not changing, as we need to re-do
> historical builds for calibration. Are you talking about changes to
> unittests, maybe? hmmm... If you are talking about changes to Talos perf
> tests, can you send me bug#s?
>
> I dont think we've done recalibration since late2007/early2008, so
> maybe we missed something?

I'm talking about changes to Talos perf test infrastructure
(rather than page sets or unit tests).

One change was accidental on 2008-08-07
https://bugzilla.mozilla.org/show_bug.cgi?id=450401
and was corrected 2008-09-26.

The correction means that recalibration is not necessary but the
period of effect on results is long enough to demonstrate that it
is useful to have a control.

Another change was corrections to median calculations 2008-10-24
https://bugzilla.mozilla.org/show_bug.cgi?id=459598

Fortunately the incorrect calculations seemed to mostly cancel out
so the changes in reported summary numbers were small, even though
the effect on individual page numbers was sometimes much larger.

But this is the kind of change that I expect could happen in the
future.

>>> I'm planning on backing up all the VMs/images before deleting, so if we
>>> did need to revive one of these VMs in order to backfill historical
>>> data, we could do it. It would be slightly more of a pain, because of
>>> the machine revival, but I believe less painful then keeping the
>>> machines alive *just* for this.
>>> Seem reasonable?
>> I fear that this is enough of a pain that it wouldn't happen.
>
> This may no longer be relevant, based on discussion above? If you're
> using something, lets keep it up while we continue to support that
> branch. I'm looking for *unused* machines right now! :-)

OK. Once the number of changes landing on 1.9.1 drops (and we are
sure that code changes never land on 1.9.1 and 1.9.2 concurrently),
maybe I won't need to look at 1.9.0.

But Dan's suggestion for keeping something running on our current
official release (until 3.1 is well adopted) sounds sensible to me.

Mike Beltzner

Dec 2, 2008, 12:10:08 AM

to Karl Tomlinson, dev-pl...@lists.mozilla.org

On 26-Nov-08, at 3:45 PM, Karl Tomlinson wrote:

>> This may no longer be relevant, based on discussion above? If you're
>> using something, lets keep it up while we continue to support that
>> branch. I'm looking for *unused* machines right now! :-)

In that case, we should keep at least one 1.9.0 box up running the
same test suite that's being run on 1.9.1 and trunk.

> OK. Maybe once the number of changes landing on 1.9.1 reduces
> (and we are sure that code changes never land on 1.9.1 and 1.9.2
> concurrently) then maybe I won't need to look at 1.9.0.
>
> But Dan's suggestion for keeping something running on our current
> official release (until 3.1 is well adopted) sounds sensible to me.

The more I think about the way we do performance testing, the more I
think this makes sense. It needn't be a particularly quick box, and
the tests don't need to be run continuously, but we should have one
machine at least tasked with taking the latest-release-builds from
each branch and running it through the exact same set of tests to
provide a proper comparison.

cheers,
mike

alice nodelman

Dec 2, 2008, 7:02:20 PM

to Karl Tomlinson

It's true, performance tests will change. We'll find more bugs, or
we'll change our mind about how tests are run. When we've hit this in
the past (i.e., the two bugs mentioned) we've never gone back to backfill
data. This is also true of times when talos boxes have gone off the
rails and reported garbage numbers for weeks at a time: we reset the
machines and do not go back to fill in data.

This is not to say that we can't backfill, or that there aren't cases
where we should. But we shouldn't use that as a reason to keep machines
constantly cycling out new data. If we hit a significant branch point
or some other point of interest (say, an updated Tp test page set) we
can consider doing a round of tests to get proper comparison numbers -
but this should be done on a case by case basis.

alice.