(Very) Broken tests

Marcin Owsiany

May 20, 2008, 4:27:45 PM
to puppe...@googlegroups.com
Hi all,

After I managed to invoke the rspec test runner (see #1237), I realized
with sadness that they are far from usable. I got:
3053 examples, 219 failures, 30 pending - when running as root
and
3053 examples, 145 failures, 31 pending - when running as non-root
This is on master branch.

Before you ask - no, the ones failing as non-root are not a subset of
the ones failing as root.

For me personally (and I think for anyone new to puppet, who would like
to produce high-quality code) this is a huge barrier to puppet
development.

The reason is that until you gain intimate knowledge of how puppet works
inside, there is no way you can be even remotely sure that a change that
you are about to introduce won't break something. (Even with tests you
cannot be certain, but at least you can be reasonably confident).

So I have several questions:

- do we at all agree that there should be NO failing tests at any point
in time, and that any failing test is a bug? Even in master branch?
And as a result, no commit should introduce a test failure?

- is there anyone at all for whom all tests work at the moment? Or am I
just very unlucky and it does work for everyone except for me? :)

- how do we tackle this problem? I cannot go through 200 failing tests
alone and fix them in reasonable time. We need to split the work
somehow.

- how do we make sure that this problem does not reappear? Should we
set up some continuous integration environment and assume "project
culture" would convince developers to fix the problems? Or should we
go a step further and only ever release software tested and built by
the continuous integration service?

I know someone (Luke?) mentioned that it would be nice to have a
build farm which would run the tests on all supported platforms. But
I think that running them on a regular basis even on a single
platform would be much better than the current situation.

--
Marcin Owsiany <mar...@owsiany.pl> http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216 FE67 DA2D 0ACA FC5E 3F75 D6F6 3A0D 8AA0 60F4 1216

"Every program in development at MIT expands until it can read mail."
-- Unknown

Paul Lathrop

May 20, 2008, 4:59:33 PM
to puppe...@googlegroups.com
I could be wrong, I speak to my understanding of things, not gospel
from on-high :-)

Response inline:

On Tue, May 20, 2008 at 1:27 PM, Marcin Owsiany <mar...@owsiany.pl> wrote:
> After I managed to invoke the rspec test runner (see #1237), I realized
> with sadness that they are far from usable. I got:
> 3053 examples, 219 failures, 30 pending - when running as root
> and
> 3053 examples, 145 failures, 31 pending - when running as non-root
> This is on master branch.

Odd. I get *very* different results. As myself: "3099 examples, 76
failures, 31 pending" As root: "3099 examples, 191 failures, 30
pending" - but we could have different states. My master is at commit
fe157f239a301abb52f81c62719355c8e50c970c

> For me personally (and I think for anyone new to puppet, who would like
> to produce high-quality code) this is a huge barrier to puppet
> development.

Development should not be taking place on the master branch. Most
contributions should be coded against the current stable branch
(0.24.x as of this writing). Test results against 0.24.x commit
84a787a2a764a5035f7cbb8d30f94fc601bed154 look much better:

Non-root: 2697 examples, 0 failures, 27 pending
root: 2697 examples, 35 failures, 27 pending

I'm not sure whether we should address the failures that occur when
running as root - that's a call for someone more test-savvy than I.

> The reason is that until you gain intimate knowledge of how puppet works
> inside, there is no way you can be even remotely sure that a change that
> you are about to introduce won't break something. (Even with tests you
> cannot be certain, but at least you can be reasonably confident).

Well, as I mentioned above, you can develop against the stable branch
which doesn't have failures (except running as root). This should help
you to see if you break something.

Also, keep in mind we don't have 100% test coverage yet. I am working
on migrating tests to RSpec from Test::Unit and I'm afraid the work is
pretty slow going.

> So I have several questions:
>
> - do we at all agree that there should be NO failing tests at any point
> in time, and that any failing test is a bug? Even in master branch?
> And as a result, no commit should introduce a test failure?

Ideally, there would be no failing tests in the "official" repository.
I definitely think Luke is working towards this goal. I know for a
fact he will not accept new commits which lack tests or cause an
existing test to fail (I've submitted commits like that). We should
definitely work to fix the tests. Whether that takes priority over
testing code that currently lacks tests is another question for Luke.

> - is there anyone at all for whom all tests work at the moment? Or am I
> just very unlucky and it does work for everyone except for me? :)

See above.

> - how do we tackle this problem? I cannot go through 200 failing tests
> alone and fix them in reasonable time. We need to split the work
> somehow.

If you are eager to contribute, this is clearly an area that needs
work. So, instead of thinking of it as fixing 200 failing tests, think
of it as fixing 1. When you are done, fix another. Hopefully others
will also help with this :-) Certainly I will if Luke says that this
should be higher-priority than the work I'm doing to migrate tests to
RSpec.

> - how do we make sure that this problem does not reappear? Should we
> set up some continuous integration environment and assume "project
> culture" would convince developers to fix the problems? Or should we
> go a step further and only ever release software tested and built by
> the continuous integration service?

Right now I believe we are still in the "benevolent dictator" stage
(possibly dictators, I think James Turnbull can commit to the official
repo, too). We just need a confirmation from the people with commit
access to the official repo that commits which cause test failures (or
lack tests) will not be integrated into the official repository going
forward. Once we have that, we will be able to make progress fixing
the broken tests because there won't be more broken tests cropping up.

> I know someone (Luke?) mentioned that it would be nice to have a
> build farm which would run the tests on all supported platforms. But
> I think that running them on a regular basis even on a single
> platform would be much better than the current situation.

To that end, I'll send a mail to the list offering up some cycles on
my FreeBSD box for this.

Thanks for bringing this up, I think it is a good idea to address this.

--Paul

Rick Bradley

May 20, 2008, 5:05:39 PM
to puppe...@googlegroups.com
On Tue, May 20, 2008 at 3:59 PM, Paul Lathrop <pa...@tertiusfamily.net> wrote:
> Development should not be taking place on the master branch. Most
> contributions should be coded against the current stable branch
> (0.24.x as of this writing). Test results against 0.24.x commit
> 84a787a2a764a5035f7cbb8d30f94fc601bed154 look much better:
>
> Non-root: 2697 examples, 0 failures, 27 pending
> root: 2697 examples, 35 failures, 27 pending
>
> I'm not sure whether we should address the failures that occur when
> running as root - that's a call for someone more test-savvy than I.

I can't speak for the benevolent dictator(s), but I can say that, as
far as I understand it, development is active on 0.24.x and effort has
been made to keep tests green on that branch for some time now. I've
never tried to run the specs as root, only as a non-root user, and I'm
guessing that that's the general practice.

Rick

James Turnbull

May 20, 2008, 7:34:27 PM
to puppe...@googlegroups.com
Paul Lathrop wrote:
> Right now I believe we are still in the "benevolent dictator" stage
> (possibly dictators, I think James Turnbull can commit to the official
> repo, too). We just need a confirmation from the people with commit
> access to the official repo that commits which cause test failures (or
> lack tests) will not be integrated into the official repository going
> forward. Once we have that, we will be able to make progress fixing
> the broken tests because there won't be more broken tests cropping up.
>
Paul

I can answer one part of this, but only for the 0.24.x branch, which I
currently maintain/administer. As far as I am concerned: no tests, no
commit. Simple as that. Any commit that causes tests to fail needs to
update the tests that now fail. I am sure Luke feels similarly.

Regards

James Turnbull

Luke Kanies

May 20, 2008, 10:54:23 PM
to puppe...@googlegroups.com
On May 20, 2008, at 3:27 PM, Marcin Owsiany wrote:

>
> Hi all,
>
> After I managed to invoke the rspec test runner (see #1237), I
> realized
> with sadness that they are far from usable. I got:
> 3053 examples, 219 failures, 30 pending - when running as root
> and
> 3053 examples, 145 failures, 31 pending - when running as non-root
> This is on master branch.

I get zero failures on master or 0.24.x, although I haven't run either
of them as root.

I recently discovered that my ~/.puppet directory was sometimes
getting modified by tests, so last week or so I chowned the whole
thing to root; this caught a few tests that I had to further mock, but
otherwise, no tests should fail for anyone ever.

I definitely consider this a very big problem.

>
> Before you ask - no, the ones failing as non-root are not a subset of
> the ones failing as root.
>
> For me personally (and I think for anyone new to puppet, who would
> like
> to produce high-quality code) this is a huge barrier to puppet
> development.

I completely agree, and I guess the extent of the failures is a clear
indication of the lack of breadth in people running the tests.


>
> So I have several questions:
>
> - do we at all agree that there should be NO failing tests at any
> point
> in time, and that any failing test is a bug? Even in master branch?
> And as a result, no commit should introduce a test failure?

Well, James and I certainly agree, and we're the respective
maintainers of the stable and dev branches.

Using 'autotest' can go a long way toward this, but you still have to
remember to run the whole test suite before committing, and you also
need to run the test suite on other platforms, which I assume is the
problem here.

>
> - is there anyone at all for whom all tests work at the moment? Or
> am I
> just very unlucky and it does work for everyone except for me? :)

Looks like others are getting similar failures, but they all work for
me. I've never left a failing test in spec/ for more than one commit
(i.e., I've accidentally committed one, but fixed it asap).

I consider every failing test in either test/ or spec/ a bug, although
nine times out of ten, failing tests in test/ will need to be rewritten
in spec/ or removed rather than fixed.

>
> - how do we tackle this problem? I cannot go through 200 failing tests
> alone and fix them in reasonable time. We need to split the work
> somehow.

Paul is basically right -- the only way to do it is to approach them
one at a time. I have to believe, else I'd go insane, that many of
your failing tests are related; ideally you'd pick a pattern of
failure and see if you could track down the root cause. This should
allow you to get rid of swathes of failures, leaving the smaller
one-offs to tackle later.

>
> - how do we make sure that this problem does not reappear? Should we
> set up some continuous integration environment and assume "project
> culture" would convince developers to fix the problems? Or should we
> go a step further and only ever release software tested and built by
> the continuous integration service?

If we have a continuous integration service, then I would definitely
never release a product that had non-green tests on any supported
platform. Is anyone in a position to set such a thing up and maintain
it? I'm willing to cover the costs of the ec2 instances, as long as
they're only running during the actual test process (e.g., once a day
for an hour or so, rather than 24hrs a day).

One of the benefits of using ec2 is that it would basically allow
anyone to upload an AMI of their favorite platform running Puppet, and
it should be straightforward to integrate that platform into the
continuous test service while also making it easy for people to give
Puppet a go on ec2.

One implication of requiring that all tests for supported platforms be
green, of course, is that every supported platform needs at least one
advocate willing to tackle platform-specific problems. David
Lutterkort is always on hand to tackle Red Hat problems, and Russ
Allbery and many others seem to always have the answers for Debian;
these platforms wouldn't be so stably supported by Puppet without this
dedication.

If there's a platform you care about, run the Puppet tests on it and
file bugs when they fail, at least until we get a continuous
integration service.

>
> I know someone (Luke?) mentioned that it would be nice to have a
> build farm which would run the tests on all supported platforms. But
> I think that running them on a regular basis even on a single
> platform would be much better than the current situation.

I run them essentially every day on my mac, and often on my debian
box. I never knowingly commit a broken test, and do my best to never
unknowingly commit them.

The only conclusion I can reach is that we need others running tests,
or we need to make it more obvious that if tests fail, you should file
a bug.

And, of course, we need people like Paul to help make the current
tests better.

--
Never interrupt your enemy when he is making a mistake.
--Napoleon Bonaparte
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com

Luke Kanies

May 20, 2008, 11:01:31 PM
to puppe...@googlegroups.com
On May 20, 2008, at 3:59 PM, Paul Lathrop wrote:

>
> I could be wrong, I speak to my understanding of things, not gospel
> from on-high :-)

I definitely appreciate your willingness to put your hand up. There's
no on-high here, I'm just stumbling slightly further in front of the
rest of you. :)

>
> Response inline:
>
> On Tue, May 20, 2008 at 1:27 PM, Marcin Owsiany <mar...@owsiany.pl>
> wrote:
>
>> For me personally (and I think for anyone new to puppet, who would
>> like
>> to produce high-quality code) this is a huge barrier to puppet
>> development.
>
> Development should not be taking place on the master branch. Most
> contributions should be coded against the current stable branch
> (0.24.x as of this writing). Test results against 0.24.x commit
> 84a787a2a764a5035f7cbb8d30f94fc601bed154 look much better:

I would modify this slightly -- if you want your code to be released
as part of any 0.24.x release, it should take place against the 0.24.x
branch. If you're willing to wait until 0.25, or you require features
only available in the master branch, then it should take place against
that branch.

>
> Non-root: 2697 examples, 0 failures, 27 pending
> root: 2697 examples, 35 failures, 27 pending
>
> I'm not sure whether we should address the failures that occur when
> running as root - that's a call for someone more test-savvy than I.

I often used to run the test/unit tests as root, but the spec/ tests
are written so that they should not need to run as root, at least at
this point.

I assume that at some point we'll have integration tests that do
things like create users, install packages, etc., which would
obviously need to run as root, but we don't have them yet.

> Well, as I mentioned above, you can develop against the stable branch
> which doesn't have failures (except running as root). This should help
> you to see if you break something.
>
> Also, keep in mind we don't have 100% test coverage yet. I am working
> on migrating tests to RSpec from Test::Unit and I'm afraid the work is
> pretty slow going.

Now that Paul is a testing jedi in training, hopefully he can get
others to join the party.

>
>> So I have several questions:
>>
>> - do we at all agree that there should be NO failing tests at any
>> point
>> in time, and that any failing test is a bug? Even in master branch?
>> And as a result, no commit should introduce a test failure?
>
> Ideally, there would be no failing tests in the "official" repository.
> I definitely think Luke is working towards this goal. I know for a
> fact he will not accept new commits which lack tests or cause an
> existing test to fail (I've submitted commits like that). We should
> definitely work to fix the tests. Whether that takes priority over
> testing code that currently lacks tests is another question for Luke.

Broken tests are essentially the highest priority, as Marcin is
completely right -- if tests are broken, you have no confidence at
all. Obviously there will sometimes be exceptions to this rule, but
as a rule, it's a good one.

> If you are eager to contribute, this is clearly an area that needs
> work. So, instead of thinking of it as fixing 200 failing tests, think
> of it as fixing 1. When you are done, fix another. Hopefully others
> will also help with this :-) Certainly I will if Luke says that this
> should be higher-priority than the work I'm doing to migrate tests to
> RSpec.

I really had no idea this many failures were happening. Please,
broken tests are a stain upon this earth -- fix them!

> Right now I believe we are still in the "benevolent dictator" stage
> (possibly dictators, I think James Turnbull can commit to the official
> repo, too). We just need a confirmation from the people with commit
> access to the official repo that commits which cause test failures (or
> lack tests) will not be integrated into the official repository going
> forward. Once we have that, we will be able to make progress fixing
> the broken tests because there won't be more broken tests cropping up.

I'm the only person with commit rights to the central repo hosted on
reductivelabs.com, but James is maintaining the official 0.24.x branch
(which at this point I periodically sync to the reductivelabs.com
repo). We're still working out the details, obviously.

--
A computer lets you make more mistakes faster than any invention in
human history--with the possible exceptions of handguns and tequila.
-- Mitch Ratcliffe

Marcin Owsiany

May 21, 2008, 10:22:29 AM
to puppe...@googlegroups.com
On Tue, May 20, 2008 at 09:54:23PM -0500, Luke Kanies wrote:
>
> On May 20, 2008, at 3:27 PM, Marcin Owsiany wrote:
>
> >
> > Hi all,
> >
> > After I managed to invoke the rspec test runner (see #1237), I
> > realized
> > with sadness that they are far from usable. I got:
> > 3053 examples, 219 failures, 30 pending - when running as root
> > and
> > 3053 examples, 145 failures, 31 pending - when running as non-root
> > This is on master branch.
>
> I get zero failures on master or 0.24.x, although I haven't run either
> of them as root.

Hm, even on Debian? I'm wondering what the reason for #1244 is.

> no tests should fail for anyone ever.

Good to hear that, it's encouraging. In that case I will work on them
for a while, maybe most of them do have a common cause that will be easy
to fix.

> If we have a continuous integration service, then I would definitely
> never release a product that had non-green tests on any supported
> platform. Is anyone in a position to set such a thing up and maintain
> it?

This would be a fun project, however I know little of that EC2 thing.
Let me see if I can get the tests to pass first. Then I'll try to create
a basic set of puppet manifests which would turn a bare base Debian
install into a basic continuous integration service, in a VM. If that
works, I'll have a look at EC2.

> they're only running during the actual test process (e.g., once a day
> for an hour or so, rather than 24hrs a day).

The tests seem to take only a couple of minutes on my oldish laptop, so
I guess 5 minute runs every, say, 6 hours would be better.
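[A schedule like that could be a single cron entry. The following is only an editorial sketch, assuming a checkout at /srv/puppet and a `rake spec` task -- both hypothetical, not paths or tasks taken from this thread:]

```shell
# m  h    dom mon dow  command -- run the suite every 6 hours, keep the last log
0    */6  *   *   *    cd /srv/puppet && git pull -q && rake spec > /var/log/puppet-spec.log 2>&1
```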

Marcin

David Schmitt

May 21, 2008, 11:07:32 AM
to puppe...@googlegroups.com

On Wednesday 21 May 2008, Marcin Owsiany wrote:
> > they're only running during the actual test process (e.g., once a day
> > for an hour or so, rather than 24hrs a day).
>
> The tests seem to take only a couple of minutes on my oldish laptop, so
> I guess 5 minute runs every, say, 6 hours would be better.

It might be interesting to just run them for every commit. This would help in
correlating failures with their causes.

Regards, DavidS

--
The primary freedom of open source is not the freedom from cost, but the
freedom to shape software to do what you want. This freedom is /never/ exercised
without cost, but is available /at all/ only by accepting the very different
costs associated with open source, costs not in money, but in time and effort.
-- http://www.schierer.org/~luke/log/20070710-1129/on-forks-and-forking

Thom May

May 21, 2008, 11:19:49 AM
to puppe...@googlegroups.com
On Wed, May 21, 2008 at 5:07 PM, David Schmitt <da...@schmitt.edv-bus.at> wrote:
>
> On Wednesday 21 May 2008, Marcin Owsiany wrote:
>> > they're only running during the actual test process (e.g., once a day
>> > for an hour or so, rather than 24hrs a day).
>>
>> The tests seem to take only a couple of minutes on my oldish laptop, so
>> I guess 5 minute runs every, say, 6 hours would be better.
>
> It might be interesting to just run them for every commit. This would help in
> correlating failures with their causes.

This ought to be relatively easy with buildbot or something similar. I
have basically no time till late next month, but unless someone beats
me to it I'll volunteer to get at least a master and a debian/ubuntu
x64 slave going after then. Luke's suggestion of EC2 slaves would be
very cute, but I dunno how much additional work that would be.
-T

Luke Kanies

May 21, 2008, 11:06:04 PM
to puppe...@googlegroups.com

Wouldn't be that much work, probably, but it's about $70/month to run
them constantly, so it'd be expensive.
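[As an editorial aside, that figure is consistent with rough arithmetic; the ~$0.10/hour small-instance rate below is an assumption of this sketch, not a number from the thread:]

```ruby
# Rough cost check for an always-on EC2 instance versus a short daily run.
# Assumes a hypothetical ~$0.10 per instance-hour rate.
hourly_rate = 0.10

always_on = hourly_rate * 24 * 30      # running around the clock
printf("always on:    $%.2f/month\n", always_on)   # roughly $72/month

one_hour_daily = hourly_rate * 1 * 30  # an hour-long daily test run
printf("one hour/day: $%.2f/month\n", one_hour_daily)
```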

I guess it's a better idea to have a clear way for people to set up
build hosts and report in the results.

--
The great thing about television is that if something important
happens anywhere in the world, day or night, you can always change
the channel. -- From "Taxi"

Luke Kanies

May 21, 2008, 11:12:02 PM
to puppe...@googlegroups.com
On May 21, 2008, at 9:22 AM, Marcin Owsiany wrote:

>
> On Tue, May 20, 2008 at 09:54:23PM -0500, Luke Kanies wrote:
>>
>> On May 20, 2008, at 3:27 PM, Marcin Owsiany wrote:
>>
>>>
>>> Hi all,
>>>
>>> After I managed to invoke the rspec test runner (see #1237), I
>>> realized
>>> with sadness that they are far from usable. I got:
>>> 3053 examples, 219 failures, 30 pending - when running as root
>>> and
>>> 3053 examples, 145 failures, 31 pending - when running as non-root
>>> This is on master branch.
>>
>> I get zero failures on master or 0.24.x, although I haven't run
>> either
>> of them as root.
>
> Hm, even on Debian? I'm wondering what's the reason of #1244

90% of the time, when you get "no providers for...", it's a problem
related to the type in question getting reloaded when it shouldn't be.

E.g., this process happens:

1 - type gets loaded
2 - type loads all of its providers
3 - type gets reloaded, replacing the provider list
4 - new type tries loading providers but nothing happens because
they've all been loaded already (but lost)

The problem is that Puppet's loader doesn't play well with Ruby's
'require'. Clearly a bug, but not usually something that's hit except
during testing, and easily fixed, so I haven't refactored to fix it.
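[The four-step sequence above can be sketched in miniature. This is hypothetical illustration code, not Puppet's actual loader; the names are invented:]

```ruby
# A require-style loader runs a file's side effects only once, so a
# reconstructed type never gets its providers back.
module Loader
  LOADED = {}

  def self.load_providers(type)
    return if LOADED[type.name]    # like `require`: second call is a no-op
    LOADED[type.name] = true
    type.register_provider(:posix) # side effect runs exactly once
  end
end

class Type
  attr_reader :name, :providers

  def initialize(name)
    @name = name
    @providers = []
  end

  def register_provider(provider)
    @providers << provider
  end
end

# 1 - type gets loaded; 2 - it loads all of its providers
user = Type.new(:user)
Loader.load_providers(user)
puts user.providers.inspect   # => [:posix]

# 3 - type gets reloaded, replacing the provider list
user = Type.new(:user)
# 4 - the new type tries loading providers, but the require-style cache
#     says they are already loaded, so the fresh list stays empty
Loader.load_providers(user)
puts user.providers.inspect   # => []  ("no providers for user")
```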

However, the ldap providers I recently merged seem to have added a new
kind of broken there, and I'm not sure that James has merged in my
recent fixes. I *expect* this problem is ldap related, but I've been
away from home all week and thus have only tested on my mac.

>
>> no tests should fail for anyone ever.
>
> Good to hear that, it's encouraging. In that case I will work on them
> for a while, maybe most of them do have a common cause that will be
> easy
> to fix.

Great.

>
>> If we have a continuous integration service, then I would definitely
>> never release a product that had non-green tests on any supported
>> platform. Is anyone in a position to set such a thing up and
>> maintain
>> it?
>
> This would be a fun project, however I know little of that EC2 thing.
> Let me see if I can get the tests to pass first. Then I'll try to
> create
> a basic set of puppet manifests which would turn a bare base Debian
> install into a basic continuous integration service, in a VM. If that
> works, I'll have a look at EC2.

That's a great place to start.

>
>> they're only running during the actual test process (e.g., once a day
>> for an hour or so, rather than 24hrs a day).
>
> The tests seem to take only a couple of minutes on my oldish laptop,
> so
> I guess 5 minute runs every, say, 6 hours would be better.


I was just thinking of cost. Either we have registered VMs that are
consistent and we can load whenever we want, but we can't run all the
time because it's expensive, or we host the VMs ourselves somewhere.
The former is easier because of ec2, I think, but the latter is
cheaper. My concern with the latter is the difficulty in federating
the results of disparate test runs on different systems maintained by
essentially random community members. Guaranteeing consistent results
will always be difficult.

--
SELF-EVIDENT, adj. Evident to one's self and to nobody else.
-- Ambrose Bierce

Paul Lathrop

May 22, 2008, 1:58:24 PM
to puppe...@googlegroups.com
I've published a bunch of fixes for the RSpec *unit* tests, and
created/updated tickets as appropriate.

I'm much less confident about messing with the failing integration tests.

I just wanted to mention that before you file a bug about a failing
test, you should really be sure your code is up-to-date with respect
to the official repository. It would also help if you filled out
platform information in the bug to help people reproduce the test
failures.

With all the patches I made in the last couple days applied, both
master and 0.24.x tests run perfectly as both a normal user and as
root, on my OSX 10.5 machine :-) I'll wait until these patches are
integrated into the official repository and then I'll do another round
of verification.

--Paul

James Turnbull

May 22, 2008, 2:19:13 PM
to puppe...@googlegroups.com

Please remember that the current primary repository for the 0.24.x
branch is now my Github repo:

git://github.com/jamtur01/puppet.git

Regards

James Turnbull

--
James Turnbull (ja...@lovedthanlost.net)
Author of:
* Pulling Strings with Puppet
(http://www.amazon.com/gp/product/1590599780/)
* Pro Nagios 2.0
(http://www.amazon.com/gp/product/1590596099/)
* Hardening Linux
(http://www.amazon.com/gp/product/1590594444/)

