
Exhaustive Unit Testing


Emanuele D'Arrigo

Nov 26, 2008, 9:54:52 PM
Hi everybody,

another question on unit testing, admittedly not necessarily a python-
specific one...

I have a class method about 60 lines long (*) and, due to some 9
non-trivial if statements (3 and 2 of which are nested), the number of
possible paths the program flow can take is uncomfortably large, each
path characterized by an uncomfortably long list of careful turns to
take. Given that all paths have large portions of them overlapping
with each other, is there a strategy that would allow me to collapse
the many cases into only a few representative ones?

I can sense that exhaustively testing all paths is probably overkill.
On the other hand just going through all nodes and edges between the
nodes -at least once- is probably not enough. Is there any technique
to find the right(ish) middle?

Thanks for your help!

Manu

(*) for the 50/500 purists: it's my only method/function with more
than 50 lines so far... and it includes comments!

Steven D'Aprano

Nov 27, 2008, 12:00:15 AM
On Wed, 26 Nov 2008 18:54:52 -0800, Emanuele D'Arrigo wrote:

> Hi everybody,
>
> another question on unit testing, admittedly not necessarily a python-
> specific one...
>
> I have a class method about 60 lines long (*) and, due to some 9
> non-trivial if statements (3 and 2 of which are nested), the number of possible
> paths the program flow can take is uncomfortably large, each path
> characterized by an uncomfortably long list of careful turns to take.

Nine non-trivial if statements, some of which are nested... uncomfortably
large... uncomfortably long... these are all warning signs that your
method is a monster.

I would say the obvious solution is to refactor that method into multiple
simpler methods, then call those. Not only does that make maintaining
that class method easier, but you can also unit-test each method
individually.

With nine if-statements, you have 512 program paths that need testing.
(In practice, there may be fewer because some of the statements are
nested, and presumably some paths will be mutually exclusive.) Refactor
the if...else blocks into methods, and you might have as many as 18 paths
to test. So in theory you should reduce the number of unit-tests
significantly.
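
To make the idea concrete, here is a minimal sketch of this kind of
refactoring. The class and method names (Processor, _handle_remote and
so on) are invented for illustration, not taken from Emanuele's actual
code:

class Processor:
    def process(self, msg):
        # process() only wires the pieces together; each branch body
        # lives in a small helper that can be unit-tested on its own.
        if msg.is_remote:
            payload = self._handle_remote(msg)
        else:
            payload = self._handle_local(msg)
        if msg.needs_ack:
            self._send_ack(msg)
        return payload

    def _handle_remote(self, msg):
        # non-trivial logic, tested in isolation
        return msg.body.upper()

    def _handle_local(self, msg):
        return msg.body.lower()

    def _send_ack(self, msg):
        msg.acked = True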


> Given that all paths have large portions of them overlapping with each
> other, is there a strategy that would allow me to collapse the many
> cases into only a few representative ones?

That would be impossible to answer without knowing what the method
specifically does and how the many cases interact with each other.


> I can sense that exhaustively testing all paths is probably overkill.

I don't agree. If you don't test all the paths, then by definition you
have program paths which have never been tested. Unless those paths are
so trivially simple that you can see that they must be correct just by
looking at the code, then the chances are very high that they will hide
bugs.


> On
> the other hand just going through all nodes and edges between the nodes
> -at least once- is probably not enough. Is there any technique to find
> the right(ish) middle?

Refactor until your code is simple enough to unit-test effectively, then
unit-test effectively.

--
Steven

Emanuele D'Arrigo

Nov 27, 2008, 8:37:49 AM
On Nov 27, 5:00 am, Steven D'Aprano

<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> Refactor until your code is simple enough to unit-test effectively, then
> unit-test effectively.

<sigh> I suspect you are right...

Ok, thank you!

Manu

Stefan Behnel

Nov 27, 2008, 8:52:19 AM
Steven D'Aprano wrote:
> If you don't test all the paths, then by definition you
> have program paths which have never been tested. Unless those paths are
> so trivially simple that you can see that they must be correct just by
> looking at the code, then the chances are very high that they will hide
> bugs.

Not to mention that you can sometimes look at awfully trivial code three
times and only see the obvious bug in that code the fourth time you put an
eye on it a good night's sleep later.

Expecting that the code paths in a nested if statement with nine conditions
are not worth testing due to obvious triviality is pretty hubristic.

Stefan

Roy Smith

Nov 27, 2008, 9:54:21 AM
In article <492ea611$0$32680$9b4e...@newsspool2.arcor-online.net>,
Stefan Behnel <stef...@behnel.de> wrote:

> Not to mention that you can sometimes look at awfully trivial code three
> times and only see the obvious bug in that code the fourth time you put an
> eye on it a good night's sleep later.

Or never see it.

Lately, I've been using Coverity (static code analysis tool) on a large C++
project. I'm constantly amazed at the stuff it finds which is *so* obvious
after it's pointed out, that I just never saw before.

> Expecting that the code paths in a nested if statement with nine conditions
> are not worth testing due to obvious triviality is pretty hubristic.

There's a well known theory in studies of the human brain which says people
are capable of processing about 7 +/- 2 pieces of information at once.
With much handwaving, that could be applied to testing by saying most
people can think through 2 conditionals (4 paths), but it's pushing it when
you get to 3 conditionals (8 paths), and once you get to 4 or more, it's
probably hopeless to expect somebody to be able to fully understand all the
possible code paths.

Emanuele D'Arrigo

Nov 27, 2008, 11:32:12 AM
On Nov 27, 5:00 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> Refactor until your code is simple enough to unit-test effectively, then
> unit-test effectively.

Ok, I've taken this wise suggestion on board and of course I
immediately found ways to improve the method. -However- this generates
another issue. I can fragment the code of the original method into one
public method and a few private support methods. But this doesn't
reduce the complexity of the testing, because the number and
complexity of the possible paths stay more or less the same. The
solution to this would be to test the individual methods separately,
but is the only way to test private methods in Python to make them
(temporarily) non-private? I guess ultimately this would only require
removing the appropriate double underscores, testing the methods, and
then adding the double underscores back in place. There is no
"cleaner" way, is there?

Manu

Terry Reedy

Nov 27, 2008, 1:33:20 PM
to pytho...@python.org

Use single underscore names to mark methods as private. The intent of
double-underscore mangling is to avoid name clashes in multiple
inheritance. If you insist on double underscores, use the mangled name
in your tests. 'Private' is not really private in Python.

IDLE 3.0rc3
>>> class C(): __a=1

>>> C._C__a
1
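
For instance, a test can always reach a double-underscore method
through its mangled name; a minimal sketch (the Widget class and the
test are made up for illustration, not from the thread):

import unittest

class Widget:
    def __validate(self, value):   # mangled to _Widget__validate
        return value > 0

class WidgetTest(unittest.TestCase):
    def test_validate(self):
        w = Widget()
        # reach the "private" method via its mangled name
        self.assertTrue(w._Widget__validate(3))
        self.assertFalse(w._Widget__validate(-1))

if __name__ == '__main__':
    unittest.main()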

Terry

bearoph...@lycos.com

Nov 27, 2008, 1:45:47 PM
Emanuele D'Arrigo:

>I can fragment the code of the original method into one public method and a few private support methods.<

Python also supports nested functions, which you can put inside your
method. The problem is that unit tests often aren't able to test
nested functions.


A question for other people: Can Python change a little to allow
nested functions to be tested? I think this may solve some of my
problems.

Bye,
bearophile

Terry Reedy

Nov 27, 2008, 6:47:18 PM
to pytho...@python.org

The problem is that inner functions do not exist until the outer
function is called and the inner def is executed. And they cease to
exist when the outer function returns unless returned or associated with
a global name or collection.

A 'function' only needs to be nested if it is intended to be different
(different default or closure) for each execution of its def.
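
For example, one way to get hold of an inner function so it can be
tested is simply to return it - a trivial sketch, not from the post
above:

def make_adder(x):
    def adder(y):
        return x + y
    return adder        # the closure survives because it is returned

add_five = make_adder(5)
assert add_five(3) == 8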

Benjamin

Nov 27, 2008, 7:29:52 PM
On Nov 27, 5:47 pm, Terry Reedy <tjre...@udel.edu> wrote:

> bearophileH...@lycos.com wrote:
> > Emanuele D'Arrigo:
> >> I can fragment the code of the original method into one public method and a few private support methods.<
>
> > Python also support nested functions, that you can put into your
> > method. The problem is that often unit test functions aren't able to
> > test nested functions.
>
> > A question for other people: Can Python change a little to allow
> > nested functions to be tested? I think this may solve some of my
> > problems.
>
> The problem is that inner functions do not exist until the outer
> function is called and the inner def is executed.  And they cease to
> exist when the outer function returns unless returned or associated with
> a global name or collection.

Of course, you could resort to terrible evil like this:

import types

def f():
    def sub_function():
        return 4

nested_function = types.FunctionType(f.func_code.co_consts[1], {})
print nested_function()  # Prints 4

But don't do that!

Steven D'Aprano

Nov 27, 2008, 8:00:51 PM
On Thu, 27 Nov 2008 10:45:47 -0800, bearophileHUGS wrote:

> A question for other people: Can Python change a little to allow nested
> functions to be tested? I think this may solve some of my problems.

Remember that nested functions don't actually exist as functions until
the outer function is called, and when the outer function returns they
go out of scope and cease to exist. For this to change wouldn't be a
little change, it would be a large change.

I can see benefit to the idea. Unit testing, as you say. I also like the
idea of doing this:

def foo(x):
    def bar(y):
        return y+1
    return x**2+bar(x)

a = foo.bar(7)


However you can get the same result (and arguably this is the Right Way
to do it) with a class:

class foo:
    def bar(self, y):
        return y+1
    def __call__(self, x):
        return x**2 + self.bar(x)
foo = foo()


Unfortunately, this second way can't take advantage of nested scopes in
the same way that functions can.


--
Steven

Steven D'Aprano

Nov 27, 2008, 8:30:45 PM
On Thu, 27 Nov 2008 08:32:12 -0800, Emanuele D'Arrigo wrote:

> On Nov 27, 5:00 am, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> Refactor until your code is simple enough to unit-test effectively,
>> then unit-test effectively.
>
> Ok, I've taken this wise suggestion on board and of course I found
> immediately ways to improve the method. -However- this generates another
> issue. I can fragment the code of the original method into one public
> method and a few private support methods. But this doesn't reduce the
> complexity of the testing because the number and complexity of the
> possible path stays more or less the same. The solution to this would be
> to test the individual methods separately,

Naturally.


> but is the only way to test
> private methods in python to make them (temporarily) non private? I
> guess ultimately this would only require the removal of the appropriate
> double-underscores followed by method testing and then adding the
> double-underscores back in place. There is no "cleaner" way, is there?

"Private" methods in Python are only private by convention.


>>> class Parrot:
...     def _private(self): # private, don't touch
...         pass
...     def __reallyprivate(self): # name mangled
...         pass
...
>>> Parrot._private
<unbound method Parrot._private>
>>> Parrot.__reallyprivate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: class Parrot has no attribute '__reallyprivate'
>>> Parrot._Parrot__reallyprivate
<unbound method Parrot.__reallyprivate>

In Python, the philosophy is "We're all adults here" and consequently if
people *really* want to access your private methods, they can. This is a
Feature, not a Bug :)

Consequently, I almost always use single-underscore "private by
convention" names, rather than double-underscore names. The name-mangling
is, in my experience, just a nuisance.

--
Steven

bearoph...@lycos.com

Nov 28, 2008, 3:06:01 AM
Terry Reedy:

>The problem is that inner functions do not exist until the outer function is called and the inner def is executed. And they cease to exist when the outer function returns unless returned or associated with a global name or collection.<

OK.


>A 'function' only needs to be nested if it is intended to be different (different default or closure) for each execution of its def.<

Or maybe because you want to denote a logical nesting, or maybe
because you want to keep the outer namespace cleaner, etc etc.

-----------------------

Benjamin:

>Of course, you could resort to terrible evil like this:<

My point was of course to ask about possible changes to CPython, so
you don't need evil hacks anymore.

-----------------------

Steven D'Aprano:

>For this to change wouldn't be a little change, it would be a large change.<

I see, then my proposal has little hope, I presume. I'll have to keep
moving functions outside to test them and move them inside again when
I want to run them.


>However you can get the same result (and arguably this is the Right Way to do it) with a class:<

Of course, that's the Right Way only for languages that support only a
strict form of the Object Oriented paradigm, like for example Java.

Thank you to all the people that have answered.

Bye,
bearophile

Steven D'Aprano

Nov 28, 2008, 5:14:51 AM
On Fri, 28 Nov 2008 00:06:01 -0800, bearophileHUGS wrote:

>>For this to change wouldn't be a little change, it would be a large
>>change.<
>
> I see, then my proposal has little hope, I presume. I'll have to keep
> moving functions outside to test them and move them inside again when I
> want to run them.

Not so. You just need to prevent the nested functions from being garbage
collected.

def foo(x):
    if not hasattr(foo, 'bar'):
        def bar(y):
            return x+y
        def foobar(y):
            return 3*y-y**2
        foo.bar = bar
        foo.foobar = foobar
    return foo.bar(3)+foo.foobar(3)


However, note that there is a problem here: foo.bar will always use the
value of x from the first time foo was called, *not* the value of x
from subsequent calls:

>>> foo(2)
5
>>> foo(3000)
5

Because foo.foobar doesn't use x, it's safe to use; but foo.bar is a
closure and will always use the value of x as it was first used. This
could be a source of puzzling bugs.

There are many ways of dealing with this. Here is one:

def foo(x):
    def bar(y):
        return x+y
    def foobar(y):
        return 3*y-y**2
    foo.bar = bar
    foo.foobar = foobar
    return bar(3)+foobar(3)

I leave it as an exercise for the reader to determine how the behaviour
of this is different to the first version.


--
Steven

Nigel Rantor

Nov 28, 2008, 7:04:55 AM
to Roy Smith, pytho...@python.org
Roy Smith wrote:
>
> There's a well known theory in studies of the human brain which says people
> are capable of processing about 7 +/- 2 pieces of information at once.

It's not about processing multiple tasks, it's about the number of things
that can be held in working memory.

n

Bruno Desthuilliers

Nov 28, 2008, 7:33:28 AM
Steven D'Aprano wrote:
(snip)

> Consequently, I almost always use single-underscore "private by
> convention" names, rather than double-underscore names. The name-mangling
> is, in my experience, just a nuisance.

s/just/more often than not/ and we'll agree on this !-)

Roy Smith

Nov 28, 2008, 8:44:31 AM
In article <mailman.4622.1227875...@python.org>,
Nigel Rantor <wig...@wiggly.org> wrote:

Yes, and that's what I'm talking about here. If there are N possible code
paths, can you think about all of them at the same time while you look at
your code and simultaneously understand them all? If I write:

if foo:
    do_this()
else:
    do_that()

it's easy to get a good mental picture of all the consequences of foo being
true or false. As the number of paths goes up, it becomes harder to think of
them all as a coherent piece of code and you have to resort to examining
the paths sequentially.

Terry Reedy

Nov 28, 2008, 1:02:14 PM
to pytho...@python.org
bearoph...@lycos.com wrote:
> Terry Reedy:
>

>> A 'function' only needs to be nested if it is intended to be
>> different (different default or closure) for each execution of its
>> def.<
>
> Or maybe because you want to denote a logical nesting, or maybe
> because you want to keep the outer namespace cleaner, etc etc.

I was already aware of those *wants*, but they are not *needs*, in the
sense I meant. A single constant function does not *need* to be nested
and regenerated with each call.

A standard idiom, I think, is to give the foo-associated function a private
foo-derived name such as _foo or _foo_bar. This keeps the public
namespace clean and denotes the logical nesting. I *really* would not
move things around after testing.

For the attribute approach, you could lobby for the so-far rejected

def foo(x):
    return foo.bar(3*x)

def foo.bar(x):
    return x*x

In the meanwhile...

def foo(x):
    return foo.bar(3*x)

def _(x):
    return x*x
foo.bar = _

Or write a decorator so you can write

@funcattr(foo, 'bar')
def _

(I don't see any way a function can delete a name in the namespace that
is going to become the class namespace, so that
@funcattr(foo)
def bar...
would work.)
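
For what it's worth, such a funcattr decorator is only a few lines; a
sketch, using the hypothetical name and calling convention from above:

def funcattr(obj, name):
    """Attach the decorated function to obj under the given name."""
    def decorator(func):
        setattr(obj, name, func)
        return func
    return decorator

def foo(x):
    return foo.bar(3*x)

@funcattr(foo, 'bar')
def _(x):
    return x*x

assert foo(2) == 36   # and foo.bar can be unit-tested directly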

Terry Jan Reedy

Fuzzyman

Nov 28, 2008, 7:35:02 PM


Your experiences are one of the reasons that writing the tests *first*
can be so helpful. You think about the *behaviour* you want from your
units and you test for that behaviour - *then* you write the code
until the tests pass.

This means that your code is testable, which usually also means simpler
and better.

By the way, to reduce the number of independent code paths you need to
test you can use mocking. You only need to test the logic inside the
methods you create (testing behaviour), and not every possible
combination of paths.

Michael Foord
--
http://www.ironpythoninaction.com/

Emanuele D'Arrigo

Nov 28, 2008, 10:03:54 PM
Thank you to everybody who has replied about the original problem. I
eventually refactored the whole (monster) method over various smaller
and simpler ones and I'm now testing each individually. Things have
gotten much more tractable. =)

Thank you for nudging me in the right direction! =)

Manu

Emanuele D'Arrigo

Nov 28, 2008, 10:33:16 PM
On Nov 29, 12:35 am, Fuzzyman <fuzzy...@gmail.com> wrote:
> Your experiences are one of the reasons that writing the tests *first*
> can be so helpful. You think about the *behaviour* you want from your
> units and you test for that behaviour - *then* you write the code
> until the tests pass.

Thank you Michael, you are perfectly right in reminding me of this. At
this particular point in time I'm not yet architecturally foresighted
enough to be able to do that. While I was writing the design documents
I did write the list of methods each object would have needed, and from
that description I could theoretically have made the tests first. In
practice, while eventually writing the code for those methods, I've
come to realize that there was a large amount of variance between what
I -thought- I needed and what I -actually- needed. So, had I written
the tests before, I would have had to rewrite them again. That being
said, I agree that writing the tests first must be my goal. I hope
that as my experience increases I'll be able to know beforehand the
behaviors I need from each method/object/module of my applications.
One step at a time I'll get there... =)

Manu

Roel Schroeven

Nov 29, 2008, 5:36:56 AM
Fuzzyman wrote:

> By the way, to reduce the number of independent code paths you need to
> test you can use mocking. You only need to test the logic inside the
> methods you create (testing behaviour), and not every possible
> combination of paths.

I don't understand that. This is part of something I've never understood
about unit testing, something I bump up against each time I try to apply
it and don't know how to resolve. I also find it difficult to explain
exactly what I mean.

Suppose I need to write method spam() that turns out to be somewhat
complex, like the class method Emanuele was talking about. When I try to
write test_spam() before the method, I have no way to know that I'm
going to need so many code paths, and that I'm going to split the code
out into a number of other functions spam_ham(), spam_eggs(), etc.

So whatever happens, I still have to test spam(), however many codepaths
it contains? Even if it only contains a few lines with fors and ifs and
calls to the other functions, it still needs to be tested? Or not? From
a number of postings in this thread I get the impression (though that
might be an incorrect interpretation) that many people are content to
only test the various helper functions, and not the spam() itself. You
say you don't have to test every possible combination of paths, but how
thorough is your test suite if you have untested code paths?

A related matter (at least in my mind) is this: after I've written
test_spam() but before spam() is correctly working, I find out that I
need to write spam_ham() and spam_eggs(), so I need test_spam_ham() and
test_spam_eggs(). That means that I can never have a green light while
coding test_spam_ham() and test_spam_eggs(), since test_spam() will
fail. That feels wrong. And this is a simple case; I've seen cases where
I've created various new classes in order to write one new function.
Maybe I shouldn't care so much about the failing unit test? Or perhaps I
should turn test_spam() off while testing test_spam_ham() and
test_spam_eggs().

I've read "test-driven development" by David Astels, but somehow it
seems that the issues I encounter in practice don't come up in his examples.

--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
-- Isaac Asimov

Roel Schroeven

Steven D'Aprano

Nov 29, 2008, 7:06:00 AM
On Sat, 29 Nov 2008 11:36:56 +0100, Roel Schroeven wrote:

> Fuzzyman wrote:
>> By the way, to reduce the number of independent code paths you need to
>> test you can use mocking. You only need to test the logic inside the
>> methods you create (testing behaviour), and not every possible
>> combination of paths.
>
> I don't understand that. This is part of something I've never understood
> about unit testing, and each time I try to apply unit testing I bump up
> against, and don't know how to resolve. I find it also difficult to
> explain exactly what I mean.
>
> Suppose I need to write method spam() that turns out to be somewhat
> complex, like the class method Emanuele was talking about. When I try to
> write test_spam() before the method, I have no way to know that I'm
> going to need so many code paths, and that I'm going to split the code
> out into a number of other functions spam_ham(), spam_eggs(), etc.
>
> So whatever happens, I still have to test spam(), however many codepaths
> it contains? Even if it only contains a few lines with fors and ifs and
> calls to the other functions, it still needs to be tested? Or not?

The first thing to remember is that it is impractical for unit tests to
be exhaustive. Consider the following trivial function:

def add(a, b): # a and b ints only
    return a+b+1

Clearly you're not expected to test *every imaginable* path through this
function (ignoring unit tests for error handling and bad input):

assert add(0, 0) == 1
assert add(1, 0) == 2
assert add(2, 0) == 3
assert add(3, 0) == 4
...
assert add(99736263, 8264891001) == 8364627265
...

Instead, your tests for add() can rely on the + operator being
sufficiently tested that you can trust it, and so you only need to test
the logic of your function. To do that, it would be sufficient to test a
relatively small representative sample of data. One test would probably
be sufficient:

assert add(1, 3) == 5

That test would detect almost all bugs in the function, although of
course it won't detect every imaginable bug. A second test will make the
chances of such false negatives virtually disappear.

Now imagine a more complicated function:

def spam(a, b):
    return spam_eggs(a, b) + spam_ham(a) - 2*spam_tomato(b)

Suppose spam_eggs has four paths that need testing (paths A, B, C, D),
spam_ham and spam_tomato have two each (E F and G H), and let's assume
that they are all independent. Then your spam unit tests need to test
every path:

A E G
A E H
A F G
A F H
B E G
B E H
...
D F H

for a total of 4*2*2=16 paths, in the spam unit tests.

But suppose that we have tested spam_eggs independently. It has four
paths, so we need four tests to cover them all. Now our spam testing can
assume that spam_eggs is correct, in the same way that we earlier assumed
that the plus operator was correct, and reduce the number of tests to a
small set of representative data.

No matter which path through spam_eggs we take, we can trust the result,
because we have unit tests that will fail if spam_eggs has a bug. So
instead of testing every path, I choose a much more limited set:

A E G
A E H
A F G
A F H

I arbitrarily choose path A alone, confident that paths B C and D are
correct, but of course I could make other choices. There's no need to
test paths B C and D *within spam's unit tests*, because they are already
tested elsewhere. To test them again within spam doesn't gain me anything.

Consequently, we reduce our total number of tests from 16 to 8 (four
tests for spam, four for spam_eggs).
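
As a concrete (and entirely made-up) illustration of that counting:
once spam_eggs has its own tests, the tests for spam can stick to one
eggs path and just vary the ham/tomato paths. The helper bodies below
are invented stand-ins for the A-H paths above:

def spam_eggs(a, b):
    return a * b if a > 0 else a + b   # two paths, standing in for A-D

def spam_ham(a):
    return a + 1 if a % 2 else a - 1   # paths E/F

def spam_tomato(b):
    return b or 1                      # paths G/H

def spam(a, b):
    return spam_eggs(a, b) + spam_ham(a) - 2*spam_tomato(b)

# spam_eggs is trusted (tested elsewhere), so spam's tests pick one
# eggs path and cover the ham/tomato combinations:
assert spam(3, 2) == 6 + 4 - 4   # A E G
assert spam(3, 0) == 0 + 4 - 2   # A E H
assert spam(4, 2) == 8 + 3 - 4   # A F G
assert spam(4, 0) == 0 + 3 - 2   # A F H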


> From
> a number of postings in this thread a get the impression (though that
> might be an incorrect interpretation) that many people are content to
> only test the various helper functions, and not the spam() itself. You
> say you don't have to test every possible combination of paths, but how
> thorough is your test suite if you have untested code paths?

The success of this tactic assumes that you can identify code paths and
make them independent. If they are dependent, then you can't be sure that
path E G after A is the same as E G after D.

Real world example: compare driving your car from home to the mall to the
park, compared to driving from work to the mall to the park. The journey
from the mall to the park is the same, no matter how you got to the mall.
If you can drive from home to the mall and then to the park, and you can
drive from work to the mall, then you can be sure that you can drive from
work to the mall to the park even though you've never done it before.

But if you can't be sure the paths are independent, then you can't make
that simplifying assumption, and you do have to test more paths in more
places.


> A related matter (at least in my mind) is this: after I've written
> test_spam() but before spam() is correctly working, I find out that I
> need to write spam_ham() and spam_eggs(), so I need test_spam_ham() and
> test_spam_eggs(). That means that I can never have a green light while
> coding test_spam_ham() and test_stam_eggs(), since test_spam() will
> fail. That feels wrong.

I would say that means you're letting your tests get too far ahead of
your code. In theory, you should never have more than one failing test at
a time: the last test you just wrote. If you have to refactor code so
much that a bunch of tests start failing, then you need to take those
tests out, and re-introduce them one at a time.

In practice, I can't imagine too many people have the discipline to
follow that practice precisely. I know I don't :)


--
Steven

Fuzzyman

Nov 29, 2008, 10:41:24 AM


Personally I find writing the tests an invaluable part of the design
process. It works best if you do it 'top-down': you have a written
feature specification (a user story) and you turn this into an
executable specification in the form of a functional test.

Next come mid-level unit tests, and so on downwards - so your tests
become your design documents (and the way you think about design), but
better than a document, they are executable. So just as code conveys
intent, so do the tests (it is important that tests are readable).

For the situation where you don't really know what the API should look
like, Extreme Programming (of which TDD is part) includes a practice
called spiking. I wrote a bit about that here:

http://www.voidspace.org.uk/python/weblog/arch_d7_2007_11_03.shtml#e867

Mocking can help reduce the number of code paths you need to test for
necessarily complex code. Say you have a method that looks something
like:

def method(self):
    if conditional:
        # do stuff
    else:
        # do other stuff
    # then do more stuff

You may be able to refactor this to look more like the following

def method(self):
    if conditional:
        self.method2()
    else:
        self.method3()
    self.method4()

You can then write unit tests that *only* test methods 2 - 4 on their
own. That code is then tested. You can then test method() by mocking
out methods 2 - 4 on the instance: you only need to test that they are
called in the right conditions and with the right arguments (and you
can mock out the return values to test that method() handles them
correctly).
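
A minimal sketch of that mocking approach, written against today's
unittest.mock (which the mock library linked below eventually grew
into); the Thing class and method names just mirror the pseudo-code
above:

import unittest
from unittest import mock   # standard library since Python 3.3

class Thing(object):
    # In real code, conditional() and method2-4 would hold the actual
    # logic and get their own direct unit tests.
    def method(self):
        if self.conditional():
            self.method2()
        else:
            self.method3()
        self.method4()

class MethodTest(unittest.TestCase):
    def test_true_branch_calls_method2_then_method4(self):
        thing = Thing()
        # Replace the collaborators with mocks; only method()'s own
        # branching logic is exercised here.
        thing.conditional = mock.Mock(return_value=True)
        thing.method2 = mock.Mock()
        thing.method3 = mock.Mock()
        thing.method4 = mock.Mock()

        thing.method()

        thing.method2.assert_called_once_with()
        self.assertFalse(thing.method3.called)
        thing.method4.assert_called_once_with()

if __name__ == '__main__':
    unittest.main()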

Mocking in Python is very easy, but there are plenty of mock libraries
to make it even easier. My personal favourite (naturally) is my own:

http://www.voidspace.org.uk/python/mock.html

All the best,

Michael
--
http://www.ironpythoninaction.com/

> Manu

Roel Schroeven

Nov 29, 2008, 11:13:00 AM
Thanks for your answer. I still don't understand completely though. I
suppose it's me, but I've been trying to understand some of this for
quite some time and somehow I can't seem to wrap my head around it.

Steven D'Aprano wrote:


> On Sat, 29 Nov 2008 11:36:56 +0100, Roel Schroeven wrote:
>
> The first thing to remember is that it is impractical for unit tests to
> be exhaustive. Consider the following trivial function:
>
> def add(a, b): # a and b ints only
>     return a+b+1
>
> Clearly you're not expected to test *every imaginable* path through this
> function (ignoring unit tests for error handling and bad input):
>
> assert add(0, 0) == 1
> assert add(1, 0) == 2
> assert add(2, 0) == 3
> assert add(3, 0) == 4
> ...
> assert add(99736263, 8264891001) == 8364627265
> ...

OK

> ...

> I arbitrarily choose path A alone, confident that paths B C and D are
> correct, but of course I could make other choices. There's no need to
> test paths B C and D *within spam's unit tests*, because they are already
> tested elsewhere.

Except that I'm always told that the goal of unit tests, at least
partly, is to protect us against mistakes when we make changes to the
tested functions. They should tell me whether I can still trust spam()
after refactoring it. Doesn't that mean that the unit test should see
spam() as a black box, providing a certain (but probably not 100%)
guarantee that the unit test is still a good test even if I change the
implementation of spam()?

And I don't understand how that works in test-driven development; I
can't possibly adapt the tests to the code paths in my code, because the
code doesn't exist yet when I write the test.

> To test them again within spam doesn't gain me anything.

I would think it gains you the freedom of changing spam's implementation
while still being able to rely on the unit tests. Or maybe I'm thinking
too far?

> The success of this tactic assumes that you can identify code paths and
> make them independent. If they are dependent, then you can't be sure that
> path E G after A is the same as E G after D.
>
> Real world example: compare driving your car from home to the mall to the
> park, compared to driving from work to the mall to the park. The journey
> from the mall to the park is the same, no matter how you got to the mall.
> If you can drive from home to the mall and then to the park, and you can
> drive from work to the mall, then you can be sure that you can drive from
> work to the mall to the park even though you've never done it before.
>
> But if you can't be sure the paths are independent, then you can't make
> that simplifying assumption, and you do have to test more paths in more
> places.

OK, but that only works if I know the code paths, meaning I've already
written the code. Wasn't the whole point of TDD that you write the tests
before the code?

>> A related matter (at least in my mind) is this: after I've written
>> test_spam() but before spam() is correctly working, I find out that I
>> need to write spam_ham() and spam_eggs(), so I need test_spam_ham() and
>> test_spam_eggs(). That means that I can never have a green light while
>> coding test_spam_ham() and test_stam_eggs(), since test_spam() will
>> fail. That feels wrong.
>
> I would say that means you're letting your tests get too far ahead of
> your code. In theory, you should never have more than one failing test at
> a time: the last test you just wrote. If you have to refactor code so
> much that a bunch of tests start failing, then you need to take those
> tests out, and re-introduce them one at a time.

I still fail to see how that works. I know I must be wrong since so many
people successfully apply TDD, but I don't see what I'm missing.

Let's take a more-or-less realistic example: I want/need a function to
calculate the least common multiple of two numbers. First I write some
tests:

assert(lcm(1, 1) == 1)
assert(lcm(2, 5) == 10)
assert(lcm(2, 4) == 4)

Then I start to write the lcm() function. I do some research and I find
out that I can calculate the lcm from the gcd, so I write:

def lcm(a, b):
    return a / gcd(a, b) * b

But gcd() doesn't exist yet, so I write some tests for gcd(a, b) and
start writing the gcd function. But all the time while writing that, the
lcm tests will fail.

I don't see how I can avoid that, unless I create gcd() before I create
lcm(), but that only works if I know that I'm going to need it. In a
simple case like this I could know, but in many cases I don't know it
beforehand.

Steven D'Aprano

Nov 29, 2008, 10:42:50 PM
On Sat, 29 Nov 2008 17:13:00 +0100, Roel Schroeven wrote:

> Except that I'm always told that the goal of unit tests, at least
> partly, is to protect us agains mistakes when we make changes to the
> tested functions. They should tell me wether I can still trust spam()
> after refactoring it. Doesn't that mean that the unit test should see
> spam() as a black box, providing a certain (but probably not 100%)
> guarantee that the unit test is still a good test even if I change the
> implementation of spam()?

Yes, but you get to choose how strong that guarantee is. If you want to
test the same thing in multiple places in your code, you're free to do
so. Refactoring merely reduces the minimum number of tests you need for
complete code coverage, not the maximum.

The aim here isn't to cut the number of unit tests down to the absolute
minimum number required to cover all paths through your code, but to
reduce that minimum number to something tractable: O(N) or O(N**2)
instead of O(2**N), where N = some appropriate measure of code complexity.

It is desirable to have some redundant tests, because they reduce the
chances of a freakish bug just happening to give the correct result for
the test but wrong results for everything else. (Assuming of course that
the redundant tests aren't identical -- you gain nothing by running the
exact same test twice.) They also give you extra confidence that you can
refactor the code without introducing such freakish bugs. But if you find
yourself making such sweeping changes to your code base that you no
longer have such confidence, then by all means add more tests!



> And I don't understand how that works in test-driven development; I
> can't possibly adapt the tests to the code paths in my code, because the
> code doesn't exist yet when I write the test.

That's where you should be using mocks and stubs to ease the pain.

http://en.wikipedia.org/wiki/Mock_object
http://en.wikipedia.org/wiki/Method_stub



> > To test them again within spam doesn't gain me anything.
>
> I would think it gains you the freedom of changing spam's implementation
> while still being able to rely on the unit tests. Or maybe I'm thinking
> too far?

No, you are right, and I over-stated the case.


[snip]


>> I would say that means you're letting your tests get too far ahead of
>> your code. In theory, you should never have more than one failing test
>> at a time: the last test you just wrote. If you have to refactor code
>> so much that a bunch of tests start failing, then you need to take
>> those tests out, and re-introduce them one at a time.
>
> I still fail to see how that works. I know I must be wrong since so many
> people successfully apply TDD, but I don't see what I'm missing.
>
> Let's take a more-or-less realistic example: I want/need a function to
> calculate the least common multiple of two numbers. First I write some
> tests:
>
> assert(lcm(1, 1) == 1)
> assert(lcm(2, 5) == 10)
> assert(lcm(2, 4) == 4)

(Aside: assert is not a function, you don't need the parentheses.)

Arguably, that's too many tests. Start with one.

assert lcm(1, 1) == 1

And now write lcm:

def lcm(a, b):
    return 1

That's a stub, and our test passes. So add another test:

assert lcm(2, 5) == 10

and the test fails. So let's fix the function by using gcd.

def lcm(a, b):
    return a/gcd(a, b)*b

(By the way: there's a subtle bug in lcm() that will hit you in Python 3.
Can you spot it? Here's a hint: your unit tests should also assert that
the result of lcm is always an int.)

Now that we've introduced a new function, we need a stub and a test for
it:

def gcd(a, b):
    return 1

Why does the stub return 1? So it will make the lcm test pass. If we had
more lcm tests, it would be harder to write a gcd stub, hence the
insistence on only adding a single test at a time.

assert gcd(1, 1) == 1

Now all the tests work and we get a nice green light. Let's add
another test. We need to add it to the gcd test suite, because it's the
latest, least working function. If you add a test to the lcm test suite,
and it fails, you don't know if it failed because of an error in lcm() or
because of an error in gcd(). So leave lcm alone until gcd is working:

assert gcd(4, 6) == 2

Now go and fix gcd. At some time you have to decide to stop using a stub
for gcd, and write the function properly. For a function that simple,
"now" is that time, but just for the exercise let me write a slightly
more complicated stub. This is (probably) the next simplest stub which
allows all the gcd tests to pass while still being "wrong":

def gcd(a, b):
    if a == b:
        return 1
    else:
        return 2

When you're convinced gcd() is working, you can go back and add
additional tests to lcm.

In practice, of course, you can skip a few steps. It's hard to be
disciplined enough to program in such tiny little steps. But the cost of
being less disciplined is that it takes longer to have all the tests pass.

--
Steven

Steven D'Aprano

Nov 30, 2008, 12:03:08 AM
On Sun, 30 Nov 2008 03:42:50 +0000, Steven D'Aprano wrote:

> def lcm(a, b):
>     return a/gcd(a, b)*b
>
> (By the way: there's a subtle bug in lcm() that will hit you in Python
> 3. Can you spot it?

Er, ignore this. Division in Python 3 only returns a float if the
remainder is non-zero, and when dividing by the gcd the remainder should
always be zero.

> Here's a hint: your unit tests should also assert that
> the result of lcm is always an int.)

Still good advice.

--
Steven

Terry Reedy

Nov 30, 2008, 1:11:48 AM
to pytho...@python.org
Steven D'Aprano wrote:
> On Sun, 30 Nov 2008 03:42:50 +0000, Steven D'Aprano wrote:
>
>> def lcm(a, b):
>>     return a/gcd(a, b)*b
>>
>> (By the way: there's a subtle bug in lcm() that will hit you in Python
>> 3. Can you spot it?
>
> Er, ignore this. Division in Python 3 only returns a float if the
> remainder is non-zero, and when dividing by the gcd the remainder should
> always be zero.

You were right the first time.
IDLE 3.0rc3
>>> a = 4/2
>>> a
2.0

lcm should return an int, so it should be written a//gcd(a,b) * b
to guarantee that.
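
Pulling the thread's corrections together, a sketch of where the pair
might end up, with a few asserts as tests (the Euclidean gcd is my
addition, not something posted in the thread):

def gcd(a, b):
    # Euclid's algorithm
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    # floor division keeps the result an int in Python 3
    return a // gcd(a, b) * b

assert gcd(1, 1) == 1
assert gcd(4, 6) == 2
assert lcm(1, 1) == 1
assert lcm(2, 5) == 10
assert lcm(2, 4) == 4
assert isinstance(lcm(2, 4), int)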

Roel Schroeven

Nov 30, 2008, 4:20:32 AM
Steven D'Aprano wrote:
> [..]

Thank you for the elaborate answer, Steven. I think I'm really starting
to get it now.

James Harris

Nov 30, 2008, 10:24:07 AM

Difficult to say without seeing the code. You could post it, perhaps.
On the other hand a general recommendation from Programming Pearls
(Jon Bentley) is to convert code to data structures. Maybe you could
convert some of the code to decision tables or similar.
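
A minimal sketch of the decision-table idea, with invented conditions
and handlers; each small handler can then be unit-tested on its own:

def handle_both(x):     return x * 2
def handle_a_only(x):   return x + 1
def handle_b_only(x):   return x - 1
def handle_neither(x):  return x

# Map a tuple of boolean conditions to the handler for that case.
DISPATCH = {
    (True, True):   handle_both,
    (True, False):  handle_a_only,
    (False, True):  handle_b_only,
    (False, False): handle_neither,
}

def process(x):
    key = (x > 0, x % 2 == 0)   # the two "if" conditions, as data
    return DISPATCH[key](x)

assert process(4) == 8    # positive and even
assert process(3) == 4    # positive and odd
assert process(-2) == -3  # non-positive and even
assert process(-3) == -3  # non-positive and odd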

James
