
Is code duplication allowed in this instance?


Klone

Jul 3, 2009, 6:46:32 AM
Hi all. I believe there is a common consensus in programming to avoid
code duplication; I suppose terms like 'DRY' are meant to back this
idea. Anyway, I'm working on a little project using TDD (still trying
to get the hang of the process) and am trying to test the
functionality within a method. However, as it happens, to verify the
output from the method I have to employ the same algorithm as the
method itself, since there is no way I can determine the expected
output beforehand.

So in this scenario, is it OK to duplicate the algorithm to be tested
within the test code, or should I refactor the method so that it can
be used within the test code to verify itself (??).

Rickard Lindberg

Jul 3, 2009, 7:29:44 AM
> However, as it happens, to verify the output from the method I have
> to employ the same algorithm as the method itself, since there is no
> way I can determine the expected output beforehand.

Can you clarify this scenario a bit?

If you are performing black-box testing, I don't see why you need to
use the same algorithm in the test code. But maybe your case is
special. On the other hand, you cannot perform black-box testing if
the output is not known for a given input.

--
Rickard Lindberg

Francesco Bochicchio

Jul 3, 2009, 8:34:27 AM

If the purpose of the test is to verify the algorithm, you obviously
should not use the algorithm to verify itself... you should use a set
of (input data, expected output data) pairs that you know are well
representative of the data your algorithm will process. Possibly, to
prepare the test data set, you might need a different - and already
proven - implementation of the algorithm.
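
For example, a minimal table-driven sketch of such a test (my_sort is
just a placeholder name for the function under test):

import unittest

def my_sort(items):
    # placeholder for the implementation under test
    return sorted(items)

class TestMySort(unittest.TestCase):
    def test_known_pairs(self):
        # hand-verified (input, expected output) pairs
        cases = [
            ([], []),
            ([1], [1]),
            ([3, 1, 2], [1, 2, 3]),
            ([2, 2, 1], [1, 2, 2]),
        ]
        for data, expected in cases:
            self.assertEqual(my_sort(data), expected)

if __name__ == '__main__':
    unittest.main()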

Another thing I sometimes do when testing mathematical functions is
to use a counter-proof: for instance, if my function computes the
roots of a quadratic equation, the test verifies that the roots,
plugged back into the equation, actually give (almost) zero as a
result. This kind of test might not be as rigorous as preparing a
data set with known answers, but it is easier to set up and can give
you a first idea of whether your code is "correct enough" to stand up
to more formal proof.
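
A rough sketch of that idea (solve_quadratic is a stand-in name, and
I assume a positive discriminant, i.e. two real roots):

import math

def solve_quadratic(a, b, c):
    # stand-in implementation returning the two real roots
    d = math.sqrt(b * b - 4 * a * c)
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

def test_roots_by_counterproof():
    a, b, c = 1.0, -3.0, 2.0
    for x in solve_quadratic(a, b, c):
        # each root plugged back in should give (almost) zero
        assert abs(a * x * x + b * x + c) < 1e-9

test_roots_by_counterproof()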

Ciao
----
FB

Lie Ryan

Jul 3, 2009, 8:49:21 AM

When unit-testing a complex output, usually the first-generation code
generates the output, then you (manually) verify this output and copy
it into the test code. The testing code should be as dumb as
possible, to avoid bugs in the testing code itself. The most
important thing is not to use a possibly buggy implementation of the
algorithm to check the algorithm itself. If your test code looks like
this:

import unittest

# this is the function to be tested
def func(a, b):
    return a + b

class TC(unittest.TestCase):
    def test_foo(self):
        a, b = 10, 20  # worse: values from the random module
        result = a + b  # same algorithm as func itself
        self.assertEqual(func(a, b), result)

then something is definitely wrong. Instead the testing code should be
simple and stupid like:

class TC(unittest.TestCase):
    def test_foo(self):
        self.assertEqual(func(10, 20), 30)

There are exceptions though, such as when you're 100% sure your
first-generation program is correct and you simply want to replicate
the same function with a different algorithm, or perhaps optimize the
function. In that case, you could do something like this:

import unittest

def func(a, b):
    # some fancy new algorithm
    pass

class TC(unittest.TestCase):
    def original_func(self, a, b):
        # trusted first-generation implementation
        return a + b

    def test_foo(self):
        a, b = 10, 20
        self.assertEqual(func(a, b), self.original_func(a, b))

Steven D'Aprano

Jul 3, 2009, 9:11:00 AM

Neither -- that's essentially a pointless test. The only way to
*correctly* test a function is to compare the result of that function to
an *independent* test. If you test a function against itself, of course
it will always pass:

def plus_one(x):
    """Return x plus 1."""
    return x - 1  # Oops, a bug.

# Test it is correct:
assert plus_one(5) == plus_one(5)


The only general advice I can give is:

(1) Think very hard about finding an alternative algorithm to calculate
the same result. There usually will be one.

(2) If there's not, at least come up with an alternative implementation.
It doesn't need to be particularly efficient, because it will only be
called for testing. A rather silly example:

def plus_one_testing(x):
    """Return x plus 1 using a different algorithm, for testing."""
    if type(x) in (int, long):
        temp = 1
        for i in range(x):  # x increments on top of 1 give x + 1
            temp += 1
        return temp
    else:
        floating_part = x - int(x)
        return floating_part + plus_one_testing(int(x))

(The only problem is, if a test fails, you may not be sure whether
it's your test function that's wrong or your production function.)

(3) Often you can check a few results by hand. Even if it takes you
fifteen minutes, at least that gives you one good test. If need be, get a
colleague to check your results.

(4) Test failure modes. It might be really hard to calculate
func(arg) independently for all possible arguments, but if you know
that func(obj) should fail, at least you can test that. E.g. it's
hard to test whether or not you've downloaded the contents of a URL
correctly without actually downloading it, but you know that
http://no-such-host.invalid/ should fail, because the .invalid
top-level domain is guaranteed never to resolve.
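
A sketch of such a failure-mode test (using Python 2's urllib2; the
host name is just an unresolvable placeholder):

import urllib2

def test_download_fails_for_bad_host():
    try:
        urllib2.urlopen("http://no-such-host.invalid/")
    except urllib2.URLError:
        pass  # expected: the name lookup must fail
    else:
        raise AssertionError("expected URLError for unresolvable host")

test_download_fails_for_bad_host()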

(5) Test the consequences of your function rather than the exact results.
E.g. if it's too difficult to calculate plus_one(x) independently:

assert plus_one(x) > x  # except for x = inf or -inf
assert plus_one( -plus_one(x) ) == -x  # -(x+1)+1 = -x
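
Wrapped up as a self-contained property check (with a deliberately
correct plus_one, just for illustration):

def plus_one(x):
    return x + 1  # correct implementation, for illustration

def check_plus_one_properties(values):
    for x in values:
        # consequences that must hold without knowing exact outputs
        assert plus_one(x) > x
        assert plus_one(-plus_one(x)) == -x  # -(x+1)+1 = -x

check_plus_one_properties([-10, -1, 0, 1, 42, 10**6])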

(6) While complete test coverage is the ideal you aspire to, any
tests are better than no tests, provided they are good tests. Even
*one* good test is better than none.

Hope this helps.

--
Steven

Bearophile

Jul 3, 2009, 10:34:23 AM
Francesco Bochicchio:

> Possibly, to prepare the test data set, you might need a different
> - and already proven - implementation of the algorithm.

Usually a brute-force, or slow but short, algorithm is OK (besides
some hard-coded input-output pairs).

Sometimes you may use the first implementation of your code, which is
usually simpler, before it becomes complex through successive
optimizations. But you have to be careful, because the first
implementation may be buggy too.

Other times there are simpler algorithms to test whether the output
of another algorithm is correct (think of exponential algorithms with
a polynomial test of the result): for example, to test a fancy
Introsort implementation you can use a very small and simple O(n^2)
sorter, or better, a simple linear loop that checks that the result
is sorted.
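
For instance, a minimal sketch (fancy_sort is a placeholder for the
clever implementation under test):

import random

def is_sorted(seq):
    # simple linear check: each element <= its successor
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))

def fancy_sort(items):
    # placeholder for the optimized sorter under test
    return sorted(items)

data = [random.randint(0, 100) for _ in range(1000)]
result = fancy_sort(data)
assert is_sorted(result)       # linear correctness check
assert result == sorted(data)  # cross-check against the trusted built-in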

Here, besides unit tests, languages (or tools) that let you add pre-
and post-conditions, class invariants, etc. are also useful.
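
Plain asserts can approximate such contracts in Python; a small
sketch:

def isqrt(n):
    assert n >= 0, "pre-condition: n must be non-negative"
    r = int(n ** 0.5)
    # correct for floating-point error in the initial guess
    while r * r > n:
        r -= 1
    while (r + 1) * (r + 1) <= n:
        r += 1
    assert r * r <= n < (r + 1) * (r + 1), "post-condition violated"
    return r

assert isqrt(99) == 9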

Bye,
bearophile

Klone

Jul 4, 2009, 7:18:50 PM
Thank you all. I think I get the gist and am about to try out your
suggestions.

Lawrence D'Oliveiro

Jul 5, 2009, 5:54:37 AM
In message
<ddb65af7-8820-4a83-bc92-c1d1c6...@y17g2000yqn.googlegroups.com>,
Klone wrote:

> So in this scenario, is it OK to duplicate the algorithm to be
> tested within the test code, or should I refactor the method so
> that it can be used within the test code to verify itself (??).

I think you should be put on the management fast-track.

David Robinow

Jul 5, 2009, 8:21:33 AM
Heavens, no. He's too valuable as a managee.