Mutation testing anyone?


henry

Sep 3, 2012, 2:15:27 PM
to java...@googlegroups.com
I've been spending an unhealthy proportion of my free time working on a new mutation testing system (http://pitest.org), mainly because it's been quite good fun, but also because I wanted to address the issues I saw in the previous generation of mutation testers: far too slow, difficult to use, and in the habit of not working when you added some innocent-sounding technology to the build.
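(For anyone who hasn't tried it, the core idea is simple: the tool makes a small change to the code, a "mutant", and reruns the test suite; a failing test "kills" the mutant, while a surviving mutant points at a missing assertion. A hand-rolled sketch, with class and method names that are mine purely for illustration; PIT itself works on compiled bytecode:)

```java
// Hand-rolled illustration of mutation testing: a mutant is a small
// change to the code under test; a test kills it by failing on it.
public class MutationSketch {
    // Code under test.
    static int add(int a, int b) { return a + b; }

    // A mutant a tool might generate: "+" replaced with "-".
    static int addMutant(int a, int b) { return a - b; }

    public static void main(String[] args) {
        // A "test" that merely executes the code (full line coverage,
        // no assertion) can never kill any mutant.
        add(2, 3);

        // A test with an assertion distinguishes original from mutant:
        // asserting the result is 5 fails against the mutant, killing it.
        boolean killed = addMutant(2, 3) != 5;
        System.out.println("mutant killed: " + killed);
    }
}
```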

I'm beginning to build up a user base, but largely of developers who have not previously tried mutation testing. I was wondering if anyone in this group has any real-world experience of using other mutation testers (e.g. http://jester.sourceforge.net/, http://jumble.sourceforge.net/): how it went, what problems were encountered, etc.

Thanks

Henry


Ricky Clarkson

Sep 4, 2012, 8:41:57 AM
to java...@googlegroups.com

There was another product in this space, by the name of Agitator if I recall, and ultimately it failed because few teams actually have test suites that are complete enough to be candidates. Even Kent Beck doesn't care about coverage[1], and these frameworks aren't going to help anyone with much less than 100% test coverage.

At that point your tests might just be testing that the code does what the code does, not what it should do.

Such bright people as yourself, in my opinion at least, could do a lot of useful work in static analysis, proving code correct instead of proving that a project's tests don't prove its code correct, which is a given.

[1] http://stackoverflow.com/questions/153234/how-deep-are-your-unit-tests/ (top answer)

--
You received this message because you are subscribed to the Google Groups "Java Posse" group.
To view this discussion on the web visit https://groups.google.com/d/msg/javaposse/-/JoR9CaBOi-MJ.

henry

Sep 4, 2012, 4:32:47 PM
to java...@googlegroups.com


On Tuesday, September 4, 2012 1:42:03 PM UTC+1, Ricky Clarkson wrote:

> There was another product in this space, by the name of Agitator if I recall


There still seems to be a product knocking around under this name (http://www.agitar.com/solutions/products/software_agitation.html), though it doesn't appear to be related to mutation testing.

I'd expect a commercial mutation testing tool to be a hard sell, particularly going back a few years. The landscape has changed quite a lot now, though, with TDD putting a much heavier emphasis on unit testing. I'm having moderate success giving a tool away for free.

> Even Kent Beck doesn't care about coverage[1] and these frameworks aren't going to help anyone with much less than 100% test coverage.

I don't think that's quite what Kent is saying; for one thing, he doesn't mention coverage. What he does say is:

"my philosophy is to test as little as possible to reach a given level of confidence"

So the question is: what level of confidence are you aiming at? Code coverage in general, and mutation testing in particular, give you a way to gauge the level of confidence you can have in a test suite, but they don't force you to aim at any particular level.

I can't definitively state what Kent meant by his answer back in 2008, or what's going through his head now. What I can say is that he's interested enough in coverage and mutation testing to have given PIT a try.

> At that point your tests might just be testing that the code does what the code does, not what it should do.

 

> Such bright people as yourself, in my opinion at least, could do a lot of useful work in static analysis, proving code correct instead of proving that a project's tests don't prove its code correct, which is a given.


Fundamentally a unit test only ever tests that your code "does what it does".

When first developing a feature, the degree to which your unit tests reflect what your code should be doing is a function of how well your processes have captured and communicated the requirements, and of the competence of the programmer. Once the feature has gone through some form of validation (UAT, shipping and use) and any required face-palming and rework has been done, there should be much higher confidence that the code "does what it should".

Prior to validation, both mutation testing and static analysis can help with the latter part of that function and identify bugs where the code does not match the programmer's intent. But only if you have a test suite that you trust to confirm the code "does what it does" can you refactor and add new features without fear of regression.

Static analysis could only give you this if you provided it with some sort of specification of what the program should do... kind of like a unit test.



Ricky Clarkson

Sep 4, 2012, 9:24:55 PM
to java...@googlegroups.com
> "my philosophy is to test as little as possible to reach a given level of
> confidence"
>
> So the question is, what is the level of confidence you are aiming at? Code
> coverage in general, and mutation testing in particular, gives you a tool by
> which you can gauge the level of confidence you can have in a test suite,
> but doesn't force you to aim at any particular level.

I'd guess that mutation testing will find problems in test suites that
otherwise give you reasonable confidence because the test suites are
tailored for regressions that the project's developers are likely to
make, so you'd have to have 100% coverage and very strict tests to
pass mutation testing. Perhaps I'm being overly pessimistic and you
do optimise for common mistakes, e.g., passing null/returning null,
depending on mutable values in equals, etc.
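(A conditionals-boundary mutant, e.g. ">=" weakened to ">", is a concrete example of why only quite strict tests pass scrutiny: a test anywhere away from the boundary cannot tell the mutant from the original. A hand-rolled sketch; the class and method names are hypothetical, as real tools apply this kind of change to bytecode:)

```java
// Sketch of a conditionals-boundary mutant (">=" weakened to ">").
public class BoundarySketch {
    static boolean isAdult(int age) { return age >= 18; }       // original
    static boolean isAdultMutant(int age) { return age > 18; }  // mutant

    public static void main(String[] args) {
        // A test far from the boundary cannot tell them apart:
        // the mutant survives it.
        System.out.println("killed by age=30 test: "
            + (isAdult(30) != isAdultMutant(30)));
        // Only the strict, on-the-boundary test kills the mutant.
        System.out.println("killed by age=18 test: "
            + (isAdult(18) != isAdultMutant(18)));
    }
}
```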

> I can't definitively state what Kent meant by his answer back in 2008, or
> what's going through his head now. What I can say is that he's interested
> enough in coverage and mutation testing to have given PIT a try.

Excellent.

> Fundamentally a unit test only ever tests that your code "does what it
> does".

I'd have thought that tests should test that your code does what it should do. I mean, it's obvious that it does what it does, so why test that? Sure, regressions, but if you end up with 100 test failures because you deliberately changed the behaviour of something, will you spot the one test that fails for a 'valid' reason instead of just failing because your code changed, or will you just fix all the tests by updating them for the new behaviour?

> Static analysis could only give you this if you provided it with some sort
> of specification of what the program should do . . . kind of like a unit
> test.

You can also call it a type signature, although that's a bit of a
stretch outside of dependently-typed languages.