>I'd love to see a "make fuzz" target in most of FLOSS projects.
The problem isn't the 'make fuzz' part (which is exactly what I use in my
code), it's the 'run fuzz' part. My standard regression tests run in a
couple of minutes; the fuzzing runs for months.
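For what it's worth, the 'make fuzz' part really is tiny. A minimal sketch
(the harness name and corpus directories are made up for illustration, the
harness is assumed to read its input from the file named in argv[1], and
afl-gcc/afl-fuzz are assumed to be on the PATH):

    # Hypothetical fuzz target: rebuild the harness with AFL
    # instrumentation, then fuzz until the user hits Ctrl-C.
    fuzz:
            afl-gcc -O2 -o fuzz_harness fuzz_harness.c
            afl-fuzz -i testcases -o findings -- ./fuzz_harness @@

It's the months of CPU time behind that last line that are the problem, not
the five lines of Makefile.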
Peter.
Michael Rash <michae...@gmail.com> writes:
>What I don't understand is why the code coverage achieved even by the
>native OpenSSL and OpenSSH test suites (counted by lines) is barely
>cracking 50%.
That's pretty simple: (a) you can't easily generate all the test cases needed
to exercise all the code (you could probably spend close to a lifetime doing
test cases just for basic TLS + certificates), and (b) a lot of those paths
will be error handling, which is equally hard to test because you can't easily
simulate all those errors (a subset of this is the Zeno's-paradox issue of
testing the code that tests the error handling, which in turn needs to be
tested...).
So AFL's great contribution is that you can generate at least some of the
zillions of test cases that are required to exercise all the code paths.
In terms of error handling, a friend of mine once tested a widely-used OSS
code base by instrumenting malloc so that it failed on the Nth invocation,
with N incremented on each run. So the first time the code was run, it failed
on the first alloc; the second time it failed on the second alloc; and so on.
All this was testing was the out-of-memory error handling. Luckily, he'd had
the foresight to cap the coredump count at 1,000 before he ran the test
code...
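In outline the trick looks something like the following (a reconstruction
from the description above, not his actual code; the wrapper name and
environment variable are invented for illustration):

    /* Failure-injecting malloc: fail on the Nth allocation, with N
       supplied by the test driver via the environment and bumped on
       each run */
    #include <stdlib.h>

    static long fail_at = -1;    /* 1-based call to fail; 0 = never fail */
    static long call_count = 0;

    void *test_malloc(size_t size)
    {
        if (fail_at < 0) {
            const char *env = getenv("FAIL_AT_ALLOC");
            fail_at = (env != NULL) ? atol(env) : 0;
        }
        if (++call_count == fail_at)
            return NULL;         /* Simulate out-of-memory */
        return malloc(size);
    }

The test driver then re-runs the program with FAIL_AT_ALLOC=1, 2, 3, ...
until a run completes without the injected failure being hit.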
Peter.
>I think it's still useful to have a `make fuzz` that runs AFL for a few
>minutes, or until the user cancels it. In some open source projects AFL finds
>multiple bugs in < 5 minutes! And if the devs let it run overnight once in a
>while they can only benefit from the deeper coverage. Lastly, integration with
>CI servers could make a full run feasible at least once.
That one's really more of a user-education problem, I think: for each OSS
project someone needs to sit down, create the build environment/scripts, and
generate the test cases. Having said that, AFL is the first fuzzer I've found
that doesn't require insane amounts of effort to produce useful results, which
makes it much easier to demonstrate its utility to a sceptical developer and
thereby encourage them to use it further.
Peter.
>Ok, couldn't the AFL test cases be incorporated into the OpenSSL test suite
>directly as a means to expand code coverage?
Uhh, all 20 million of them?
Peter :-).
>OpenSSL has a pretty large library of test cases as-is, and surely many of
>the AFL test cases would be quite short.
>
>A representative 1000 cases of up to a kilobyte wouldn't be too bad, right?
I'm not sure what this would achieve though, because it's static data rather
than the dynamic mutations that AFL does. Let's say you take an input set of
20M test cases and, using some sort of optimisation algorithm (topological
sorting or simulated annealing or choose your pet method), pick out the 1,000
that provide the best coverage of code paths. Presumably you're going to
correct any problems these find before you ship, so what you've got is a
regression test with 1,000 test cases (alongside the existing test suite). If
you change any of the code then you lose coverage, because the (static) set of
1,000 cases can't adapt to new code paths. So as an extension of the existing
test suite it's fine, but due to its static nature I can't see how it can
exercise newly-added code; for that you actually need to run AFL on it.
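(As it happens, AFL ships with a corpus-minimisation tool that does roughly
that selection step, keeping only the cases that exercise distinct paths.
The directory and harness names here are illustrative:

    afl-cmin -i all_testcases -o minimised -- ./fuzz_harness @@

but the result has exactly the static-snapshot limitation described above.)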
(What we really need is someone with a spare supercomputer to offer cycles for
OSS projects).
Peter.