The internals of afl-fuzz aren't documented particularly well outside
the comments in the code, so I figured it may be useful to describe
the underlying mechanics in case anybody wants to replicate it in
another tool, or make improvements or tweaks to the current codebase.
Enjoy!
== The instrumentation ==
The code injected by afl-gcc / afl-clang is essentially equivalent to:
cur_location = $compile_time_random;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location;
...where shared_mem[] is a 64 kB region of memory created by the
parent process and used to record instrumentation data. The size is
chosen empirically to minimize the likelihood of collisions in typical
targets while keeping the map small enough to allow microsecond-range
hashing / byte counting / etc. SHM is used to minimize the number of
syscalls and context switches when writing or reading the data.
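For illustration, here is a minimal sketch of how the parent can set
up such a region with SysV shared memory and advertise it to the
target; error handling is abbreviated and the names are merely meant
to be descriptive:

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  #define MAP_SIZE 65536                /* the 64 kB coverage map */

  static unsigned char* trace_bits;

  static void setup_shm(void) {

    /* Create an anonymous shared memory segment... */
    int shm_id = shmget(IPC_PRIVATE, MAP_SIZE,
                        IPC_CREAT | IPC_EXCL | 0600);
    if (shm_id < 0) { perror("shmget"); exit(1); }

    /* ...pass its ID to the instrumented target through the
       environment, so the injected code can attach to the same
       segment... */
    char shm_str[16];
    snprintf(shm_str, sizeof(shm_str), "%d", shm_id);
    setenv("__AFL_SHM_ID", shm_str, 1);

    /* ...and map it into our own address space for reading. */
    trace_bits = shmat(shm_id, NULL, 0);
    if (trace_bits == (void*)-1) { perror("shmat"); exit(1); }

  }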
The aforementioned code is injected at the beginning of every function
and at every straightforward conditional branch (there is also some
overspray on non-conditional branches due to the simplicity of
afl-as).
As should be evident, the code doesn't merely track branch directions,
but also keeps track of hit counts using an 8-bit counter. It would
have been nicer to use saturating arithmetic, but that would require
additional instructions on x86, so the counter can sometimes overflow.
On the receiving side, the fuzzer uses a simple lookup table to
convert raw counts into coarser buckets, so that, for example, a
branch hit count changing from 1 to 2 is seen as an interesting
development - but a change from 40 to 41, not so much.
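To make the bucketing concrete, here is a sketch of the conversion;
the power-of-two boundaries shown below follow the general pattern
used by the fuzzer, although the real thing relies on a precomputed
256-entry lookup table rather than a chain of comparisons:

  /* Collapse a raw 8-bit hit count into a coarse bucket. Counts that
     land in the same bucket are treated as equivalent, so 40 -> 41 is
     invisible, while 1 -> 2 crosses into a new bucket. */
  static unsigned char bucketize(unsigned char cnt) {

    if (cnt == 0)   return 0;
    if (cnt == 1)   return 1;
    if (cnt == 2)   return 2;
    if (cnt == 3)   return 4;
    if (cnt <= 7)   return 8;
    if (cnt <= 15)  return 16;
    if (cnt <= 31)  return 32;
    if (cnt <= 127) return 64;
    return 128;

  }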
Based on a fair amount of testing, this form of instrumentation is
considerably more useful than simple block or edge coverage; you can
do your own testing by enabling COVERAGE_ONLY in config.h, recompiling
the target, and seeing how far you can get.
You can also try out afl-showmap to preview the captured
instrumentation data without doing any actual fuzzing work.
As should be evident, the instrumentation isn't thread-aware and
out-of-order thread execution may result in spurious variations in the
output bitmap. This is usually very self-limiting, so I haven't put
any effort into fixing it.
== The fork server ==
The afl-fuzz process runs the target binary in a fairly graceful
single-core model where the fuzzer always sleeps while the child is
running and vice versa; I'm trying to preserve this model to permit
effortless parallelization.
To avoid the significant overhead of execve(), we use a nice fork
server model devised by Jann Horn:
http://lcamtuf.blogspot.com/2014/10/fuzzing-binaries-without-execve.html
In essence, the instrumentation stops the program at main() and then
simply calls fork() to create running copy-on-write instances on
demand. This is very fast everywhere except for Mac OS X, where the
underlying behavior of fork() seems to be stuck in the early 90s
(huh). Apparently, running afl-fuzz in a Linux VM on Mac OS X is a lot
faster than running it natively.
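The injected server boils down to something like the sketch below;
the descriptor numbers follow the convention used by afl (control
pipe on fd 198, status pipe on fd 199), error handling is pared down,
and a 32-bit pid_t is assumed:

  #include <unistd.h>
  #include <sys/wait.h>

  #define FORKSRV_FD 198            /* control pipe; +1 is status */

  static void fork_server(void) {

    int unused = 0, status;
    pid_t child;

    /* Phone home to let the fuzzer know the server is up; if nobody
       is listening, just run normally. */
    if (write(FORKSRV_FD + 1, &unused, 4) != 4) return;

    while (1) {

      /* Block until the fuzzer requests another run. */
      if (read(FORKSRV_FD, &unused, 4) != 4) _exit(1);

      child = fork();
      if (child < 0) _exit(1);

      if (!child) {
        /* Child: close the server descriptors and fall through to
           main(), running on a fresh copy-on-write image. */
        close(FORKSRV_FD);
        close(FORKSRV_FD + 1);
        return;
      }

      /* Parent: report the child PID, wait for it to finish, then
         relay the exit status. */
      if (write(FORKSRV_FD + 1, &child, 4) != 4) _exit(1);
      if (waitpid(child, &status, 0) < 0) _exit(1);
      if (write(FORKSRV_FD + 1, &status, 4) != 4) _exit(1);

    }

  }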
== The main loop ==
In principle, the fuzzer follows the simple algorithm outlined in the
README, consisting of several steps:
1) Grabbing the next file from the input queue,
2) If not already done for this file, sequentially removing
variable-length blocks from the input while testing if the recorded
instrumentation output changes in any way. If any portions can be
removed without affecting the recorded path, the old file is replaced
with a new, smaller input.
This is not a substitute for starting with small files, but in many
cases, can make the target program faster and improve the odds of
actually hitting something interesting when making random tweaks (also
see docs/perf_tips.txt).
3) Mutating the file using a bunch of common-sense strategies,
starting with a deterministic pass (e.g., sequential bit flips) that
takes time proportional to the size of the file; and then continuing
with random, stacked tweaks that run for a roughly fixed number of
cycles (well, that's a lie - see calculate_score). A sketch of such
stacked tweaks appears after this list.
One nice thing about afl-fuzz is that I could select the strategies
for optimal yields without having to guess too much:
http://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html
The mutation strategies used here are considerably more sophisticated
than those employed by tools such as zzuf or honggfuzz, but they
generally don't try to reason about the structure of the underlying
data - nothing similar to radamsa or custom template-based tools.
When any of the mutations result in an execution trace that shows a
new state transition (i.e., a previously-unseen byte in the output map
is set) or a noteworthy change to a coarse hit count, the mutated
input is added at the end of the queue; a sketch of this "new bits"
check also follows the list. Changes to the execution path that don't
trigger new state transitions, and merely alter the ordering or
sequence of existing ones, are not taken into account.
4) Recomputing a small set of "favored" inputs that offer
(approximately) the smallest and fastest-executing subset of all input
files generated so far while including every state transition seen so
far. This is discussed in some additional detail here:
https://groups.google.com/forum/#!topic/afl-users/_4PAO2aZCes
Only the favored inputs are guaranteed to be fuzzed in every queue
cycle. The paths that aren't favored are skipped 99% of the time when
there are not-yet-fuzzed favored entries ahead of them in the queue;
or 90% of the time when there's nothing urgent in the pipeline.
(If you want to do something similar for an existing data set, check
out experimental/minimization_script/).
5) Running all new queue entries from other synced fuzzers (when
running in sync mode) and importing them if they result in any locally
new state transitions being seen (see docs/parallel_fuzzing.txt).
6) If at the end of the queue, increasing the cycle counter and
restarting from the first file on the list.
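The "random, stacked tweaks" mentioned in step 3 boil down to picking
a handful of mutations and applying them on top of each other at
random locations. A minimal sketch, with an operation list and
constants that are purely illustrative (the real strategy roster is
much longer):

  #include <stdlib.h>

  /* Apply 2..128 stacked tweaks to buf at random offsets. */
  static void havoc_stage(unsigned char* buf, int len) {

    int stacking = 1 << (1 + rand() % 7);

    for (int i = 0; i < stacking; i++) {

      int pos = rand() % len;

      switch (rand() % 3) {
        case 0: buf[pos] ^= 1 << (rand() % 8);     break; /* flip a bit  */
        case 1: buf[pos]  = rand();                break; /* random byte */
        case 2: buf[pos] += (rand() % 2) ? 1 : -1; break; /* nudge +/- 1 */
      }

    }

  }

The "new state transition" test from steps 3 and 5 can be sketched as
a persistent "virgin" map that starts with every bit set; bits are
knocked out as the corresponding (bucketed) trace bytes show up. The
names and the byte-at-a-time loop are illustrative - the real check
also distinguishes brand-new tuples from mere hit-count changes, and
scans the map a word at a time for speed:

  #define MAP_SIZE 65536

  /* Bits still set here have never appeared in any trace so far;
     initialized to all 0xff at startup. */
  static unsigned char virgin_bits[MAP_SIZE];

  /* Returns nonzero - and updates the virgin map - if the current
     trace sets any bit never seen before. */
  static int has_new_bits(const unsigned char* trace_bits,
                          unsigned char* virgin) {

    int ret = 0;

    for (int i = 0; i < MAP_SIZE; i++)
      if (trace_bits[i] & virgin[i]) {
        virgin[i] &= ~trace_bits[i];
        ret = 1;
      }

    return ret;

  }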
== The exit criteria ==
The fuzzer never stops on its own, but there are three significant
milestones in the fuzzing process:
1) The completion of the first (and usually longest) queue cycle. This
is essentially the point where the fuzzer has completed deterministic
steps and some modest amount of random fuzzing on all the recursively
discovered favored paths.
2) Completion of deterministic steps on all input paths - "pending" on
the status screen reads "0". This usually takes several dozen queue
cycles and usually indicates fairly thorough coverage.
3) The point where no new paths are seen for a week or so, suggesting
that there probably isn't much more to be found (perhaps unless you
parallelize).
== Crash and hang deduplication ==
Hangs and crashes are deemed "unique" if their traces show branches
that have not been seen in any previously-recorded fault. This is a
compromise solution somewhat similar to looking at the backtrace in
gdb and marking crashes as interesting if there is any new function
seen in the call path.
The alternative of looking at the faulting instruction or the last
instrumentation data point is problematic, because it can group
together unrelated faults simply because they all happen in a common
function such as strcpy() or free(). What may be obvious to a human
being is fairly hard to decide programmatically, so we err on the side
of verbosity.
The "new branch" approach can lead to some count inflation early in
the game, but the effect should quickly taper off.
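In terms of the earlier sketch, this amounts to calling has_new_bits()
with a virgin map reserved for faults, kept separately from the map
used for queue management (crash_is_unique() is a name made up for
this example):

  /* Starts all 0xff; bits are cleared only by crashing traces. */
  static unsigned char virgin_crash[MAP_SIZE];

  /* A crash is kept as "unique" only if its trace exercises a branch
     not seen in any previously-recorded crash. */
  static int crash_is_unique(const unsigned char* trace_bits) {
    return has_new_bits(trace_bits, virgin_crash);
  }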
== Session resume ==
Sessions can be seamlessly resumed by using the old output directory
(-o) as the new input (-i). The fuzzer loses the current position in
the queue cycle and will start over from the first file, but it does
remember which files have gone through deterministic testing, speeding
things up.
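For example, assuming a target that reads the file given on the
command line (all paths here are placeholders):

  $ afl-fuzz -i prev_findings_dir -o new_findings_dir /path/to/target @@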
== The -C mode ==
The feature is described here:
http://lcamtuf.blogspot.com/2014/11/afl-fuzz-crash-exploration-mode.html
It essentially works in a manner identical to the normal mode, except
that mutations are rejected if they no longer cause a crash. I found
this to be extremely useful for evaluating the impact of crashes
without digging into the codebase.
/mz