Slow havoc stage

lukas...@gmail.com

May 3, 2015, 4:30:13 PM
to afl-...@googlegroups.com
Hi,

I'm currently fuzzing a complex target using a single 60-byte initial test file. The deterministic stages clock in at around 300 execs per second; the havoc stage, however, drops to 35-60 execs per second. Is there an inherent reason for this drop in performance within afl-fuzz, or should I check whether the (patched) target somehow screws up? The input passed via stdin is allowed to grow to 65 kB but (currently) does not grow beyond ~120 bytes.

Only somewhat related to that: the havoc stage doubles the total number of executions if a new execution path was found during the current round. I'm not sure this is CPU time spent wisely: if a random change to a test file causes the execution path to differ, the best we can assume is that the chance of any given bit in the input altering the execution path is roughly constant. Doubling the execution budget overvalues the finding relative to the other input files; afl-fuzz should just continue to the next input (and come back to the current one again later on).

Best regards
Lukas

Michal Zalewski

May 3, 2015, 4:52:48 PM
to afl-users
> I'm currently fuzzing a complex target using a single 60-bytes initial
> testfile. The deterministic stages clock at around 300 execs per second, the
> havoc stage however drops down to 35-60 execs per second. Is there an
> inherent reason for this drop in performance within afl-fuzz or should I
> check if the (patched) target somehow screws up?

The only difference is that havoc can make more striking changes to
the file, including making it shorter or (somewhat) longer. Perhaps
the program doesn't take shorter / longer inputs kindly?

However, for a program that indeed executes at 300 execs/sec
initially, I'm not sure how a drop to 35 execs/sec is possible. For
fast programs, -t should be auto-calibrated to around 20 ms, which
should keep even the worst case closer to 50 execs/sec.

> Only somewhat related to that: The havoc stage doubles the amount of total
> executions if a new execution path was found during the current round. I'm
> not sure if this is cpu time spent wisely: If a random change in a testfile
> causes the execution path to differ then we can only assume the chance for
> every bit in the input to alter the execution path being a constant.

Some paths are dead ends and we want to be done with them quickly;
hence the initial cap of 5k execs per havoc stage, adjusted slightly
based on speed and a couple of other factors. But when we have
evidence that fuzzing a particular input is productive, we want to
give it a bit more air time.

Every additional find strengthens the case that it's not just a
random fluke. There is an upper cap on the number of execs to make
sure we move on fairly quickly anyway. To be honest, I suspect that
the cycle increase method is non-critical (+const number, x1.5, x2,
x4, jump straight to max, etc), since it will ultimately account only
for a small percentage of all execs carried out. But you're welcome to
experiment with various approaches for different benchmark binaries
and see if anything works markedly better than x2.
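
For illustration, here is a minimal sketch of that budget-doubling idea; the
constants and helper functions (mutate_and_run_one_input, found_new_path) are
invented for this example and are not the actual afl-fuzz code:

/* Minimal sketch of the havoc budget logic described above. Constants and
   helper names are made up for illustration; they are not what afl-fuzz
   actually uses. */

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define HAVOC_BUDGET_INIT  5000u   /* the initial per-input cap            */
#define HAVOC_BUDGET_MAX  40000u   /* hard upper cap so we still move on   */

/* Stand-ins for "run one mutated input" and "did it hit a new path?". */
static void mutate_and_run_one_input(void) { /* execve target, read bitmap... */ }
static int  found_new_path(void)            { return rand() % 1000 == 0; }

int main(void) {

  uint32_t budget = HAVOC_BUDGET_INIT;  /* would also be scaled by exec speed */
  uint32_t execs  = 0;

  while (execs < budget) {

    mutate_and_run_one_input();
    execs++;

    /* Productive input: double the budget (the "x2" discussed above), but
       never past the hard cap, so a lucky streak can't stall the queue. */
    if (found_new_path() && budget < HAVOC_BUDGET_MAX) {
      budget *= 2;
      if (budget > HAVOC_BUDGET_MAX) budget = HAVOC_BUDGET_MAX;
    }
  }

  printf("havoc stage finished after %u execs\n", execs);
  return 0;
}

Changing the growth rule (x1.5, x4, jump straight to the cap) only changes how
quickly budget reaches HAVOC_BUDGET_MAX, which is presumably why the exact
choice doesn't matter much.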

Cheers,
/mz

Sami Liedes

May 3, 2015, 5:12:12 PM
to afl-users
On Sun, May 03, 2015 at 01:52:26PM -0700, Michal Zalewski wrote:
> for a small percentage of all execs carried out. But you're welcome to
> experiment with various approaches for different benchmark binaries
> and see if anything works markedly better than x2.

Hmm. I think it should be possible to be more rigorous about measuring
the optimum tuning for different binaries and starting test cases. If
we cache all the results, re-running with different tunables but the
same binary and starting test cases should be a lot faster, so it
should be possible to explore the tunable space to see what works. If
I find some time, I might try to do some measurements...

I think I mentioned that caching seems to help a lot also inside a
single run, since a lot of the test cases produced by afl are
identical (did I?).

Sami

Michal Zalewski

May 3, 2015, 5:48:42 PM
to afl-users
> Hmm. I think it should be possible to be more rigorous about measuring
> the optimum tuning for different binaries and starting test cases. If
> we cache all the results, re-running with different tunables but the
> same binary and starting test cases should be a lot faster, so it
> should be possible to explore the tunable space to see what works. If
> I find some time, I might try to do some measurements...

I've done a lot of that, but generally focusing on stuff that accounts
for most of the overall exec count. The havoc ramp-up strategy doesn't
matter much in that context, at least that's my current belief.

> I think I mentioned that caching seems to help a lot also inside a
> single run, since a lot of the test cases produced by afl are
> identical (did I?).

What percentage of duplicates are you seeing? If it's a significant
percentage, it's something to fix.

There is a performance cost to calculating a hash of every generated
input and keeping a huge lookup table eventually spanning billions of
inputs, but if you're seeing a lot, it may be worth doing. I'm a bit
surprised, though. Afl-tmin generates some redundant test cases, and
so do the minimization and calibration stages in afl-fuzz, but the
bulk of the fuzzing should be fairly non-repetitive.
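
If anyone wants to try it, the shape would be roughly the following; the
FNV-1a hash and the fixed-size open-addressed table are arbitrary choices for
the sketch, not something afl-fuzz contains:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define TABLE_SIZE (1u << 24)        /* 16M slots of uint32_t = 64 MB      */

static uint32_t table[TABLE_SIZE];   /* 0 marks an empty slot              */

static uint32_t fnv1a(const uint8_t* buf, size_t len) {
  uint32_t h = 0x811c9dc5u;
  while (len--) { h ^= *buf++; h *= 0x01000193u; }
  return h ? h : 1;                  /* keep 0 reserved for "empty"        */
}

/* Returns 1 if an input with the same hash was already executed; otherwise
   remembers the hash and returns 0. Hash collisions wrongly skip inputs and
   a full table loops forever - acceptable for a sketch, not for real use. */
static int seen_before(const uint8_t* buf, size_t len) {
  uint32_t h = fnv1a(buf, len);
  uint32_t i = h & (TABLE_SIZE - 1);
  while (table[i]) {
    if (table[i] == h) return 1;
    i = (i + 1) & (TABLE_SIZE - 1);
  }
  table[i] = h;
  return 0;
}

int main(void) {
  const uint8_t buf[] = "some mutated input";
  printf("%d\n", seen_before(buf, sizeof buf));   /* 0: first time seen    */
  printf("%d\n", seen_before(buf, sizeof buf));   /* 1: dupe, would skip   */
  return 0;
}

A table like this is what ends up spanning billions of inputs over a long run,
hence the memory concern.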

/mz

lukas...@gmail.com

May 4, 2015, 1:34:14 AM
to afl-...@googlegroups.com

> There is a performance cost to calculating a hash of every generated
> input and keeping a huge lookup table eventually spanning billions of
> inputs, but if you're seeing a lot, it may be worth doing.

It's worth the trouble if the cost of execution is much larger than the cost of identifying a dupe. Consider slow-moving targets that have ramp-up costs for initializing a parser (e.g. libpython). One might use a Bloom filter instead of a hash table to keep track of which inputs already went through the binary. Since the chance of getting a false positive in the filter is known in advance, skipping an input the filter claims was already run only risks discarding a genuinely new input with that bounded probability. If we plan on 100 million executions, a filter of around 60 MB with three hash functions keeps the chance of a false positive below ten percent. If the number of executions grows beyond 10^9, things become impractical without on-disk storage.
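
For the record, that estimate follows from the standard Bloom filter
approximation p ~ (1 - e^(-k*n/m))^k; a quick check of the numbers above
(nothing afl-specific here, compile with -lm):

#include <math.h>
#include <stdio.h>

int main(void) {
  double n = 1e8;                     /* planned executions (insertions)   */
  double m = 60.0 * 1024 * 1024 * 8;  /* ~60 MB of filter, in bits         */
  double k = 3.0;                     /* number of hash functions          */

  /* Standard approximation of the false-positive rate once n items are in. */
  double p = pow(1.0 - exp(-k * n / m), k);
  printf("estimated false-positive rate: %.3f\n", p);  /* prints ~0.09 */
  return 0;
}

Holding that rate while scaling n up by 10x means scaling m up by roughly the
same factor.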

Michal Zalewski

May 4, 2015, 1:48:53 AM
to afl-users
> It's worth the trouble if the cost of execution is much larger than the
> cost of identifying a dupe.

I don't think that's true?

It's worth the trouble only if (num_dupes * cost_of_exec) >>
(total_inputs * time_to_check_each_input_for_dupes).

I don't expect there to be a lot of dupes, which would make this
wasteful. If anybody wants to measure it and finds otherwise, it makes
sense to implement some dupe-related optimizations, but doing them
just in case is harmful.

> One might use a bloom filter instead of a hash table to keep track of which
> inputs already went through the binary.

Bloom filters have the same computational cost, just use less memory.

/mz

Michal Zalewski

May 4, 2015, 2:11:35 AM
to afl-users
I've done some rudimentary testing. I don't see many dupes during the
havoc stage (~1% wasted execs over 100k execs with a 500 byte file),
but there were a bit more than I expected during the deterministic
stages. I suspect this may be due to a bug in some of the existing
dupe-avoidance checks, which should be fairly easy to fix.

/mz

Michal Zalewski

May 4, 2015, 3:29:51 AM
to afl-users
Basically, in files that are mostly zeros, the dupe checks for
interesting 16 and interesting 32 are way too optimistic and produce
quite a few collisions. This can be fixed without hash lookups; I just
need to write more rigorous checks to confirm that a particular value
couldn't have been generated by other deterministic stages.
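
A simplified sketch of the kind of check meant here, for the 16-bit case; the
real checks in afl-fuzz cover more producers (bitflips, arithmetic, both
endiannesses), so treat this as an illustration only:

#include <stdint.h>

static const int8_t interesting_8[] = { -128, -1, 0, 1, 16, 32, 64, 100, 127 };

static int is_interesting_8(uint8_t b) {
  for (unsigned i = 0; i < sizeof interesting_8; i++)
    if ((uint8_t)interesting_8[i] == b) return 1;
  return 0;
}

/* old0/old1: current bytes at the target offset; new_val: proposed 16-bit
   interesting value (little-endian). Returns 1 if the write is redundant. */
static int redundant_with_interest8(uint8_t old0, uint8_t old1, uint16_t new_val) {

  uint8_t new0 = (uint8_t)(new_val & 0xff);
  uint8_t new1 = (uint8_t)(new_val >> 8);

  /* If only one byte actually changes and its new value is in the 8-bit
     interesting set, the 8-bit stage already generated this exact file. */
  if (new1 == old1 && is_interesting_8(new0)) return 1;
  if (new0 == old0 && is_interesting_8(new1)) return 1;

  /* If nothing changes at all, it's trivially a dupe. */
  if (new0 == old0 && new1 == old1) return 1;

  return 0;
}

int main(void) {
  /* 0x0010 over bytes {0x00, 0x00}: only the low byte changes, and 16 is in
     the 8-bit interesting set, so the 8-bit stage already covered it. */
  return redundant_with_interest8(0x00, 0x00, 0x0010) ? 0 : 1;
}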

The number of collisions for havoc isn't huge, but could probably be
lowered by making sure that the minimum stacking is 2, not 1. Would
need to test what impact that has on coverage.

/mz

Sami Liedes

May 4, 2015, 4:16:32 AM
to afl-users
On Sun, May 03, 2015 at 10:48:31PM -0700, Michal Zalewski wrote:
> > It's worth the trouble if the cost of execution is much larger than the
> > cost of identifying a dupe.
>
> I don't think that's true?
>
> It's worth the trouble only if (num_dupes * cost_of_exec) >>
> (total_inputs * time_to_check_each_input_for_dupes).
>
> I don't expect there to be a lot of dupes, which would make this
> wasteful. If anybody wants to measure it and finds otherwise, it makes
> sense to implement some dupe-related optimizations, but doing them
> just in case is harmful.

It might be that I overestimate the overhead caused by forking and
file I/O, but somehow it seems hard to believe that hashing could
cause a measurable difference with even the most trivial test program.
Even a cryptographic hash function like SHA1 can crunch several
gigabytes per second single-threaded, and you hardly need a
cryptographic hash for AFL.

Of course there's also an implementation complexity cost; if you
really can get to the point where there are very few dupes, it
probably is just not worth it. Anyway, good to hear you've measured
this.

You mentioned you tried with a 500 byte input file. What about much
smaller files, like 5-50 bytes? It seems to me that collisions in the
havoc stage should be much more likely with smaller files, but then I
only have a vague idea about the stuff done by havoc.

Then there are of course collisions between different workers, but as
long as you have entirely separate workers, that's going to be hard to
fix.

Sami

Michal Zalewski

May 4, 2015, 4:35:37 AM
to afl-users
> It might be that I overestimate the overhead caused by forking and
> file I/O, but somehow it seems hard to believe that hashing could
> cause a measurable difference with even the most trivial test program.

With fast binaries (2-4k execs/sec), larger memory operations -
including hashing the bitmap or even scanning it for new bits - start
to have a very measurable performance impact in afl-fuzz. There's
already a bunch of optimizations in place to counter that.

Of course, I don't object to incorporating some hypothetical
near-zero-overhead hashing scheme, but having spent far too much time
fine-tuning all the bitmap manipulation functions so that they don't
cost us much with 64 kB output maps - and seeing the performance drop
when you go with, say, 256 or 512 kB - I'm kinda skeptical.
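
For context, the kind of bitmap scan being tuned looks roughly like this: a
word-at-a-time pass over the 64 kB trace map against a "virgin" (not-yet-seen)
map. This is a simplified sketch, not the afl-fuzz implementation (no
hit-count buckets, for one):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MAP_SIZE  (1u << 16)          /* 64 kB coverage map                */
#define MAP_WORDS (MAP_SIZE / 8)

/* trace:  map touched by the last execution.
   virgin: bits still considered "new"; starts as all ones.
   Returns 1 if this execution touched anything not seen before. */
static int has_new_bits(const uint64_t* trace, uint64_t* virgin) {

  int ret = 0;

  for (size_t i = 0; i < MAP_WORDS; i++) {

    /* Fast path: this 8-byte word was not touched, or holds nothing new. */
    if (!trace[i] || !(trace[i] & virgin[i])) continue;

    /* Slow path: clear the newly-seen bits from the virgin map. */
    virgin[i] &= ~trace[i];
    ret = 1;
  }

  return ret;
}

int main(void) {
  static uint64_t trace[MAP_WORDS], virgin[MAP_WORDS];
  memset(virgin, 0xff, sizeof virgin);
  ((uint8_t*)trace)[123] = 1;                   /* pretend one edge was hit */
  return has_new_bits(trace, virgin) ? 0 : 1;   /* new bit => exit code 0   */
}

Growing the map to 256 or 512 kB multiplies the work in this loop for every
single exec, which is where the performance drop mentioned above comes from.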

> You mentioned you tried with a 500 byte input file. What about much
> smaller files, like 5-50 bytes? It seems to me that collisions in the
> havoc stage should be much more likely with smaller files, but then I
> only have a vague idea about the stuff done by havoc.

Sure. Tiny files are usually less interesting because it's easier to
get sorta-exhaustive coverage, but in general, with smaller files or
longer run times, the likelihood of dupes goes up.

/mz

Michal Zalewski

May 5, 2015, 2:25:42 AM
to afl-users
Good dupe detection and reduced dupes for havoc are both done (phew,
proved to be a bit of a PITA).

/mz