"FidgetyAFL" implemented in 2.31b

Michal Zalewski

Aug 18, 2016, 4:45:28 PM
to afl-users
While trying to do some initial experiments with the AFLFast fork
described in an earlier thread, I noticed that at least in my setup,
two very simple changes to stock AFL seem to produce at least roughly
comparable performance gains - significantly improving coverage in
libpng, libjpeg-turbo, zlib, and GNU patch within the first hours of
fuzzing.

The net effect of the changes is that when a new path is discovered,
AFL gets to it much sooner, works on it a bit less, and the average
queue cycle is shorter.

This is now implemented in AFL 2.31b. If you're running a parallelized
(-M / -S) setup, you should get all the benefits automatically.
Otherwise, the changes may offer a slight bump in the normal mode, but
should offer a marked improvement when using -d.
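
To give a rough idea of what these changes amount to mechanically, the relevant knobs are the havoc cycle counts in config.h. The following is an illustrative sketch only -- the macro names mirror config.h, but the numbers are placeholders, not necessarily the exact values used in 2.31b:

/* Illustrative sketch only -- the real definitions live in AFL's config.h,
   and the exact values used in 2.31b may differ from these placeholders. */

/* Baseline number of havoc cycles per queue entry (further scaled by the
   entry's performance score at runtime). Lowering this means each entry is
   "worked on a bit less", so queue cycles finish sooner and newly
   discovered paths are reached earlier. */
#define HAVOC_CYCLES        256   /* placeholder */

/* Havoc cycles used on an entry's first full pass, kept higher so
   brand-new paths still get a reasonable initial investment. */
#define HAVOC_CYCLES_INIT   1024  /* placeholder */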

Many thanks to the authors of the CCS paper for giving me some food
for thought and experimenting with the constants for the "havoc"
stage; I'm hoping they can have a look at my results and see if they
reproduce the findings. In the meantime, I'll try to evaluate their
implementation more thoroughly and experiment with several other
ideas.

Testing welcome. I'd be interested to hear about any cases where the
new constants consistently deliver much worse or much better results
than before =)

Michal Zalewski

Aug 18, 2016, 9:39:22 PM
to afl-users
Here's a plot for GNU patch, starting with testcases/others/text,
fuzzed with -d:

http://lcamtuf.coredump.cx/afl/old_new.png

Thuan Pham

Aug 18, 2016, 11:29:35 PM
to afl-users
Hi Michal,
Thank you for your quick response. We are running some experiments to compare AFLFast and AFL-2.31b on Binutils utilities (the benchmark we used in our CCS'16 paper). We will share the results here once the experiments have been done.
Thuan

Kurt Roeckx

Aug 21, 2016, 4:30:53 PM
to afl-...@googlegroups.com
On Thu, Aug 18, 2016 at 01:45:06PM -0700, Michal Zalewski wrote:
> While trying to do some initial experiments with the AFLFast fork
> described in an earlier thread, I noticed that at least in my setup,
> two very simple changes to stock AFL seem to produce at least roughly
> comparable performance gains - significantly improving coverage in
> libpng, libjpeg-turbo, zlib, and GNU patch within the first hours of
> fuzzing.
>
> The net effect of the changes is that when a new path is discovered,
> AFL gets to it much sooner, works on it a bit less, and the average
> queue cycle is shorter.
>
> This is now implemented in AFL 2.31b. If you're running a parallelized
> (-M / -S) setup, you should get all the benefits automatically.
> Otherwise, the changes may offer a slight bump in the normal mode, but
> should offer a marked improvement when using -d.

I just upgraded from 2.30b to 2.32b. The 2 tests I was still running
had each been giving about 1 new path a day, and the "cycles done"
counter had turned green after a few hours. After the upgrade, one of
them found 100 new paths within 10 minutes, and the other found 36 new
paths in the first 10 minutes.

So I started one of the others, and it also found 40 new paths in the
first 15 minutes.

But after that first burst it seems to slow down again.


Kurt

Kurt Roeckx

Aug 22, 2016, 10:27:06 AM
to afl-...@googlegroups.com
That's of course because it starts doing things other than havoc
on the new files it has found.
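
In other words, once the initial burst is over, each newly queued file first has to go through the deterministic stages before it gets havoc-fuzzed again -- roughly this flow (a simplified sketch with stand-in types, not the actual afl-fuzz source):

#include <stdbool.h>
#include <stdio.h>

/* Simplified sketch of the per-entry flow in afl-fuzz's fuzz_one();
   stand-in types and stubs, not the actual AFL source. */

struct queue_entry { const char *fname; bool was_fuzzed; };

static bool skip_deterministic = false;   /* set when running with -d */

static void deterministic_stages(struct queue_entry *q) {
  /* Walking bit/byte flips, arithmetic, interesting values, dictionary
     insertions: thorough but slow -- the "other things than havoc" that
     kick in on newly found files. */
  printf("deterministic stages on %s\n", q->fname);
}

static void havoc_stage(struct queue_entry *q) {
  /* Stacked random mutations; cheap per entry with the new constants,
     which is where the initial burst of new paths comes from. */
  printf("havoc on %s\n", q->fname);
}

static void fuzz_one_sketch(struct queue_entry *q) {
  if (!skip_deterministic && !q->was_fuzzed)
    deterministic_stages(q);
  havoc_stage(q);
  q->was_fuzzed = true;
}

int main(void) {
  struct queue_entry q = { "id:000042", false };
  fuzz_one_sketch(&q);
  return 0;
}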


Kurt

Simon Pinfold

Sep 1, 2016, 7:47:03 PM
to afl-users
So here are my findings for comparing afl-2.30b with afl-fidgety (2.32b).

As mentioned, this is for libxml, tcpdump, libjpeg and libpng at 1, 2, 4, 8 and 16 slave instances (with no master), synchronising every 30 minutes.
For instance counts below 16 I ran 4 independent tests on 2.30b at each size, and for 16 instances I ran 2 independent tests.
I then ran a single test at each size on afl-fidgety. The thinking is that having multiple runs on 2.30b makes it easier to tell when afl-fidgety genuinely diverges from 2.30b, rather than just showing the run-to-run variation that already exists between runs of 2.30b.

I compared map_size and paths_total for each test.

For libxml, afl-fidgety showed faster path discovery and increased map coverage for the first day or two of running.
afl-fidgety had better map coverage for the first 12 hours, at which point it became even with 2.30b.
afl-fidgety maintained better path coverage for 1-3 days depending on how many instances were running; with more instances, 2.30b was able to catch up to afl-fidgety faster.
Graphs: 


For tcpdump we see similar results, with a faster increase in map coverage for the first few days.
We also see vastly improved path discovery for the first 5 days; the gain was slightly larger for the single instance and slightly smaller for 8 and 16 instances.
Graphs:


For djpeg we see improved performance at first, until 2.30b is able to catch up.
Graphs:


And finally, for libpng there is a slight increase in map coverage and better path discovery for the first portion of fuzzing.
Graphs:


That's all I have for now - I will check out afl-fast and see if I can add it to the comparison.
I hope this is helpful.

This testing was performed as part of a study into using AFL at different scales and is supported by the AWS Cloud Credits for Research program.

Cheers,
Simon

Marcel Böhme

Sep 9, 2016, 11:43:25 AM
to afl-users

tl;dr: Indeed, FidgetyAFL had an edge over our old version of AFLFast. However, our new version clearly outperforms FidgetyAFL in terms of time-to-exposure (at least for our subjects).


On Friday, 19 August 2016 04:45:28 UTC+8, Michal Zalewski wrote:

> Many thanks to the authors of the CCS paper for giving me some food
> for thought and experimenting with the constants for the "havoc"
> stage; I'm hoping they can have a look at my results and see if they
> reproduce the findings. In the meantime, I'll try to evaluate their
> implementation more thoroughly and experiment with several other
> ideas.
>
> Testing welcome. I'd be interested to hear about any cases where the
> new constants consistently deliver much worse or much better results
> than before =)

Hi Michal,

FidgetyAFL was a hard nut to crack :) From our own experiments, we already knew that EXPLORE (a constant schedule that assigns low energy -- now implemented in FidgetyAFL) outperforms all other schedules for at least the first 5 hours on our subjects. While the other schedules would overtake EXPLORE later on, that initial sprint is sufficient for it to outperform FAST, at least in terms of time-to-exposure -- a measure of efficiency that might be more interesting than the number of unique crashes generated after 6h/12h/24h. So, FidgetyAFL with -d seems to outperform the old version of AFLFast (FAST) with -d.
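
For readers who have not seen the paper: a schedule simply decides how much energy (roughly, how much havoc work) a chosen seed receives. A constant low-energy schedule like EXPLORE boils down to something like the following sketch (placeholder constant; paraphrasing the paper rather than quoting the AFLFast source):

/* Sketch of a constant low-energy schedule (EXPLORE). alpha_i is the
   energy stock AFL would assign to seed i via its performance score
   (execution speed, coverage, depth, ...); BETA > 1 is a placeholder
   damping constant. Every pick of a seed gets the same small slice of
   havoc work, no matter how often it was picked before. */

#define BETA 4   /* placeholder */

static double energy_explore(double alpha_i) {
  return alpha_i / BETA;
}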

FidgetyAFLFast: In recent days, I found some time to sit down, integrate both FidgetyAFL (2.23b) and AFLFast.old, and conduct some preliminary experiments. I re-added some heuristics from AFL that I had removed earlier with the intention of focusing AFLFast on low-frequency paths. It turns out that without these heuristics, AFLFast generated test cases that were too large and took too long to execute (unfortunately, it is easy to generate more slow seeds from already slow seeds). We also no longer re-order the queue. In the new version of AFLFast, the main idea is still to increase energy* as the same seed is chosen more and more often, and to assign proportionally more energy* to seeds exercising lower-frequency paths. But in contrast to the old version, the new version of AFLFast also outperforms FidgetyAFL even in the initial 5 hours on our subjects (graphs for 10 runs of two binutils tools follow).
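
The shape of that idea can be sketched in the same style (again with placeholder constants, not the actual AFLFast code): energy grows with how often a seed has been chosen and shrinks with how frequently its path is exercised, up to a cap.

#include <math.h>
#include <stdint.h>

/* Sketch of the schedule's shape: s_i is how often seed i has been
   chosen from the queue, f_i is how many generated inputs exercised
   seed i's path. Low-frequency paths (small f_i) get proportionally
   more energy; repeatedly chosen seeds get exponentially more, capped
   at MAX_ENERGY. Constants are placeholders. */

#define BETA       4
#define MAX_ENERGY 1600

static double energy_fast(double alpha_i, uint32_t s_i, uint64_t f_i) {
  double e = (alpha_i / BETA) * pow(2.0, s_i) / (double)(f_i ? f_i : 1);
  return e < MAX_ENERGY ? e : MAX_ENERGY;
}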

energy*: A seed with more "energy" is fuzzed for a longer time.

Deterministic Stage: As you said earlier, the deterministic stage is an important component of AFL (e.g., to fuzz existing (valid) seed files thoroughly). So, I added the deterministic stage back in -- but with the following heuristic: seeds with low depth (e.g., original seeds and their "children") go through the deterministic stage the first time they are chosen, while seeds with high depth (descendants derived from an original seed via several intermediate seeds) need sufficiently high energy to be considered for the deterministic stage (note that our schedules increase energy up to a maximum for "promising seeds"). I haven't really evaluated this choice for the deterministic stage yet, nor the concrete constants I have chosen (coming up!). However, I would expect the deterministic stage not to be very effective once we are "too far away" from the original seed (the rapid mixing problem). Let me know if you share this intuition or think otherwise.
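
Concretely, the gating might look roughly like the following sketch (the names and thresholds are invented purely for illustration; the real logic is in the aflfast repository linked below):

#include <stdbool.h>
#include <stdint.h>

/* Sketch of depth-gated deterministic fuzzing. SHALLOW_DEPTH and
   ENERGY_GATE are made-up placeholders, not AFLFast's actual constants. */

#define SHALLOW_DEPTH 2     /* "original seeds and their children" */
#define ENERGY_GATE   800.0 /* "sufficiently high energy" */

static bool should_run_deterministic(uint32_t depth, double energy,
                                     bool already_done) {
  if (already_done) return false;           /* deterministic stage runs once per seed */
  if (depth <= SHALLOW_DEPTH) return true;  /* shallow seeds: run it right away */
  return energy >= ENERGY_GATE;             /* deep seeds: only once they look promising */
}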

I'll be posting more details and graphs about our tiny evaluation later. I might also play with the constants for a bit and see what works and what doesn't. Of course, everyone on the mailing list is more than welcome to conduct their own experiments.

@Michal: You can inspect our changes w.r.t. AFL 2.23b here: https://github.com/mirrorer/afl/compare/master...mboehme:master. The Fidgety-AFLFast-hybrid is available here: https://github.com/mboehme/aflfast.

Cheers!
Marcel

Marcel Böhme

Sep 10, 2016, 12:49:04 PM
to afl-users
Hi Michal,
 
> I'll be posting more details and graphs about our tiny evaluation later.

Here are some results when skipping the deterministic stage (-d option) -- comparing FidgetyAFL, the old version of AFLFast (AFLFast.old), and the new hybrid version of AFLFast (AFLFast.new) for two of our binutils subjects (average over 6 runs x 12 hours x {nm,cxxfilt} x {FidgetyAFL, AFLFast.old, AFLFast.new} = 36 runs on a 40-core PC).

Map_size: AFLFast.new achieves the same map_size about 2x faster than FidgetyAFL.
For instance, for nm, AFLFast.new needs 1.5 hours while FidgetyAFL needs 4 hours to reach the same 10% map coverage. For cxxfilt, AFLFast.new needs 3 hours while FidgetyAFL needs 7 hours to reach the same 11.25%.

Unique_crashes: AFLFast.new achieves the same number of unique crashes about 2x faster than FidgetyAFL.
For instance, for nm, AFLFast.new needs about 4 hours while FidgetyAFL needs about 12 hours to reach 750 unique crashes. For cxxfilt, AFLFast.new needs 6 hours while FidgetyAFL needs about 10 hours to reach 1500 crashes.

Total_paths: AFLFast.new explores the same number of paths about 3x faster than FidgetyAFL.
For instance, for nm, AFLFast.new needs about 3 hours to explore 10k paths while FidgetyAFL needs about 11 hours. For cxxfilt, AFLFast.new needs about 4.5 hours while FidgetyAFL needs about 11 hours to explore about 15k paths.

Time to exposure: AFLFast.new exposes the same vulnerabilities significantly faster than FidgetyAFL. Moreover, in 12 hours AFLFast.new produces about 2x as many unique crashes that cannot be assigned to one of the known vulnerabilities (279, versus 158 for FidgetyAFL).




Cheers!

- Marcel

