AFLGo: Directing AFL to reach specific target locations


Marcel Böhme

Aug 24, 2017, 1:55:37 AM
to afl-users
Dear all,

AFLGo extends AFL such that the fuzzer can be directed towards a given set of target locations. The corresponding paper, Directed Greybox Fuzzing, has just been accepted at the ACM Conference on Computer and Communications Security (CCS) 2017.


Given a set of target locations (e.g., folder/file.c:582), AFLGo generates inputs specifically aimed at exercising those locations. Unlike AFL, AFLGo spends most of its time budget on reaching the given targets without wasting resources stressing unrelated program components. This is particularly interesting in the context of (see the usage sketch after this list):
  • patch testing by setting changed statements as targets. When a critical component is changed, we would like to check whether this introduced any vulnerabilities. AFLGo, a fuzzer that can focus on those changes, has a higher chance of exposing the regression.
  • static analysis report verification by setting statements as targets that a static analysis reports as potentially dangerous or vulnerability-inducing. When assessing the security of a program, static analysis tools might identify dangerous locations, such as critical system calls. AFLGo can generate inputs that demonstrate the report is indeed not a false positive.
  • information flow detection by setting sensitive sources and sinks as targets. To expose data leakage vulnerabilities, a security researcher would like to generate executions that exercise sensitive sources containing private information and sensitive sinks where data becomes visible to the outside world. A directed fuzzer can be used to generate such executions efficiently.
  • crash reproduction by setting method calls in the stack-trace as targets. When in-field crashes are reported, only the stack-trace and some environmental parameters are sent to the in-house development team. To preserve the user's privacy, the specific crashing input is often not available. AFLGo could help the in-house team to swiftly reproduce these crashes.
  • critical-component testing. In some cases, a security researcher wants to fuzz only specific components that she deems critical to the security of the system. AFLGo can be directed towards those critical components and does not waste resources fuzzing unrelated components.
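A rough sketch of the intended workflow (the targets-file format and the flag names follow the AFLGo repository's README and should be treated as illustrative rather than authoritative):

    # BBtargets.txt -- one target location per line, file:line
    folder/file.c:582
    parse.c:64

    # 1) compile with AFLGo's instrumenting compiler wrapper, passing
    #    -targets=BBtargets.txt -outdir=<tmpdir> so the pass can compute
    #    basic-block distances to the targets
    # 2) run the distance-generation scripts shipped with AFLGo, then
    #    recompile with -distance=<tmpdir>/distance.cfg.txt
    # 3) fuzz with the annealing-based schedule: exponential cooling (-z exp),
    #    45 minutes time-to-exploitation (-c 45m)
    afl-fuzz -z exp -c 45m -i in -o out ./target @@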
We have integrated AFLGo as a patch-testing tool into OSS-Fuzz. Given a specific commit of any integrated open-source library or program, AFLGo is directed specifically towards the statements changed in that commit. This allows us to find vulnerabilities faster, right when they are introduced, and makes OSS-Fuzz a truly fully-automated continuous integration (CI) platform.


Let me know if you have any questions!

Cheers!
- Marcel

Konstantin Serebryany

Aug 30, 2017, 9:22:05 PM
to afl-...@googlegroups.com
Marcel, 

Thanks for sharing the links. And thanks for advertising OSS-Fuzz in your paper :) 
Some comments follow. 

Heartbleed is not the best example to demonstrate the advantages of a fuzzing engine because 
AFL and libFuzzer already find it in < 1 minute (AFL -- in persistent mode; I've just verified with 2.50b). 
We don't need to improve fuzzing for shallow bugs like this -- we just need to ensure the code is fuzzed continuously. 

These are two different bugs in openssl, both of which took us many CPU years to find. 
Now we know where the bugs are -- can we find them quicker with AFLGo? 
If you can, that will support your claims from Section 7. 

The existence of virtual/indirect function calls makes the distance function incorrect (as you mention in 5.1),
i.e. this method is less suitable for cases where you can't construct the full CG at compile time. 

The general idea of computing certain properties of the CFG/CG at compile-time and then using it while fuzzing is great.
libFuzzer is also moving in that direction, but you've done more. 

Giving different weights to different seeds is (obviously?) very important, but I am not convinced that your approach is best.
Focusing on the shortest path to the target(s) is not necessarily the winning strategy. 
What if the shortest path is not actually possible? 
If you concentrate on the shortest path and give less "energy" to the longer path(s) you just make things worse. 
(But that's just my hand-waving) 

In section 5 you compare the ability of Katch and AFLGo to find bugs in binutils. 
But wouldn't it be more interesting to compare AFL vs AFLGo? (Or did I miss it?)

In section 6 you test your integration with OSS-Fuzz and it may sound like you've found 26 bugs that OSS-Fuzz didn't. But 
  libxml2/expat/libc++abi: there were a few bugs there that were kept private according to OSS-Fuzz's disclosure policy. It's likely that you re-discovered some of those. 
  libming/libdwarf/libav: we don't have them on OSS-Fuzz (I wish we did though)
I.e. Section 6 does not actually demonstrate the advantages of AFLGo over what we use in OSS-Fuzz (libFuzzer+AFL). 
(It just demonstrates that AFLGo can find tons of bugs; but the same is true for vanilla AFL)

Thanks! 

--kcc 



Marcel Böhme

Aug 31, 2017, 9:09:03 AM
to afl-...@googlegroups.com, Thuan Pham
Hi KCC,

> Thanks for sharing the links. And thanks for advertising OSS-Fuzz in your paper :)
Strongly believe OSS-Fuzz is the right path towards a more secure open-source software landscape! 
BTW: Great talk at Usenix Security :) 

> These are two different bugs in openssl, both of which took us many CPU years to find.
> Now we know where the bugs are -- can we find them quicker with AFLGo?
We will definitely look at it and get back to you with the results.

> The existence of virtual/indirect function calls makes the distance function incorrect (as you mention in 5.1),
> i.e. this method is less suitable for cases where you can't construct the full CG at compile time.
TL;DR: We implemented tracing but scrapped it again because it was too slow.
Originally, we implemented a tracing function (inspired by Christian Holler's LLCov). We executed the seeds
and/or the existing test suite (e.g., make test), which would generate a trace of executed functions and BBs.
This helped increase the completeness of the CFGs and CGs. However, in our preliminary experiments
it seemed to take too much time to actually incorporate the trace into the CFGs and CGs. So, we skipped it.
However, some old scripts and code are still there. You can define AFLGO_TRACING in config.h and set the
environment variable AFLGO_PROFILER_FILE to compile with tracing. The script add_edges.py will take
that file and connect the traced nodes in the given *.dot file. No guarantees though :)
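Roughly, the workflow we had in mind looked like this (a sketch from memory; check the scripts for the exact arguments):

    # 1) define AFLGO_TRACING in config.h and rebuild the instrumented target
    # 2) run the seeds / existing test suite with the profiler file set:
    AFLGO_PROFILER_FILE=trace.txt make test
    # 3) connect the traced nodes in the generated *.dot graph
    #    (argument order is illustrative):
    python add_edges.py trace.txt callgraph.dot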

If someone solves the incompleteness problem for CGs/CFGs in LLVM, our distance measures should
improve, too. But as long as the CG is “sufficiently” complete, our distance values should be of pretty
good quality.
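For reference, the function-level distance is essentially the harmonic mean of the shortest-path distances to the reachable target functions (notation simplified from the paper), so a missing call edge merely drops unreachable targets from the sum rather than corrupting the distances to the targets that remain reachable:

    d_f(n, T_f) = \left[ \sum_{t_f \in T_f,\; n \leadsto t_f} d_f(n, t_f)^{-1} \right]^{-1}

and it is undefined when no target function is reachable from n.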

> The general idea of computing certain properties of the CFG/CG at compile-time and then using it while fuzzing is great.
> libFuzzer is also moving in that direction, but you've done more.
Nice. Pretty sure there will be much more work in this direction from the community.

> Giving different weights to different seeds is (obviously?) very important, but I am not convinced that your approach is best.
> Focusing on the shortest path to the target(s) is not necessarily the winning strategy.
> What if the shortest path is not actually possible?
> If you concentrate on the shortest path and give less "energy" to the longer path(s) you just make things worse.
This is being taken care of by the Simulated Annealing meta-heuristic. You can think of it as giving the distance
values more and more “importance" as the fuzzing continues. The beginning of AFLGo is pretty much AFL. At the end,
AFLGo is pretty much Hill Climbing (i.e., mostly fuzzes “close” seeds). In SA-speak, we distinguish between an
exploration phase and an exploitation phase. The exploitation uses the seeds found after sufficient exploration.
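In code, the annealing boils down to something like this (a sketch with simplified constants; the exact annealing function and cooling schedule are in the paper and the implementation):

    #include <math.h>

    /* Sketch of an annealing-based power schedule. t_x is the
       time-to-exploitation in minutes, d_norm the seed's normalized
       distance to the targets in [0,1]. Constants are illustrative. */
    double energy_factor(double d_norm, double t_min, double t_x) {
      /* exponential cooling: T falls from 1 (pure exploration) toward 0 */
      double T = pow(20.0, -t_min / t_x);
      /* at T=1 every seed gets weight 0.5 (plain AFL); as T->0 the
         weight is dominated by closeness to the targets (Hill Climbing) */
      double w = (1.0 - d_norm) * (1.0 - T) + 0.5 * T;
      /* map weight in [0,1] to a multiplicative energy factor in [2^-5, 2^5] */
      return pow(2.0, 10.0 * w - 5.0);
    }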

> In section 5 you compare the ability of Katch and AFLGo to find bugs in binutils.
> But wouldn't it be more interesting to compare AFL vs AFLGo? (Or did I miss it?)
The hypothesis we tested in Sec. 5 was that Directed Greybox Fuzzing performs better (or worse) than
Directed Symbolic Execution for patch testing (modulo threats to construct validity, meaning if Google put
its weight behind KLEE and its derivatives, who knows how badly AFL and its derivatives would perform :)
However, we do compare AFLGo with AFL in another section.

> In section 6 you test your integration with OSS-Fuzz and it may sound like you've found 26 bugs that OSS-Fuzz didn't. But
>   libxml2/expat/libc++abi: there were a few bugs there that were kept private according to OSS-Fuzz's disclosure policy. It's likely that you re-discovered some of those.
Absolutely! In that case, we are happy to share the credit with OSS-Fuzz. (Since those bugs remained
undisclosed, we were obviously unaware of them.) The guys over at GNU GCC wanted their
demangler bugs reported publicly. We did ask LLVM about the disclosure policy;
either they did not get back to us or we were supposed to report publicly (I would need to check).

> I.e. Section 6 does not actually demonstrate the advantages of AFLGo over what we use in OSS-Fuzz (libFuzzer+AFL).
> (It just demonstrates that AFLGo can find tons of bugs; but the same is true for vanilla AFL)
Absolutely! If computational resources are no issue at all, there is really no difference. AFL and AFLGo have the same
effectiveness (i.e., given enough time, both find the same bugs). But they differ in efficiency (i.e., AFLGo
finds bugs in a shorter fuzzing session). In a reliability-driven regime, you can reduce cost while maintaining reliability.
In a resource-constrained regime, you can maximise reliability while maintaining the same cost.

Speaking of which, how do you guys allocate resources for a fuzzing campaign at OSS-Fuzz?
Do you have a fixed time budget, or do you keep fuzzing until there is no more progress 
(e.g., no paths discovered in the last cycle or setting AFL_EXIT_WHEN_DONE)?

Cheers!

Marcel and Thuan


Konstantin Serebryany

Aug 31, 2017, 1:36:38 PM
to afl-...@googlegroups.com, Thuan Pham
We keep fuzzing even when there is no progress, but we rebuild each target from fresh top-of-tree every day, 
so that every day we fuzz something new. The reasons we should never stop fuzzing:
* the target code evolves. Even if no new bugs are introduced, code changes may make an old bug easier to find. 
* the fuzzing tools evolve (we use top-of-tree libFuzzer)
* Time passes, i.e. more samples are tested. 
* We may discover new inputs by fuzzing other targets.
  E.g. we fuzz 6 crypto libraries, and the types of inputs for them overlap. And we do corpus cross-pollination between targets. 
* Code owners periodically add new samples to their seed corpora
* probably a few more reasons exist 

--kcc 
 

Marcel Böhme

Aug 31, 2017, 11:52:59 PM
to afl-...@googlegroups.com, Thuan Pham
Hi KCC,

> We keep fuzzing even when there is no progress, but we rebuild each target from fresh top-of-tree every day,
> so that every day we fuzz something new.

Just want to be sure I get the scheduling of the fuzzing campaigns right. So, every day:
1) you choose a set of targets from all projects (perhaps by criticality or whether the project has changed?),
2) you build the target binaries from top-of-tree, and
3) you fuzz the chosen target binaries for 24 hours.
4) Next day (once finished), restart at (1).

If this were the case, you would essentially work in a resource-constrained regime, assigning each 
individual fuzzing campaign a fixed time budget.

Say, there are 10 commits to the OpenSSL project since the last time you fuzzed it. Over many weeks of 
fuzzing OpenSSL, no errors were found. We argue that it seems more reasonable to spend the fixed 
time budget directing the fuzzer towards the recent changes in those 10 commits. Arguably, only those 
could have introduced any new (regression) errors, and they should be checked much more carefully.

Thanks again for your feedback!

Best regards,
Marcel + Thuan



Konstantin Serebryany

Sep 1, 2017, 1:02:03 AM
to afl-...@googlegroups.com, Thuan Pham
On Thu, Aug 31, 2017 at 8:52 PM, Marcel Böhme <boehme...@gmail.com> wrote:
> Hi KCC,

>> We keep fuzzing even when there is no progress, but we rebuild each target from fresh top-of-tree every day,
>> so that every day we fuzz something new.

> Just want to be sure I get the scheduling of the fuzzing campaigns right. So, every day:
> 1) you choose a set of targets from all projects (perhaps by criticality or whether the project has changed?),

we take all targets from all projects
 
> 2) you build the target binaries from top-of-tree, and
> 3) you fuzz the chosen target binaries for 24 hours.
> 4) Next day (once finished), restart at (1).

> If this were the case, you would essentially work in a resource-constrained regime, assigning each
> individual fuzzing campaign a fixed time budget.

> Say, there are 10 commits to the OpenSSL project since the last time you fuzzed it. Over many weeks of
> fuzzing OpenSSL, no errors were found. We argue that it seems more reasonable to spend the fixed
> time budget directing the fuzzer towards the recent changes in those 10 commits. Arguably, only those
> could have introduced any new (regression) errors, and they should be checked much more carefully.

Maybe. I am not very strong at statistics. 
If our goal is to find regressions ASAP -- then maybe your strategy is a win. 
But our bigger goal is to not release buggy software, i.e. it's fine if we report a regression in a few days,
if that gives us better long-run results. 

I don't think your paper compares AFL and AFLGo in the long run (e.g. months of CPU time),
so we don't know if your approach makes things better or worse. 
Intuitively, it *may* make things worse (I am not a statistician!)

--kcc 
 


qsp

Sep 8, 2017, 4:21:44 PM
to afl-users
Hi KCC,

I'm very curious about your statement that AFL could find Heartbleed in less than 1 minute.
So I tried to run AFL 2.51 on OpenSSL 1.0.1 using a dummy communication here

At the time I am writing this message, it has been 1 hour 48 minutes, and AFL has discovered 148 paths, but it hasn't found a crash yet.
Could you tell me how to set up the Heartbleed test?

Thanks,
--qsp

Konstantin Serebryany

Sep 8, 2017, 4:30:58 PM
to afl-...@googlegroups.com
On Fri, Sep 8, 2017 at 1:21 PM, qsp <dark2...@gmail.com> wrote:
> Hi KCC,
>
> I'm very curious about your statement that AFL could find Heartbleed in less than 1 minute.
> So I tried to run AFL 2.51 on OpenSSL 1.0.1 using a dummy communication here

Did you use the persistent mode?
This openssl API is very fast, and if you run AFL in the default out-of-process mode it spends >90% of its time doing fork(). 
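For the unfamiliar: persistent mode is a feature of AFL's llvm_mode (afl-clang-fast), where the harness loops over many inputs in a single process instead of forking once per input. A minimal sketch of such a harness, with a hypothetical parse_record() standing in for the fast API under test:

    #include <unistd.h>

    /* hypothetical entry point -- stands in for the fast API under test */
    extern void parse_record(const unsigned char *data, long len);

    int main(void) {
      static unsigned char buf[1 << 16];
      /* __AFL_LOOP (provided by afl-clang-fast) runs many inputs per
         process, avoiding the fork() overhead of the default mode */
      while (__AFL_LOOP(1000)) {
        ssize_t len = read(0, buf, sizeof buf);  /* fresh input on stdin */
        if (len > 0)
          parse_record(buf, (long)len);
      }
      return 0;
    }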
 

Macnair, Michael

Sep 11, 2017, 1:17:13 PM
to afl-...@googlegroups.com
Hi qsp,

> Could you tell me how to setup to test heartbleed?

One of the challenges in my afl training workshop is to find heartbleed: https://github.com/ThalesIgnite/afl-training/tree/master/challenges/heartbleed
The README.md / HINTS.md / ANSWERS.md walk you through it.

Regards,
Michael

qsp

Sep 11, 2017, 5:54:24 PM
to afl-users
@KCC: sorry, what is the persistent mode? Is it part of AFL or of OpenSSL? The target you showed me uses an LLVMFuzzer function, so I cannot compile it.

@Michael: thanks for the link. I can re-discover heartbleed using your version of KCC's file. But it took more than 5 minutes on my relatively new desktop, not "less than 10 seconds" as in the README.

Konstantin Serebryany

Sep 11, 2017, 8:41:36 PM
to afl-...@googlegroups.com
On Mon, Sep 11, 2017 at 2:54 PM, qsp <dark2...@gmail.com> wrote:
> @KCC: sorry, what is the persistent mode? Is it part of AFL or of OpenSSL?


 
> The target you showed me uses an LLVMFuzzer function, so I cannot compile it.

Yes you can, follow the instructions from here: 
I would suggest trying the small example from this file first. 
If it works, only then try openssl. 
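One common way to do this (a sketch; the official instructions linked above may differ) is to link the target against a small hand-written main() instead of libFuzzer, so the resulting binary can be run under afl-fuzz. The driver below assumes only that the target defines the standard LLVMFuzzerTestOneInput entry point:

    #include <stdint.h>
    #include <stdio.h>

    /* libFuzzer entry point defined by the target under test */
    extern int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

    /* feed one input (file argument or stdin) to the target */
    int main(int argc, char **argv) {
      static uint8_t buf[1 << 20];
      FILE *f = (argc > 1) ? fopen(argv[1], "rb") : stdin;
      if (!f) return 1;
      size_t n = fread(buf, 1, sizeof buf, f);
      if (f != stdin) fclose(f);
      LLVMFuzzerTestOneInput(buf, n);
      return 0;
    }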

--kcc 
 

