Tracking basic blocks

Stefan Nagy

Jan 10, 2018, 2:40:27 PM

to afl-users

Hello all,

I want to keep track of basic blocks as they are covered during fuzzing. My intuition is that this requires inserting some code into the bitmap handling to de-randomize block IDs. If anyone can share a better understanding of how AFL maintains basic block coverage, I would be appreciative.

Best,

-Steve

Michal Zalewski

Jan 10, 2018, 2:52:51 PM

to afl-users

The internal AFL IDs are not guaranteed to be unique, etc. You're
probably best off doing this using a completely separate mechanism.

The simplest near-real-time solution I can think of is: check
<out_dir>/queue/ regularly, and when there is a new file, pass it to
the target binary compiled with gcov, capture gcov data, add it to
your list of basic blocks explored in your fuzzing project.
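
A minimal sketch of the polling step described above, in Python. The helper names (new_queue_entries, replay) are made up for illustration and are not part of AFL or afl-cov. It assumes the target was built with gcov instrumentation and takes its input as a file argument, with "@@" standing for the input path as in afl-fuzz command lines.

```python
import os
import subprocess


def new_queue_entries(queue_dir, seen):
    """Return queue files not yet in `seen` (sorted by name) and record them."""
    fresh = []
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        # Skip hidden entries such as AFL's .state/ directory and anything
        # that is not a regular file.
        if name.startswith(".") or not os.path.isfile(path):
            continue
        if path not in seen:
            seen.add(path)
            fresh.append(path)
    return fresh


def replay(target_argv, path, timeout=10):
    """Run the gcov-built target once on `path`; "@@" marks the input file."""
    argv = [path if arg == "@@" else arg for arg in target_argv]
    return subprocess.run(argv, timeout=timeout).returncode
```

A driver would keep one `seen = set()`, call `new_queue_entries("<out_dir>/queue", seen)` every few seconds, and pass each returned path to `replay(["./target-gcov", "@@"], path)`, where `./target-gcov` is a placeholder name for the gcov build. The gcov counters accumulate across runs, so the coverage report can be regenerated whenever needed.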

Also check out afl-cov :-)

/mz

Stefan Nagy

Jan 15, 2018, 3:57:02 PM

to afl-users

Thank you!

Stefan Nagy

Jan 16, 2018, 6:07:09 PM

to afl-users

Michal,

> The internal AFL IDs are not guaranteed to be unique

Can you please elaborate on how they might not be unique?

Right now my goal is to extract the addresses of each basic block and their coverage for later processing.

My problem with using gcov is that this limits me to white-box only.

I believe monitoring the bitmap would be more fruitful; coverage of path A->B would intuitively mean blocks A,B are both covered.

My understanding of AFL's bitmap is that each entry corresponds to path hashes, and each value is its updated coverage.

If this is correct, how exactly does AFL associate the bitmap path hash with the basic block addresses?

Lastly, with afl-showmap, can the tuples outputted be cross referenced with the bitmap? I'm not totally sure I understand what its output is meant to represent.

Thank you for your help!

Best,

-Steve

Brandon Perry

Jan 16, 2018, 7:06:18 PM

to afl-...@googlegroups.com

On Jan 16, 2018, at 5:07 PM, Stefan Nagy <sna...@vt.edu> wrote:

> My problem with using gcov is that this limits me to white-box only.

I believe you can instrument black-box binaries with coverage using DynamoRIO, but YMMV.

https://www.chromium.org/developers/code-coverage

Michal Zalewski

Jan 16, 2018, 7:46:04 PM

to afl-users

>> The internal AFL IDs are not guaranteed to be unique
> Can you please elaborate on how they might not be unique?

See docs/technical_details.txt. They are just small random integers
that can repeat (and in some cases, must repeat).

> My understanding of AFL's bitmap is that each entry corresponds to path
> hashes

Not really, each byte set in the bitmap corresponds to the (probably
not unique) ID of an edge; that ID is either chosen by a fair dice
roll (source mode) or by mangling the address of the underlying
instructions (QEMU mode).
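
To make the edge ID concrete, here is a small Python simulation of the map update described in docs/technical_details.txt: the index is the current block ID XORed with the previous block ID shifted right by one. The block IDs passed in are arbitrary illustrative values, and the function names are made up for this sketch.

```python
MAP_SIZE = 1 << 16  # AFL's default map size (MAP_SIZE_POW2 = 16)


def trace_edges(block_ids):
    """Replay the per-block map update over a sequence of block IDs."""
    bitmap = bytearray(MAP_SIZE)
    prev = 0
    for cur in block_ids:
        idx = (cur ^ prev) % MAP_SIZE
        bitmap[idx] = (bitmap[idx] + 1) & 0xFF  # 8-bit hit counter, wraps
        prev = cur >> 1  # the shift keeps A->B distinct from B->A
    return bitmap


def set_offsets(bitmap):
    """Offsets of all non-zero bytes, i.e. the edge IDs that were hit."""
    return [i for i, b in enumerate(bitmap) if b]
```

For example, `set_offsets(trace_edges([2, 3]))` contains a single offset: the edge into block 2 and the edge 2->3 both land on byte 2. The offset alone therefore does not identify which blocks were involved.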

> how exactly does AFL associate the bitmap path hash with the basic block addresses?

See above; AFL really doesn't care about any meaningful representation
of the underlying code, it just wants to see new values when something
changes with the execution path.

> Lastly, with afl-showmap, can the tuples outputted be cross referenced with
> the bitmap?

The first number in each line is just the integer offset of a byte set
in the map. The second number corresponds to the execution count
(after bucketing, as per docs/technical_details.txt).
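
A minimal Python sketch for reading that output, assuming the default text format of one `offset:count` pair per line. The function name parse_showmap is made up for illustration.

```python
def parse_showmap(text):
    """Map each bitmap offset to its bucketed hit count."""
    hits = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        offset, _, count = line.partition(":")
        hits[int(offset)] = int(count)
    return hits
```

Comparing the key sets of two parsed outputs shows which map offsets one input reaches and the other does not. It says nothing about block addresses, because the offsets are only the (possibly colliding) edge IDs.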

/mz

Hongxu Chen

Jan 16, 2018, 10:07:08 PM

to afl-...@googlegroups.com

Hi Michal,

What do you mean by saying "in some cases, must repeat" for the IDs in shared memory?

Do you mean that it is desirable for a few edges to share an ID (which might somewhat benefit greybox fuzzing), or that duplicates are inevitable because the IDs are random?

Best Regards,

Hongxu

Michal Zalewski

Jan 16, 2018, 10:30:59 PM

to afl-users

> What do you mean by saying "in some cases, must repeat" for the IDs in
> shared memory?

The IDs are 16-bit. If you have more than ~65k edges, duplicates are inevitable.
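
A rough way to estimate how many IDs get shared well before the 65k limit, sketched in Python. It assumes each edge ID is drawn independently and uniformly from the map, which is a simplification of how the IDs are actually produced. The function names are made up for illustration.

```python
def expected_distinct_ids(n_edges, map_size=1 << 16):
    """Expected number of distinct map slots hit by n_edges uniform random IDs."""
    return map_size * (1.0 - (1.0 - 1.0 / map_size) ** n_edges)


def expected_shared_ids(n_edges, map_size=1 << 16):
    """Expected number of edges that land on a slot already taken by another edge."""
    return n_edges - expected_distinct_ids(n_edges, map_size)
```

Under this model collisions appear long before the map is full, and a larger `map_size` lowers the expected count without ever bringing it to zero.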

/mz

Hongxu Chen

Jan 17, 2018, 3:55:20 AM

to afl-...@googlegroups.com

Hi Michal,

> The IDs are 16-bit. If you have more than ~65k edges, duplicates are inevitable.

I realized the duplicates are not a big issue, since an actual execution may not cover all the instrumented "edges"; in extreme cases, we can still increase "MAP_SIZE_POW2" and recompile afl-gcc/afl-clang-fast and the target binaries.

I'm wondering whether the callback based approach like libfuzzer can overcome this for general purposes so that we don't need to care about the possible scalability issues at all (although may largely sacrifice the running performance of the fuzzer)?

Best Regards,

Hongxu

Michal Zalewski

Jan 17, 2018, 3:58:50 AM

to afl-users

> I'm wondering whether the callback based approach like libfuzzer can
> overcome this for general purposes so that we don't need to care about the
> possible scalability issues at all (although may largely sacrifice the
> running performance of the fuzzer)?

It's not really about callbacks vs no callbacks, just the size of the
output trace. AFL needs to be able to perform certain operations on it
(comparisons, etc) very quickly, and the current size seems optimal
for almost all targets. You can change map size, but that's still not
gonna guarantee unique IDs, just make collisions less likely to occur.

/mz

Hongxu Chen

Jan 17, 2018, 4:29:28 AM

to afl-...@googlegroups.com

Hi Michal,

Yes, the essential point is not whether callbacks are used, and collisions cannot be fully avoided.

What I'm most curious about is whether there is some "precise", "general-purpose" approach to tracking "edges" (in a greybox/blackbox sense similar to AFL). So far I have only found some instrumentation approaches based on LLVM's SanitizerCoverage. In practice this may not be necessary, but I'd like to know the alternatives for empirical study purposes.

Would you please share your experience?

Best Regards,

Hongxu

Stefan Nagy

Jan 17, 2018, 3:34:54 PM

to afl-users

> See above; AFL really doesn't care about any meaningful representation
> of the underlying code, it just wants to see new values when something
> changes with the execution path.

So, bitmap collision issues aside, all I need is to insert some code before the path ID's are generated... ideally to output the addresses of the path's inbound/outbound basic blocks. This is where I'm stuck. I've been looking through afl-fuzz's source code but it sounds like the path ID generation is instead handled in afl-gcc/g++/etc. Am I correct?

Michal Zalewski

Jan 17, 2018, 3:36:43 PM

to afl-users

> So, bitmap collision issues aside, all I need is to insert some code before
> the path ID's are generated... ideally to output the addresses of the path's
> inbound/outbound basic blocks. This is where I'm stuck. I've been looking
> through afl-fuzz's source code but it sounds like the path ID generation is
> instead handled in afl-gcc/g++/etc. Am I correct?

If you're talking about fuzzing closed-source binaries, then it's
computed from block address in the code in qemu_mode/.
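
For orientation, a Python sketch of that address mangling. The shift-and-XOR formula is how I recall it from qemu_mode/patches/afl-qemu-cpu-inl.h in AFL 2.52b, so treat it as an assumption and check it against your own checkout. The function names are made up, and the probabilistic-instrumentation check is left out.

```python
MAP_SIZE = 1 << 16  # default map size; assumed unchanged


def qemu_block_id(addr):
    """Mangle a block start address into a map-sized ID (assumed formula)."""
    cur = (addr >> 4) ^ (addr << 8)
    return cur & (MAP_SIZE - 1)


def index_blocks(addrs):
    """Group block addresses by ID so that colliding blocks are visible."""
    table = {}
    for addr in addrs:
        table.setdefault(qemu_block_id(addr), []).append(addr)
    return table
```

Logging the raw address at the point where this ID is computed gives the block list directly. A table such as `index_blocks(...)` only helps interpret an existing bitmap, and several addresses can share one ID (here, for instance, 0x400010 and 0x500010).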

Stefan Nagy

Jan 17, 2018, 8:09:05 PM

to afl-users

> If you're talking about fuzzing closed-source binaries, then it's
> computed from block address in the code in qemu_mode/.

I'll look there. Thank you for your help!

My only obstacle now is installing the QEMU code. Running build_qemu_support.sh results in the following error:

afl-2.49b/qemu_mode/qemu-2.3.0/rules.mak:57: recipe for target 'user-exec.o' failed
make[1]: *** [user-exec.o] Error 1
Makefile:173: recipe for target 'subdir-x86_64-linux-user' failed
make: *** [subdir-x86_64-linux-user] Error 2

Any suggestions?

Stefan Nagy

Jan 18, 2018, 5:13:47 PM

to afl-users

Please disregard; I managed to get it installed by using afl-2.52b and qemu-2.10.2.

Can you please elaborate on how afl-fuzz interacts with afl-qemu-trace? My understanding is that afl-qemu-trace is a copy of the qemu binary, and that afl-fuzz passes parameters to it. The specifics of how basic blocks are instrumented are lost on me.