afl++ goes LTO (well, soon hopefully)

Marc

unread,

Jan 24, 2020, 4:57:57 AM1/24/20

to afl-...@googlegroups.com

Hi guys,

TLDR: afl++ with lto instrumentation for less map collisions

git clone https://github.com/vanhauser-thc/AFLplusplus
cd AFLplusplus
git checkout -b lto # <- this is currently in an experimental branch!
make ; cd llvm_mode ; make ; cd ..
make installl
-> compile targets like this:
RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure
--disable-shared
make
profit!

Lots less map collisions in many cases.
Read
https://github.com/vanhauser-thc/AFLplusplus/blob/lto/llvm_mode/README.lto.md

xample build output from a libtiff build:
libtool: link: afl-clang-lto -shared -fPIC -DPIC .libs/tif_aux.o [..]
../port/.libs/libport.a -Wl,--no-whole-archive -llzma -ljbig -ljpeg -lz
-lm -Wl,libtiff.so.5 -o .libs/libtiff.so.5.2.2
[!] WARNING: object archive ../port/.libs/libport.a is not handled yet
[+] Running bitcode linker, creating /tmp/.afl-1727354-1579619386.ll
[+] Performing instrumentation via opt, creating
/tmp/.afl-1727354-1579619386.bc
afl-llvm-lto-instrumentation++2.60e by Marc "vanHauser" Heuse <m...@mh-sec.de>
[+] Module has 637 functions, 25695 callsites and 11487 total basic blocks.
[+] Instrumented 11009 locations in 624 functions with 12062 edges and
resulting in 156 potential collision(s), whereas afl-clang-fast/afl-gcc
would have produced 1045 collision(s) on average (non-hardened mode,
ratio 100%).
[+] Running real linker /bin/x86_64-linux-gnu-ld
[+] Linker was successful

Long version:

problem description: every basic block receives a random ID when
instrumented, and an edge is calculated map[prev_id << 1) ^ current_id].
so naturally there will be collisions in the map where two edges result
in the same value.
What some people are not aware about: with 256 edges there is already a
50% change of one collisions (people think birthday paradox, however it
is rather a "balls in bins" problem).

This is the major thing I experiment on because collisions = bad for bad
discovery, more collisions = worse path discovery.

After many paths that went nowhere we finally found a way that is a good
start: llvm LTO. in -flto mode the llvm IR is kept until it is linked.
Now the issue is that no linker can run llvm passes. So we created an
proxy linker that gathers and combines all IR files, and then runs the
instrumentation pass. And then calls the real linker with the new
instrumented code. This is what afl-ld is doing now that is run when
afl-clang-lto is used for compiling.

And what is special is that instrumented pass - because we see all code
to be instrumented, we can select "by hand" what the basic block ID
should be so they do not collide.

This sounds great, is great, but has caveats. It is still WIP. and we
still get collisions because of 2 things:-

1) just too many edges, because of the depedencies to several previous
blocks especially with callsites, latest at 10k edges you would have
collisions in a 64k map even with the best algorithms
2) the current algo is still simple. it produces just 10-20% of the
collisions compared to normal afl/afl++ in most cases, however in a few
it produces way more :-(

This approach is working for all targets we experimented with
(bogofilter-1.2.5, libjpeg-turbo-1.3.1 (needs CFLAGS=-fPIC),
ibpng-1.2.53, libxml2-2.9.2, tiff-4.0.4, unrar-nonfree-5.6.6).

Please test this. Bug reports are welcome, patches even more :)

If you have ideas how to better walk the module and selecting good IDs
for basic blocks, please hit me off-list!

Regards,
Marc

--
Marc Heuse
www.mh-sec.de

PGP: AF3D 1D4C D810 F0BB 977D 3807 C7EE D0A0 6BE9 F573

Michal Zalewski

unread,

Jan 24, 2020, 12:33:28 PM1/24/20

to afl-users

> TLDR: afl++ with lto instrumentation for less map collisions

Neat. Another obvious trick is to increase map size; 64k seemed
optimal back in the day, but it's quite possible that especially on
64-bit architectures, you could maybe get away with more today. The
most basicway to benchmark the cost is to run some trivial, no-op
target in persistent mode - this gives you a sense of the cost of
comparing, hashing, and zeroing the map; and then run it against some
consistently-performing real-world target, such as libpng, to see if
there are any major slowdowns compared to 64 kB because your map is
taking up too much cache.

/mz

Marc

unread,

Jan 24, 2020, 10:43:40 PM1/24/20

to afl-...@googlegroups.com, Michal Zalewski

Hi Michal!

thats what I initially played with when I spawned off afl to afl++ ...
Back when you initially created afl the L2 cache was 64kb. Today it is
usually 1MB. So increasing the map = less collisions = profit.

It does not really matter if your target is test-instr or a real target
like libtiff - the smaller the map, the more tests per seconds, the
larger - the slower. Roughly:
2^14 115%
2^15 107%
2^16 100%
2^17 90%
2^18 75%
2^19 57%
2^20 38%

So 64kb is still the sweet spot speed wise, though 128kb is an option
too now.

But my real goal is to continue the path with LTO but do it differently:
1. do a first pass that just builds the CFG over the whole module
2. assign IDs to all paths like InsTrim is doing, but assign them
non-colliding
3. do the second pass where the blocks are instrumented with the
pre-computed values
4. the shmem size is not hard-compiled in afl but variable, the size
being decided at the end of step 2. the shmem ID + size is communicated
to afl-fuzz via the forkserver file descriptor (so the other way around)

this way it is the optimum of non-colliding edge coverage plus best
speed possible.
(and to give credit, that is the idea hexoder came up with 9 months ago,
starting with the LTO requirement we could not solve until now)

but its still quite a path until that is done :)

Out of curiosity - are you planning to continue to research in the area
of fuzzing? If so something new after web fuzzing and source fuzzing, or
a different approach at source fuzzing?

Reply all

Reply to author

Forward