Hi guys,
TL;DR: afl++ with LTO instrumentation for fewer map collisions
git clone https://github.com/vanhauser-thc/AFLplusplus
cd AFLplusplus
git checkout lto # <- this is currently in an experimental branch!
make ; cd llvm_mode ; make ; cd ..
make install
-> compile targets like this:
RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --disable-shared
make
profit!
Far fewer map collisions in many cases.
Read
https://github.com/vanhauser-thc/AFLplusplus/blob/lto/llvm_mode/README.lto.md
Example build output from a libtiff build:
libtool: link: afl-clang-lto -shared -fPIC -DPIC .libs/tif_aux.o [..] ../port/.libs/libport.a -Wl,--no-whole-archive -llzma -ljbig -ljpeg -lz -lm -Wl,libtiff.so.5 -o .libs/libtiff.so.5.2.2
[!] WARNING: object archive ../port/.libs/libport.a is not handled yet
[+] Running bitcode linker, creating /tmp/.afl-1727354-1579619386.ll
[+] Performing instrumentation via opt, creating /tmp/.afl-1727354-1579619386.bc
afl-llvm-lto-instrumentation++2.60e by Marc "vanHauser" Heuse <m...@mh-sec.de>
[+] Module has 637 functions, 25695 callsites and 11487 total basic blocks.
[+] Instrumented 11009 locations in 624 functions with 12062 edges and resulting in 156 potential collision(s), whereas afl-clang-fast/afl-gcc would have produced 1045 collision(s) on average (non-hardened mode, ratio 100%).
[+] Running real linker /bin/x86_64-linux-gnu-ld
[+] Linker was successful
Long version:
Problem description: every basic block receives a random ID when
instrumented, and an edge is recorded as map[(prev_id >> 1) ^ current_id].
So naturally there will be collisions in the map, where two different
edges result in the same value.
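
To make the collision concrete, here is a minimal sketch. It assumes
the classic AFL scheme (previous ID shifted right by one bit, XORed
with the current ID) and a 64k map. The function names and the block
IDs are made up for illustration and are not taken from the afl++
sources.

```python
MAP_SIZE = 1 << 16  # 64k coverage map, the size mentioned in this post


def edge_index(prev_id, cur_id, map_size=MAP_SIZE):
    """Map slot for the edge prev -> cur (assumed classic AFL formula)."""
    return ((prev_id >> 1) ^ cur_id) % map_size


def count_collisions(edges, block_ids, map_size=MAP_SIZE):
    """Count edges that land on a map slot already taken by another edge."""
    seen = set()
    collisions = 0
    for src, dst in edges:
        slot = edge_index(block_ids[src], block_ids[dst], map_size)
        if slot in seen:
            collisions += 1
        seen.add(slot)
    return collisions


# Hypothetical block IDs chosen to force a collision: A and C differ only
# in their lowest bit, which the right shift throws away.
example_ids = {"A": 2, "B": 5, "C": 3}
example_edges = [("A", "B"), ("C", "B")]

if __name__ == "__main__":
    for src, dst in example_edges:
        slot = edge_index(example_ids[src], example_ids[dst])
        print(src, "->", dst, "uses map slot", slot)
    print("collisions:", count_collisions(example_edges, example_ids))
```

Both edges end up in the same slot, so the fuzzer cannot tell A->B
and C->B apart.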
What some people are not aware of: with only about 300 edges in a 64k
map there is already a 50% chance of at least one collision, and 256
edges already give roughly 40% (people think birthday paradox, however
it is rather a "balls in bins" problem).
This is the major thing I experiment on, because collisions = bad for
path discovery, and more collisions = worse path discovery.
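
A quick way to sanity-check such numbers is the sketch below. It
assumes every edge lands on a uniformly random, independent slot of a
64k map, which is a simplification of what the instrumentation does.
The helper names are mine. Under that model 256 edges give a bit under
40% chance of at least one collision, and 50% is crossed at around 300
edges. The expected collision count for 12062 edges comes out at about
1045, which appears consistent with the afl-clang-fast/afl-gcc average
printed in the libtiff build output, although I cannot confirm the
tool computes it this way.

```python
MAP_SIZE = 1 << 16  # 64k map


def p_any_collision(n_edges, map_size=MAP_SIZE):
    """Probability that at least two of n_edges share a map slot."""
    p_none = 1.0
    for i in range(n_edges):
        p_none *= 1.0 - i / map_size
    return 1.0 - p_none


def expected_collisions(n_edges, map_size=MAP_SIZE):
    """Balls in bins: edges minus the expected number of occupied slots."""
    occupied = map_size * (1.0 - (1.0 - 1.0 / map_size) ** n_edges)
    return n_edges - occupied


def edges_for_probability(target, map_size=MAP_SIZE):
    """Smallest edge count whose collision probability reaches target."""
    n = 0
    while p_any_collision(n, map_size) < target:
        n += 1
    return n


if __name__ == "__main__":
    print("P(collision) with 256 edges:", round(p_any_collision(256), 3))
    print("edges needed for 50%:", edges_for_probability(0.5))
    print("expected collisions, 12062 edges:",
          round(expected_collisions(12062), 1))
```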
After many paths that went nowhere we finally found a way that is a good
start: LLVM LTO. In -flto mode the LLVM IR is kept until link time.
Now the issue is that no linker can run LLVM passes. So we created a
proxy linker that gathers and combines all IR files, then runs the
instrumentation pass, and then calls the real linker with the newly
instrumented code. This is what afl-ld does; it is run whenever
afl-clang-lto is used for compiling.
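
The three stages can be sketched as a command plan. This is an
illustration only, not the afl-ld implementation: the function name,
the plugin file name and the temp-file prefix are hypothetical, the
flag that enables the instrumentation pass is left out because it is
not given in this post, and the sketch only builds argument lists
without running anything. llvm-link and opt are the stock LLVM tools.

```python
def plan_proxy_link(ir_inputs, other_linker_args, real_linker="ld",
                    pass_plugin="afl-lto-pass.so",      # hypothetical name
                    tmp_prefix="/tmp/.afl-example"):    # hypothetical path
    """Return the three command lines a proxy linker would run, in order."""
    combined_ll = tmp_prefix + ".ll"
    instrumented_bc = tmp_prefix + ".bc"
    return [
        # 1) gather and combine all IR files into one module
        ["llvm-link", "-S", "-o", combined_ll] + list(ir_inputs),
        # 2) run the instrumentation pass over the combined module
        #    (the pass-enabling flag is omitted, see the note above)
        ["opt", "-load", pass_plugin, "-o", instrumented_bc, combined_ll],
        # 3) hand the instrumented module to the real linker, together
        #    with the remaining original linker arguments
        [real_linker] + list(other_linker_args) + [instrumented_bc],
    ]


if __name__ == "__main__":
    plan = plan_proxy_link(["a.o", "b.o"], ["-o", "target", "-lz"])
    for command in plan:
        print(" ".join(command))
```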
And what is special is the instrumentation pass: because we see all the
code to be instrumented, we can select "by hand" what each basic block
ID should be so that the resulting edges do not collide.
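
To illustrate the idea (this is NOT the algorithm in afl-clang-lto,
just a naive greedy sketch with made-up names): once all edges are
known, each block can be given the first ID whose edges to already
numbered blocks land on unused map slots. It again assumes the
(prev_id >> 1) ^ current_id formula and does not scale to real
modules.

```python
MAP_SIZE = 1 << 16  # 64k map


def edge_index(prev_id, cur_id, map_size=MAP_SIZE):
    """Map slot for the edge prev -> cur (assumed classic AFL formula)."""
    return ((prev_id >> 1) ^ cur_id) % map_size


def assign_ids(blocks, edges, map_size=MAP_SIZE):
    """Greedily pick block IDs so that no two known edges share a slot."""
    ids, used_ids, used_slots = {}, set(), set()
    for block in blocks:
        touching = [(s, d) for s, d in edges if block in (s, d)]
        for candidate in range(1, map_size):
            if candidate in used_ids:
                continue
            ids[block] = candidate  # tentative choice
            # only edges whose endpoints are both numbered can be checked
            slots = [edge_index(ids[s], ids[d], map_size)
                     for s, d in touching if s in ids and d in ids]
            if len(set(slots)) == len(slots) and used_slots.isdisjoint(slots):
                used_ids.add(candidate)
                used_slots.update(slots)
                break
            del ids[block]  # candidate collides, try the next one
        else:
            raise RuntimeError("no collision-free ID for block %r" % (block,))
    return ids


if __name__ == "__main__":
    # Tiny hypothetical control flow graph with a diamond and a self loop.
    blocks = ["entry", "a", "b", "exit"]
    edges = [("entry", "a"), ("entry", "b"), ("a", "exit"),
             ("b", "exit"), ("a", "a")]
    ids = assign_ids(blocks, edges)
    for s, d in edges:
        print(s, "->", d, "slot", edge_index(ids[s], ids[d]))
```

The hard part in practice is the order in which blocks are visited
and what to do when a block has many predecessors, since each
predecessor adds another constraint on the ID.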
This sounds great, is great, but has caveats. It is still WIP, and we
still get collisions because of two things:
1) just too many edges: because of the dependencies on several previous
blocks, especially with callsites, at 10k edges at the latest you would
have collisions in a 64k map even with the best algorithms
2) the current algo is still simple. It produces just 10-20% of the
collisions compared to normal afl/afl++ in most cases, however in a few
it produces way more :-(
This approach is working for all targets we experimented with
(bogofilter-1.2.5, libjpeg-turbo-1.3.1 (needs CFLAGS=-fPIC),
libpng-1.2.53, libxml2-2.9.2, tiff-4.0.4, unrar-nonfree-5.6.6).
Please test this. Bug reports are welcome, patches even more :)
If you have ideas on how to better walk the module and select good IDs
for basic blocks, please hit me up off-list!
Regards,
Marc
--
Marc Heuse
www.mh-sec.de
PGP: AF3D 1D4C D810 F0BB 977D 3807 C7EE D0A0 6BE9 F573