updated afl-dyninst and new afl-dynamorio


Marc Heuse

Aug 11, 2018, 4:28:47 AM
to afl-...@googlegroups.com
well, AFL seems sadly kinda dead with the hype of the initial years gone
and Michael now in a new company, new position and new topics ...

but I still like to use it, play with it and enhance it - unlike other
groups who develop on top of afl, keep the stuff to themselves and do
not give back to the community (yes I am talking about you guys doing
CollAFL and Angora).

so here are some new play things:

afl-dyninst - used as an alternative to afl-qemu to instrument a
blackbox binary and then just use afl-fuzz on it.
It used to be at 0.25x speed compared to a native compile.
I made several changes, now it is at 0.65x speed.
It is faster than afl-qemu, so this is now the fastest option for guided
blackbox binary fuzzing - and it works on Intel and PPC, plus aarch64+arm
once the implementation is completed in dyninst.
get it at https://github.com/vanhauser-thc/afl-dyninst

afl-dynamorio - after I did afl-pin I wanted to have an implementation
with dynamorio as well because I have to do a lot of blackbox fuzzing on
ARM - and to play with dynamorio :)
it is 10x faster than afl-pin! but 0.12x compared to afl-qemu :(
get it at https://github.com/vanhauser-thc/afl-dynamorio

as AFL is unmaintained currently - if you want to have improvements on
2.52b - head over to collected AFL patches by several people that add
features, performance, coverage - or fix bugs :)
get them at https://github.com/vanhauser-thc/afl-patches
=> If you have patches for AFL, send them to this list or to me and I
will add them.

Regards,
Marc "van Hauser" Heuse

--
Marc Heuse
www.mh-sec.de

PGP: AF3D 1D4C D810 F0BB 977D 3807 C7EE D0A0 6BE9 F573

Stefan Nagy

Aug 13, 2018, 11:45:56 AM
to afl-users
> It used to be at 0.25x speed compared to a native compile.
> I made several changes, now it is at 0.65x speed.

By "native speed", are you referring to Dyninst forkserver-only instrumentation?

It's unclear to me why instrumenting blocks where isEntryBlock() returns false yields any performance benefit.  I noticed you also toggle setSaveFPR(false) and setTrampRecursive(true), but these gave me mixed results when I was messing with them a few months back. Could you please provide some detail on the three performance modes? 

Marc Heuse

Aug 13, 2018, 12:50:10 PM
to afl-...@googlegroups.com, Stefan Nagy
Hi Stefan,

On 13.08.2018 at 17:45, Stefan Nagy wrote:
> It used to be at 0.25x speed compared to a native compile. 
> I made several changes, now it is at 0.65x speed. 
>
> By "native speed", are you referring to /Dyninst/ forkserver-only
> instrumentation? 

no, with native I meant afl-gcc source code compilation.
I agree my writing was unclear. it was before my morning coffee :)

> It's unclear to me why instrumenting blocks where isEntryBlock() returns
> false yields any performance benefit.  I noticed you also
> toggle setSaveFPR(false) and setTrampRecursive(true), but these gave me
> mixed results when I was messing with them a few months back. Could you
> please provide some detail on the three performance modes? 

the original implementation was instrumenting every basic block. but
writing to the map for every basic block a) pollutes the map and b) is
unnecessary overhead.
take e.g. a -> b -> c -> d -> e .... it's a single path and we
don't need b, c, d, ...

so which basic blocks do we want? those where there are either 2+
callees or 2+ callers, i.e. 2+ successor or 2+ predecessor blocks. if you
add one -x it does check for that and increases the speed by ~40% on average.

if you add another -x it just toggles the setSaveFPR and
setTrampRecursive. This increases the speed a lot. but why? I am not so
sure :) I just tested various options on various binaries and these two
made a difference without a negative side effect.

if you specify -xxx ... well then I basically do what -xx is doing but
afterwards search for the instructions inserted by dyninst and replace
them with my own.
oh yes this can go wrong :) and it's experimental, but there is a big
warning about that, telling you to check if the program still works fine.


the most efficient implementation would be if BPatch_constExpr would
support XOR and working with arrays, but this is not the case (yet).


Regards,
Marc

Stefan Nagy

Aug 13, 2018, 3:36:50 PM
to afl-users
> no, with native I meant afl-gcc source code compilation.
> I agree my writing was unclear. it was before my morning coffee :)

No worries! Thanks for elaborating :)

> the original implementation was instrumenting every basic block. but
> writing to the map for every basic block a) pollutes the map and b) is
> unnecessary overhead.
> this means e.g.  a -> b -> c -> d -> e .... its a single path and we
> dont need b, c, d, ...

But doesn't doing this inherently lose necessary coverage information? AFL would track each unique edge (e.g., a->b, b->c), so how is discarding blocks between the starting/terminating blocks of the path (e.g., b c d) a suitable replacement for tracking the path's four edges?

Marc Heuse

Aug 14, 2018, 1:18:01 AM
to Stefan Nagy, afl-...@googlegroups.com
Hi Stefan,

On 13.08.2018 at 21:36, Stefan Nagy wrote:
> the original implementation was instrumenting every basic block. but 
> writing to the map for every basic block a) pollutes the map and b) is 
> unnecessary overhead. 
> this means e.g.  a -> b -> c -> d -> e .... its a single path and we 
> dont need b, c, d, ... 
>
> But doesn't doing this inherently lose necessary coverage information?
> AFL would track each unique edge (e.g., a->b, b->c), so how is
> discarding blocks between the starting/terminating blocks of the path
> (e.g., b c d) a suitable replacement for tracking the path's four edges? 

if you have some blocks with a single path, that e.g. look like this:

x1             y1
  \           /
   a-b-c-d-e-f
  /           \
x2             y2

you basically only need to record the paths which have decisions, e.g.
x1, a, f, y2.
recording b, c, d and e provides no benefit as they always come after
each other - plus it would pollute the map and waste precious CPU time.
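
To make the argument concrete, here is a minimal sketch in Python (not afl-dyninst code) that simulates AFL's documented edge bookkeeping, map[cur ^ prev]++ followed by prev = cur >> 1, on the diagram above. The block IDs are hypothetical fixed values chosen for readability (AFL assigns random IDs at instrumentation time), and `trace`, `path` and `DECISION_BLOCKS` are names made up for this illustration. It only records which map entries get touched, not the hit counts.

```python
MAP_SIZE = 1 << 16  # size of AFL's default coverage map

# Hypothetical block IDs; AFL assigns random ones at instrumentation time.
IDS = {
    "x1": 0x1000, "x2": 0x2000, "a": 0x0400, "b": 0x0080, "c": 0x0010,
    "d": 0x0002, "e": 0x4000, "f": 0x0100, "y1": 0x0040, "y2": 0x0008,
}

def trace(path, instrumented):
    """Return the set of map indexes touched by one execution path."""
    touched = set()
    prev = 0
    for block in path:
        if block not in instrumented:
            continue  # uninstrumented blocks leave the map and prev untouched
        cur = IDS[block]
        touched.add((cur ^ prev) % MAP_SIZE)
        prev = cur >> 1
    return frozenset(touched)

def path(entry, exit_block):
    """One run through the diagram: entry, the a..f chain, then an exit."""
    return [entry, "a", "b", "c", "d", "e", "f", exit_block]

ALL_BLOCKS = set(IDS)
DECISION_BLOCKS = {"x1", "x2", "a", "f", "y1", "y2"}  # b, c, d, e skipped

FULL = trace(path("x1", "y1"), ALL_BLOCKS)
REDUCED = trace(path("x1", "y1"), DECISION_BLOCKS)
REDUCED_TRACES = {
    trace(path(entry, exit_block), DECISION_BLOCKS)
    for entry in ("x1", "x2")
    for exit_block in ("y1", "y2")
}

print(len(FULL), len(REDUCED))  # 8 4
print(len(REDUCED_TRACES))      # 4
```

With these IDs the same run touches 8 map entries when every block is instrumented and 4 when b, c, d and e are skipped, while the four possible entry/exit combinations still produce four different traces.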

Heiko Eißfeldt

Aug 14, 2018, 1:24:26 AM
to afl-users

> the original implementation was instrumenting every basic block. but
> writing to the map for every basic block a) pollutes the map and b) is
> unnecessary overhead.
> this means e.g.  a -> b -> c -> d -> e .... its a single path and we
> dont need b, c, d, ...
>
> But doesn't doing this inherently lose necessary coverage information? AFL would track each unique edge (e.g., a->b, b->c), so how is discarding blocks between the starting/terminating blocks of the path (e.g., b c d) a suitable replacement for tracking the path's four edges?

If b,c,d only have one predecessor and successor basic block each, they can be eliminated from the chain, since there is only one possible path through b,c,d.
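
The rule stated here can be sketched in a few lines of Python. This is an illustration only, not the afl-dyninst implementation (which is C++ on top of Dyninst): the CFG is a plain dict of successor lists modelled on Marc's diagram earlier in the thread, and `removable_blocks` is a hypothetical helper name.

```python
from collections import defaultdict

def removable_blocks(cfg):
    """Blocks with exactly one predecessor and exactly one successor."""
    preds = defaultdict(set)
    for block, successors in cfg.items():
        for succ in successors:
            preds[succ].add(block)
    return {
        block
        for block, successors in cfg.items()
        if len(preds[block]) == 1 and len(set(successors)) == 1
    }

# Marc's diagram: x1 and x2 both enter a, f exits to y1 or y2.
DIAGRAM_CFG = {
    "x1": ["a"], "x2": ["a"],
    "a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"],
    "f": ["y1", "y2"],
    "y1": [], "y2": [],
}

SKIPPED = removable_blocks(DIAGRAM_CFG)
KEPT = set(DIAGRAM_CFG) - SKIPPED

print(sorted(SKIPPED))  # ['b', 'c', 'd', 'e']
print(sorted(KEPT))     # ['a', 'f', 'x1', 'x2', 'y1', 'y2']
```

The sketch applies the rule to each block in isolation; it does not check whether that is sufficient for every possible CFG shape.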

Stefan Nagy

Aug 14, 2018, 2:07:15 AM
to afl-...@googlegroups.com
Thank you both! I think I was envisioning a scenario where a magic bytes operation was unrolled (e.g., laf-intel), but this isn't applicable since each resulting single-byte comparison block would have multiple successors. 
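
A small Python sketch of why that scenario does not apply, assuming the multi-byte comparison is split into one block per byte. The block names and the `unrolled_compare_cfg` helper are hypothetical and this is not how laf-intel itself is implemented (it is an LLVM pass); the sketch only models the resulting control flow.

```python
def unrolled_compare_cfg(n_bytes):
    """CFG of an n-byte comparison split into single-byte checks.

    Block cmp_i tests byte i: on a match control continues to the next
    check (or to "match" after the last byte), otherwise to "mismatch".
    """
    cfg = {"match": [], "mismatch": []}
    for i in range(n_bytes):
        on_equal = f"cmp_{i + 1}" if i + 1 < n_bytes else "match"
        cfg[f"cmp_{i}"] = [on_equal, "mismatch"]
    return cfg

CFG = unrolled_compare_cfg(4)
COMPARE_BLOCKS = [block for block in CFG if block.startswith("cmp_")]

# Every single-byte comparison block has two successors, so none of them
# is a one-in/one-out block that could be dropped from a chain.
print(all(len(CFG[block]) == 2 for block in COMPARE_BLOCKS))  # True
```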



--
Stefan Nagy
Department of Computer Science
Virginia Tech

Marc Heuse

Jan 4, 2019, 5:35:28 AM
to afl-...@googlegroups.com
Hi guys,

happy new year to everyone.

I can make two small things available:

llvm_mode is the fastest out-of-the-AFL-box instrumentation.
I made a small tweak to the IR analysis to eliminate the instrumentation
of a few basic blocks that don't add value. This eliminates just 5-10% of
the instrumentation, but it still improves the speed even further
and reduces the map pollution for large programs.
Patch is in https://github.com/vanhauser-thc/afl-patches

And I was able to build dyninst10 and update afl-dyninst to build with
dyninst10. It was not as easy and straightforward as I had hoped, but
it's done now and it works!
This means: you can now use afl-dyninst (and therefore AFL) on blackbox
binaries with AARCH64, and soon ARM and PowerPC.
https://github.com/vanhauser-thc/afl-dyninst