DynamoRIO may be more viable
On Sun, Feb 1, 2015, 12:15 AM Michal Zalewski <lca...@gmail.com> wrote:
> It's too bad DynamoRIO was too slow. PIN may be ill-suited for our needs, but DynamoRIO may be more viable,
> at least to get in the same ballpark as QEMU. I think Andrew played
> with it a bit.
DynamoRIO seems slow. I've implemented the fork server code, and it was clocking in at 60 execs a second. Something seems nondeterministic: some fuzz runs sit at 10 execs a second, then after a ctrl-c and restart it sits at 30 execs a second. Restart again, and it sits at 60 a second. I haven't debugged that; it's just an observation.
QEMU with just fork server support ran at around 130 execs a second.
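For anyone following along who hasn't looked at the fork server idea, here is a rough sketch of the request/response loop. It is a Python illustration only, not the DynamoRIO or QEMU code discussed in this thread; the function names, the pipe layout, and the 4-byte message format are assumptions made for the example.

```python
import os
import struct


def fork_server(ctl_r, st_w, run_target):
    """Serve fork requests until the control pipe is closed (illustrative)."""
    while True:
        request = os.read(ctl_r, 4)
        if len(request) < 4:
            return  # the fuzzer closed its end, so shut down
        pid = os.fork()
        if pid == 0:
            # Child: run one test case and exit with its result code.
            code = 1
            try:
                code = run_target() & 0xFF
            finally:
                os._exit(code)
        # Server: report the child pid, wait for it, report the wait status.
        os.write(st_w, struct.pack("<I", pid))
        _, status = os.waitpid(pid, 0)
        os.write(st_w, struct.pack("<I", status))


def run_with_fork_server(run_target, n_execs):
    """Start a fork server process and request n_execs runs from it."""
    ctl_r, ctl_w = os.pipe()  # fuzzer -> server: "please fork"
    st_r, st_w = os.pipe()    # server -> fuzzer: pid, then wait status
    server = os.fork()
    if server == 0:
        os.close(ctl_w)
        os.close(st_r)
        try:
            fork_server(ctl_r, st_w, run_target)
        finally:
            os._exit(0)
    os.close(ctl_r)
    os.close(st_w)
    statuses = []
    for _ in range(n_execs):
        os.write(ctl_w, b"\0\0\0\0")
        os.read(st_r, 4)  # child pid, unused in this sketch
        statuses.append(struct.unpack("<I", os.read(st_r, 4))[0])
    os.close(ctl_w)
    os.waitpid(server, 0)
    os.close(st_r)
    return statuses
```

The point of the design is that the expensive startup (for a DBI tool, loading and translating the target) happens once in the server process, and each test case only pays for a fork.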
So, good news on the DynamoRIO side: the authors have filed bugs for translate-but-don't-execute support, so it might be possible to get performance closer to the current QEMU implementation. They're interested in checking out performance against QEMU, so there might be some easy wins there.
There are some things I can do to make the code faster, however. I need to play with the persistence support more (caching translations to disk, it appears), and perhaps inline the instrumentation call. But from what I've read, that should probably be optimized automatically for me with -opt_cleancall.
I'll continue playing with it to see how good I can get it.
If we can get DynamoRIO support to a decent speed, it would be interesting if someone ported AFL to Windows 😁 just don't expect lcamtuf to do it, or support it :)
Yeah. I agree 100% with Ben. My attempts to build a fork server inside a Pin tool for aflpin failed gloriously with internal errors from Pin such as 'thread not found', so I would love to see that work in DynamoRIO.
-Parker
--
Forkserver for DynamoRIO under Linux works well. I've written code to do it; it's just not as fast as the QEMU approach ... just yet. It runs at about half the speed of QEMU mode, since DynamoRIO has no translate-but-don't-execute mode.
I've tried passing various options to DynamoRIO without much luck to get improved performance.
I've spoken to the DynamoRIO authors, and they're interested in implementing a translate-but-don't-execute option (which is why the QEMU mode is pretty fast without any other optimization on my behalf*).
Additionally, they've indicated they want to spend some time comparing the performance of DynamoRIO against QEMU, and seeing if there are any techniques they can borrow or improve upon.
As for Windows (hi Ben, BTW), that would be the most difficult part, due to there being no native fork, from what I recall.
So it's probably possible and feasible to do it, if you have suitable Windows development skills.
Finishing up AFL network support is probably a more achievable goal for myself at the moment.
* One idea: you could keep count of the expected order of execution of basic block translations and ensure they're in the quick lookup hash table, instead of the current approach where they are pushed to the end. Another idea: use shared memory instead of read/write system calls for message passing.
That performance increase is probably minimal compared to the cost of the decode, translate, and encode steps of binary translation, however.
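To make the shared-memory suggestion concrete, here is a small sketch of a child handing a value back to its parent through an anonymous shared mapping instead of a pipe. It is written in Python purely for illustration and is not taken from the QEMU-mode code; the function name and the single 8-byte slot are assumptions for the example.

```python
import mmap
import os
import struct


def run_child_with_shared_slot(compute):
    """Fork a child that writes compute()'s result into shared memory."""
    # On Unix an anonymous mmap is MAP_SHARED by default, so the mapping
    # stays shared between parent and child after fork().
    shm = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        try:
            shm[:8] = struct.pack("<Q", compute())
        finally:
            os._exit(0)
    # waitpid() doubles as the synchronization point: once the child has
    # exited, its write is visible to the parent without read()/write().
    os.waitpid(pid, 0)
    value = struct.unpack("<Q", shm[:8])[0]
    shm.close()
    return value
```

A real implementation would need a proper ring buffer or similar so the child can post many messages while still running; this only shows the data path avoiding the read/write system calls.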
So I have heard DynamoRIO is kind of flaky on OSX, but I'd like to
give it a shot nevertheless. Is this code available anywhere?
1) Network support, if we can get it working nicely.
2) Performance improvements for the compile-time instrumentation. For
example, if we did it as GCC or clang plugin, we could likely improve
performance by instrumenting more conservatively and by not having to
save all registers, etc. I think there are significant gains here.
3) Making the binary-only instrumentation better. I'm not sure there's
a lot to be gained by moving between QEMU and DR or something else.
One obvious option would be relying on static translation by
disassembling the binary and putting it back together just once. I've
been told that mcsema may be worth looking at.
4) Making the fork server better. One option may be preforking, but
this breaks the one-fuzzer-takes-one-core deal. Another would be
auto-detecting a more distant forkserver init location by watching how
many instrumented locations we can skip without changing the observed
behavior of the binary. (That last part is particularly interesting!)
5) Test cases and dictionaries! We particularly need a good PDF dictionary.
6) Perhaps improvements to fuzzing strategies. For example, radamsa
had this nice idea of trying to determine if a particular chunk of the
file looks like text or binary data and applying slightly different
mutations depending on that. We can use instrumentation feedback to see if
any of this would yield better coverage per number of execs done.
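As a rough illustration of item 6, the text-versus-binary idea could be sketched like this. This is not radamsa's actual heuristic and not anything in AFL; the 90% printable threshold, the replacement alphabet, and the function names are all assumptions picked for the example.

```python
import random
import string

# Byte values counted as "text" (ASCII printable characters plus whitespace).
PRINTABLE = frozenset(string.printable.encode("ascii"))
# Hypothetical replacement alphabet for text-like chunks.
TEXT_ALPHABET = b"0123456789-<>\"'/ "


def looks_like_text(chunk, threshold=0.9):
    """Guess whether a chunk is text from its share of printable bytes."""
    if not chunk:
        return False
    printable = sum(1 for b in chunk if b in PRINTABLE)
    return printable / len(chunk) >= threshold


def mutate_chunk(chunk, rng):
    """Apply one mutation, chosen by whether the chunk looks like text."""
    if not chunk:
        return chunk
    out = bytearray(chunk)
    pos = rng.randrange(len(out))
    if looks_like_text(chunk):
        # Text-like data: swap in another printable character.
        out[pos] = rng.choice(TEXT_ALPHABET)
    else:
        # Binary-like data: flip a single bit.
        out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)
```

Whether a split like this pays off is exactly what the coverage-per-exec numbers would have to show.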
/mz
[*] Say: an outdated version of 'as' in Xcode that doesn't work with
newer versions of clang and flat out crashes with some inputs;
multiple differences in clang-emitted code compared to Linux and *BSD,
necessitating a patchwork of ifdefs in afl-as.c; differences in how
relocations are done in Mach-O binaries compared to ELF; presence of a
crash reporter without a clear C-accessible API to query its state;
etc.