When fuzzbench was made publicly available, the very first results were not that bad for honggfuzz: it was in the middle of the pack. Also, honggfuzz-specific features (minimizing the Hamming distance for certain string/mem*cmp calls) were responsible for it doing particularly well in the libxml2 and php benchmarks.
Over the next few weeks I worked on improving other benchmarks, and the results can be seen here (2020-03-16). Some benchmarks (like proj4, systemd or freetype2) improved, while others stayed pretty much the same. The “improvements” I made fall into two categories.
“Improvements” that may have helped, but nobody knows for sure:
Reading inputs in smaller batches:
Originally (as with afl and its descendants, and libfuzzer and its descendants) honggfuzz read files from the seed corpus one by one, testing whether they added to the coverage metrics (and discarding them otherwise). I thought that by starting with a smaller input size, like 1kB or even a couple of bytes, testing that prefix for coverage, and then increasing the size in the next step (either by a factor of 2, or by adding the next 1kB and repeating the process), I could make honggfuzz prefer smaller inputs and discard unnecessary parts of seed files.
This could potentially work well for, e.g., the sqlite3 benchmark, which provides big seed files (>1MB). But given that honggfuzz achieved pretty much the same results in the initial and the mid-March runs, it is unlikely that it mattered much. Still, the feature is present, and can be seen here.
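A rough sketch of the idea, with hypothetical helper functions (run_and_check_new_coverage(), add_to_dynamic_corpus()) standing in for the real honggfuzz internals, could look like this:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, standing in for the real fuzzer internals */
extern int run_and_check_new_coverage(const uint8_t *buf, size_t len);
extern void add_to_dynamic_corpus(const uint8_t *buf, size_t len);

/* Feed a seed file in growing prefixes: start small, test each prefix for
 * new coverage, and only keep the prefixes that actually add something, so
 * the unnecessary tail of a big seed file never enters the corpus. */
static void ingest_seed_incrementally(const uint8_t *seed, size_t seed_len) {
    size_t len = 1024; /* start with ~1kB (or even a couple of bytes) */
    if (len > seed_len) {
        len = seed_len;
    }
    for (;;) {
        if (run_and_check_new_coverage(seed, len)) {
            add_to_dynamic_corpus(seed, len);
        }
        if (len >= seed_len) {
            break; /* the whole file has been tried */
        }
        len *= 2; /* alternatively: len += 1024 */
        if (len > seed_len) {
            len = seed_len;
        }
    }
}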
Preferring faster/smaller/more-potent inputs:
In a similar vein to the above, I tried to improve the fuzzing scheduler by preferring potentially more interesting inputs (already in the dynamic corpus/queue) over others. The procedure is self-explanatory in the code; suffice it to say that inputs with a bigger penalty factor were less likely to be picked for processing, using a form of:
if (rnd() % per_input_skip_factor) {
    continue; /* skip this input in this fuzzing round */
}
It smoothed out the execution speed of the sqlite3 benchmark, resulting in higher execs/sec than before. But, again, this didn't do much to improve its results.
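To illustrate what such a penalty factor might look like, here is a purely hypothetical scoring function (the structure, names and weights are made up; the real logic lives in the honggfuzz scheduler code linked above). Slower, larger and less productive inputs get a bigger factor, and the snippet above then skips them more often:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-input bookkeeping, not honggfuzz's actual structures */
struct input_stats {
    uint64_t exec_usecs;      /* how long the target runs on this input */
    size_t   size;            /* input size in bytes */
    uint64_t new_edges_found; /* coverage this input has contributed */
};

/* Slower, larger and less "potent" inputs get a bigger skip factor,
 * so `rnd() % per_input_skip_factor` is more likely to skip them. */
static unsigned compute_skip_factor(const struct input_stats *st,
                                    uint64_t avg_exec_usecs, size_t avg_size) {
    unsigned factor = 1;
    if (st->exec_usecs > avg_exec_usecs) factor += 2; /* slow            */
    if (st->size > avg_size)             factor += 2; /* large           */
    if (st->new_edges_found == 0)        factor += 4; /* not very potent */
    return factor; /* 1 == fuzzed every round, larger == skipped more often */
}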
Improvements that actually worked:
Fixing builds:
The initial results for the systemd benchmark were disappointing. After a somewhat lengthy debugging session, I figured out that the problem was not in the fuzzer's operation; it was caused by the code linking stage.
Systemd uses the main instrumented binary together with an instrumented libsystemd.so, where most of the benchmark's logic resides. I will spare you the more technical details and just say that hfuzz-cc interacted incorrectly with ASAN: while the main binary was instrumented properly, all instrumentation calls in libsystemd.so were redirected to empty ASAN stubs.
After the fix was implemented, honggfuzz did well in this category.
Figuring all of this out was not easy, and required some long hours with binutils and gdb, but I feel like I understand static linking way better now :). Suffice it to say that this was all very honggfuzz/ASAN-specific.
Improving handling of string/mem*cmp calls by adding good candidates to the internal/dynamic dictionary:
Both honggfuzz and libfuzzer+descendants (and some afl-inspired fuzzers) use a rather obvious trick to improve code coverage. They instrument str/mem*cmp function calls, trying to minimize the Hamming distance (bit- or byte-based) of specific string/memory comparisons. This typically takes the form of:
static int intercept_strcmp(const char *s1, const char *s2) {
    /* Call site of this strcmp() invocation */
    uintptr_t pc = (uintptr_t)__builtin_return_address(0);
    /* Distance between the strings: lower == closer to a full match */
    int score = compute_score(s1, s2);
    /* cmpmap_score_for_pc() stands for a per-call-site slot in a coverage
       map (e.g. a macro indexing an array by PC) holding the best score */
    if (score < cmpmap_score_for_pc(pc)) {
        cmpmap_score_for_pc(pc) = score;
        save_input_as_interesting();
    }
    return original_strcmp(s1, s2);
}
It still works reasonably well, and was partly responsible (along with the build fix) for the good results on the systemd benchmark, which is heavily text-based. Still, something better was needed, and libfuzzer already had it, in the form of a temporary dictionary of recent string/mem*cmp values.
I decided to go a step further and actually check whether values used in mem/str*cmp comparisons are valid dictionary candidates, by verifying that the pointers passed to those functions reside within read-only sections of loaded files (the binary + libraries). This can be done by calling dl_iterate_phdr() (or by analyzing /proc/<pid>/maps, but that would be Linux-specific) and checking whether the addresses fall within the bounds of loaded read-only segments of those files. If so, the fuzzer adds the values to a dynamic dictionary. The current implementation of this check can be seen here.
Caveat: calling dl_iterate_phdr() for every string/mem*cmp call would slow down the fuzzing process immensely for some benchmarks, hence the check is only performed every ~100 calls or so. This is fine: fuzzed inputs repeat often, so within a minute or so pretty much all interesting string/memory values land in the dynamic dictionary and become available for the corpus mutation stage.
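A minimal sketch of such a check (not the actual honggfuzz implementation, which is linked above) could use dl_iterate_phdr() to test whether a pointer falls inside a non-writable PT_LOAD segment of any loaded ELF object, with a simple counter implementing the every-~100-calls rate limit:

#include <link.h>     /* dl_iterate_phdr(), ElfW(), struct dl_phdr_info */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct addr_check {
    uintptr_t addr;
    bool in_ro;
};

/* Callback for dl_iterate_phdr(): for every loaded object, walk its program
 * headers and see whether 'addr' lies inside a read-only PT_LOAD segment. */
static int check_ro_cb(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    struct addr_check *chk = data;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD || (ph->p_flags & PF_W)) {
            continue; /* only mapped, non-writable segments are of interest */
        }
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;
        uintptr_t end = start + ph->p_memsz;
        if (chk->addr >= start && chk->addr < end) {
            chk->in_ro = true;
            return 1; /* non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Returns true if 'ptr' lives in a read-only mapping of a loaded file,
 * i.e. it looks like a constant string worth adding to the dictionary.
 * Only every ~100th call actually performs the (expensive) check. */
static bool maybe_dictionary_candidate(const void *ptr) {
    static unsigned counter = 0;
    if (counter++ % 100 != 0) {
        return false; /* rate-limit: inputs repeat often, so we lose little */
    }
    struct addr_check chk = { .addr = (uintptr_t)ptr, .in_ro = false };
    dl_iterate_phdr(check_ro_cb, &chk);
    return chk.in_ro;
}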
This improved honggfuzz results for the freetype2 (initial vs mid-March) and for proj4 (initial vs mid-March) benchmarks.
EOPart 1.
Hi Robert, glad to see that token extraction improved honggfuzz in fuzzbench.
In AFL++ we also log tokens from routines during execution (however, this is a mode that was never tested in fuzzbench).
https://github.com/AFLplusplus/AFLplusplus/blob/master/llvm_mode/afl-llvm-rt.o.c#L545
We don't just log the tokens of *cmp functions, but of all functions that take two pointers as their first arguments.
We dump the first 32 bytes of memory, as you can see.
I found that this improves coverage beyond logging only read-only tokens, as many tokens are computed at runtime (many live on the heap).
To log only read-only tokens, why not extract them from the binaries before fuzzing?
Andrea
> Indeed, though if we wanted to keep those tokens attached to the binary
> (e.g. in some separate section), since in my view the instrumented binary
> should be the only thing needed to start effective fuzzing, then this
> would require at least some libclang module (which might be a good idea
> anyway). It's pretty hard to do that with preprocessor + gcc|clang
> options, I guess, but I need to think about those libclang modules; they
> are pretty powerful.
Yes, that was my thinking: to put this into the laf-intel llvm_mode pass. I will implement a prototype in afl++ in the next few days (branch: autodictionary). I would put that dictionary at compile time into a global array that is transferred via the forkserver FDs to afl-fuzz. Let's see how this will work out.
Thanks for sharing your insights! Not sure if you looked at your improvements longitudinally, but the attached report shows 19 Mar to 01 April, when the edge counts were comparable. You made significant improvements on almost all of the benchmarks. Do you have any idea what could have caused the regressions for curl, openssl and mbedtls?
Curl and openssl both have a large number of seeds, so maybe reading inputs in small batches slows down the initial coverage for those two? Mbedtls only has two seeds, so the difference there must be something else, and actually the regression happened after 03-19, which