afl-tmin and afl-showmap clarifications


markusteu...@gmail.com

Jan 12, 2016, 11:01:53 AM1/12/16
to afl-users
Hi there,

After getting slightly frustrated with afl-tmin and trying to build a smarter strategy to minimize test cases, I just wanted to confirm my method/assumptions:

1. If I have 2 input files (A and B, A is larger than B) and I run afl-showmap (-e, so no hit counts) on them, I can assume that B is at least as good an input as A if all numbers present in afl-showmap's output for A are also present in its output for B. If there are even more numbers in B's map, this is a very good thing: it means the smaller input has even better branch coverage than the larger one.

2. It is ok to do this with the -e switch because a larger input file will likely hit the same parts of the code more often (imagine a patterned image that hits the relevant decoder area more often the larger it gets). Not counting hits does not make the smaller file any worse, as long as all the numbers are in the map.

3. My strategy is to generate a map of A, take a certain subset of file A as B, generate the map of B, and if the numbers in the map of B are a superset of the map of A, B is the new best known small file. If not, take a different subset of A's content and test again, etc. (A rough Python sketch of this loop follows after point 4.) As far as I understood afl-tmin, it uses a similar strategy (removing blocks of data at a time), additionally modifying the resulting content to achieve a test case with lower entropy (character minimization).

4. If the map of B doesn't contain all numbers in the map of A, but does contain a few new ones that are not present in A's map, B is an interesting NEW test case that is useful in itself - but it can't be considered a smaller version of A.
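
To make this concrete, here is a rough Python sketch of the loop from point 3 (heavily simplified, and hypothetical rather than my actual prototype code; target_cmd is an argv list with "@@" standing in for the input file, afl-fuzz style):

    import os
    import subprocess
    import tempfile

    def edge_set(target_cmd, data):
        # Run afl-showmap -e on `data` (bytes) and return the set of
        # tuple IDs hit. Output lines look like "001234:1"; only the
        # ID part matters here.
        fd, inp = tempfile.mkstemp()
        os.write(fd, data)
        os.close(fd)
        out = inp + ".map"
        try:
            subprocess.call(["afl-showmap", "-q", "-e", "-o", out, "--"] +
                            [a.replace("@@", inp) for a in target_cmd])
            with open(out) as f:
                return {line.split(":")[0] for line in f if ":" in line}
        finally:
            os.unlink(inp)
            if os.path.exists(out):
                os.unlink(out)

    def minimize(target_cmd, data, block=4096):
        # Greedy block removal: keep a cut only if the candidate's map
        # is still a superset of the original map (assumption 1 above).
        must_hit = edge_set(target_cmd, data)
        while block >= 1:
            pos = 0
            while pos < len(data):
                candidate = data[:pos] + data[pos + block:]
                if must_hit <= edge_set(target_cmd, candidate):
                    data = candidate  # coverage kept, retry same offset
                else:
                    pos += block      # coverage lost, keep this block
            block //= 2
        return data

This would be called as, say, minimize(["./djpeg", "@@"], open("bunny.jpg", "rb").read()).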

Is this a useful approach? Initial (simple) tests are promising: I could reduce https://upload.wikimedia.org/wikipedia/commons/3/3c/Serious_bunny_%285767442711%29.jpg (a ~1.9 MB jpg file) with an instrumented version of djpeg from https://github.com/libjpeg-turbo/libjpeg-turbo to a <200 kB file within less than a minute while even creating a slightly larger map (original: 1365, new: 1407 entries) that is a strict superset of the initial picture's map (I didn't extract the additional test cases; there were a lot of potential ones, but I didn't bother to deduplicate them or look deeper into them). Meanwhile, afl-tmin struggles to remove even a single block of data...

The approach I took is general enough not to be limited to specific file formats: it can minimize video files and other compressed formats as well as text-based inputs (e.g., code fed to instrumented compilers). It might be nice to implement it in C for tmin or a similar tool; currently I just have an ugly prototype in Python that calls afl-showmap and parses the resulting output file. I'm not much of a C wizard, though, definitely not on the low-level stuff that afl usually requires, so I hope the prospect of a tool that can ingest a Blu-Ray file and spit out a usefully small test case hitting the same spots in your video decoder is motivation enough for someone here to take a closer look...

If there is interest, I can of course publish what I have so far - I just wanted to confirm first that I don't have wrong assumptions and am not using the tools in a completely wrong way to begin with, in which case my good results would just be flukes. The algorithm I use is also already well published, so there is no real secret sauce to this (so far - but I have a few ideas on how to improve it, which is why I started this experiment in the first place).

Thanks for reading through the wall of text and looking forward to your answers,
Markus

Michal Zalewski

Jan 12, 2016, 11:10:18 AM1/12/16
to afl-users
Hey,

> After getting slightly frustrated with afl-tmin and trying to build a
> smarter strategy to minimize test cases, I just wanted to confirm my
> method/assumptions:

I think you are generally correct, although if you're using
afl-showmap -e, it's only fair to compare it to afl-tmin -e.

The algorithm used by afl-tmin is fairly simple and yeah, relies on
removing contiguous blocks. If you need to remove non-contiguous
blocks (e.g., both an opening and a closing HTML tag in different
parts of the file), or if you also need to adjust length fields, it
won't do well.

It's probably difficult to support such strategies without making the
tool a lot slower for simple cases, so that's why it doesn't go there.
An interesting alternative would be to use afl-fuzz itself for
non-deterministic trimming, but it's not supported right now.

> 1. If I have 2 input files (A and B, A is larger than B) and I run
> afl-showmap (-e, so no hit counts) on them, I can assume that B is at least
> as good an input as A if all numbers present in afl-showmap's output for A
> are also present in its output for B.

afl-tmin does not make this assumption, since while the new test case
*may* be better for some purposes, it's also now functionally
different. The extra code may not necessarily be desirable. For
example, perhaps the program now exits with an error.

Anyway, the trimming tool selects only for inputs that exhibit the
same measured behavior. That said, I can add support for such a mode.
It may be interesting to some folks.

/mz

Ben Nagy

Jan 12, 2016, 5:40:41 PM1/12/16
to afl-...@googlegroups.com
* markusteu...@gmail.com <markusteu...@gmail.com> [2016-01-12 06:48:40 -0800]:

> Is this a useful approach? Initial (simple) tests are promising: I could
> reduce
> https://upload.wikimedia.org/wikipedia/commons/3/3c/Serious_bunny_%285767442711%29.jpg
> (a ~1.9 MB jpg file) with an instrumented version of djpeg from
> https://github.com/libjpeg-turbo/libjpeg-turbo to a <200 kB file within
> less than a minute while even creating a slightly larger map (original:
> 1365, new: 1407 entries) that is a strict superset of the initial picture's map

Looks like cool research. I think it's just worth mentioning, though,
that there are two common use cases for tmin. The first is reducing and
minimising an input corpus to a small [number] set of small [size] files
that can then be used as a starting point for fuzzing a different target;
your approach seems promising here. You'd need to run your new minimizer
on a set of files and then reduce the number of files again (since some
would likely now be redundant).

The second use case is to reduce a large file to try and isolate a
particular issue - e.g., to do root cause analysis. For this case you
probably don't want to be accepting cases that add blocks. I'm hoping
that would be a simple modification to the approach you have right now.

Cheers,

ben

markusteu...@gmail.com

Jan 12, 2016, 6:46:01 PM1/12/16
to afl-users

On Tuesday, January 12, 2016 at 5:10:18 PM UTC+1, Michal Zalewski wrote:

> afl-tmin does not make this assumption, since while the new test case
> *may* be better for some purposes, it's also now functionally
> different. The extra code may not necessarily be desirable. For
> example, perhaps the program now exits with an error.

Ok, so 2 inputs are only considered the same if their maps are exactly the same, not just a subset of each other. That can be an even harder requirement to meet: if -e is not used, all branches also need to be hit the same number of times, right? Well, checking for strict equality instead of subsets definitely stalls the progress of my tool quite a bit - which is great, because it means it's time to implement and try my slightly more creative solutions. :)
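
In code, the difference between the two acceptance checks is a single operator (assuming the maps are parsed into Python sets - of bare tuple IDs with -e, or of "id:count" lines without it):

    # what afl-tmin effectively requires: identical observed behavior
    def same_behavior(orig_map, cand_map):
        return cand_map == orig_map

    # what my prototype accepted so far: everything A hit, possibly more
    def at_least_as_good(orig_map, cand_map):
        return orig_map <= cand_map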

Reducing the input so that it reaches at least the same branches is apparently much easier than producing the exact same map, so it might be a useful mode to have. On the other hand, that is almost a fuzzing approach in itself, just with a slightly different goal/method: not covering as many branches as possible with arbitrary mutations, but covering at least a given set of branches (and potentially more) purely by removing, or maybe mutating, bytes of a given input file.

Maybe an interesting strategy might be a mix of tmin and cmin: Take a large input file, run something like tmin, but if unknown branches are hit, save an extra copy of that input. After a while(?) or after tmin is finished, run cmin (or rather: pcmin) on all resulting test cases to eliminate duplicates that are hard to discover with only tmin, then run tmin on each of them again, potentially producing even more test cases. This approach should generate a growing corpus of smaller and smaller test cases from a single (or several) seed input(s). Once this process finishes, afl-fuzz can take over and add more data to the mix again in a semi-informed way...
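
As a sketch (reusing edge_set() from my first mail; the function below is hypothetical and untested):

    def minimize_and_collect(target_cmd, data, block=4096):
        # Like the minimize() sketch above, but stash any candidate
        # whose map hits tuples never seen before - point 4's
        # "interesting NEW test case".
        must_hit = edge_set(target_cmd, data)
        seen = set(must_hit)
        extras = []
        while block >= 1:
            pos = 0
            while pos < len(data):
                candidate = data[:pos] + data[pos + block:]
                cand_map = edge_set(target_cmd, candidate)
                if cand_map - seen:          # hit unknown branches
                    extras.append(candidate)
                    seen |= cand_map
                if must_hit <= cand_map:
                    data = candidate
                else:
                    pos += block
            block //= 2
        return data, extras

The extras would then go through afl-cmin (or pcmin) and back into this function until nothing new shows up.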

Thinking about it a little more, an interesting way to fuzz something larger (image/film/compression formats) would be to "attack from both sides": while afl-fuzz starts from a (near) empty file and builds up tiny test cases as in the "jpg from thin air" example, afl-tmin and cmin get fed a few huge files (e.g. a largish image library) to break down into something more manageable. The impact of even a few kB of input seems to be quite large though, so the price to pay for nice map coverage might be far fewer executions per second, and afl-fuzz might come up with much smaller input files by itself that also reach some of these paths... probably another case of "it depends". After a while, both corpora get combined, (p)cmin'ed, and the result is used again as input to afl-fuzz and the modified tmin, to add/remove data and discover new branches along the way.

Cheers,
Markus

Michal Zalewski

Jan 12, 2016, 6:53:11 PM1/12/16
to afl-users
> Ok, so 2 inputs are only considered the same if their maps are exactly the
> same, not just a subset of each other.

Yup. The primary purpose of afl-tmin is to give you a smaller variant
of a test case that still seemingly does the same thing; it's a nice
and well-defined property that makes it useful in a variety of
scenarios. Now, in some uses (such as input corpus minimization), the
"subset" approach may actually make sense, too.

> Maybe an interesting strategy might be a mix of tmin and cmin: Take a large
> input file, run something like tmin, but if unknown branches are hit, save
> an extra copy of that input. After a while(?) or after tmin is finished, run
> cmin (or rather: pcmin) on all resulting test cases to eliminate duplicates
> that are hard to discover with only tmin, then run tmin on each of them
> again, potentially producing even more test cases.

Yep, that's one of the use cases where the approach could work :-) I
don't know how much of an improvement it would be over afl-tmin +
afl-cmin as implemented today, but there's only one way to find out.

/mz