afl-cmin.py

Kuang-che Wu

Nov 12, 2016, 1:21:51 PM

to afl-users, floyd

Hi all,

FYI, I reimplemented afl-cmin in Python. It uses less memory and less disk
space, and it is faster. If you run afl-cmin often and/or your corpus is
large (in terms of file count), you may find it useful.

https://github.com/kcwu/afl-kit

Non-scientific performance test:

program      | workers | temp disk (MB) | memory  | time (min)
------------ | ------- | -------------- | ------- | ----------
afl-cmin     | 1       | 9782           | 7.8 GB  | 27
[afl-pcmin]  | 8       | 9762           | 7.8 GB  | 13.8
afl-cmin.py  | 1       | 359            | <50 MB  | 11.9
afl-cmin.py  | 8       | 1136           | <250 MB | 1.8

[afl-pcmin]: https://github.com/bnagy/afl-trivia

Details of this table:
- The input is 79k files, 472 MB in total. The output is 5k files, 39 MB in total.
- `temp disk` is the size of the `.traces` folder after a run with `AFL_KEEP_TRACES=1`.

Because the original afl-cmin is a shell script, its performance bottleneck is
text processing -- it spends much more time on the minimization step (writing
text files and running sort(1)) than on collecting traces via afl-showmap.
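
To illustrate why doing the minimization step in memory avoids the temp files and sort(1) passes, here is a minimal sketch of a greedy selection in the spirit of afl-cmin (smallest file per tuple, then tuples visited from rarest to most common). It is not the actual afl-cmin.py code; the `minimize` function and the `traces`/`sizes` inputs are hypothetical names made up for this example.

```python
from collections import Counter


def minimize(traces, sizes):
    """Pick a subset of files that still covers every tuple.

    traces: dict mapping file name -> set of tuple ids (hypothetical input)
    sizes:  dict mapping file name -> file size in bytes (hypothetical input)
    """
    best = {}         # tuple id -> smallest file that hits it
    hits = Counter()  # tuple id -> number of files that hit it
    for name, tuples in traces.items():
        for t in tuples:
            hits[t] += 1
            # Ties on size are broken by name to keep the result deterministic.
            if t not in best or (sizes[name], name) < (sizes[best[t]], best[t]):
                best[t] = name

    keep = []
    covered = set()
    # Visit tuples from rarest to most common; skip those already covered.
    for t in sorted(hits, key=lambda t: (hits[t], t)):
        if t in covered:
            continue
        winner = best[t]
        keep.append(winner)
        covered |= traces[winner]
    return keep


if __name__ == "__main__":
    traces = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}}
    sizes = {"a": 10, "b": 10, "c": 50}
    print(minimize(traces, sizes))
```

Everything here lives in dicts and sets, so nothing is written to disk and no external sort is needed; the real tool may of course organize its data differently.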

Now afl-cmin.py's minimization step is fast enough. Its current bottleneck is
running afl-showmap and parsing the trace output. Here are my suggestions if
someone wants to improve performance further:
- make afl-showmap support forkserver[1] and persistent mode
- make afl-showmap output traces in binary
- write afl-cmin in C and have it run the target itself (to avoid calling an external afl-showmap)

[1]: https://groups.google.com/d/msg/afl-users/4XFVP-iAG_k/tM6wRetcJQAJ
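
As a rough illustration of the "traces in binary" suggestion, the sketch below contrasts parsing a text trace with reading a packed one. It assumes text lines of the form `id:count` and an invented binary layout (a flat array of native unsigned ints); neither is something afl-showmap is confirmed to produce here, and the function names are hypothetical.

```python
from array import array


def parse_text_trace(text):
    """Parse lines assumed to look like '000123:1' into a set of tuple ids."""
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Only the id before the colon is kept; the count is ignored.
            ids.add(int(line.split(":", 1)[0]))
    return ids


def pack_trace(ids):
    """Encode tuple ids as a flat array of native unsigned ints (made-up layout)."""
    return array("I", sorted(ids)).tobytes()


def unpack_trace(blob):
    """Decode the layout written by pack_trace; no per-line text parsing needed."""
    values = array("I")
    values.frombytes(blob)
    return set(values)


if __name__ == "__main__":
    ids = parse_text_trace("000001:1\n000042:3\n")
    blob = pack_trace(ids)
    print(sorted(unpack_trace(blob)))
```

With a packed format the reader does one bulk copy instead of splitting and converting every line, which is where the saving would come from.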

Regards,
kcwu

Jussi Judin

Nov 14, 2016, 7:28:30 AM

to afl-users, floyd

Sounds good. I have cases where I run out of disk space on my primary laptop when minimizing test sets on the order of 500k files, where the directories generated by afl-cmin grow to the order of 100 gigabytes.

One thing that could be useful is file deduplication at the beginning, based on file hashes. Right now anyone who merges multiple afl queues and minimizes them at once needs to remember to do this preprocessing before running afl-cmin, or suffer the wrath of the quadratic behavior of afl and its queue imports.
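
A minimal sketch of such a preprocessing step, assuming SHA-1 over the full file contents is an acceptable identity check (the function name `dedup_by_hash` is made up for this example and is not part of afl):

```python
import hashlib


def dedup_by_hash(paths):
    """Return one path per distinct file content, keeping the first one seen."""
    seen = {}  # hex digest -> first path with that content
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        seen.setdefault(digest, path)
    return sorted(seen.values())
```

Each file is read once, so the cost is linear in the corpus size, and only the surviving paths need to be traced.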


Kuang-che Wu

Nov 16, 2016, 7:29:24 PM

to afl-...@googlegroups.com

On Mon, Nov 14, 2016 at 12:26:06PM +0000, Jussi Judin wrote:
> Sounds good. I have cases where I run out of disk space on my primary laptop when minimizing test sets on the order of 500k files, where the directories generated by afl-cmin grow to the order of 100 gigabytes.
>
> One thing that could be useful is file deduplication at the beginning, based on file hashes. Right now anyone who merges multiple afl queues and minimizes them at once needs to remember to do this preprocessing before running afl-cmin, or suffer the wrath of the quadratic behavior of afl and its queue imports.

Thanks. I implemented your suggestion to dedup at the beginning. In addition, afl-cmin.py now supports multiple -i options and globbing, and output files are named like 'id:000001,hash:value'.

So you can use afl-cmin.py in a workflow like this:

1. Run many instances of afl-fuzz, so that you have multiple queues in the sync_dir.1 directory.
2. afl-cmin.py -i 'sync_dir.1/*/queue' -o sync_dir.2/prev/queue --as_queue ...
3. Run another batch of afl-fuzz in sync_dir.2. The instances will automatically sync the queue from sync_dir.2/prev/queue.

David Manouchehri

Dec 14, 2016, 8:52:45 AM

to afl-users, fl...@floyd.ch

Great work, I went from around 5 minutes to under 35 seconds.

ubuntu@ip-172-31-63-239:/mnt/mounty$ time pypy afl-kit/afl-cmin.py -w 8 -i /tmpshark/allmind/ -o /mnt/mounty/merged/ -m 5000 -t 5000 -- /tmpshark/test8 -r @@ -w /dev/null
Hint: install python module "tqdm" to show progress bar
2016-12-14 13:46:44,044 - INFO - Found 4187 input files in 1 directories
2016-12-14 13:46:44,211 - INFO - Remain 1045 files after dedup
2016-12-14 13:46:44,211 - INFO - Testing the target binary
2016-12-14 13:46:44,362 - INFO - ok, 27278 tuples recorded
2016-12-14 13:46:44,401 - INFO - Obtaining trace results
2016-12-14 13:47:18,142 - INFO - Found 34886 unique tuples across 1045 files (1020 effective)
2016-12-14 13:47:18,164 - INFO - Processing candidates and writing output
2016-12-14 13:47:18,693 - INFO - narrowed down to 538 files, saved in "/mnt/mounty/merged/"

real 0m35.038s
user 3m25.016s
sys 0m50.880s
