Hi all,
FYI, I reimplemented afl-cmin in Python. It uses less memory and less disk
space, and it is faster. If you run afl-cmin often, or your corpus is large
(in terms of file count), you may find it useful.
https://github.com/kcwu/afl-kit
Non-scientific performance test:
program | workers | temp disk (MB) | memory | time (min)
----------- | ------- | -------------- | -------- | ----------
afl-cmin | 1 | 9782 | 7.8 GB | 27
[afl-pcmin] | 8 | 9762 | 7.8 GB | 13.8
afl-cmin.py | 1 | 359 | < 50 MB | 11.9
afl-cmin.py | 8 | 1136 | < 250 MB | 1.8
[afl-pcmin]: https://github.com/bnagy/afl-trivia
Details of this table:
- The input is 79k files (472 MB total); the output is 5k files (39 MB total).
- `temp disk` is the size of the `.traces` folder after a run with `AFL_KEEP_TRACES=1`.
Because the original afl-cmin is a shell script, its performance bottleneck is
text processing: it spends much more time on the minimization step (writing
text files and running sort(1)) than on collecting traces via afl-showmap.
afl-cmin.py's minimization step is now fast enough that its bottleneck has
moved to running afl-showmap and parsing the trace output. Here are my
suggestions if someone wants to improve performance further:
- Make afl-showmap support the fork server[1] and persistent mode.
- Make afl-showmap output traces in a binary format.
- Write afl-cmin in C and have it run the target itself (avoiding calls to the external afl-showmap).
[1]: https://groups.google.com/d/msg/afl-users/4XFVP-iAG_k/tM6wRetcJQAJ
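To make the binary-output suggestion above more concrete, here is a sketch of the parsing work each format implies. It assumes the text output has one `tuple_id:count` pair per line, and it invents a hypothetical binary format (the raw coverage bitmap, one byte per tuple, byte value = hit count). Neither function is part of afl-showmap or afl-cmin.py, and no speedup is claimed here; the point is only that the binary form skips line splitting and string-to-integer conversion.

```python
from typing import Set, Tuple


def parse_text_trace(data: str) -> Set[Tuple[int, int]]:
    """Parse 'tuple_id:count' lines (assumed text format) into (id, count) pairs."""
    result: Set[Tuple[int, int]] = set()
    for line in data.splitlines():
        if not line:
            continue
        tid, _, count = line.partition(":")
        result.add((int(tid), int(count)))
    return result


def parse_binary_trace(bitmap: bytes) -> Set[Tuple[int, int]]:
    """Parse a hypothetical raw bitmap where byte i is the hit count of tuple i."""
    # Indexing bytes yields ints directly, so no string handling is needed.
    return {(i, c) for i, c in enumerate(bitmap) if c}
```

Both functions produce the same set of `(tuple_id, count)` pairs for equivalent input, so either could feed a minimization step like afl-cmin's.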
Regards,
kcwu