Hi,Just some braindump:I've toyed around trying to retrieve important constant values (think"\x89PNG") from a binary or it's source to auto-generate a dictionaryfor AFL; this proved somewhat ineffective and noisy. However I came torealize that we don't actually need to know wide constants in advancebecause we can in fact dramatically increase the chances of findingthem dynamically.The idea is to turn the chance of AFL hitting a 32-bit wide constantfrom 1/2^32 to 1/4*2^8 (or 1/2^64 to 1/8*2^8 for 64-bit) by reducingthe width of comparison operations in the binary.
Let say we iterate over all CMPx-, TEST- and alike instructions and
annotate them with u8-wide equivalents. For example, a 32bit
comparison
CMP $0xDEADBEEF, %FOO
JNE next_basic_block_where_cur_loc_is_updated
is either exchanged with or preceded by
CMP $0xDE000000, %FOO & $0xFF000000
JNE next_basic_block_where_cur_loc_is_updated
CMP $0x00AD0000, %FOO & 0x00FF0000
JNE next_basic_block_where_cur_loc_is_updated
...
and so on. This would increase the chance of hitting an observable
state change from 1/2^32 to 1/2^8 and should very effectively enable
AFL to get beyond wide constant comparisons. This comes at the cost of
a more dense map and a somewhat slower binary.
I don't find myself capable of actually implementing this in LLVM: We
probably need another pass that walks the AST before actual basic
blocks are generated. Another way would be to play around with
the x86-target definition files; this would require a custom-built
LLVM though.
Braindump concluded.
Regards