The goal is to speed up fuzzing when we have some knowledge of the input structure and permissible data (but can’t use protobufs).
We are thinking about two extension points:
1. A way to influence (and observe) which words are added to libFuzzer’s dictionaries. This would, in turn, affect libFuzzer’s other dictionary-based mutators.
Possible signature: bool LLVMFuzzerShouldAddWordToDictionary(const uint8_t *Data, size_t Size);
This would let us keep libFuzzer from getting “side-tracked” on certain keywords, e.g., when simply repeating a keyword increases the feedback metrics even though it leads nowhere useful. Based on our application-specific knowledge, we could tell the fuzzer: “even if you think this is a good addition to the dictionary, please don’t; trust me.”
2. A “give me a random word” query function that can be called from LLVMFuzzerCustomMutator(). This way our custom mutator could define the mutation semantics *and* still have access to dictionary words.
Possible signature: int LLVMFuzzerGetRandomDictionaryWord(uint8_t *Data, size_t MaxSize);
This would speed up fuzzing by producing fewer invalid inputs. Extension 2 could also be built on top of extension 1, but that would require copying every word into a separate, user-managed copy of the dictionary.
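To make extension 1 a bit more concrete, here is a rough sketch of what a target might do with the proposed hook. LLVMFuzzerShouldAddWordToDictionary does not exist in libFuzzer today; the name and signature are just the ones proposed above, and the keyword list and the “repeated keyword” rule are made-up examples of application-specific knowledge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical keywords of the target's input language. */
static const char *const kKeywords[] = {"BEGIN", "END", "LOOP"};

/* Returns true if Data is Keyword repeated two or more times. */
static bool IsRepeatedKeyword(const uint8_t *Data, size_t Size,
                              const char *Keyword) {
  size_t Len = strlen(Keyword);
  if (Len == 0 || Size < 2 * Len || Size % Len != 0)
    return false;
  for (size_t Off = 0; Off < Size; Off += Len)
    if (memcmp(Data + Off, Keyword, Len) != 0)
      return false;
  return true;
}

/* Proposed hook (NOT part of libFuzzer today): return false to veto a word
 * that libFuzzer would otherwise add to its dictionary. */
bool LLVMFuzzerShouldAddWordToDictionary(const uint8_t *Data, size_t Size) {
  assert(Data != NULL || Size == 0);
  for (size_t i = 0; i < sizeof(kKeywords) / sizeof(kKeywords[0]); i++)
    if (IsRepeatedKeyword(Data, Size, kKeywords[i]))
      return false; /* e.g. "ENDENDEND": repetition we know is useless */
  return true;
}
```

With this, “END” or “ENDBEGIN” would still be accepted, while “ENDEND” or “LOOPLOOPLOOP” would be rejected.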
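For extension 2, a sketch of a custom mutator that only inserts whole dictionary words at token boundaries, so mutated inputs stay tokenizable. LLVMFuzzerCustomMutator is libFuzzer’s existing hook; the word query is the proposed one, so here it is replaced by a hypothetical local stand-in, GetRandomDictionaryWord, backed by a user-managed copy of the dictionary (the duplication mentioned above). Its return convention (number of bytes copied, 0 if no word fits) is an assumption, as are the dictionary contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-managed copy of the dictionary; each word ends in ' '. */
static const char *const kShadowDict[] = {"BEGIN ", "END ", "LOOP "};
static uint32_t RngState;

/* Small LCG so mutations are reproducible from libFuzzer's Seed. */
static uint32_t NextRand(void) {
  RngState = RngState * 1664525u + 1013904223u;
  return RngState >> 16;
}

/* Stand-in for the proposed LLVMFuzzerGetRandomDictionaryWord(): copies one
 * random word into Data and returns its length, or 0 if it does not fit. */
static int GetRandomDictionaryWord(uint8_t *Data, size_t MaxSize) {
  size_t N = sizeof(kShadowDict) / sizeof(kShadowDict[0]);
  const char *W = kShadowDict[NextRand() % N];
  size_t Len = strlen(W);
  if (Len > MaxSize)
    return 0;
  memcpy(Data, W, Len);
  return (int)Len;
}

/* Inserts a dictionary word at offset 0 or right after a space. */
size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size, size_t MaxSize,
                               unsigned int Seed) {
  uint8_t Word[64];
  size_t Room = MaxSize > Size ? MaxSize - Size : 0;
  size_t Cap = Room < sizeof(Word) ? Room : sizeof(Word);
  RngState = Seed;
  int Len = GetRandomDictionaryWord(Word, Cap);
  if (Len <= 0)
    return Size; /* no room: leave the input unchanged */

  /* Count token boundaries: offset 0 plus every offset after a space. */
  size_t NumBoundaries = 1;
  for (size_t i = 0; i < Size; i++)
    if (Data[i] == ' ')
      NumBoundaries++;

  /* Pick one boundary and find its offset. */
  size_t Target = NextRand() % NumBoundaries, Pos = 0;
  for (size_t i = 0; i < Size && Target > 0; i++)
    if (Data[i] == ' ' && --Target == 0)
      Pos = i + 1;

  /* Shift the tail right and splice the word in. */
  memmove(Data + Pos + Len, Data + Pos, Size - Pos);
  memcpy(Data + Pos, Word, (size_t)Len);
  return Size + (size_t)Len;
}
```

The mutator decides *where* words may go, while the dictionary decides *which* words; with the proposed query function, the shadow copy (and the need to keep it in sync) would go away.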
Let me know what you think. Are we missing a clever way to combine existing functionality to accomplish this?
Thanks,
Julian