Access dictionaries from custom mutators

Julian Lettner

Oct 2, 2020, 7:16:26 PM

to libf...@googlegroups.com

Hi libFuzzer developers!

Hello Kostya!

libFuzzer maintains dictionaries that can improve fuzzing efficiency:

https://llvm.org/docs/LibFuzzer.html#dictionaries

  // Dictionary provided by the user via -dict=DICT_FILE.
  Dictionary ManualDictionary;
  // Persistent dictionary modified by the fuzzer, consists of
  // entries that led to successful discoveries in the past mutations.
  Dictionary PersistentAutoDictionary;

For some of our custom mutators (LLVMFuzzerCustomMutator()) it would be beneficial to have access to these dictionaries.

From a design perspective, are you okay with adding a way to access these dictionaries (in a backwards-compatible way)?

Thanks,

Julian

Konstantin Serebryany

Oct 7, 2020, 2:45:33 PM

to Julian Lettner, libfuzzer

Hi Julian,

Yea, as you might have guessed I am not super excited about exposing the internals of LF in the interface.

I would rather look at exposing the functionality, not the implementation detail.

There is already LLVMFuzzerMutate, which gives you the dictionaries indirectly -- it will apply a random dictionary entry to the input with some probability (or will do some other mutation).

If you need finer control, let's think of an API function that will give such finer control without exposing the implementation details.

--kcc
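
For reference, a minimal sketch of the pattern described above: a custom mutator that keeps part of the input fixed and hands the rest to a generic mutator with the same shape as LLVMFuzzerMutate(Data, Size, MaxSize), so dictionary entries get applied indirectly. The 4-byte header layout and the names MutatePayloadOnly and BaseMutateFn are assumptions made up for illustration. The base mutator is passed as a function pointer only so the logic can be exercised without linking libFuzzer.

```cpp
#include <stddef.h>
#include <stdint.h>

// Same shape as libFuzzer's LLVMFuzzerMutate(Data, Size, MaxSize).
using BaseMutateFn = size_t (*)(uint8_t *Data, size_t Size, size_t MaxSize);

// Hypothetical input layout: a fixed 4-byte header followed by a payload.
constexpr size_t kHeaderSize = 4;

// Mutates only the payload and leaves the header bytes untouched.
// Returns the new total size. It stays within MaxSize as long as
// BaseMutate honors its own MaxSize argument.
size_t MutatePayloadOnly(uint8_t *Data, size_t Size, size_t MaxSize,
                         BaseMutateFn BaseMutate) {
  if (Size < kHeaderSize || MaxSize < kHeaderSize)
    return Size;  // Too small to hold a header; leave the input as is.
  size_t NewPayloadSize = BaseMutate(Data + kHeaderSize, Size - kHeaderSize,
                                     MaxSize - kHeaderSize);
  return kHeaderSize + NewPayloadSize;
}

// In a real fuzz target (built with -fsanitize=fuzzer) the glue would be:
//
//   extern "C" size_t LLVMFuzzerMutate(uint8_t *, size_t, size_t);
//   extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
//                                             size_t MaxSize, unsigned int) {
//     return MutatePayloadOnly(Data, Size, MaxSize, LLVMFuzzerMutate);
//   }
```

With this shape the custom mutator cannot choose which dictionary word is used or where it lands, which is the "finer control" gap discussed in the rest of the thread.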

Julian Lettner

Oct 8, 2020, 6:06:12 PM

to Konstantin Serebryany, libfuzzer, Filippo Bigarella, Mateusz Krzywicki

Hi Kostya!

On Oct 7, 2020, at 11:45 AM, Konstantin Serebryany <konstantin....@gmail.com> wrote:

> Hi Julian,
>
> Yea, as you might have guessed I am not super excited about exposing the internals of LF in the interface.
> I would rather look at exposing the functionality, not the implementation detail.

I absolutely agree.

> There is already LLVMFuzzerMutate which gives you the dictionaries indirectly -- it will apply a random dictionary entry
> to the input with some probability (or will do some other mutation).
>
> If you need a finer control, let's think of an API function that will give such finer control w/o exposing the implementation details.

The goal is to speed up fuzzing when we have some knowledge of the input structure and permissible data (but can’t use protobufs).

We are thinking about two extension points:

1. A way to influence (and observe) which words are added to libFuzzer’s dictionaries. This will influence the other dictionary-based mutators in libFuzzer.

Possible signature: bool LLVMFuzzerShouldAddWordToDictionary(const uint8_t *Data, size_t Size);

This would allow us to prevent libFuzzer getting “side-tracked” on certain keywords, because, e.g., simply repeating them yields an increase in the feedback metrics. Based on the application-specific knowledge we have, we can tell the fuzzer “please, even if you think this is a good addition to the dictionary, just don’t, trust me”.

2. A “give me random word” query function which can be used from LLVMFuzzerCustomMutator(). This way our custom mutator can define mutation semantics *and* have access to dictionary words.

Possible signature: int LLVMFuzzerGetRandomDictionaryWord(uint8_t *Data, size_t MaxSize);

This would increase fuzzing speed by producing fewer bad inputs. Extension 2 can also be implemented via 1, although that requires copying all words to a user-managed, duplicated dictionary data structure.
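
To make the last point concrete, here is a rough sketch of what "extension 2 via extension 1" could look like on the user side. LLVMFuzzerShouldAddWordToDictionary is only the signature proposed in this message, not an existing libFuzzer hook, and the deny list, UserDictionary, and GetRandomUserDictionaryWord are hypothetical names used for illustration.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#include <random>
#include <string>
#include <utility>
#include <vector>

// User-managed copy of every word accepted by the hook below.
static std::vector<std::string> UserDictionary;

// Hypothetical application knowledge: keywords that tend to inflate the
// feedback metrics when repeated without leading to new behavior.
static bool IsDistractingKeyword(const std::string &Word) {
  static const char *const Denied[] = {"AAAA", "padding"};
  for (const char *D : Denied)
    if (Word == D)
      return true;
  return false;
}

// Proposed hook (extension 1). NOT part of libFuzzer today.
// Returning false would ask libFuzzer to skip the word.
extern "C" bool LLVMFuzzerShouldAddWordToDictionary(const uint8_t *Data,
                                                    size_t Size) {
  std::string Word(reinterpret_cast<const char *>(Data), Size);
  if (IsDistractingKeyword(Word))
    return false;
  UserDictionary.push_back(std::move(Word));  // Duplicate for later queries.
  return true;
}

// User-side stand-in for extension 2: copies one randomly chosen stored
// word into Data. Returns the number of bytes written, or 0 if the
// dictionary is empty or the chosen word does not fit into MaxSize.
size_t GetRandomUserDictionaryWord(uint8_t *Data, size_t MaxSize,
                                   std::mt19937 &Rng) {
  if (UserDictionary.empty())
    return 0;
  std::uniform_int_distribution<size_t> Pick(0, UserDictionary.size() - 1);
  const std::string &Word = UserDictionary[Pick(Rng)];
  if (Word.size() > MaxSize)
    return 0;
  memcpy(Data, Word.data(), Word.size());
  return Word.size();
}
```

The cost mentioned above is visible here: every accepted word is stored twice, once inside libFuzzer and once in UserDictionary.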

Let me know what you think. Are we missing a clever way to combine existing functionality to accomplish this?

Thanks,

Julian

Konstantin Serebryany

Oct 13, 2020, 12:08:15 AM

to Julian Lettner, libfuzzer, Filippo Bigarella, Mateusz Krzywicki

Hi Julian,

These two interface functions sound reasonable at first glance, but then ... will it be any better than passing a dictionary directly to the custom mutator?

Do you think you need this for the manual dictionary (from an external file) or for the automatic dictionary (from e.g. memcmp interceptors)? Or both?

Manual dictionary entries should be much simpler to implement directly in the custom mutator.
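
As an illustration of that last point, a minimal sketch of a custom mutator that carries its own manual dictionary and needs nothing from libFuzzer beyond the standard LLVMFuzzerCustomMutator entry point. The word list is a made-up example; in practice it would hold the tokens one would otherwise put into the -dict= file.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#include <random>

// Hypothetical manual dictionary compiled into the fuzz target.
static const char *const kManualDictionary[] = {"GET", "POST", "HTTP/1.1",
                                                "\r\n"};
static const size_t kNumWords =
    sizeof(kManualDictionary) / sizeof(kManualDictionary[0]);

// Standard libFuzzer custom mutator entry point. This sketch performs a
// single kind of mutation: insert one dictionary word at a random offset.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  std::mt19937 Rng(Seed);
  const char *Word = kManualDictionary[Rng() % kNumWords];
  size_t WordSize = strlen(Word);
  if (Size > MaxSize || WordSize > MaxSize - Size)
    return Size;  // No room for the word; leave the input unchanged.
  size_t Offset = Rng() % (Size + 1);
  // Shift the tail to the right, then copy the word into the gap.
  memmove(Data + Offset + WordSize, Data + Offset, Size - Offset);
  memcpy(Data + Offset, Word, WordSize);
  return Size + WordSize;
}
```

This covers only the manual dictionary. Words that libFuzzer discovers at run time (the automatic dictionary) are not reachable this way.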