Fuzzing a function with multiple parameters

Vincent Ulitzsch

Apr 13, 2018, 1:45:28 PM

to libfuzzer

Hi,

I want to fuzz a function that accepts multiple arguments.

In particular, I want to fuzz a function that accepts "bytecode" and "data"; the data is passed to the bytecode when that bytecode is executed. I tried to write a test harness that splits the libFuzzer input so that the first four bytes determine how much of the input is code, and the rest of the input is passed to the function as data. The code of my test harness looks something like this:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size <= 4)
    return 0;
  // Extract Bytecode length:
  uint8_t codeLen = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
  if (codeLen+4 < size){
     return 0;  
  }
  uint8_t *code = &data[4];
  uint8_t *input = &data[4+codeLen];
  execByteCode(code,codeLen,input,size-codeLen-4);
  return 0;
}

But this does not seem satisfactory to me. First, it dismisses a lot of the input libFuzzer generates.
Second, I have a hunch that it biases the fuzzer towards generating longer files, because the codeLen
values can get really big (though this may well not be true). I have thought about several other splitting
methods, but none of them really satisfied me.

So my question is, what is best practice when encountering such a scenario? 

Best & Thanks, 
Vincent

Konstantin Serebryany

Apr 13, 2018, 1:58:56 PM

to Vincent Ulitzsch, libfuzzer

Hi Vincent,

What you have described is one of the most commonly used approaches for this kind of task.

Another approach is to define a multi-byte sentinel (magic value) and split the input at that sentinel, rejecting inputs that don't contain the sentinel or that contain it more than once.
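A minimal sketch of the sentinel approach follows. The sentinel bytes and the helper name `SplitAtSentinel` are assumptions chosen for illustration; they are not part of libFuzzer. The helper rejects inputs with zero or multiple sentinels and copies each half into its own vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical 4-byte separator; pick a value unlikely to occur in real payloads.
static const uint8_t kSentinel[] = {0xDE, 0xAD, 0xBE, 0xEF};

// Hypothetical helper: splits the input at the single occurrence of kSentinel.
// Returns false if the sentinel is missing or appears more than once.
bool SplitAtSentinel(const uint8_t* data, size_t size,
                     std::vector<uint8_t>& first,
                     std::vector<uint8_t>& second) {
  const uint8_t* end = data + size;
  const uint8_t* hit =
      std::search(data, end, std::begin(kSentinel), std::end(kSentinel));
  if (hit == end) return false;  // no sentinel at all

  const uint8_t* rest = hit + sizeof(kSentinel);
  if (std::search(rest, end, std::begin(kSentinel), std::end(kSentinel)) != end)
    return false;  // a second sentinel makes the split ambiguous

  first.assign(data, hit);   // bytes before the sentinel
  second.assign(rest, end);  // bytes after the sentinel
  return true;
}
```

A harness could call this helper at the top of `LLVMFuzzerTestOneInput` and return 0 whenever it yields false, then pass the two vectors to the function under test.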

We are considering adding some "standard" libFuzzer API for this task, but so far we haven't come up with anything really good.

More comments inline

On Fri, Apr 13, 2018 at 2:42 AM, Vincent Ulitzsch <vincent....@gmail.com> wrote:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size <= 4)
    return 0;
  // Extract Bytecode length:
  uint8_t codeLen = 

You probably want this to be uint32_t.

Also, you may simply do this: memcpy(&codeLen, data, sizeof(codeLen));

data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
  if (codeLen+4 < size){
     return 0;  
  }
  uint8_t *code = &data[4];
  uint8_t *input = &data[4+codeLen];

Note that this way you will not catch some buffer overflows. For example, if execByteCode reads one byte past the code buffer, it will land in the input buffer, which is part of the same heap allocation, and thus ASan won't flag it.

We usually recommend creating separate heap allocations for these buffers.


  execByteCode(code,codeLen,input,size-codeLen-4);
  return 0;
}
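Putting the inline comments above together, a harness might look like the sketch below. It reads the length as a `uint32_t` with `memcpy`, rejects inputs whose declared code length exceeds the bytes actually available, and copies the code and the data into separate vectors so that each lives in its own heap buffer. The body of `execByteCode` is a placeholder and its signature is an assumption, since the real function is not shown in this thread; the helper name `SplitByLength` is likewise hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Placeholder for the real function under test, which the thread does not
// show; this signature is an assumption.
void execByteCode(const uint8_t* code, size_t codeLen,
                  const uint8_t* input, size_t inputLen) {
  (void)code; (void)codeLen; (void)input; (void)inputLen;
}

// Hypothetical helper: splits [4-byte length][code][input] into two vectors.
// Returns false if the input is too short or the declared length does not fit.
bool SplitByLength(const uint8_t* data, size_t size,
                   std::vector<uint8_t>& code, std::vector<uint8_t>& input) {
  uint32_t codeLen = 0;
  if (size < sizeof(codeLen)) return false;
  std::memcpy(&codeLen, data, sizeof(codeLen));  // length in host byte order

  const uint8_t* payload = data + sizeof(codeLen);
  const size_t payloadSize = size - sizeof(codeLen);
  if (codeLen > payloadSize) return false;  // declared length exceeds input

  code.assign(payload, payload + codeLen);
  input.assign(payload + codeLen, payload + payloadSize);
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Fresh vectors on every run: code and input each own a separate buffer,
  // so a read past the end of one does not silently land in the other.
  std::vector<uint8_t> code, input;
  if (!SplitByLength(data, size, code, input)) return 0;
  execByteCode(code.data(), code.size(), input.data(), input.size());
  return 0;
}
```

The copies cost a little per execution, but they are what lets a one-byte overread of the code buffer show up as a heap-buffer-overflow report.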

But this does not seem satisfactory to me. First, it dismisses a lot of the input libFuzzer generates.

That's fine. An input dismissed very early costs very little.

Second, I have a hunch that it biases the fuzzer towards generating longer files, because the codeLen
values can get really big (though this may well not be true).

Coverage-guided fuzzing will fix this to some extent; I wouldn't worry too much.

Also, libFuzzer tries hard to minimize the inputs anyway.


--
You received this message because you are subscribed to the Google Groups "libfuzzer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to libfuzzer+unsubscribe@googlegroups.com.
To post to this group, send email to libf...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/libfuzzer/dd2946ff-0061-488c-b38f-5c81a4029c77%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Markus Teufelberger

Apr 16, 2018, 9:19:57 AM

to libfuzzer

I thought this was a task for https://github.com/google/libprotobuf-mutator?

That is, one defines all possible input parameters, uses protobuf-mutator to generate the actual inputs, and feeds them to the function that way.

Konstantin Serebryany

Apr 16, 2018, 12:52:11 PM

to Markus Teufelberger, libfuzzer

Yes, protobuf-mutator is another solution, but for many use cases it is too heavyweight.


