Retrieving coverage information in libFuzzer

Jonas Möller

May 19, 2022, 1:12:02 PM

to libf...@googlegroups.com

Hello,

as part of a research project I am currently trying to port Nezha [1], a differential fuzzing framework based on libFuzzer, to a more recent LLVM version. Unfortunately, Nezha uses the deprecated -fsanitize-coverage=trace-pc instrumentation to get coverage information. This has been replaced by -fsanitize-coverage=pc-table (set implicitly by -fsanitize=fuzzer). If I am not mistaken, the new ModulePCTable reports frequency-based coverage information for each PC and module. Is this correct so far?

Furthermore, I want to extend Nezha's coverage metric using execution path information. As far as I can tell, this is not possible with the PC table, since I could not distinguish between the calls "foo-bar-foo" and "foo-foo-bar". Would it instead be possible to hook into the __sanitizer_cov_trace_* callbacks (thus, basically extending TPC.HandleCmp to record execution traces)? Or are there edge cases where the PCTable would record coverage which would not be reported by the __sanitizer_cov_trace_* callbacks (e.g. PCTable reports edge-level coverage while the callbacks only report bb-level coverage)?
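A small standalone C++ sketch makes the ordering problem concrete. It collapses an ordered call trace into per-function hit counts, which is roughly what a counter array such as inline-8bit-counters retains. CountHits is a hypothetical helper for illustration, not a libFuzzer function.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collapse an ordered call trace into per-function hit
// counts. This keeps how often each function ran, but not in which order.
std::map<std::string, int> CountHits(const std::vector<std::string> &trace) {
  std::map<std::string, int> counts;
  for (const std::string &fn : trace) ++counts[fn];
  return counts;
}
```

For the traces foo-bar-foo and foo-foo-bar, both calls return {bar: 1, foo: 2}, so a metric computed only from such counts cannot tell the two paths apart.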

I am grateful for any help or suggestions.

Sincerely,
Jonas

[1] https://github.com/nezha-dt/nezha

Konstantin Serebryany

May 23, 2022, 3:40:47 PM

to Jonas Möller, libfuzzer

Hi Jonas,

On Thu, May 19, 2022 at 10:12 AM Jonas Möller <jo.mo...@tu-braunschweig.de> wrote:

> Hello,
>
> as part of a research project I am currently trying to port Nezha [1], a differential fuzzing framework based on libfuzzer, to a more recent LLVM version.

I am a fan of the Nezha approach!!

>> NEZHA exploits the behavioral asymmetries between multiple test programs to focus on inputs that are more likely to trigger semantic bugs.

(purely based on reading the paper, I haven't actually tried it in full)

> Unfortunately, Nezha uses the deprecated -fsanitize-coverage=trace-pc instrumentation to get coverage information. This has been replaced by -fsanitize-coverage=pc-table (set implicitly by -fsanitize=fuzzer).

pc-table is not an instrumentation; it simply creates a table to be used with either =trace-pc-guard or =inline-8bit-counters.

-fsanitize=fuzzer uses =inline-8bit-counters,pc-table
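To illustrate that pc-table is a static side table rather than instrumentation, here is a minimal sketch of a custom runtime hook. It uses the __sanitizer_cov_pcs_init callback described in the SanitizerCoverage documentation, which receives the table as a flat array of [PC, PCFlags] pairs (bit 0 of PCFlags marks a function entry block). The PCEntry struct and the g_pc_table global are hypothetical names, and the sketch assumes a single instrumented module.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of one pc-table entry.
struct PCEntry {
  uintptr_t pc;
  bool is_function_entry;  // bit 0 of PCFlags
};

// Hypothetical storage; assumes a single instrumented module (DSO).
static std::vector<PCEntry> g_pc_table;

// Called once per module by code emitted for -fsanitize-coverage=...,pc-table.
// [pcs_beg, pcs_end) is a flat array of [PC, PCFlags] pairs, one pair per
// instrumented block. The table only describes which PCs were instrumented;
// it is never updated while the program runs.
extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  assert((pcs_end - pcs_beg) % 2 == 0);  // entries come in pairs
  for (const uintptr_t *p = pcs_beg; p < pcs_end; p += 2)
    g_pc_table.push_back({p[0], (p[1] & 1) != 0});
}
```

The number of pairs matches the number of counters (or guards) in the module, and libFuzzer relies on the i-th pair describing the block that owns the i-th counter; that is what the M.Idx(P) lookup in FuzzerTracePC.cpp does.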

> If I am not mistaken, the new ModulePCTable reports frequency based coverage information for each PC and module. Is this correct so far?

err. Not sure I understand this :(

> Furthermore, I want to extend Nezha's coverage metric using execution path information. As far as I can tell, this is not possible with the PC table, since I could not distinguish between the calls "foo-bar-foo" and "foo-foo-bar".

Right, =inline-8bit-counters,pc-table can't give you paths.

=trace-pc or =trace-pc-guard can.

> Would it instead be possible to hook into the __sanitizer_cov_trace_* callbacks (thus, basically extending TPC.HandleCmp to record execution traces)?

Mmm. That would be a very indirect way to get paths.

I'd rely on =trace-pc-guard instead.
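A minimal sketch of what relying on =trace-pc-guard could look like, assuming a target built with -fsanitize-coverage=trace-pc-guard and linked with this file. The two extern "C" callbacks follow the signatures documented for SanitizerCoverage; g_path is a hypothetical global that records guard IDs in execution order, which is exactly the information a counter array drops.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: guard IDs in the order they were hit for the current input.
static std::vector<uint32_t> g_path;

// Called at startup for each instrumented module; gives every guard
// (one per instrumented edge) a unique non-zero ID.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint32_t next_id = 0;
  if (start == stop || *start) return;  // module already initialized
  for (uint32_t *g = start; g < stop; ++g) *g = ++next_id;  // IDs start at 1
}

// Called on every instrumented edge. Appending, rather than counting, keeps
// the execution order, so foo-bar-foo and foo-foo-bar stay distinguishable.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // a zero guard means "ignore this edge"
  g_path.push_back(*guard);
}
```

A real implementation would reset g_path before each input and cap or hash it, since long executions can produce very long traces; the sketch also ignores thread safety.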

> Or are there edge cases where the PCTable would record coverage which would not be reported by the __sanitizer_cov_trace_* callbacks (e.g. PCTable reports edge-level coverage while the callbacks only report bb-level coverage)?
>
> I am grateful for any help or suggestions.

We are very close to open-sourcing another fuzzing engine where what you want might be easier to achieve.

BTW, implementing something Nezha-like in that engine is on my list :)

Stay tuned for details.

Jonas Möller

May 23, 2022, 8:26:21 PM

to libfuzzer, Konstantin Serebryany

Hello Konstantin,

thanks for your response!

>> If I am not mistaken, the new ModulePCTable reports frequency based
>> coverage information for each PC and module. Is this correct so far?
>
> err. Not sure I understand this :(

Just for clarification: I was referring to the "ModulePCTable" variable in the "TracePC" class (FuzzerTracePC.cpp), and more specifically to the UpdateObservedPCs function:

      for (size_t i = 0; i < NumModules; i++) {
        auto &M = Modules[i];
        for (size_t r = 0; r < M.NumRegions; r++) {
          auto &R = M.Regions[r];
          if (!R.Enabled) continue;
          for (uint8_t *P = R.Start; P < R.Stop; P++)
            if (*P) {
              const PCTableEntry *TE = &ModulePCTable[i].Start[M.Idx(P)];
              Observe(TE);
            }
        }
      }

My interpretation of this code is that each region of a module references a section of an array (using R.Start and R.Stop). Each item in the array (referenced by *P) represents the number of times the respective PC was executed. So if *P == 0, the corresponding PC has not been reached during execution.

Or, expressed as a debug print:

printf("PC: %lu has been called %d time(s)\n", ModulePCTable[i].Start[M.Idx(P)].PC, *P);

So, as you said in your response, I am not able to get execution paths (e.g. foo-bar-foo), but only frequency information (e.g. foo: 2, bar: 1) from this information.
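The interpretation above can be checked against a simplified, single-module version of the UpdateObservedPCs loop. CoveredPCs is a hypothetical helper; it assumes counters holds n inline 8-bit counters and pcs holds the matching 2*n pc-table entries ([PC, PCFlags] pairs), and it returns the PC and hit count of every block that ran.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical single-module version of the UpdateObservedPCs loop.
// counters: n inline 8-bit counters; pcs: the matching pc-table, 2*n entries
// laid out as [PC, PCFlags] pairs. Returns (PC, hit count) for every block
// whose counter is non-zero, i.e. every block that ran at least once.
std::vector<std::pair<uintptr_t, uint8_t>> CoveredPCs(const uint8_t *counters,
                                                      size_t n,
                                                      const uintptr_t *pcs) {
  assert(n == 0 || (counters != nullptr && pcs != nullptr));
  std::vector<std::pair<uintptr_t, uint8_t>> covered;
  for (size_t i = 0; i < n; ++i)
    if (counters[i]) covered.emplace_back(pcs[2 * i], counters[i]);
  return covered;
}
```

The result holds a count per PC but no ordering between PCs, which matches the conclusion that only frequency information is available.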

> pc-table is not an instrumentation, it simply creates a table to be used
> with either =trace-pc-guard or =inline-8bit-counters.
> -fsanitize=fuzzer uses =inline-8bit-counters,pc-table

Slightly off topic: I am a little confused by this. I found that trace-pc-guard has been deprecated since commit 50a1c697127749eec567d14819d549b63af1242f and has been replaced with pc-table. Unfortunately, I have not found a reason for this switch. If both can be used (seemingly interchangeably), why was pc-table preferred?

As far as I understand your explanation, it is possible to use pc-table and trace-pc-guard in conjunction, but trace-pc-guard cannot be used with inline-8bit-counters? Or, in other words, both trace-pc-guard and inline-8bit-counters can be used to populate the pc-table?

>> Would it instead be possible to hook into the __sanitizer_cov_trace_*
>> callbacks (thus, basically extending TPC.HandleCmp to record execution
>> traces)?
>
> Mmm. That would be a very indirect way to get paths.
> I'd rely on =trace-pc-guard instead.

Since libFuzzer currently uses inline-8bit-counters (and this method is incompatible with trace-pc-guard), wouldn't this require a sizeable rewrite of the tracing logic to populate the pc-table? I would like to avoid rewriting large sections and keep the changes to a minimum. Or is there something I am missing?

> We are very close to open-sourcing another fuzzing engine where what you
> want might be easier to achieve.
> BTW, implementing something Nezha-like in that engine is on my list :)
> Stay tuned for details.

I am definitely curious about the result :)

Sincerely,

Jonas

PS: I hope this reaches the mailing list correctly and does not create a new thread.

Konstantin Serebryany

May 25, 2022, 12:45:02 PM

to Jonas Möller, libfuzzer

On Mon, May 23, 2022 at 5:26 PM Jonas Möller <jo.mo...@tu-braunschweig.de> wrote:

> So, as you said in your response, I am not able to get execution paths (e.g. foo-bar-foo), but only frequency information (e.g. foo: 2, bar: 1) from this information.

Right. These are counters and nothing more.

>> pc-table is not an instrumentation, it simply creates a table to be used
>> with either =trace-pc-guard or =inline-8bit-counters.
>> -fsanitize=fuzzer uses =inline-8bit-counters,pc-table
>
> A little bit off topic: I am a little bit confused by this. I found that trace-pc-guard has been deprecated since commit 50a1c697127749eec567d14819d549b63af1242f and has been replaced with pc-table. Unfortunately, I have not found a reason for this switch.

Performance.

If we only need the counters, then using inline-8bit-counters is much faster.

trace-pc-guard was removed from -fsanitize=fuzzer, but it's not deprecated as an instrumentation mechanism available in SanitizerCoverage.

> If both can be used (seemingly interchangeably) why was pc-table preferred?
>
> As far as I understand your explanation, it is possible to use pc-table and trace-pc-guard in conjunction, but trace-pc-guard cannot be used with inline-8bit-counters?

It is entirely possible to use both at the same time.

% cat cov.cc
#include <stdio.h>
__attribute__((noinline))
void foo() { printf("foo\n"); }

int main(int argc, char **argv) {
  if (argc == 2)
    foo();
  printf("main\n");
}
% clang -O2 -fsanitize-coverage=inline-8bit-counters,trace-pc-guard,pc-table cov.cc
% objdump -d a.out | grep -A 20 main.:
00000000004280b0 <main>:
4280b0: 53 push %rbx
4280b1: 89 fb mov %edi,%ebx
4280b3: bf 84 fb 43 00 mov $0x43fb84,%edi
4280b8: e8 43 1b ff ff call 419c00 <__sanitizer_cov_trace_pc_guard> <<<<< trace-pc-guard
4280bd: 80 05 cd 7a 01 00 01 addb $0x1,0x17acd(%rip) # 43fb91 <__start___sancov_cntrs+0x1> <<<<< inline-8bit-counters

libFuzzer currently doesn't do it, but nothing prevents you from doing it.

But, if you already inject calls into the code via trace-pc-guard, there is little point in using inline-8bit-counters
because you can increment counters inside the trace-pc-guard callback.
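A sketch of that idea: the trace-pc-guard callback itself maintains one 8-bit counter per guard, so inline-8bit-counters would add nothing. g_counters is a hypothetical global, guard IDs are assigned starting at 1 as in the SanitizerCoverage documentation example, and the code is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: one 8-bit counter per guard ID (index 0 is unused).
static std::vector<uint8_t> g_counters;

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint32_t next_id = 0;
  if (start == stop || *start) return;  // module already initialized
  for (uint32_t *g = start; g < stop; ++g) *g = ++next_id;  // IDs start at 1
  g_counters.resize(next_id + 1, 0);  // grow to cover this module's IDs
}

// Counts hits per guard inside the callback, which is the job
// inline-8bit-counters would otherwise do with an inline increment.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  uint32_t id = *guard;
  if (!id) return;  // a zero guard means "ignore this edge"
  assert(id < g_counters.size());
  ++g_counters[id];  // wraps modulo 256, like the inline `addb $0x1`
}
```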

> Or in other words, both trace-pc-guard and inline-8bit-counters can be used to populate the pc-table?

pc-table is populated at compile/link time, and is static information about the PCs that are instrumented by either (or both) of trace-pc-guard and inline-8bit-counters.

>>> Would it instead be possible to hook into the __sanitizer_cov_trace_*
>>> callbacks (thus, basically extending TPC.HandleCmp to record execution
>>> traces)?
>> Mmm. That would be a very indirect way to get paths.
>> I'd rely on =trace-pc-guard instead.
> Since libFuzzer currently uses inline-8bit-counters (and this method is incompatible with trace-pc-guard), wouldn't this require a sizeable rewrite of the tracing logic to populate the pc-table? I would like to avoid rewriting large sections and keep the changes to a minimum. Or is there something I am missing?

It should be possible to make relatively local modifications in libFuzzer to support trace-pc-guard.
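As one illustration of such a local change, a recorded guard path could be turned into path-sensitive features by treating each pair of consecutively executed guard IDs as a feature. This encoding is an assumption chosen for the example, not a description of libFuzzer's or Nezha's feature set; BigramFeatures is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical path feature: each pair of consecutively executed guard IDs
// is packed into one 64-bit value (previous ID in the high half).
std::set<uint64_t> BigramFeatures(const std::vector<uint32_t> &path) {
  std::set<uint64_t> features;
  for (size_t i = 1; i < path.size(); ++i)
    features.insert((uint64_t{path[i - 1]} << 32) | path[i]);
  return features;
}
```

With foo = 1 and bar = 2, foo-bar-foo yields the pairs (foo, bar) and (bar, foo), while foo-foo-bar yields (foo, foo) and (foo, bar), so the two traces produce different feature sets. Longer windows separate more paths but also create many more features.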