Auto-generating fuzzers from normal functions and methods

thepudds

Sep 25, 2019, 6:55:46 AM
to golang-fuzzing-proposal
Hi there,

Starting with a summary for the busy reader...

Summary:

  * You could imagine more than one community tool for making it easy to generate fuzz functions that a user would then check in. 

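  * Rich signature support (fuzz functions that accept typed parameters such as strings, ints, or structs, rather than only a []byte) makes that easier.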

  * Some of those tools might generate many fuzz functions.

  * The "first class fuzzing" proposal should ideally be reasonably graceful in the face of many fuzz functions across many packages.

Additional Details:

As a companion mini-project, I have a first cut of a 'genfuzzfuncs' utility.  This is something that I don't think should be part of 'cmd/go' or part of the "first class support for fuzzing" proposal, but it is a simple example of what other people might build as community tools if that proposal moves forward. 

'genfuzzfuncs' is similar in spirit to something like 'cweill/gotests' (which automatically generates table-driven test functions based on the user's functions). 'genfuzzfuncs' automatically outputs a bunch of fuzz functions for the user's APIs based on a user-supplied package name and an optional func regex (defaulting to targeting public APIs only). It relies on the current fzgo support for fuzzing rich signatures.

For example, if you run genfuzzfuncs against github.com/google/uuid, it generates a uuid_fuzz.go file with 30 or so functions like:

    func Fuzz_UUID_MarshalText(u1 uuid.UUID) {
        u1.MarshalText()
    }

    func Fuzz_UUID_UnmarshalText(u1 *uuid.UUID, data []byte) {
        if u1 == nil {
            return
        }
        u1.UnmarshalText(data)
    }
    
The full set of auto-generated uuid fuzzing functions is here: https://github.com/thepudds/fzgo/blob/master/genfuzzfuncs/examples/uuid/uuid_fuzz.go

The intent is to provide a quick way to generate a bunch of fuzz functions. Whereas 'cweill/gotests' generates empty skeletons, 'genfuzzfuncs' tries to generate fuzz functions that are runnable immediately. If needed, they can be edited (deleting something that shouldn't be fuzzed, adding a bit of validation logic, checking invariants or other small bits of additional smarts, merging some functions, etc.), and then the theory is that you would check in the results to use with cmd/go's fuzzing support.

You can then fuzz them by name, or run many at once across all the packages in your project via something like:

   fzgo test -fuzz=. ./...

While I don't think something like 'genfuzzfuncs' would be part of cmd/go, one relationship to the "first class fuzzing" proposal is that people might end up with 10s or 100s of fuzz functions without much manual toil, so it is probably important for the proposal to handle many fuzz functions reasonably well. More concretely, the ability to fuzz many functions in one invocation is important (the proposal now includes that). I also think (arguably) that it relates to the questions around "how should >1 corpus location be managed", including (even more arguably) whether it is easy to transition between different corpus locations when you have 10s or 100s of fuzz functions that might be in different directory hierarchies across 10s or more packages within your project.

   
In that example, it is fuzzing 12 or so modules-related functions. A human might not take the time to hand-write a fuzz function or set of fuzz functions for all of those functions.

But if it takes almost no human time to create those, then why not? Especially if the end result is a hardening of the API, or in some cases at least double-checking that any "expected" panics from invalid input are documented. (For example, if you run 'genfuzzfuncs' against the stdlib strings package, it immediately finds that strings.NewReplacer panics if given an odd number of strings in its variadic oldnew parameter. If you were the author of that package, you could then make sure that behavior is documented, and then manually add the 2-3 lines to the auto-generated fuzz function to return gracefully when handed an odd-length slice.)

In any event, this is mainly just some food for thought at this point... but the combination of first class cmd/go support + rich signature support + some community tools to make it easy to autogenerate fuzz functions might be part of what makes it very easy to get started, helps people get hooked and see the value, and helps transition fuzzing from being thought of as a security tool to being a commonly used tool for "normal" developers. (It might be hard for fuzzing to be as common as unit testing, but maybe fuzzing can at least get to the point of being thought of as in the same ballpark as unit testing?)

Regards,
thepudds

thepudds

Sep 25, 2019, 8:23:45 AM
to golang-fuzzing-proposal
A few more quick points about 'genfuzzfuncs' as it stands.

The details in this post are maybe less directly applicable to the "first class fuzzing proposal" discussion (aside from the point that if this is easy, people might end up with many packages with fuzz functions).

The examples in the prior post are from the version of 'genfuzzfuncs' that I posted to the fzgo repo a few months ago, as are points 1-2 below, though points 3-4 are about a WIP local version I have.

1. You can optionally ask genfuzzfuncs to hunt for a suitable constructor to use if you are fuzzing a method. For example, if asked to generate fuzz functions for the stdlib strings package with the constructor option enabled, it finds and uses the strings.NewReplacer constructor to help fuzz the Replacer.Replace method:

    func Fuzz_Replacer_Replace(oldnew []string, s string) {
        r := strings.NewReplacer(oldnew...)
        r.Replace(s)
    }

In that example, it automatically created a "rich signature" function that merged the parameters needed for the constructor (oldnew []string in this example) with the parameters needed for the method (s string in this example).

That is helpful, for example, if a user's API takes a struct that has only private fields, but there is a constructor for that struct that takes more easily fuzzable primitive parameters.
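To make the private-fields case concrete, here is a sketch with a hypothetical Window type whose only fields are unexported, plus a constructor that takes a plain int. The merged wrapper takes the constructor's parameter (size) and the method's parameter (v) together, in the same style as Fuzz_Replacer_Replace above. Window, NewWindow, and fuzzWindowAdd are made up for illustration, and the invariant check is the kind of line a human might add by hand.

```go
package main

import "errors"

// Window is a hypothetical API type with only unexported fields, so a
// fuzzer cannot usefully construct one directly.
type Window struct {
	size int
	vals []int
}

// NewWindow is its constructor, which takes an easily fuzzable int.
func NewWindow(size int) (*Window, error) {
	if size <= 0 {
		return nil, errors.New("size must be positive")
	}
	return &Window{size: size}, nil
}

// Add appends v, keeping at most size of the most recent values.
func (w *Window) Add(v int) {
	w.vals = append(w.vals, v)
	if len(w.vals) > w.size {
		w.vals = w.vals[len(w.vals)-w.size:]
	}
}

// Len reports how many values are currently held.
func (w *Window) Len() int { return len(w.vals) }

// fuzzWindowAdd merges the constructor's parameter (size) with the
// method's parameter (v), mirroring the Fuzz_Replacer_Replace pattern.
func fuzzWindowAdd(size, v int) {
	w, err := NewWindow(size)
	if err != nil {
		return // constructor rejected the input; nothing to fuzz
	}
	w.Add(v)
	// Hand-added invariant check.
	if w.Len() > size {
		panic("window holds more values than its size")
	}
}
```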

2. By default, the autogenerated fuzz functions check for nil for pointer parameters (for example, to avoid panicking on nil receivers by default), though that can be turned off. It is an example of a knob that is probably useful, but maybe more importantly, the total number of "probably useful knobs" is large enough that it wouldn't make sense for this to be part of cmd/go.
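One plausible way a generator could emit those default nil checks, sketched with the standard go/parser and go/ast packages: look up a function or method by name and emit an early return for each named pointer-typed receiver or parameter. The helper nilGuards and its exact output format are assumptions, not genfuzzfuncs's code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// nilGuards is a hypothetical helper: given Go source and the name of a
// function or method in it, it returns statements that return early when
// any pointer-typed receiver or parameter is nil.
func nilGuards(src, funcName string) (string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "input.go", src, 0)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != funcName {
			continue
		}
		// Consider the receiver (if any) first, then the parameters.
		var fields []*ast.Field
		if fn.Recv != nil {
			fields = append(fields, fn.Recv.List...)
		}
		fields = append(fields, fn.Type.Params.List...)

		var b strings.Builder
		for _, field := range fields {
			if _, isPtr := field.Type.(*ast.StarExpr); !isPtr {
				continue
			}
			// Unnamed parameters (e.g. func f(*T)) have no Names and get no guard.
			for _, name := range field.Names {
				fmt.Fprintf(&b, "if %s == nil {\n\treturn\n}\n", name.Name)
			}
		}
		return b.String(), nil
	}
	return "", fmt.Errorf("function %q not found", funcName)
}
```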

3. While supporting arbitrary interfaces in a function signature is not a goal, it is a goal to support at least some of the most common stdlib interfaces that could be automatically supported. For example, I have most of the pieces in place to handle a user's signature that contains an io.Reader or io.Writer (e.g., passing ioutil.Discard inside the generated wrapper if the user's API takes an io.Writer, though that approach is maybe a bit debatable).

4. I also have most of the pieces working for an option that asks genfuzzfuncs to generate calls to a set of functions or methods under test inside a loop with a big switch statement, where the fuzzer in effect controls the sequence of calls. That could help automatically find problematic sequences of function or method calls, and could be a simplified but automated way to generate something like SMAT (e.g., https://github.com/mschoch/smat/blob/master/examples/bolt/boltsmat.go). The intent is also to support an additional option to invoke the calls in separate goroutines under the race detector, though that doesn't currently work with go-fuzz. That would not be super sophisticated, but could be an easy way for a user to get a first cut quickly. (Side note: this might imply an additional API for rich signatures to make that convenient, but that is probably also better as a topic for another thread.)
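A minimal sketch of the loop-with-a-switch idea, assuming a hypothetical Stack type as the API under test: each input byte selects the next operation, and a following byte (if any) supplies its argument, so the fuzzer controls both the call sequence and the arguments. After every step the sketch compares the stack against an independently tracked expected length. SMAT and similar tools are considerably more sophisticated; this only shows the basic shape.

```go
package main

// Stack is a hypothetical API under test.
type Stack struct{ items []byte }

func (s *Stack) Push(v byte) { s.items = append(s.items, v) }

func (s *Stack) Pop() (byte, bool) {
	if len(s.items) == 0 {
		return 0, false
	}
	v := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

func (s *Stack) Len() int { return len(s.items) }

// fuzzStackSequence interprets data as a sequence of calls and returns the
// number of operations executed.
func fuzzStackSequence(data []byte) int {
	var s Stack
	model := 0 // expected length, tracked independently of Stack
	ops := 0
	for len(data) > 0 {
		op := data[0] % 3
		data = data[1:]
		switch op {
		case 0: // Push, consuming one more byte as the argument if present
			var v byte
			if len(data) > 0 {
				v, data = data[0], data[1:]
			}
			s.Push(v)
			model++
		case 1: // Pop must succeed exactly when the model is non-empty
			_, ok := s.Pop()
			if ok != (model > 0) {
				panic("Pop result disagrees with model")
			}
			if ok {
				model--
			}
		case 2: // Len
			_ = s.Len()
		}
		ops++
		if s.Len() != model {
			panic("stack length diverged from model")
		}
	}
	return ops
}
```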

Regards,
thepudds

Dmitry Vyukov

Oct 23, 2019, 9:46:10 AM
to thepudds, golang-fuzzing-proposal
This is very cool!

Have you tried running it on the stdlib? If yes, what are the results?
Finding new bugs in the stdlib is always a good demo of a tool :)

I agree that this should be an independent layer on top of go-fuzz (or go tool).
I agree that structured inputs are especially helpful here.
And I agree that multiple fuzz functions should be supported
gracefully to move fuzzing closer to unit testing (you have lots of
unit tests and you are not restricted to running 1 test at a time, you
just say "run the tests").

Why is the constructor mode optional?

I agree that generating nil checks is probably the right thing. Since
it's making assumptions and will have some false positives, the right
way to look at it may be as follows: it generates as much code as
possible, maybe with blocks with some additional checks, maybe with
some TODOs for a human (e.g. // TODO: delete this if the method should
handle nil), and then a human looks at it and deletes excessive code
(deleting code is easier than writing!). Then maybe it can avoid some
of the knobs?

It would be super awesome if it could detect round-trip functions
(like UUID.MarshalText/UnmarshalText) and generate round-trip fuzz
tests. Not sure how hard it is to do, but could be tuned to work at
least on main things from stdlib (compress/encoding).
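A minimal sketch of such a round-trip check, written against the standard encoding.TextMarshaler and encoding.TextUnmarshaler interfaces so it applies to any type implementing both. It uses *big.Int as a stand-in target; checkTextRoundTrip, textCodec, and newBigInt are hypothetical names. Inputs that fail to unmarshal are skipped. For valid inputs, the value is marshaled, unmarshaled into a fresh value, and marshaled again, and the two encodings must match (comparing against the raw input instead would reject valid non-canonical inputs such as "+7").

```go
package main

import (
	"bytes"
	"encoding"
	"errors"
	"math/big"
)

// textCodec is any type that can both marshal and unmarshal text.
type textCodec interface {
	encoding.TextMarshaler
	encoding.TextUnmarshaler
}

// checkTextRoundTrip returns an error only if a valid input fails to
// round-trip stably through MarshalText/UnmarshalText.
func checkTextRoundTrip(data []byte, newVal func() textCodec) error {
	v1 := newVal()
	if err := v1.UnmarshalText(data); err != nil {
		return nil // not a valid encoding; nothing to check
	}
	enc1, err := v1.MarshalText()
	if err != nil {
		return err
	}
	v2 := newVal()
	if err := v2.UnmarshalText(enc1); err != nil {
		return errors.New("re-unmarshal of own output failed: " + err.Error())
	}
	enc2, err := v2.MarshalText()
	if err != nil {
		return err
	}
	if !bytes.Equal(enc1, enc2) {
		return errors.New("round trip is not stable")
	}
	return nil
}

// newBigInt is a stand-in target; *big.Int implements both interfaces.
func newBigInt() textCodec { return new(big.Int) }
```

In a fuzz function, a non-nil result from checkTextRoundTrip would be reported as a failure.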

Another idea looking at uuid_fuzz.go. If I were writing a fuzz
function for that package I would try to combine as many methods in a
single Fuzz function as possible. E.g. Unmarshal and then call all of
the accessors. This would both help to reduce the number of fuzz
functions and help with coverage guiding (one method helps
to progress with another, then vice versa and so on). Though, an
obvious problem is that some methods may already receive the object in
some limited states (rather than the full generality of what the fuzzer
could generate). Perhaps your idea of calling different methods in a
loop is a better solution for this.
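A sketch of that "unmarshal once, then call all the accessors" shape. To stay within the standard library it uses net/url rather than github.com/google/uuid; fuzzURLAccessors is a hypothetical name, and returning the hostname is only there so the sketch is easy to check.

```go
package main

import "net/url"

// fuzzURLAccessors parses the input once and then exercises many accessors
// on the same parsed value, so progress on one method can help coverage of
// the others.
func fuzzURLAccessors(s string) (string, bool) {
	u, err := url.Parse(s)
	if err != nil {
		return "", false // not a parseable URL; nothing more to exercise
	}
	_ = u.Port()
	_ = u.EscapedPath()
	_ = u.EscapedFragment()
	_ = u.IsAbs()
	_ = u.Query()
	_ = u.RequestURI()
	_ = u.Redacted()
	_ = u.String()
	return u.Hostname(), true
}
```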

t hepudds

Dec 17, 2021, 5:55:21 AM
to Dmitry Vyukov, golang-fuzzing-proposal
Hi Dmitry,

> "Have you tried to run it on the stdlib?"

Sorry for the 2+ year delay in answering your question, but yes, I did run fzgo/genfuzzfuncs on stdlib, which resulted in 2k+ fuzzing targets.

Hopefully more interesting though:

I recently took the project off pause, updated it to work with cmd/go native fuzzing, and renamed it 'fzgen':
   
   https://github.com/thepudds/fzgen
   
It currently supports:

   * automatically finding problematic call sequences by looping over calls, with the sequence & args under the control of the fuzzer.
      -- this was part of what I was describing in this thread here a couple of years ago.
   * can automatically wire outputs to inputs and reuse input args across API calls.
   * can emit a valid chunk of Go code as a reproducer for a given crasher (currently best effort).
   * auto-gen fuzzing wrappers for Go 1.18.
   * supports rich types (e.g., structs, maps, some common interfaces like io.Reader, etc.), even with Go 1.18.
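To illustrate the "wire outputs to inputs" idea in the list above, here is a hedged sketch in which values returned by earlier calls go into a pool, and later calls pick their arguments from that pool by fuzzer-chosen index. It uses *bytes.Buffer purely as a stand-in API, and fuzzBufferChain is a hypothetical name; this is not how fzgen is implemented.

```go
package main

import "bytes"

// fuzzBufferChain interprets data as a chain of calls. Results of earlier
// calls (*bytes.Buffer values) are kept in a pool and reused as inputs to
// later calls, chosen by fuzzer-controlled indices. It returns the pool.
func fuzzBufferChain(data []byte) []*bytes.Buffer {
	var pool []*bytes.Buffer
	// next consumes one byte of input, returning 0 once data is exhausted.
	next := func() byte {
		if len(data) == 0 {
			return 0
		}
		b := data[0]
		data = data[1:]
		return b
	}
	for len(data) > 0 {
		switch next() % 3 {
		case 0: // constructor: its output joins the pool
			pool = append(pool, new(bytes.Buffer))
		case 1: // WriteByte on a pooled buffer
			if len(pool) > 0 {
				dst := pool[int(next())%len(pool)]
				dst.WriteByte(next())
			}
		case 2: // write one pooled buffer's contents into another (possibly itself)
			if len(pool) > 0 {
				src := pool[int(next())%len(pool)]
				dst := pool[int(next())%len(pool)]
				dst.Write(src.Bytes())
			}
		}
	}
	return pool
}
```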

Three quick examples from fzgen if interested:

   1. Finding a data race by looping over a chain of calls under the control of the fuzzer (bug is in toy code):
         https://github.com/thepudds/fzgen#example-easily-finding-a-data-race
     
   2. Using the same automatic technique to report a deadlock (this time in real code):
         https://github.com/thepudds/fzgen#example-finding-a-real-concurrency-bug-in-real-code
     
   3. Automatic round-trip checks (because uuid.UUID from github.com/google/uuid implements encoding.BinaryMarshaler/TextMarshaler):
         https://github.com/thepudds/fzgen/blob/master/examples/outputs/google-uuid/autofuzzchain_test.go#L118
     
It is still WIP, but I am hoping to have some time to move fzgen forward from here, both in terms of the code gen and the runtime fuzzing behavior, which are designed to work together.

I would be very interested in any quick/brief feedback (from you or anyone else on this list).

Best regards,
thepudds

Dmitry Vyukov

Dec 20, 2021, 6:12:09 AM
to t hepudds, golang-fuzzing-proposal
On Fri, 17 Dec 2021 at 11:55, t hepudds <thepud...@gmail.com> wrote:
>
> Hi Dmitry,
>
> > "Have you tried to run it on the stdlib?"
>
> Sorry for the 2+ year delay in answering your question, but yes, I did run fzgo/genfuzzfuncs on stdlib, which resulted in 2k+ fuzzing targets.

Hi,

I've lost the context a bit :)

I've tried to run it on some of my code and it generated something
reasonable-looking:

    func Fuzz_Reporter_ContainsCrash(f *testing.F) {
        f.Fuzz(func(t *testing.T, data []byte) {
            var cfg *mgrconfig.Config
            var output []byte
            fz := fuzzer.NewFuzzer(data)
            fz.Fill(&cfg, &output)
            if cfg == nil {
                return
            }

            reporter, err := report.NewReporter(cfg)
            if err != nil {
                return
            }
            reporter.ContainsCrash(output)
        })
    }

    func Fuzz_GetLinuxMaintainers(f *testing.F) {
        f.Fuzz(func(t *testing.T, kernelSrc string, file string) {
            report.GetLinuxMaintainers(kernelSrc, file)
        })
    }

Nice!

Katie Hockman

Dec 20, 2021, 10:51:49 AM
to Dmitry Vyukov, t hepudds, golang-fuzzing-proposal
This is extremely cool. I've shared it with the rest of the Go team, who are also interested in fuzzing.
I'll likely have some questions/suggestions after I've had more time to look at it, but wanted to let you know how excited I was to see this!
