List of possible modifications to the March 2017 proposal

t hepudds

unread,

May 8, 2019, 4:39:13 PM5/8/19

to Golang Fuzzing

Hello there,

Here is a list of some possible modifications for the March 2017 proposal.

Some of these won't make sense or are smarter to defer, but I wanted to at least attempt to enumerate a partial list of topics for discussion.

This list is based on some personal notes I had built up, and hence this list a bit rough.

For reference, the March 2017 proposal itself is here: https://docs.google.com/document/u/1/d/1zXR-TFL3BfnceEAWytV8bnzB2Tfp6EPFinWVJ5V4QC8/pub

===================================================================

Possible modifications to the March 2017 Proposal

===================================================================

-------------------------------------------------------------------

1. Include -fuzztime?

-------------------------------------------------------------------

Some of the commentators at #19109 suggested -fuzztime duration as a way of controlling when to stop fuzzing. The proposal document does not include -fuzztime and go-fuzz does not support it, but it seems useful in general and -fuzztime is in the fzgo prototype (and it proved useful while testing the prototype).

-------------------------------------------------------------------

2. Change the instrumented cache location: GOPATH/pkg/____?

-------------------------------------------------------------------

The March 2017 proposal suggested GOPATH/pkg/GOOS_GOARCH_fuzz for a cache, but the fzgo prototype instead uses GOPATH/pkg/fuzz/GOOS_GOARCH. That seemed closer to Go 1.11's new GOPATH/pkg/mod, for example.

-------------------------------------------------------------------

3. Allow multiple fuzz functions to match?

-------------------------------------------------------------------

The March 2017 proposal says:

"-fuzz flag must match one and only one fuzz function. This is different from -run and -bench flags which allow multiple matches. It's unclear what would be the behavior when multiple fuzz functions are selected, so for now we restrict it to only one function. Selecting multiple packages is not supported either. The restriction can be removed when/if we figure out a sane behavior for this."

Initially, the fzgo prototype disallowed multiple fuzz functions to match (per the March 2017 proposal), but as an experiment fzgo now allows multiple fuzz functions to match in order to support something like 'go test -fuzz=. ./...' when there are multiple fuzz functions across multiple packages. Fuzzing happens in a round-robin manner if multiple fuzz functions match. Each matching fuzz function initially has a short turn, and then progressively longer turns executing.

Some related discussion in:

go-fuzz issue #230: "make it easier to fuzz multiple functions"

https://github.com/dvyukov/go-fuzz/issues/230

This one seems fine to postpone, though I suspect the majority of work to implement it is required by the suggested behavior in the proposal for `go test ./...` (without a -fuzz flag specified) to generate new inputs for "1s in normal mode" given `go test ./...` can match multiple packages with Fuzz functions in that scenario (which is partly why the fzgo prototype supports multiple fuzz functions matching the -fuzz argument).

-------------------------------------------------------------------

5. Running go test . shouldn't dirty your VCS status

-------------------------------------------------------------------

Some discussion:

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-479247350

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-479393958

Dmitry wrote there: "This sounds reasonable. I would say a requirement."

-------------------------------------------------------------------

6. Should `go test` without `-fuzz` ever be non-deterministic?

-------------------------------------------------------------------

It seems reasonable to expect non-determinism if you explicitly ask for fuzzing, such as by supplying `-fuzz`.

However, if you are just doing something like `go test .` (without a `-fuzz` flag), then any non-determinism should be carefully evaluated given people might not expect non-determinism in that situation.

One example would be in a normal (non-fuzz) invocation of something like `go test ./...`. If that fails in something like travis, can you repeat that failure?

There are multiple forms of non-determinism, including:

6a. Non-determism from `rand` or similar

The March 2017 proposal says:

"go test runs fuzz functions as unit tests. ... Fuzz functions are executed on all inputs from the corpus and on some amount of newly generated inputs (for 1s in normal mode and for 0.1s in short mode)"

If there are newly generated inputs, are those random inputs? Or deterministic in some way? Or random, but with a seed printed if there is a failure (so that the same seed can be re-used to recreate the failure if needed)?

6b. Execution time-based non-determinism

If the execution time is "1s in normal mode and for 0.1s in short mode", does that mean you might get a failure on a powerful machine that does not occur within the same wall clock time on a less powerful machine?

-------------------------------------------------------------------

7. Should the proposal include a richer Fuzz function signature, or wait on that?

-------------------------------------------------------------------

For example, should the propsoal for the first version to include something like:

FuzzRegexp(f *testing.F, re string, data []byte, posix bool)

Or is it better to wait and not propose that as part of the first version?

The proposal currently suggests the signature as:

"Fuzz function accepts two arguments: (*testing.F, data []byte)."

and also says:

"The fuzz function signature can later be allowed to accept multiple randomly-generated arguments of different types. ...

But multiple values can be extracted from the byte slice, just with more work and leading to worser fuzzer efficiency (as it will not understand the structure of the input). So such support is explicitly not part of this proposal."

Some related discussion:

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-471690355

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-471695939

-------------------------------------------------------------------

8. Change how the paths are created in <pkgpath>/testdata, and/or when -fuzzdir is specified?

-------------------------------------------------------------------

There might be some minor tension between making things simple for something like oss-fuzz vs. making things friendly for interactive execution by a human, but that tension is hopefully solvable.

Some related discussion in:

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-472090597

and the next few comments

including one possible approach:

* an empty -fuzzdir could imply <pkgpath>/testdata/fuzz/FuzzFunc/corpus (so by default, the package's testdata directory is used, and there is no redundancy on the package name)

* a non-empty -fuzzdir such as -fuzzdir=/tmp/fuzz could mean it creates directories corresponding to the package path (e.g., /tmp/fuzz/example.com/full/package/path/FuzzFunc/corpus).

or another possible approach:

* Another option is to just -fuzzdir as is and don't append anything. Then it's implied to be user (script, infrastructure) responsibility to pass proper paths in whatever convention they want. Then we can drop package name from testdata.

-------------------------------------------------------------------

9. Should the proposal address how a corpus repo is found?

-------------------------------------------------------------------

Large projects will likely have a separate corpus repo, or perhaps even store the corpus outside of VCS.

How should a separate corpus repo be found?

Convention? Automatically walk directories to find a sibling? A new env variable? Does the GOFLAGS env variable help? Do modules help?

Some discussion:

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-479393430

-------------------------------------------------------------------

10. Source-to-source transform vs. compiler-based instrumentation?

-------------------------------------------------------------------

Primary discussion in:

https://github.com/golang/go/issues/29430

Does it make sense for the proposal to suggest a perhaps more likely to be accepted approach of source-to-source implementation to start, and save modifying the compiler for possible future work? The results would not be as good and might have more corner cases, so perhaps not.

Now that the new 1.13 `-trimpath` support landed in #16860, a related question from that same issue in comment:

https://github.com/golang/go/issues/29430#issuecomment-485797818

"I think I might understand how that helps with caching the build results of the transformed code (where that transform code might be in a temporary directory).

When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future go test -fuzz=.), then that piece of code could directly use the cmd/go/internal/cache internal API to cache the s2s transform results?"

-------------------------------------------------------------------

11. Allow multiple corpus locations?

-------------------------------------------------------------------

This is a longer topic, but wanted to at least record the topic itself.

Dmitry wrote:

"I wonder if it's a good idea to instead allow 2/2+ directories with input corpus?

For example, if we read inputs from testdata/something/something, but also from -fuzzdir/-workdir if provided. Then testdata/ could contain hand-written inputs and regression tests and is checked-in with the code (that's small number of higher-quality inputs with low churn, so no different from unit-tests and makes sense to check-in). The second dir can contain the random inputs, there are more of them and high churn. So that is preferably checked-in somewhere else (stored in an archive or something else). Then workflow would be to simply copy the crashing input from the second dir into testdata/ and run go test -run=file_name (if the auto-generated regression test uses t.Run then this will work auto-magically)."

That quote is from:

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-478467743

And then the probably too enthusiastic response:

https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-479247350

-------------------------------------------------------------------

END

-------------------------------------------------------------------

I actually have a slightly longer list that is a bit rougher. I will hopefully return to that to try to post later, but wanted to at least post this partial list...

Regards,

thepudds

Josh Bleecher Snyder

unread,

May 8, 2019, 4:51:00 PM5/8/19

to t hepudds, Golang Fuzzing

> 7. Should the proposal include a richer Fuzz function signature, or wait on that?

The Go project strongly prefers stability; to the extent possible,
questions like this should be prototyped and answered externally.

So I think we should continue to improve go-fuzz where it is, and at
least try to answer this question in situ.

> "I think I might understand how that helps with caching the build results of the transformed code (where that transform code might be in a temporary directory).
> When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future go test -fuzz=.), then that piece of code could directly use the cmd/go/internal/cache internal API to cache the s2s transform results?"

Your understanding is correct: Adding -trimpath would speed up the 'go
build' part.

Right now, the instrumentation itself is pretty cheap and fast.
(Although I do have ideas for deeper analysis I'd like to do that
would change that.) So while it would be nice to be able to cache
instrumented code, I don't think it is a priority.

The other significant build time cost is in go/packages. I believe
that the tools team is actively working on optimizing go/packages.

I'm not sure that this is an area that needs more attention in the
near term, once the -trimpath piece is sorted out. I have a local
change that checks the go version and adds -trimpath as appropriate; I
will turn it into a PR once I can confirm it actually moves the
needle, which it does not right now, due to a bug.

> That quote is from:
> https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-478467743
>
> And then the probably too enthusiastic response:
> https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-479247350

FWIW, I think multiple corpus locations is a great idea. And there is
no need to apologize for your enthusiasm. :)

-josh

Romain Baugue

unread,

May 8, 2019, 6:22:53 PM5/8/19

to Golang Fuzzing

Here are a few thoughts while reading your list:

1. Completely agree with this. It seems very useful, given that in an automated setting one cannot Ctrl+C the fuzzer to stop it.

2. Anything goes for now as long as it's coherent I think. It may change later when (if) integrating everything into the go tool, but for now….

3. This seems a core feature of what we want to achieve. It's not immediately critical, but it shouldn't be too low on the priority list given its huge impact on the inner-working of the tool: I think the way we handle multiple fuzzing functions may be the biggest issue at hand.

4. You're missing and entry here :D

5. Agree with that.

6. I would add that in this case, fuzzs will behave the same way as benchmarks: non of them are deterministic, and this is exactly why they are not the default (in addition to the time consumption thing).

7. I think the way multiple arguments is handled could mean big changes in the final implementation, so even if this is not the very first thing to try, I would at least try a few solutions in the prototype and include it in the proposal, and have it ready for the first version (hopefully).

8, 9. I would postpone this for later I think, it doesn't seem like it would be very complex to change later and actually working on (or with) the prototype may yield a convention that will become obvious.

10. Not sure how this works exactly as I'm just not literate enough on the subject. I'll have a look at how go-fuzz works internally in the first place before saying anything.

11. I think we should not need multiple corpus locations: either there is a real need because both corpus are meant to be run differently, or have different purposes, in which case it also makes sense to run the tool twice, or there is none and why have them in the first place?

Josh Bleecher Snyder

unread,

May 8, 2019, 7:06:38 PM5/8/19

to Romain Baugue, Golang Fuzzing

> 3. This seems a core feature of what we want to achieve. It's not immediately critical, but it shouldn't be too low on the priority list given its huge impact on the inner-working of the tool: I think the way we handle multiple fuzzing functions may be the biggest issue at hand.

By way of bringing you up to speed,
https://github.com/dvyukov/go-fuzz/commit/7f2a17801b97aa9d84d34eb8c23c3201b09ded79
took some steps towards this. Finishing the job will be quite tricky;
it will require plumbing quite a lot of context through go-fuzz. I'm
not entirely convinced that it shouldn't be done simply by exec'ing
several copies of go-fuzz, either in serial or in parallel. That
exec'ing could be done by 'go test' or under the hood by 'go-fuzz'
(which starts other copies of go-fuzz, which then starts other copies
of go-fuzz to be workers!) or whatever.

> 8, 9. I would postpone this for later I think, it doesn't seem like it would be very complex to change later and actually working on (or with) the prototype may yield a convention that will become obvious.

I disagree. It sounds like the sort of thing that doesn't matter much,
but it is hard to come up with a design that will work well for a wide
variety of humans, for CI, for oss-fuzz, for fuzzbuzz, etc. This is
one of the areas about which I have the most concern from a design
perspective (even if the code involved is fairly trivial).

-josh

6. Should `go test` without `-fuzz` ever be non-deterministic?

There is also now a "Changes to Proposal Document" section at the top that mentions each of those three items.