List of possible modifications to the March 2017 proposal

61 views
Skip to first unread message

t hepudds

unread,
May 8, 2019, 4:39:13 PM5/8/19
to Golang Fuzzing
Hello there,

Here is a list of some possible modifications for the March 2017 proposal. 

Some of these won't make sense or are smarter to defer, but I wanted to at least attempt to enumerate a partial list of topics for discussion. 

This list is based on some personal notes I had built up, and hence this list a bit rough.


===================================================================
Possible modifications to the March 2017 Proposal
===================================================================

-------------------------------------------------------------------
1. Include -fuzztime?
-------------------------------------------------------------------

Some of the commentators at #19109 suggested -fuzztime duration as a way of controlling when to stop fuzzing. The proposal document does not include -fuzztime and go-fuzz does not support it, but it seems useful in general and -fuzztime is in the fzgo prototype (and it proved useful while testing the prototype).

-------------------------------------------------------------------
2. Change the instrumented cache location: GOPATH/pkg/____?
-------------------------------------------------------------------

The March 2017 proposal suggested GOPATH/pkg/GOOS_GOARCH_fuzz for a cache, but the fzgo prototype instead uses GOPATH/pkg/fuzz/GOOS_GOARCH.  That seemed closer to Go 1.11's new GOPATH/pkg/mod, for example.

-------------------------------------------------------------------
3. Allow multiple fuzz functions to match?
-------------------------------------------------------------------

The March 2017 proposal says:

  "-fuzz flag must match one and only one fuzz function. This is different from -run and -bench flags which allow multiple matches. It's unclear what would be the behavior when multiple fuzz functions are selected, so for now we restrict it to only one function. Selecting multiple packages is not supported either. The restriction can be removed when/if we figure out a sane behavior for this."

Initially, the fzgo prototype disallowed multiple fuzz functions to match (per the March 2017 proposal), but as an experiment fzgo now allows multiple fuzz functions to match in order to support something like 'go test -fuzz=. ./...' when there are multiple fuzz functions across multiple packages. Fuzzing happens in a round-robin manner if multiple fuzz functions match. Each matching fuzz function initially has a short turn, and then progressively longer turns executing.

Some related discussion in:

  go-fuzz issue #230: "make it easier to fuzz multiple functions"
  
This one seems fine to postpone, though I suspect the majority of work to implement it is required by the suggested behavior in the proposal for `go test ./...` (without a -fuzz flag specified) to generate new inputs for "1s in normal mode" given `go test ./...` can match multiple packages with Fuzz functions in that scenario (which is partly why the fzgo prototype supports multiple fuzz functions matching the -fuzz argument).

-------------------------------------------------------------------
5. Running go test . shouldn't dirty your VCS status
-------------------------------------------------------------------

Some discussion:

Dmitry wrote there: "This sounds reasonable. I would say a requirement."

-------------------------------------------------------------------
6. Should `go test` without `-fuzz` ever be non-deterministic?
-------------------------------------------------------------------

It seems reasonable to expect non-determinism if you explicitly ask for fuzzing, such as by supplying `-fuzz`.  

However, if you are just doing something like `go test .` (without a `-fuzz` flag), then any non-determinism should be carefully evaluated given people might not expect non-determinism in that situation.

One example would be in a normal (non-fuzz) invocation of something like `go test ./...`. If that fails in something like travis, can you repeat that failure?

There are multiple forms of non-determinism, including:

6a. Non-determism from `rand` or similar

The March 2017 proposal says:

"go test runs fuzz functions as unit tests. ... Fuzz functions are executed on all inputs from the corpus and on some amount of newly generated inputs (for 1s in normal mode and for 0.1s in short mode)"

If there are newly generated inputs, are those random inputs? Or deterministic in some way? Or random, but with a seed printed if there is a failure (so that the same seed can be re-used to recreate the failure if needed)?

6b. Execution time-based non-determinism

If the execution time is "1s in normal mode and for 0.1s in short mode", does that mean you might get a failure on a powerful machine that does not occur within the same wall clock time on a less powerful machine?

-------------------------------------------------------------------
7. Should the proposal include a richer Fuzz function signature, or wait on that?
-------------------------------------------------------------------

For example, should the propsoal for the first version to include something like:

  FuzzRegexp(f *testing.F, re string, data []byte, posix bool)

Or is it better to wait and not propose that as part of the first version?

The proposal currently suggests the signature as:

  "Fuzz function accepts two arguments: (*testing.F, data []byte)."

and also says:

  "The fuzz function signature can later be allowed to accept multiple randomly-generated arguments of different types. ...
But multiple values can be extracted from the byte slice, just with more work and leading to worser fuzzer efficiency (as it will not understand the structure of the input). So such support is explicitly not part of this proposal."

Some related discussion:


-------------------------------------------------------------------
8. Change how the paths are created in <pkgpath>/testdata, and/or when -fuzzdir is specified?
-------------------------------------------------------------------

There might be some minor tension between making things simple for something like oss-fuzz vs. making things friendly for interactive execution by a human, but that tension is hopefully solvable.

Some related discussion in:
 and the next few comments
 
including one possible approach:
 
 * an empty -fuzzdir could imply <pkgpath>/testdata/fuzz/FuzzFunc/corpus (so by default, the package's testdata directory is used, and there is no redundancy on the package name)
 * a non-empty -fuzzdir such as -fuzzdir=/tmp/fuzz could mean it creates directories corresponding to the package path (e.g., /tmp/fuzz/example.com/full/package/path/FuzzFunc/corpus).

or another possible approach:

 * Another option is to just -fuzzdir as is and don't append anything. Then it's implied to be user (script, infrastructure) responsibility to pass proper paths in whatever convention they want. Then we can drop package name from testdata.
 
-------------------------------------------------------------------
9. Should the proposal address how a corpus repo is found?
-------------------------------------------------------------------

Large projects will likely have a separate corpus repo, or perhaps even store the corpus outside of VCS.

How should a separate corpus repo be found?

Convention? Automatically walk directories to find a sibling? A new env variable? Does the GOFLAGS env variable help? Do modules help?

Some discussion: 


-------------------------------------------------------------------
10. Source-to-source transform vs. compiler-based instrumentation?
-------------------------------------------------------------------

Primary discussion in:

Does it make sense for the proposal to suggest a perhaps more likely to be accepted approach of source-to-source implementation to start, and save modifying the compiler for possible future work? The results would not be as good and might have more corner cases, so perhaps not.

Now that the new 1.13 `-trimpath` support landed in #16860, a related question from that same issue in comment:

"I think I might understand how that helps with caching the build results of the transformed code (where that transform code might be in a temporary directory).
When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future go test -fuzz=.), then that piece of code could directly use the cmd/go/internal/cache internal API to cache the s2s transform results?"

-------------------------------------------------------------------
11. Allow multiple corpus locations?
-------------------------------------------------------------------

This is a longer topic, but wanted to at least record the topic itself.

Dmitry wrote:

  "I wonder if it's a good idea to instead allow 2/2+ directories with input corpus?
For example, if we read inputs from testdata/something/something, but also from -fuzzdir/-workdir if provided. Then testdata/ could contain hand-written inputs and regression tests and is checked-in with the code (that's small number of higher-quality inputs with low churn, so no different from unit-tests and makes sense to check-in). The second dir can contain the random inputs, there are more of them and high churn. So that is preferably checked-in somewhere else (stored in an archive or something else). Then workflow would be to simply copy the crashing input from the second dir into testdata/ and run go test -run=file_name (if the auto-generated regression test uses t.Run then this will work auto-magically)."

That quote is from:

And then the probably too enthusiastic response:

-------------------------------------------------------------------
END
-------------------------------------------------------------------

I actually have a slightly longer list that is a bit rougher. I will hopefully return to that to try to post later, but wanted to at least post this partial list...

Regards,
thepudds

Josh Bleecher Snyder

unread,
May 8, 2019, 4:51:00 PM5/8/19
to t hepudds, Golang Fuzzing
> 7. Should the proposal include a richer Fuzz function signature, or wait on that?

The Go project strongly prefers stability; to the extent possible,
questions like this should be prototyped and answered externally.

So I think we should continue to improve go-fuzz where it is, and at
least try to answer this question in situ.

> "I think I might understand how that helps with caching the build results of the transformed code (where that transform code might be in a temporary directory).
> When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future go test -fuzz=.), then that piece of code could directly use the cmd/go/internal/cache internal API to cache the s2s transform results?"

Your understanding is correct: Adding -trimpath would speed up the 'go
build' part.

Right now, the instrumentation itself is pretty cheap and fast.
(Although I do have ideas for deeper analysis I'd like to do that
would change that.) So while it would be nice to be able to cache
instrumented code, I don't think it is a priority.

The other significant build time cost is in go/packages. I believe
that the tools team is actively working on optimizing go/packages.

I'm not sure that this is an area that needs more attention in the
near term, once the -trimpath piece is sorted out. I have a local
change that checks the go version and adds -trimpath as appropriate; I
will turn it into a PR once I can confirm it actually moves the
needle, which it does not right now, due to a bug.
FWIW, I think multiple corpus locations is a great idea. And there is
no need to apologize for your enthusiasm. :)


-josh

Romain Baugue

unread,
May 8, 2019, 6:22:53 PM5/8/19
to Golang Fuzzing
Here are a few thoughts while reading your list:

1. Completely agree with this. It seems very useful, given that in an automated setting one cannot Ctrl+C the fuzzer to stop it.

2. Anything goes for now as long as it's coherent I think. It may change later when (if) integrating everything into the go tool, but for now….

3. This seems a core feature of what we want to achieve. It's not immediately critical, but it shouldn't be too low on the priority list given its huge impact on the inner-working of the tool: I think the way we handle multiple fuzzing functions may be the biggest issue at hand.

4. You're missing and entry here :D

5. Agree with that.

6. I would add that in this case, fuzzs will behave the same way as benchmarks: non of them are deterministic, and this is exactly why they are not the default (in addition to the time consumption thing).

7. I think the way multiple arguments is handled could mean big changes in the final implementation, so even if this is not the very first thing to try, I would at least try a few solutions in the prototype and include it in the proposal, and have it ready for the first version (hopefully).

8, 9. I would postpone this for later I think, it doesn't seem like it would be very complex to change later and actually working on (or with) the prototype may yield a convention that will become obvious.

10. Not sure how this works exactly as I'm just not literate enough on the subject. I'll have a look at how go-fuzz works internally in the first place before saying anything.

11. I think we should not need multiple corpus locations: either there is a real need because both corpus are meant to be run differently, or have different purposes, in which case it also makes sense to run the tool twice, or there is none and why have them in the first place?

Josh Bleecher Snyder

unread,
May 8, 2019, 7:06:38 PM5/8/19
to Romain Baugue, Golang Fuzzing
> 3. This seems a core feature of what we want to achieve. It's not immediately critical, but it shouldn't be too low on the priority list given its huge impact on the inner-working of the tool: I think the way we handle multiple fuzzing functions may be the biggest issue at hand.

By way of bringing you up to speed,
https://github.com/dvyukov/go-fuzz/commit/7f2a17801b97aa9d84d34eb8c23c3201b09ded79
took some steps towards this. Finishing the job will be quite tricky;
it will require plumbing quite a lot of context through go-fuzz. I'm
not entirely convinced that it shouldn't be done simply by exec'ing
several copies of go-fuzz, either in serial or in parallel. That
exec'ing could be done by 'go test' or under the hood by 'go-fuzz'
(which starts other copies of go-fuzz, which then starts other copies
of go-fuzz to be workers!) or whatever.


> 8, 9. I would postpone this for later I think, it doesn't seem like it would be very complex to change later and actually working on (or with) the prototype may yield a convention that will become obvious.

I disagree. It sounds like the sort of thing that doesn't matter much,
but it is hard to come up with a design that will work well for a wide
variety of humans, for CI, for oss-fuzz, for fuzzbuzz, etc. This is
one of the areas about which I have the most concern from a design
perspective (even if the code involved is fairly trivial).

-josh

Dmitry Vyukov

unread,
May 9, 2019, 4:51:39 AM5/9/19
to Josh Bleecher Snyder, Romain Baugue, Golang Fuzzing
I did not yet read all of this, but I gave all of you comment access
on the proposal doc:
https://docs.google.com/document/d/1zXR-TFL3BfnceEAWytV8bnzB2Tfp6EPFinWVJ5V4QC8/edit
You can either leave comments on the side, or directly suggest changes
by editing the text (then I will be able to just directly click
"Apply").
It would be useful to keep it up-to-date. If we already decided that
some things should be different, let's change the doc.
Also it would be useful to add a section with the current status that
will contain a short list of what's pending (small items, larger
items) with links to more info (open issues, PRs, etc). And also a
short list of the current open questions, again with links to PRs, or
threads on this mailing list.
We have lots of lengthy discussions spread across multiple places, and
even with this list we will still have lengthy discussions. So it's
very hard to grasp, say, what are the current TODO items. We need some
short list with links that captures the current status.

Dmitry Vyukov

unread,
May 9, 2019, 5:04:42 AM5/9/19
to Romain Baugue, Golang Fuzzing
> 1. Completely agree with this. It seems very useful, given that in an automated setting one cannot Ctrl+C the fuzzer to stop it.

I am still against this.
The tool absolutely must gracefully handle async kill. It will be
killed by ^C from console all the time, and that's how it will be used
for manual runs. Any serious CI will want to have own kill rather than
trust a third-party binary. And for one-off uses one can always script
"timeout 60 go-fuzz". Done.
This flag is against orthogonality and minimal amount of composable
features. Adding flags is easy, removing them later is impossible.
hepudds, is there anything that makes replacing your uses of -fuzztime
for testing with timeout command or external kill impossible?

Dmitry Vyukov

unread,
May 9, 2019, 5:06:48 AM5/9/19
to Romain Baugue, Golang Fuzzing
> 2. Anything goes for now as long as it's coherent I think. It may change later when (if) integrating everything into the go tool, but for now….

Agree. Please do a corresponding edit suggestion on the proposal doc.

Dmitry Vyukov

unread,
May 9, 2019, 5:24:31 AM5/9/19
to Romain Baugue, Golang Fuzzing
> 3. This seems a core feature of what we want to achieve. It's not immediately critical, but it shouldn't be too low on the priority list given its huge impact on the inner-working of the tool: I think the way we handle multiple fuzzing functions may be the biggest issue at hand.

Agree with all points.
Please make edit suggestion on the doc.

> 5. Agree with that.

Agree. The fact that any go-fuzz run in go-fuzz-corpus repo produces
diff, was/is very inconvenient.
We still did not come up with a complete plan for paths, right? I
remember there was a bunch of scattered discussions about this.
We need to aggregate all of the discussions and update the doc.
There are one-off runs, where I think should just work with zero
configuration in some reasonable way. CI runs that may specify an
explicit directory. Also some discussion about ability to specify
several paths for corpus for merging or pointing to regression testing
inputs in VCS. There is also a use case of running corpus during
regression testing. Also this is complicated by ability to run
multiple fuzz functions/packages, how are paths overridden in this
case?
I remember there were some discussions re pkg/testdata and GOPATH/pkg.
Go command now also uses $HOME/.cache (or what is that?) so we could
too.

I wonder if we could come up with some convention re corpus location
(for both explicitly checked-in inputs for regression testing and
actual live fuzzing corpus) that would work for all of these cases? In
the end a CI could also copy-in/out fuzzing corpus somewhere into
GOPATH/pkg/... rather then impose own location to go-fuzz.

Dmitry Vyukov

unread,
May 9, 2019, 5:29:36 AM5/9/19
to Romain Baugue, Golang Fuzzing
> 6. I would add that in this case, fuzzs will behave the same way as benchmarks: non of them are deterministic, and this is exactly why they are not the default (in addition to the time consumption thing).

Provided that "go test -fuzz=. ./..." will also work (which seems to
be what we want), then I think we better stick to simpler and safer
deterministic load in normal "go test".
There should be an easy way to "promote" a crashing input to a
unit-test, e.g. by going:
cp $GOAPTH/pkg/.../INPUT testdata/fuzz/
and then _these_ inputs will run during normal go test.

Dmitry Vyukov

unread,
May 9, 2019, 5:33:23 AM5/9/19
to Romain Baugue, Golang Fuzzing
> 7. I think the way multiple arguments is handled could mean big changes in the final implementation, so even if this is not the very first thing to try, I would at least try a few solutions in the prototype and include it in the proposal, and have it ready for the first version (hopefully).

Considering that this extension is backward compatible, I would
postpone this until we have it all working and intergratable with CIs
and std lib tested on CI. There is just so much more important work to
do.
It will be a large change, yes, but any changes that can be done
completely under the hood are so much easier to do.

Dmitry Vyukov

unread,
May 9, 2019, 5:40:47 AM5/9/19
to t hepudds, Golang Fuzzing
> -------------------------------------------------------------------
> 10. Source-to-source transform vs. compiler-based instrumentation?
> -------------------------------------------------------------------
>
> Primary discussion in:
> https://github.com/golang/go/issues/29430
>
> Does it make sense for the proposal to suggest a perhaps more likely to be accepted approach of source-to-source implementation to start, and save modifying the compiler for possible future work? The results would not be as good and might have more corner cases, so perhaps not.
>
> Now that the new 1.13 `-trimpath` support landed in #16860, a related question from that same issue in comment:
> https://github.com/golang/go/issues/29430#issuecomment-485797818
>
> "I think I might understand how that helps with caching the build results of the transformed code (where that transform code might be in a temporary directory).
> When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future go test -fuzz=.), then that piece of code could directly use the cmd/go/internal/cache internal API to cache the s2s transform results?"

Pain point.
Compiler instrumentation will be simpler, faster, easier to maintain
_and_ much simpler to integrate with other build systems. Go team says
they want this on the side as s2s, plugged via go tool, but they also
badly want this in bazel and blaze, which are separate huge Java
systems that are very hard to modify and extend.
I thing we could do is to prototype the compiler pass for this. To
show that it works and is simple. Josh, do you know how stable the SSA
pass API now to maintain a pass on the side for some time?

Dmitry Vyukov

unread,
May 9, 2019, 5:50:13 AM5/9/19
to Josh Bleecher Snyder, Romain Baugue, Golang Fuzzing
> > 8, 9. I would postpone this for later I think, it doesn't seem like it would be very complex to change later and actually working on (or with) the prototype may yield a convention that will become obvious.
>
> I disagree. It sounds like the sort of thing that doesn't matter much,
> but it is hard to come up with a design that will work well for a wide
> variety of humans, for CI, for oss-fuzz, for fuzzbuzz, etc. This is
> one of the areas about which I have the most concern from a design
> perspective (even if the code involved is fairly trivial).

I already commented on part of this answering re point 5 above, before
I read 8, 9 and 11. So the point about discussions about this being
dispersed in multiple places is even more true now :)

The question I have: can we design a solution that does not have
-fuzzdir at all?
That would automatically resolve all problems around -fuzzdir handling
and meaning :)

thepudds, thanks for putting this all together!

Dmitry Vyukov

unread,
May 9, 2019, 5:56:22 AM5/9/19
to t hepudds, Golang Fuzzing
> -------------------------------------------------------------------
> 9. Should the proposal address how a corpus repo is found?
> -------------------------------------------------------------------
>
> Large projects will likely have a separate corpus repo, or perhaps even store the corpus outside of VCS.
>
> How should a separate corpus repo be found?
>
> Convention? Automatically walk directories to find a sibling? A new env variable? Does the GOFLAGS env variable help? Do modules help?
>
> Some discussion:
>
> https://github.com/dvyukov/go-fuzz/issues/218#issuecomment-479393430

I would postpone this for now.
This optional and we don't have enough experience to up with the
single right way to do this. But if we add this now, it will be part
of the interface forever.
With go-fuzz-corpus I found that checking-in thousands of small files
with high churn into VCS is not very useful. Even if it's a separate
repo. OSS-Fuzz uses a single tag.gz to move/store corpus and then
packs/unpacks as necessary. Some cloud VMs also have very slow disks
so moving lots of small files is super slow.
CIs (OSS-Fuzz) will probably have an own way of doing this and won't
need our help.
For a manual use by a single person, it does not matter much.
I am not sure what are use cases that actually require this? Some kind
of semi-manual use by a group of people? This sounds too vague. And
how do they synchronize corpus updates anyways?
There are just too many questions and unknowns.

thepudds

unread,
May 9, 2019, 6:38:13 PM5/9/19
to Golang Fuzzing
Hi all,

FYI, the proposal doc is now updated to reflect the conversation here regarding these items:

   2. Change the instrumented cache location: GOPATH/pkg/____?
   3. Allow multiple fuzz functions to match?
   6. Should `go test` without `-fuzz` ever be non-deterministic?

There is also now a "Changes to Proposal Document" section at the top that mentions each of those three items.

Regards,
thepudds
Reply all
Reply to author
Forward
0 new messages