Hello there,
Here is a list of some possible modifications for the March 2017 proposal.
Some of these won't make sense or are smarter to defer, but I wanted to at least attempt to enumerate a partial list of topics for discussion.
This list is based on some personal notes I had built up, and hence this list a bit rough.
===================================================================
Possible modifications to the March 2017 Proposal
===================================================================
-------------------------------------------------------------------
1. Include -fuzztime?
-------------------------------------------------------------------
Some of the commentators at #19109 suggested -fuzztime duration as a way of controlling when to stop fuzzing. The proposal document does not include -fuzztime and go-fuzz does not support it, but it seems useful in general and -fuzztime is in the fzgo prototype (and it proved useful while testing the prototype).
-------------------------------------------------------------------
2. Change the instrumented cache location: GOPATH/pkg/____?
-------------------------------------------------------------------
The March 2017 proposal suggested GOPATH/pkg/GOOS_GOARCH_fuzz for a cache, but the fzgo prototype instead uses GOPATH/pkg/fuzz/GOOS_GOARCH. That seemed closer to Go 1.11's new GOPATH/pkg/mod, for example.
-------------------------------------------------------------------
3. Allow multiple fuzz functions to match?
-------------------------------------------------------------------
The March 2017 proposal says:
"-fuzz flag must match one and only one fuzz function. This is different from -run and -bench flags which allow multiple matches. It's unclear what would be the behavior when multiple fuzz functions are selected, so for now we restrict it to only one function. Selecting multiple packages is not supported either. The restriction can be removed when/if we figure out a sane behavior for this."
Initially, the fzgo prototype disallowed multiple fuzz functions to match (per the March 2017 proposal), but as an experiment fzgo now allows multiple fuzz functions to match in order to support something like 'go test -fuzz=. ./...' when there are multiple fuzz functions across multiple packages. Fuzzing happens in a round-robin manner if multiple fuzz functions match. Each matching fuzz function initially has a short turn, and then progressively longer turns executing.
Some related discussion in:
go-fuzz issue #230: "make it easier to fuzz multiple functions"
This one seems fine to postpone, though I suspect the majority of work to implement it is required by the suggested behavior in the proposal for `go test ./...` (without a -fuzz flag specified) to generate new inputs for "1s in normal mode" given `go test ./...` can match multiple packages with Fuzz functions in that scenario (which is partly why the fzgo prototype supports multiple fuzz functions matching the -fuzz argument).
-------------------------------------------------------------------
5. Running go test . shouldn't dirty your VCS status
-------------------------------------------------------------------
Some discussion:
Dmitry wrote there: "This sounds reasonable. I would say a requirement."
-------------------------------------------------------------------
6. Should `go test` without `-fuzz` ever be non-deterministic?
-------------------------------------------------------------------
It seems reasonable to expect non-determinism if you explicitly ask for fuzzing, such as by supplying `-fuzz`.
However, if you are just doing something like `go test .` (without a `-fuzz` flag), then any non-determinism should be carefully evaluated given people might not expect non-determinism in that situation.
One example would be in a normal (non-fuzz) invocation of something like `go test ./...`. If that fails in something like travis, can you repeat that failure?
There are multiple forms of non-determinism, including:
6a. Non-determism from `rand` or similar
The March 2017 proposal says:
"go test runs fuzz functions as unit tests. ... Fuzz functions are executed on all inputs from the corpus and on some amount of newly generated inputs (for 1s in normal mode and for 0.1s in short mode)"
If there are newly generated inputs, are those random inputs? Or deterministic in some way? Or random, but with a seed printed if there is a failure (so that the same seed can be re-used to recreate the failure if needed)?
6b. Execution time-based non-determinism
If the execution time is "1s in normal mode and for 0.1s in short mode", does that mean you might get a failure on a powerful machine that does not occur within the same wall clock time on a less powerful machine?
-------------------------------------------------------------------
7. Should the proposal include a richer Fuzz function signature, or wait on that?
-------------------------------------------------------------------
For example, should the propsoal for the first version to include something like:
FuzzRegexp(f *testing.F, re string, data []byte, posix bool)
Or is it better to wait and not propose that as part of the first version?
The proposal currently suggests the signature as:
"Fuzz function accepts two arguments: (*testing.F, data []byte)."
and also says:
"The fuzz function signature can later be allowed to accept multiple randomly-generated arguments of different types. ...
But multiple values can be extracted from the byte slice, just with more work and leading to worser fuzzer efficiency (as it will not understand the structure of the input). So such support is explicitly not part of this proposal."
Some related discussion:
-------------------------------------------------------------------
8. Change how the paths are created in <pkgpath>/testdata, and/or when -fuzzdir is specified?
-------------------------------------------------------------------
There might be some minor tension between making things simple for something like oss-fuzz vs. making things friendly for interactive execution by a human, but that tension is hopefully solvable.
Some related discussion in:
and the next few comments
including one possible approach:
* an empty -fuzzdir could imply <pkgpath>/testdata/fuzz/FuzzFunc/corpus (so by default, the package's testdata directory is used, and there is no redundancy on the package name)
or another possible approach:
* Another option is to just -fuzzdir as is and don't append anything. Then it's implied to be user (script, infrastructure) responsibility to pass proper paths in whatever convention they want. Then we can drop package name from testdata.
-------------------------------------------------------------------
9. Should the proposal address how a corpus repo is found?
-------------------------------------------------------------------
Large projects will likely have a separate corpus repo, or perhaps even store the corpus outside of VCS.
How should a separate corpus repo be found?
Convention? Automatically walk directories to find a sibling? A new env variable? Does the GOFLAGS env variable help? Do modules help?
Some discussion:
-------------------------------------------------------------------
10. Source-to-source transform vs. compiler-based instrumentation?
-------------------------------------------------------------------
Primary discussion in:
Does it make sense for the proposal to suggest a perhaps more likely to be accepted approach of source-to-source implementation to start, and save modifying the compiler for possible future work? The results would not be as good and might have more corner cases, so perhaps not.
Now that the new 1.13 `-trimpath` support landed in #16860, a related question from that same issue in comment:
"I think I might understand how that helps with caching the build results of the transformed code (where that transform code might be in a temporary directory).
When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future go test -fuzz=.), then that piece of code could directly use the cmd/go/internal/cache internal API to cache the s2s transform results?"
-------------------------------------------------------------------
11. Allow multiple corpus locations?
-------------------------------------------------------------------
This is a longer topic, but wanted to at least record the topic itself.
Dmitry wrote:
"I wonder if it's a good idea to instead allow 2/2+ directories with input corpus?
For example, if we read inputs from testdata/something/something, but also from -fuzzdir/-workdir if provided. Then testdata/ could contain hand-written inputs and regression tests and is checked-in with the code (that's small number of higher-quality inputs with low churn, so no different from unit-tests and makes sense to check-in). The second dir can contain the random inputs, there are more of them and high churn. So that is preferably checked-in somewhere else (stored in an archive or something else). Then workflow would be to simply copy the crashing input from the second dir into testdata/ and run go test -run=file_name (if the auto-generated regression test uses t.Run then this will work auto-magically)."
That quote is from:
And then the probably too enthusiastic response:
-------------------------------------------------------------------
END
-------------------------------------------------------------------
I actually have a slightly longer list that is a bit rougher. I will hopefully return to that to try to post later, but wanted to at least post this partial list...
Regards,
thepudds