What should dirty your VCS status?

thepudds
May 12, 2019, 9:36:45 AM
to Golang Fuzzing
This is the start of a thread specifically on item 5 from the bigger "List of possible modifications to the March 2017 proposal" thread:


In that bigger thread, item 5 was posed as:

  "Running `go test .` shouldn't dirty your VCS status"
  
However, I think it makes sense for this thread to broaden that to something like:

  "Running `go test .` shouldn't dirty your VCS status, but what else should, if anything?"

In other words, I think so far there is agreement that running `go test .` (without a `-fuzz` flag) should definitely *not* dirty your VCS status. However, I think it is more of an open question as to what else could or should dirty your VCS status.

I think the _exact_ answer to that question is heavily interwoven with other questions (especially exactly how -fuzzdir will work, whether there will be more than one corpus directory and, if so, exactly how that behaves, etc.).

It might end up being too hard to discuss the VCS status piece separately, so we'll see whether this helps. Either way, here is a thread where people can comment, at least philosophically, on what might or might not make sense in terms of creating or modifying files within VCS. The final behavior may end up depending heavily on the answers to other questions, so the actual details might need to be discussed in the context of those questions.
   
In the rest of this post, I will just quote some of the recent discussion related to dirtying VCS status (although I am not trying to quote everything, because that discussion is now spread out a bit).

From the bigger "List of possible modifications to the March 2017 proposal" thread (https://groups.google.com/d/msg/golang-fuzzing-proposal/9iLIwpglppw/rHdWHmW4AgAJ):
   
    > -------------------------------------------------------------------
    > 5. Running `go test .` shouldn't dirty your VCS status
    > -------------------------------------------------------------------
    > 
    > Some discussion:
    > 
    > Dmitry wrote there: "This sounds reasonable. I would say a requirement."

Romain replied in that bigger thread:
    > -------------------------------------------------------------------
    > Agree with that.
    > -------------------------------------------------------------------

Dmitry also replied:
    > -------------------------------------------------------------------
    > Agree. The fact that any go-fuzz run in the go-fuzz-corpus repo produces a
    > diff was/is very inconvenient.
    > -------------------------------------------------------------------

Some excerpts from the "Should `go test` without `-fuzz` ever be non-deterministic?" thread (https://groups.google.com/d/msg/golang-fuzzing-proposal/HRBvDSaAIIs/_NEFvxHtAwAJ):

Dmitry replying to Romain:
    > -------------------------------------------------------------------
    > > What about two directories in `testdata`? With a suffix to differentiate the fuzzer-corpus from the test-corpus?
    > 
    > This conflicts with "fuzzer must not dirty vcs state". 
    > 
    > I think the fuzzer artifacts dir must be visible to the user to some degree.
    > The fuzzer will store crash outputs there too. And the fuzzer will say
    > "crashed on input X", and the user must be able to locate input X.
    > Do we have any other options besides $GOPATH/pkg?
    > -------------------------------------------------------------------
    > 
    > -------------------------------------------------------------------
    > I see what the point about not dirtying the VCS state is, but I'm not sure I completely 
    > agree with it. Things like code generation directives already exist and they dirty the 
    > VCS state at compile-time, why should a testing feature whose purpose is to generate a corpus 
    > not do it?
    > 
    > In addition, using GOPATH/pkg/fuzz/xxx, I'm not sure how the user should promote
    > a generated input to the checked-in corpus. Having to do a manual copy seems clumsy,
    > error-prone at best, and too arcane for the "standard user" I imagine. We would have to
    > add tooling for this and I'm not convinced this would be better.
    > -------------------------------------------------------------------

So that is a basic summary of what has been said so far on this topic...

Regards,
thepudds

thepudds
May 12, 2019, 9:38:24 AM
to Golang Fuzzing
My current personal take is: 

    * There seems to be agreement that running something like `go test .` without a `-fuzz` flag should not dirty your VCS status if it passes. It would be very annoying if it did.

    * If `go test .` without a `-fuzz` flag fails, the same is probably also true, though that is perhaps slightly more debatable. The proposal was updated this week to eliminate non-determinism and to stop generating new inputs when there is no `-fuzz` flag. That seems to imply that a failing `go test .` without a `-fuzz` flag does not need to dirty your VCS status (e.g., it does not need to copy anything to `crashers` or similar), because it can simply report which file(s) from the pre-existing corpus failed. Hence, it probably should not dirty your VCS status in this case.

    * It seems people will sometimes want to keep their fuzzing setup extremely simple: perhaps a student or hobbyist, or a small one-person open source project, but also, at least some of the time, medium to large projects, especially when they first start fuzzing. For someone who wants to keep things as simple as possible, it seems desirable to offer some lightweight way to keep the corpus in the same repo as the code under test. The _exact_ form this would take is an open question, but supporting it in some form would seem to imply that something like `go test -fuzz=. ./...` would dirty your VCS status in at least _some_ cases.

Altogether, it seems to me that the `-fuzz` flag should therefore be allowed to dirty your VCS status in at least some cases.

However, exactly how and when VCS status would be dirtied seems more debatable, and the _exact_ behavior there might be driven by other, larger questions like "exactly how does more than one corpus directory work?"

As I said, that is my personal take, but I'll end this post with a question for the group:

Setting aside the _exact_ behavior for a moment, does it seem reasonable that the `-fuzz` flag should be allowed to dirty your VCS status in at least some cases?

Regards,
thepudds

Romain Baugue
May 12, 2019, 10:30:19 AM
to Golang Fuzzing
For me, yes.

I would say that without the `-fuzz` flag, `go test .` isn't doing any real fuzzing; it is only running the promoted tests. This is similar to a unit test, and as such it is deterministic and shouldn't dirty the VCS.

That said, the whole point of `-fuzz` is to generate a corpus, so it seems reasonable that fuzzing can dirty the VCS state. This will lead to a simpler setup with less need for configuration, and easier adoption for small and medium-sized projects.

Dmitry Vyukov
May 16, 2019, 10:39:25 AM
to Romain Baugue, Golang Fuzzing
Based on my experience with go-fuzz, dirtying VCS status is very
inconvenient. In most cases when I got a diff in the corpus, I actually
did not want to check it in.
Say you pass by some OSS repo and run go test -fuzz there just
for fun. Or you are contributing a change to some package and run
fuzzing to test your code, but you don't want to check in the corpus
change; you just want to check in the code change.
And based on our experience with internal fuzzing systems and go-fuzz
as well, most likely we don't want the whole, up-to-date corpus in
VCS at all. It creates too much churn, too many tiny files, huge
updates on git pull, a git log cluttered with constant updates, and
bandwidth problems on slow connections.

Romain Baugue
May 26, 2019, 6:48:20 AM
to golang-fuzzing-proposal
> Consider you pass by some OSS repo and run go test -fuzz there just
> for fun. Or you are contributing a change to some package and run
> fuzzing to test your code, but you don't want to check-in the corpus
> change, you just want to check-in the code change.

That's actually what I did when adding fuzzing functions in the stdlib,
and it didn't bother me at all. All the VCSes I have ever used (not a long
list, but still) have a feature to check in only some files.

> most likely we don't want the whole and up-to-date corpus in
> vcs at all

Agree with that, but I don't see why it constrains us in any way. We're
not forced to check in the whole repository when fuzzing.

Dmitry Vyukov
May 28, 2019, 9:26:03 AM
to Romain Baugue, golang-fuzzing-proposal, thepudds, Josh Bleecher Snyder
On Sun, May 26, 2019 at 12:48 PM Romain Baugue <romain...@gmail.com> wrote:
>
> > Consider you pass by some OSS repo and run go test -fuzz there just
> > for fun. Or you are contributing a change to some package and run
> > fuzzing to test your code, but you don't want to check-in the corpus
> > change, you just want to check-in the code change.
>
> That's actually what I did when adding fuzzing functions in the stdlib,
> and it didn't bother me at all.All the VCS I ever used (not a long list,
> but still) have a feature to only check-in some files.


The major problem for me was that when I run 'git status' I can't even
see what has changed and what I need to selectively add, because at
the bottom and at the top of the list I have hundreds of lines with
corpus files, and the actual source files may be dispersed somewhere
in between (even if you find one group of them, you don't necessarily
find all of them).
I also think it's nice to be able to run 'git add -u' if you worked
only on a single thing.
Over time I accumulated huge diffs for different fuzzers, because
what else do you do? Resetting them all the time kind of discards all
your progress, so I did not want to do that. But I also did not want
to (and in some cases users will not be able to) check it all in each
time I ran something.

Fuzzing definitely needs to generate new inputs, but they do not have
to be in the VCS. And in the other thread about multiple corpus
locations we seem to be converging towards a GOPATH/pkg location (?)
unless -fuzzdir is specified. Neither of these things dirties VCS status.

Romain Baugue
May 28, 2019, 9:51:43 AM
to 'Dmitry Vyukov' via golang-fuzzing-proposal, Dmitry Vyukov, thepudds, Josh Bleecher Snyder
I agree it's not fantastic, but whether VCS status gets dirtied or not
would depend on the corpus location: if the corpus is in the
repository, it makes sense to me to dirty it. If not, then the
question doesn't arise.
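The rule stated here, that whether fuzzing dirties VCS status depends on whether the corpus sits inside the repository, can be approximated with a simple path check. The `insideRepo` helper below is a hypothetical sketch: it compares cleaned absolute paths only, and deliberately ignores symlinks, `.gitignore` rules, and nested repositories, any of which could change the real answer.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideRepo reports whether dir lies within the work tree rooted at
// repoRoot, i.e. whether writing corpus files there could show up as a
// diff. Symlinks and ignore rules are not considered.
func insideRepo(repoRoot, dir string) (bool, error) {
	root, err := filepath.Abs(repoRoot)
	if err != nil {
		return false, err
	}
	d, err := filepath.Abs(dir)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(root, d)
	if err != nil {
		return false, nil // not expressible relative to root (e.g. another volume)
	}
	// "." means dir is the root itself; anything escaping via ".." is outside.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)), nil
}

func main() {
	for _, dir := range []string{"/src/proj/testdata/FuzzParse", "/home/u/go/pkg/fuzz/FuzzParse"} {
		in, err := insideRepo("/src/proj", dir)
		fmt.Println(dir, in, err)
	}
}
```

Checking the `..` prefix with the separator, rather than a plain string prefix, avoids treating a sibling like `/src/project2` as inside `/src/proj`.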


Dmitry Vyukov
May 28, 2019, 10:49:05 AM
to Romain Baugue, 'Dmitry Vyukov' via golang-fuzzing-proposal, thepudds, Josh Bleecher Snyder
On Tue, May 28, 2019 at 3:51 PM Romain Baugue <romain...@gmail.com> wrote:
>
> I agree it's not fantastic, but the fact to dirty VCS status or not
> would depend on the corpus location: if the corpus is in the
> repository, it makes sense for me to dirty it. If not, then there is no
> question.

But the default corpus location is not chosen by a dice roll :)
As I see it: since we decided that a default run should preferably not
dirty VCS status, we need to try to arrange things so that it does
not.

thepudds
May 28, 2019, 11:00:40 AM
to golang-fuzzing-proposal
For the question of "What should dirty your VCS status?", my personal take is that it is going to come down to the default behavior for corpus locations and how more than one corpus is managed.

Right now, I personally think this is a workable proposal for that (May 21 post to the "Allow multiple corpus locations?" thread):

In that proposal, the default case is you *don't* dirty your VCS status. That is probably reasonably friendly behavior for beginners and advanced users.

On the other hand, in that proposal, you *do* dirty your VCS status if you supply the non-default '-fuzzdir=testdata' argument (and presumably it is OK to dirty VCS status then, because that is literally what the user just asked the command to do).
   
In any event, I think that is a reasonable proposal, but it is an open question whether it should be *the* proposal, how to improve it, or whether it would be better to go in a completely different direction.
   
All that said, the details of the >1 corpus approach are better discussed in that other thread (with additional counter-proposals or comments, etc.).

Regards,
thepudds
