Supporting more review flags

Ehsan Akhgari

unread,

Mar 18, 2014, 3:45:56 PM3/18/14

to mozilla-c...@googlegroups.com

Something that we should do in ReviewBoard to make it support our use cases is add the notion of more review flags. Currently RB just supports a "Ship it" which is insufficient for our needs. What we currently have in Bugzilla is 4 different types of flags: review, superreview, feedback and ui-review (the latter is probably the least important).

Each one of these flags can have a specific requestee, and they can be set to a + or - depending on whether the person found the patch satisfactory or not. They can also be cleared or redirected to other requestee's.

What is the best way to extend RB to support all of this? I just quickly asked Steven in person and he mentioned that we can do this through an extension, so I'm starting this thread to get more details on that.

Cheers,
Ehsan

Gregory Szorc

unread,

Mar 18, 2014, 4:22:39 PM3/18/14

to Ehsan Akhgari, mozilla-c...@googlegroups.com

I think we should use this as an opportunity to reconsider the best way
to capture requests and responses for reviews. I say this because the
current Bugzilla mechanism for doing review is horribly inconsistent in
practice and we have the opportunity to start from a clean slate.
Although, I concede trying to maintain compatibility with Bugzilla may
hinder us from improving matters.

Here's my proposal.

First, let's divide things by request and response.

I think requests can be modeled as type and degree/level/thoroughness.

Review Type: {UI, code, architecture (superreview)}
Review Level: {feedback, review}

We could potentially combine these into a single list (but it might be
long).

For responses, I want to eliminate the ambiguity that comes with the
existing Bugzilla system and have responses convey the action that
should be performed. e.g.

Review Response: {land as is, land it with changes (I trust you), looks
good (request re-review with changes), not acceptable}

I'm sure people will point out that we could also add privacy and
security flags and at some point we'll reinvent tracking flags and other
parts of Bugzilla. I concede I'm not sure what to do here. I'm not sure
if it prevents us from doing something better for basic code review on
the ReviewBoard side though.

Ehsan Akhgari

unread,

Mar 18, 2014, 5:46:47 PM3/18/14

to Gregory Szorc, mozilla-c...@googlegroups.com

On Tue, Mar 18, 2014 at 4:22 PM, Gregory Szorc <g...@mozilla.com> wrote:

On 3/18/14, 12:45 PM, Ehsan Akhgari wrote:

Something that we should do in ReviewBoard to make it support our use
cases is add the notion of more review flags. Currently RB just
supports a "Ship it" which is insufficient for our needs. What we
currently have in Bugzilla is 4 different types of flags: review,
superreview, feedback and ui-review (the latter is probably the least
important).

Each one of these flags can have a specific requestee, and they can be
set to a + or - depending on whether the person found the patch
satisfactory or not. They can also be cleared or redirected to other
requestee's.

What is the best way to extend RB to support all of this? I just
quickly asked Steven in person and he mentioned that we can do this
through an extension, so I'm starting this thread to get more details on
that.

I think we should use this as an opportunity to reconsider the best way to capture requests and responses for reviews. I say this because the current Bugzilla mechanism for doing review is horribly inconsistent in practice and we have the opportunity to start from a clean slate. Although, I concede trying to maintain compatibility with Bugzilla may hinder us from improving matters.

Can you please give some examples of the kinds of inconsistencies you're mentioning here?

Here's my proposal.

First, let's divide things by request and response.

I think requests can be modeled as type and degree/level/thoroughness.

Review Type: {UI, code, architecture (superreview)}
Review Level: {feedback, review}

FWIW not all of these are meaningful combinations. Actually I think the only one which might be useful is ui-feedback. I'd like to know more about why you think this model is better before commenting. At a first glance, it looks more complicated to me.

We could potentially combine these into a single list (but it might be long).

I like a single list better because I think it's simpler. I am not sure why you think it will be longer though. For one thing, we have only one type of superreview. With that aside, the only thing that is missing is ui-feedback. :-)

For responses, I want to eliminate the ambiguity that comes with the existing Bugzilla system and have responses convey the action that should be performed. e.g.

Review Response: {land as is, land it with changes (I trust you), looks good (request re-review with changes), not acceptable}

Those are currently captured in part inside the textual feedback (like "I would like to see this part of an update patch to land", or "good to go but please rebase it on top of this other thing that I'm landing tomorrow", etc.) Why is that not enough?

(Note that "because tools" can be one answer to my question above, but I am less interested in solving that abstract problem than a concrete problem of a specific thing that people are asking for and is impossible given how we currently do things.)

I'm sure people will point out that we could also add privacy and security flags and at some point we'll reinvent tracking flags and other parts of Bugzilla. I concede I'm not sure what to do here. I'm not sure if it prevents us from doing something better for basic code review on the ReviewBoard side though.

Privacy and security reviews have a vastly different scope and are typically done on a larger scope. I don't think they're a requirement for the code review system.

Cheers,
Ehsan

Gregory Szorc

unread,

Mar 18, 2014, 6:01:54 PM3/18/14

to Ehsan Akhgari, mozilla-c...@googlegroups.com

When a patch is submitted for review with r?, we have at *least* the
following outcomes:

1. r+ - land as is
2. r+ - nits, but I'll fix them and land for you
3. r+ - land with feedback addressed
4. r? cancelled
5. r? cancelled, f+ left
6. r? cancelled, f- left
7. r- - don't land this patch as is
8. r-, f+ - don't land, but I like the idea
9. r- - this patch is unacceptable

Every individual does things differently. I find this confusing.
Contributors find this confusing. I would like code review responses to
better reflect what the next action is rather than requiring patch
authors to maintain an in-brain lookup table of person to flag
meaning/intent.

>
> For responses, I want to eliminate the ambiguity that comes with the
> existing Bugzilla system and have responses convey the action that
> should be performed. e.g.
>
> Review Response: {land as is, land it with changes (I trust you),
> looks good (request re-review with changes), not acceptable}
>
>
> Those are currently captured in part inside the textual feedback (like
> "I would like to see this part of an update patch to land", or "good to
> go but please rebase it on top of this other thing that I'm landing
> tomorrow", etc.) Why is that not enough?

First, lots of people don't leave those comments! I've seen patches from
first-time contributors get insufficient context and the contributor
immediately asks "what do you mean."

Furthermore, the fields remove ambiguity. It forces the reviewer to make
a call on the future of the patch without leaving room for
interpretation in the textual response (I know I've misinterpreted
comments before and landed prematurely). This can help tremendously when
language is a barrier.

In the case of "land as is," we could have RB automatically trigger
autoland (wouldn't that be nice).

Fields can also help with auditing and verifying the review process.
"You were told to make changes but you pushed the same changeset that
was reviewed!"

Fields can also help with dashboards. They would allow us to measure
which individuals are effective at writing patches that get accepted on
first review, etc.

> I'm sure people will point out that we could also add privacy and
> security flags and at some point we'll reinvent tracking flags and
> other parts of Bugzilla. I concede I'm not sure what to do here. I'm
> not sure if it prevents us from doing something better for basic
> code review on the ReviewBoard side though.
>
>
> Privacy and security reviews have a vastly different scope and are
> typically done on a larger scope. I don't think they're a requirement
> for the code review system.

Works for me!

Ehsan Akhgari

unread,

Mar 18, 2014, 8:29:49 PM3/18/14

to Gregory Szorc, mozilla-c...@googlegroups.com

I bet I can name at least 9 more! But see below.

Every individual does things differently. I find this confusing. Contributors find this confusing. I would like code review responses to better reflect what the next action is rather than requiring patch authors to maintain an in-brain lookup table of person to flag meaning/intent.

A large part of the review process is cultural, because it's ultimately human communication about a piece of code. I'll give you an example: a number of years ago, we changed the bugmail that you would get when you received an r- to say "review not granted" instead of "review rejected", because to some people the idea of having their code rejected was too harsh, and the idea was that this switch will make it easier to r- patches with less worrying about whether you're hurting the requester's feelings.

Also, I'll note that it is very difficult to come up with a set of flags and values which capture *all* possible states that a patch may possibly go through, and we're trading off something very concrete here: simplicity. The current system gives you a choice between ? and + and -, all of which have basic meanings which are rather well understood. It's true that this system gets used by individuals in different ways, but adding to the choices of flag values doesn't necessarily solve that problem. Some people may just never use the more "advanced" status values and stick to the simple ones, for example.

For responses, I want to eliminate the ambiguity that comes with the
existing Bugzilla system and have responses convey the action that
should be performed. e.g.

Review Response: {land as is, land it with changes (I trust you),
looks good (request re-review with changes), not acceptable}

Those are currently captured in part inside the textual feedback (like
"I would like to see this part of an update patch to land", or "good to
go but please rebase it on top of this other thing that I'm landing
tomorrow", etc.) Why is that not enough?

First, lots of people don't leave those comments! I've seen patches from first-time contributors get insufficient context and the contributor immediately asks "what do you mean."

Agreed. But what evidence do we have that adding more status values will help things? Wouldn't educating those reviewers be better?

(I'll note that in *my* experience, most reviewers give enough information to *my* requests to help me understand how to proceed. This is definitely a subjective impression on both our parts, so we should be careful when extrapolating conclusions here.)

Furthermore, the fields remove ambiguity.

That is debated. :-) See <http://en.wikipedia.org/wiki/The_Paradox_of_Choice#How_we_choose>. Not everyone's goal will be to resolve ambiguity (as you testify above) so I don't see why we can assume that giving them more choices on how to respond to a request will solve things.

It forces the reviewer to make a call on the future of the patch without leaving room for interpretation in the textual response (I know I've misinterpreted comments before and landed prematurely). This can help tremendously when language is a barrier.

Like I said above, I can easily imagine the other case where individuals will pick a small subset of the status values that we provide them with and will never use the less ambiguous ones.

Also, speaking as a non-native English speaker, I don't think this will help all that much when there is a language barrier. There's way too many things that someone should read and understand in English -- this is just a drop in the bucket.

In the case of "land as is," we could have RB automatically trigger autoland (wouldn't that be nice).

It's nice indeed, and is planned, but autoland won't work based on _any_ review flag, so it's orthogonal to what we're discussing here.

Fields can also help with auditing and verifying the review process. "You were told to make changes but you pushed the same changeset that was reviewed!"

Because the question of "whether the same patch was pushed" can only answered by a human, it won't help automate things at least. And with autoland, you won't push your stuff anyway, so this will be moot.

Fields can also help with dashboards. They would allow us to measure which individuals are effective at writing patches that get accepted on first review, etc.

I don't understand how having more status values will help build that dashboard unless people use those flags in the ways we expect them to.

But let me ask another question: do we need to resolve this topic here before we can make progress? I think not. It seems like you're asking for more than just the ?/+/- status values. This thread is discussing what it would take from a technical standpoint to support more than one flag per review request in RB. We know from experience that we cannot resolve the issue of what the status values should be by just hypothesizing, we need to be able to experiment with the proposed ideas and decide whether they're good ones. I'll refer you to two great examples of this: the introduction of the feedback flag, and the gradual decrease usage of the superreview flag. The former is a good example since someone thought of an idea to streamline the review process, they posted somewhere (dev-platform?) about it, and others thought it's a good idea too, we experimented with it, and I think at this point many people agree that it indeed improved things by making review requests less ambiguous. The latter is a good example of something that people collectively decided is not a useful idea in the majority of cases, and is currently used very minimally these days. I think this is a very interesting topic, but is by far a cultural issue, and I don't think that the basic choice we're making here ("each review request should support more than one review flag with some status values") is going to make any future improvements in this area impossible or hard to do. Do you agree?