The plan to fix bug triage....


Anthony LaForge

Mar 9, 2011, 2:53:11 PM
to chromium-dev
Howdy folks,

After I spent the last few days poring over stats and mining data from the issue tracker, some trouble spots emerged that seem reasonably actionable.  Before we jump into AIs, some interesting metrics to keep in mind:

  • 50% (3,902) of our Mstone-X bugs are unclassified, as are 59% (13,161) of open bugs (i.e. no Feature|Internals|Webkit|Area-Compat-*)
    • Lesson: We need to be more aggressive about classification.

  • 36% (7,707) of the open bugs were filed over a year ago; of those, 40% are Available, 36% Unconfirmed, 16% Assigned, 8% Other
    • Lesson: We potentially overuse the Available label, and need to hit harder on Unconfirmed

  • Only 20% of the Fixed bugs were ever in an Available state (i.e. that's an abnormal path to getting fixed)
    • Lesson: Marking a bug Available isn't a great way to get it fixed (this could be a discoverability problem, since half the bugs aren't classified and the right people aren't on cc)

  • ~30% of all open bugs haven't been commented on by anyone in the last 6 months
    • Lesson: We are tracking things no one cares about

  • Only 38% of the open bugs have been commented on by committers in the last 6 months
    • Lesson: We are not paying attention to 62% of the bugs.
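The "unclassified" test behind the first metric can be sketched as below. This is only an illustration: it assumes each issue is represented by a collection of label strings and that classifying labels start with one of the prefixes named in the first bullet; the function names and the sample labels are hypothetical, not the issue tracker's API.

```python
# Label prefixes that count as a classification, per the first metric above.
# Assumption: labels are plain strings such as "Feature-Tabs".
CLASSIFYING_PREFIXES = ("Feature", "Internals", "Webkit", "Area-Compat-")


def is_unclassified(labels):
    """True if none of the issue's labels is a classifying label."""
    return not any(label.startswith(CLASSIFYING_PREFIXES) for label in labels)


def unclassified_share(issues):
    """Fraction of issues (each a collection of labels) that are unclassified."""
    issues = list(issues)
    if not issues:
        return 0.0
    return sum(is_unclassified(labels) for labels in issues) / len(issues)
```

An issue carrying only OS-* or Mstone-* labels would count as unclassified under this sketch.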
With that in mind, this is the approach that I'm going to carry forward:
  • Phase 1: Fix the discoverability/ triage@ scale problem
    • Clean the label namespace and move things around
    • Assign label owners groups to each label
    • Ensure Auto-CC rules are established for each label (ideally pointing to groups)
      • We need to push the GA+ conversion

  • Phase 2: Establish a front-line triage system (aka stem the tide)
    • We need to at least ensure that issues are getting properly classified as they come in/get filed
      • New bug creation page wizard
      • Manually triage the remaining issues (Experiment w/ ChromeTurk)

  • Phase 3 (chrome-team): Classify the backlog
    • We need to classify the backlog (a one time fixit event should kill this) so people can find the bugs they need to work on
    • ChromeTurk might be the answer here

  • Phase 3 (chrome-pmo): Cull the backlog
    • We need to reduce the work that's in scope to things we are actually looking at
      • Anything with an Mstone-* label should be considered near term work
      • We should automatically remove Mstone-* labels from things that aren't being worked on (i.e. no dev activity in the last 180 days)
    • We should be aggressive about saying WontFix and not placating.
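The Mstone-* culling rule above could be sketched roughly as follows. This is a sketch under stated assumptions: the `Issue` record, its field names, and `strip_stale_milestones` are hypothetical and not part of the issue tracker's API; only the 180-day threshold and the Mstone-* label prefix come from the plan.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=180)  # threshold taken from the plan above


@dataclass
class Issue:
    # Hypothetical record; the real tracker's data model may differ.
    issue_id: int
    labels: set = field(default_factory=set)
    last_dev_activity: datetime = datetime.min


def strip_stale_milestones(issues, now):
    """Remove Mstone-* labels from issues with no dev activity in 180 days.

    Returns the IDs of the issues that were changed.
    """
    changed = []
    for issue in issues:
        if now - issue.last_dev_activity <= STALE_AFTER:
            continue  # recent dev activity: leave the milestone alone
        milestones = {l for l in issue.labels if l.startswith("Mstone-")}
        if milestones:
            issue.labels -= milestones
            changed.append(issue.issue_id)
    return changed
```

Only the milestone label is dropped; the issue itself stays open and keeps its other labels.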
Kind Regards,

Anthony Laforge
Technical Program Manager
Mountain View, CA

Stuart Morgan

Mar 9, 2011, 3:15:44 PM
to laforge...@google.com, Anthony LaForge, chromium-dev
Most of this sounds awesome, including all of the new approach
section. I do want to question two of the lessons though:

On March 9, 2011 at 11:53, Anthony LaForge <laf...@google.com> wrote:
> Lesson: We potentially overuse the Available label

What's wrong with Available? The bug system should reflect reality,
and from that standpoint it's better to have a bug marked Available
than to have it assigned to someone who has no plan to work on it in
the foreseeable future. For example, if someone new is starting to
work on a subteam, it's much easier for them to figure out where to
start poking if they can look at Available bugs, rather than try to
guess which bugs they should steal from someone else's queue. (This
isn't theoretical; I have done this every single time I've started
working on a new project.)

I assume the answer is this:

> Only 20% of the Fixed bugs were ever in an Available state (i.e. that's an
> abnormal path to getting fixed)

But as you point out, it's not clear that a lot of Available bugs are
on anyone's radars at all, and that once we fix classification/triage,
Available wouldn't be useful. There's also a bit of self-fulfilling
prophecy here; more important bugs are more likely to be routed
directly to someone as they come in, so are more likely to be fixed.
Less important bugs don't necessarily need to be pushed onto a
specific person. That seems reasonable to me.

I really dislike systems that discourage having bugs in this state
because they create an atmosphere that it's your responsibility to
find a new owner for a bug if you aren't the right owner for it, which
leads to some bugs needlessly rotting because they appear to be in
someone's work queue, but really are only there because ignoring it
becomes easier than getting it moved to the right person (if there
even is one). Knowing who the right owner is shouldn't be the job of
whoever gets a bug, it should be the job of the people watching the
label bucket that the bug is in.

> ~30% of all open bugs haven't been commented on by anyone in the last 6
> months
>
> Lesson: We are tracking things no one cares about

I don't think that's the lesson at all. We strongly discourage "me
too", "ZOMG why haven't you fixed this", etc. type of comments. The
only lesson we should draw from a lack of comments is that there is no
new information. That's not at all the same thing. A good bug (whether
it started that way, or got that way through getting more information
over time) generally doesn't need any more comments until someone is
actively working on it.

-Stuart

Peter Kasting

Mar 9, 2011, 3:40:02 PM
to stuart...@chromium.org, laforge...@google.com, Anthony LaForge, chromium-dev
The overriding theme of my comments below seems to be that a lot of problems are addressable if we can address cultural problems in how we use the bug tracker.  I'm not sure how to enforce "good culture" though.

On Wed, Mar 9, 2011 at 12:15 PM, Stuart Morgan <stuart...@chromium.org> wrote:
> The bug system should reflect reality,
> and from that standpoint it's better to have a bug marked Available
> than to have it assigned to someone who has no plan to work on it in
> the foreseeable future.

There is a counterargument to this, and that is that there is value in making all bugs be owned by some person whose job is to fix or else re-evaluate/re-assign the bug.

For example, consider forcing all bugs to be assigned with another process that (somehow) forces people to re-examine each of their assigned bugs, say, once every three months.  In this world, it's more likely that obsolete bugs get closed, and that "really should get fixed, but no time right this second" bugs eventually get fixed, than in a world where no one looks at those bugs.

Of course, you can get the same effects with a process that ensures unassigned bugs get retriaged periodically.  But that's because those two processes are effectively equivalent: the second is the same as the first with "unassigned" being a virtual equivalent for "assigned, to the retriage team".

Right now, we're not doing either of these, which I think is unfortunate.
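A periodic re-examination of the kind described above could be driven by a check like the one below. This is a minimal sketch: the 90-day interval is an assumed stand-in for "once every three months", and `due_for_relook` and its input mapping are hypothetical names, not an existing tool.

```python
from datetime import datetime, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # assumed stand-in for "every three months"


def due_for_relook(last_reviewed, now, interval=REVIEW_INTERVAL):
    """Return IDs of bugs whose last review is older than the interval.

    `last_reviewed` maps a bug ID to the time someone last looked at it.
    """
    return sorted(
        bug_id
        for bug_id, seen in last_reviewed.items()
        if now - seen > interval
    )
```

The same check works whether the bugs belong to an individual or to a retriage team, which matches the point that the two processes are effectively equivalent.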

> For example, if someone new is starting to
> work on a subteam, it's much easier for them to figure out where to
> start poking if they can look at Available bugs, rather than try to
> guess which bugs they should steal from someone else's queue.

I agree.  In principle, we could make this clear even if all bugs were assigned, by more careful use of Available/Assigned/Started; "available" would mean "no plans to look at, you can take regardless of assignee", "started" would mean "cannot take from assignee", and "assigned" would mean "you can probably take, but ask first".  However, people are not careful in how they use these labels today.
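The stricter status semantics proposed above can be written down as a small lookup. This is purely illustrative: the enum and helper are hypothetical, and the rule text paraphrases the paragraph rather than any behavior enforced by the tracker.

```python
from enum import Enum


class Status(Enum):
    AVAILABLE = "Available"
    ASSIGNED = "Assigned"
    STARTED = "Started"


# Takeover rule for each status, as described in the paragraph above.
TAKEOVER_RULE = {
    Status.AVAILABLE: "take freely, regardless of assignee",
    Status.ASSIGNED: "ask the assignee first",
    Status.STARTED: "do not take",
}


def can_take_without_asking(status):
    """True only when the bug may be picked up without contacting anyone."""
    return status is Status.AVAILABLE
```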

> Knowing who the right owner is shouldn't be the job of
> whoever gets a bug, it should be the job of the people watching the
> label bucket that the bug is in.

I think this is a solvable problem.  If you get a bug, and you don't know the right owner to reassign directly, you should reassign back to the triage team and say you're not the right owner.

>> ~30% of all open bugs haven't been commented on by anyone in the last 6
>> months
>>
>> Lesson: We are tracking things no one cares about
>
> I don't think that's the lesson at all. We strongly discourage "me
> too", "ZOMG why haven't you fixed this", etc. type of comments. The
> only lesson we should draw from a lack of comments is that there is no
> new information. That's not at all the same thing. A good bug (whether
> it started that way, or got that way through getting more information
> over time) generally doesn't need any more comments until someone is
> actively working on it.

There is a valid point Anthony is making, though, which is that frequently external circumstances change to make a bug become invalid, or at least different, and it's impossible to separate "still valid, no new info" from "not valid, there was new info but no one posted it because no one cares".

Again, this is what "every bug should be assigned to somebody to look at again periodically" is supposed to deal with.

And as I said atop this mail, I don't know how to make the cultural shifts necessary for that sort of plan to work.

PK

Dirk Pranke

Mar 9, 2011, 6:53:06 PM
to pkas...@google.com, stuart...@chromium.org, laforge...@google.com, Anthony LaForge, chromium-dev
I agree with both your and Stuart's points. In my experience the best
way to make the cultural change happen is to set up a process and
make it someone's OKR / responsibility to make sure that process is
happening. If someone isn't being paid to make the process work, we're
probably hosed.

I think either your workflow or Stuart's could work, but we need to be
clear about which one we're following. I personally would like to see
"STARTED" mean "I am actively working on this right now" and
"ASSIGNED" mean "I intend to work on this shortly; feel free to bug me
for status updates". That would leave a lot more bugs in the
"AVAILABLE" state.

It does not capture the "who is the right person to work on this"
aspect that you mention, but I think that's actually an orthogonal
concept. I would hope that most of the time there is more than one
"right" person anyway, and we shouldn't confuse "Dirk could fix this
most easily" with "Dirk should be expected to fix this" or "only Dirk
should fix this".

I also note that I suspect one of the reasons AVAILABLE is not
commonly used is because people file their own bugs and skip steps,
moving straight into ASSIGNED or STARTED.

-- Dirk

> --
> Chromium Developers mailing list: chromi...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-dev
>

John Tamplin

Mar 9, 2011, 11:22:38 PM
to pkas...@google.com, Peter Kasting, stuart...@chromium.org, laforge...@google.com, Anthony LaForge, chromium-dev
On Wed, Mar 9, 2011 at 3:40 PM, Peter Kasting <pkas...@chromium.org> wrote:

>>> ~30% of all open bugs haven't been commented on by anyone in the last 6
>>> months
>>>
>>> Lesson: We are tracking things no one cares about
>>
>> I don't think that's the lesson at all. We strongly discourage "me
>> too", "ZOMG why haven't you fixed this", etc. type of comments. The
>> only lesson we should draw from a lack of comments is that there is no
>> new information. That's not at all the same thing. A good bug (whether
>> it started that way, or got that way through getting more information
>> over time) generally doesn't need any more comments until someone is
>> actively working on it.
>
> There is a valid point Anthony is making, though, which is that frequently external circumstances change to make a bug become invalid, or at least different, and it's impossible to separate "still valid, no new info" from "not valid, there was new info but no one posted it because no one cares".

I am not sure I buy that explanation.  If I filed a bug a year ago, and it has sat with no action, what should I be doing to show that I still care?  Should I post every month on it to say "yes, I still care"?  If I had a number of those and that actually got something done, I would probably just set up some automated process, and if everybody followed that procedure, nothing would have been gained but noise.

I think all you can assume from a bug that has had no comments in 6 months is that nothing has happened worth commenting on.

--
John A. Tamplin
Software Engineer (GWT), Google

Peter Kasting

Mar 9, 2011, 11:39:23 PM
to John Tamplin, stuart...@chromium.org, laforge...@google.com, Anthony LaForge, chromium-dev
On Wed, Mar 9, 2011 at 8:22 PM, John Tamplin <j...@google.com> wrote:
> On Wed, Mar 9, 2011 at 3:40 PM, Peter Kasting <pkas...@chromium.org> wrote:
>> There is a valid point Anthony is making, though, which is that frequently external circumstances change to make a bug become invalid, or at least different, and it's impossible to separate "still valid, no new info" from "not valid, there was new info but no one posted it because no one cares".
>
> I am not sure I buy that explanation.  If I filed a bug a year ago, and it has sat with no action, what should I be doing to show that I still care?  Should I post every month on it to say "yes, I still care"?

No, that's not what I'm saying.

> I think all you can assume from a bug that has had no comments in 6 months is that nothing has happened worth commenting on.

Right now, you can't assume anything at all, because you have no idea whether or not anyone has looked at the bug.

What I was trying to say is that in a world where bugs are guaranteed periodic relooks, then you can more safely assume that no comments == nothing has happened, because if it had, someone would have said something.

In other words, I'm not trying to advocate for adding noise, I'm suggesting that we need a more robust process for periodically taking another glance at bugs.

PK

Mark Mentovai

Mar 9, 2011, 11:40:57 PM
to laf...@google.com, chromium-dev
Anthony LaForge wrote:
> Phase 1: Fix the discoverability/ triage@ scale problem
>
> Clean the label namespace and move things around
> Assign label owners groups to each label
> Ensure Auto-CC rules are established for each label (ideally pointing to
> groups)

I’m the OS:Mac triage owner. Please don’t auto-Cc me on each and every
OS:Mac bug. I’ve had excellent success with the triage process that
we’ve been using for about two years. Two times a week (it used to be
three, but the volume’s down a bit lately), very helpful volunteers
(you know who you are!) and I get together and run through the
untriaged bugs. I don’t think we’re missing anything except for the
bugs we intentionally skip because they have active and healthy
triages, belong to areas that handle their own Mac work, and have
triagers who have asked us to exclude their bugs from our sessions.

I don’t need to be Cc’d to make this process work. I certainly don’t
need a Cc once I’ve routed a bug properly, except in rare cases where
I think it’s interesting or might need more direct attention. In those
instances, it’s easy enough to adjust the Cc field as needed.

I understand that the goal here is to improve bug handling especially
for bugs that don’t have regular or active or healthy triages, but I
don’t think that my regular, active, and healthy triage would be
improved by an auto-Cc.

(I can talk more about the OS:Mac triage process if you’d like.)

> Phase 2 : Establish a front-line triage system (aka stem the tide)
>
> We need to at least ensure that issues are getting properly classified as
> they come in/get filed

Yes! I always assumed that this would be best done as bugs make the
jump from Status:Unconfirmed to Status:Untriaged, because it’s the
first step that requires some action by someone familiar with the
project (and product).

Anthony LaForge

Mar 10, 2011, 1:36:01 AM
to Mark Mentovai, chromium-dev
Let me start by thanking everyone for their thoughtful responses; I greatly appreciate the depth all of you put into your comments.  Since this thread went in a few different directions, let me try to tie some things together (working backwards):

@Mark - I agree, it doesn't make sense to create cc lists for high-traffic labels like OS-* or Area-*; however, I do want to see Feature-* labels, where there are typically smaller/targeted teams, on an auto-cc.  These labels help w/ triage (i.e. the right people come up on cc, so it's pretty obvious who's a candidate to assign to), and they make it simple for anyone who can classify a bug to get an issue into the right person's hands.  My general thinking was that OS-* and Area-* would effectively be the catch-all labels that were actively triaged by groups.  If the Feature-*-specific owners are actively triaging their bugs, the need for the high-level catch-alls should shrink (this is what we've seen w/ UI/Internals triage, where we've piloted this approach).
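The split described above (auto-cc for Feature-* labels, none for OS-* or Area-*) might look something like this. It is a sketch only: the label names and group addresses in `AUTO_CC` are made-up placeholders, and `auto_cc_for` is a hypothetical helper rather than a feature of the tracker.

```python
from fnmatch import fnmatch

# Hypothetical label-to-group mapping; names and addresses are placeholders.
AUTO_CC = {
    "Feature-Downloads": "downloads-bugs@example.com",
    "Feature-Sync": "sync-bugs@example.com",
}


def auto_cc_for(labels):
    """Return the groups to cc for an issue's labels.

    Only Feature-* labels trigger an auto-cc; high-traffic OS-* and
    Area-* labels are left to their triage groups.
    """
    return sorted(
        AUTO_CC[label]
        for label in labels
        if fnmatch(label, "Feature-*") and label in AUTO_CC
    )
```

Pointing each entry at a group rather than an individual follows the "ideally pointing to groups" note in the original plan.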

@Peter, Stuart, Dirk -  I should have been more careful with my wording re: bugs that haven't been commented on for 180 days.  There are indeed multiple reasons why this could be; fundamentally, however, I do view a level of sustained inactivity (6 months, after all, is a long time) from both the requester and engineers as a reasonable signal that a bug is likely not critically pressing and could possibly no longer be valid.  What that means in terms of action is still worth discussing.  At this point, I've softened my position from yesterday that they should be deleted; however, I do think that it would be appropriate/reasonable to remove an mstone-* label from any such bug (since it has moved from the realm of work tracking to defect tracking).  I anticipate our strategy for managing these bugs will evolve over time; my priority, as outlined above, is to lay down the organizational foundations before attacking the backlog... so there is still time to be swayed.

@Peter - Thank you for coming by my office and discussing this.  You helped me think about the problem and clarify my thinking.  Your point about the difference in philosophy between work tracking and defect tracking was spot on.

As always to everyone, thank you.

Kind Regards,

Anthony Laforge
Technical Program Manager
Mountain View, CA


Stuart Morgan

Aug 10, 2012, 4:50:10 PM
to Anthony LaForge, chromium-dev
2011/3/10 Anthony LaForge <laf...@google.com>:
> @Peter, Stuart, Dirk - I should have been more careful on my wording re:
> bugs that haven't been commented on for 180 days. There are indeed multiple
> reasons why this could be, however fundamentally I do view that a level of
> sustained inactivity (6 months after all is a long time) from both the
> requester and engineers is a reasonable signal that a bug is likely not
> critically pressing and indeed could possibly no longer be valid. What that
> means in terms of action, is still worth discussing. At this point, I've
> softened my position from yesterday that they should be deleted, however I
> do think that it would be appropriate/ reasonable to remove an mstone-*
> label from any such bug (since it moved from the realm of work tracking to
> defect tracking).

Resurrecting an old thread, since it seems to be newly relevant: I
just got a ton of bug emails from a bot that appears to be closing any
bug that hasn't been touched recently. Does this mean the decision
here changed?

A fair number of them are low-priority feature requests that are still
perfectly valid. What are we supposed to do with those bugs? Manually
re-open them every N days (whatever N is for this bot)? Wait for users
to re-file them, and then triage them again, potentially having the
same discussions again, from scratch, about whether we want to do
them? Or are we deciding that any feature that we haven't implemented
in N days doesn't matter, and leaving it closed?

-Stuart

Stuart Morgan

Aug 10, 2012, 4:53:06 PM
to Anthony LaForge, chromium-dev
Sorry, the other email came in while I was writing this one. I see
from that email that it's only being applied to Unconfirmed bugs,
which addresses my concern.

-Stuart
