Bug Program Next Steps

Emma Humphries

Jan 29, 2016, 6:46:26 PM
to Firefox Dev, dev-pl...@lists.mozilla.org
Bug Program Next Steps

Over the last week, I’ve asked you to step up and identify developers who
will be responsible for bugs triaged into their component (in Firefox,
Core, Toolkit, Fennec iOS, and Fennec Android).
Why This Matters

Bugs are a unit of quality we can use to see how we’re doing.

We believe that the number of outstanding, actionable bugs is the best
metric of code quality available, and that how this number changes over
time will be a strong indicator of the evolving quality of Firefox.
Actionable bugs are those with the status NEW for which the next action
required - if any - is unambiguous: does the bug need immediate attention,
will it be worked on for a future release, will it be worked on at all?

There are two parts to maintaining the value of that metric. First, we want
to make better assertions about the quality of our releases by making clear
decisions about which bugs must be fixed for each release (urgent) and
actively tracking those bugs. The other is the number of good bugs filed by
our community. Filing a bug report is a gateway to greater participation in
the Mozilla project, and we owe it to our community to make quick and clear
decisions about each bug.

Making decisions on new bugs quickly helps us avoid point releases, and
gives positive feedback to people filing bugs so that they will file more
good bugs.
What’s Your Role

Starting in the second quarter of this year, if you’ve taken on a
component, I’m expecting you or your team to look at the bugs which landed
in the component on a more frequent basis than a weekly triage.

In February, we’re starting a pilot with four groups of components where
we’ll get the process changes field tested, so that we can take the
changes to all the affected bugzilla components for review and comment
before we implement them across all of the work on Firefox.
Hold on, we already have a weekly triage!

That’s fantastic, but a weekly pace means we miss bugs that affect upcoming
releases. So I’m expecting you to scan that list of inbound bugs daily for
the urgent ones (I’ll define urgent below) and put them into one of the
states described in the next section; the others can go into your regular
triage.
At Your Regular Triage

You’ll look at the bugs which landed in your component and decide on how to
act on them using the states described in the next section.
We don’t have a regular triage

This is a process which you’ll need to start, and the bug program team will
help with this.
This is potentially a lot of work for one person

Looking at the urgent bugs does not have to be one person’s task. You can
have a rotation of people doing this. Look at the Core::Graphics
<https://wiki.mozilla.org/Platform/GFX/TriageSchedule> triage wiki for an
example of what you could be doing.
Bug States

Initially, these states will be marked in bugzilla.mozilla.org using
whiteboard tags during the pilot. The bugzilla team will be making further
changes once we’ve settled on a process.

You’ll be looking at bugs as they land in your component. We expect most
of these will be NEW bugs, but some will be in UNCONFIRMED.

There are four states into which you’ll need to put each bug. In your
reviews between your team’s weekly triages, we want you to be on the watch
for bugs with characteristics that make getting them in front of someone
urgent: bugs with crash, topcrash, regression, or dataloss keywords; crash
logs linked in comments; references to mozregression results; and others.
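
To make the daily scan concrete, here is a minimal sketch in Python. It only covers the keyword signals named above; the `Bug` record and its field names are hypothetical stand-ins rather than Bugzilla's schema, and the other signals (linked crash logs, mozregression references) are left out.

```python
from dataclasses import dataclass, field

# Keywords named in the text as signs that a bug may be urgent.
URGENT_KEYWORDS = frozenset({"crash", "topcrash", "regression", "dataloss"})


@dataclass
class Bug:
    """Hypothetical, minimal bug record; not Bugzilla's real schema."""
    bug_id: int
    status: str
    keywords: set = field(default_factory=set)


def needs_prompt_review(bug: Bug) -> bool:
    """True for an untriaged bug carrying at least one urgent keyword."""
    untriaged = bug.status in {"NEW", "UNCONFIRMED"}
    return untriaged and bool(bug.keywords & URGENT_KEYWORDS)


def daily_scan(bugs):
    """Split inbound bugs into (look at today, leave for regular triage)."""
    today = [b for b in bugs if needs_prompt_review(b)]
    weekly = [b for b in bugs if not needs_prompt_review(b)]
    return today, weekly
```

A keyword match here only means "a person should look at this today"; the decision about which state the bug belongs in still rests with the reviewer.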

The bug should not remain in this state after your review of it.

You’ll need to decide which of the following states you’ll move this bug
into. If you can’t, you’ll need to take action, such as getting someone
to run mozregression, needinfo’ing a domain expert, looking at checkins,
or using whatever other techniques you have to get a bug reduced.

Once you have an understanding of the bug, you should assign it to one of
these states.
Urgent

Assigned to a developer, release tracking flags nominated, and set at
priority `P1`. If the bug is being worked on by somebody from outside your
core team, a team mentor should be assigned.

All these need to be set for the bug when you assign it to this state. This
state is for bugs you find in your daily review which need a developer
immediately.

If the bug is not in need of immediate attention, then your team’s process
should land the bug in one of the following states.
Backlog

This is a NEW bug that the team acknowledges is a bug and will consider
taking on, but which is not a current priority. If the bug contains
regression, crash, topcrash, or similar keywords and metadata, then the
team can explain why it’s not a high priority.
Is a Bug, Not Prioritized

This is a terminal state for a NEW bug. We acknowledge the bug exists, it
affects people, but it is not important enough to warrant working on it.
The team will review and accept patches from the community for this bug
report.
Closed

This is a terminal state for a NEW bug. After review, the bug is not
something that can be worked on, or describes existing and expected
behavior.
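
As a rough sketch of how the four states and the Urgent checklist fit together, the Python below models them directly. The function and parameter names are hypothetical, chosen only for illustration; the pilot itself records states as whiteboard tags, whose exact spelling is not given here.

```python
from enum import Enum


class TriageState(Enum):
    """The four decision states described above."""
    URGENT = "urgent"
    BACKLOG = "backlog"
    NOT_PRIORITIZED = "is a bug, not prioritized"
    CLOSED = "closed"


# States described above as terminal for a NEW bug.
TERMINAL_STATES = {TriageState.NOT_PRIORITIZED, TriageState.CLOSED}


def missing_for_urgent(assignee, tracking_nominated, priority, external, mentor):
    """List what still has to be set before a bug can be called Urgent."""
    missing = []
    if not assignee:
        missing.append("assignee")
    if not tracking_nominated:
        missing.append("release tracking flag nomination")
    if priority != "P1":
        missing.append("priority P1")
    # A mentor is only required when the assignee is outside the core team.
    if external and not mentor:
        missing.append("team mentor")
    return missing
```

An empty list from `missing_for_urgent` means every field the Urgent state requires has been set.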
Other States

There are other states we’re looking at for bugs. These will cover a bug as
it’s worked on, since ASSIGNED as it stands doesn’t provide useful
information about progress; flag bugs that have stalled or need code
reviews or sheriff attention; bugs that are on a release train; and bugs
with fixes in a general or ESR version.
Timeline

Now: finish finding component responsible parties

February: pilot review of NEW bugs with four groups of components, draft
new process

March: comment period for new process, finalize process

Q2: implement new process across all components

Q3: all newly triaged bugs following the new process

Kyle Huey

Jan 29, 2016, 6:53:33 PM
to Emma Humphries, dev-platform, Firefox Dev
On Fri, Jan 29, 2016 at 3:45 PM, Emma Humphries <em...@mozilla.com> wrote:

> We believe that the number of outstanding, actionable bugs is the best
> metric of code quality available, and that how this number changes over
> time will be a strong indicator of the evolving quality of Firefox.
>

Why do we believe that?

- Kyle

Eric Rescorla

Jan 29, 2016, 6:59:53 PM
to Kyle Huey, dev-platform, Emma Humphries, Firefox Dev
I trust we are excluding bugs which are actually feature requests.

-Ekr



Richard Newman

Jan 29, 2016, 7:17:34 PM
to Emma Humphries, dev-platform, Firefox Dev
>
> Starting in the second quarter of this year, if you’ve taken on a
> component, I’m expecting you or your team to look at the bugs which landed
> in the component on a more frequent basis than a weekly triage.
>

In my experience, component watching adequately serves this purpose, and
component watchers collaboratively respond to, immediately triage, or flag
with a tracking-? flag. That might not be true for some components, but it
is for the three products I watch.


> Assigned to a developer, release tracking flags nominated, and set at
> priority `P1`. If the bug is being worked on by somebody from outside your
> core team, a team mentor should be assigned.
>

I think it's worth pointing out that it's not always worth assigning bugs
to a developer — in my experience that does more harm than good if they're
not working on it. Be realistic. Neither does P1 mean anything useful in
many of the components I follow.

Perhaps it would be worth taking a step back and explaining what problem
you're trying to solve here, and why you think that problem is the one we
should be solving?

Again, speaking from my experience, re-triaging and re-processing the
backlog within the context of current product priorities seems to be much
more of a neglected task than processing the handful of new bugs that
arrive each day — it's rare for a new bug to slip through the cracks in the
Fennec component, but we have a long list of old bugs that we triaged,
declared important, and never looked at again.

You will have a really hard time trying to get buy-in on this process
without demonstrating that there is significant benefit in the additional
time investment.

Emma Humphries

Jan 29, 2016, 7:56:14 PM
to Richard Newman, dev-platform, Firefox Dev
Richard,

Many components have watchers, and I am grateful for that. Some components
don't. We need reviews in all components so we don't lose track of bugs we
must fix to avoid a point release.

We're applying consistent process across all components, because we must
reduce the amount of noise in bugzilla. Anyone should be able to see what
we are doing, what we agree to do later, and what we're not going to do and
not have to know the metadata folklore of each component to do it. This
means there are going to be changes in how teams use bugzilla metadata.

We don't want to waste engineering's time, so we're piloting this with a
small number of groups so we can see if we're going in the right direction,
and we have stated what we expect to see happen with this work so we can
decide if we'll continue with it.

-- Emma Humphries

Gervase Markham

Jan 30, 2016, 1:55:53 AM
to Emma Humphries, Firefox Dev
On 30/01/16 00:45, Emma Humphries wrote:
> This is a terminal state for a NEW bug. We acknowledge the bug exists, it
> affects people, but it is not important enough to warrant working on it.
> The team will review and accept patches from the community for this bug
> report.

Without wanting to pile on, as I know others have concerns about other
parts of this plan, and without wanting to say it's only you doing this,
but: can we all please stop using the word "community", as this sentence
implies, as "people outside the paid 'team' who get to work on things
which are not important enough for the important people to spend their
time on"? Community is not the antonym of "team", nor is it the antonym
of "employee".

The original message was about the world we are working towards. In the
world I'm working towards, every team includes people we pay and people
we don't, on an equal basis, and we are all one community.

Thank you :-)

Gerv

Marco Bonardo

Jan 30, 2016, 3:33:44 AM
to dev-platform, Firefox Dev
On Sat, Jan 30, 2016 at 12:45 AM, Emma Humphries <em...@mozilla.com> wrote:

> Looking at the urgent bugs does not have to be one person’s task. You can
> have a rotation of people doing this. Look at the Core::Graphics
> <https://wiki.mozilla.org/Platform/GFX/TriageSchedule> triage wiki for an
> example of what you could be doing.
>

Most of this plan assumes every component has a team that works on it (let
alone a triage meeting). Well, that's not always the case; some of the
components have a responsible person who is working on something else or is
part of another team.
For example I'm responsible for Places, I watch the component and try to
keep it in order as far as possible, but there's no team and it's not my
primary job at the moment. This is an example but it's valid for a bunch of
other components, especially in Firefox where the teams cross various
components depending on the goal.

Axel Hecht

Jan 31, 2016, 1:35:42 PM
to Firefox Dev

Hi,

I'd like to start my feedback with a request.

It'd help me to get a big picture of the stuff that surrounds this email. One thing I'd like to see is information about who's been consulted going into this. Also, which threads about bug lifecycle got looked at.
It'd also be nice to see how this one should play into more forward-looking work.
I know that I wasn't consulted because I didn't insist. I did have the chance, though. Others might feel easier if they had similar insight.
Also, the question about "is this the problem to solve" would benefit from context. This one might just be a dependency of the ideas on how to get to more heavy stuff.

And now to remember all that when I write my own heavy-weight stuff soon.

More details inline.


On 30/01/16 00:45, Emma Humphries wrote:

> Bug Program Next Steps
>
> Over the last week, I’ve asked you to step up and identify developers who will be responsible for bugs triaged into their component (in Firefox, Core, Toolkit, Fennec iOS, and Fennec Android).
>
> Why This Matters
>
> Bugs are a unit of quality we can use to see how we’re doing.
>
> We believe that the number of outstanding, actionable bugs is the best metric of code quality available, and that how this number changes over time will be a strong indicator of the evolving quality of Firefox. Actionable bugs are those with the status NEW for which the next action required - if any - is unambiguous: does the bug need immediate attention, will it be worked on for a future release, will it be worked on at all?
>
> There are two parts to maintaining the value of that metric. First, we want to make better assertions about the quality of our releases by making clear decisions about which bugs must be fixed for each release (urgent) and actively tracking those bugs. The other is the number of good bugs filed by our community. Filing a bug report is a gateway to greater participation in the Mozilla project, and we owe it to our community to make quick and clear decisions about each bug.
>
> Making decisions on new bugs quickly helps us avoid point releases, and gives positive feedback to people filing bugs so that they will file more good bugs.

There's a school of thought that values a non-actionable bug over a bug not filed.

I know that for my personal crashers, for example, I look at stack traces, and have no clue what could be going wrong there. Or how I could provide value in figuring it out.

I do force myself to file bugs on them at times, though. 'Cause if I don't file 'em, nobody will, and then there's even less chance to figure something out.

I think this is a concern beyond crashes, as we struggle to find a balance between not having insight on how things work and, on the other hand, a gazillion useless bugs that just say "doesn't work".

> What’s Your Role
>
> Starting in the second quarter of this year, if you’ve taken on a component, I’m expecting you or your team to look at the bugs which landed in the component on a more frequent basis than a weekly triage.
>
> In February, we’re starting a pilot with four groups of components where we’ll get the process changes field tested, so that we can take the changes to all the affected bugzilla components for review and comment before we implement them across all of the work on Firefox.

How are those four groups chosen?

> Hold on, we already have a weekly triage!
>
> That’s fantastic, but a weekly pace means we miss bugs that affect upcoming releases. So I’m expecting you to scan that list of inbound bugs daily for the urgent ones (I’ll define urgent below) and put them into one of the states described in the next section; the others can go into your regular triage.
>
> At Your Regular Triage
>
> You’ll look at the bugs which landed in your component and decide on how to act on them using the states described in the next section.
>
> We don’t have a regular triage
>
> This is a process which you’ll need to start, and the bug program team will help with this.
>
> This is potentially a lot of work for one person
>
> Looking at the urgent bugs does not have to be one person’s task. You can have a rotation of people doing this. Look at the Core::Graphics triage wiki for an example of what you could be doing.
>
> Bug States
>
> Initially, these states will be marked in bugzilla.mozilla.org using whiteboard tags during the pilot. The bugzilla team will be making further changes once we’ve settled on a process.
>
> You’ll be looking at bugs as they land in your component. We expect most of these will be NEW bugs, but some will be in UNCONFIRMED.
>
> There are four states into which you’ll need to put each bug. In your reviews between your team’s weekly triages, we want you to be on the watch for bugs with characteristics that make getting them in front of someone urgent: bugs with crash, topcrash, regression, or dataloss keywords; crash logs linked in comments; references to mozregression results; and others.
>
> The bug should not remain in this state after your review of it.
>
> You’ll need to decide which of the following states you’ll move this bug into. If you can’t, you’ll need to take action, such as getting someone to run mozregression, needinfo’ing a domain expert, looking at checkins, or using whatever other techniques you have to get a bug reduced.
>
> Once you have an understanding of the bug, you should assign it to one of these states.

I'd ask you to make additions to these states for two use-cases:

Ran out of time

As you're asking for this to be done on a daily basis, I'd expect it to be time-bound. We do have bugs that just take a day or two to figure out what the next step is.

Scheme doesn't work well for this bug

I think this is an important option as we're learning how this works out for us.


I'm also generally concerned how UX bugs or crashes would fit into these buckets. UX bugs, and possibly any other flavor of ideation, have the majority of work associated with "should we do this or not". And crashes as a single crash stack are hardly ever actionable.


> Urgent
>
> Assigned to a developer, release tracking flags nominated, and set at priority `P1`. If the bug is being worked on by somebody from outside your core team, a team mentor should be assigned.
>
> All these need to be set for the bug when you assign it to this state. This state is for bugs you find in your daily review which need a developer immediately.
>
> If the bug is not in need of immediate attention, then your team’s process should land the bug in one of the following states.
>
> Backlog
>
> This is a NEW bug that the team acknowledges is a bug and will consider taking on, but which is not a current priority. If the bug contains regression, crash, topcrash, or similar keywords and metadata, then the team can explain why it’s not a high priority.
>
> Is a Bug, Not Prioritized
>
> This is a terminal state for a NEW bug. We acknowledge the bug exists, it affects people, but it is not important enough to warrant working on it. The team will review and accept patches from the community for this bug report.
>
> Closed
>
> This is a terminal state for a NEW bug. After review, the bug is not something that can be worked on, or describes existing and expected behavior.
>
> Other States
>
> There are other states we’re looking at for bugs. These will cover a bug as it’s worked on, since ASSIGNED as it stands doesn’t provide useful information about progress; flag bugs that have stalled or need code reviews or sheriff attention; bugs that are on a release train; and bugs with fixes in a general or ESR version.
>
> Timeline
>
> Now: finish finding component responsible parties
>
> February: pilot review of NEW bugs with four groups of components, draft new process
>
> March: comment period for new process, finalize process
>
> Q2: implement new process across all components
>
> Q3: all newly triaged bugs following the new process


Are there good metrics to use to evaluate the success and impact of this process?

To pick up on what I said earlier, getting a category for "didn't work" could be one way to measure, or at least enable us to systematically investigate areas for improvements.

Axel

Dave Townsend

Feb 1, 2016, 11:24:46 AM
to Emma Humphries, dev-pl...@lists.mozilla.org, Firefox Dev
On Fri, Jan 29, 2016 at 3:45 PM, Emma Humphries <em...@mozilla.com> wrote:
> Bug Program Next Steps
>
> Over the last week, I’ve asked you to step up and identify developers who
> will be responsible for bugs triaged into their component (in Firefox, Core,
> Toolkit, Fennec iOS, and Fennec Android.)
>
> Why This Matters
>
> Bugs are a unit of quality we can use to see how we’re doing.
>
>
> We believe that the number of outstanding, actionable bugs is the best
> metric of code quality available, and that how this number changes over time
> will be a strong indicator of the evolving quality of Firefox. Actionable
> bugs are those with the status NEW for which the next action required - if
> any - is unambiguous: does the bug need immediate attention, will it be
> worked on for a future release, will it be worked on at all?
>
>
> There are two parts to maintaining the value of that metric. First, we want
> to make better assertions about the quality of our releases by making clear
> decisions about which bugs must be fixed for each release (urgent) and
> actively tracking those bugs. The other is the number of good bugs filed by
> our community. Filing a bug report is a gateway to greater participation in
> the Mozilla project, and we owe it to our community to make quick and clear
> decisions about each bug.
>
>
> Making decisions on new bugs quickly helps us avoid point releases, and
> gives positive feedback to people filing bugs so that they will file more
> good bugs.
>
> What’s Your Role
>
> Starting in the second quarter of this year, if you’ve taken on a component,
> I’m expecting you or your team to look at the bugs which landed in the
> component on a more frequent basis than a weekly triage.

I'm concerned about making this sort of demand when the component
owner is not an employee, or is the expectation that only employees
would be in these roles?

> In February, we’re starting a pilot with four groups of components where
> we’ll get the process changes field tested, so that we can take the
> changes to all the affected bugzilla components for review and comment
> before we implement them across all of the work on Firefox.
>
I think there is a missing state here: one where it isn't worth our
time to attempt to reproduce the bug, but we would accept a patch if one
was forthcoming. I see this a lot where edge-case reporters file something
that affects them but very few other people. Sometimes it can take
time to verify that the bug report is accurate, but realistically I
already know that even if it is, it would fall into the not prioritized
state. Maybe these bugs should just go straight there, or maybe we
should just close them, but we should be clear on that.

> Timeline
>
> Now: finish finding component responsible parties
>
> February: pilot review of NEW bugs with four groups of components, draft new
> process
>
> March: comment period for new process, finalize process
>
> Q2: implement new process across all components
>
> Q3: all newly triaged bugs following the new process
>
>

smaug

Feb 1, 2016, 12:37:34 PM
to Axel Hecht
On 01/31/2016 08:35 PM, Axel Hecht wrote:
> I'm also generally concerned how UX bugs or crashes would fit into these buckets. UX bugs, and possibly any other flavor of ideation, have the
> majority of work associated with "should we do this or not". And crashes as a single crash stack are hardly ever actionable.
>

I could disagree with this. Crashes with a single crash stack can be very actionable. In those cases when crash-stats gives a proper stack trace, it is
often (though by no means always) easy to see why the crash happens.


Benjamin Smedberg

Feb 2, 2016, 10:44:56 AM
to Firefox Dev, dev-pl...@lists.mozilla.org, Richard Newman, Gervase Markham, Marco Bonardo, Dave Townsend, Emma Humphries
On 1/29/2016 6:45 PM, Emma Humphries wrote:
> Why This Matters
>
> Bugs are a unit of quality we can use to see how we’re doing.
>
> We believe that the number of outstanding, actionable bugs is the best
> metric of code quality available, and that how this number changes over
> time will be a strong indicator of the evolving quality of Firefox.
> Actionable bugs are those with the status NEW for which the next action
> required - if any - is unambiguous: does the bug need immediate attention,
> will it be worked on for a future release, will it be worked on at all?

As the sponsor for this project, I want to clarify some things related
to this project. The bug-handling initiative is a key part of our
Firefox quality programs in 2016. I have asked Emma to focus on our
process for making explicit decisions about incoming bugs. There are two
primary reasons for this:

* When we don't see incoming regression bugs, that is high risk to our
quality and shipping schedule. We have done dot-releases or similar
respins numerous times over the past year: in many cases the bugs that
caused those respins were filed well before release but either not seen
or we didn't react to them.
* Our testing community, especially prerelease users, are a core part of
our success. We know that one of the greatest turn-offs for people
filing their first bug is to have it sit with no response. On the other
hand, a quick response is a good predictor of continued and deeper
involvement in the project.

I don't think that we are going to learn much about quality or risk just
by counting the number of open bugs. So I don't think the statement
"best metric of code quality available" is true, and it has generated a
lot of the feedback in this thread. But I do assert that a core risk
metric is the number of new bugs without a *decision*, and we should be
tracking that number and driving it to zero.
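
As an illustration of what tracking that number could look like, here is a minimal Python sketch that counts NEW bugs whose whiteboard carries no decision marker. The tag strings are hypothetical; the pilot's actual whiteboard tags are not named in this thread.

```python
# Hypothetical whiteboard markers; the thread does not name the real tags.
DECISION_TAGS = (
    "[triage:urgent]",
    "[triage:backlog]",
    "[triage:not-prioritized]",
)


def undecided_count(bugs):
    """Count NEW bugs whose whiteboard carries no decision tag.

    `bugs` is an iterable of (status, whiteboard) pairs.
    """
    return sum(
        1
        for status, whiteboard in bugs
        if status == "NEW"
        and not any(tag in whiteboard for tag in DECISION_TAGS)
    )
```

Closed bugs drop out on their own because they no longer have the NEW status, so only the three open decision states need a marker in this sketch.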

Part of the program that is already underway is triaging everything out
of Core:Untriaged and Firefox:Untriaged into the proper component. This
is a task that we've been accomplishing both with our volunteer
community and with the support of Softvision contractors, and we're on
track to burn the Untriaged components to 0 bugs by the end of this quarter.

My intention, once we have this system in place, is to focus activities
on increasing both the quality and quantity of incoming bug reports,
especially from our prerelease users: building product features which
enable users to file more detailed and useful bug reports, combined with
data collected within the browser. Filing a bug in bugzilla is still a
scary experience for many people, and we can do better.

To keep focus and avoid creeping scope, an explicit non-goal of this
program is to deal with the prioritization of non-critical bugs within a
team or component. The primary goal here is to solve the flow for
incoming bugs to a clear decision-state. Beyond the bucket of "backlog
of work", teams can continue to use aha or tracking bugs or external
spreadsheets.

One other important thing to note is that we plan on implementing
bugzilla UI improvements to make triage much simpler. This may include
things like one-click decision making on bugs and autoloading bug
reports from triage lists. We expect that work to commence once the
initial trials are done in Q1 and we have better experience with the
details.

To reply to a few specific comments and questions...

Richard Newman wrote:

> In my experience, component watching adequately serves this purpose,
> and component watchers collaboratively respond to, immediately triage,
> or flag with a tracking-? flag. That might not be true for some
> components, but it is for the three products I watch.

I disagree that component watching is working well now. There are
components that are well-watched, and it mostly works to surface
high-priority (tracking+) bugs. But for non-critical bugs, they are
often left filed as NEW without even so much as a comment explaining a
decision: the bug reporter is left without a clear understanding of what
to expect or even knowledge that somebody has looked at their bug
report. We owe it to our community of bug filers to be more clear and
explicit with these decisions.

> I think it's worth pointing out that it's not always worth assigning
> bugs to a developer — in my experience that does more harm than good
> if they're not working on it. Be realistic. Neither does P1 mean
> anything useful in many of the components I follow.

Emma is talking explicitly about bugs which rise to the level of a
tracking+ bug (critical regression or a bug in a new feature). Right now
we're living in a world where release drivers mark a bug tracking+ but
many of these bugs remain assigned to nobody forever. This is a bad
state and the goal is to start sending out daily alerts for tracking+
bugs without an owner.

> Many of our regressions are reported informally on Twitter, as GitHub
> issues, or on IRC. Those are spotted by engineers or QA who go
> looking, and those people file bugs. Those bugs, naturally, enter the
> funnel half-way along, skipping the pre-triage you propose. Can that
> be improved or leveraged?

The goal is that no bug will be able to skip the per-component
decision-making process. Emma will be experimenting with early groups to
figure out how to make this work.

> We have "tracking-+", "P2", etc. bugs that will realistically never be
> addressed, because there's always more important work to do. What can
> we do to not lose sight of those and adjust priority accordingly?

I don't know about P2 in this context, but I don't think that we can
afford to leave tracking+ bugs unassigned or unaddressed. That is an
essential part of our core quality focus. If a team is only working on
tracking+ bugs and is still understaffed, that is something we need to
be aware of and fix.

Gervase Markham wrote:

> Without wanting to pile on, as I know others have concerns about other
> parts of this plan, and without wanting to say it's only you doing this,
> but: can we all please stop using the word "community", as this sentence
> implies, as "people outside the paid 'team' who get to work on things
> which are not important enough for the important people to spend their
> time on"? Community is not the antonym of "team", nor is it the antonym
> of "employee".

This wording was intentional and we thought carefully about it. The
"team" in this case includes both employees and volunteers who are
working together from a shared list of priorities, and is responsible
for a particular area of code and bugzilla component. That team can make
a decision that they, as a group, are not going to prioritize a
particular bug ever, but still accept patches from others outside the
team (employees or volunteers!) who want to fix it.

Marco Bonardo wrote:

> Most of this plan assumes every component has a team that works on it
> (let alone a triage meeting). Well, that's not always the case; some of
> the components have a responsible person who is working on something
> else or is part of another team.
> For example I'm responsible for Places, I watch the component and try
> to keep it in order as far as possible, but there's no team and it's
> not my primary job at the moment. This is an example but it's valid
> for a bunch of other components, especially in Firefox where the teams
> cross various components depending on the goal.

This is both true, and part of the problem. We cannot claim that we have
a focus on quality if we don't even have the resources to look at the
incoming bugs. The goal is to make daily bug triage very fast and painless.

That said, the incoming bug rate for Places is very low: in Q4 2015
there were a total of 30 bugs filed or triaged into Toolkit:Places:
https://bugzilla.mozilla.org/buglist.cgi?list_id=12831181&chfieldto=2015-12-31&query_format=advanced&chfield=[Bug%20creation]&chfieldfrom=2015-09-01&component=Places

Dave Townsend wrote:

> I'm concerned about making this sort of demand when the component
> owner is not an employee, or is the expectation that only employees
> would be in these roles?

It doesn't have to be a single person. Maybe it means a rotation with a
different person each day, or a weeklong rotation. Mike Hoye is working
on a program this quarter to build component triage leads from the
community. In the past, this has been one of our big success points in
developing the community: I have had triage leads for plugins who were
much more knowledgeable than I was about the state of the various trees.
There is also, I think, a natural progression from component triage lead
to a role within product management that we need to explore.

--BDS

Richard Newman

Feb 2, 2016, 12:10:35 PM
to Benjamin Smedberg, Gervase Markham, Dave Townsend, Emma Humphries, Firefox Dev, Marco Bonardo, dev-platform
Here's my very lightweight counter-proposal:

Once or twice a week, automatically mail out two lists (in one email) to
the set of people Emma collected. The first is UNCONFIRMED bugs. The
second is NEW bugs, not filed by one of the reviewers or bug admins of that
component, that haven't been touched in the last week. Highlight bugs from
new Bugzilla registrations. The primary goal is to spot important
regressions. The secondary goal is to respond to new contributors. Let the
component owners own everything else about this process.

In short — shine a light on things, nothing more.
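As a rough illustration of the two lists described above, here is a minimal sketch in Python. It is not an existing tool: the bug record fields (`status`, `reporter`, `last_touched`, `reporter_registered`), the `build_digest` function, and the 30-day cutoff for what counts as a "new" Bugzilla registration are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def build_digest(bugs, trusted_filers, now,
                 stale_after=timedelta(days=7),
                 new_account_age=timedelta(days=30)):
    """Split one component's open bugs into the two lists of the digest.

    bugs: list of dicts with hypothetical keys "id", "status", "reporter",
          "last_touched" and "reporter_registered" (datetimes).
    trusted_filers: reviewers and bug admins of the component, whose NEW
          bugs are left out of the second list.
    """
    # List 1: every UNCONFIRMED bug.
    unconfirmed = [b for b in bugs if b["status"] == "UNCONFIRMED"]

    # List 2: NEW bugs from other filers that nobody has touched in a week.
    stale_new = [
        b for b in bugs
        if b["status"] == "NEW"
        and b["reporter"] not in trusted_filers
        and now - b["last_touched"] >= stale_after
    ]

    # Highlight bugs whose reporter registered recently (assumed cutoff).
    highlighted = {
        b["id"] for b in unconfirmed + stale_new
        if now - b["reporter_registered"] <= new_account_age
    }
    return unconfirmed, stale_new, highlighted
```

The function only selects and flags bugs. What to do with each list stays with the component owners.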

This proposal:

- Doesn't necessarily require extra meetings. The power and memshrink
teams have been doing email triage with some success, and this is even
lighter-weight.
- Isn't a daily obligation.
- Adapts to the processes, individuals, and capacity of each team.
- Allows teams to manage their own costs (e.g., the value versus the
cost of responding to every bug report individually).

Improvements to tools will alter some of those cost-value equations to
yield results that you'd prefer.


Random inline replies below.



>> Many of our regressions are reported informally on Twitter, as GitHub
>> issues, or on IRC. Those are spotted by engineers or QA who go looking, and
>> those people file bugs. Those bugs, naturally, enter the funnel half-way
>> along, skipping the pre-triage you propose. Can that be improved or
>> leveraged?
>>
>
> The goal is that no bug will be able to skip the per-component
> decision-making process. Emma will be experimenting with early groups to
> figure out how to make this work.


Why is that a goal? What you're suggesting is the exact opposite of the
kind of distributed responsibility that we try to inculcate in maturing
engineers. Is this why you think this has to be daily?

If, say, Aaron files a critical bug, I trust him to set the right flags to
go through our current triage processes, or work directly with engineering
managers to find an assignee… and that can be quicker than even a daily
triage, and avoids the need to process that bug. And if I file a NEW
work-item bug, I don't want it to redundantly end up in a triage list the
next day.


>> We have "tracking-+", "P2", etc. bugs that will realistically never be
>> addressed, because there's always more important work to do. What can we do
>> to not lose sight of those and adjust priority accordingly?
>>
>
> I don't know about P2 in this context, but I don't think that we can
> afford to leave tracking+ bugs unassigned or unaddressed. That is an
> essential part of our core quality focus. If a team is only working on
> tracking+ bugs and is still understaffed, that is something we need to be
> aware of and fix.


A lot of this depends on your definition of quality.

For example, we have 391 bugs in Firefox for Android that — at some point
in the last three years — we decided were important (tracking+):

http://mzl.la/1SC9IEg

When considering the long-term success of a project, is it more important
to triage non-nominated incoming bugs daily, or spend some time going
through that list of 391 bugs to see what's slipping through the cracks, or
re-triage existing product priorities? I think that's a fairly deep
philosophical question.


> This is both true, and part of the problem. We cannot claim that we have a
> focus on quality if we don't even have the resources to look at the
> incoming bugs. The goal is to make daily bug triage very fast and painless.
>

I think Marco has a legit point here: even if triage is fast and painless,
if nobody owns the component, then there's nobody to look at the bugs. And
yes, that implies that — with this definition — there are components in
which we don't have a focus on quality!

I think you're more likely to have someone like Marco volunteer to triage
bugs weekly, not daily — particularly if, as you say, there aren't many
bugs filed per quarter.

Emma Humphries

Feb 2, 2016, 12:26:38 PM
to Richard Newman, Gervase Markham, Dave Townsend, Benjamin Smedberg, Firefox Dev, Marco Bonardo, dev-platform
On Tue, Feb 2, 2016 at 9:04 AM, Richard Newman <rne...@mozilla.com> wrote:

> Here's my very lightweight counter-proposal:
>
> Once or twice a week, automatically mail out two lists (in one email) to
> the set of people Emma collected. The first is UNCONFIRMED bugs. The
> second is NEW bugs, not filed by one of the reviewers or bug admins of that
> component, that haven't been touched in the last week. Highlight bugs from
> new Bugzilla registrations. The primary goal is to spot important
> regressions. The secondary goal is to respond to new contributors. Let the
> component owners own everything else about this process.
>


I think what you are describing above is part of a toolkit we'd need to
make during the pilot. If we're asking you to change your process, I think
it's incumbent on us to make some tools to aid in the decisions.

But we still need to change the process such that we can see across all the
components and roll the individual decisions up, and that means we need
consistent decisions.

Boris Zbarsky

Feb 2, 2016, 12:50:37 PM
On 2/2/16 12:04 PM, Richard Newman wrote:
> Once or twice a week

Once a week is not nearly often enough.

As far as I can tell, we have effectively 4.5 weeks or so of beta before
things are locked down for ship. Lopping a week off that (or more
precisely off whatever time is left after the bug is reported, which is
probably not on day 1 of the beta cycle) seriously impacts getting the
bug fixed before ship.

But it gets worse. The typical lifetime of a bug goes like this. It
gets reported to somewhere like Firefox:Untriaged (I think we have a
guided form that automatically dumps things there or something?).
Someone goes through and triages it, moving it from there to some
component. In a large fraction of cases, it's the wrong component,
because making sense of our components is rocket science. But chances
are, it's a more-correct component than "Firefox:Untriaged". Now the
best-case scenario is that someone is triaging that component and
notices the bug. Either it's in the right place and they try to deal
with it (e.g. evaluate its impact) or they move it to a
hopefully-more-correct component (chances are higher of getting it right
now, but still not perfect). Now someone needs to triage _this_
component to notice the bug.

Notice that in the common case, to just get to an evaluation like "hey,
should this bug be nominated for tracking?" it needs to go through at
least two, more likely three, triage cycles. If each of those is a week
long, you will totally fail to sanely handle any bugs reported against beta.
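To put rough numbers on that, here is a small Python sketch of the worst case. The model is an assumption made for illustration: every move between components waits one full triage interval, and the bug is reported a week into beta. Only the 4.5 weeks and the three triage cycles come from the text above.

```python
def days_left_to_fix(beta_days, report_day, triage_hops, cadence_days):
    """Worst-case days left in beta once a bug reaches the right component.

    Assumes each triage hop waits one full cadence interval before
    someone looks at the bug (a deliberately pessimistic model).
    """
    triage_delay = triage_hops * cadence_days
    return beta_days - report_day - triage_delay


BETA_DAYS = 4.5 * 7  # "4.5 weeks or so of beta"

# Hypothetical bug reported one week into beta, needing three triage cycles.
weekly = days_left_to_fix(BETA_DAYS, report_day=7, triage_hops=3, cadence_days=7)
daily = days_left_to_fix(BETA_DAYS, report_day=7, triage_hops=3, cadence_days=1)

print(f"weekly triage leaves {weekly} days, daily triage leaves {daily} days")
```

Under these assumptions weekly triage leaves only a few days to nominate, patch, review and land a fix, while daily triage leaves about three weeks.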

For reference, I've spent a number of years now doing daily triage
(typically morning and evening in my timezone, actually) of various core
components and in many cases it was still a struggle to get bugs
noticed, nominated, approved, assigned, patched, reviewed, landed before
the "oh, we've locked down the release" cutoff. And I'm talking about
clear "we regressed web compat" bugs here, not something fuzzy that made
it not clear that it really shouldn't ship.

This needs to be a daily process, in my opinion. It certainly needs to
be a daily process in the typical dumping ground components. Those
include *:Untriaged, Core:General, and anywhere people might be tempted
to move bugs they don't really understand. For Core that's probably
Layout, DOM, XML, File Handling, Document Navigation, and maybe a few
others. I won't presume to tell you what they are for Toolkit or Firefox.

> - Doesn't necessarily require extra meetings.

Fwiw, I don't think we need meetings here at least for the parts I care
about. I think the gfx triage rotation works reasonably, with no
meetings involved. Not sure whether it happens daily, but it's rare for
gfx to be a dumping ground component.

> - Isn't a daily obligation.

As I said above, that's a serious drawback in many cases. But I can see
how for some particular components maybe the extra lag is OK. Though if
your bug volume is low enough that you're sure there's nothing critical
in it, then I'm not sure why aiming for triaging it daily is a problem.
Then if some days you fail and miss it... no terrible harm done.

> - Allows teams to manage their own costs

Ah, but there are externalities here. Here's an example; not picking on
particular components here, but using some real component names to
illustrate how this would play out in practice. Say a bug gets filed in
Firefox:Untriaged and then is moved to Core:DOM by the initial triage
pass and the DOM folks take their sweet time looking at it because they
don't want to do daily triage and then determine that it's a layout bug
and move it to Core:Layout. Suddenly what happened is that the
_benefit_ (not doing frequent triage) accrued to the DOM team but the
_cost_ (having to scramble to fix a bug with a lot less schedule room
for it) is shouldered by the layout team.

> (e.g., the value versus the
> cost of responding to every bug report individually).

Responding to every bug is a higher bar than just getting it into the
right place so it can be evaluated. I see no problem with silently
moving a misfiled bug into a more appropriate component so it can be
triaged there.

> If, say, Aaron files a critical bug, I trust him to set the right flags to
> go through our current triage processes, or work directly with engineering
> managers to find an assignee… and that can be quicker than even a daily
> triage, and avoids the need to process that bug. And if I file a NEW
> work-item bug, I don't want it to redundantly end up in a triage list the
> next day.

I can point to explicit instances of engineers filing critical bugs and
then not setting the right flags. I've done it, certainly. It happens,
whether through inexperience or forgetfulness or just the pain of
setting those flags in the Bugzilla UI and a distraction happening at
the wrong moment. Ideally, we would catch it when it happens, set the
flags, and if needed (the inexperience case) point out that it needs to
be done by the filing engineer.

I can also point to instances of engineers filing bugs they just didn't
realize were critical, especially if they're not filing in their own
area of expertise.

That said, I can certainly see an argument for not bothering to triage
bugs filed by whoever is in the triage rotation to start with, since
presumably you can trust them to get it right in most cases. Yes, I
know I argued against it above; I don't feel as strongly about this part
of things as I do about triage frequency.

> When considering the long-term success of a project, is it more important
> to triage non-nominated incoming bugs daily, or spend some time going
> through that list of 391 bugs to see what's slipping through the cracks, or
> re-triage existing product priorities? I think that's a fairly deep
> philosophical question.

Sure. In practice, ideally we would do both.

> I think Marco has a legit point here: even if triage is fast and painless,
> if nobody owns the component, then there's nobody to look at the bugs. And
> yes, that implies that — with this definition — there are components in
> which we don't have a focus on quality!

Nobody "owns" Core:General or Core:Untriaged or Firefox:Untriaged... but
they need to be owned for triage purposes.

But more to the point, yes, we do have such components. We need to
split up the job of triaging them somehow, because bugs that are not
actually in those components can end up in them; see above.

-Boris

Justin Dolske

Feb 2, 2016, 1:03:31 PM
to Benjamin Smedberg, Gervase Markham, Dave Townsend, Richard Newman, Emma Humphries, Firefox Dev, Marco Bonardo, dev-pl...@lists.mozilla.org
On Tue, Feb 2, 2016 at 7:44 AM, Benjamin Smedberg <benj...@smedbergs.us>
wrote:


> To keep focus and avoid creeping scope, an explicit non-goal of this
> program is to deal with the prioritization of non-critical bugs within a
> team or component. The primary goal here is to solve the flow for incoming
> bugs to a clear decision-state. Beyond the bucket of "backlog of work",
> teams can continue to use aha or tracking bugs or external spreadsheets.
>

I'm still a little unclear. The first part seems to imply that a minimal
triage process for this program would be to just flag all new bugs as
either "this is a critical issue that must be fixed for the next release"
or "not". But for many bugs, a "clear decision-state" does involve
interplay with how things get prioritized.

It also seems to be saying there should be a backlog of unprioritized work,
which I think experience with the firefox-backlog flag has shown to work
extremely poorly (outside of teams that are using it as a side
effect of closely tracked and prioritized work).

Justin

Anne van Kesteren

Feb 2, 2016, 1:05:11 PM
to Boris Zbarsky, dev-platform
On Tue, Feb 2, 2016 at 6:50 PM, Boris Zbarsky <bzba...@mit.edu> wrote:
> But it gets worse. The typical lifetime of a bug goes like this. It gets
> reported to somewhere like Firefox:Untriaged (I think we have a guided form
> that automatically dumps things there or something?). Someone goes through
> and triages it, moving it from there to some component. In a large fraction
> of cases, it's the wrong component, because making sense of our components
> is rocket science.

This we can improve though. And maybe reduce the number of components
in a couple of cases (I believe I managed to remove two around DOM at
some point, but I never made much of an effort). I forgot exactly
where to file the bugs, but if we think clearer descriptions and
better named components would help triage over the long term, perhaps
it's worth spending a day on.


--
https://annevankesteren.nl/

Gijs Kruitbosch

Feb 2, 2016, 1:22:29 PM
to Anne van Kesteren, Boris Zbarsky
Yes, we can improve it, but as someone who has also done a reasonable
amount of triage of Untriaged (and tried to get some improvements made
to our new bug flow) I don't know that we can eliminate the incredible
gap here.

Let's say that I am a user who's never seen bugzilla before, and I see
black boxes on some page, and I want to report that as a bug.

Where would that go? Graphics? DOM? ImageLib? Network? "Firefox ::
Theme" ? If the page is Facebook, should it go in "Firefox :: Social" ?

Those components are sensible distinctions for an engineer, but not for
someone with comparably little technical background who files a bug
about "Firefox not quite working right".

What's more, while with a bit of experience I can tell you that it's
likely a graphics bug (and could they try safe mode, does that fix it?
What does the 'graphics' section of about:support say? A while back, you
could ask if turning off OMTC helped, etc. etc.), in many cases bugs
aren't as clear-cut as this.

Today I helped triage something one of our employees was seeing in
rackspace's UI. Some part of the website was not displaying the right
text, but internal labels. No errors in the error console. Where does
that go?

Well, you don't really find out until you find the cause of the issue
(shoutout to mozregression!), which in this case seems to have been a
change in our JS parser (block functions) that triggered that page's JS
to be run differently, and now their localization scripts aren't working
correctly anymore. So you ping shu on the bug and ask if there's
anything we can do besides moving this to tech evangelism and contacting
rackspace to fix their site. I don't know - if we break 100 of these
websites, or if the bug is in a common library, maybe we need to back
out the change or turn it on for chrome only or whatever. But I have no
idea because I don't know the code in question.


Sorry, that was quite long. All I'm really saying is: as Boris' original
posts outlined, there are a lot of gaps between "someone files a bug"
and "we stop ourselves shipping brokenness on release". While some of
our components could certainly be clearer, I don't think that's enough
to close the gap between people filing bugs and engineers/relman who
have to decide if/how/when they're getting fixed.

~ Gijs