Re: Taking Action on Intermittents

Ed Morley

Jun 12, 2017, 11:02:34 AM

to dev-tree-...@lists.mozilla.org, sheriffs, Joel Maher, Emma Humphries

Hi Emma/Joel,

Thank you for starting this discussion, I've moved this to
dev.tree-management and CCed sheriffs.

For people joining, see the comments that initiated the discussion on this
Google Document:
https://docs.google.com/document/d/1ZVE5pQ78H-caMSxHiwUwOJl7vfmTb78hlzclEFHpfDY/edit?disco=AAAABNJkjpM

...and then the thread at the end of this email.

To clarify on a few points:
* Mass-closing inactive intermittent-failure bugs is something that has
happened periodically for several years, and is definitely something worth
doing - but in itself is not a new concept.
* The aspect that is new however, is that one can no longer do a simplistic
"has this bug received comments recently" bulk-search, since as of Sept
2015 the OrangeFactor bot only comments on bugs that have seen more than X
failures per interval, which reduces bug-spam, and was by request of both
BMO and Firefox/gecko devs.
* The thresholds for when the bot makes these comments were set by
analysing the data at the time. See:
https://groups.google.com/d/msg/mozilla.dev.tree-management/az643p0u4hs/3el7fqIDBwAJ
* Anyone is welcome to suggest better thresholds. I see this as something
co-owned between the Stockwell project and the sheriffs. (OrangeFactor
itself doesn't have an owner per se.)
* As Joel mentioned below and I detailed on Google docs, the desired
end-state is one where we don't use Bugzilla as a data store, but that's
not an overnight change, and also one that doesn't yet have resources
allocated.

Therefore to bulk-close bugs short-term, we need to either:
(a) remove the threshold on the weekly bot comment (this would increase bug
spam, but to levels still much lower than those before the summaries were
introduced)
(b) make scripts use the OrangeFactor API to determine whether a failure
has occurred. The complication is that SSO makes this annoying, though
there are already scripts out there that work around this (eg
https://github.com/andymckay/bugzilla-scripts/blob/master/intermittent.py)
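
For option (b), the selection step can be sketched as follows. This is a minimal sketch, assuming the failure dates for each bug have already been downloaded from OrangeFactor by some other means. The function name, the data layout, the bug IDs and the dates are all hypothetical, the 30-day window is the figure Joel suggests in the quoted thread below, and nothing here calls the real OrangeFactor API.

```python
from datetime import date, timedelta


def bugs_without_recent_failures(open_bug_ids, failure_dates_by_bug, today, window_days=30):
    """Return the IDs of open bugs with no recorded failure inside the window.

    failure_dates_by_bug is a hypothetical in-memory mapping of
    bug ID -> list of dates on which a failure was starred against that bug.
    """
    cutoff = today - timedelta(days=window_days)
    stale = []
    for bug_id in open_bug_ids:
        dates = failure_dates_by_bug.get(bug_id, [])
        # A bug with no entry at all is treated as having no failures.
        if not any(d >= cutoff for d in dates):
            stale.append(bug_id)
    return stale


# Made-up sample data, for illustration only.
today = date(2017, 6, 12)
failures = {
    1001: [date(2017, 6, 1), date(2017, 6, 10)],  # still failing
    1002: [date(2017, 3, 2)],                     # last seen months ago
}
open_bugs = [1001, 1002, 1003]
print(bugs_without_recent_failures(open_bugs, failures, today))  # [1002, 1003]
```

Note that a bug missing from the downloaded data is indistinguishable from a bug with zero failures, so a real script would probably want to confirm the download succeeded before closing anything.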

I'm open to doing (a) fwiw. To do that, please file a bug in Tree
Management::OrangeFactor, and whoever makes the change will need to update
WEEKLY_THRESHOLD here:
https://hg.mozilla.org/automation/orangefactor/file/tip/woo_commenter.py#l26
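
To illustrate what lowering the threshold does, here is a small sketch. It is not the actual woo_commenter.py code: the function name and the sample counts are made up, and the `>=` comparison is an assumption based on the "failures >= 5" wording used later in this thread.

```python
def bugs_receiving_weekly_comment(weekly_failures, threshold):
    """Return the bug IDs whose weekly failure count meets the threshold.

    weekly_failures is a hypothetical mapping of bug ID -> failures this week.
    """
    return sorted(
        bug_id for bug_id, count in weekly_failures.items() if count >= threshold
    )


# Made-up weekly counts, for illustration only.
weekly_failures = {2001: 12, 2002: 3, 2003: 1, 2004: 0}

print(bugs_receiving_weekly_comment(weekly_failures, 5))  # [2001]
print(bugs_receiving_weekly_comment(weekly_failures, 1))  # [2001, 2002, 2003]
```

With a threshold of 1, every bug that failed at least once in the week gets a comment, which is what makes a "has this bug received comments recently" search meaningful again.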

Kind regards,

Ed

On 9 June 2017 at 22:50, Emma Humphries <em...@mozilla.com> wrote:

>
> On Fri, Jun 9, 2017 at 2:33 PM, Joel Maher <jma...@mozilla.com> wrote:
>
>> There is no short-term solution that makes it easy to track these outside
>> of Bugzilla. I really don't understand the driver here: how does this help
>> PI/Mozilla?
>>
>
> In general, anything that reduces the number of bugs for which we don't
> know whether we need to take action is a net good.
>
>
>>
>> The sheriffs need a system that is integrated into Treeherder- we need to
>> prioritize that among other work on Treeherder- it is possible, but needs
>> to be considered as part of the big picture.
>>
>
>
>
>> Until this is in place, we need bugs on file for every intermittent- bugs
>> are free, so I don't understand the need, if we can have them with a
>> priority or tag to reduce visibility that would be ok.
>>
>
> We can mark those bugs to make them easier to filter (and we can have
> people filter on the filer) but since every action on a bug generates so
> much noise (which we need to reduce, and to do that will take work in BMO,
> and we need more tools people 😎) it's imperfect.
>
>> A simple solution would be to add a table to treeherder for intermittent
>> failures (auto classification sort of does this, but is undergoing big
>> changes as we have a summer intern working on that code).
>>
>
> Does that have to be in treeherder, can we keep it in another tool? Is
> there a reason that the record of the failure needs to be a bug instead of
> a row in a spreadsheet or a table?
>
> We will also have new tools coming on line, such as Amplitude, where we
> could store this data.
>
>> Bulk close- as I mentioned in the document, we cannot do this until we
>> know that the intermittent is not happening anymore. The only way to do
>> that is to query orange factor for each bug and if there are no failures
>> documented we can close the bug. 30 days is fine, I have found many bugs
>> that have a few instances in the last 30 days but no comments in bugzilla.
>>
>
> Does Orange Factor have an API for this?
>
>
>> I don't know how to do bulk actions on bugzilla, if it is scriptable,
>> then downloading data from orangefactor is very reasonable and you could
>> create a mini in memory data structure while you programmatically iterate
>> through intermittent bugs and mark ones closed that are without any
>> failures for 30 days.
>>
>
> We have an API you can use to close bugs. We can also bulk close bugs
> from within BMO. We can also try making some simple tools for this. I have
> an idea for a prototype we could try and I'll elaborate in another email.
>
>
>> We can modify orange factor to ensure in the once/week report that we add
>> a comment to bugzilla for all instances. After doing that we wait for a
>> few weeks and then use traditional methods for querying bugzilla.
>>
>
> Can that be one comment to cover all occurrences, or does it need to be
> one comment per-occurrence? I'd prefer the former to keep noise down. We
> could also edit the whiteboard tags on a bug to reflect the number of
> occurrences, which would reduce the number of comments we file.
>
> -- Emma
>
>

On 9 June 2017 at 22:33, Joel Maher <jma...@mozilla.com> wrote:

> There is no short-term solution that makes it easy to track these outside
> of Bugzilla. I really don't understand the driver here: how does this help
> PI/Mozilla?
>
> The sheriffs need a system that is integrated into Treeherder- we need to
> prioritize that among other work on Treeherder- it is possible, but needs
> to be considered as part of the big picture. Until this is in place, we
> need bugs on file for every intermittent- bugs are free, so I don't
> understand the need, if we can have them with a priority or tag to reduce
> visibility that would be ok. A simple solution would be to add a table to
> treeherder for intermittent failures (auto classification sort of does
> this, but is undergoing big changes as we have a summer intern working on
> that code).
>
> Bulk close- as I mentioned in the document, we cannot do this until we
> know that the intermittent is not happening anymore. The only way to do
> that is to query orange factor for each bug and if there are no failures
> documented we can close the bug. 30 days is fine, I have found many bugs
> that have a few instances in the last 30 days but no comments in bugzilla.
> I don't know how to do bulk actions on bugzilla, if it is scriptable, then
> downloading data from orangefactor is very reasonable and you could create
> a mini in memory data structure while you programmatically iterate through
> intermittent bugs and mark ones closed that are without any failures for 30
> days.
>
> We can modify orange factor to ensure in the once/week report that we add
> a comment to bugzilla for all instances. After doing that we wait for a
> few weeks and then use traditional methods for querying bugzilla.
>
> -Joel
>
>
>
> On Fri, Jun 9, 2017 at 5:18 PM, Emma Humphries <em...@mozilla.com> wrote:
>
>> Thank you both for your comments on my draft bug-handling proposal.
>>
>> I think that doing something about intermittents would be worth our time.
>>
>> Just looking at this report:
>>
>> https://bugzilla.mozilla.org/report.cgi?x_axis_field=&y_axis_field=reporter&z_axis_field=&query_format=report-table&short_desc_type=allwordssubstr&short_desc=&product=Core&product=Firefox&product=Firefox+for+Android&product=Firefox+for+iOS&product=Toolkit&longdesc_type=allwordssubstr&longdesc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&status_whiteboard_type=allwordssubstr&status_whiteboard=&keywords_type=allwords&keywords=&bug_id=&bug_id_type=anyexact&votes=&votes_type=greaterthaneq&emailassigned_to1=1&emailreporter1=1&emailtype1=exact&email1=&emailassigned_to2=1&emailreporter2=1&emailqa_contact2=1&emailtype2=exact&email2=&emailtype3=substring&email3=&chfield=%5BBug+creation%5D&chfieldvalue=&chfieldfrom=2017-01-01&chfieldto=Now&j_top=AND&f1=noop&o1=noop&v1=&format=table&action=wrap
>>
>> which is filers of bugs by frequency in FFx/Core/Toolkit/Fennec
>>
>> https://screenshots.firefox.com/P2hekHLbq0eD3KDv/bugzilla.mozilla.org
>>
>> https://screenshots.firefox.com/6E00g4yA07Thhwd8/bugzilla.mozilla.org
>>
>> brings this home.
>>
>> There are two pieces to this: filing fewer bugs, and cleaning up the
>> existing ones.
>>
>> From the discussion in the document, it sounded like having a place to
>> keep intermittent failures was the gating factor for filing fewer of these.
>> What's the simplest thing we can do? Can we track these in a spreadsheet?
>>
>> I'd like to do a bulk close of intermittents. RyanVM's proposed this
>> before.
>>
>> Can we use # of comments as a filter for closing bugs created by the
>> intermittent-bug-filer account?
>>
>> I'd like to get to a couple of first steps we can take before SF on this.
>> I think it will be a great win for PI and Mozilla.
>>
>> -- Emma
>>
>

Emma Humphries

Jun 14, 2017, 7:10:10 PM

to Ed Morley, sheriffs, Joel Maher, dev-tree-...@lists.mozilla.org

Thanks Ed,

The first part of this, updating the test failure threshold in
OrangeFactor, has been checked in,
https://bugzilla.mozilla.org/show_bug.cgi?id=1372277.

As soon as that change goes live, intermittent test failure bugs which have
at least one additional failure a week will get a comment.

Then, three weeks after that change goes live, we should be able to use a
query similar to this to identify intermittent bugs which have not
reappeared.

https://bugzilla.mozilla.org/buglist.cgi?email1=intermittent-bug-filer%40mozilla.bugs&emailreporter1=1&emailtype1=exact&f1=longdescs.count&f2=blocked&f3=flagtypes.name&list_id=13633712&n1=1&o1=changedafter&o2=isempty&o3=notequals&query_format=advanced&resolution=---&v1=-3w&v3=needinfo%3F&order=bug_id&limit=0

I'm excluding bugs which block other bugs, or have open needinfos.
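
For readers who find the encoded URL hard to parse, the same query can be rebuilt from its parameters. This is only a sketch: the parameter names and values are copied from the URL above, the comments give my reading of what each clause does (in particular, that `n1=1` negates the first clause), and `list_id` is left out on the assumption that it only identifies a saved result list.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BUGLIST_URL = "https://bugzilla.mozilla.org/buglist.cgi"

params = [
    # Bugs reported by the intermittent bug filer account.
    ("email1", "intermittent-bug-filer@mozilla.bugs"),
    ("emailreporter1", "1"),
    ("emailtype1", "exact"),
    # Clause 1, negated by n1: comment count NOT changed in the last 3 weeks.
    ("f1", "longdescs.count"),
    ("o1", "changedafter"),
    ("v1", "-3w"),
    ("n1", "1"),
    # Clause 2: the bug does not block any other bug.
    ("f2", "blocked"),
    ("o2", "isempty"),
    # Clause 3: no open needinfo flag.
    ("f3", "flagtypes.name"),
    ("o3", "notequals"),
    ("v3", "needinfo?"),
    # Only unresolved bugs, ordered by ID, with no result limit.
    ("resolution", "---"),
    ("query_format", "advanced"),
    ("order", "bug_id"),
    ("limit", "0"),
]

query_url = BUGLIST_URL + "?" + urlencode(params)
print(query_url)
```

Keeping the clauses as data like this makes it easier to add or drop a condition (for example, the reporter restriction) without hand-editing the URL.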

Questions:

1. Do we all agree that NO FAILURES IN THREE WEEKS is a sufficient criterion
for closing Oranges?

2. Is the query above correct? (It won't be at the moment, since we're only
commenting if failures >= 5 for now, but that is changing to > 0.)

3. When we close an Orange, it will be closed with status: RESOLVED and
resolution: INCOMPLETE? Is WORKSFORME a better choice?

4. Who will run the query and clean up the bugs?

5. Will the clean up be automated? Who will write and maintain the code for
that?

I'd appreciate your comments on this by Friday the 16th so I can present
this to a larger audience once we're agreed.

-- Emma

Ed Morley

Jun 14, 2017, 7:22:52 PM

to Emma Humphries, sheriffs, Joel Maher, dev-tree-...@lists.mozilla.org

On 15 June 2017 at 00:09, Emma Humphries <em...@mozilla.com> wrote:

> The first part of this, updating the test failure threshold in
> OrangeFactor, has been checked in,
> https://bugzilla.mozilla.org/show_bug.cgi?id=1372277.
>
> As soon as that change goes live
>

The change is already live - it was deployed earlier today:
https://bugzilla.mozilla.org/show_bug.cgi?id=1372277#c4

Since the threshold in question is for the weekly summary, it won't be seen
to take effect until Sunday midnight UTC+0 when the next weekly summary
cron runs.

> 2. Is the query above correct? (It won't be at the moment, since we're only
> commenting if failures >= 5 for now, but changing to > 0)
>

That query will likely miss some cases, since the older bugs will have been
filed by humans rather than by the bug filing tool. Though if that part of
the query is removed, additional care will need to be taken to ensure that
non-firefox/gecko bugs aren't closed by accident (eg bugs for other
products that don't use Treeherder, that have used the
"intermittent-failure" keyword when they shouldn't have).

The query will also need to take into account:
* bugs marked with keyword "leave-open"
* (possibly) bugs marked with whiteboard "leave-open" (legacy style and
people still forget to use the keyword)
* (possibly) bugs with "test disabled" (or similar variants) in their
whiteboard (though arguably these should also use the "leave-open" keyword,
and many correctly do so)
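
The exclusions listed above could also be applied client-side, after fetching the candidate bugs. A minimal sketch, assuming each bug is a dict with a `keywords` list and a `whiteboard` string (the function name, the regular expression for "similar variants" and the sample bugs are all hypothetical):

```python
import re

# Matches "test disabled", "test-disabled", "testdisabled" in any letter case.
TEST_DISABLED_RE = re.compile(r"test[\s-]*disabled", re.IGNORECASE)


def should_skip_bulk_close(bug):
    """Return True if the bug carries a marker that should protect it from bulk closing."""
    keywords = {k.lower() for k in bug.get("keywords", [])}
    whiteboard = bug.get("whiteboard") or ""
    if "leave-open" in keywords:
        return True
    # Legacy style: leave-open written in the whiteboard instead of as a keyword.
    if "leave-open" in whiteboard.lower():
        return True
    if TEST_DISABLED_RE.search(whiteboard):
        return True
    return False


# Made-up sample bugs, for illustration only.
samples = [
    {"keywords": ["intermittent-failure"], "whiteboard": ""},
    {"keywords": ["intermittent-failure", "leave-open"], "whiteboard": ""},
    {"keywords": ["intermittent-failure"], "whiteboard": "[leave-open]"},
    {"keywords": ["intermittent-failure"], "whiteboard": "[test disabled on Linux]"},
]
print([should_skip_bulk_close(b) for b in samples])  # [False, True, True, True]
```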

> 3. When we close an Orange, it will be closed with status: RESOLVED and
> resolution: INCOMPLETE? Is WORKSFORME a better choice?
>

In the past people have used either, though I believe the latter was the
more common of the two.

Hope that helps,

Ed

L. David Baron

Jun 14, 2017, 8:31:27 PM

to Emma Humphries, sheriffs, Joel Maher, dev-tree-...@lists.mozilla.org

On Wednesday 2017-06-14 16:09 -0700, Emma Humphries wrote:
> 1. Do we all agree that NO FAILURES IN THREE WEEKS is a sufficient criterion
> for closing Oranges?

I think there are some categories of intermittent failures where
that's not sufficient, and it would probably be good to have a way
to mark bugs as rare intermittent oranges that should be left open.

For example:

* failures that happen only around summer time changes or around
the new year

* failures that only happen if the execution of a single test
starts and ends in different days (yes, we've had this, and it
ends up being pretty infrequent if the test is fast)

* failures that are actually understood, but are just rare (e.g.,
bug 1285461, which is an extremely rare intermittent whose cause
I figured out while debugging a different intermittent, and which
doesn't seem a high priority but is still a valid bug in our
code -- and it might be a code bug rather than a test bug).

In some cases, we have bugs on these things that contain debugging
and useful diagnostic information, and closing the bugs and later
filing another one risks losing that information.
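
As a rough illustration of the second example above (a test that only fails when its run spans midnight), the following back-of-the-envelope sketch estimates how likely it is that such a bug shows no failures at all in a three-week window. It assumes test start times are spread uniformly over the day and that runs are independent; the 2-second duration and 500 runs per day are made-up numbers, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def chance_of_no_midnight_crossing(test_seconds, runs_per_day, days):
    """Probability that no run in the period starts and ends on different days."""
    # A run crosses midnight if it starts within test_seconds of the day's end.
    p_cross = test_seconds / SECONDS_PER_DAY
    total_runs = runs_per_day * days
    return (1 - p_cross) ** total_runs


quiet = chance_of_no_midnight_crossing(test_seconds=2, runs_per_day=500, days=21)
print(f"{quiet:.2f}")  # 0.78
```

Under these assumptions there is roughly a 78% chance that a real, unfixed bug of this kind would look inactive for three weeks.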

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla https://www.mozilla.org/ 𝄂
Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)

Joel Maher

Jun 14, 2017, 8:45:35 PM

to L. David Baron, sheriffs, dev-tree-management

Thanks for raising these specific examples :dbaron. I think if they all
have 'leave-open' specified, we should be able to avoid them.

All, If there are other keywords/whiteboards we should look for or consider
using, please share those ideas on this thread.

Thanks,
Joel

Emma Humphries

Jun 14, 2017, 8:50:48 PM

to L. David Baron, sheriffs, Joel Maher, dev-tree-...@lists.mozilla.org

On Wed, Jun 14, 2017 at 5:30 PM, L. David Baron <dba...@dbaron.org> wrote:

> I think there are some categories of intermittent failures where
> that's not sufficient, and it would probably be good to have a way
> to mark bugs as rare intermittent oranges that should be left open.
>

> For example:
>
> * failures that happen only around summer time changes or around
> the new year
>
> * failures that only happen if the execution of a single test
> starts and ends in different days (yes, we've had this, and it
> ends up being pretty infrequent if the test is fast)
>

Ed pointed out above that we have the leave-open keyword and whiteboard
tag (legacy) as well as the test-disabled keyword and tag.

The query I use to find bugs to close can filter for those, but the onus
will be on contributors to mark those bugs with those keywords because I
don't have a way to filter on those edge cases.

I'll also want to review the bugs marked that way to make sure we're not
just holding bugs open out of fear.

> * failures that are actually understood, but are just rare (e.g.,
> bug 1285461, which is an extremely rare intermittent whose cause
> I figured out while debugging a different intermittent, and which
> doesn't seem a high priority but is still a valid bug in our
> code -- and it might be a code bug rather than a test bug).
>

That bug, 1285461, is marked P5, indicating staff won't work on it, but a
patch will be considered. Was that your intention for this bug?

There were a number of bugs we marked as P3's last year, and we should
close these if the test is not failing and they have not been marked as
leave-open or test-disabled.

> In some cases, we have bugs on these things that contain debugging
> and useful diagnostic information, and closing the bugs and later
> filing another one risks losing that information.
>

Closing a bug is non-destructive, and closed bugs can be reopened.

-- Emma

Emma Humphries

Jun 14, 2017, 8:54:48 PM

to Joel Maher, sheriffs, L. David Baron, dev-tree-management

What about the [stockwell disabled] whiteboard tag?

-- Emma

On Wed, Jun 14, 2017 at 5:45 PM, Joel Maher <jma...@mozilla.com> wrote:

> Thanks for raising these specific examples :dbaron. I think if they all
> have 'leave-open' specified, we should be able to avoid them.
>
> All, If there are other keywords/whiteboards we should look for or
> consider using, please share those ideas on this thread.
>
> Thanks,
> Joel
>
>
> On Wed, Jun 14, 2017 at 8:30 PM, L. David Baron <dba...@dbaron.org> wrote:
>
>> On Wednesday 2017-06-14 16:09 -0700, Emma Humphries wrote:
>> > 1. Do we all agree that NO FAILURES IN THREE WEEKS is a sufficient
>> > criterion for closing Oranges?
>>

>> I think there are some categories of intermittent failures where
>> that's not sufficient, and it would probably be good to have a way
>> to mark bugs as rare intermittent oranges that should be left open.
>>
>> For example:
>>
>> * failures that happen only around summer time changes or around
>> the new year
>>
>> * failures that only happen if the execution of a single test
>> starts and ends in different days (yes, we've had this, and it
>> ends up being pretty infrequent if the test is fast)
>>

>> * failures that are actually understood, but are just rare (e.g.,
>> bug 1285461, which is an extremely rare intermittent whose cause
>> I figured out while debugging a different intermittent, and which
>> doesn't seem a high priority but is still a valid bug in our
>> code -- and it might be a code bug rather than a test bug).
>>

>> In some cases, we have bugs on these things that contain debugging
>> and useful diagnostic information, and closing the bugs and later
>> filing another one risks losing that information.
>>

L. David Baron

Jun 14, 2017, 9:00:46 PM

to Emma Humphries, sheriffs, Joel Maher, dev-tree-management

On Wednesday 2017-06-14 17:53 -0700, Emma Humphries wrote:
> What about the [stockwell disabled] whiteboard tag?

Er, yes... and in general, bugs that we worked around by disabling
the test should be left open unless a separate bug was filed to
reenable it.

> On Wed, Jun 14, 2017 at 5:45 PM, Joel Maher <jma...@mozilla.com> wrote:
> > Thanks for raising these specific examples :dbaron. I think if they all
> > have 'leave-open' specified, we should be able to avoid them.

'leave-open' has a different meaning; it means that the bug should
be left open despite a changeset that references it being committed.
I think this probably deserves a different keyword.

However, it seems likely that 'leave-open' bugs are a sign of bugs
where a test-disabling patch was landed, and should probably be
included here as well.

-David

Joel Maher

Jun 14, 2017, 9:07:23 PM

to Emma Humphries, sheriffs, L. David Baron, dev-tree-management

That tag is used for tracking tests we have disabled to fix or reduce a
problem. If the bug also has leave-open we leave it open; otherwise feel
free to ignore the tag.

On Wed, Jun 14, 2017 at 8:53 PM, Emma Humphries <em...@mozilla.com> wrote:

> What about the [stockwell disabled] whiteboard tag?
>

> -- Emma

>
> On Wed, Jun 14, 2017 at 5:45 PM, Joel Maher <jma...@mozilla.com> wrote:
>
>> Thanks for raising these specific examples :dbaron. I think if they all
>> have 'leave-open' specified, we should be able to avoid them.
>>

Emma Humphries

Jun 14, 2017, 9:12:50 PM

to Joel Maher, sheriffs, L. David Baron, dev-tree-management

To see if I'm understanding this correctly:

* filter on leave-open, but we need a new keyword for tests
* filter on test-disabled
* don't filter on [stockwell disabled]

Joel, if you would suggest a new keyword to use for test failures so I'm
not overloading 'leave-open', I can create that keyword.

Joel Maher

Jun 15, 2017, 4:58:47 AM

to Emma Humphries, sheriffs, L. David Baron, dev-tree-management

I cannot think of a better keyword than leave-open.