Hi Emma/Joel,
Thank you for starting this discussion. I've moved it to
dev.tree-management and CCed the sheriffs.
For people joining, see the comments that initiated the discussion on this
Google Document:
https://docs.google.com/document/d/1ZVE5pQ78H-caMSxHiwUwOJl7vfmTb78hlzclEFHpfDY/edit?disco=AAAABNJkjpM
...and then the thread at the end of this email.
To clarify on a few points:
* Mass-closing inactive intermittent-failure bugs has happened periodically
for several years. It is definitely worth doing, but it is not a new
concept in itself.
* The aspect that is new, however, is that one can no longer do a simplistic
"has this bug received comments recently" bulk search: as of Sept 2015 the
OrangeFactor bot only comments on bugs that have seen more than X failures
per interval. This reduces bug spam, and was requested by both BMO and
Firefox/Gecko devs.
* The thresholds for when the bot makes these comments were set by
analysing the data at the time. See:
https://groups.google.com/d/msg/mozilla.dev.tree-management/az643p0u4hs/3el7fqIDBwAJ
* Anyone is welcome to suggest better thresholds. I see this as something
co-owned between the Stockwell project and the sheriffs. (OrangeFactor
itself doesn't have an owner per se.)
* As Joel mentioned below and I detailed on Google docs, the desired
end-state is one where we don't use Bugzilla as a data store, but that's
not an overnight change, and also one that doesn't yet have resources
allocated.
Therefore, to bulk-close bugs in the short term, we need to either:
(a) remove the threshold on the weekly bot comment (this would increase bug
spam, but to levels still much lower than those before the summaries were
introduced), or
(b) make scripts use the OrangeFactor API to determine whether a failure
has occurred. The complication is that SSO makes this annoying, though
there are already scripts out there that work around this (e.g.
https://github.com/andymckay/bugzilla-scripts/blob/master/intermittent.py)
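As a rough illustration of what a script for (b) would need to decide, here
is a minimal sketch of the selection step only. It does not talk to
OrangeFactor or Bugzilla. The function name, the (bug_id, date) record shape
and the 30-day default are assumptions for illustration, and a bug with no
failure record at all is treated as quiet, which is only safe if the
downloaded data covers the whole window.

```python
from datetime import date, timedelta


def bugs_safe_to_close(open_bug_ids, failures, today, quiet_days=30):
    """Return open intermittent bugs with no failure in the last quiet_days.

    failures is an iterable of (bug_id, failure_date) pairs. This shape is
    hypothetical; real OrangeFactor data would need converting first.
    """
    cutoff = today - timedelta(days=quiet_days)

    # Small in-memory index: most recent failure date seen for each bug.
    last_seen = {}
    for bug_id, failed_on in failures:
        if bug_id not in last_seen or failed_on > last_seen[bug_id]:
            last_seen[bug_id] = failed_on

    # A bug qualifies if it has no recorded failure at all, or if its most
    # recent failure is older than the cutoff date.
    return sorted(
        bug_id
        for bug_id in open_bug_ids
        if bug_id not in last_seen or last_seen[bug_id] < cutoff
    )
```

The returned list could then be handed to whatever does the actual closing
(a bulk change in BMO or a script using the Bugzilla API).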
I'm open to doing (a) fwiw. To do that, please file a bug in Tree
Management::OrangeFactor, and whoever makes the change will need to update
WEEKLY_THRESHOLD here:
https://hg.mozilla.org/automation/orangefactor/file/tip/woo_commenter.py#l26
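For context on what changing the threshold does, here is a rough sketch of
the gating described earlier in this email. It is not the code in
woo_commenter.py: the function name and the example counts are made up for
illustration, and only the idea (comment when a bug has seen more than the
threshold number of failures in the week) comes from this thread.

```python
def bugs_to_comment_on(weekly_failures, threshold):
    """Return the bug ids that would receive a weekly summary comment.

    weekly_failures maps bug id -> number of failures recorded this week.
    Both the mapping shape and this function are hypothetical.
    """
    return sorted(
        bug_id
        for bug_id, count in weekly_failures.items()
        if count > threshold
    )


# Illustrative counts only, not real OrangeFactor data.
example_counts = {1001: 12, 1002: 3, 1003: 1, 1004: 0}
```

With a threshold of 5 only bug 1001 would get a comment, so 1002 and 1003
look inactive to a comment-based Bugzilla search even though they are still
failing. With the threshold at 0, every bug with at least one failure gets a
comment, which is what option (a) amounts to.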
Kind regards,
Ed
On 9 June 2017 at 22:50, Emma Humphries <em...@mozilla.com> wrote:
>
> On Fri, Jun 9, 2017 at 2:33 PM, Joel Maher <jma...@mozilla.com> wrote:
>
>> There is no short term solution that is easy to track these outside of
>> bugzilla. I really don't understand the driver here, how does this help
>> PI/mozilla ?
>>
>
> In general, anything that reduces the number of bugs for which we don't
> know whether we need to take action is a net good.
>
>
>>
>> The sheriffs need a system that is integrated into Treeherder- we need to
>> prioritize that among other work on Treeherder- it is possible, but needs
>> to be considered as part of the big picture.
>>
>
>
>
>> Until this is in place, we need bugs on file for every intermittent- bugs
>> are free, so I don't understand the need, if we can have them with a
>> priority or tag to reduce visibility that would be ok.
>>
>
> We can mark those bugs to make them easier to filter (and we can have
> people filter on the filer) but since every action on a bug generates so
> much noise (which we need to reduce, and to do that will take work in BMO,
> and we need more tools people 😎) it's imperfect.
>
>> A simple solution would be to add a table to treeherder for intermittent
>> failures (auto classification sort of does this, but is undergoing big
>> changes as we have a summer intern working on that code).
>>
>
> Does that have to be in treeherder, can we keep it in another tool? Is
> there a reason that the record of the failure needs to be a bug instead of
> a row in a spreadsheet or a table?
>
> We will also have new tools coming on line, such as Amplitude, where we
> could store this data.
>
>> Bulk close- as I mentioned in the document, we cannot do this until we
>> know that the intermittent is not happening anymore. The only way to do
>> that is to query orange factor for each bug and if there are no failures
>> documented we can close the bug. 30 days is fine, I have found many bugs
>> that have a few instances in the last 30 days but no comments in bugzilla.
>>
>
> Does Orange Factor have an API for this?
>
>
>> I don't know how to do bulk actions on bugzilla, if it is scriptable,
>> then downloading data from orangefactor is very reasonable and you could
>> create a mini in-memory data structure while you programmatically iterate
>> through intermittent bugs and mark ones closed that are without any
>> failures for 30 days.
>>
>
> We have an API you can use to close bugs. We can also bulk close bugs
> from within BMO. We can also try making some simple tools for this. I have
> an idea for a prototype we could try and I'll elaborate in another email.
>
>
>> We can modify orange factor to ensure in the once/week report that we add
>> a comment to bugzilla for all instances. After doing that we wait for a
>> few weeks and then use traditional methods for querying bugzilla.
>>
>
> Can that be one comment to cover all occurrences, or does it need to be
> one comment per-occurrence? I'd prefer the former to keep noise down. We
> could also edit the whiteboard tags on a bug to reflect the number of
> occurrences, which would reduce the number of comments we file.
>
> -- Emma
>
>
On 9 June 2017 at 22:33, Joel Maher <jma...@mozilla.com> wrote:
> There is no short term solution that is easy to track these outside of
> bugzilla. I really don't understand the driver here, how does this help
> PI/mozilla ?
>
> The sheriffs need a system that is integrated into Treeherder- we need to
> prioritize that among other work on Treeherder- it is possible, but needs
> to be considered as part of the big picture. Until this is in place, we
> need bugs on file for every intermittent- bugs are free, so I don't
> understand the need, if we can have them with a priority or tag to reduce
> visibility that would be ok. A simple solution would be to add a table to
> treeherder for intermittent failures (auto classification sort of does
> this, but is undergoing big changes as we have a summer intern working on
> that code).
>
> Bulk close- as I mentioned in the document, we cannot do this until we
> know that the intermittent is not happening anymore. The only way to do
> that is to query orange factor for each bug and if there are no failures
> documented we can close the bug. 30 days is fine, I have found many bugs
> that have a few instances in the last 30 days but no comments in bugzilla.
> I don't know how to do bulk actions on bugzilla, if it is scriptable, then
> downloading data from orangefactor is very reasonable and you could create
> a mini in-memory data structure while you programmatically iterate through
> intermittent bugs and mark ones closed that are without any failures for 30
> days.
>
> We can modify orange factor to ensure in the once/week report that we add
> a comment to bugzilla for all instances. After doing that we wait for a
> few weeks and then use traditional methods for querying bugzilla.
>
> -Joel
>
>
>
> On Fri, Jun 9, 2017 at 5:18 PM, Emma Humphries <em...@mozilla.com> wrote:
>
>> Thank you both for your comments on my draft bug-handling proposal.
>>
>> I think that doing something about intermittents would be worth our time.
>>
>> Just looking at this report:
>>
>>
>> https://bugzilla.mozilla.org/report.cgi?x_axis_field=&y_axis_field=reporter&z_axis_field=&query_format=report-table&short_desc_type=allwordssubstr&short_desc=&product=Core&product=Firefox&product=Firefox+for+Android&product=Firefox+for+iOS&product=Toolkit&longdesc_type=allwordssubstr&longdesc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&status_whiteboard_type=allwordssubstr&status_whiteboard=&keywords_type=allwords&keywords=&bug_id=&bug_id_type=anyexact&votes=&votes_type=greaterthaneq&emailassigned_to1=1&emailreporter1=1&emailtype1=exact&email1=&emailassigned_to2=1&emailreporter2=1&emailqa_contact2=1&emailtype2=exact&email2=&emailtype3=substring&email3=&chfield=%5BBug+creation%5D&chfieldvalue=&chfieldfrom=2017-01-01&chfieldto=Now&j_top=AND&f1=noop&o1=noop&v1=&format=table&action=wrap
>>
>> which lists bug filers by frequency in FFx/Core/Toolkit/Fennec
>>
>>
>> https://screenshots.firefox.com/P2hekHLbq0eD3KDv/bugzilla.mozilla.org
>>
>>
>> https://screenshots.firefox.com/6E00g4yA07Thhwd8/bugzilla.mozilla.org
>>
>> brings this home.
>>
>> There are two pieces to this: filing fewer bugs, and cleaning up the
>> existing ones.
>>
>> From the discussion in the document, it sounded like having a place to
>> keep intermittent failures was the gating factor for filing fewer of these.
>> What's the simplest thing we can do? Can we track these in a spreadsheet?
>>
>> I'd like to do a bulk close of intermittents. RyanVM's proposed this
>> before.
>>
>> Can we use # of comments as a filter for closing bugs created by the
>> intermittent-bug-filer account?
>>
>> I'd like to get to a couple of first steps we can take before SF on this.
>> I think it will be a great win for PI and Mozilla.
>>
>> -- Emma
>>
>