A while ago, bug 749010 was filed about a perma-orange on
our Linux64 mochitest-chrome suite caused by a crash. This bug only
affected Firefox 13 which was in beta back then, and it was marked as
tracking-firefox13+ on 2012-04-30. Then people discussed whether to
back out bug 736028 which was apparently what triggered it, and that
didn't get approved, and bug 749010 was marked as
status-firefox13:wontfix (which was the wrong call IMO, more on that
below) on 2012-05-03, to which people objected and set the status back
to "affected" on 2012-05-07. Then more discussion and debugging happened
on the bug, Nick posted a patch and Ben had a few nits about it, and then
nothing further happened on the bug. We released Firefox 13 on 2012-06-05
with bug 749010 still outstanding (and indeed, if you look at
https://tbpl.mozilla.org/?tree=Mozilla-Release, you'll see that the
latest test runs on Linux64 PGO did hit the crash).
Now, we ship software with known crashes and bugs all the time, and
there is analysis in this bug saying that the crash did not affect
anybody in the wild, which I trust, and I think the right call was
probably made in not getting bug 736028 backed out, especially since we
did have a fix for the crash at hand. But still, the fact remains that
this bug was a persistent crash inside our test suite, which means that
we had no test coverage for anything that runs after the crashing
test. So we shipped a release without getting full test coverage on one
of our tier 1 platforms. This is the reason that I think marking bug
749010 as "wontfix" was the wrong call.
What bothers me a lot more though is how we went through a release with
a tracking+ status:affected bug and nobody noticed it. This query shows
all such bugs for Firefox 13 <http://bit.ly/Oqh75w> (13 such bugs in
total), and the same query for Firefox 14 <http://bit.ly/NfhKRN>
shows 28 bugs.
My impression had always been that we make sure to either fix things for
a release, or wontfix them, so I find it very strange that we have
shipped a release with known tracked items without having a final call
on whether we need to fix or wontfix them. I'm not 100% sure what the
process for making this call is, but as a developer I always relied on
the assumption that marking something as tracking+ will make sure that
it doesn't fall through the cracks, and seeing that this doesn't happen
in practice makes me very nervous.
For example, I think in the case of bug 749010, at the very least we should
have disabled that test, so that we would only lose coverage on a single
test.
If bugs falling through the cracks like this is not expected, we
should think about a way to fix it.
Cheers,
Ehsan