Short version: we need more people to look specifically at Linux
topcrashers on crash-stats and file bugs about them.
Long version:
I've been worrying that our topcrashers lists are completely dominated
by the volume of Windows crash reports, preventing even our biggest
Linux topcrashers from getting attention.
Today I've looked at our topcrashers lists on Linux, in Firefox 7.0b1,
8.0a2 and 9.0a1. Many of them didn't (according to crash-stats) have a
bug filed about them.
Firefox 7.0b1 crashes:
bug 682621 - JS engine - topcrasher rank 2
bug 682625 - plugins handling - topcrasher rank 5
Firefox 8.0a2 crashes:
bug 682607 - zlib - topcrasher rank 1
bug 682615 - graphics - topcrasher rank 6
bug 682616 - JS engine - topcrasher rank 10
Firefox 9.0a1 crashes:
bug 682593 - video/audio - topcrasher rank 17
bug 682594 - JS engine - topcrasher rank 19
bug 682595 - CSS parser - topcrasher rank 23
bug 682598 - JS engine - topcrasher rank 30
bug 682597 - JS engine - topcrasher rank 33
On a related note: bug 641487 is a crash-stats feature request to
replace the current "top crashers" links with per-OS links, so that more
people realize that they've been looking almost only at Windows
topcrashers so far.
Cheers,
Benoit
This is a concern, yes. But I do wonder: how many of the people testing
on Linux are actually reporting crashes? Last time I looked at the
distributions (where most people get their Firefox), they weren't using
the reporter, nor do we have symbols for those builds. Has that
changed?
--Chris
Yes, we (Ubuntu) push symbols for all of our release and beta builds,
and turn on the crash reporter for those builds too, so our users should
be reporting crashes. We don't do this for all of the nightly and aurora
builds we provide though, just because of the amount of data.
Regards
Chris
This is indeed a big issue. Most of the desktop Linux crash reports we
get are from Ubuntu and Fedora users. So at least these 2 distros are
sending reports. I don't know about the rest.
Benoit
Just to make sure that everyone is aware:
if you are interested in working on platform-specific crashes for Linux
or Mac, it is always easy to get a platform-specific top crash list.
Just bring up a top crash report like these:
https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/6.0/7
https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/9.0a1/7
then click on "Lin" or "Mac" to sort the list by crash volume on that
platform.
I keep hearing that, but I've never actually seen any numbers backing
it. Not that I don't believe it, just that I would actually like to know
the proportions.
> they weren't using
> the reporter, nor do we have symbols for those builds. Has that
> changed?
I know Fedora had requested access to upload symbols a while ago, I
don't know if that happened yet.
Mike
First of all, thanks for looking into those; we surely need to look a
lot more closely into crash issues in general.
We on the CrashKill team can mostly only point people to visible enough
regressions and nag people to get those fixed, as well as talk to devs
about getting high-volume issues fixed. That, and making sure that tooling
(mostly crash-stats) is being improved, takes up almost all our time, the
rest going into some special projects like investigating Flash hangs and
prototyping tools we'd like to have so we can go to the Socorro
(crash-stats) team with a good plan.
The real problem with crashes is that we need to reduce crashiness, but
it nowadays comes down to being very much a long tail problem.
Even though the top 10 crashers of 7.0b1 [1] account for 25% of our
reports, the top spot is "empty signature", which is unhelpful as it can
be all kinds of things (often OOM, though) where the current crash
reporter can't really get the stack (and other info) out of the dead
process, and it needs a redesign of the reporter to get better there
(ted knows more about that). I'd cheer for work to happen there, but so
far, this has been an idea only for quite some time.
The next spots are plugin hangs (or correctly, the browser-side parts of
those hang pairs) - we surely need to get traction on those (users see
them as "crashes" as well, often disturbing what they wanted to do, even
if the main Firefox process stayed alive), but the causes range from
Flash apps (~85% of the hangs are Flash [2]) via the plugin itself to
issues in our code, and we have few people on our side working on this.
Then there is js::mjit::EnterMethodJIT, which sums up all crashes in
JITed code of JaegerMonkey and is hard to investigate, though I've seen
some work on that in the past and would love to see some again.
_moz_pixman_image_create_bits (#8) is the upside of those top 10 as it's
fixed already and absent from the 7.0b2 topcrasher list [3].
js::gc::PushMarkStack is one of the signatures belonging to the JS GC
crash cloud; most of those happen in the code that is actually being
garbage collected but can't be attributed to it correctly any more in
this phase, as I understand it. Bill McCloskey is doing some
instrumentation on trunk to get to the bottom of those, or at least many
of them.
Sorry for wandering off into the general topic of crash analysis a bit,
but I think some insight into that might be helpful for some people here.
Most of our crashes are, like most of our users, on Windows. That is not
just because of the number of users, though, but also because things like
malware and security software hooking binaries into our processes are
more common there and cause their own share of problems.
The first non-Windows signature that shows up on the 7.0b1 list is one of
those where a Mac OS X 10.7 library crashes us, which was worked around
in bug 678607 and is fixed in 7.0b2 as well.
Now, for Linux, we are seeing a very low crash volume there overall (the
#1 topcrash for 7.0b1 Linux [4] has 37 reports while the #1 Windows one
has over 5k and that top Mac one has over 1100 - for 7.0b2 Linux [5]
we're at 4 reports for the topcrashers), and some in the Linux
topcrashers have quite unhelpful signatures like linux-gate.so@0x424 as
we don't have symbols for most system libraries.
Still, let's take a look at the ones you listed and filed:
> Firefox 7.0b1 crashes:
> bug 682621 - JS engine - topcrasher rank 2
3 unique crashes within a week, all others are duplicate reports of
them, probably due to repeatedly crashing as this happens at startup.
Looking beyond a week reveals another one.
> bug 682625 - plugins handling - topcrasher rank 5
The ones on 7.0 smell very much like a single user, given that the install
times match to the second; the one on 7.0a2 and the one on 8.0a1 seem
to be two other users that have seen it.
> Firefox 8.0a2 crashes:
> bug 682607 - zlib - topcrasher rank 1
Actually a cross-version, cross-platform problem.
> bug 682615 - graphics - topcrasher rank 6
Cross-version, Linux and Mac. And have I seen a fix in the bug? ;-)
> bug 682616 - JS engine - topcrasher rank 10
Seems to have been around for a while at low volume. Cycle collector is
almost as nasty as GC, though...
> Firefox 9.0a1 crashes:
> bug 682593 - video/audio - topcrasher rank 17
Seems to also affect Mac and Aurora at its low volume (but it could just
be low because Nightly and Aurora have few users).
> bug 682594 - JS engine - topcrasher rank 19
Also on Mac and Aurora, at low volume.
> bug 682595 - CSS parser - topcrasher rank 23
This is a dupe of your own bug 682615 (see above). ;-)
> bug 682598 - JS engine - topcrasher rank 30
Both Linux and Mac, FF4 and above.
> bug 682597 - JS engine - topcrasher rank 33
This is one of the many heads of the ugly GC hydra, which Bill is
looking into over in bug 670702.
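
The duplicate check applied above to bug 682621 and bug 682625 (many reports, few distinct install times) can be sketched roughly as follows. This is only an illustration: the `(signature, install_time)` record shape and the sample values are assumptions, not the actual crash-stats data format.

```python
from collections import defaultdict


def estimate_unique_installs(reports):
    """Count distinct install times per signature.

    `reports` is an iterable of (signature, install_time) pairs. Both field
    names are assumptions for this sketch, not the crash-stats schema.
    Reports sharing an install time to the second are treated as coming
    from the same installation.
    """
    installs = defaultdict(set)
    for signature, install_time in reports:
        installs[signature].add(install_time)
    return {sig: len(times) for sig, times in installs.items()}


# Made-up sample: four reports for sig_a, but only two install times.
sample = [
    ("sig_a", 1314000000),
    ("sig_a", 1314000000),
    ("sig_a", 1314000000),
    ("sig_a", 1314500000),
    ("sig_b", 1314000000),
]
unique = estimate_unique_installs(sample)
```

A signature whose report count is far above its distinct-install count is likely a few users crashing repeatedly, for example at startup.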
One note for the Linux/Mac signatures - they might actually be around on
Windows as well, but Windows reports the parameter declarations of the
function as well, while Linux/Mac only have the function name itself as
the signature. This might be due to differences in symbol formats, I guess.
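
To compare signatures across platforms despite that difference, one option is to strip the parameter list before matching. The sketch below is a hypothetical helper, not anything crash-stats provides, and the parameter list in the sample Windows-style signature is invented purely for illustration.

```python
def strip_parameters(signature):
    """Drop a trailing parenthesized parameter list, if there is one.

    Walks backwards from the final ')' to its matching '(' so that nested
    parentheses (e.g. function-pointer arguments) are handled.
    """
    if not signature.endswith(")"):
        return signature
    depth = 0
    for i in range(len(signature) - 1, -1, -1):
        ch = signature[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                return signature[:i].rstrip()
    return signature  # unbalanced parentheses: leave untouched


def group_by_function(signatures):
    """Group raw signatures under their parameter-free function name."""
    groups = {}
    for sig in signatures:
        groups.setdefault(strip_parameters(sig), []).append(sig)
    return groups


# The parameter list here is made up; only the function name is from the thread.
windows_style = "js::gc::PushMarkStack(js::GCMarker*, JSObject*)"
unix_style = "js::gc::PushMarkStack"
groups = group_by_function([windows_style, unix_style, "linux-gate.so@0x424"])
```

Both spellings end up under the same key, so per-platform counts for one function can be added together.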
Again, thanks for looking into those. Every crash we fix means fewer
problems for some users out there.
And the queries I cited in [4] and [5] (and similar ones) should also
help you bridge the time until crash-stats provides per-platform
topcrasher reports in a more convenient way.
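
For similar queries on other versions, the URLs in [4] and [5] can be built programmatically. A minimal sketch: the parameter names and the "linux" platform value are copied from those two links, while the function name is hypothetical and any other platform value is an untested assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from links [4] and [5] in this message.
BASE = "https://crash-stats.mozilla.com/query/query"


def platform_query_url(product, version, platform, weeks=4):
    """Build a per-platform crash query URL shaped like [4] and [5]."""
    params = [
        ("product", product),
        ("version", f"{product}:{version}"),  # ':' is encoded as %3A
        ("platform", platform),
        ("range_value", weeks),
        ("range_unit", "weeks"),
        ("do_query", 1),
    ]
    return f"{BASE}?{urlencode(params)}"


url = platform_query_url("Firefox", "7.0b1", "linux")
```

Calling it with "7.0b2" instead reproduces link [5].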
Robert Kaiser
[1] https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/7.0b1
[2]
https://crash-analysis.mozilla.com/rkaiser/firefox.4plus.flashsummary.html
[3] https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/7.0b2
[4]
https://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A7.0b1&platform=linux&range_value=4&range_unit=weeks&do_query=1
[5]
https://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A7.0b2&platform=linux&range_value=4&range_unit=weeks&do_query=1
--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)
That only shows you the ones that have enough crashes to make it into
the overall top 300. In my reply to Benoit, there are two query links
that bring you the full list (but put a bit more strain on the server
as they aren't based on previously aggregated data like the topcrash pages).
Robert Kaiser
It would also be nice to be able to query for Android crash reports.
Benoit
Can we pre-aggregate the top-100 per platform, as well?
Mike
AFAIK, the data exists, it's just not exposed in a report right now.
That's called "Fennec topcrashers". ;-)
https://crash-stats.mozilla.com/topcrasher/byversion/Fennec/6.0 et al.
have that data, but there are a number of problems with the crash reporter
not sending data as useful as on desktop. Naoki from mobile QA knows
more details there.
Ah, OK, sorry -- I thought that you meant that it had to do an ad-hoc
query to get the top-N-per-platform (outside the top-300-for-all),
which was slow. If we already have that data prebuilt, then a report
to expose it would be great.
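
As a rough illustration of what a top-N-per-platform aggregation involves, here is a sketch over in-memory records. The `(platform, signature)` record shape and the sample data are assumptions; this is not how Socorro stores or aggregates its data.

```python
from collections import Counter, defaultdict


def top_crashers_per_platform(reports, n=100):
    """Return the n most frequent signatures for each platform.

    `reports` is an iterable of (platform, signature) pairs; the record
    shape is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for platform, signature in reports:
        counts[platform][signature] += 1
    return {platform: c.most_common(n) for platform, c in counts.items()}


# Made-up sample: the Linux leader has far fewer reports than the Windows one.
reports = (
    [("Windows", "sig_a")] * 5
    + [("Windows", "sig_b")] * 2
    + [("Linux", "sig_c")] * 2
    + [("Linux", "sig_a")]
)
top = top_crashers_per_platform(reports, n=1)
```

Ranking within each platform means a low-volume Linux signature can still come out on top of its own list, even when it would never reach an overall top-300 list.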
Mike
Ah! I didn't know that. I had been churning through CSV files to get that
feature. Please make this more discoverable: since Fennec was more
or less renamed to Firefox Mobile, I no longer think of looking
there.
Benoit
Mobile QA surely does, but it may not be as visible across the board as
it could be. Maybe we need to link it more often. ;-)
BTW, as you did look in there, the CSVs also contain "Fennec" (mobile)
in addition to "Firefox" (desktop) crashes.
There's a bug for the report, but the Socorro team is a bit undermanned.
Still, even the ad-hoc queries are not really slow. It's just that when
they are viewed by a lot of people, I guess they could produce more server
load than anticipated, but I may be overly cautious in saying that.
Thanks for starting this thread. I agree we could be doing more to give
visibility to the Linux crashes. We are looking at revamping CrashKill a bit,
which gives us the opportunity to make some improvements in this area.
We definitely want to get bug 641487 taken care of, but we don't need to
block on this to make progress. We can clearly link to per-OS reports on
the CrashKill home page and add this to the stability report we give during
the Tues meeting. Making sure the bugs are logged and reproducible ones are
indicated as such will also help. Marcia actually has been doing quite a bit
of this, but we have been juggling lots of things, so likely some bugs slipped
through the cracks.
I am also working closely with QA and the Mobile team to organize the Fennec
related crashes so it's easier to identify the important ones and get them
in front of devs.
Cheers,
Sheila
On Mon, Aug 29, 2011 at 11:01 AM, Robert Kaiser <ka...@kairo.at> wrote:
> Mobile QA surely does, but it may not be as visible across the board as it
> could be. May we need to link it more often. ;-)
>
> BTW, as you did look in there, the CSVs also contain "Fennec" (mobile) in
> addition to "Firefox" (desktop) crashes.
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform