Common crashes due to MOZ_CRASH and MOZ_RELEASE_ASSERT

Nicholas Nethercote

May 31, 2016, 2:28:45 AM

to dev-platform

Hi,

Here is a crash-stats search that shows all the crash reports in the
past 7 days that have a "MozCrashReason" field -- which means they
were triggered by MOZ_CRASH or MOZ_RELEASE_ASSERT -- faceted (i.e.
aggregated) by that field:

https://crash-stats.mozilla.com/search/?product=Firefox&_facets=moz_crash_reason&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=moz_crash_reason#facet-moz_crash_reason

I've included the output below. (Apologies if it gets munged when the
email is processed; just click on the link above to see the live
list.)

#1 by a long way is shutdown hangs. No great surprise.

#2 is unannotated MOZ_CRASH() calls, i.e. there is no string argument
given. These are mostly OOMs, though there are a few others in there.
These ones should be annotated so they show up separately.

From #3 down we have smaller numbers, but many of them are still
non-trivial, and a lot of them are probably indicative of problems
that are very easy to fix if the right person sees them. Please take a
look through the list to see if any of them are familiar to you.

(If you're wondering why I made this search... I've found that many
crash reports lack enough data to be actionable -- especially those
involving crashes caused by bad memory accesses. So it's worth
focusing to some degree on the crash reports that are more likely to
be actionable, and places where we deliberately abort are an obvious
case.)

Nick

Rank MozCrashReason Count %
1 MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) 129715 9.92 %
2 MOZ_CRASH() 25987 1.99 %
3 MOZ_CRASH(GFX: Unable to get a working D3D9 Compositor) 2104 0.16 %
4 MOZ_CRASH(Unexpected error during FakeBlack creation.) 1679 0.13 %
5 MOZ_CRASH(IPC FatalError in the parent process!) 783 0.06 %
6 MOZ_RELEASE_ASSERT(pi->mInternalRefs < pi->mRefCount) (Cycle collector found more references to an object than its refcount) 509 0.04 %
7 MOZ_RELEASE_ASSERT(!mDoingStableStates) 466 0.04 %
8 MOZ_CRASH(Bogus tree op) 459 0.04 %
9 MOZ_RELEASE_ASSERT(sAliveDisplayItemDatas && sAliveDisplayItemDatas->Contains(aData)) 263 0.02 %
10 MOZ_CRASH(Using observer service off the main thread!) 223 0.02 %
11 MOZ_RELEASE_ASSERT(!mSkipRequest.Exists()) (called mid-skipping) 222 0.02 %
12 MOZ_CRASH(GFX_CRASH) 215 0.02 %
13 MOZ_RELEASE_ASSERT(NS_IsMainThread()) 131 0.01 %
14 MOZ_RELEASE_ASSERT(aMsg.priority() == IPC::Message::PRIORITY_NORMAL) 120 0.01 %
15 MOZ_RELEASE_ASSERT(aRefCount != 0) (CCed refcounted object has zero refcount) 113 0.01 %
16 MOZ_RELEASE_ASSERT(!mAudio.HasPromise()) (No duplicate sample requests) 110 0.01 %
17 MOZ_RELEASE_ASSERT(ok) 105 0.01 %
18 MOZ_CRASH(GFX: D3D11 timeout) 99 0.01 %
19 MOZ_CRASH(invalid process aSelector) 73 0.01 %
20 MOZ_CRASH(We lost the following char message) 58 0.00 %
21 MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.) 53 0.00 %
22 MOZ_CRASH(TODO: sourcebuffer was deleted from under us) 52 0.00 %
23 MOZ_RELEASE_ASSERT(prio == IPC::Message::PRIORITY_NORMAL || NS_IsMainThread()) 51 0.00 %
24 MOZ_CRASH(GFX: Failed to update reference draw target after device reset) 45 0.00 %
25 MOZ_RELEASE_ASSERT(isSystem) 42 0.00 %
26 MOZ_CRASH(Crash creating texture. See bug 1221348.) 41 0.00 %
27 MOZ_CRASH(sandbox_init() failed) 41 0.00 %
28 MOZ_CRASH(Unable to get a working D3D9 Compositor) 37 0.00 %
29 MOZ_CRASH(GFX: Invalid D3D11 content device) 33 0.00 %
30 MOZ_CRASH(Initial length is too large) 30 0.00 %
31 MOZ_CRASH(Could not start cubeb stream for MSG.) 27 0.00 %
32 MOZ_CRASH(IPC message size is too large) 25 0.00 %
33 MOZ_RELEASE_ASSERT(mWorkerLoopID == MessageLoop::current()->id()) (not on worker thread!) 25 0.00 %
34 MOZ_RELEASE_ASSERT(!mInWriteTransaction) 24 0.00 %
35 MOZ_CRASH(NativeKey tries to dispatch a key event on destroyed widget) 23 0.00 %
36 MOZ_RELEASE_ASSERT(((bool)(!!(!NS_FAILED_impl(rv)))) && thread) (Should successfully create image decoding threads) 23 0.00 %
37 MOZ_RELEASE_ASSERT(aInAndOutListener) (can not perform CORS checks without a listener) 23 0.00 %
38 MOZ_CRASH(Unknown unit type?) 18 0.00 %
39 MOZ_RELEASE_ASSERT(!mVideo.mDecodingRequested) (Reset must have been called) 17 0.00 %
40 MOZ_RELEASE_ASSERT(msg->size() < IPC::Channel::kMaximumMessageSize) 17 0.00 %
41 MOZ_RELEASE_ASSERT(!r.IsEmpty()) 14 0.00 %
42 MOZ_RELEASE_ASSERT(!r->IsEmpty()) 13 0.00 %
43 MOZ_RELEASE_ASSERT(CheckDocTree()) 12 0.00 %
44 MOZ_RELEASE_ASSERT(mDestroyCalled) 11 0.00 %
45 MOZ_RELEASE_ASSERT(!!compositor) 10 0.00 %
46 MOZ_RELEASE_ASSERT(MessageLoop::current() == mWorkerLoop) 10 0.00 %
47 MOZ_RELEASE_ASSERT(sDebugOwningThread != currentThread) 10 0.00 %
48 MOZ_RELEASE_ASSERT(SizeOfEntryStore(CapacityFromHashShift(), mEntrySize, &nbytes)) 8 0.00 %
49 MOZ_CRASH(Accessing the Subject Principal without an AutoJSAPI on the stack is forbidden) 7 0.00 %
50 MOZ_CRASH(Initial entry store size is too large) 7 0.00 %

Chris Peterson

May 31, 2016, 3:06:30 AM

On 5/30/16 11:22 PM, Nicholas Nethercote wrote:
> #2 is unannotated MOZ_CRASH() calls, i.e. there is no string argument
> given. These are mostly OOMs, though there are a few others in there.
> These ones should be annotated so they show up separately.

MOZ_CRASH()'s explanation string parameter is optional. Should it be
required? There are 998 calls in mozilla-central to MOZ_CRASH() without
an argument, so annotating all of these won't happen overnight.

Nicholas Nethercote

May 31, 2016, 4:46:09 AM

to Chris Peterson, dev-platform

Doesn't seem worthwhile. Annotating the one in NS_ABORT_OOM() will
cover the vast majority of the cases, and annotating (or fixing) a
handful more cases will get us almost all of the others.

Nick

Gijs Kruitbosch

May 31, 2016, 7:26:31 AM

to Nicholas Nethercote

We could do a find/replace of no-arg calls to a new macro that uses
MOZ_CRASH with a boilerplate message, and make the argument non-optional
for new uses of MOZ_CRASH? That would avoid the problem for new
MOZ_CRASH() additions, which seems like it would be wise so the problem
doesn't get worse? Or is it not worth even that?
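Gijs's proposal could look roughly like the following C++ sketch. All names are hypothetical (this is not mfbt's MOZ_CRASH): new call sites must pass a non-empty string literal, and a mechanical find/replace could point the existing no-argument calls at a macro that supplies one boilerplate reason.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical macros for illustration; NOT the real MOZ_CRASH from mfbt.

// `"" reason` only compiles if `reason` is a string literal (or empty), and
// the static_assert below rejects the empty case, so both SKETCH_CRASH() and
// SKETCH_CRASH(someCharPointer) fail to build.
#define SKETCH_REQUIRE_LITERAL(reason) ("" reason)

[[noreturn]] inline void SketchCrashImpl(const char* reason) {
  // A real implementation would hand `reason` to the crash reporter first.
  std::fprintf(stderr, "Hit SKETCH_CRASH(%s)\n", reason);
  std::abort();
}

#define SKETCH_CRASH(reason)                                          \
  do {                                                                \
    static_assert(sizeof(SKETCH_REQUIRE_LITERAL(reason)) > 1,         \
                  "SKETCH_CRASH requires a non-empty reason string"); \
    SketchCrashImpl(SKETCH_REQUIRE_LITERAL(reason));                  \
  } while (0)

// Target for a find/replace of existing no-argument calls: they all get one
// boilerplate reason, which still separates them from annotated crashes.
#define SKETCH_UNANNOTATED_REASON "unannotated crash (legacy call site)"
#define SKETCH_CRASH_UNANNOTATED() SKETCH_CRASH(SKETCH_UNANNOTATED_REASON)
```

With this in place, SKETCH_CRASH("Bogus tree op") builds, while SKETCH_CRASH() and SKETCH_CRASH(someCharPointer) are compile errors.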

~ Gijs

Gabriele Svelto

May 31, 2016, 9:18:09 AM

to Gijs Kruitbosch, dev-pl...@lists.mozilla.org

On 31/05/2016 13:26, Gijs Kruitbosch wrote:
> We could do a find/replace of no-arg calls to a new macro that uses
> MOZ_CRASH with a boilerplate message, and make the argument non-optional
> for new uses of MOZ_CRASH? That would avoid the problem for new
> MOZ_CRASH() additions, which seems like it would be wise so the problem
> doesn't get worse? Or is it not worth even that?

What about adding file/line number information? This way one could
always tell where it's coming from even if it doesn't have a descriptive
string.

Gabriele

Gijs Kruitbosch

May 31, 2016, 9:27:12 AM

On 31/05/2016 07:22, Nicholas Nethercote wrote:
> 10 MOZ_CRASH(Using observer service off the main thread!) 223 0.02 %

This looked interesting to me, but it seems almost all of them are
caused by IBM Rapport which hooks into the process and calls the
observer service off-main-thread. If someone has contacts there, I guess
we could tell them not to do that? :-\

~ Gijs

Josh Matthews

May 31, 2016, 10:02:04 AM

FTR, I filed bug 1276921 for this (it won't show up in the reports since
that component doesn't have a crash signature field).

Milan Sreckovic

May 31, 2016, 10:28:19 AM

to Nicholas Nethercote, dev-platform

I search for and track the ones that start with “GFX” (which is why we added those prefixes, to make it easier to find them all). Here’s a comment on the top few of those:

#3 is bug 1254400, filed in March, fixed in May, uplifted to 47.

#12 is a collection of different crashes, with the same message, but it will only crash in nightlies and auroras; in betas and releases, it turns into a warning message, and telemetry gets sent that the “crash was avoided, but you really should try to fix this”.

#18 has been tracked since February 2015, bug 1133623, and is related to recovering from driver resets. Long haul on this one, that’s why we have “recover from driver resets” as a 2016H1 goal.

On a side note, the one that stood out for me was a “TODO” one. A crash seems to be a wrong way to tag TODOs :)
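Milan's description of #12 (crash on Nightly and Aurora; warn and send telemetry on Beta and Release) can be sketched as follows. The channel enum, the warning list and the counter are hypothetical stand-ins for a build-time define, a logged warning and a telemetry probe; this is not the actual graphics code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical sketch of "crash on pre-release, warn on release".
// In a real build the channel would come from a build-time define; here it is
// a runtime value so both behaviors can be exercised.
enum class Channel { Nightly, Aurora, Beta, Release };

struct DevCrashSink {
  Channel channel;
  std::vector<std::string> warnings;  // stand-in for a logged warning
  int avoidedCrashTelemetry = 0;      // stand-in for a telemetry probe

  // Aborts on Nightly/Aurora; returns normally on Beta/Release.
  void DevCrash(const std::string& reason) {
    if (channel == Channel::Nightly || channel == Channel::Aurora) {
      std::fprintf(stderr, "GFX_CRASH: %s\n", reason.c_str());
      std::abort();
    }
    warnings.push_back(reason);
    ++avoidedCrashTelemetry;  // "crash was avoided, but you should fix this"
  }
};
```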

—
- Milan

Milan Sreckovic

May 31, 2016, 10:31:42 AM

to Gabriele Svelto, dev-pl...@lists.mozilla.org, Gijs Kruitbosch

We considered that for the graphics ones, but the line number doesn’t persist between versions, so we weren’t sure how the search would find all the ones that are the same, but on different line numbers in different versions. In the end we settled on using unique strings (for the interesting ones, at least.)
—
- Milan

Benjamin Smedberg

May 31, 2016, 11:05:39 AM

to Gabriele Svelto, dev-platform, Gijs Kruitbosch

You shouldn't need to annotate the file/line separately, because that is
(or at least should be!) the top of the stack.

FWIW, we are currently working on changing the signature for crashes with
an AbortMessage (those using NS_RUNTIMEABORT) so that the abort message is
part of the signature. After that works, we should probably do the same
thing or something similar for the new MozCrashReason field. (I'm not sure
why we used different field names for these.)

--BDS

Lawrence Mandel

May 31, 2016, 11:25:57 AM

to Gijs Kruitbosch, Sylvestre Ledru, dev-platform

We do have contacts. The more information we pass along about this crash,
including how to avoid crashing if we know, the better our chances of
success. What would you recommend that we tell IBM Rapport?

Sylvestre - Can you please pass along information about the crash?

Thanks,

Lawrence

Ben Kelly

May 31, 2016, 11:34:34 AM

to Milan Sreckovic, dev-platform, Nicholas Nethercote

On Tue, May 31, 2016 at 10:28 AM, Milan Sreckovic <msrec...@mozilla.com>
wrote:

> On a side note, the one that stood out for me was a “TODO” one. A crash
> seems to be a wrong way to tag TODOs :)
>

FWIW, this appears to have been fixed last week:

https://hg.mozilla.org/mozilla-central/rev/a61e4c04aadb

Markus Stange

May 31, 2016, 11:45:56 AM

On 2016-05-31 2:22 AM, Nicholas Nethercote wrote:
> 9 MOZ_RELEASE_ASSERT(sAliveDisplayItemDatas &&
> sAliveDisplayItemDatas->Contains(aData)) 263 0.02 %

This one is https://bugzilla.mozilla.org/show_bug.cgi?id=1141089 .

-Markus

Milan Sreckovic

May 31, 2016, 11:52:35 AM

to Nicholas Nethercote, dev-platform

By the way, this is the kiss of death query: MOZ_CRASH, startup, in safe mode. We’re basically forcing these people away. There is nothing they can do even if they really want to run Firefox (assuming this is a persistent startup crash, of course). The numbers aren’t high, and the majority of them are OOMs, but I still feel like this query should never have things in it. (I picked 8 seconds arbitrarily; I know that some consider it a startup crash if it crashes much later than that.)

https://crash-stats.mozilla.com/search/?product=Firefox&moz_crash_reason=%21__null__&uptime=%3C8&safe_mode=__true__&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

It’s also interesting to extend the query to specify the amount of memory the machine has - how do we get an OOM on startup when the users have 2GB of RAM?
—
- Milan

Benjamin Smedberg

May 31, 2016, 12:29:03 PM

to Milan Sreckovic, dev-platform, Nicholas Nethercote

You're assuming that this happens every time, instead of randomly. If you
add the time since last crash to your column list, you can see that this is
true in some cases and not others.

I changed your link a little:

* removed "moz crash reason exists" - any startup crash is a problem
* excluded content and plugin process crashes - those aren't "startup"
crashes the same way, and they don't prevent users from updating or
disabling addons
* added facets on the crash reason and the abort message
* added the time-since-last-crash to the column list
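The two searches can be read as report filters. A rough C++ sketch with hypothetical field names loosely mirroring the URL parameters (moz_crash_reason, uptime, safe_mode); whether Benjamin's version keeps the safe-mode condition is an assumption, since his list doesn't say it was removed.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified crash-report record; field names are hypothetical.
struct CrashReport {
  std::optional<std::string> mozCrashReason;  // moz_crash_reason
  double uptimeSeconds;                       // uptime
  bool safeMode;                              // safe_mode
  std::string processType;                    // "browser", "content", "plugin"
};

// Milan's query: a MozCrashReason is present, uptime < 8 s, safe mode.
bool MatchesMilanQuery(const CrashReport& r) {
  return r.mozCrashReason.has_value() && r.uptimeSeconds < 8 && r.safeMode;
}

// Benjamin's variant: any startup crash counts, but only in the main
// (browser) process. Keeping the safe-mode condition is an assumption.
bool MatchesBenjaminQuery(const CrashReport& r) {
  return r.uptimeSeconds < 8 && r.safeMode && r.processType == "browser";
}
```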

The top things on this list are:

by signature:

* __RtlUserThreadStart | _RtlUserThreadStart (maybe bug 1164826
  <https://bugzilla.mozilla.org/show_bug.cgi?id=1164826>, need to dig to
  separate by this DLL)
* mozalloc_abort | NS_DebugBreak | ErrorLoadingBuiltinSheet (longstanding
  problem, perhaps evidence of a bad update or install - bug 1194856
  <https://bugzilla.mozilla.org/show_bug.cgi?id=1194856>)
* OOM | small - this is surprising and disturbing to me - I wonder how much
  of this is actually OOM and how much is memory corruption. I filed bug
  1276993 as a work item for this.
* EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER - again
  normally OOM but I wouldn't expect that at startup

--BDS

Jonathan Kew

May 31, 2016, 12:45:07 PM

to Nicholas Nethercote, dev-platform

>>> On May 31, 2016, at 2:22 , Nicholas Nethercote
>>> <n.neth...@gmail.com> wrote:
>>> #2 is unannotated MOZ_CRASH() calls, i.e. there is no string
>>> argument given. These are mostly OOMs, though there are a few
>>> others in there. These ones should be annotated so they show up
>>> separately.

I took a quick look at a random one of these OOMs[1], and what strikes
me about it is that according to the crash report:

Total Virtual Memory            2147352576
Available Virtual Memory         122331136
System Memory Use Percentage            52
Available Page File             4932567040
Available Physical Memory       1790652416
OOM Allocation Size                     24

it seems like the system is still some way from running out of memory.
Available Virtual Memory is "only" 122MB, which admittedly isn't very
much in present-day terms, but still... why can't we successfully
allocate a 24-byte block? Can those 122MB really be _so_ fragmented?!

JK

[1]
https://crash-stats.mozilla.com/report/index/e59d2f18-2131-4f24-9f43-7038b2160524

Benjamin Smedberg

May 31, 2016, 12:51:03 PM

to Jonathan Kew, dev-platform, Nicholas Nethercote

It's likely that this particular report is running out of VM, yes. jemalloc
allocates new memory chunks in large blocks (1MB?), and with only 122MB of
VM it's likely that a lot of that is inaccessible, either because of
fragmentation or because sites are allocating VM blocks of less than 64k,
which is the allocation resolution of Windows VM.

If you look at the raw dump tab, you'll see:

"largest_free_vm_block": "0xf0000" - which is 983040 bytes, less than 1MB.
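A toy model of why a 24-byte request can fail with 122MB of virtual memory free. It assumes, as in this thread, roughly 1 MiB chunks and treats a new chunk as needing one contiguous free region; the real allocator also has alignment constraints, which are ignored here. Note that 0xf0000 is 983040 bytes, exactly 15 times the 64 KiB Windows allocation granularity and just short of 1048576.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model, not jemalloc's actual code: a small request that cannot
// be served from existing chunks needs a fresh chunk, so what matters is
// whether any single free VM region can hold a whole chunk, not the total
// amount of free VM or the request size. Alignment is ignored.
constexpr std::uint64_t kChunkSize = 1 << 20;  // assumed 1 MiB

bool CanMapNewChunk(const std::vector<std::uint64_t>& freeRegions) {
  return std::any_of(freeRegions.begin(), freeRegions.end(),
                     [](std::uint64_t size) { return size >= kChunkSize; });
}

std::uint64_t TotalFree(const std::vector<std::uint64_t>& freeRegions) {
  std::uint64_t total = 0;
  for (std::uint64_t size : freeRegions) total += size;
  return total;
}
```

For example, 124 free regions of 0xf0000 bytes each add up to about 122 MB, yet CanMapNewChunk still returns false.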

--BDS

Milan Sreckovic

May 31, 2016, 2:03:30 PM

to Benjamin Smedberg, dev-platform, Nicholas Nethercote

Agreed that all startup crashes are important. The reason I focused on the MOZ_CRASHes is that this is where somebody explicitly said “this is as bad as it gets, we must crash now”, and I’d like to look at those once in a while, and see if that’s really the case. Because I certainly ran into code that was more of the “this should never happen, and if it does I don’t understand why” with MOZ_CRASHes in it, and that can be improved...
—
- Milan

Ralph Giles

May 31, 2016, 3:28:48 PM

to Nicholas Nethercote, Jean-Yves Avenard, dev-platform

A few of these are in MediaFormatReader.

On Mon, May 30, 2016 at 11:22 PM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:

> 11 MOZ_RELEASE_ASSERT(!mSkipRequest.Exists()) (called mid-skipping)
> 222 0.02 %

Fixed in bug 1068151.

> 16 MOZ_RELEASE_ASSERT(!mAudio.HasPromise()) (No duplicate sample
> requests) 110 0.01 %

We think this is fixed in bug 1276495, but it's too soon to confirm.

> 39 MOZ_RELEASE_ASSERT(!mVideo.mDecodingRequested) (Reset must have
> been called) 17 0.00 %

Fixed in bug 1272964.

And in GraphDriver,

> 31 MOZ_CRASH(Could not start cubeb stream for MSG.) 27 0.00 %

it looks like this could propagate an error instead. I filed bug 1277037.
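A hypothetical sketch of what "propagate an error instead" usually means: the stream-start step returns a status and the caller picks a fallback rather than calling MOZ_CRASH. None of these names come from GraphDriver or cubeb, and the choice of fallback is illustrative only.

```cpp
#include <cassert>
#include <functional>

// Hypothetical names throughout; not GraphDriver or cubeb code.
enum class StartResult { Ok, Failed };

// Stand-in for the audio backend, injected so failure can be simulated.
using StartStreamFn = std::function<bool()>;

StartResult StartAudioStream(const StartStreamFn& startStream) {
  // Instead of: if (!startStream()) { MOZ_CRASH("..."); }
  if (!startStream()) {
    return StartResult::Failed;
  }
  return StartResult::Ok;
}

// Caller-side policy: on failure, degrade instead of crashing.
// The specific fallback (a non-audio clock) is purely illustrative.
enum class Driver { Audio, SystemClock };

Driver ChooseDriver(const StartStreamFn& startStream) {
  return StartAudioStream(startStream) == StartResult::Ok ? Driver::Audio
                                                          : Driver::SystemClock;
}
```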

-r

Jeff Gilbert

May 31, 2016, 4:20:12 PM

to Gabriele Svelto, dev-platform, Gijs Kruitbosch

On Tue, May 31, 2016 at 6:18 AM, Gabriele Svelto <gsv...@mozilla.com> wrote:
> On 31/05/2016 13:26, Gijs Kruitbosch wrote:
>> We could do a find/replace of no-arg calls to a new macro that uses
>> MOZ_CRASH with a boilerplate message, and make the argument non-optional
>> for new uses of MOZ_CRASH? That would avoid the problem for new
>> MOZ_CRASH() additions, which seems like it would be wise so the problem
>> doesn't get worse? Or is it not worth even that?
>
> What about adding file/line number information? This way one could
> always tell where it's coming from even if it doesn't have a descriptive
> string.

Agreed! These queries are much more useful if they have file names.
Line numbers are a plus, but I agree that since these drift, they are
not useful for collating. File names will generally not drift, and
would make these queries much easier to grep for problems originating
from code we're responsible for.

Eric Rescorla

May 31, 2016, 7:10:01 PM

to Jeff Gilbert, Gabriele Svelto, dev-platform, Gijs Kruitbosch

Also, perhaps function name (__func__) or one of the pretty versions.
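Combining Jeff's and Ekr's suggestions, a reason string could carry the file name and the function name (__func__ is standard since C++11) but omit the line number, so it stays stable across versions. The helper and macro below are hypothetical; only '/' is treated as a path separator.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: builds a crash reason from file and function only.
// The line number is left out because it drifts between versions and would
// split otherwise-identical crashes into separate buckets.
inline std::string LocationReason(const char* file, const char* func,
                                  const char* msg) {
  // Keep only the basename so build-machine path prefixes don't matter.
  // Only '/' is handled as a separator in this sketch.
  const char* slash = std::strrchr(file, '/');
  const char* base = slash ? slash + 1 : file;
  std::string reason = std::string(base) + ":" + func;
  if (msg && *msg) {
    reason += ": ";
    reason += msg;
  }
  return reason;
}

// Must be used inside a function, where __func__ is defined.
#define SKETCH_LOCATION_REASON(msg) LocationReason(__FILE__, __func__, msg)
```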

-Ekr

Nicholas Nethercote

May 31, 2016, 7:39:59 PM

to Benjamin Smedberg, Gabriele Svelto, dev-platform, Gijs Kruitbosch

On Wed, Jun 1, 2016 at 1:05 AM, Benjamin Smedberg <benj...@smedbergs.us> wrote:
> You shouldn't need to annotate the file/line separately, because that is
> (or at least should be!) the top of the stack.

Yes. Don't get hung up on the lack of annotations. It isn't much of a
problem; you can click through easily enough. I have filed bug 1277104
to fix the handful of instances that are showing up in practice, but
it'll only be a minor improvement.

And adding more detail can be a hindrance. As others mentioned, if you
add the line number, it will change over time, which means that
related crashes will be grouped separately in this search. A more
extreme example of this is NS_RUNTIMEABORT's AbortMessage, which
includes the *process ID*. From a crash-aggregation POV this is a
disaster, because it all but guarantees that no related crashes will
get clustered in this kind of search.
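One way to keep volatile details out of the aggregation key is to normalize reasons before faceting. This hypothetical sketch collapses every run of digits to "N", which merges messages that differ only by a process ID or line number; a real rule would likely need to be more targeted, since some numbers (bug numbers, for instance) are meaningful.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical normalization: every run of digits becomes a single "N".
std::string NormalizeReason(const std::string& reason) {
  std::string out;
  bool inDigits = false;
  for (char c : reason) {
    if (std::isdigit(static_cast<unsigned char>(c))) {
      if (!inDigits) out += 'N';
      inDigits = true;
    } else {
      out += c;
      inDigits = false;
    }
  }
  return out;
}

// Counts reports per (optionally normalized) reason, like a facet.
std::map<std::string, int> Facet(const std::vector<std::string>& reasons,
                                 bool normalize) {
  std::map<std::string, int> buckets;
  for (const auto& r : reasons) {
    ++buckets[normalize ? NormalizeReason(r) : r];
  }
  return buckets;
}
```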

> FWIW, we are currently working on changing the signature for crashes with
> an AbortMessage (those using NS_RUNTIMEABORT) so that the abort message is
> part of the signature. After that works, we should probably do the same
> thing or something similar for the new MozCrashReason field. (I'm not sure
> why we used different field names for these.)

I tried doing a similar search for AbortMessage but I couldn't get it
to work. For example:

https://crash-stats.mozilla.com/search/?product=Firefox&abort_message=ABORT&_facets=abort_message&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=abort_message#facet-abort_message

Gives a table that looks like this:

Rank Abort message Count %
1 abort 20461 100.00 %
2 file 20148 98.47 %
3 line 13681 66.86 %
4 build 6974 34.08 %
5 builds 6744 32.96 %

Even ignoring the anti-clustering problems caused by the process ID
(described above), it seems to be bucketing according to individual
words in the AbortMessage, rather than the entire AbortMessage. I
don't understand what's happening.
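One possible explanation, offered as an assumption rather than something confirmed in this thread: if the search backend indexes AbortMessage as tokenized full text, a facet counts reports per (lowercased) word instead of per whole message, which would produce a table like the one above. The sketch below models both behaviors.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Splits a message into lowercased alphanumeric words. A set is returned so
// that each word counts at most once per report.
std::set<std::string> Tokens(const std::string& msg) {
  std::set<std::string> tokens;
  std::string cur;
  for (char c : msg) {
    if (std::isalnum(static_cast<unsigned char>(c))) {
      cur += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    } else if (!cur.empty()) {
      tokens.insert(cur);
      cur.clear();
    }
  }
  if (!cur.empty()) tokens.insert(cur);
  return tokens;
}

// Tokenized facet: reports per word.
std::map<std::string, int> FacetByToken(const std::vector<std::string>& msgs) {
  std::map<std::string, int> counts;
  for (const auto& m : msgs)
    for (const auto& t : Tokens(m)) ++counts[t];
  return counts;
}

// Whole-message facet: reports per exact message.
std::map<std::string, int> FacetByWholeMessage(
    const std::vector<std::string>& msgs) {
  std::map<std::string, int> counts;
  for (const auto& m : msgs) ++counts[m];
  return counts;
}
```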

----

Anyway, can we just get rid of NS_RUNTIMEABORT? I count 308
occurrences of it in the code vs. 4064 for MOZ_CRASH.

Nick

Nicholas Nethercote

May 31, 2016, 7:57:03 PM

to Jonathan Kew, dev-platform

On Wed, Jun 1, 2016 at 2:37 AM, Jonathan Kew <jfkt...@gmail.com> wrote:
>
> I took a quick look at a random one of these OOMs[1], and what strikes me
> about it is that according to the crash report:
>
> Total Virtual Memory 2147352576
>
> Available Virtual Memory 122331136
>
> System Memory Use Percentage 52
>
> Available Page File 4932567040
>
> Available Physical Memory 1790652416
>
> OOM Allocation Size 24
>
> it seems like the system is still some way from running out of memory.
> Available Virtual Memory is "only" 122MB, which admittedly isn't very much
> in present-day terms, but still....why can't we successfully allocate a
> 24-byte block? Can those 122MB really be _so_ fragmented?!

I looked at a bunch of these yesterday. It's pretty common for OOM to
occur when there is around 200-250 MiB of available virtual memory;
122 MB is probably lower than normal. As bsmedberg said, jemalloc uses
1 MiB chunks, so the size of 24 is something of a red herring here.
(It's still useful in the sense that it's tiny, so making this
allocation fallible is unlikely to be helpful.)

More generally, I did a search yesterday of all our "OOM | small"
crashes for the past week. About 5% of them occur when the user has
more than 1 GiB of available virtual memory *and* more than 1 GiB of
available physical memory, which is surprising. I would love to see a
scatter plot showing available physical memory vs. available virtual
memory for all our "OOM | small" crashes. bsmedberg, do we have tools
to extract that kind of data from crash-stats?
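The numbers described here are easy to compute once the two fields are extracted. A hypothetical sketch: it computes the fraction of reports with more than 1 GiB of both available virtual and available physical memory, and emits (physical, virtual) pairs in MiB as CSV for a scatter plot. How the fields would be pulled from crash-stats is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of the two memory fields being compared.
struct OomReport {
  std::uint64_t availableVirtual;   // bytes
  std::uint64_t availablePhysical;  // bytes
};

constexpr std::uint64_t kGiB = 1ull << 30;

// Fraction of reports where both kinds of memory exceed the threshold.
double FractionWithAmpleMemory(const std::vector<OomReport>& reports,
                               std::uint64_t threshold = kGiB) {
  if (reports.empty()) return 0.0;
  std::size_t ample = 0;
  for (const auto& r : reports) {
    if (r.availableVirtual > threshold && r.availablePhysical > threshold)
      ++ample;
  }
  return static_cast<double>(ample) / reports.size();
}

// Emits "physical,virtual" rows in whole MiB, ready for any plotting tool.
std::string ToCsv(const std::vector<OomReport>& reports) {
  std::ostringstream out;
  out << "available_physical_mib,available_virtual_mib\n";
  for (const auto& r : reports)
    out << (r.availablePhysical >> 20) << ',' << (r.availableVirtual >> 20)
        << '\n';
  return out.str();
}
```

For the report Jonathan quoted (122331136 bytes virtual, 1790652416 bytes physical), the CSV row is 1707,116.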

Nick

L. David Baron

May 31, 2016, 8:02:11 PM

to Benjamin Smedberg, Nicholas Nethercote, dev-platform, Jonathan Kew

On Tuesday 2016-05-31 12:50 -0400, Benjamin Smedberg wrote:
> It's likely that this particular report is running out of VM, yes. jemalloc
> allocates new memory chunks in large blocks (1MB?), and with only 122MB of
> VM it's likely that a lot of that is inaccessible, either because of
> fragmentation or because sites are allocating VM blocks of less than 64k,
> which is the allocation resolution of Windows VM.
>
> If you look at the raw dump tab, you'll see:
>
> "largest_free_vm_block": "0xf0000" - which is 983040 bytes, less than 1MB.

Would it make sense for jemalloc to try allocating memory in smaller
chunks when large ones aren't available?
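As a purely hypothetical illustration of David's idea (Mike explains below why jemalloc's current design doesn't allow it), a fallback loop might try the preferred chunk size and halve it down to a floor. The `reserve` callback stands in for an OS reservation call; the 64 KiB floor reflects the Windows allocation granularity mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical sketch, NOT how jemalloc works. `reserve` returns whether a
// reservation of the given size succeeded.
using ReserveFn = std::function<bool(std::uint64_t size)>;

std::optional<std::uint64_t> ReserveWithFallback(const ReserveFn& reserve,
                                                 std::uint64_t preferred,
                                                 std::uint64_t minSize) {
  assert(minSize > 0);  // otherwise halving could stall at 0 and never end
  for (std::uint64_t size = preferred; size >= minSize; size /= 2) {
    if (reserve(size)) return size;
  }
  return std::nullopt;  // nothing down to minSize fits
}
```

With a largest free block of 0xf0000 bytes, the 1 MiB attempt fails and the sketch settles on 512 KiB.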

-David

Mike Hommey

May 31, 2016, 8:08:39 PM

to L. David Baron, dev-platform, Benjamin Smedberg, Nicholas Nethercote, Jonathan Kew

On Tue, May 31, 2016 at 05:01:34PM -0700, L. David Baron wrote:
> On Tuesday 2016-05-31 12:50 -0400, Benjamin Smedberg wrote:
> > It's likely that this particular report is running out of VM, yes. jemalloc
> > allocates new memory chunks in large blocks (1MB?), and with only 122MB of
> > VM it's likely that a lot of that is inaccessible, either because of
> > fragmentation or because sites are allocating VM blocks of less than 64k,
> > which is the allocation resolution of Windows VM.
> >
> > If you look at the raw dump tab, you'll see:
> >
> > "largest_free_vm_block": "0xf0000" - which is 983040 bytes, less than 1MB.
>
> Would it make sense for jemalloc to try allocating memory in smaller
> chunks when large ones aren't available?

The way jemalloc (currently) works doesn't allow that. Making chunks
smaller is one option. Future versions might even remove the notion of
chunks (https://github.com/jemalloc/jemalloc/issues/360)

Mike

Nicholas Nethercote

May 31, 2016, 8:13:36 PM

to Benjamin Smedberg, Gabriele Svelto, dev-platform, Gijs Kruitbosch

On Wed, Jun 1, 2016 at 9:39 AM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
>
> Yes. Don't get hung up on the lack of annotations. It isn't much of a
> problem; you can click through easily enough. I have filed bug 1277104
> to fix the handful of instances that are showing up in practice, but
> it'll only be a minor improvement.

Related: this is a common pattern:

  if (!cond) {
    MOZ_CRASH();
  }

If you instead write this:

  MOZ_RELEASE_ASSERT(cond);

not only is it shorter and more readable, but the crash report will
automatically be annotated with "cond".
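The automatic annotation presumably relies on the preprocessor's stringizing operator. `#cond` turns the macro argument into a string literal, so the reason string names the failing expression. The sketch below is minimal and uses hypothetical `MY_`-prefixed macros and a hypothetical `ReportCrash()` helper. It is not mfbt's real `MOZ_RELEASE_ASSERT`, which differs in detail. The swappable handler exists only so the behaviour can be exercised without aborting.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the crash reporter's "MozCrashReason" field.
static std::string gCrashReason;

// By default a crash really aborts; a test can swap in a handler that returns.
static void AbortHandler() { std::abort(); }
static void (*gCrashHandler)() = AbortHandler;

// Record the reason where a crash reporter could read it, then crash.
static void ReportCrash(const char* aReason) {
  gCrashReason = aReason;
  std::fprintf(stderr, "Crash reason: %s\n", aReason);
  gCrashHandler();
}

// Unannotated crash: every call site produces the same reason string.
#define MY_CRASH_UNANNOTATED() ReportCrash("MY_CRASH()")

// Release assertion: #cond stringizes the expression, and adjacent string
// literals are concatenated, giving e.g. "MY_RELEASE_ASSERT(count > 5)".
#define MY_RELEASE_ASSERT(cond)                    \
  do {                                             \
    if (!(cond)) {                                 \
      ReportCrash("MY_RELEASE_ASSERT(" #cond ")"); \
    }                                              \
  } while (0)
```

With the assertion form, each distinct condition gets its own bucket in the MozCrashReason facet. Every bare crash call collapses into the single "MOZ_CRASH()" bucket.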

Nick

Jeff Gilbert

May 31, 2016, 9:31:25 PM5/31/16

to Nicholas Nethercote, Gabriele Svelto, Benjamin Smedberg, dev-platform, Gijs Kruitbosch

On Tue, May 31, 2016 at 4:39 PM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
> On Wed, Jun 1, 2016 at 1:05 AM, Benjamin Smedberg <benj...@smedbergs.us> wrote:
>> You shouldn't need to annotate the file/line separately, because that is
>> (or at least should be!) the top of the stack.
>

> Yes. Don't get hung up on the lack of annotations. It isn't much of a
> problem; you can click through easily enough. I have filed bug 1277104
> to fix the handful of instances that are showing up in practice, but
> it'll only be a minor improvement.

Perhaps this isn't meant for me then? I looked at the query from the
first post, but it's just noise to me. If it included the file that it
crashed from, it would suddenly be very useful, since it'd then be
trivial to see if there's something relevant to me.

As it stands now, the query alone doesn't seem useful to me. If it's
meant to be useful to developers who write MOZ_CRASHes, this is a
problem. If not, please ignore!

I would be extremely interested in MOZ_CRASHes and friends
automatically getting bugs filed and needinfo'd. An index of
crashes-by-file would get half-way there for me.

Nicholas Nethercote

May 31, 2016, 10:14:37 PM5/31/16

to Jeff Gilbert, Gabriele Svelto, Benjamin Smedberg, dev-platform, Gijs Kruitbosch

On Wed, Jun 1, 2016 at 11:26 AM, Jeff Gilbert <jgil...@mozilla.com> wrote:
>
> Perhaps this isn't meant for me then? I looked at the query from the
> first post, but it's just noise to me. If it included the file that it
> crashed from, it would suddenly be very useful, since it'd then be
> trivial to see if there's something relevant to me.

Let's look at the top three:

1 MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.)
129715 9.92 %

If you use your favourite source code search tool to look for
"Shutdown too long", you'll find that this crash is occurring in
toolkit/components/terminator/nsTerminator.cpp. For example, here's a
DXR link:

https://dxr.mozilla.org/mozilla-central/search?q=%22Shutdown+too+long%22&redirect=false

The line in question looks like this:

MOZ_CRASH("Shutdown too long, probably frozen, causing a crash.");

2 MOZ_CRASH() 25987 1.99 %

This one matches all calls to MOZ_CRASH() that don't provide a string
parameter. Digging into these is slightly harder, requiring a new
search for crash reports that have "moz crash reason" set to "MOZ_CRASH()":

https://crash-stats.mozilla.com/search/?product=Firefox&moz_crash_reason=%3DMOZ_CRASH%28%29&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=moz_crash_reason#facet-signature
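The `moz_crash_reason` value in that URL is just the percent-encoded string `=MOZ_CRASH()`. Judging from the URL, the leading `=` appears to request an exact match rather than a substring match. The small C++ sketch below builds the same kind of query for any reason string. `BuildReasonSearchUrl` is a hypothetical helper, and its parameter names are copied from the URL above rather than taken from any crash-stats documentation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Percent-encode everything except RFC 3986 "unreserved" characters.
std::string PercentEncode(const std::string& aInput) {
  std::string out;
  for (unsigned char c : aInput) {
    bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                      c == '.' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      char buf[4];  // "%XX" plus the terminating NUL
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}

// Hypothetical helper: a Firefox crash-stats search restricted to one exact
// MozCrashReason, faceted by signature (parameters mirror the URL above).
std::string BuildReasonSearchUrl(const std::string& aReason) {
  return "https://crash-stats.mozilla.com/search/?product=Firefox"
         "&moz_crash_reason=" + PercentEncode("=" + aReason) +
         "&_facets=signature";
}
```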

3 MOZ_CRASH(GFX: Unable to get a working D3D9 Compositor) 2104 0.16 %

Searching for "working D3D9 Compositor" identifies this one as coming
from gfx/layers/d3d9/CompositorD3D9.cpp.

And so on. Searching for strings in code is a useful technique in many
situations; I recommend it!
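The same look-up can be scripted against a local checkout. The rough C++17 sketch below uses two hypothetical helpers. `ExtractReasonArgument` pulls out the text inside the outermost parentheses of a crash-stats reason, tracking nesting because assertion expressions contain parentheses of their own. `FindFilesContaining` does a plain substring search over a directory tree, a crude stand-in for DXR or grep. An empty argument means an unannotated `MOZ_CRASH()`, which no text search can locate. Expressions that were apparently macro-expanded before being stringized, such as the `NS_FAILED_impl` entry in the list, will not match the source text either.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Return the text between the first '(' and its matching ')', e.g.
// "MOZ_CRASH(Bogus tree op)" -> "Bogus tree op".
std::optional<std::string> ExtractReasonArgument(const std::string& aReason) {
  const std::size_t open = aReason.find('(');
  if (open == std::string::npos) {
    return std::nullopt;
  }
  int depth = 0;
  for (std::size_t i = open; i < aReason.size(); ++i) {
    if (aReason[i] == '(') {
      ++depth;
    } else if (aReason[i] == ')' && --depth == 0) {
      return aReason.substr(open + 1, i - open - 1);
    }
  }
  return std::nullopt;  // unbalanced parentheses
}

// List regular files under |aRoot| whose contents contain |aNeedle|.
std::vector<fs::path> FindFilesContaining(const fs::path& aRoot,
                                          const std::string& aNeedle) {
  std::vector<fs::path> matches;
  for (const auto& entry : fs::recursive_directory_iterator(
           aRoot, fs::directory_options::skip_permission_denied)) {
    if (!entry.is_regular_file()) {
      continue;
    }
    std::ifstream in(entry.path(), std::ios::binary);
    const std::string contents((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
    if (contents.find(aNeedle) != std::string::npos) {
      matches.push_back(entry.path());
    }
  }
  return matches;
}
```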

BTW, thank you to all those who have already looked through the list
and mentioned existing bugs and/or filed new bugs.

Nick

Nicholas Nethercote

Jun 1, 2016, 5:31:26 AM6/1/16

to dev-platform

Here's an update. This one is bug 1235183:

> 7 MOZ_RELEASE_ASSERT(!mDoingStableStates) 466 0.04 %

This one is covered by bug 616421 (the signature includes a
combination of MOZ_CRASHes and other kinds of crashes):

> 8 MOZ_CRASH(Bogus tree op) 459 0.04 %

Below are all the ones that don't have a bug associated with them, as
determined from replies to this thread.

Nick

> 4 MOZ_CRASH(Unexpected error during FakeBlack creation.) 1679 0.13 %
> 5 MOZ_CRASH(IPC FatalError in the parent process!) 783 0.06 %
> 6 MOZ_RELEASE_ASSERT(pi->mInternalRefs < pi->mRefCount) (Cycle
> collector found more references to an object than its refcount) 509
> 0.04 %

> 13 MOZ_RELEASE_ASSERT(NS_IsMainThread()) 131 0.01 %
> 14 MOZ_RELEASE_ASSERT(aMsg.priority() ==
> IPC::Message::PRIORITY_NORMAL) 120 0.01 %
> 15 MOZ_RELEASE_ASSERT(aRefCount != 0) (CCed refcounted object has zero
> refcount) 113 0.01 %

> 16 MOZ_RELEASE_ASSERT(!mAudio.HasPromise()) (No duplicate sample
> requests) 110 0.01 %

> 17 MOZ_RELEASE_ASSERT(ok) 105 0.01 %

> 19 MOZ_CRASH(invalid process aSelector) 73 0.01 %
> 20 MOZ_CRASH(We lost the following char message) 58 0.00 %
> 21 MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.)
> 53 0.00 %

> 23 MOZ_RELEASE_ASSERT(prio == IPC::Message::PRIORITY_NORMAL ||
> NS_IsMainThread()) 51 0.00 %
> 24 MOZ_CRASH(GFX: Failed to update reference draw target after device
> reset) 45 0.00 %
> 25 MOZ_RELEASE_ASSERT(isSystem) 42 0.00 %
> 26 MOZ_CRASH(Crash creating texture. See bug 1221348.) 41 0.00 %
> 27 MOZ_CRASH(sandbox_init() failed) 41 0.00 %
> 28 MOZ_CRASH(Unable to get a working D3D9 Compositor) 37 0.00 %
> 29 MOZ_CRASH(GFX: Invalid D3D11 content device) 33 0.00 %
> 30 MOZ_CRASH(Initial length is too large) 30 0.00 %

> 32 MOZ_CRASH(IPC message size is too large) 25 0.00 %
> 33 MOZ_RELEASE_ASSERT(mWorkerLoopID == MessageLoop::current()->id())
> (not on worker thread!) 25 0.00 %
> 34 MOZ_RELEASE_ASSERT(!mInWriteTransaction) 24 0.00 %
> 35 MOZ_CRASH(NativeKey tries to dispatch a key event on destroyed
> widget) 23 0.00 %
> 36 MOZ_RELEASE_ASSERT(((bool)(!!(!NS_FAILED_impl(rv)))) && thread)
> (Should successfully create image decoding threads) 23 0.00 %
> 37 MOZ_RELEASE_ASSERT(aInAndOutListener) (can not perform CORS checks
> without a listener) 23 0.00 %
> 38 MOZ_CRASH(Unknown unit type?) 18 0.00 %

Josh Matthews

Jun 1, 2016, 8:25:48 AM6/1/16

to

On 2016-06-01 2:32 AM, Nicholas Nethercote wrote:
>> 37 MOZ_RELEASE_ASSERT(aInAndOutListener) (can not perform CORS checks
>> without a listener) 23 0.00 %

bug 1276943

Jonathan Kew

Jun 1, 2016, 9:20:34 AM6/1/16

to Nicholas Nethercote, dev-platform

On 1/6/16 00:51, Nicholas Nethercote wrote:
> On Wed, Jun 1, 2016 at 2:37 AM, Jonathan Kew <jfkt...@gmail.com> wrote:
>>
>> I took a quick look at a random one of these OOMs[1], and what strikes me
>> about it is that according to the crash report:
>>
>> Total Virtual Memory 2147352576
>>
>> Available Virtual Memory 122331136
>>
>> System Memory Use Percentage 52
>>
>> Available Page File 4932567040
>>
>> Available Physical Memory 1790652416
>>
>> OOM Allocation Size 24
>>
>> it seems like the system is still some way from running out of memory.
>> Available Virtual Memory is "only" 122MB, which admittedly isn't very much
>> in present-day terms, but still....why can't we successfully allocate a
>> 24-byte block? Can those 122MB really be _so_ fragmented?!
>
> I looked at a bunch of these yesterday. It's pretty common for OOM to
> occur when there is around 200--250 MiB of available virtual memory;
> 122 MB is probably lower than normal.

Does this suggest that we're not sufficiently proactive about firing
memory-pressure notifications, so that we'll free up memory from various
caches, etc? It looks like we regard 128MB of available VM as "low" (see
[1]) on Windows 32-bit, but apparently we're liable to suffer small-OOM
crashes well before we reach that point. That doesn't seem like a
healthy balance.

JK

[1]
https://dxr.mozilla.org/mozilla-central/source/xpcom/base/AvailableMemoryTracker.cpp#497-498

Boris Zbarsky

Jun 1, 2016, 9:45:08 AM6/1/16

to

On 6/1/16 2:32 AM, Nicholas Nethercote wrote:
>> 4 MOZ_CRASH(Unexpected error during FakeBlack creation.) 1679 0.13 %

This is https://bugzilla.mozilla.org/show_bug.cgi?id=1247977 (credit to
Milan for noticing that).

>> 49 MOZ_CRASH(Accessing the Subject Principal without an AutoJSAPI on
>> the stack is forbidden) 7 0.00 %

I looked at these. One of these was real and got fixed by
https://bugzilla.mozilla.org/show_bug.cgi?id=1235411

The next six signatures look like a tour of busted vtables or something:

https://crash-stats.mozilla.com/report/index/5edf0db2-44d4-4a00-94b2-89b752160527
-- shows _moz_cairo_surface_destroy calling
XPCJSRuntime::InterruptCallback(JSContext*) which then calls
nsContentUtils::IsCallerChrome(). The latter call makes sense, but the
former does not.

https://crash-stats.mozilla.com/report/index/95363c0a-bdab-4810-b4be-a07552160526
-- shows XPCJSRuntime::PrepareForForgetSkippable() calling
"BRFrame::`scalar deleting destructor'(unsigned int)" calling
nsGlobalWindow::UnmarkGrayTimers() calling nsXPCComponents_ID::Call
which does in fact end up touching the subject principal. But the other
calls in that stack are bunk. :(

https://crash-stats.mozilla.com/report/index/a7b5408d-1087-4fe6-914e-5320f2160527
-- shows mozilla::ipc::MessageChannel::DispatchAsyncMessage calling
mozilla::dom::SubtleCrypto::Verify calling "@0x10db" calling
nsContentUtils::SubjectPrincipal. Still nonsense. :(

https://crash-stats.mozilla.com/report/index/e981363d-5bd5-4a09-b121-1263c2160527
-- Shows nsCOMPtr_base::~nsCOMPtr_base calling
mozilla::dom::StructuredCloneHolder::CustomReadHandler, which is pretty
darned unlikely, I think. Apart from that, this stack actually makes
sense, but that one spot is really going off the rails.

https://crash-stats.mozilla.com/report/index/089ba17c-d2cf-4cc5-bf02-6134b2160530
-- shows nsImageFrame::GetLogicalSkipSides calling
nsGenericHTMLElement::Click on this line:

if (nullptr != GetNextInFlow()) {

https://crash-stats.mozilla.com/report/index/95391d22-6236-4397-b694-61c102160527
-- shows nsIContent::PreHandleEvent calling nsGenericHTMLElement::Click
on this line:

nsTArray<nsIContent*>* destPoints = GetExistingDestInsertionPoints();

Finally,
https://crash-stats.mozilla.com/report/index/8c472aa1-7eff-409a-b9b0-4abaa2160527
and
https://crash-stats.mozilla.com/report/index/8c472aa1-7eff-409a-b9b0-4abaa2160527
-- the stack is totally sensible, but it should have an AutoJSAPI on the
stack! It's coming through nsFrameMessageManager::ReceiveMessage which
totally uses one (via AutoEntryScript) to get its JSContext. No idea
what's going on there.

-Boris

Andrew McCreight

Jun 1, 2016, 10:10:00 AM6/1/16

to Nicholas Nethercote, dev-platform

On Mon, May 30, 2016 at 11:22 PM, Nicholas Nethercote <n.neth...@gmail.com> wrote:

> 6 MOZ_RELEASE_ASSERT(pi->mInternalRefs < pi->mRefCount) (Cycle
> collector found more references to an object than its refcount) 509
> 0.04 %
>

That's bug 1266882.

> 15 MOZ_RELEASE_ASSERT(aRefCount != 0) (CCed refcounted object has zero
> refcount) 113 0.01 %
>

That's odd. I'll file a bug on it.

Andrew McCreight

Jun 1, 2016, 10:13:12 AM6/1/16

to dev-platform

On Tue, May 31, 2016 at 7:14 PM, Nicholas Nethercote <n.neth...@gmail.com> wrote:

> If you use your favourite source code search tool to look for
> "Shutdown too long", you'll find that this crash is occurring in
> toolkit/components/terminator/nsTerminator.cpp. For example, here's a
> DXR link:
>

Sure, you can individually search for each assertion failure, but that's
not useful if you are just trying to skim the list looking for assertions
in code you are familiar with. The crash stack contains file information,
so it would be nice if that could be exposed somehow in a search. I don't
think you can do that right now, but I could be wrong.

Andrew

Milan Sreckovic

Jun 1, 2016, 10:52:47 AM6/1/16

to Nicholas Nethercote, dev-platform, Jonathan Kew

Cairo graphics reports an “out of memory” error condition when its author didn’t have time to figure out what actually went wrong. We caught a few problems that were being reported as out of memory (we would pick up the Cairo error as out of memory and dutifully propagate it up the chain) when they weren’t, and could have been handled properly.

We will also report out of memory when we mean out of resources. If the Direct3D library can’t give us what we’re asking for, it will give us the out-of-memory error code, but that doesn’t mean that we are out of memory as such. We could be out of file descriptors or some other resource (or contiguous memory on the graphics card). We can’t easily tell these apart, so the best we can report is OOM.

I don’t know if there are any other places in our code where this kind of thing happens, and I don’t know if we care about the second one, but I wanted to point out that a crash report showing plenty of available “regular” memory is what you would expect in both of the cases above.

—
- Milan

> On May 31, 2016, at 19:51 , Nicholas Nethercote <n.neth...@gmail.com> wrote:
>
> On Wed, Jun 1, 2016 at 2:37 AM, Jonathan Kew <jfkt...@gmail.com> wrote:
>>
>> I took a quick look at a random one of these OOMs[1], and what strikes me
>> about it is that according to the crash report:
>>
>> Total Virtual Memory 2147352576
>>
>> Available Virtual Memory 122331136
>>
>> System Memory Use Percentage 52
>>
>> Available Page File 4932567040
>>
>> Available Physical Memory 1790652416
>>
>> OOM Allocation Size 24
>>
>> it seems like the system is still some way from running out of memory.
>> Available Virtual Memory is "only" 122MB, which admittedly isn't very much
>> in present-day terms, but still....why can't we successfully allocate a
>> 24-byte block? Can those 122MB really be _so_ fragmented?!
>
> I looked at a bunch of these yesterday. It's pretty common for OOM to
> occur when there is around 200--250 MiB of available virtual memory;

> 122 MB is probably lower than normal. As bsmedberg said, jemalloc uses
> 1 MiB chunks so the size of 24 is something of a red herring here.
> (It's still useful in the sense that it's tiny, so making this
> allocation fallible is unlikely to be helpful.)
>
> More generally, I did a search yesterday of all our "OOM | small"
> crashes for the past week. About 5% of them occur when the user has >
> 1 GiB of available virtual memory *and* > 1 GiB of available physical
> memory, which is surprising. I would love to see a scatter plot
> showing available physical memory vs. available virtual memory for all
> our "OOM | small" crashes. bsmedberg, do we have tools to extract that
> kind of data from crash-stats?
>
> Nick

Milan Sreckovic

Jun 1, 2016, 11:52:55 AM6/1/16

to Andrew McCreight, dev-platform

Not sure what you mean - the stack information is there, right? This is my workflow:

1. Search for all the MOZ_CRASHes (e.g., moz crash reason “exists”). Sometimes I just search for the graphics reasons, but we don’t have the GFX prefix on all the crashes, so that doesn’t always work.
2. Look at the top N signatures and pick one that looks like it could be graphics-related.
3. See what the MOZ_CRASH string is for that particular crash.
4. Search for the MOZ_CRASH reason that matches that string, just in case it returns more than the “other reports” list does.
5. Search DXR for the MOZ_CRASH reason. This gets me to the current version of the code, instead of the version-specific one from the crash report itself.

—
- Milan

Jeff Gilbert

Jun 1, 2016, 3:22:50 PM6/1/16

to Nicholas Nethercote, Gabriele Svelto, Benjamin Smedberg, dev-platform, Gijs Kruitbosch

It would be useful to have a dashboard that collates this information better.

PS: Sarcasm is unhelpful.

On Tue, May 31, 2016 at 7:14 PM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:

> On Wed, Jun 1, 2016 at 11:26 AM, Jeff Gilbert <jgil...@mozilla.com> wrote:
>>
>> Perhaps this isn't meant for me then? I looked at the query from the
>> first post, but it's just noise to me. If it included the file that it
>> crashed from, it would suddenly be very useful, since it'd then be
>> trivial to see if there's something relevant to me.
>
> Let's look at the top three:
>
>
> 1 MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.)
> 129715 9.92 %
>

> If you use your favourite source code search tool to look for
> "Shutdown too long", you'll find that this crash is occurring in
> toolkit/components/terminator/nsTerminator.cpp. For example, here's a
> DXR link:
>

Ted Mielczarek

Jun 1, 2016, 3:23:35 PM6/1/16

to Jeff Gilbert, Nicholas Nethercote, dev-platform

On Tue, May 31, 2016, at 09:26 PM, Jeff Gilbert wrote:

> On Tue, May 31, 2016 at 4:39 PM, Nicholas Nethercote
> <n.neth...@gmail.com> wrote:
> > On Wed, Jun 1, 2016 at 1:05 AM, Benjamin Smedberg <benj...@smedbergs.us> wrote:
> >> You shouldn't need to annotate the file/line separately, because that is
> >> (or at least should be!) the top of the stack.
> >
> > Yes. Don't get hung up on the lack of annotations. It isn't much of a
> > problem; you can click through easily enough. I have filed bug 1277104
> > to fix the handful of instances that are showing up in practice, but
> > it'll only be a minor improvement.
>

> Perhaps this isn't meant for me then? I looked at the query from the
> first post, but it's just noise to me. If it included the file that it
> crashed from, it would suddenly be very useful, since it'd then be
> trivial to see if there's something relevant to me.
>

> As it stands now, the query alone doesn't seem useful to me. If it's
> meant to be useful to developers who write MOZ_CRASHes, this is a
> problem. If not, please ignore!
>
> I would be extremely interested in MOZ_CRASHes and friends
> automatically getting bugs filed and needinfo'd. An index of
> crashes-by-file would get half-way there for me.

I believe at some point in the past we talked about trying to do a "top
crashes by bug component" view, but maintaining that mapping was hard.
It turns out that we're storing this data in moz.build files nowadays
(for example[1]), and we even have a web service on hg.mozilla.org to
expose it for any given revision[2]. Unfortunately that web service is
currently broken[3], but gps tells me he's just been delaying fixing it
because there weren't any consumers complaining.

When the source file from the last frame of the stack used to generate
the signature points to hg.mozilla.org we could query that web service
to get a bug component for the file in question and put that in the
crash report, and expose it to queries. That would make it easy to get
lists of crashes by component, which I think would do what people here
are asking for. I filed bug 1277337 [4] to track this idea.
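The mapping Ted describes amounts to a lookup from source path to bug component, followed by a count per component. Below is a minimal C++ sketch of that idea. The prefix table is a hypothetical, hand-written stand-in for the moz.build data, which uses its own file-matching rules and would be fetched per revision from the web service. The component names are placeholders, not real Bugzilla components. The longest matching directory prefix wins, so a more specific entry overrides a general one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical path-prefix -> component table; placeholder names only.
const std::map<std::string, std::string> kComponentByPrefix = {
    {"gfx/", "gfx-placeholder"},
    {"gfx/layers/", "gfx-layers-placeholder"},
    {"toolkit/components/terminator/", "terminator-placeholder"},
};

// Return the component whose prefix is the longest match for |aPath|,
// or an empty string if nothing matches.
std::string ComponentForFile(const std::string& aPath) {
  std::string best;
  std::size_t bestLength = 0;
  for (const auto& [prefix, component] : kComponentByPrefix) {
    if (aPath.compare(0, prefix.size(), prefix) == 0 &&
        prefix.size() > bestLength) {
      best = component;
      bestLength = prefix.size();
    }
  }
  return best;
}

// Sum crash counts per component; files with no mapping go under "(unknown)".
std::map<std::string, long> CrashesByComponent(
    const std::vector<std::pair<std::string, long>>& aCountsByFile) {
  std::map<std::string, long> totals;
  for (const auto& [file, count] : aCountsByFile) {
    const std::string component = ComponentForFile(file);
    totals[component.empty() ? "(unknown)" : component] += count;
  }
  return totals;
}
```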

-Ted

1. https://dxr.mozilla.org/mozilla-central/rev/4d63dde701b47b8661ab7990f197b6b60e543839/dom/media/moz.build#7
2. http://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/mozbuildinfo.html
3. https://bugzilla.mozilla.org/show_bug.cgi?id=1263973
4. https://bugzilla.mozilla.org/show_bug.cgi?id=1277337

Jeff Gilbert

Jun 1, 2016, 3:25:58 PM6/1/16

to Ted Mielczarek, dev-platform, Nicholas Nethercote

Awesome, this sounds like what I was after. (Though actual components
aren't really necessary: if that part is a pain point, I would prefer
to have a tool without it than to have no tool.)

Nicholas Nethercote

Jun 1, 2016, 11:30:25 PM6/1/16

to Jeff Gilbert, Gabriele Svelto, Benjamin Smedberg, dev-platform, Gijs Kruitbosch

I apologize for the sarcasm. I was frustrated with this comment:

> I looked at the query from the first post, but it's just noise to me. If it included the file that it
> crashed from, it would suddenly be very useful, since it'd then be trivial to see if there's
> something relevant to me.

but it wasn't a good way to respond. So I'll try again.

Most of the results in the search identify a unique string, which *is*
trivial to look up. And while it's true that file and/or function
names would help in the small number of cases where distinct
MOZ_CRASH calls share the same string, you can also handle those with a
simple follow-up search refinement, as I showed in my previous response.
The required data is present. I would describe the presentation as
"slightly suboptimal but still highly usable".

I've looked at a lot of crash reports recently and one thing I've
learned is how inadequate they often are. Many are unactionable. Many
aren't even comprehensible. In comparison, this list is a treasure
trove, containing reports that are much more comprehensible and
actionable than average. It's one which I intend to revisit, and I
hope others will too.

Nick

On Thu, Jun 2, 2016 at 5:22 AM, Jeff Gilbert <jgil...@mozilla.com> wrote:
> It would be useful to have a dashboard that collates this information better.
>
> PS: Sarcasm is unhelpful.
>

> On Tue, May 31, 2016 at 7:14 PM, Nicholas Nethercote
> <n.neth...@gmail.com> wrote:
>> On Wed, Jun 1, 2016 at 11:26 AM, Jeff Gilbert <jgil...@mozilla.com> wrote:
>>>

>>> Perhaps this isn't meant for me then? I looked at the query from the
>>> first post, but it's just noise to me. If it included the file that it
>>> crashed from, it would suddenly be very useful, since it'd then be
>>> trivial to see if there's something relevant to me.
>>

Gabriele Svelto

Jun 2, 2016, 9:45:35 AM6/2/16

to Jonathan Kew, Nicholas Nethercote, dev-platform

On 01/06/2016 15:20, Jonathan Kew wrote:
> Does this suggest that we're not sufficiently proactive about firing
> memory-pressure notifications, so that we'll free up memory from various
> caches, etc? It looks like we regard 128MB of available VM as "low" (see
> [1]) on Windows 32-bit, but apparently we're liable to suffer small-OOM
> crashes well before we reach that point. That doesn't seem like a
> healthy balance.

Those values were set when the AvailableMemoryTracker was introduced five
years ago [1]. I'd say we should revisit them, especially in light of
these findings.

That being said, in Firefox OS we employed memory-pressure notifications
quite successfully to keep processes alive when memory was running low,
but we didn't rely on a single threshold because that didn't prove very
effective. Instead we used a floating trigger that started at a certain
level of free memory; once free memory dropped below it, we would fire
memory-pressure events and lower the threshold. If we then hit this new
threshold we'd fire memory-pressure events again; if not, we'd try to
raise the threshold back to the higher level. Exponential back-off was
used to avoid having the threshold fluctuate too much between the two
values.
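The floating trigger described above could look roughly like the minimal C++ sketch below, which rests on several assumptions. The class name and parameters are hypothetical. Free memory is assumed to be sampled by some external poller, and the back-off is counted in polls rather than in time. Re-crossing the high threshold right after it was restored doubles the wait before the next raise, up to a cap. This is one reading of "exponential back-off", not Firefox OS's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical two-level memory-pressure trigger with exponential back-off.
class FloatingMemoryTrigger {
 public:
  FloatingMemoryTrigger(std::uint64_t aHigh, std::uint64_t aLow,
                        unsigned aInitialDelay, unsigned aMaxDelay)
      : mHigh(aHigh), mLow(aLow), mThreshold(aHigh),
        mDelay(std::max(1u, aInitialDelay)),
        mMaxDelay(std::max(1u, aMaxDelay)) {}

  // Feed one sample of free memory; returns true if a memory-pressure
  // notification should be fired for this sample.
  bool Poll(std::uint64_t aFreeBytes) {
    if (aFreeBytes < mThreshold) {
      if (mThreshold == mHigh && mRaisedBefore) {
        // The high threshold was restored and crossed again: back off so
        // the threshold does not flap between the two levels.
        mDelay = std::min(mDelay * 2, mMaxDelay);
      }
      mThreshold = mLow;  // the next notification needs a deeper drop
      mPollsUntilRaise = mDelay;
      return true;
    }
    if (mThreshold == mLow && --mPollsUntilRaise == 0) {
      mThreshold = mHigh;  // quiet long enough: try the higher level again
      mRaisedBefore = true;
    }
    return false;
  }

  std::uint64_t Threshold() const { return mThreshold; }
  unsigned Delay() const { return mDelay; }

 private:
  const std::uint64_t mHigh;
  const std::uint64_t mLow;
  std::uint64_t mThreshold;
  unsigned mDelay;
  const unsigned mMaxDelay;
  unsigned mPollsUntilRaise = 0;
  bool mRaisedBefore = false;
};
```

A real implementation would presumably also let the delay decay again after a long quiet period. The sketch keeps it monotonic for brevity.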

Something similar could be implemented on Windows. If there's consensus
I'm happy to look into it.

Gabriele

[1] On Windows, fire a memory-pressure event when the amount of
available virtual address space or physical memory is low
https://bugzilla.mozilla.org/show_bug.cgi?id=670967

Milan Sreckovic

Jun 2, 2016, 1:17:54 PM6/2/16

to Nicholas Nethercote, Jeffrey Gilbert, Benjamin Smedberg, dev-platform, Gabriele Svelto, Gijs Kruitbosch

If you want a treasure trove :), when it comes to graphics crashes, the “graphics critical error” field is it. It’s a recent history of narrowly avoided crashes and error states from just before the crash actually happens. Very often the real cause of the crash happens prior to the line of code captured in the stack trace.

(It doesn’t necessarily make sense to search for the existence of the “graphics critical error” field, as too many crashes will show up, but for us it’s usually the first thing we look at after we identify a crash.)

- Milan

Milan Sreckovic

Jun 2, 2016, 1:46:04 PM6/2/16

to Nicholas Nethercote, dev-platform

For example, searching for “graphics critical error” containing 0x8007000e (Windows error code for “out of memory”) leads you to a lot of crashes/bugs that are not tagged as OOM (which is OK), but clearly have some aspect of “running out of stuff” in them.
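0x8007000e is an HRESULT, and its bit fields show where the "out of memory" comes from. It has severity 1 (failure), facility 7 (FACILITY_WIN32) and code 14, which is the Win32 error ERROR_OUTOFMEMORY. The value as a whole is better known as E_OUTOFMEMORY. The portable C++ sketch below decodes those fields and checks a text field for that value. `DecodeHresult` and `MentionsOutOfMemoryHresult` are hypothetical helpers, and the sample strings in the usage are invented rather than real "graphics critical error" entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

struct DecodedHresult {
  std::uint32_t severity;  // 1 = failure
  std::uint32_t facility;  // 7 = FACILITY_WIN32
  std::uint32_t code;      // for FACILITY_WIN32, a Win32 error code
};

// Split an HRESULT into its severity, facility and code fields.
DecodedHresult DecodeHresult(std::uint32_t aHr) {
  return {aHr >> 31, (aHr >> 16) & 0x1FFF, aHr & 0xFFFF};
}

// Case-insensitive check for 0x8007000e (E_OUTOFMEMORY) in a text field.
bool MentionsOutOfMemoryHresult(std::string aText) {
  std::transform(aText.begin(), aText.end(), aText.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  return aText.find("0x8007000e") != std::string::npos;
}
```

For example, `DecodeHresult(0x8007000E)` yields severity 1, facility 7 and code 14. `MentionsOutOfMemoryHresult("hr=0x8007000E")` is true regardless of hex-digit case.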

In general, it can come in handy when the info is otherwise light (e.g., “ERROR NO MINIDUMP HEADER” ones :)
—
- Milan
