
Image-suck summary


Nicholas Nethercote

Jan 12, 2012, 7:09:48 PM
to dev-platform
Greetings,

Images are common on the web, and image decoding is currently causing us a
lot of memory and performance problems -- we tend to do much worse than
other browsers on image-heavy pages. The details are spread across a range
of bugs, so I wrote a summary to get the situation clear in my own head, and
once I finished it I thought it was worth sharing. I've probably got some
details wrong; please correct me if I have.

Nick



TL;DR

If we fixed bugs 542158 and 661304 (both of which depend on bug 689623),
Firefox would suck much less on image-heavy pages. Fixing some other bugs
would help too.


BAD BEHAVIOUR #1

If you open a new page in the current tab, all images are decoded
immediately, in parallel.

This is a Snappy problem because visible images aren't given any priority,
which can cause flicker/delay.
https://bugzilla.mozilla.org/show_bug.cgi?id=715308#c37 is about this; it's
an unprioritized Snappy bug assigned to jlebar.

(The patch in that bug also attempts to prevent decoding of lower-priority
images from interfering with other browser operations, but it isn't working
as expected. https://bugzilla.mozilla.org/show_bug.cgi?id=716103 has the
details.)

(Also, our JPEG decoding appears to be very slow:
https://bugzilla.mozilla.org/show_bug.cgi?id=715919.)

This behaviour is also a MemShrink problem because on pages with many
images (including various kinds of image slideshow) it causes excessive
memory consumption. To fix that we should not decode all the images
immediately, just the ones that are "visible or almost visible" (more about
that below). https://bugzilla.mozilla.org/show_bug.cgi?id=542158 is open
for this; it's an unprioritized, unassigned MemShrink and Snappy bug.

(Actually, if bug 542158 is fixed, that'd reduce the importance of bug
715308 because we'd be decoding many fewer images at a time.)


BAD BEHAVIOUR #2

None of the decoded images in the foreground tab are ever discarded.

This behaviour is also a MemShrink problem, for the same reason. We should
allow decoded images to be discarded if they are no longer "visible or
almost visible". https://bugzilla.mozilla.org/show_bug.cgi?id=661304 is
open for this; it's a MemShrink:P1 and is assigned to jrmuizel.


"VISIBLE OR ALMOST VISIBLE"

Both of these fixes depend on layout helping imagelib with the "visible or
almost visible" heuristic.
https://bugzilla.mozilla.org/show_bug.cgi?id=689623 is about this; it's a
MemShrink:P1 and is assigned to tnikkel. The heuristic may also have a
timing component (e.g. how long it has been since the image was "visible or
almost visible"). This heuristic will need careful tuning.
https://bugzilla.mozilla.org/show_bug.cgi?id=661304 has lots of discussion
about this heuristic.
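
As a very rough illustration of what such a heuristic might boil down to
(this is not Gecko code; the types, names and constants below are all
invented), consider:

    #include <cstdint>

    // Invented types for illustration only -- not Gecko's layout/imagelib API.
    struct Rect { int x, y, width, height; };

    enum class ImageState { Visible, AlmostVisible, Discardable };

    static bool Overlaps(const Rect& a, const Rect& b) {
      return a.x < b.x + b.width && a.x + a.width > b.x &&
             a.y < b.y + b.height && a.y + a.height > b.y;
    }

    // Classify an image against the viewport plus a prefetch margin.
    // marginPx and keepDecodedMs are the tuning knobs that would need the
    // careful tuning mentioned above.
    ImageState ClassifyImage(const Rect& image, const Rect& viewport,
                             int64_t msSinceLastVisible,
                             int marginPx = 1000,
                             int64_t keepDecodedMs = 10000) {
      if (Overlaps(image, viewport)) {
        return ImageState::Visible;
      }
      // "Almost visible": within marginPx of the viewport in either axis.
      Rect expanded{viewport.x - marginPx, viewport.y - marginPx,
                    viewport.width + 2 * marginPx,
                    viewport.height + 2 * marginPx};
      if (Overlaps(image, expanded)) {
        return ImageState::AlmostVisible;
      }
      // Timing component: keep recently-visible images decoded for a while
      // in case the user scrolls straight back.
      return msSinceLastVisible < keepDecodedMs ? ImageState::AlmostVisible
                                                : ImageState::Discardable;
    }

Bug 542158 would then only decode images classified Visible or
AlmostVisible, and bug 661304 would discard the ones classified Discardable.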

(There's also https://bugzilla.mozilla.org/show_bug.cgi?id=683290, which is
about discarding images in the foreground tab that are no longer in the DOM.
It's a MemShrink:P2 bug assigned to khuey and has patches, but is stalled on
Android problems. It sounds like bug 661304 subsumes this bug.)

(And there's https://bugzilla.mozilla.org/show_bug.cgi?id=683286, which is
more fodder for the "visible or almost visible" heuristic.)

(And there's https://bugzilla.mozilla.org/show_bug.cgi?id=691169, which
indicates that on Fennec, images in the foreground tab can be discarded?)


OTHER BEHAVIOURS

If you open a new page in a background tab, no images are decoded. This is
good. If you switch to that tab, all images are decoded immediately.

The decoded images in a tab you switch away from are discarded on a timer
(image.mem.min_discard_timeout_ms, which defaults to 10000, which means they
are discarded after 10 to 20 seconds). There has been some quibbling about
whether this is a good default... it depends greatly on how much RAM you
have, but compared to the above problems I think this is very minor.


TRACKING

We have https://bugzilla.mozilla.org/show_bug.cgi?id=683284 open, which is a
meta-bug for tracking all the ways we handle images poorly. It tracks all
the bugs I've mentioned, plus some specific examples of pages where we do
badly compared to other browsers.

Mike Hommey

Jan 13, 2012, 1:35:38 AM
to Nicholas Nethercote, dev-platform
On Thu, Jan 12, 2012 at 04:09:48PM -0800, Nicholas Nethercote wrote:
> OTHER BEHAVIOURS
>
> If you open a new page in a background tab, no images are decoded. This is
> good. If you switch to that tab, all images are decoded immediately.
>
> The decoded images in a tab you switch away from are discarded on a timer
> (image.mem.min_discard_timeout_ms, which default to 10000, which means they
> are discarded after 10 to 20 seconds). There has been some quibbling about
> whether this is a good default... it depends greatly on how much RAM you
> have, but compared to the above problems I think this is very minor.

Note that this has been a known issue with e.g. gmail background images.
Switching to a gmail tab has not been a great experience since decoded
image discard landed. But maybe all that is required here is faster jpeg
decoding...

Mike

Nicholas Nethercote

Jan 13, 2012, 4:10:00 AM
to Mike Hommey, dev-platform
On Fri, Jan 13, 2012 at 5:35 PM, Mike Hommey <m...@glandium.org> wrote:
>>
>> The decoded images in a tab you switch away from are discarded on a timer
>> (image.mem.min_discard_timeout_ms, which default to 10000, which means they
>> are discarded after 10 to 20 seconds).  There has been some quibbling about
>> whether this is a good default... it depends greatly on how much RAM you
>> have, but compared to the above problems I think this is very minor.
>
> Note that has been a known issue with e.g. gmail background images.
> Switching to a gmail tab has not been a great experience since decoded
> image discard landed. But maybe all that is required here is faster jpeg
> decoding...

Is there a bug filed on this?

No matter what the heuristic is, at some point decoded images in
background tabs will be discarded -- we obviously can't keep all
decoded images in all background tabs at all times. So there will
always be some cases where switching back to the gmail tab requires
decoding. So the faster jpeg decoding sounds like the right solution
for this.

Nick

James May

Jan 13, 2012, 4:36:02 AM
to Nicholas Nethercote, Mike Hommey, dev-platform
Maybe some sort of (subtle*!) tab switch animation could be used to lessen
the visual impact of images 'popping' in. Maybe the
average/dominant/whatever color could be determined while the image is
decoded the first time and then shown while the image is decoding
subsequently.

Secondly, and I think there's already a bug on this, is getting the
redecode heuristic right. Whether it's just making sure that very small
images are done first, or last, or that they are in the middle of the
viewport, or something more outlandish like some sort of trajectory analysis
to figure out what tab I'm about to switch to! (which would be cool in its
own right, if it worked).

And one last thing is that my main desktop computer has 8GB of RAM and an
SSD; on my netbook and phone I'm constantly surprised that anything works^,
but on my "big" computer, I think I can dedicate a few GB to teh funneh
katz.

Well that's my UX-kinda PoV anyway.

-- James

* On the other hand, gratuitous animations have been used to pave over
performance problems and hardware limitations quite successfully in the
past. Think OS X 'genie'.

^ And handling lots of tabs certainly has improved a lot in the last 12 months!


Mike Hommey

Jan 13, 2012, 4:44:21 AM
to James May, dev-platform, Nicholas Nethercote
On Fri, Jan 13, 2012 at 08:36:02PM +1100, James May wrote:
> Maybe some sort of (subtle*!) tab switch animation could be used to lessen
> the visual impact of images 'popping' in. Maybe the
> average/dominate/whatever color could be determined while the image is
> decoded the first time and then shown while the image is decoding
> subsequently.

Or some kind of tile of dominant colours. And maybe avoid discarding
images needed for display (no need to keep those that aren't) in
app/pinned tabs, or based on tab view frequency.

Image discarding could also be less aggressive when the system has
plenty of free memory.

Mike

Neil

Jan 13, 2012, 5:30:35 AM
to
Nicholas Nethercote wrote:

>None of the decoded images in the foreground tab are ever discarded.
>
>
I often visit a site that is famous for its captioned pictures of cats,
by default showing a number of such images on one page. Does that mean
that as I scroll down I should expect to have to wait for these images
to decode?

--
Warning: May contain traces of nuts.

Erunno

Jan 13, 2012, 5:46:56 AM
to
On Jan 13, 11:30 am, Neil <n...@parkwaycc.co.uk> wrote:
> Nicholas Nethercote wrote:
> >None of the decoded images in the foreground tab are ever discarded.
>
> I often visit a site that is famous for its captioned pictures of cats,
> by default showing a number of such images on one page. Does that mean
> that as I scroll down I should expect to have to wait for these images
> to decode?

Not in my experience. AFAIK every other major browser on the market
decodes images on the fly as the user scrolls up or down a page and
I've yet to see any visual or UI jank caused by this (i.e. it's proven
possible by example to implement this efficiently). With Chrome I can
scroll through a long page with lots of large images quickly without
the memory used by the tab ever exceeding 200 MB while Firefox will
hit 1 GB of memory usage or more easily. With Bad Behavior #1 fixed,
discarding images in background tabs aggressively should become less
of an issue, as only the visible ones will have to be decoded again
upon return.

Kyle Huey

Jan 13, 2012, 5:49:16 AM
to Neil, dev-pl...@lists.mozilla.org
On Fri, Jan 13, 2012 at 11:30 AM, Neil <ne...@parkwaycc.co.uk> wrote:

> Nicholas Nethercote wrote:
>
> None of the decoded images in the foreground tab are ever discarded.
>>
>>
> I often visit a site that is famous for its captioned pictures of cats, by
> default showing a number of such images on one page. Does that mean that as
> I scroll down I should expect to have to wait for these images to decode?
>

IMHO, the ideal behavior would be that we would discard all but the decoded
images in the viewport and within N pixels/pages/whatever of the viewport
for some reasonable N. When you're scrolling we would decode "ahead" in
the direction you're scrolling as well. On sufficiently powerful hardware
(which should be fairly modest with off-main-thread image decoding) we
should have a high enough decoding throughput to handle continuous
scrolling.
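
A rough sketch of that decode-ahead window (again, invented types and
constants, not Gecko code):

    #include <cstdlib>

    struct Rect { int x, y, width, height; };

    // Grow the prefetch window asymmetrically in the direction of scrolling,
    // proportionally to how fast the user is moving. Images intersecting the
    // returned rect stay decoded (or get queued for decode); everything else
    // becomes eligible for discard.
    Rect PrefetchRegion(const Rect& viewport, int scrollVelocityPxPerSec,
                        int baseMarginPx = 500, double lookAheadSec = 0.5) {
      Rect r = viewport;
      // Always keep a symmetric base margin above and below the viewport.
      r.y -= baseMarginPx;
      r.height += 2 * baseMarginPx;
      int lookAhead =
          static_cast<int>(std::abs(scrollVelocityPxPerSec) * lookAheadSec);
      if (scrollVelocityPxPerSec > 0) {        // scrolling down
        r.height += lookAhead;
      } else if (scrollVelocityPxPerSec < 0) { // scrolling up
        r.y -= lookAhead;
        r.height += lookAhead;
      }
      return r;
    }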

- Kyle

fantasai

Jan 13, 2012, 5:29:30 PM
to
Would it make sense to incorporate a timing function in the discard criteria,
to keep the decoded images in memory if someone is doing a lot of back and
forth?

~fantasai

Ehsan Akhgari

Jan 13, 2012, 6:43:06 PM
to fantasai, dev-pl...@lists.mozilla.org
I think we should first get the ability to discard images which are not in
or close to the viewport, and then re-evaluate this problem. If it is
simply fixed by that, then great! If not, we can start to look into
heuristics like you suggest to get a better behavior for use cases which
suffer from decoding speed issues.

--
Ehsan
<http://ehsanakhgari.org/>

Jonas Sicking

Jan 13, 2012, 8:34:16 PM
to Nicholas Nethercote, dev-platform
On Thu, Jan 12, 2012 at 4:09 PM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
> (Also, our JPEG decoding appears to be very slow:
> https://bugzilla.mozilla.org/show_bug.cgi?id=715919.)

This might be too much of a crazy idea, but could we use the GPU to
decode images? The decoded image data is going into video memory
anyway.

/ Jonas

Benoit Jacob

Jan 13, 2012, 10:15:47 PM
to Jonas Sicking, dev-platform, Nicholas Nethercote
2012/1/13 Jonas Sicking <jo...@sicking.cc>:
It seems possible, but very researchy. The way that compressed image
formats work does lend itself somewhat to that (it's a lot about
Fourier transforms or variants), but we shouldn't invest in the
researchy work to figure out whether that can concretely bring benefits. If
you know of a paper on this subject claiming to have already answered
these questions, it's worth reading.

Cheers,
Benoit

Jeff Muizelaar

Jan 13, 2012, 10:42:37 PM
to Jonas Sicking, dev-platform, Nicholas Nethercote

On 2012-01-13, at 8:34 PM, Jonas Sicking wrote:

> On Thu, Jan 12, 2012 at 4:09 PM, Nicholas Nethercote
> <n.neth...@gmail.com> wrote:
>> (Also, our JPEG decoding appears to be very slow:
>> https://bugzilla.mozilla.org/show_bug.cgi?id=715919.)
>
> This might be too much of a crazy idea, but could we use the GPU to
> decode images? The decoded image data is going into video memory
> anyway.

The entropy decoding stage is very sequential and doesn't map that well to GPUs.
It would work, but probably be significantly slower than on the CPU.
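
To make the "very sequential" point concrete, here is a toy prefix-code
decoder (invented code table, nothing to do with libjpeg's actual Huffman
tables): the length of each symbol is only known once that symbol has been
decoded, so you can't tell where the next one starts -- which is exactly
what makes it hard to hand pieces of the bitstream to parallel GPU lanes.

    #include <cstddef>
    #include <string>
    #include <vector>

    // Toy code table: 0 -> 'a', 10 -> 'b', 11 -> 'c'.
    std::string DecodeBits(const std::vector<int>& bits) {
      std::string out;
      size_t i = 0;
      while (i < bits.size()) {
        if (bits[i] == 0) {                 // one-bit code
          out += 'a';
          i += 1;
        } else {                            // two-bit code: need the next bit
          if (i + 1 >= bits.size()) break;
          out += (bits[i + 1] == 0) ? 'b' : 'c';
          i += 2;
        }
        // The next value of i depends on the symbol just decoded; this data
        // dependence is what maps poorly to wide parallel hardware.
      }
      return out;
    }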

FWIW, I expect that JPEG decoding speed is rarely the problem here. It's more likely
to be a combination of poor decoding prioritization, color management, and too much painting
among other things.

-Jeff

Henri Sivonen

Jan 14, 2012, 5:04:03 AM
to dev-platform
Would it make sense to do the YUV to RGB conversion for JPEGs on the GPU,
though? IIRC, a couple of years ago (before SIMD acceleration), JPEG YUV
conversion was at the top of tp4 Shark profiles.
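
For reference, the per-pixel work in question is the JFIF (full-range)
YCbCr-to-RGB conversion; a minimal CPU version looks roughly like this
(illustrative only, not libjpeg's or Gecko's code), and a GPU version would
run the same arithmetic in a fragment shader:

    #include <algorithm>
    #include <cstdint>

    static inline uint8_t Clamp255(float v) {
      return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v)));
    }

    // Full-range (JFIF) YCbCr -> RGB for one pixel.
    void YCbCrToRGB(uint8_t y, uint8_t cb, uint8_t cr,
                    uint8_t* r, uint8_t* g, uint8_t* b) {
      const float Y = y;
      const float Cb = cb - 128.0f;
      const float Cr = cr - 128.0f;
      *r = Clamp255(Y + 1.402f * Cr);
      *g = Clamp255(Y - 0.344136f * Cb - 0.714136f * Cr);
      *b = Clamp255(Y + 1.772f * Cb);
    }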

Robert O'Callahan

Jan 14, 2012, 7:44:16 PM
to Henri Sivonen, dev-platform
We've talked about that lots of times. It's certainly possible for large
images. We've just never had a focused benchmark that showed it was worth
doing.

I think that's what we really need here: a repeatable benchmark for
scrolling with large images, that we can profile and optimize for.

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not
in us. If we confess our sins, he is faithful and just and will forgive us
our sins and purify us from all unrighteousness. If we claim we have not
sinned, we make him out to be a liar and his word is not in us." [1 John
1:8-10]

Kyle Huey

Jan 15, 2012, 4:46:37 PM
to fantasai, dev-pl...@lists.mozilla.org
On Fri, Jan 13, 2012 at 11:29 PM, fantasai <fantasa...@inkedblade.net> wrote:

> Would it make sense to incorporate a timing function in the discard
> criteria,
> to keep the decoded images in memory if someone is doing a lot of back and
> forth?
>
>
Yes, and we do that today. Images not in the foreground tab are discarded
after n in [N,2N] milliseconds where N is the value of
image.mem.min_discard_timeout_ms. There might be room to optimize better
than having one global timer, but we're in such a bad place right now that
that kind of polish isn't really worth it at the moment.
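
For anyone wondering why the delay is a range rather than exactly N: with a
single timer firing every N ms, an image is only discarded on the second
tick after it stops being used, so the wait lands anywhere between N and 2N
depending on when it was unlocked. A sketch (invented types, not the actual
imagelib code):

    #include <utility>
    #include <vector>

    struct DecodedImage {
      bool locked = false;            // e.g. in the foreground tab
      bool markedForDiscard = false;
      void DiscardDecodedData() { /* drop the uncompressed buffer */ }
    };

    class DiscardTimer {
     public:
      explicit DiscardTimer(std::vector<DecodedImage*> images)
          : mImages(std::move(images)) {}

      // Called every N ms (N = image.mem.min_discard_timeout_ms).
      void OnTimerTick() {
        for (DecodedImage* img : mImages) {
          if (img->locked) {
            img->markedForDiscard = false;  // still in use, reset
          } else if (img->markedForDiscard) {
            img->DiscardDecodedData();      // unlocked for a full period
            img->markedForDiscard = false;
          } else {
            img->markedForDiscard = true;   // first unlocked tick: just mark
          }
        }
      }

     private:
      std::vector<DecodedImage*> mImages;
    };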

- Kyle

Nicholas Nethercote

Jan 16, 2012, 12:05:19 AM
to dev-platform
On Fri, Jan 13, 2012 at 11:09 AM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
>
> (And there's https://bugzilla.mozilla.org/show_bug.cgi?id=691169, which
> indicates that on Fennec, images in the foreground tab can be discarded?)

https://bugzilla.mozilla.org/show_bug.cgi?id=622470 seems to be the
bug that implemented this. Looks like Fennec has some additional
stuff to enable this that desktop Firefox lacks, but there's a problem
in that images that are currently visible can have their decoded data
discarded, which is obviously bad. Still, perhaps that would be a
good starting point for all this desktop foreground tab stuff.

Nick

Justin Lebar

Jan 16, 2012, 10:15:56 PM
to dev-platform
I started an accidental off-list thread with khuey. Here's his response to
my message (quoted).
---------- Forwarded message ----------
From: "Kyle Huey" <m...@kylehuey.com>
Date: Jan 15, 2012 3:43 PM
Subject: Re: Image-suck summary
To: "Justin Lebar" <justin...@gmail.com>

Did you mean to send this just to me (and not the list)?

On Sat, Jan 14, 2012 at 11:53 PM, Justin Lebar <justin...@gmail.com> wrote:

> > IMHO, the ideal behavior would be that we would discard all but the
> decoded
> > images in the viewport and within N pixels/pages/whatever of the viewport
> > for some reasonable N. When you're scrolling we would decode "ahead" in
> > the direction you're scrolling as well. On sufficiently powerful
> hardware
> > (which should be fairly modest with off-main-thread image decoding) we
> > should have a high enough decoding throughput to handle continuous
> > scrolling.
>
> I'm not sure that this is true, at least for many cases. If you grab
> the scrollbar and drag down, we're going to have to decode *very*
> fast. Maybe if [1] lets us get a 10x speedup in jpeg decoding, but I
> suspect I'm doing something wrong there and we don't actually have
> that large of a win lying around.
>

Yeah, but I don't think that's the common case.


>
> > Maybe some sort of (subtle*!) tab switch animation could be used to
> lessen
> > the visual impact of images 'popping' in.
>
> Maybe keeping a (compressed) screenshot of the tab would work. We
> already are planning to do this for two-fingered tab switching on mac
> [2] for windows in the history. If we had that information from
> layout about which images are onscreen, we could even do a heuristic
> like: Take the screenshot only if decoding all the onscreen compressed
> images would take more than Xms.
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=715919
> [2] https://bugzilla.mozilla.org/show_bug.cgi?id=668953
>

Perhaps, but this is something to consider when we're in a far better place
than we are now.

- Kyle

Joe Drew

Jan 16, 2012, 10:56:49 PM
to dev-pl...@lists.mozilla.org

On 2012-01-13 5:29 PM, fantasai wrote:
> Would it make sense to incorporate a timing function in the discard
> criteria,
> to keep the decoded images in memory if someone is doing a lot of back and
> forth?

Images would be "locked" (meaning not discardable) if they are in the
"viewport" (however we define that). After being unlocked, they would be
treated in the same way as any other image; that is, after a given
timeout, the decoded image would be discarded.

So, in short: the only version of this I would r+ is one that implements your request!

joe

Joe Drew

Jan 16, 2012, 11:00:04 PM
to dev-pl...@lists.mozilla.org


On 2012-01-14 7:44 PM, Robert O'Callahan wrote:
> On Sat, Jan 14, 2012 at 11:04 PM, Henri Sivonen <hsiv...@iki.fi> wrote:
>
>> Would it make sense to do the YUV to RGB conversion for JPEGs on the GPU,
>> though? IIRC, a couple of years ago (before SIMD acceleration), JPEG YUV
>> conversion was on the top of tp4 Shark profiles.
>>
>
> We've talked about that lots of times. It's certainly possible for large
> images. We've just never had a focused benchmark that showed it was worth
> doing.

It's also not a slam dunk, in that we'd either have to have colour
management done on the GPU too, or do a YUV->RGB, correct on the CPU,
then do RGB->YUV and upload to the GPU.

joe

Joe Drew

Jan 16, 2012, 11:02:42 PM
to dev-pl...@lists.mozilla.org


On 2012-01-13 5:30 AM, Neil wrote:
> I often visit a site that is famous for its captioned pictures of cats,
> by default showing a number of such images on one page. Does that mean
> that as I scroll down I should expect to have to wait for these images
> to decode?

The images will have to decode, and that will take some amount of time,
but if a) the site inserts them into the DOM early enough and b) we
start the decode once that image is "near" visible, you shouldn't notice it.

joe

db_cooper

Jan 16, 2012, 11:25:04 PM
to dev-pl...@lists.mozilla.org
I think that MPC-HC does color management on the GPU using Little CMS (LCMS).

Robert Kaiser

Jan 17, 2012, 9:42:56 AM
to
Joe Drew schrieb:
> The images will have to decode, and that will take some amount of time,
> but if a) the site inserts them into the DOM early enough and b) we
> start the decode once that image is "near" visible, you shouldn't notice
> it.

Also, if we had them decoded before, we could remember their size inside
the document (if it's intrinsic [I think that's the term], i.e. defined
by the image itself and not CSS/attribute constraints, which give us the
end-result size right away anyhow) and therefore have the rest of the
document laid out correctly and only the image itself being
rendered/displayed lazily, right?

Robert Kaiser

robert...@gmail.com

Jan 17, 2012, 12:54:29 AM
to mozilla.de...@googlegroups.com, dev-platform
Thanks Nicholas. Your detailed and comprehensive summary is understandable even to an end user like me :)

fantasai

Jan 17, 2012, 3:35:46 PM
to
Oh, we should *definitely* not be waiting for the user to scroll before
getting whatever info we need to lay out the rest of the page. Never mind
having to reflow -- the scroll position won't be accurate!

~fantasai

Joe Drew

Jan 17, 2012, 3:54:08 PM
to dev-pl...@lists.mozilla.org

On 2012-01-17 9:42 AM, Robert Kaiser wrote:
> Also, if we had them decoded before, we could remember their size inside
> the document (if it's intrinsic [I think that's the term], i.e. defined
> by the image itself and not CSS/attribute constraints, which give us the
> end-result size right away anyhow) and therefore have the rest of the
> document laid out correctly and only the image itself being
> rendered/displayed lazily, right?

Not only *could* we do this, we *do* do this. From the very beginning of
"decode-on-draw", we have always extracted the size of an image first,
even if we don't then proceed to decode the actual image data.

You can see this in action now by loading large images in a background
tab, then bringing it to the foreground. The images will redecode, but
the page content won't shift around.
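
As an illustration of the size-first idea (a sketch, not Gecko's decoder
code): for a PNG, for example, the intrinsic dimensions sit in the IHDR
chunk right after the 8-byte signature, so they can be read without touching
any pixel data:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    static uint32_t ReadBE32(const uint8_t* p) {
      return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
             (uint32_t(p[2]) << 8) | uint32_t(p[3]);
    }

    // Read a PNG's intrinsic width/height from the IHDR chunk without
    // decoding any of the image data.
    bool ReadPngSize(const uint8_t* data, size_t len,
                     uint32_t* width, uint32_t* height) {
      static const uint8_t kSig[8] = {0x89, 'P', 'N', 'G',
                                      0x0D, 0x0A, 0x1A, 0x0A};
      if (len < 24 || memcmp(data, kSig, 8) != 0) return false;
      if (ReadBE32(data + 8) < 13) return false;   // IHDR payload is 13 bytes
      if (memcmp(data + 12, "IHDR", 4) != 0) return false;
      *width = ReadBE32(data + 16);
      *height = ReadBE32(data + 20);
      return true;
    }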

joe

Robert Kaiser

Jan 18, 2012, 11:51:38 AM
to
Joe Drew schrieb:
> Not only *could* we do this, we *do* do this. From the very beginning of
> "decode-on-draw", we have always extracted the size of an image first,
> even if we don't then proceed to decode the actual image data.

Cool, it hadn't been mentioned; after reading this, I guess that was just
because it was taken for granted all along. ;-)

Robert Kaiser

Bob Clary

Feb 3, 2012, 10:54:37 AM
to
On 1/12/12 4:09 PM, Nicholas Nethercote wrote:
> Greetings,
>
> Images are common on the web, and image decoding is currently causing us a
> lot of memory and performance problems -- we tend to do much worse than
> other browsers on image-heavy pages. The details are spread across a range
> of bugs, I wrote a summary to get the situation clear in my own head and
> once I finished it I thought it was worth sharing. I've probably got some
> details wrong, please correct me if I have.
>

One of the most common types of crashes I can reproduce in automation
relates to pages where many images appear on the page.

Due to memory exhaustion these crashes can appear in Socorro as an empty
or corrupt dump. Currently the top crasher for Firefox 10, at 8.4% of
all crashes, is empty or corrupt dumps.

Many times on Linux, X will consume so much memory that the OOM Killer
will kill X and log the user out.

Many other times, fatal aborts, assertions or crashes will occur in
random locations in the code due to inadequate handling of low memory
conditions. Attempting to reproduce the specific crash is difficult due
to the common case where you will hit an oom abort without reproducing
the original issue you were trying to investigate.

In my opinion, fixing the image-suck bugs would be *a huge win* for
stability.

/bc

Justin Lebar

Feb 3, 2012, 11:24:26 AM
to Bob Clary, dev-pl...@lists.mozilla.org
> Due to memory exhaustion these crashes can appear in socorro as an empty or corrupt dump. Currently the top crasher for
> Firefox 10 with 8.4% of all crashes are empty or corrupt dumps.

Really?

If infallible malloc fails, we definitely get a useful crash report,
at least some of the time. I'd expect to see an eventual null-pointer
exception or something when fallible malloc fails.

But maybe these aren't true OOMs (malloc cannot allocate) but rather
the OS killing us upon its running out of physical memory + swap.

It would be really good if we could confirm that these 8.4% of all
crashes are due to low-memory situations, or even get stacks for them,
so we know how many are related to images. Is there a bug filed on
improving the crash reporter?

-Justin

Benjamin Smedberg

Feb 3, 2012, 11:37:44 AM
to Justin Lebar, dev-pl...@lists.mozilla.org, Bob Clary
On 2/3/2012 11:24 AM, Justin Lebar wrote:
>> Due to memory exhaustion these crashes can appear in socorro as an empty or corrupt dump. Currently the top crasher for
>> Firefox 10 with 8.4% of all crashes are empty or corrupt dumps.
> Really?
>
> If infallible malloc fails, we definitely get a useful crash report,
> at least some of the time. I'd expect to see an eventual null-pointer
> exception or something when fallible malloc fails.
The problem is that the Windows crash reporter calls into the API
"MinidumpWriteDump", which internally has to allocate memory. Whether
the crash is a null-pointer deref or some other exception, if we are
truly out of physical memory or out of address space, MinidumpWriteDump
fails to write a usable minidump and all we get is an empty report.

We don't *know* that this is the case; there could well be other reasons
for MinidumpWriteDump to fail.

>
> But maybe these aren't true OOMs (malloc cannot allocate) but rather
> the OS killing us upon its running out of physical memory + swap.
No, they are definitely crashes which trigger the exception handler.
>
> It would be really good if we could confirm that these 8.4% of all
> crashes are due to low-memory situations, or even get stacks for them,
> so we know how many are related to images. Is there a bug filed on
> improving the crash reporter?
The only good way of improving the process would be to have a separate
process (the crash reporter client, perhaps) call MinidumpWriteDump
instead of doing it within the Firefox process. This is significant
engineering work, and not something that we currently have prioritized.
I'm pretty sure there is a bug filed, but I don't know the number.

--BDS

Ted Mielczarek

Feb 3, 2012, 11:52:12 AM
to Benjamin Smedberg, dev-pl...@lists.mozilla.org, Justin Lebar, Bob Clary
On Fri, Feb 3, 2012 at 11:37 AM, Benjamin Smedberg
<benj...@smedbergs.us> wrote:
> The only good way of improving the process would be to have a separate
> process (the crash reporter client, perhaps) call MinidumpWriteDump instead
> of doing it within the Firefox process. This is significant engineering
> work, and not something that we currently have prioritized. I'm pretty sure
> there is a bug filed, but I don't know the number.

bug 587729 is filed on this. I did some experimental work on a
possible solution[1], but it's fairly invasive and would need more
time spent. I'm also not confident that it would entirely fix the
problem, since it still requires us to do a fair amount of work in the
crashed process.

-Ted

1. http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-oop-test

Justin Lebar

Feb 3, 2012, 1:07:20 PM
to Benjamin Smedberg, dev-pl...@lists.mozilla.org, Bob Clary
> The problem is that the Windows crash reporter calls into the API
> "MinidumpWriteDump" which internally has to allocate memory. Whether or not
> the crash is a null-pointer deref or some other exception, if we are truly
> out of physical memory or out of address space, MinidumpWriteDump fails to
> write a usable minidump and all we get is an empty report.

I don't think it would be a lot of work to add an API to jemalloc
which says "I need you to free Xmb of memory for me so I can make a
crash report."

Might that let us work around this limitation? If freeing random
blocks from the heap would mess up some part of our crash dump, we
could do so only in the case when we find that the crash reporter is
unable to allocate any memory...

Randell Jesup

Feb 3, 2012, 5:05:11 PM
to
Alternatively we could keep a small "ripcord" of pre-allocated memory to
add to the pool before calling MinidumpWriteDump. Yes, I realize this
memory is otherwise unavailable and means we'll hit out-of-address-space
slightly earlier
(very slightly), but it shouldn't impact normal operation as it would be
unused-but-allocated pages. If this significantly improved our
reporting and produced useful data about why we run out of memory or why
we crash in that case, then it may be worthwhile.
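
A minimal sketch of that ripcord idea (illustrative only; WriteMinidump() is
a stand-in for whatever the crash reporter actually calls, and the 2 MB size
is an arbitrary guess):

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    static void* gRipcord = nullptr;
    static const size_t kRipcordSize = 2 * 1024 * 1024;  // tuning knob

    // Called once at startup.
    void InitCrashReporterRipcord() {
      gRipcord = malloc(kRipcordSize);
      if (gRipcord) {
        // Touch the pages so they're actually committed, not just reserved.
        memset(gRipcord, 0xAB, kRipcordSize);
      }
    }

    // Stand-in for the real dump-writing call (e.g. MinidumpWriteDump).
    bool WriteMinidump() { return true; }

    // Called from the exception handler.
    void OnCrash() {
      if (gRipcord) {
        free(gRipcord);   // give the allocator some room before dumping
        gRipcord = nullptr;
      }
      WriteMinidump();
    }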

Or take the largest chunk allocated, steal it and mark it free, and let
the crash reporter use it. We just need to make sure others don't
modify it (or read it) while calling the dump for security reasons, and
I suspect at this point normal code isn't running anymore, so it's not
an issue. (But I don't know when breakpad gets called or how.)

--
Randell Jesup, Mozilla Corp
remove ".news" for personal email