Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Status of XPCOM cycle collector

7 views
Skip to first unread message

Graydon Hoare

unread,
Oct 25, 2006, 10:18:18 PM10/25/06
to
Hi,

I missed the Gran Paradiso meeting today but promised to provide some
details about the state of the XPCOM cycle collector I've been working on.

The short summary is: it seems pretty stable, but it's not completely
clear to me if it's worth landing.

The patch can be found at

https://bugzilla.mozilla.org/show_bug.cgi?id=333078

And it's grown to a healthy 200kb over the past few months.
Functionally, it consists of three parts:

- Backing-out dbaron's earlier DOMGC work
- A general cross-language cycle collection system
- Code to hook a few dozen central classes in content/, dom/ and js/
into the cycle collector

It seems to have "reasonable" runtime behavior: it runs without crashing
for as long as I care to run it, and it doesn't eat memory *rapidly*,
and reports collecting tens of thousands of cycles as it runs (mostly
those DOMGC was catching already). It does slowly grow as it runs
(slower and slower as time passes), but my eyeball tests show it growing
at about the same rate as normal mozilla. The resident set is a bit
higher than pre-patch, but not a lot, perhaps 2 or 3mb. This is a
reasonable number given the delayed collection of cycles: they are
"aged" in a buffer in order to give the suspicious pointers a chance to
remove themselves from suspicion and save a full scan through them.
There are also a few new auxiliary structures that are likely to cost
some bytes.

I was not really expecting this patch to make memory use go *down*
however, so I'm pretty happy with it reaching its stated intentions. The
interface is simple and uniform, and permits us to convert pretty much
any weak pointer to an nsCOMPtr<> with a modest amount of work: figure
out which classes are participating in any newly-created cycle, and make
sure they're all participants in cycle collection.

Whether that infrastructural improvement is worth the disruption of
landing the patch, and the small memory hit, and the risk of a larger
memory hit or critical bug that I haven't discovered yet, I cannot say.
It's up to the group to decide.

-graydon

Jonas Sicking

unread,
Oct 25, 2006, 11:31:18 PM10/25/06
to
> It seems to have "reasonable" runtime behavior: it runs without crashing
> for as long as I care to run it, and it doesn't eat memory *rapidly*,
> and reports collecting tens of thousands of cycles as it runs (mostly
> those DOMGC was catching already).

Mostly out of curiosity, do you know if it collects any cycles that the
old code didn't? I.e. are you already fixing leaks? It'd be interesting
to know which since that could point to other bugs. For example it
should never ever be possible to create a cycle of elements.

> It does slowly grow as it runs
> (slower and slower as time passes), but my eyeball tests show it growing
> at about the same rate as normal mozilla.

You mean that we do still leak over time? I think that's to be expected
:). Will these leaks cause the collector to slow down? I.e. will it
spend time trying again and again to collect objects that it has no
chance of collecting?

> I was not really expecting this patch to make memory use go *down*
> however, so I'm pretty happy with it reaching its stated intentions. The
> interface is simple and uniform, and permits us to convert pretty much
> any weak pointer to an nsCOMPtr<> with a modest amount of work: figure
> out which classes are participating in any newly-created cycle, and make
> sure they're all participants in cycle collection.

This is the part that to me is *huge*. We currently have so much code
that tries to break cycles and we've ended up with so many dangling
pointers because of that. If we can collect all that code in one place
and get a single workable solution rather than using ad-hoc solutions
all over the place I think we've won a lot.

> Whether that infrastructural improvement is worth the disruption of
> landing the patch, and the small memory hit, and the risk of a larger
> memory hit or critical bug that I haven't discovered yet, I cannot say.
> It's up to the group to decide.

I think the biggest question is if we should land this before or after
alpha. The only reason I can think of not to land this for 1.9 is if we
have something else better in the works.

/ Jonas

Graydon Hoare

unread,
Oct 26, 2006, 1:34:46 AM10/26/06
to
Jonas Sicking wrote:

> Mostly out of curiosity, do you know if it collects any cycles that the
> old code didn't? I.e. are you already fixing leaks? It'd be interesting
> to know which since that could point to other bugs. For example it
> should never ever be possible to create a cycle of elements.

I do not know that. It can print a log of the cycles it's collecting,
but it's a pretty verbose log and I haven't classified the results. It
can also graph the cycles as it's running if you have graphviz
installed, but again, very verbose. Big graphs. I haven't seen any
cycles that stood out as unusual-looking.

The old code didn't produce a list of cycles at all; it didn't identify
them, it just tried to keep them broken, through a variety of methods.

> You mean that we do still leak over time? I think that's to be expected
> :). Will these leaks cause the collector to slow down? I.e. will it
> spend time trying again and again to collect objects that it has no
> chance of collecting?

No. Each potential garbage-cycle-member pointer registers itself as
suspicious, then ages for a while. If after the aging period it's still
suspicious, it gets one full scan, then it's dropped. The scan may
identify it as a garbage cycle member, or may not; either way it won't
be checked again until (if ever) it crosses the "suspicious" edge again
(refcnt N+1 -> N for nonzero N) and re-registers itself.

> This is the part that to me is *huge*. We currently have so much code
> that tries to break cycles and we've ended up with so many dangling
> pointers because of that. If we can collect all that code in one place
> and get a single workable solution rather than using ad-hoc solutions
> all over the place I think we've won a lot.

Yeah. I hope so. It's hard to tell. The collector is still doing some
pretty delicate stuff; it has to be quite correct, and its clients have
to not lie to it or tell it bad information, otherwise it'll crash just
as hard as any memory error.

> I think the biggest question is if we should land this before or after
> alpha. The only reason I can think of not to land this for 1.9 is if we
> have something else better in the works.

Ok. The biggest question I have is "has anyone else tried it?"

I'm really not at all comfortable with my ability to eyeball regressions
on this product. I have every reason to believe I broke something that I
didn't notice.

-graydon

Mike Shaver

unread,
Oct 26, 2006, 10:01:47 AM10/26/06
to Graydon Hoare, dev-pl...@lists.mozilla.org
On 10/26/06, Graydon Hoare <gra...@mozilla.com> wrote:
> Ok. The biggest question I have is "has anyone else tried it?"
>
> I'm really not at all comfortable with my ability to eyeball regressions
> on this product. I have every reason to believe I broke something that I
> didn't notice.

Should we spin up some experimental builds for staging, and see what
people find? We could point some extension authors at them too, since
they tend to find all sorts of interesting cycle and retention
behaviour, if history is a guide...

Mike

Vladimir Vukicevic

unread,
Oct 26, 2006, 11:15:15 AM10/26/06
to Mike Shaver, Graydon Hoare, dev-pl...@lists.mozilla.org

Yeah, this sounds like a great idea -- even better would be a tinderbox
with the patch, so that we can hook it into the test tinderbox mechanism
and get some perf and leak test results with the patch to get an idea of
its impact.

- Vlad

Frank Wein

unread,
Oct 26, 2006, 12:42:21 PM10/26/06
to
Graydon Hoare wrote:
> Hi,
[...]

> It seems to have "reasonable" runtime behavior: it runs without crashing
> for as long as I care to run it, and it doesn't eat memory *rapidly*,
> and reports collecting tens of thousands of cycles as it runs (mostly
> those DOMGC was catching already). It does slowly grow as it runs
> (slower and slower as time passes), but my eyeball tests show it growing
> at about the same rate as normal mozilla. The resident set is a bit
> higher than pre-patch, but not a lot, perhaps 2 or 3mb. This is a
> reasonable number given the delayed collection of cycles: they are
> "aged" in a buffer in order to give the suspicious pointers a chance to
> remove themselves from suspicion and save a full scan through them.
> There are also a few new auxiliary structures that are likely to cost
> some bytes.
[...]

I don't know how far app-specific code needs adjusting to this new code
(I did not see any code changes in toolkit/ or browser/ in this patch),
but with a SeaMonkey trunk build I crash somewhat fast. The first
problem was/is that url bar autocomplete seems to be broken with this
patch (at least it worked fine before...), console says:
JavaScript error: , line 0: uncaught exception: [Exception... "Cannot
modify properties of a WrappedNative" nsresult: "0x80570034
(NS_ERROR_XPC_CANT_MODIFY_PROP_ON_WN)" location: "JS frame ::
chrome://global/content/autocomplete.xml :: :: line 45" data: no]
(http://lxr.mozilla.org/mozilla/source/xpfe/components/autocomplete/resources/content/autocomplete.xml#45).
The second problem (a crash) occoured shortly after clicking a link on a
website:
###!!! ASSERTION: nsBaseHashtable was not initialized properly.:
'this->mTable.entrySize', file
h:\mozilla\tree-main\mozilla\seamonkey-build-debug\dist\include\
xpcom\nsBaseHashtable.h, line 183

0:000> kp
ChildEBP RetAddr
0012fbec 02146da4 xpcom_core!PL_DHashTableEnumerate(struct PLDHashTable
* table = 0x0339e874, <function> * etor = 0x021476e0, void * arg =
0x0012fc04)+0x2b
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\xpcom\build\pldhash.c
@ 672]
0012fc0c 02143f50
gklayout!nsBaseHashtable<nsISupportsHashKey,nsRefPtr<nsXBLBinding>,nsXBLBinding
*>::EnumerateRead(<function> * enumFunc = 0x02143ff0, void * userArg =
0x0012fc84)+0x54
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\dist\include\xpcom\nsbasehashtable.h
@ 188]
0012fc20 1009421a
gklayout!nsBindingManager::cycleCollection::Traverse(class nsISupports *
s = 0x0339e860, struct nsCycleCollectionTraversalCallback * cb =
0x0012fc84)+0x40
[h:\mozilla\tree-main\mozilla\content\xbl\src\nsbindingmanager.cpp @ 359]
0012fc5c 100933c9
xpcom_core!nsCycleCollectionXPCOMRuntime::Traverse(void * p =
0x0339e860, struct nsCycleCollectionTraversalCallback * cb =
0x0012fc84)+0x8a
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 832]
0012fc74 10093520 xpcom_core!GraphWalker::Walk(void * s0 =
0x0339e860)+0x109
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 531]
0012fd00 1009510f xpcom_core!nsCycleCollector::MarkRoots(void)+0x80
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 611]
0012fd18 10095578 xpcom_core!nsCycleCollector::Collect(void)+0x17f
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 1554]
0012fd20 1007e71f xpcom_core!nsCycleCollector_collect(void)+0x18
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 1630]
0012fd48 10084cea xpcom_core!nsEventQueue::GetEvent(int mayWait = 0,
class nsIRunnable ** result = 0x00000000)+0x11f
[h:\mozilla\tree-main\mozilla\xpcom\threads\nseventqueue.cpp @ 100]
0012fd5c 100857e6 xpcom_core!nsThread::nsChainedEventQueue::GetEvent(int
mayWait = 0, class nsIRunnable ** event = 0x00000000)+0x1a
[h:\mozilla\tree-main\mozilla\xpcom\threads\nsthread.h @ 110]
0012fd6c 10015782 xpcom_core!nsThread::HasPendingEvents(int * result =
0x0012fd80)+0x46
[h:\mozilla\tree-main\mozilla\xpcom\threads\nsthread.cpp @ 456]
0012fd84 029f6d86 xpcom_core!NS_HasPendingEvents_P(class nsIThread *
thread = 0x007f4cd0)+0x52
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\xpcom\build\nsthreadutils.cpp
@ 205]
0012fdac 100858ce gkwidget!nsBaseAppShell::OnProcessNextEvent(class
nsIThreadInternal * thr = 0x007f4cd0, int mayWait = 1, unsigned int
recursionDepth = 0)+0xa6
[h:\mozilla\tree-main\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp @ 224]
0012fdec 10015826 xpcom_core!nsThread::ProcessNextEvent(int mayWait = 1,
int * result = 0x0012fe04)+0xce
[h:\mozilla\tree-main\mozilla\xpcom\threads\nsthread.cpp @ 472]
0012fe08 029f6c1d xpcom_core!NS_ProcessNextEvent_P(class nsIThread *
thread = 0x007f4cd0, int mayWait = 1)+0x56
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\xpcom\build\nsthreadutils.cpp
@ 225]
0012fe1c 0297fd50 gkwidget!nsBaseAppShell::Run(void)+0x5d
[h:\mozilla\tree-main\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp @ 153]
0012fe2c 004052c4 appcomps!nsAppStartup::Run(void)+0x20
[h:\mozilla\tree-main\mozilla\xpfe\components\startup\src\nsappstartup.cpp
@ 219]
0012ff38 004045ef seamonkey!main1(int argc = 1, char ** argv =
0x007f2830, class nsISupports * nativeApp = 0x00ea23f0)+0xa64
[h:\mozilla\tree-main\mozilla\xpfe\bootstrap\nsapprunner.cpp @ 1239]
0012ff68 0041cb66 seamonkey!main(int argc = 1, char ** argv =
0x007f2830)+0x17f
[h:\mozilla\tree-main\mozilla\xpfe\bootstrap\nsapprunner.cpp @ 1741]
0012ffb8 0041c9bd seamonkey!__tmainCRTStartup(void)+0x1a6
[f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]


Frank

Graydon Hoare

unread,
Oct 31, 2006, 2:35:42 AM10/31/06
to Frank Wein
Frank Wein wrote:

> I don't know how far app-specific code needs adjusting to this new code
> (I did not see any code changes in toolkit/ or browser/ in this patch),
> but with a SeaMonkey trunk build I crash somewhat fast.

Yeah, it's the debug-build nature; I've been doing my work without
debug-mode turned on, just with '-g'; this has the advantage of being
closer to the program we're shipping, but the disadvantage of having a
bunch of useful assertions turned off. I'll try to remember to run tests
with it both on and off more frequently; you tripped a (valid) assertion
on a bad condition I recently introduced. I've just fixed the relevant
bits here and updated the bug, and confirmed that seamonkey debug works
now (for me).

> problem was/is that url bar autocomplete seems to be broken with this
> patch (at least it worked fine before...), console says:
> JavaScript error: , line 0: uncaught exception: [Exception... "Cannot
> modify properties of a WrappedNative" nsresult: "0x80570034
> (NS_ERROR_XPC_CANT_MODIFY_PROP_ON_WN)" location: "JS frame ::
> chrome://global/content/autocomplete.xml :: :: line 45" data: no]
> (http://lxr.mozilla.org/mozilla/source/xpfe/components/autocomplete/resources/content/autocomplete.xml#45).

That one is odd. I had not noticed it before, but you're right. I don't
know why that's happening; presumably I caused something to become a
WrappedNative that didn't used to be. I'll have to look into it further.

> The second problem (a crash) occoured shortly after clicking a link on a
> website:
> ###!!! ASSERTION: nsBaseHashtable was not initialized properly.:
> 'this->mTable.entrySize', file
> h:\mozilla\tree-main\mozilla\seamonkey-build-debug\dist\include\
> xpcom\nsBaseHashtable.h, line 183

This one is certainly my fault, as are two others I found by turning the
debug build on. Sorry. Thanks for testing it! Further feedback appreciated.

-graydon

0 new messages