I missed the Gran Paradiso meeting today but promised to provide some 
details about the state of the XPCOM cycle collector I've been working on.
The short summary is: it seems pretty stable, but it's not completely 
clear to me if it's worth landing.
The patch can be found at
https://bugzilla.mozilla.org/show_bug.cgi?id=333078
And it's grown to a healthy 200kb over the past few months. 
Functionally, it consists of three parts:
   - Backing-out dbaron's earlier DOMGC work
   - A general cross-language cycle collection system
   - Code to hook a few dozen central classes in content/, dom/ and js/
     into the cycle collector
It seems to have "reasonable" runtime behavior: it runs without crashing 
for as long as I care to run it, and it doesn't eat memory *rapidly*, 
and reports collecting tens of thousands of cycles as it runs (mostly 
those DOMGC was catching already). It does slowly grow as it runs 
(slower and slower as time passes), but my eyeball tests show it growing 
at about the same rate as normal mozilla. The resident set is a bit 
higher than pre-patch, but not a lot, perhaps 2 or 3mb. This is a 
reasonable number given the delayed collection of cycles: they are 
"aged" in a buffer in order to give the suspicious pointers a chance to 
remove themselves from suspicion and save a full scan through them. 
There are also a few new auxiliary structures that are likely to cost 
some bytes.
I was not really expecting this patch to make memory use go *down* 
however, so I'm pretty happy with it reaching its stated intentions. The 
interface is simple and uniform, and permits us to convert pretty much 
any weak pointer to an nsCOMPtr<> with a modest amount of work: figure 
out which classes are participating in any newly-created cycle, and make 
sure they're all participants in cycle collection.
Whether that infrastructural improvement is worth the disruption of 
landing the patch, and the small memory hit, and the risk of a larger 
memory hit or critical bug that I haven't discovered yet, I cannot say. 
It's up to the group to decide.
-graydon
Mostly out of curiosity, do you know if it collects any cycles that the 
old code didn't? I.e. are you already fixing leaks? It'd be interesting 
to know which since that could point to other bugs. For example it 
should never ever be possible to create a cycle of elements.
> It does slowly grow as it runs 
> (slower and slower as time passes), but my eyeball tests show it growing 
> at about the same rate as normal mozilla.
You mean that we do still leak over time? I think that's to be expected 
:). Will these leaks cause the collector to slow down? I.e. will it 
spend time trying again and again to collect objects that it has no 
chance of collecting?
> I was not really expecting this patch to make memory use go *down* 
> however, so I'm pretty happy with it reaching its stated intentions. The 
> interface is simple and uniform, and permits us to convert pretty much 
> any weak pointer to an nsCOMPtr<> with a modest amount of work: figure 
> out which classes are participating in any newly-created cycle, and make 
> sure they're all participants in cycle collection.
This is the part that to me is *huge*. We currently have so much code 
that tries to break cycles and we've ended up with so many dangling 
pointers because of that. If we can collect all that code in one place 
and get a single workable solution rather than using ad-hoc solutions 
all over the place I think we've won a lot.
> Whether that infrastructural improvement is worth the disruption of 
> landing the patch, and the small memory hit, and the risk of a larger 
> memory hit or critical bug that I haven't discovered yet, I cannot say. 
> It's up to the group to decide.
I think the biggest question is if we should land this before or after 
alpha. The only reason I can think of not to land this for 1.9 is if we 
have something else better in the works.
/ Jonas
> Mostly out of curiosity, do you know if it collects any cycles that the 
> old code didn't? I.e. are you already fixing leaks? It'd be interesting 
> to know which since that could point to other bugs. For example it 
> should never ever be possible to create a cycle of elements.
I do not know that. It can print a log of the cycles it's collecting, 
but it's a pretty verbose log and I haven't classified the results. It 
can also graph the cycles as it's running if you have graphviz 
installed, but again, very verbose. Big graphs. I haven't seen any 
cycles that stood out as unusual-looking.
The old code didn't produce a list of cycles at all; it didn't identify 
them, it just tried to keep them broken, through a variety of methods.
> You mean that we do still leak over time? I think that's to be expected 
> :). Will these leaks cause the collector to slow down? I.e. will it 
> spend time trying again and again to collect objects that it has no 
> chance of collecting?
No. Each potential garbage-cycle-member pointer registers itself as 
suspicious, then ages for a while. If after the aging period it's still 
suspicious, it gets one full scan, then it's dropped. The scan may 
identify it as a garbage cycle member, or may not; either way it won't 
be checked again until (if ever) it crosses the "suspicious" edge again 
(refcnt N+1 -> N for nonzero N) and re-registers itself.
> This is the part that to me is *huge*. We currently have so much code 
> that tries to break cycles and we've ended up with so many dangling 
> pointers because of that. If we can collect all that code in one place 
> and get a single workable solution rather than using ad-hoc solutions 
> all over the place I think we've won a lot.
Yeah. I hope so. It's hard to tell. The collector is still doing some 
pretty delicate stuff; it has to be quite correct, and its clients have 
to not lie to it or tell it bad information, otherwise it'll crash just 
as hard as any memory error.
> I think the biggest question is if we should land this before or after 
> alpha. The only reason I can think of not to land this for 1.9 is if we 
> have something else better in the works.
Ok. The biggest question I have is "has anyone else tried it?"
I'm really not at all comfortable with my ability to eyeball regressions 
on this product. I have every reason to believe I broke something that I 
didn't notice.
-graydon
Should we spin up some experimental builds for staging, and see what
people find?  We could point some extension authors at them too, since
they tend to find all sorts of interesting cycle and retention
behaviour, if history is a guide...
Mike
Yeah, this sounds like a great idea -- even better would be a tinderbox 
with the patch, so that we can hook it into the test tinderbox mechanism 
and get some perf and leak test results with the patch to get an idea of 
its impact.
     - Vlad
I don't know how far app-specific code needs adjusting to this new code 
(I did not see any code changes in toolkit/ or browser/ in this patch), 
but with a SeaMonkey trunk build I crash somewhat fast. The first 
problem was/is that url bar autocomplete seems to be broken with this 
patch (at least it worked fine before...), console says:
JavaScript error: , line 0: uncaught exception: [Exception... "Cannot 
modify properties of a WrappedNative"  nsresult: "0x80570034 
(NS_ERROR_XPC_CANT_MODIFY_PROP_ON_WN)"  location: "JS frame :: 
chrome://global/content/autocomplete.xml :: :: line 45"  data: no] 
(http://lxr.mozilla.org/mozilla/source/xpfe/components/autocomplete/resources/content/autocomplete.xml#45). 
The second problem (a crash) occoured shortly after clicking a link on a 
website:
###!!! ASSERTION: nsBaseHashtable was not initialized properly.: 
'this->mTable.entrySize', file 
h:\mozilla\tree-main\mozilla\seamonkey-build-debug\dist\include\
xpcom\nsBaseHashtable.h, line 183
0:000> kp
ChildEBP RetAddr
0012fbec 02146da4 xpcom_core!PL_DHashTableEnumerate(struct PLDHashTable 
* table = 0x0339e874, <function> * etor = 0x021476e0, void * arg = 
0x0012fc04)+0x2b 
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\xpcom\build\pldhash.c 
@ 672]
0012fc0c 02143f50 
gklayout!nsBaseHashtable<nsISupportsHashKey,nsRefPtr<nsXBLBinding>,nsXBLBinding 
*>::EnumerateRead(<function> * enumFunc = 0x02143ff0, void * userArg = 
0x0012fc84)+0x54 
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\dist\include\xpcom\nsbasehashtable.h 
@ 188]
0012fc20 1009421a 
gklayout!nsBindingManager::cycleCollection::Traverse(class nsISupports * 
s = 0x0339e860, struct nsCycleCollectionTraversalCallback * cb = 
0x0012fc84)+0x40 
[h:\mozilla\tree-main\mozilla\content\xbl\src\nsbindingmanager.cpp @ 359]
0012fc5c 100933c9 
xpcom_core!nsCycleCollectionXPCOMRuntime::Traverse(void * p = 
0x0339e860, struct nsCycleCollectionTraversalCallback * cb = 
0x0012fc84)+0x8a 
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 832]
0012fc74 10093520 xpcom_core!GraphWalker::Walk(void * s0 = 
0x0339e860)+0x109 
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 531]
0012fd00 1009510f xpcom_core!nsCycleCollector::MarkRoots(void)+0x80 
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 611]
0012fd18 10095578 xpcom_core!nsCycleCollector::Collect(void)+0x17f 
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 1554]
0012fd20 1007e71f xpcom_core!nsCycleCollector_collect(void)+0x18 
[h:\mozilla\tree-main\mozilla\xpcom\base\nscyclecollector.cpp @ 1630]
0012fd48 10084cea xpcom_core!nsEventQueue::GetEvent(int mayWait = 0, 
class nsIRunnable ** result = 0x00000000)+0x11f 
[h:\mozilla\tree-main\mozilla\xpcom\threads\nseventqueue.cpp @ 100]
0012fd5c 100857e6 xpcom_core!nsThread::nsChainedEventQueue::GetEvent(int 
mayWait = 0, class nsIRunnable ** event = 0x00000000)+0x1a 
[h:\mozilla\tree-main\mozilla\xpcom\threads\nsthread.h @ 110]
0012fd6c 10015782 xpcom_core!nsThread::HasPendingEvents(int * result = 
0x0012fd80)+0x46 
[h:\mozilla\tree-main\mozilla\xpcom\threads\nsthread.cpp @ 456]
0012fd84 029f6d86 xpcom_core!NS_HasPendingEvents_P(class nsIThread * 
thread = 0x007f4cd0)+0x52 
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\xpcom\build\nsthreadutils.cpp 
@ 205]
0012fdac 100858ce gkwidget!nsBaseAppShell::OnProcessNextEvent(class 
nsIThreadInternal * thr = 0x007f4cd0, int mayWait = 1, unsigned int 
recursionDepth = 0)+0xa6 
[h:\mozilla\tree-main\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp @ 224]
0012fdec 10015826 xpcom_core!nsThread::ProcessNextEvent(int mayWait = 1, 
int * result = 0x0012fe04)+0xce 
[h:\mozilla\tree-main\mozilla\xpcom\threads\nsthread.cpp @ 472]
0012fe08 029f6c1d xpcom_core!NS_ProcessNextEvent_P(class nsIThread * 
thread = 0x007f4cd0, int mayWait = 1)+0x56 
[h:\mozilla\tree-main\mozilla\seamonkey-build-debug\xpcom\build\nsthreadutils.cpp 
@ 225]
0012fe1c 0297fd50 gkwidget!nsBaseAppShell::Run(void)+0x5d 
[h:\mozilla\tree-main\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp @ 153]
0012fe2c 004052c4 appcomps!nsAppStartup::Run(void)+0x20 
[h:\mozilla\tree-main\mozilla\xpfe\components\startup\src\nsappstartup.cpp 
@ 219]
0012ff38 004045ef seamonkey!main1(int argc = 1, char ** argv = 
0x007f2830, class nsISupports * nativeApp = 0x00ea23f0)+0xa64 
[h:\mozilla\tree-main\mozilla\xpfe\bootstrap\nsapprunner.cpp @ 1239]
0012ff68 0041cb66 seamonkey!main(int argc = 1, char ** argv = 
0x007f2830)+0x17f 
[h:\mozilla\tree-main\mozilla\xpfe\bootstrap\nsapprunner.cpp @ 1741]
0012ffb8 0041c9bd seamonkey!__tmainCRTStartup(void)+0x1a6 
[f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
Frank
> I don't know how far app-specific code needs adjusting to this new code 
> (I did not see any code changes in toolkit/ or browser/ in this patch), 
> but with a SeaMonkey trunk build I crash somewhat fast.
Yeah, it's the debug-build nature; I've been doing my work without 
debug-mode turned on, just with '-g'; this has the advantage of being 
closer to the program we're shipping, but the disadvantage of having a 
bunch of useful assertions turned off. I'll try to remember to run tests 
with it both on and off more frequently; you tripped a (valid) assertion 
on a bad condition I recently introduced. I've just fixed the relevant 
bits here and updated the bug, and confirmed that seamonkey debug works 
now (for me).
> problem was/is that url bar autocomplete seems to be broken with this 
> patch (at least it worked fine before...), console says:
> JavaScript error: , line 0: uncaught exception: [Exception... "Cannot 
> modify properties of a WrappedNative"  nsresult: "0x80570034 
> (NS_ERROR_XPC_CANT_MODIFY_PROP_ON_WN)"  location: "JS frame :: 
> chrome://global/content/autocomplete.xml :: :: line 45"  data: no] 
> (http://lxr.mozilla.org/mozilla/source/xpfe/components/autocomplete/resources/content/autocomplete.xml#45). 
That one is odd. I had not noticed it before, but you're right. I don't 
know why that's happening; presumably I caused something to become a 
WrappedNative that didn't used to be. I'll have to look into it further.
> The second problem (a crash) occoured shortly after clicking a link on a 
> website:
> ###!!! ASSERTION: nsBaseHashtable was not initialized properly.: 
> 'this->mTable.entrySize', file 
> h:\mozilla\tree-main\mozilla\seamonkey-build-debug\dist\include\
> xpcom\nsBaseHashtable.h, line 183
This one is certainly my fault, as are two others I found by turning the 
debug build on. Sorry. Thanks for testing it! Further feedback appreciated.
-graydon