The future of XPCOM memory management

Jason Orendorff

Aug 28, 2007, 9:34:50 AM

AddRef and Release constitute a contract between every XPCOM object
and all its users. The contract governs object lifetimes,
finalization order, and memory management.

Advantages of this specific contract:

1. It's relatively simple.
2. It requires no global coordination.
3. Prompt destruction. If there are no cycles, objects are
   destroyed as soon as they're no longer needed.
4. It destroys objects in the right order.

Disadvantages:

1. It requires manual bookkeeping throughout the codebase
   (nsCOMPtr, NS_ADDREF, NS_RELEASE, already_AddRefed,
   kungFuDeathGrip, etc.). This clutters up the code, and all the
   virtual method calls and AtomicIncrements can't be good.
2. The problem of reference cycles is built in.
3. Interacting with other memory management schemes is painful
   and slow. (See cycle collector, XPConnect.)

In Mozilla 2, we should change this to require less effort from the
programmer and less clutter throughout the code. This means coming
up with a new contract that cooperates with garbage collection,
rather than fighting it.
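
To make the contract concrete, here is a minimal sketch of AddRef/Release-style counting, showing both the prompt destruction listed as an advantage and the built-in cycle problem listed as a disadvantage. `Node`, `gLive`, and the two helper functions are hypothetical names made up for illustration; this is not XPCOM code, and real XPCOM objects go through nsISupports and usually nsCOMPtr.

```cpp
#include <cassert>

// Hypothetical stand-in for an XPCOM-style refcounted object (not real XPCOM).
static int gLive = 0;  // number of Node objects currently alive

class Node {
public:
    Node() : mRefCnt(0), mOther(nullptr) { ++gLive; }

    void AddRef() { ++mRefCnt; }
    void Release() {
        if (--mRefCnt == 0)
            delete this;  // prompt destruction once the last reference is dropped
    }

    // Holds a strong (counted) reference to aOther.
    void SetOther(Node* aOther) {
        if (aOther) aOther->AddRef();
        if (mOther) mOther->Release();
        mOther = aOther;
    }

private:
    ~Node() {
        if (mOther) mOther->Release();
        --gLive;
    }

    int mRefCnt;
    Node* mOther;
};

// a -> b with no cycle: dropping our references frees both objects.
int LeakedAfterChain() {
    int before = gLive;
    Node* a = new Node(); a->AddRef();
    Node* b = new Node(); b->AddRef();
    a->SetOther(b);
    b->Release();  // b stays alive because a still refers to it
    a->Release();  // destroys a, which in turn releases and destroys b
    return gLive - before;
}

// a <-> b: each keeps the other's count above zero, so neither is freed.
int LeakedAfterCycle() {
    int before = gLive;
    Node* a = new Node(); a->AddRef();
    Node* b = new Node(); b->AddRef();
    a->SetOther(b);
    b->SetOther(a);
    a->Release();
    b->Release();
    int leaked = gLive - before;  // both objects are still alive here

    // Break the cycle by hand so the sketch itself does not leak.
    a->AddRef();           // temporary "death grip" on a
    a->SetOther(nullptr);  // releases b, which releases its reference to a
    a->Release();          // last reference: a is destroyed
    return leaked;
}
```

In the chain case nothing is left alive once the caller's references are dropped. In the cycle case both objects survive until someone breaks the cycle by hand, which is the job the cycle collector does today.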

Jason Orendorff

Aug 28, 2007, 4:32:44 PM

On Aug 28, 9:34 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> More on this in a few hours.

I don't think there's such a thing as a contract that (a) is simple,
(b) is efficient, (c) supports GC, (d) supports refcounting for objects
that want it, and (e) really hides memory management implementation
details. So there will be design tradeoffs here.

For the sake of having a concrete proposal to chew on, I propose the
following:

* Drop AddRef and Release from nsISupports.

* Require all XPCOM objects to be MMgc GCObjects or GCFinalizedObjects,
  allocated from the same GC allocator as all JavaScript objects.

  Read more about MMgc here:
  http://developer.mozilla.org/en/docs/MMgc

* Change any code that depends on objects being destroyed in a specific
  order, or at a specific time, to use some explicit means of ensuring
  that it really happens that way, rather than depending on reference
  counting.

* Use static tools to replace our uses of nsCOMPtr and friends with the
  MMgc equivalents, and replace nsISupportsWeakReference with MMgc
  GCWeakRefs.

* Add thread-safety to MMgc using the Spidermonkey request model.

* Delete the cycle collector.
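
As an illustration of the third bullet (making destruction order explicit rather than relying on reference counting), here is a minimal sketch. `Channel`, `AutoClose`, and `gCloseLog` are hypothetical names; the only assumption is that teardown moves into an explicit, idempotent `Close()` method that callers invoke where ordering matters, so the collector is free to reclaim the memory whenever it likes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log so the order of cleanup can be observed.
static std::vector<std::string> gCloseLog;

class Channel {
public:
    explicit Channel(const std::string& aName) : mName(aName), mOpen(true) {}

    // Explicit, idempotent teardown. Callers invoke this at the point where
    // ordering matters instead of waiting for the last Release().
    void Close() {
        if (mOpen) {
            gCloseLog.push_back(mName);
            mOpen = false;
        }
    }

    bool IsOpen() const { return mOpen; }

private:
    std::string mName;
    bool mOpen;
};

// Scope guard that calls Close() when it goes out of scope.
class AutoClose {
public:
    explicit AutoClose(Channel& aChannel) : mChannel(aChannel) {}
    ~AutoClose() { mChannel.Close(); }
    AutoClose(const AutoClose&) = delete;
    AutoClose& operator=(const AutoClose&) = delete;

private:
    Channel& mChannel;
};

// Returns the order in which the two channels were closed.
std::vector<std::string> DemoOrderedClose() {
    gCloseLog.clear();
    Channel outer("outer");
    Channel inner("inner");
    {
        AutoClose closeOuter(outer);
        AutoClose closeInner(inner);  // destroyed first, so "inner" closes first
    }
    return gCloseLog;
}
```

The order here comes from C++ scope rules (locals are destroyed in reverse order of declaration), not from when the memory is reclaimed.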

How did I do? This proposal nails goal (c), does quite well on (a) and
OK on (b), and ignores goals (d) and (e). Maybe you can do better.

I'll be working on the following bug quite soon, so now is the time to
speak up.

Bug 393034 - Allocate DOM objects using MMgc
https://bugzilla.mozilla.org/show_bug.cgi?id=393034

-j

Jonas Sicking

Aug 28, 2007, 6:04:30 PM

I definitely think we should try to get rid of reference counting in
favor of garbage collection. There are a few things that I'm worried
about and that need investigation.

How complicated is embedding going to be? I suspect it's going to be
more complicated than now in some ways, since interacting with a GC
engine is probably trickier than simply calling AddRef. At the same
time, environments that use GC, such as Java, should have an easier time
avoiding leaks.

How is performance going to be during GC? While we're not writing a
real-time app, we don't want the UI to lock up for seconds while GC is
running. If we get better at detecting inactivity from the user, we
might get away with more here since we can run GC while the user is busy
simply looking at a webpage.

Related to the above, should we attempt to use incremental GC? From what
I understand this should be entirely possible with MMgc. However, it
requires that all pointers use special smart pointers, including
pointers that are currently raw pointers. This seems a little bit scary
and easy to forget, but might be very nice for performance.

/ Jonas

Jonas Sicking

Aug 28, 2007, 7:32:24 PM

One thing not raised in your proposal is what to do with objects that
are currently *not* refcounted. Two good examples are strings (nsString
and friends) and arrays, for example nsTArray.

If we turn all currently refcounted objects into GCFinalizedObject, then
any nsString and nsTArray inline members will get their destructor
called when the hosting object is destroyed. Would that be a big
overhead? The devmo docs discourage GCFinalizedObject.

An alternative is to make nsString and nsTArray inherit GCObject, and
make them allocate their internal buffers using GC::Alloc.

However this doesn't fully work for nsTArray since we would still not be
finalizing the objects in the array. We could of course say that you're
not allowed to stick objects that need finalizing in nsTArray, but that
would probably break a good number of current users, for example
PathExpr::mItems in txExpr.h. This contains a number of PathExprItems like:

class PathExprItem {
public:
    nsAutoPtr<Expr> expr;
    PathOperator pathOp;
};

One way to fix this one example would be to make PathExpr::PathExprItem
and Exprs be GCObjects too, and so on. However with this strategy we
would likely be forced to convert a very large number of objects into
GCObject. This certainly sounds doable, but it seems like a lot of work,
much of it risky.

We can't simply make nsTArray a GCFinalizedObject, for two reasons.
First of all we don't really want to pay the overhead of a vtable
pointer. This class is just 4 bytes big when empty, so it would double
in size. Second, nsTArrays often appear as inline members in other
classes. If such classes are GCObjects then the garbage collector will
not be able to detect the inline GCFinalizedObject and finalize it.
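
The size argument can be checked with a small sketch. `PlainArray` and `FinalizedArray` are hypothetical stand-ins, not the real nsTArray or GCFinalizedObject; the only assumption is that a finalized base class needs a virtual destructor, which gives every instance a hidden vtable pointer on common ABIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an empty array class: one pointer to a heap buffer.
struct PlainArray {
    void* mHdr;
};

// The same data plus a virtual destructor, which is roughly what deriving from
// a finalized base class would add.
struct FinalizedArray {
    virtual ~FinalizedArray() {}
    void* mHdr;
};

constexpr std::size_t kPlainSize = sizeof(PlainArray);
constexpr std::size_t kFinalizedSize = sizeof(FinalizedArray);

// On common ABIs kFinalizedSize is 2 * sizeof(void*): the hidden vtable
// pointer plus mHdr. The language does not guarantee the exact figure.
static_assert(kFinalizedSize > kPlainSize,
              "a polymorphic class carries per-object overhead");
```

With 4-byte pointers that is the 4-to-8-byte doubling described above; with 8-byte pointers the same ratio applies.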

/ Jonas

L. David Baron

Aug 28, 2007, 7:46:58 PM

to Jason Orendorff

Jason Orendorff wrote:
> * Add thread-safety to MMgc using the Spidermonkey request model.

We currently don't pay the cost of thread-safety for most XPCOM objects,
since most objects are main-thread-only. Would doing this impose
significant performance penalties for locking or atomic operations? Or
would most operations still be cheap?

> * Delete the cycle collector.

Are we dropping the multi-language aspects of the platform (introduced
in this milestone, at considerable effort in
https://bugzilla.mozilla.org/show_bug.cgi?id=255942 )? Or is there a
good way for python to use MMgc as well?

(That said, the python stuff hasn't caught up with the cycle collector,
and would probably leak a lot if it were used as intensively as we use
JS, so I'm not sure how seriously we should take it.)

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

mhammond

Aug 28, 2007, 9:07:55 PM

On Aug 29, 9:46 am, "L. David Baron" <dba...@dbaron.org> wrote:
> Jason Orendorff wrote:
> > * Delete the cycle collector.
>
> Are we dropping the multi-language aspects of the platform (introduced
> in this milestone, at considerable effort in
> https://bugzilla.mozilla.org/show_bug.cgi?id=255942 )? Or is there a
> good way for python to use MMgc as well?
>
> (That said, the python stuff hasn't caught up with the cycle collector,
> and would probably leak a lot if it were used as intensively as we use
> JS, so I'm not sure how seriously we should take it.)

Yes, this will be a challenge. I can't picture how dropping refcounts
would work in the general case with Python. The XPCOM objects exposed
by Python can be made a GCObject - but I'm not sure how we would
integrate the rest of the Python universe - eg, assuming we have an
arbitrary number of Python objects holding pointers to xpcom objects,
I'm not sure how we would tell the GC about all such references - and
without that knowledge, the GC would cleanup objects that are
referenced. Deep hacks specific to Python and MMgc might be possible,
but that still screws Perl, etc.

Building a new version of Python that uses MMgc might be possible but
(a) it might not be and (b) every Python extension module would also
need to either (likely) change or (if we are extremely lucky) be
rebuilt.

To summarize, I see that having an external language integrate with
such MMgc is no more - and no less - difficult than integrating with
the existing spidermonkey GC - and I'm not aware of anyone who
believes that is feasible in either the general case, or even just the
specific cases on the table (ie, existing languages with xpcom
bindings)

On the other side of the coin though, the future may be closer to
something like .NET, where languages like Python are reimplemented on
top of a new VM - in which case the GC comes "for free" - but in such
a world xpcom doesn't make as much sense anyway - the VM itself can
make cross-language calls. So maybe we are asking the wrong question
- what is the future of XPCOM itself, not just its memory management?

Cheers,

Mark

bre...@mozilla.org

Aug 28, 2007, 9:09:55 PM

Replying to several messages/thoughts at once:

1. Losing the cycle collector's support for other languages is
necessary to get C++ and JS on a better footing -- a shared GC heap.
But the idea and even code could be harvested for use by other
language runtimes, since we will still face uncollectable cycle
hazards interfacing Java, Python, etc. to C++ and JS.

2. Other languages in Mozilla 2 should prefer to integrate at the VM
level, on Tamarin. See IronMonkey.

3. Jonas's first point: We should avoid making non-refcounted XPCOM
classes be GC-objects without good evidence doing so wins in time and
space overhead. But (separate topic/thread) we hope to use std::string
and the like more where possible, and use Taras's elsa-based tools to
write the mega-patches for us.

4. Jonas's second point: MMgc needs to become more conservative about
interior objects. Currently it does not back up from a pointer to an
interior object (via MI or explicit member embedding) to the outermost
(allocation) object.

5. David's first point: the request model is already followed (file
bugs if you can) in Mozilla code (this was not always so). We aim to
keep it, but it does not mean there is any thread-safety cost imposed
on GC'ed objects. Only that the embedding must begin, end, suspend,
resume, and yield requests appropriately (could use some static
analysis help here too). The JS objects that SpiderMonkey creates
already use the request model to do optimistic lock-free
synchronization, so no change there. And we are not imposing such
synchronization on other objects for Mozilla 2, as far as I can see.

/be

bre...@mozilla.org

Aug 28, 2007, 9:23:09 PM

On Aug 28, 6:07 pm, mhammond <mhamm...@skippinet.com.au> wrote:
> The XPCOM objects exposed
> by Python can be made a GCObject - but I'm not sure how we would
> integrate the rest of the Python universe - eg, assuming we have an
> arbitrary number of Python objects holding pointers to xpcom objects,

MMgc is conservative. So long as Python allocates memory to be scanned
for pointers to MMgc allocations using MMgc's malloc wrapper, MMgc
will find these pointers. You need an MMgc GCRoot subclass (one or
more, perhaps at most one per other language implementation) to root
all the Python objects that are not (or not guaranteed to be) pointed
at by pointers in memory MMgc scans.
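
For readers unfamiliar with conservative collection, here is a toy sketch of the idea: the collector scans a range of memory word by word and treats anything that looks like a pointer into its heap as a live reference. `ToyHeap` and `DemoScan` are hypothetical and much simpler than MMgc; in particular, this version only recognizes pointers to the start of a block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <set>

// Hypothetical toy heap; it only remembers where its blocks start.
class ToyHeap {
public:
    ToyHeap() {}
    ~ToyHeap() {
        for (std::uintptr_t addr : mBlocks)
            ::operator delete(reinterpret_cast<void*>(addr));
    }
    ToyHeap(const ToyHeap&) = delete;
    ToyHeap& operator=(const ToyHeap&) = delete;

    void* Alloc(std::size_t aSize) {
        void* p = ::operator new(aSize);
        mBlocks.insert(reinterpret_cast<std::uintptr_t>(p));
        return p;
    }

    // Conservative scan: any word whose value equals the start of one of our
    // blocks is treated as a reference, whatever its real type is.
    std::set<std::uintptr_t> Scan(const std::uintptr_t* aWords,
                                  std::size_t aCount) const {
        std::set<std::uintptr_t> found;
        for (std::size_t i = 0; i < aCount; ++i) {
            if (mBlocks.count(aWords[i]))
                found.insert(aWords[i]);
        }
        return found;
    }

private:
    std::set<std::uintptr_t> mBlocks;
};

// Returns how many heap blocks the scan considers referenced.
std::size_t DemoScan() {
    ToyHeap heap;
    void* referenced = heap.Alloc(16);
    heap.Alloc(16);  // nothing in the scanned memory points at this block

    // Memory owned by "another runtime": an integer, a pointer, a null.
    std::uintptr_t foreign[3] = {
        42, reinterpret_cast<std::uintptr_t>(referenced), 0
    };
    return heap.Scan(foreign, 3).size();
}
```

The scanned memory needs no type information, which is why the other runtime only has to allocate its pointer-bearing memory where the collector will look. The cost is that an unrelated integer with the right value would also keep a block alive.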

> I'm not sure how we would tell the GC about all such references - and
> without that knowledge, the GC would cleanup objects that are
> referenced.

That would be bad, so let's not ;-).

> Deep hacks specific to Python and MMgc might be possible,
> but that still screws Perl, etc.

Perl is not helped much by PyXPCOM, right? How does this differ with
XPCOM-on-MMgc?

> Building a new version of Python that uses MMgc might be possible but
> (a) it might not be and (b) every Python extension module would also
> need to either (likely) change or (if we are extremely lucky) be
> rebuilt.

This is not the way, agreed.

> To summarize, I see that having an external language integrate with
> such MMgc is no more - and no less - difficult than integrating with
> the existing spidermonkey GC -

It's easier because MMgc is conservative -- but I should add that it's
harder if MMgc is used in incremental mode, because you need to impose
a write barrier on Python.

> and I'm not aware of anyone who
> believes that is feasible in either the general case, or even just the
> specific cases on the table (ie, existing languages with xpcom
> bindings)

GC-to-GC cycle collection is easier than refcount cycle collection.
See http://www.cs.cmu.edu/~roc/HetGC.html (I hope this is sound -- I
never did proofs ;-) and Parley from IBM research, where the best link
I can find is:

http://researchweb.watson.ibm.com/vee04/video.html#grove

> On the other side of the coin though, the future may be closer to
> something like .NET, where languages like Python are reimplemented on
> top of a new VM - in which case the GC comes "for free" - but in such
> a world xpcom doesn't make as much sense anyway - the VM itself can
> make cross-language calls.

Precisely -- wherefore IronMonkey.

> So maybe we are asking the wrong question
> - what is the future of XPCOM itself, not just its memory management?

And indeed Jason posted a separate thread on that topic. See you
there :-).

/be

bre...@mozilla.org

Aug 28, 2007, 9:26:03 PM

to mham...@skippinet.com.au

On Aug 28, 6:23 pm, "bren...@mozilla.org" <bren...@mozilla.org> wrote:
> On Aug 28, 6:07 pm, mhammond <mhamm...@skippinet.com.au> wrote:
>
> > The XPCOM objects exposed
> > by Python can be made a GCObject - but I'm not sure how we would
> > integrate the rest of the Python universe - eg, assuming we have an
> > arbitrary number of Python objects holding pointers to xpcom objects,
>
> MMgc is conservative. So long as Python allocates memory to be scanned
> for pointers to MMgc allocations using MMgc's malloc wrapper, MMgc
> will find these pointers.

I should have written "using an appropriate malloc wrapper". Ideally
only memory that might contain pointers to XPGC (heh) objects would be
allocated with the kContainsPointer MMgc flag.

That may be hard to hook into C-Python. Is it?

/be

Benjamin Smedberg

Aug 29, 2007, 8:11:29 AM

mhammond wrote:

> Yes, this will be a challenge. I can't picture how dropping refcounts
> would work in the general case with Python. The XPCOM objects exposed
> by Python can be made a GCObject - but I'm not sure how we would
> integrate the rest of the Python universe - eg, assuming we have an
> arbitrary number of Python objects holding pointers to xpcom objects,
> I'm not sure how we would tell the GC about all such references - and

Can't you just root them while python holds the external references?

--BDS

Benjamin Smedberg

Aug 29, 2007, 11:03:22 AM

Jonas Sicking wrote:

> How complicated is embedding going to be? I suspect it's going to be
> more complicated than now in some ways, since interacting with a GC
> engine is probably trickier than simply calling AddRef. At the same
> time, environments that use GC, such as Java, should have an easier time
> avoiding leaks.

I believe that embedders should never see any of this: embedders should not
use XPCOM any more, we should expose real platform-native embedding layers
with a stable API.

--BDS

Benjamin Smedberg

Aug 29, 2007, 11:22:22 AM

Jason Orendorff wrote:

> * Use static tools to replace our uses of nsCOMPtr and friends with
> the MMgc
> equivalents, and replace nsISupportsWeakReference with MMgc
> GCWeakRefs.

There are two different uses of weakrefs in our tree:

1) Weak refs used to avoid cycles: A holds a strong-ref to B, which holds a
weak ref to A. This is more COM-safe than holding a raw pointer to A. This
pattern should simply be replaced by object pointers, which I think means
DWB(MyObject*)

> * Add thread-safety to MMgc using the Spidermonkey request model.

This is the part that has me extremely worried. We would have to propagate
the request model throughout all of our XPCOM code, which is a very tricky
task. As I understand it, the invariants for requests are:

1) GC references may change only within a request
2) blocking (or long-running) activity should not take place within a request

I think that keeping track of whether we're currently in a request would be
a major headache.

Brendan may kill me for this, but I think that we can and should assert
single-threaded behavior for all of JS and "XPCOM/MMGC": that is, MMGC
should only be on the main thread.

There are necko classes which will want to be accessible via MMGC/XPCOM and
also be internally threadsafe, but I believe that they can root themselves on
the main thread and perform all of their multi-thread networking using an
internal threadsafe reference-counting and proxying scheme that is "not XPCOM".

> I'll be working on the following bug quite soon, so now is the time to
> speak up.
>
> Bug 393034 - Allocate DOM objects using MMgc
> https://bugzilla.mozilla.org/show_bug.cgi?id=393034

Are we sure that we want to get into the "revamp XPCOM" game just to get
going with fast-path DOM? The approach you mention in comment 14 approach C
makes a lot of sense, without rewriting all of XPCOM:

1) use MMGC for internal DOM references
2) keep using XPCOM for "external" references - XPCOM refs mean the object
is rooted
3) teach XPConnect to use MMGC references for DOM objects
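
One way to read point 2 is sketched below: the first external AddRef adds the object to the collector's root set and the last Release removes it, while internal DOM references stay plain GC pointers. `ToyGC` and `DomObject` are hypothetical; this is an illustration of the idea under that assumption, not the design from the bug.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical root set standing in for the collector's root table.
class ToyGC {
public:
    void AddRoot(const void* aObj) { mRoots.insert(aObj); }
    void RemoveRoot(const void* aObj) { mRoots.erase(aObj); }
    bool IsRooted(const void* aObj) const { return mRoots.count(aObj) != 0; }

private:
    std::unordered_set<const void*> mRoots;
};

// Hypothetical DOM object: internal edges are plain GC pointers, while
// external XPCOM-style references are counted and keep the object rooted.
class DomObject {
public:
    explicit DomObject(ToyGC& aGC) : mGC(aGC), mExternalRefs(0) {}

    void AddRef() {
        if (mExternalRefs++ == 0)
            mGC.AddRoot(this);     // first external reference: root it
    }
    void Release() {
        assert(mExternalRefs > 0);
        if (--mExternalRefs == 0)
            mGC.RemoveRoot(this);  // last one gone: the GC decides its fate
    }

private:
    ToyGC& mGC;
    int mExternalRefs;
};
```

Note that Release never deletes the object here. Once unrooted, it lives or dies according to whether the collector can still reach it through internal references.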

--BDS

Brendan Eich

Aug 29, 2007, 3:01:58 PM

to Benjamin Smedberg

You can root at some cost. But roots aren't cheap, and IIRC Mark used
delegated Python incref/decref to call XPCOM AddRef/Release directly --
cheaper and in sync with Python's ref-counting with background GC memory
management. And with roots, you can still have cycles between heaps (but
these were not addressed by PyXPCOM, and as dbaron noted, the XPCOM
cycle collector's stubs for PyXPCOM need to be fleshed out and tested).

Roots are not the answer for interior nodes. With SpiderMonkey,
delegated trace (formerly mark) and finalize class hooks help. But with
any two memory managers colliding, you want cheaper-than-global-root edge
tracing, and for leak-proofing you must do something further to
deal with cycles.

Life is better with a single GC, which is why we are unifying C++ and JS
on top of MMgc for Mozilla 2, and why we are supporting IronMonkey to
add other languages on top of a common memory manager and JITting VM
(along with memory safety and other benefits as motivation, in addition
to avoiding multi-heap integration and cycle-breaking hassles).

/be

Jason Orendorff

Aug 29, 2007, 3:05:04 PM

On Aug 29, 11:22 am, Benjamin Smedberg <benja...@smedbergs.us> wrote:

> Jason Orendorff wrote:
> > * Add thread-safety to MMgc using the Spidermonkey request model.
>
> This is the part that has me extremely worried. We would have to propagate
> the request model throughout all of our XPCOM code, which is a very tricky
> task. As I understand it, the invariants for requests are:
>
> 1) GC references may change only within a request
> 2) blocking (or long-running) activity should not take place within a request
>
> I think that keeping track of whether we're currently in a request would be
> a major headache.

I am not too worried. I think I see how this is going to work.

All threads will be in a request all the time, except when doing
blocking I/O or CPU-bound, non-GC-touching stuff. You'll have to
suspend the request before doing that kind of thing, and resume it
afterwards. I imagine we'll have a C++ object that knows how to do
this. (Like nsAutoLock. "nsAutoSuspendRequest", maybe.)

Finding those places would be the only hard part. But if you miss one,
it should be *real* easy to spot and debug. Firefox will seem to hang.
You'll attach a debugger, and all threads will be sitting in
MMgc::waitForGC except for one, which will be blocked on DNS or
compositing video buffers.
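
A rough sketch of what such a helper could look like follows. `ToyRequest` is a hypothetical per-thread request state, not the SpiderMonkey or MMgc API, and `AutoSuspendRequest` is only a guess at the shape of the "nsAutoSuspendRequest" idea: suspend in the constructor, resume in the destructor, in the same spirit as nsAutoLock.

```cpp
#include <cassert>

// Hypothetical per-thread request state; not the SpiderMonkey API.
class ToyRequest {
public:
    ToyRequest() : mDepth(1) {}  // threads are "in a request all the time"

    bool InRequest() const { return mDepth > 0; }

    // Leaves the request and returns the depth to restore later.
    int Suspend() {
        int saved = mDepth;
        mDepth = 0;
        return saved;
    }
    void Resume(int aSavedDepth) { mDepth = aSavedDepth; }

private:
    int mDepth;
};

// RAII helper: suspends the request for the lifetime of the object.
class AutoSuspendRequest {
public:
    explicit AutoSuspendRequest(ToyRequest& aRequest)
        : mRequest(aRequest), mSavedDepth(aRequest.Suspend()) {}
    ~AutoSuspendRequest() { mRequest.Resume(mSavedDepth); }
    AutoSuspendRequest(const AutoSuspendRequest&) = delete;
    AutoSuspendRequest& operator=(const AutoSuspendRequest&) = delete;

private:
    ToyRequest& mRequest;
    int mSavedDepth;
};

// Stands in for a blocking call; reports whether the thread was still in a
// request while it "blocked".
bool DemoBlockingCall(ToyRequest& aRequest) {
    AutoSuspendRequest suspend(aRequest);
    // Blocking I/O or long CPU-bound work that touches no GC objects goes here.
    return aRequest.InRequest();
}
```

Because the resume happens in the destructor, early returns from the blocking section cannot leave the thread outside a request by accident.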

> Are we sure that we want to get into the "revamp XPCOM" game just to get
> going with fast-path DOM? The approach you mention in comment 14 approach C
> makes a lot of sense, without rewriting all of XPCOM:

> [...]

Well, my short-term plans are still incremental. But this
conversation has been dormant since December. I really want to know
the long-term plan, for both short-term and long-term reasons.

-j

Brendan Eich

Aug 29, 2007, 3:10:09 PM

to Benjamin Smedberg, vlad...@pobox.com

Or an unstable API. It's what everyone else does. Stability arises
through a "conversation" between producers and consumers, and over time
increases, until there's a "big shift". This came up in face-to-face
meetings, and we should hash it out.

New thread in .embedding, cross-posted here with followup-to: set?

/be

Brendan Eich

Aug 29, 2007, 3:11:40 PM

to tg...@mozilla.com

Jonas Sicking wrote:
> Related to the above, should we attempt to use incremental GC? From what
> I understand this should be entirely possible with MMgc. However it
> requires that all pointers use special smart-pointers. Including
> pointers that are currently raw-pointers. This seems a little bit scary
> and easy to forget, but might be very nice for performance.

Raw pointers should be banned in incremental GC settings. We can use
static analysis to enforce this.

And anyway, we really do need to understand ownership at every edge in
the graph. Right now, a raw pointer is a giant question mark that should
raise alarms about either manual-over-refcounted leak bugs, or else
manually-dropped-early or just plain-old-raw-weak-pointer and therefore
dangling-pointer, exploitable bugs.

/be

Jason Orendorff

Aug 29, 2007, 3:17:08 PM

Yes, but you'd leak any garbage cycles that include both Python and
XPCOM objects--they would be rooted. That can be fixed, too, with
enough effort. It's not simple.

-j

Benjamin Smedberg

Aug 29, 2007, 3:25:54 PM

Brendan Eich wrote:

> Or an unstable API. It's what everyone else does. Stability arises
> through a "conversation" between producers and consumers, and over time
> increases, until there's a "big shift". This came up in face-to-face
> meetings, and we should hash it out.
>
> New thread in .embedding, cross-posted here with followup-to: set?

Sure. The important point was not the stability or instability of the
embedding API, but that it is entirely decoupled from "XPCOMGC".

--BDS

Brendan Eich

Aug 29, 2007, 3:26:38 PM

to Benjamin Smedberg

Benjamin Smedberg wrote:
> Brendan may kill me for this, but I think that we can and should assert
> single-threaded behavior for all of JS and "XPCOM/MMGC": that is, MMGC
> should only be on the main thread.

Jason addressed the request model fear. Main thread code can't block
indefinitely for i/o already, so the only request suspend points that I
can see right now are

* lengthy, non-GC-graph-mutating computations;
* file i/o that's "blocking, but fast", yet not fast enough for us to
wish to stay in a request;
* required deadlock-with-the-GC avoidance not handled by the request
model itself.

> There are necko classes which will want to be accessible via MMGC/XPCOM and
> also be internally threadsafe, but I believe that they can root themselves on
> the main thread and perform all of their multi-thread networking using an
> internal threadsafe reference-counting and proxying scheme that is "not XPCOM".

More than Necko classes are at stake. We know of AllPeers and Songbird
MT XPCOM usage, and I believe Joost too uses XPCOM with shared memory
threads. I do not propose to make all such consumers of XPCOM rewrite
their code for Mozilla 2, even though it could turn out that everyone
agrees on doing that, for good wins in a reasonable timeframe.

Proceeding incrementally, removing ref-counting from XPCOM and moving it
to GC, seems a much better approach, since we don't know all the costs
and benefits, and how they trade off for different platform clients.

>> I'll be working on the following bug quite soon, so now is the time to
>> speak up.
>>
>> Bug 393034 - Allocate DOM objects using MMgc
>> https://bugzilla.mozilla.org/show_bug.cgi?id=393034
>
> Are we sure that we want to get into the "revamp XPCOM" game just to get
> going with fast-path DOM? The approach you mention in comment 14 approach C
> makes a lot of sense, without rewriting all of XPCOM:
>
> 1) use MMGC for internal DOM references
> 2) keep using XPCOM for "external" references - XPCOM refs mean the object
> is rooted
> 3) teach XPConnect to use MMGC references for DOM objects

This is more work because it bridges two memory managers. It requires
the cycle collector still. We should try to cut to the chase and move
XPCOM to MMgc.

/be

Brendan Eich

Aug 29, 2007, 3:32:36 PM

to Benjamin Smedberg

Indeed, and I'm with you on that point. It's non-trivial with exact GC,
since you end up requiring fat handles (roots or scannable thread-local
helpers such as SpiderMonkey's JSTempValueRooters), and these can't be
hidden even with C++ auto-storage-class automation.

Think of the JNI with its global and local (per-activation) roots, and
the need to manage the latter when you create thousands of newborns and
connect each as you go to the live object graph.

We don't want a JNI-like embedding API just to future-proof for exact
GC. We might evolve MMgc toward a more exact mode of operation, but
there's little motivation for that now. So we are probably committing to
at least conservative stack scanning in our GC, by using simple
embedding APIs.

I'm assuming the embedding APIs will involve pointers to GC-allocated
things. Copying strings in and out can get expensive depending on the
embedding. But this is fodder for the new thread.

/be

Brendan Eich

Aug 29, 2007, 3:37:05 PM

to Benjamin Smedberg

Brendan Eich wrote:
> Benjamin Smedberg wrote:
>> Brendan may kill me for this, but I think that we can and should assert
>> single-threaded behavior for all of JS and "XPCOM/MMGC": that is, MMGC
>> should only be on the main thread.
>
> Jason addressed the request model fear. Main thread code can't block
> indefinitely for i/o already, so the only request suspend points that I
> can see right now are
>
> * lengthy, non-GC-graph-mutating computations;

These are UI starvation bugs to fix already, btw.

> * file i/o that's "blocking, but fast", yet not fast enough for us to
> wish to stay in a request;

I'm thinking of local file i/o, but we do that non-blocking too, don't we?

> * required deadlock-with-the-GC avoidance not handled by the request
> model itself.

This would be something like the cycle collector, for Java and C-Python
if we care to avoid cross-heap leaks with those runtimes.

/be

Brendan Eich

Aug 29, 2007, 3:39:36 PM

to tg...@mozilla.com

Brendan Eich wrote:
> Jonas Sicking wrote:
>> Related to the above, should we attempt to use incremental GC? From what
>> I understand this should be entirely possible with MMgc. However it
>> requires that all pointers use special smart-pointers. Including
>> pointers that are currently raw-pointers. This seems a little bit
>> scary and easy to forget, but might be very nice for performance.
>
> Raw pointers should be banned in incremental GC settings. We can use
> static analysis to enforce this.

Sorry to be unclear: the context here is heap-allocated data structures.
You need a write barrier for any pointer in another GC-allocated struct.

The thread stack can be full of raw pointers, no problem. Conservative
scanning means they (along with the odd float ;-) will be taken for
strong refs, and mutation does not need to update card marks or colors
since we don't GC the stack.
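
The distinction can be sketched as follows. `BarrieredPtr`, `Item`, and `gBarrierLog` are hypothetical and are not MMgc's own barrier types; a real incremental collector would mark the written slot or its container for rescanning rather than append to a log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of barrier hits; a real collector would instead mark
// the written slot or its container for rescanning.
static std::vector<const void*> gBarrierLog;

template <typename T>
class BarrieredPtr {
public:
    BarrieredPtr() : mPtr(nullptr) {}

    BarrieredPtr& operator=(T* aPtr) {
        gBarrierLog.push_back(this);  // note which slot was written
        mPtr = aPtr;
        return *this;
    }

    T* get() const { return mPtr; }

private:
    T* mPtr;
};

struct Item {
    BarrieredPtr<Item> mNext;  // pointer stored inside a data structure
};

// Returns the number of barrier hits. In a real program the Items would be
// GC-allocated; stack instances just keep the sketch short.
std::size_t DemoBarrier() {
    gBarrierLog.clear();
    Item first;
    Item second;

    Item* cursor = &first;  // raw pointer on the stack: no barrier involved
    cursor = &second;
    (void)cursor;

    first.mNext = &second;  // write into a structure field: barrier runs
    return gBarrierLog.size();
}
```

Only the field write goes through the barrier; the stack variable is covered by conservative scanning.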

/be

Jason Orendorff

Aug 29, 2007, 5:35:21 PM

On Aug 28, 7:46 pm, "L. David Baron" <dba...@dbaron.org> wrote:
> Jason Orendorff wrote:
> > * Add thread-safety to MMgc using the Spidermonkey request model.
>
> We currently don't pay the cost of thread-safety for most XPCOM objects,
> since most objects are main-thread-only. Would doing this impose
> significant performance penalties for locking or atomic operations? Or
> would most operations still be cheap?

The request model helps prevent two kinds of thread-unsafety:
1) GC colliding with other threads doing stuff
2) two threads touching an object at the same time

We only need it for item 1, which is cheap.

Individual classes may opt in for item 2. JSObject does. But most
XPCOM classes won't-- and so they will incur no cost.

Those following along at home can read up on the request model here:
SpiderMonkey Internals: Thread Safety
http://tinyurl.com/yt5rtr

-j

Jason Orendorff

Aug 29, 2007, 5:44:10 PM

On Aug 28, 7:46 pm, "L. David Baron" <dba...@dbaron.org> wrote:
> Jason Orendorff wrote:
> > * Delete the cycle collector.
>
> Are we dropping the multi-language aspects of the platform (introduced
> in this milestone, at considerable effort in
> https://bugzilla.mozilla.org/show_bug.cgi?id=255942)? Or is there a
> good way for python to use MMgc as well?

You're right, we must decide whether to keep this. If so, I see
several options, none easy:

- Create the opposite of cycle collector: code to walk CPython
(refcounted) object graphs so that MMgc can see through them.
CPython has a cycle collector API that would probably help:

http://docs.python.org/api/supporting-cycle-detection.html

- Patch CPython to use MMgc. Somewhat scary due to the likelihood
of Python code depending on destructors being called in order.

- Create a library to facilitate interop among multiple
language runtimes in a single process, with distributed garbage
collection, etc. Like SWIG, only much better. Implement it for
Tamarin, XPCOM, CPython, and Java.

If the last option sounds crazy, it should, but I'll go ahead and point
out that we've done interop at least 3 times already (LiveConnect,
XPConnect, PyXPCOM), and we're about to do it 2 more times
(ScreamingMonkey needs Tamarin/MSCOM interop; ActionMonkey needs
Tamarin/XPCOM interop). Maybe it's time to do it in a generic form that
other open source projects can use.

-j

Jonas Sicking

Aug 29, 2007, 6:23:03 PM

Jason Orendorff wrote:
> On Aug 28, 7:46 pm, "L. David Baron" <dba...@dbaron.org> wrote:
>> Jason Orendorff wrote:
>>> * Delete the cycle collector.
>> Are we dropping the multi-language aspects of the platform (introduced
>> in this milestone, at considerable effort in
>> https://bugzilla.mozilla.org/show_bug.cgi?id=255942)? Or is there a
>> good way for python to use MMgc as well?
>
> You're right, we must decide whether to keep this. If so, I see
> several options, none easy:
>
> - Create the opposite of cycle collector: code to walk CPython
> (refcounted) object graphs so that MMgc can see through them.
> CPython has a cycle collector API that would probably help:

The opposite would also work, make the CPython cycle collector walk
through the MMgc graph, like we currently make our own cycle collector
walk through the JS graph.

/ Jonas

Jonas Sicking

Aug 29, 2007, 7:10:29 PM

Many of our raw pointers exist solely to avoid cycles, like the
nsNodeInfoManager::mDocument <-> nsDocument::mNodeInfoManager cycle
where the first is a raw pointer but is nulled out when the nsDocument
is deleted. In this case both pointers should use normal write barriered
pointers.

In other cases we use raw pointers in order to store extra bits. For
example nsINode::mParentPtrBits where we use the two lower bits to store
data. Here I suspect we could probably create some sort of wrapper class
that creates a write barrier, but still allows the two lower bits to be
used.

Yet a third example is nsINode::mFlagsOrSlots which sometimes stores a
bitfield and sometimes stores a pointer. This situation is approximately
the same as the previous one, possibly with exception that the wrapper
class needs to be able to return a bitfield in addition to a pointer.
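
A sketch of what such a wrapper might look like for the mParentPtrBits case: the pointer write goes through a barrier, and the two low bits stay available for flags. `TaggedBarrieredPtr`, `Parent`, and `gBarrierHits` are hypothetical names, and the barrier is reduced to a counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter standing in for the collector's barrier hook.
static int gBarrierHits = 0;

// The two low bits of the stored word are reserved for flags.
constexpr std::uintptr_t kFlagMask = 0x3;

template <typename T>
class TaggedBarrieredPtr {
public:
    TaggedBarrieredPtr() : mBits(0) {}

    void SetPtr(T* aPtr) {
        std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(aPtr);
        assert((raw & kFlagMask) == 0);     // needs 4-byte-aligned targets
        ++gBarrierHits;                     // barrier fires on pointer writes
        mBits = raw | (mBits & kFlagMask);  // keep the existing flag bits
    }
    T* GetPtr() const {
        return reinterpret_cast<T*>(mBits & ~kFlagMask);
    }

    // Flag writes do not change the object graph, so no barrier is needed.
    void SetFlags(unsigned aFlags) {
        mBits = (mBits & ~kFlagMask) | (aFlags & kFlagMask);
    }
    unsigned GetFlags() const {
        return static_cast<unsigned>(mBits & kFlagMask);
    }

private:
    std::uintptr_t mBits;
};

// Hypothetical pointee with enough alignment to leave two low bits free.
struct alignas(4) Parent {
    int mValue;
};
```

A conservative collector would also have to recognize the tagged word as a pointer, for example by masking the low bits while scanning; that part is not shown here.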

/ Jonas

mhammond

Aug 29, 2007, 9:01:53 PM

to

On Aug 29, 11:26 am, "bren...@mozilla.org" <bren...@mozilla.org>

I believe it is very hard to hook it into a built Python. It would be
much easier to hook it in at build time but IIUC, it would also
require that *all* Python extensions you wish to use are also rebuilt;
any prebuilt Python extensions you can find on the web would be
unusable. My gut tells me that this would be unacceptable to people
using this platform with Python, but hopefully there are some lurkers
here who can throw their 2c in.

Another alternative I have yet to investigate is to hack on Python
to offer the ability to hook in a memory allocator at runtime, before
Python is initialized. The downside of this approach is that in the
short term we will not be able to work with a released version; it
would need Python 2.6 or later.

But even then, I have a concern regarding other languages - do we
really want to raise the bar for entry into the xpcom world to being
able to integrate with a garbage collection system? It seems our
long-term goal is to get rid of xpcom in favour of the "one VM for all
languages" approach, so while xpcom remains alive it should keep doing
all it can to be inclusive of the languages able to be supported.

Mark

Benjamin Smedberg

Aug 30, 2007, 9:41:58 AM

to

Jason Orendorff wrote:

> All threads will be in a request all the time, except when doing
> blocking I/O or CPU-bound, non-GC-touching stuff. You'll have to
> suspend the request before doing that kind of thing, and resume it
> afterwards. I imagine we'll have a C++ object that knows how to do
> this. (Like nsAutoLock. "nsAutoSuspendRequest", maybe.)
>
> Finding those places would be the only hard part. But if you miss one,
> it should be *real* easy to spot and debug. Firefox will seem to hang.
> You'll attach a debugger, and all threads will be sitting in
> MMgc::waitForGC except for one, which will be blocked on DNS or
> compositing video buffers.

ok, you have me mostly convinced... let's proceed under the general
assumption that this is what we want to do. To accomplish this, we're going
to have a lot of different things going on.

The list of tasks to accomplish this is at least:

* Add the request model threadsafety to MMgc
* Give MMgc the ability to recognize "inner" pointers to objects
* Identify request start/end points in the codebase (blocking activity)
* Ensure (how?) that existing locking mechanisms for threadsafe code won't
deadlock with GC
* Rewrite XPCOM addref/release handling
** Remove or stub out getter_AddRefs, already_AddRefed, and other helper classes
** Make member-comptrs call/be DWB
** Make stack-comptrs raw pointers
** Fix some COM-holding utility classes
*** nsCOMArray
*** nsInterfaceHashKey
*** nsInterfaceHashtable
** Identify XPCOM weakrefs that can be GCRefs
** Rewrite the other XPCOM weak-references into GCWeakRefs

Other random notes/questions:

What are the rules for objects with finalizers? Is the finalize method
allowed to touch other objects? Presumably these objects may have already
been finalized, right (or else you'd end up with finalization cycles)?

Right now many objects are going to have to be finalized, because they
contain string members or do real work in their constructor. We should
discuss the pros/cons of making strings GCthings, or even sharing the
tamarin string type with XPCOM. We should also automatically identify
destructors that do "real" work to see if we can remove that work, or if the
work is even safe to do when the target object may have already been finalized.

Because the main thread is always non-blocking by design, it would naturally
never exit its request. This is probably ok as long as we force GC to always
take place on the main thread. If GC gets triggered on a worker thread, that
thread would block forever. Alternately we could exit/reenter the request
every time we process the main event queue.
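
To make the request rules concrete, here is a rough sketch of the bookkeeping, assuming a single collecting thread. GCRequests and AutoSuspendRequest are hypothetical names (the latter follows the "nsAutoSuspendRequest" idea quoted above); nothing here is existing MMgc or SpiderMonkey API.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical bookkeeping for the request model.
class GCRequests {
 public:
  // Enter a request before touching GC-managed memory. New requests wait
  // while a collection is running.
  void Begin() {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] { return !mCollecting; });
    ++mActive;
  }

  // Leave the request; the thread promises not to touch GC memory.
  void End() {
    std::lock_guard<std::mutex> lock(mMutex);
    --mActive;
    mCond.notify_all();
  }

  // Assumed to be called by one thread only (say, the main thread), from
  // inside its own request. Blocks until every other request has ended or
  // been suspended, so a thread that never suspends stalls the collector.
  template <typename Fn>
  void Collect(Fn collect) {
    std::unique_lock<std::mutex> lock(mMutex);
    mCollecting = true;
    mCond.wait(lock, [this] { return mActive == 1; });
    collect();
    mCollecting = false;
    mCond.notify_all();
  }

  int Active() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mActive;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  int mActive = 0;
  bool mCollecting = false;
};

// RAII helper: suspends the current thread's request around blocking work
// and resumes it on scope exit (waiting if a collection is in progress).
class AutoSuspendRequest {
 public:
  explicit AutoSuspendRequest(GCRequests& requests) : mRequests(requests) {
    mRequests.End();
  }
  ~AutoSuspendRequest() { mRequests.Begin(); }
  AutoSuspendRequest(const AutoSuspendRequest&) = delete;
  AutoSuspendRequest& operator=(const AutoSuspendRequest&) = delete;

 private:
  GCRequests& mRequests;
};
```

In this model, a worker thread would wrap a blocking read in a scope holding an AutoSuspendRequest. A collection started on a worker while the main thread sits in its request would wait in Collect indefinitely, which is the hang described above.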

--BDS

Taras Glek

Aug 30, 2007, 3:02:06 PM

to Benjamin Smedberg

Benjamin Smedberg wrote:
>
> The list of tasks to accomplish this is at least:

Commenting here on things that concern automation.

>
> * Add the request model threadsafety to MMgc
> * Give MMgc the ability to recognize "inner" pointers to objects
> * Identify request start/end points in the codebase (blocking activity)
> * Ensure (how?) that existing locking mechanisms for threadsafe code won't
> deadlock with GC

I don't see a better way to do this than experimentation. Once we
discover bugs and unsafe usage patterns we can start thinking about
hunting those down using static analysis tools.

> * Rewrite XPCOM addref/release handling
> ** Remove or stub out getter_AddRefs, already_AddRefed, and other helper classes
> ** Make member-comptrs call/be DWB

Can rewrite these. Only question is, do these have to be macros? They
make rewriting things later more painful than needed.

> ** Make stack-comptrs raw pointers

Can detect those easily enough with static analysis.

> ** Fix some COM-holding utility classes
> *** nsCOMArray

We should probably be switching away from moz-specific containers to STL
ones. I could probably rewrite these.

> *** nsInterfaceHashKey
> *** nsInterfaceHashtable
> ** Identify XPCOM weakrefs that can be GCRefs

This can't be done completely automatically. Once we identify a common
pattern to look for, then automation can be considered.

> ** Rewrite the other XPCOM weak-references into GCWeakRefs

I'm not sure if the concepts map directly. Shouldn't most weak
references become gc-managed pointers?

Taras

Graydon Hoare

Aug 30, 2007, 4:19:09 PM

to

Jason Orendorff wrote:

> - Create a library to facilitate interop among multiple
> language runtimes in a single process, with distributed garbage
> collection, etc. Like SWIG, only much better. Implement it for
> Tamarin, XPCOM, CPython, and Java.
>
> If the last option sounds crazy, it should, but I'll go ahead and point
> out that we've done interop at least 3 times already (LiveConnect,
> (ScreamingMonkey needs Tamarin/MSCOM interop; ActionMonkey
> needs Tamarin/XPCOM interop). Maybe it's time to do it in a generic
> form that other open source projects can use.

Sorry, just catching up on this thread (and trying to keep it concrete).

Let's untangle "interop" into 3 categories: memory management,
concurrency and calling. And then untangle each of those categories into
two concrete sub-headings: semantics and runtime support library.

Here's my current picture of "multiple language interop" with some
values filled in. Feel free to disagree:

- Memory management
- semantics: conservative GC + refcount/finalize API
- runtime support library: mmGC

- Concurrency
- semantics: ?? something like request model ??
- runtime support library: ?? JSAPI + moz event queue ??

- Calling
- semantics: XPIDL type system (~ subset related to MSCOM)
- runtime support library: tamarin JIT + typelibs

So ... I'm considering what we're doing here to be about "dynamic-izing
the XPCOM/MSCOM/C++ side" not "static-izing the existing dynamic
language runtimes".

Dynamic language runtimes can already synthesize their own object
proxies by inspecting typelibs, and can already walk their own object
graphs. Many -- perhaps most? -- do user-level "threading" via a central
event queue. So we may need to ask them for these services, but every
dynamic language runtime has an API to them. Assuming many dynamic
language runtimes *need* long term integration with us. Some dynamic
languages may just give up and just write compilers to ABC bytecode.
*Cough* JS *cough*. All the better.

C++ does need help, but we *have* the libraries we're intending to use.
The tamarin JIT and mmGC let us perform very dynamic calling and memory
management tasks on C++, quite generally, at a sub-language,
machine-code and memory-address level. That's the whole point. IMO
there's no need to invent other low-level libraries or additional
inter-language semantics here.

(To clarify: I'm assuming the "replacement" for XPConnect will drive the
Tamarin JIT via reflection on typelibs or XPIDL type representations
bundled in ABC, and xptcall will vanish. Correct me if this is not the
plan.)

-Graydon

Robert O'Callahan

Sep 4, 2007, 12:34:32 AM

to

On Aug 30, 9:44 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> - Create a library to facilitate interop among multiple
> language runtimes in a single process, with distributed garbage
> collection, etc. Like SWIG, only much better. Implement it for
> Tamarin, XPCOM, CPython, and Java.
>
> If the last option sounds crazy, it should, but I'll go ahead and point
> out that we've done interop at least 3 times already (LiveConnect,
> (ScreamingMonkey needs Tamarin/MSCOM interop; ActionMonkey
> needs Tamarin/XPCOM interop). Maybe it's time to do it in a generic
> form that other open source projects can use.

It sounds slightly crazy but I think it's doable. We had a project
related to this at IBM --- Parley, that Brendan mentioned --- although
it didn't get anywhere, partly because I left. The basic idea was to
add a distributed mark and sweep phase that multiple runtimes can plug
into. Even generational and copying collectors can participate; VMs
that do generational collection need to temporarily root references
that escape the VM, and copying collectors need to add wrappers that
don't move, or support pinning (which I believe most real copying
collectors do).

Rob

Robert O'Callahan

Sep 4, 2007, 12:43:32 AM

to

On Aug 29, 8:32 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> * Add thread-safety to MMgc using the Spidermonkey request model.

Is this compatible with incremental marking? I don't know the details
of MMgc incremental marking but I fear the complexity, and overhead,
of safe concurrent incremental marking. I would hate to eat a
complexity or performance hit for thread-safe memory management that
is very little used. We've already been down that road with thread-
safe XPConnect.

I suppose it might be possible to perform incremental marking on the
main thread only, avoiding the overhead of concurrent marking.

Also, how about MMgc's reference counted objects, would you make those
thread-safe too? That sounds like another performance hit.

Rob

Robert O'Callahan

Sep 4, 2007, 12:46:56 AM

to

On Aug 29, 8:32 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:

> * Drop AddRef and Release from nsISupports.
>
> * Require all XPCOM objects to be MMgc GCObjects or GCFinalizedObjects,
> allocated from the same GC allocator as all JavaScript objects.

There are some objects, notably nsIFrame and subclasses, that inherit
from nsISupports but aren't actually refcounted. As a pre-step we
would want to stop them inheriting from nsISupports, and have them
inherit from something with QueryInterface only (or better, just get
rid of all uses of QueryInterface on frames).

Rob

Jason Orendorff

Sep 4, 2007, 11:43:55 AM

to

On Sep 4, 12:43 am, Robert O'Callahan <rocalla...@gmail.com> wrote:
> On Aug 29, 8:32 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> > * Add thread-safety to MMgc using the Spidermonkey request model.
>
> Is this compatible with incremental marking? I don't know the details
> of MMgc incremental marking but I fear the complexity, and overhead,
> of safe concurrent incremental marking. I would hate to eat a
> complexity or performance hit for thread-safe memory management that
> is very little used.

Here's the post to read:

https://mail.mozilla.org/pipermail/tamarin-devel/2007-August/000017.html
(under "Maybe incremental is not so bad")

Here are the costs:

* Incremental marking must happen under a global lock. No other code
can be touching GC-managed objects while this happens, just the same
as for non-incremental GC.

* There's a synchronization cost per call to GC::IncrementalMark,
probably negligible in the scheme of things.

* There's an additional cost per "write barrier hit". This happens
when you assign a pointer to a "white" (unmarked, unqueued) object to
a field of a "black" (already marked) object. The white object has to
be queued. This should be relatively rare. The cost, when it
happens, is that you have to atomically test-and-set a bit
(OSAtomicTestAndSet on Mac), which shouldn't be so horrible.
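
As a rough illustration of that last cost, here is a sketch of the barrier's slow path using std::atomic in place of a platform test-and-set. GCThing and Marker are hypothetical names and the layout is invented for the example; MMgc's real mark bits and work queue are organized differently.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Hypothetical object header, only a model of the tri-color scheme.
struct GCThing {
  std::atomic<bool> queued{false};  // set once the object is gray or black
  bool marked = false;              // black: fully scanned by the marker
  GCThing* field = nullptr;         // the one pointer field we model
};

struct Marker {
  std::mutex queueLock;
  std::vector<GCThing*> workQueue;  // gray objects awaiting a scan

  // Runs on every pointer store into a GC object. `marked` only changes
  // during a marking increment, which is assumed to run under the global
  // lock, so reading it here without synchronization is safe in this model.
  void WriteBarrier(GCThing* owner, GCThing* value) {
    owner->field = value;
    if (!value || !owner->marked) return;  // common case: owner not black
    // exchange() is the atomic test-and-set: only the first thread to
    // flip the bit queues the object.
    if (!value->queued.exchange(true)) {
      std::lock_guard<std::mutex> lock(queueLock);
      workQueue.push_back(value);
    }
  }
};
```

Only stores into already-marked objects reach the atomic operation, so the per-store overhead outside a marking cycle is a single branch.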

> Also, how about MMgc's reference counted objects, would you make those
> thread-safe too? That sounds like another performance hit.

How about this: split MMgc::RCObject into two classes,
MMgc::ThreadSafeRCObject and MMgc::SingleThreadRCObject. Choose one
or the other on a per-class basis. This is like what XPCOM
programmers already do. I haven't thought this through thoroughly,
though. I see (theoretical) performance costs even for SingleThread
objects, but not per-refcount and probably acceptable.
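
One way the split could look, sketched with a count policy chosen per class. RCObjectBase and the two policy classes are hypothetical; the typedef names just echo the ones suggested above, and this is not how MMgc::RCObject is actually written.

```cpp
#include <atomic>

// Plain counter for objects confined to one thread.
class SingleThreadRefCount {
 public:
  void Increment() { ++mCount; }
  bool Decrement() { return --mCount == 0; }  // true when count hits zero
  int Get() const { return mCount; }

 private:
  int mCount = 0;
};

// Atomic counter for objects shared across threads.
class ThreadSafeRefCount {
 public:
  void Increment() { mCount.fetch_add(1, std::memory_order_relaxed); }
  bool Decrement() {
    return mCount.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }
  int Get() const { return mCount.load(std::memory_order_acquire); }

 private:
  std::atomic<int> mCount{0};
};

template <typename CountPolicy>
class RCObjectBase {
 public:
  virtual ~RCObjectBase() {}
  void IncrementRef() { mRefs.Increment(); }
  // Returns true when the object has become a candidate for collection.
  bool DecrementRef() { return mRefs.Decrement(); }
  int RefCount() const { return mRefs.Get(); }

 private:
  CountPolicy mRefs;
};

typedef RCObjectBase<SingleThreadRefCount> SingleThreadRCObject;
typedef RCObjectBase<ThreadSafeRefCount> ThreadSafeRCObject;

// Each class picks its base once (illustrative class names).
class DomNode : public SingleThreadRCObject {};
class SocketListener : public ThreadSafeRCObject {};
```

The choice is made at the class declaration, so single-threaded classes pay no atomic operations per refcount change.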

The bigger problem with MMgc deferred reference counting (DRC) is how
to expose it to users. COM's AddRef/Release contract may be annoying,
but at least it's simple. Supporting both DRC and straight-up GC
means supporting at least 2 totally different memory-management
contracts, on a per-interface or per-object basis. It reminds me of
the proliferation of open source licenses. How to do this without
burdening users is an open question. Suggestions welcome-- DRC is
good stuff (and as Tamarin uses it for everything, it's probably
unavoidable).

-j

Robert O'Callahan

Sep 4, 2007, 7:47:18 PM

to

On Sep 5, 3:43 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> The bigger problem with MMgc deferred reference counting (DRC) is how
> to expose it to users. COM's AddRef/Release contract may be annoying,
> but at least it's simple. Supporting both DRC and straight-up GC
> means supporting at least 2 totally different memory-management
> contracts, on a per-interface or per-object basis. It reminds me of
> the proliferation of open source licenses. How to do this without
> burdening users is an open question. Suggestions welcome-- DRC is
> good stuff (and as Tamarin uses it for everything, it's probably
> unavoidable).

What's the impact of using DRC for everything in our own code?

BTW is there a document somewhere that summarizes the run-time costs
of inheriting from GCObject etc, or should I just look at the source?

Rob

Jason Orendorff

Sep 5, 2007, 10:10:10 AM

to

On Aug 30, 4:19 pm, Graydon Hoare <gray...@mozilla.com> wrote:
> Dynamic language runtimes can already synthesize their own object
> proxies by inspecting typelibs, and can already walk their own object
> graphs. Many -- perhaps most? -- do user-level "threading" via a central
> event queue. So we may need to ask them for these services, but every
> dynamic language runtime has an API to them. Assuming many dynamic

> language runtimes *need* long term integration with us. [...]

I don't think this recognizes the amount of work involved in something
like PyXPCOM.

It seems to me our choices are:
1) don't try to support multiple language bindings
2) support them in a common way, with common
library code, in a form that might solve
problems for other people too (and thus attract
contributors)
3) support them as we do PyXPCOM and
LiveConnect now, i.e. by ourselves, separately,
and not especially well

I consider #1 a strong option. #2 is pretty nutty. I brought it up
for two reasons. First, #3 is a lot like #2, only without pooling any
effort or designing for pluggability. Under #3, lesser-used languages
(like Python) are unavoidably second-class citizens. They'll never
reflect XPCOM with high fidelity unless someone spends an unlikely
amount of effort on it. My impression is that PyXPCOM is already
lagging and unlikely to catch up, much less keep pace with coming
changes.

Second, I was writing under a hypothetical that assumed PyXPCOM is
something we want to keep and maintain.

> Some dynamic
> languages may just give up and just write compilers to ABC bytecode.
> *Cough* JS *cough*. All the better.

This also seems to underestimate effort. A compiler alone doesn't get
you halfway to, say, Jython.

I'm not convinced we want multi-language support. It's real expensive
and not very useful.

-j

Benjamin Smedberg

Sep 5, 2007, 11:53:28 AM

to

Robert O'Callahan wrote:

> What's the impact of using DRC for everything in our own code?

I'll summarize what I learned from Jason on IRC:

* RCObject is an early-collection optimization: when an RCObject refcount
goes to zero, it is placed in the ZCT (zero-count table)... at some frequent
interval, MMgc scans the stack to make sure there are no stack pointers to
ZCT objects and then collects them. This collection is shallow and fast.

* Tamarin uses RCObject for all JS objects, including strings.

The costs of RCObject:

* the objects keep an extra int member
* if threadsafety is needed, we need atomic increment/decrement
* cycles between RCObjects or any references to RCObjects from GCObjects
will not be collected until a "standard" GC
* Any object that holds a reference to an RCObject (a DRCWB) has to have a
finalizer
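
A toy model of the ZCT mechanism described above, assuming the stack scan has already produced a list of candidate pointers. RCThing and ZeroCountTable are hypothetical names, and the `freed` flag stands in for actually releasing memory; the real MMgc data structures differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

struct RCThing {
  int refs = 0;        // counts heap references only, not stack pointers
  bool freed = false;  // stands in for actually releasing the memory
};

class ZeroCountTable {
 public:
  // Gaining a heap reference takes the object off the table, if present.
  void AddRef(RCThing* t) {
    if (t->refs++ == 0) mZct.erase(t);
  }

  // Dropping to zero does not free immediately: a stack pointer that was
  // never counted may still be using the object.
  void Release(RCThing* t) {
    if (--t->refs == 0) mZct.insert(t);
  }

  // `stackRoots` models the result of conservatively scanning the stack.
  void Reap(const std::vector<RCThing*>& stackRoots) {
    for (auto it = mZct.begin(); it != mZct.end();) {
      RCThing* t = *it;
      bool onStack = std::find(stackRoots.begin(), stackRoots.end(), t) !=
                     stackRoots.end();
      if (onStack) {
        ++it;  // still in use; stays on the table for the next reap
      } else {
        t->freed = true;
        it = mZct.erase(it);
      }
    }
  }

  std::size_t Pending() const { return mZct.size(); }

 private:
  std::set<RCThing*> mZct;
};
```

Because stack pointers are never counted, locals need no AddRef/Release traffic; the price is the periodic stack scan in Reap.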

From reading code, I've also gleaned that the current DRCWB system assumes
that you have a toplevel pointer, not an internal pointer like we normally
keep in XPCOM. To work around this you either have to
dynamic_cast<RCObject*>, which requires RTTI and may not be cheap (needs
measurement), or keep virtual functions, which means AddRef/Release.

I tend to think that the costs of RCObject for general "XPCOMGC" use are too
high: in particular, I think we want to have pervasive cycles between DOM
objects: parent<->children DOM nodes as well as node<->document references.

--BDS

Jason Orendorff

Sep 5, 2007, 4:33:00 PM

to

On Sep 4, 7:47 pm, Robert O'Callahan <rocalla...@gmail.com> wrote:
> What's the impact of using DRC for everything in our own code?

I think we would have to keep AddRef and Release as virtual functions,
and we would have to keep the hack in DOM where child nodes don't hold
real references to their siblings or parents. We would keep all the
reference-counting scaffolding we have now; it would just be
backstopped by MMgc instead of the cycle collector.

A lot of this pain is because multiple inheritance and DRC don't mix
very well, as Benjamin pointed out.

Benjamin also thinks DRC is, in fact, avoidable in XPCOM-- we'll use a
GCObject wrapper when passing DRC script objects to XPCOM code.

> BTW is there a document somewhere that summarizes the run-time costs
> of inheriting from GCObject etc, or should I just look at the source?

Look at the source, probably.

-j

mhammond

Sep 5, 2007, 7:35:49 PM

to

On Sep 6, 12:10 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:

> It seems to me our choices are:
> 1) don't try to support multiple language bindings
> 2) support them in a common way, with common
> library code, in a form that might solve
> problems for other people too (and thus attract
> contributors)
> 3) support them as we do PyXPCOM and
> LiveConnect now, i.e. by ourselves, separately,
> and not especially well
>
> I consider #1 a strong option. #2 is pretty nutty. I brought it up
> for two reasons. First, #3 is a lot like #2, only without pooling any
> effort or designing for pluggability. Under #3, lesser-used languages
> (like Python) are unavoidably second-class citizens. They'll never
> reflect XPCOM with high fidelity unless someone spends an unlikely
> amount of effort on it. My impression is that PyXPCOM is already
> lagging and unlikely to catch up, much less keep pace with coming
> changes.

I think it would be a step backwards for the platform to drop external
languages. Many people who adopt the platform choose to use Python
for valid reasons - such projects include ActiveState's Komodo and the
OLPC project. In both cases, the ability to use Python was crucial to
the choice to use the platform - indeed, in ActiveState's case, they
felt so strongly about using Python that they funded the creation of
PyXPCOM. Even today, for any non-trivial code they use Python,
even if that means writing a little "shim" in Javascript to enable
that.

Also, pyxpcom is not lagging from an xpcom POV. xpcom itself is not
undergoing many changes, so it is keeping up fine. What is *not*
happening is decent integration into the non XPCOM world. A lot of the
DOM, for example, is exposed in a way that is JS specific. Work to
integrate JS and other languages so, for example, 'expando' objects
can be accessed in different languages is lagging. Python does *not*
have access to parts of the platform that have been de-comtaminated,
or implemented using anything other than xpcom. So I would argue that
it is not pyxpcom that is lagging, but instead the platform itself is
trying to steam away from xpcom, and in the process also steaming away
from the integration opportunities xpcom has already demonstrated. I
understand xpcom has a number of issues, but I fear that some of the
proposed solutions risk throwing out the baby with the bath water.

As you are, I'm also slightly skeptical that "insisting" that
languages which want to play in our new playground be reimplemented on
a new virtual machine will be fruitful. Even if such an
implementation of Python was trivial to put together, I don't believe
it would keep those existing Python based projects happy. People
choose to use Python inside the mozilla architecture both for the
language, and for the library. In the same way that any non-trivial
Python program can't run the same on CPython and IronPython (ie,
the .NET port), it will not be a simple matter of swapping out the
language implementation and still expecting existing Python code to
run.

> I'm not convinced we want multi-language support. It's real expensive
> and not very useful.

I think that we should try and remember why we wanted to open up the
existing architecture to external languages in the first place, and
see if those reasons are still valid. I can't see why they are not,
but if we really do want to scale back the scope of this as a general
purpose "application platform", then I agree it would make our lives
much easier. But is this really all about making our lives easy? ;)

Cheers,

Mark

Jonas Sicking

Sep 5, 2007, 6:45:34 PM

to

> I tend to think that the costs of RCObject for general "XPCOMGC" use are too
> high: in particular, I think we want to have pervasive cycles between DOM
> objects: parent<->children DOM nodes as well as node<->document references.

I agree. I don't see that much added benefit in using RCObjects rather
than just GCObjects. The only win is earlier destruction of objects, at
the cost of performance overhead and complexity.

It does worry me a little though that if we make all XPCOM objects
GCObjects, we won't destroy any XPCOM objects until the first GC. It
would be good to create a testbuild that doesn't destroy any XPCOM
objects and see how much memory such a build uses just to start up the
browser. During startup I don't think we currently do a GC, and we
probably don't want to for performance reasons.

/ Jonas

Jonas Sicking

Sep 5, 2007, 8:36:31 PM

to

> BTW is there a document somewhere that summarizes the run-time costs
> of inheriting from GCObject etc, or should I just look at the source?

GCObject has an inlined empty constructor, and no destructor.
GCFinalizedObject does have a virtual empty destructor. So the cost
is nothing. What does cost, though, is that allocation is now done through
MMgc functions that are probably slightly slower than plain malloc/free.

Additionally, these functions zero the memory before returning it. I'm
not entirely sure why, but my guess is that this makes it less likely
that the conservative GC will find bogus edges.

/ Jonas

Blake Kaplan

Sep 6, 2007, 12:30:15 AM

to

Jonas Sicking wrote:
> GCObject has an inlined empty constructor, and no destructor.
> GCFinalizedObject does have a virtual empty destructor. So the cost
> is nothing. What does cost though is that allocation is now done through
> MMgc functions that are probably slightly slower than simply malloc/free
> is.

I believe this probably depends on how the threadsafety discussion works
itself out. As I understand things now, malloc is very expensive because
it's the system malloc and must be threadsafe. If MMgc doesn't have to
lock around each malloc call, then I think it's very possible that it'll
be as fast or faster than the system malloc.
--
Blake Kaplan

Robert O'Callahan

Sep 6, 2007, 5:11:55 AM

to

On Sep 6, 4:30 pm, Blake Kaplan <mrb...@gmail.com> wrote:

> Jonas Sicking wrote:
> I believe this probably depends on how the threadsafety discussion works
> itself out. As I understand things now, malloc is very expensive because
> it's the system malloc and must be threadsafe.

The system malloc might suck, but there are plenty of malloc
implementations that use per-thread allocation pools.

Rob

Robert O'Callahan

Sep 6, 2007, 5:17:40 AM

to

On Sep 6, 11:35 am, mhammond <mhamm...@skippinet.com.au> wrote:
> As you are, I'm also slightly skeptical that "insisting" that
> languages which want to play in our new playground be reimplemented on
> a new virtual machine will be fruitful.

Me too. But there's a less intrusive option, which is to ask their VM
to participate in a distributed mark and sweep algorithm using a
common interface. This can be done without constraining the
representation of the VM's objects.

I understand that's still a major requirement, especially since the
interface doesn't exist yet and when it does exist VMs will have to be
retrofitted with it in sensitive areas of their code. But I don't see
any possibility of collecting cycles across VM boundaries unless the
VMs participate in some kind of global tracing algorithm.

Rob

Benjamin Smedberg

Sep 6, 2007, 8:57:48 AM

to

Jonas Sicking wrote:

> It does worry me a little though that if we make all XPCOM objects
> GCObjects, we won't destroy any XPCOM objects until the first GC. It
> would be good to create a testbuild that doesn't destroy any XPCOM
> objects and see how much memory such a build uses just to start up the
> browser. During startup I don't think we currently do a GC, and we
> probably don't want to for performance reasons.

Or at least instrument how many and what kind of objects are *deleted* during
a startup run up to some arbitrary point (the beginning of the main event
loop, perhaps).

Of course if we're deleting lots of objects during startup, we should
probably examine why we were allocating those objects in the first place.

--BDS

mhammond

Sep 6, 2007, 8:17:25 PM

to

That would be reasonable assuming the *only* problem we see with cross-
language xpcom is collecting cycles - but it seems to me that this
thread has identified a number of other issues too - for example,
there was discussion of dropping AddRef and Release and moving to
assuming MMgc or similar is the memory manager. Such issues go beyond
simply integrating with a cycle collection detector (and bring us
right back to the start of this thread :)

Cheers,

Mark

Robert O'Callahan

Sep 7, 2007, 5:32:27 AM

to

On Sep 7, 12:17 pm, mhammond <mhamm...@skippinet.com.au> wrote:
> That would be reasonable assuming the *only* problem we see with cross-
> language xpcom is collecting cycles - but it seems to me that this
> thread has identified a number of other issues too - for example,
> there was discussion of dropping AddRef and Release and moving to
> assuming MMgc or similar is the memory manager. Such issues go beyond
> simply integrating with a cycle collection detector (and bring us
> right back to the start of this thread :)

The scheme I'm suggesting would eliminate the need for reference
counting, without forcing everyone to use a common memory manager.

The basic idea is to have all VMs participate in a global mark-and-
sweep collection, by plugging them into a common API so that one VM
can mark objects managed by another VM. Objects that might be
referenced by foreign VMs can only be collected during this global GC.

This approach can collect cycles but it's not just a cycle detector.
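
A very rough sketch of what such a common API might look like, under the assumption that each VM can enumerate its roots, mark one of its own objects while reporting its outgoing edges, and sweep. VM, ObjRef, GlobalCollect, and ToyVM are all hypothetical names invented for this example; no such interface exists yet.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical cross-VM handle: which VM owns the object, and its id there.
struct ObjRef {
  std::size_t vm;
  int id;
};

// The interface every participating VM would implement.
class VM {
 public:
  virtual ~VM() {}
  virtual std::vector<ObjRef> Roots() const = 0;
  // Marks `id`. Returns false if it was already marked (or unknown);
  // otherwise fills `edges` with its references, local or foreign.
  virtual bool Mark(int id, std::vector<ObjRef>* edges) = 0;
  // Frees unmarked objects, clears marks, returns the number freed.
  virtual std::size_t Sweep() = 0;
};

// Global mark phase followed by a per-VM sweep.
std::size_t GlobalCollect(const std::vector<VM*>& vms) {
  std::vector<ObjRef> work;
  for (VM* vm : vms) {
    std::vector<ObjRef> roots = vm->Roots();
    work.insert(work.end(), roots.begin(), roots.end());
  }
  while (!work.empty()) {
    ObjRef ref = work.back();
    work.pop_back();
    std::vector<ObjRef> edges;
    if (vms[ref.vm]->Mark(ref.id, &edges))
      work.insert(work.end(), edges.begin(), edges.end());
  }
  std::size_t freed = 0;
  for (VM* vm : vms) freed += vm->Sweep();
  return freed;
}

// Toy VM that stores its object graph as an adjacency map.
class ToyVM : public VM {
 public:
  explicit ToyVM(std::size_t self) : mSelf(self) {}
  void AddObject(int id, bool root = false) {
    mEdges[id];
    if (root) mRoots.insert(id);
  }
  void AddEdge(int from, ObjRef to) { mEdges[from].push_back(to); }
  bool Has(int id) const { return mEdges.count(id) != 0; }

  std::vector<ObjRef> Roots() const override {
    std::vector<ObjRef> out;
    for (int id : mRoots) out.push_back(ObjRef{mSelf, id});
    return out;
  }
  bool Mark(int id, std::vector<ObjRef>* edges) override {
    auto it = mEdges.find(id);
    if (it == mEdges.end() || !mMarked.insert(id).second) return false;
    *edges = it->second;
    return true;
  }
  std::size_t Sweep() override {
    std::size_t freed = 0;
    for (auto it = mEdges.begin(); it != mEdges.end();) {
      if (mMarked.count(it->first)) {
        ++it;
      } else {
        it = mEdges.erase(it);
        ++freed;
      }
    }
    mMarked.clear();
    return freed;
  }

 private:
  std::size_t mSelf;
  std::map<int, std::vector<ObjRef> > mEdges;
  std::set<int> mRoots;
  std::set<int> mMarked;
};
```

Nothing in the interface constrains how a VM represents its objects. An unrooted cycle that crosses between two ToyVM instances is never marked by either, so both halves are swept without any reference counts.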

Rob