GC/DOM integration

Bill McCloskey

May 25, 2012, 8:37:57 PM

to dev-tech-js-en...@lists.mozilla.org, Ben Turner, ms2...@gmail.com, peterv, Bobby Holley

I had some time today while waiting for all my try runs to turn orange, so I wrote up a wiki article about how we expect new GC features to affect the rest of the browser code. It has details about when write barriers are needed and how things will change for moving GC. Hopefully this will be useful to some of the DOM people.

https://developer.mozilla.org/en/SpiderMonkey/GCIntegration

-Bill

Boris Zbarsky

May 25, 2012, 10:06:50 PM

On 5/25/12 8:37 PM, Bill McCloskey wrote:
> I had some time today while waiting for all my try runs to turn orange, so I wrote up a wiki article about how we expect new GC features to affect the rest of the browser code. It has details about when write barriers are needed and how things will change for moving GC. Hopefully this will be useful to some of the DOM people.
>
> https://developer.mozilla.org/en/SpiderMonkey/GCIntegration

Thanks for writing this up! Some questions:

"Avoid having C++ objects that point to JS objects". We do that anyway,
because having it be that way is a huge pain, but sometimes it's not
really avoidable given the current set of APIs SpiderMonkey exposes.
Things like addEventListener come to mind, plus the proto object array
you already ran into, and the fact that web specs can generally define
APIs that involve the DOM implementation holding on to JS objects. I
assume there will still be an API for doing that? There's no problem
with holding on to some sort of handle instead of an actual JSObject*
here, of course, to make moving the objects possible....

That's in terms of moving. In terms of write barriers, the proto array
doesn't need them (since it never loses refs), but event listeners
should probably switch to js::ObjectPtr?
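
For readers unfamiliar with the barrier in question, here is a minimal sketch of a pre-write barrier under snapshot-at-the-beginning incremental marking. Every name in it (ToyObject, ToyGC, BarrieredPtr) is hypothetical and invented for illustration; it is not SpiderMonkey API and makes no claim about how js::ObjectPtr is actually implemented.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins; none of these names are SpiderMonkey API.
struct ToyObject {
    bool marked = false;
};

struct ToyGC {
    bool incrementalMarking = false;    // true between slices of an incremental GC
    std::vector<ToyObject*> markStack;  // objects that still need to be scanned

    void mark(ToyObject* obj) {
        if (obj && !obj->marked) {
            obj->marked = true;
            markStack.push_back(obj);
        }
    }
};

// A C++-side field that points at a GC object. Overwriting it while
// incremental marking is in progress first marks the old target, so a
// reference that existed when the GC started cannot be lost.
class BarrieredPtr {
public:
    explicit BarrieredPtr(ToyGC& gc) : gc_(gc) {}

    void set(ToyObject* next) {
        if (gc_.incrementalMarking) {
            gc_.mark(value_);  // barrier on the value being overwritten
        }
        value_ = next;
    }

    ToyObject* get() const { return value_; }

private:
    ToyGC& gc_;
    ToyObject* value_ = nullptr;
};
```

In this model a structure that only ever gains references, like the proto array, never overwrites a live pointer, which is why it would not need the barrier.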

-Boris

Bill McCloskey

May 26, 2012, 1:36:14 AM

to Boris Zbarsky, dev-tech-js-en...@lists.mozilla.org

----- Original Message -----
> From: "Boris Zbarsky" <bzba...@mit.edu>
> To: dev-tech-js-en...@lists.mozilla.org
> Sent: Friday, May 25, 2012 7:06:50 PM
> Subject: Re: [JS-internals] GC/DOM integration
>
> "Avoid having C++ objects that point to JS objects". We do that
> anyway,
> because having it be that way is a huge pain, but sometimes it's not
> really avoidable given the current set of APIs SpiderMonkey exposes.
> Things like addEventListener come to mind, plus the proto object
> array
> you already ran into, and the fact that web specs can generally
> define
> APIs that involve the DOM implementation holding on to JS objects. I
> assume there will still be an API for doing that? There's no
> problem
> with holding on to some sort of handle instead of an actual JSObject*
> here, of course, to make moving the objects possible....

I don't understand what you mean by addEventListener. Could you give a link to the code?

The proto array I can see being a problem, since it's (presumably?) not a fixed size. Would it be possible to use a JS array rather than a C++ array? That would make tracing of it automatic, and it might make things go more smoothly for generational collection.

I realize that we're always going to have cases where C++ objects need to refer to JS objects. Existing mechanisms like the xpconnect object holding stuff (http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#254) seem to work okay, I guess.

In the new DOM bindings, is it planned that every C++ object will always have a corresponding JS object, or will the JS objects be created lazily? If it's the former, I was kinda hoping that the JS objects could form the "backbone" of the data structure, with C++ stuff kind of hanging limply at the sides. However, I don't know how realistic that really is.

> That's in terms of moving. In terms of write barriers, the proto
> array
> doesn't need them (since it never loses refs)

Agreed.

> but event listeners
> should probably switch to js::ObjectPtr?

I think that nsJSEventListener used the xpconnect object holding mechanism to root its pointers. That's safe, because that stuff will always be traced in the first slice, so there's no danger that the pointer could be overwritten before being traced. Maybe I'm looking at the wrong code, though.

I did see some worrisome stuff in the new XHR bindings:
http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1433
I'm guessing that these fields are not immutable? If so, would it be possible to move them to reserved slots of the JS object?

-Bill

Boris Zbarsky

May 26, 2012, 2:29:41 AM

On 5/26/12 1:36 AM, Bill McCloskey wrote:
> I don't understand what you mean by addEventListener. Could you give a link to the code?

The basic idea is that you pass a JS function to a DOM object and then
later it calls it as needed.

Right now this is sometimes implemented using XPCWrappedJS and a C++
interface and in other cases using nsJSEventListener.

We'd sort of like to move to some simpler and human-understandable setup
for it, though. If/when we do we'll probably try to encapsulate
whatever JSAPI finagling is needed as much as possible, like XPConnect
does now.

> The proto array I can see being a problem, since it's (presumably?) not a fixed size.

It's a fixed size at the moment. Longer-term, we'll see. If we get to
the point where it can't be a fixed size (because it'll be too big and
hence use too much memory), it'll need to stop being an array
altogether, I suspect.

> Would it be possible to use a JS array rather than a C++ array?

It would, I think, but at some performance cost, obviously. Getting
entries out of this array quickly is on the critical path for creating a
new JS reflection of a DOM object, and it's not like we've been trying
hard to optimize JSAPI access to things like array elements. :(

> I realize that we're always going to have cases where C++ objects need to refer to JS objects. Existing mechanisms like the xpconnect object holding stuff (http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#254) seems to work okay, I guess.

Well... for some values of OK, yes. It has a tendency to be
overcomplicated and slow. ;) And to defeat the whole incrementality bit
slightly, since as you noted it gets traced first in any GC. This is in
fact going to be more and more of a problem going forward: if we're
doing lots of per-compartment GCs, the fact that each one is tracing all
xpconnect stuff globally (which seems to be the case) will make them a
lot more expensive than they would be otherwise.

> In the new DOM bindings, is it planned that every C++ object will always have a corresponding JS object, or will the JS objects be created lazily?

The latter, at the moment, to save memory and because that's what we
already have infrastructure for. We could consider the other, though,
now that JSObjects are smaller. It doesn't necessarily help, unless we
move all the various member storage into those JS objects as you
suggest. And the problem with _that_ is that we need to get to that
data from C++, often fast, and getting something out of a JS object
from C++ is painfully slow.

This is a problem we should really solve, though; servo will need some
solution here too, especially if we want to avoid a cycle collector there.

> I think that nsJSEventListener used the xpconnect object holding mechanism to root its pointers.

I believe that's correct, right now.

> I did see some worrisome stuff in the new XHR bindings:
> http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1433

That's worker-specific code, fwiw.... Workers _do_ have JS objects
always in existence for C++ objects. The problem is that they then need
to trace all these objects, because the JS objects may not be pointing
to each other.

> I'm guessing that these fields are not immutable?

mUpload in this case is null initially, can get set once, then sticks
around with that value. The actual return value of GetJSObject() on it
is just getting stuff from the wrapper cache that mUpload inherits from.
Does that count as "immutable"?

Note that the GetJSObject() call here explicitly bypasses the normal
wrapper cache read barrier. Not sure whether that matters, since it's
in the middle of tracing the object anyway.

> If so, would it be possible to move them to reserved slots of the JS object?

That would be somewhat hard: the jsclass for these objects is basically
code-generated based on the IDL, but the set of things that need tracing
depends on the implementation. Obviously something can be hacked
together, but it won't exactly be pretty...

-Boris

Andrew McCreight

May 26, 2012, 8:22:22 AM

to dev-tech-js-en...@lists.mozilla.org

----- Original Message -----
> Well... for some values of OK, yes. It has a tendency to be
> overcomplicated and slow. ;) And to defeat the whole incrementality
> bit slightly, since as you noted it gets traced first in any GC. This is
> in fact going to be more and more of a problem going forward: if we're
> doing lots of per-compartment GCs, the fact that each one is tracing
> all xpconnect stuff globally (which seems to be the case) will make them
> a lot more expensive than they would be otherwise.

I don't think marking XPConnect roots is a huge bottleneck right now, but I could be wrong.

If it does become a problem, I don't think it would be that hard to make them incremental, as the functions to add and remove objects from the XPConnect root sets would serve as natural points to place read and write barriers. If an object gets removed from a root set during an incremental GC, mark it. You'd also need to add some support for incrementally scanning the root set linked lists, and dealing with the currently scanned object being removed, but that would be done in XPC anyways, so it doesn't seem like an insurmountable difficulty.

These data structures could also probably be stored on a per-compartment basis, if it ends up that we're wasting a lot of time adding roots for compartments we're not collecting.
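
As a rough illustration of the idea above, here is a sketch of a root set that is scanned in slices and uses add/remove as barrier points. It is a toy model: ToyObject, ToyRootSet and the slice budget are hypothetical names, not XPConnect code, and marking newly added roots is a deliberately conservative choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Toy stand-in for a GC thing; not a real JSObject.
struct ToyObject {
    bool marked = false;
};

class ToyRootSet {
public:
    using Handle = std::list<ToyObject*>::iterator;

    Handle add(ToyObject* obj) {
        if (scanning_) {
            obj->marked = true;  // conservative: treat roots added mid-scan as live
        }
        return roots_.insert(roots_.end(), obj);
    }

    void remove(Handle h) {
        if (scanning_) {
            (*h)->marked = true;  // barrier: a root removed mid-scan is still marked
            if (h == cursor_) {
                ++cursor_;        // keep the scan cursor valid across the erase
            }
        }
        roots_.erase(h);
    }

    void beginScan() {
        scanning_ = true;
        cursor_ = roots_.begin();
    }

    // Marks at most `budget` roots; returns true once every root was scanned.
    bool scanSlice(std::size_t budget) {
        assert(scanning_);
        for (std::size_t i = 0; i < budget && cursor_ != roots_.end(); ++i) {
            (*cursor_)->marked = true;
            ++cursor_;
        }
        if (cursor_ == roots_.end()) {
            scanning_ = false;
            return true;
        }
        return false;
    }

private:
    std::list<ToyObject*> roots_;
    Handle cursor_;
    bool scanning_ = false;
};
```

A per-compartment variant would simply keep one such set per compartment and scan only the sets belonging to compartments being collected.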

Andrew

Bill McCloskey

May 27, 2012, 5:49:30 PM

to Boris Zbarsky, dev-tech-js-en...@lists.mozilla.org

----- Original Message -----
> From: "Boris Zbarsky" <bzba...@mit.edu>
> To: dev-tech-js-en...@lists.mozilla.org
> Sent: Friday, May 25, 2012 11:29:41 PM
> Subject: Re: [JS-internals] GC/DOM integration
>

> Well... for some values of OK, yes. It has a tendency to be
> overcomplicated and slow. ;) And to defeat the whole incrementality
> bit
> slightly, since as you noted it gets traced first in any GC. This is
> in
> fact going to be more and more of a problem going forward: if we're
> doing lots of per-compartment GCs, the fact that each one is tracing
> all
> xpconnect stuff globally (which seems to be the case) will make them
> a
> lot more expensive than they would be otherwise.

Andrew's right, root marking is usually quite fast. It's normally a few milliseconds per GC. The one area where I do worry is generational GC. Doing 5ms of root marking for each nursery collection is pretty awful. However, I'm hopeful that we can overcome that problem. For example, I'm guessing that most JSObjects held by C++ objects will already be tenured when they're registered with XPConnect.

> > In the new DOM bindings, is it planned that every C++ object will
> > always have a corresponding JS object, or will the JS objects be
> > created lazily?
>
> The latter, at the moment, to save memory and because that's what we
> already have infrastructure for. We could consider the other,
> though,
> now that JSObjects are smaller. It doesn't necessarily help, unless
> we
> move all the various member storage into those JS objects as you
> suggest. And the problem with _that_ is that we need to get to that
> data from C++, often fast, and getting something out of a JS object
> from C++ is painfully slow.

I understand the concern about memory. Do we have any idea what percentage of DOM nodes end up with JS counterparts on typical web sites? Or does it vary too much?

I'm surprised that speed is a problem. For accessing reserved slots, we have js::GetReservedSlot and js::SetReservedSlot in jsfriendapi.h, which are pretty fast. If need be, I think we could even make them a little faster.

> > I did see some worrisome stuff in the new XHR bindings:
> > http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1433
>
> That's worker-specific code, fwiw.... Workers _do_ have JS objects
> always in existence for C++ objects. The problem is that they then
> need
> to trace all these objects, because the JS objects may not be
> pointing
> to each other.

OK, I guess there's no problem right now. Incremental GC is disabled for workers.

> > If so, would it be possible to move them to reserved slots of the
> > JS object?
>
> That would be somewhat hard: the jsclass for these objects is
> basically
> code-generated based on the IDL, but the set of things that need
> tracing
> depends on the implementation. Obviously something can be hacked
> together, but it won't exactly be pretty...

I don't really understand the problem. If this stuff was stored in reserved slots, then you wouldn't need any tracing, right?

-Bill

Boris Zbarsky

May 27, 2012, 11:52:51 PM

On 5/26/12 8:22 AM, Andrew McCreight wrote:
> I don't think marking XPConnect roots is a huge bottleneck right now, but I could be wrong.

It was 50+% of the GC time in the admittedly odd circumstances of bug
754201 (or more than 2ms per GC call on my hardware). That's with two
tabs open. The basic problem, though, is that XPConnect root marking
scales with the total number of XPConnect objects around, afaict, so if
I open more tabs it gets slower and slower. And if it's already taking
2ms, then it can't get that much slower before we're into 16ms+ "makes
frames skip" territory, unless I'm missing something.

> If it does become a problem, I don't think it would be that hard to make them incremental, as the functions to add and remove objects from the XPConnect root sets would serve as natural points to place read and write barriers. If an object gets removed from a root set during an incremental GC, mark it. You'd also need to add some support for incrementally scanning the root set linked lists, and dealing with the currently scanned object being removed, but that would be done in XPC anyways, so it doesn't seem like an insurmountable difficulty.

OK, good.

> These data structures could also probably be stored on a per-compartment basis

I think eventually we'll want this.

-Boris

Boris Zbarsky

May 28, 2012, 12:09:38 AM

On 5/27/12 5:49 PM, Bill McCloskey wrote:
> Andrew's right, root marking is usually quite fast. It's normally a few milliseconds per GC.

I'd love to see numbers for this. Is this one of the GC stats we log to
the console? If not, could we log it, please?

> I understand the concern about memory. Do we have any idea what percentage of DOM nodes end up with JS counterparts on typical web sites? Or does it vary too much?

It probably varies a huge amount. Some sites it'll be close to 0 (e.g.
large static tables, huge mail archives, etc, where there is no script
involved). On the other hand, some sites build the whole site from JS
so it's 100% at least transiently (right now of course we allow those JS
reflections to get GCed unless the site has set expandos on them, so
even in the 100% cases we might be winning some memory).

We could try to do some measurements here, if desired.

> I'm surprised that speed is a problem. For accessing reserved slots, we have js::GetReservedSlot and js::SetReservedSlot in jsfriendapi.h, which are pretty fast

They're a _lot_ slower than an actual inlined C++ member get, in my
measurements. If nothing else, they have to do a pointer-chase to the
jsclass just to find out what sort of slot they're dealing with.

I'd be interested in the numbers for an implementation of effectively
querySelectorAll("*") in JSAPI/reserved-slot form. It needs to:

1) Do a loop with the equivalent of nsINode::GetNextNode
2) For each node do the equivalent of SelectorMatches.

If the time is anywhere close to the ballpark of querySelectorAll("*")
on any sort of reasonably-sized document, I'll be pretty surprised...

But also, plain reserved slots wouldn't get us far enough: some of our
members here are not just single pointers but things like hashtables,
linked lists, arrays, etc. So we'd need to basically code up jsapi
implementations of these various data structures such that the gc knew
about all the references involved. So we need to worry about the
performance of all of those.

>> That would be somewhat hard: the jsclass for these objects is
>> basically
>> code-generated based on the IDL, but the set of things that need
>> tracing
>> depends on the implementation. Obviously something can be hacked
>> together, but it won't exactly be pretty...
>
> I don't really understand the problem. If this stuff was stored in reserved slots, then you wouldn't need any tracing, right?

The first problem is that the number of reserved slots required depends
on information that the code generator does not have (implementation
details of the object, whereas the code generator has the IDL), and the
code generator is what's generating the JSClass declaration.

We might be able to expose the needed number of reserved slots as a
static method on the underlying object class. Except in some cases we
have different C++ implementations, with different sets of members, but
sharing a single API, for the same JSClass.
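
To make the static-method idea and its limitation concrete, here is a small sketch. All names (ToyClassInfo, ToyImplA, ToyImplB, ReservedSlots) are hypothetical; this is not how the real code generator or JSClass works, only a model of the constraint described above.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for the generated class description.
struct ToyClassInfo {
    const char* name;
    std::size_t reservedSlots;
};

// Each implementation class reports how many traced members it has.
struct ToyImplA {
    static constexpr std::size_t ReservedSlots() { return 1; }
};
struct ToyImplB {
    static constexpr std::size_t ReservedSlots() { return 3; }
};

// One implementation per class: the slot count comes straight from it.
template <typename Impl>
constexpr ToyClassInfo classFor(const char* name) {
    return ToyClassInfo{name, Impl::ReservedSlots()};
}

// Two implementations sharing one class: the class must reserve enough
// slots for the larger of the two, wasting space for the smaller one.
template <typename ImplA, typename ImplB>
constexpr ToyClassInfo sharedClassFor(const char* name) {
    return ToyClassInfo{name,
                        ImplA::ReservedSlots() > ImplB::ReservedSlots()
                            ? ImplA::ReservedSlots()
                            : ImplB::ReservedSlots()};
}
```

The shared case is where it gets awkward: whoever emits the class description has to know about every implementation that will ever sit behind it.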

-Boris

Boris Zbarsky

May 28, 2012, 12:16:22 AM

On 5/28/12 12:09 AM, Boris Zbarsky wrote:
> But also, plain reserved slots wouldn't get us far enough: some of our
> members here are not just single pointers but things like hashtables,
> linked lists, arrays, etc. So we'd need to basically code up jsapi
> implementations of these various data structures such that the gc knew
> about all the references involved. So we need to worry about the
> performance of all of those.

One more thing. We're (slowly) moving toward doing some things in
parallel in Gecko. So if we do move in this direction (which I do agree
is awfully tempting if we can sort out the various issues with reserved
slots as they stand), then we'll need js::GetReservedSlot to support
being called from arbitrary threads as long as the main thread is
blocked such that no GC can happen. It might already support that, of
course.

-Boris

Bill McCloskey

May 28, 2012, 1:17:22 AM

to Boris Zbarsky, dev-tech-js-en...@lists.mozilla.org

----- Original Message -----
> From: "Boris Zbarsky" <bzba...@mit.edu>
> To: dev-tech-js-en...@lists.mozilla.org
> Sent: Sunday, May 27, 2012 9:09:38 PM
> Subject: Re: [JS-internals] GC/DOM integration
>

> On 5/27/12 5:49 PM, Bill McCloskey wrote:
> > Andrew's right, root marking is usually quite fast. It's normally a
> > few milliseconds per GC.
>
> I'd love to see numbers for this. Is this one of the GC stats we log
> to
> the console? If not, could we log it, please?

We log this to the error console under "Mark Roots:". On my 6-year-old iMac I'm seeing 8ms for 11 tabs, which is a little higher than I expected, but it is an older computer. However, that includes all root marking, which encompasses a lot of things. I'm not sure what percentage of that time is for XPConnect roots. That would be interesting to check.

> > I understand the concern about memory. Do we have any idea what
> > percentage of DOM nodes end up with JS counterparts on typical web
> > sites? Or does it vary too much?
>
> It probably varies a huge amount. Some sites it'll be close to 0
> (e.g.
> large static tables, huge mail archives, etc, where there is no
> script
> involved). On the other hand, some sites build the whole site from
> JS
> so it's 100% at least transiently (right now of course we allow those
> JS
> reflections to get GCed unless the site has set expandos on them, so
> even in the 100% cases we might be winning some memory).
>
> We could try to do some measurements here, if desired.

I guess it might be useful to know what a common worst-case scenario would be. For example, how much more memory would we use when loading a big static table? I don't even have a ballpark estimate for how many DOM objects a typical page uses per table cell. Is it closer to 1 or 10 or 50?

> > I'm surprised that speed is a problem. For accessing reserved
> > slots, we have js::GetReservedSlot and js::SetReservedSlot in
> > jsfriendapi.h, which are pretty fast
>
> They're a _lot_ slower than an actual inlined C++ member get, in my
> measurements. If nothing else, they have to do a pointer-chase to
> the
> jsclass just to find out what sort of slot they're dealing with.

The code for getting a slot out is here:
http://mxr.mozilla.org/mozilla-central/source/js/src/jsfriendapi.h#281
In the common case (loading from fixed slots), it's 3 loads from (likely) 2 distinct cache lines. If we knew that we were accessing a fixed slot, which seems quite feasible, then it could be reduced to a single 8-byte load. I might be wrong, but I think that we can guarantee that you're accessing a fixed slot if the slot number is < 16, since I think we always factor in the number of desired reserved slots when deciding how many fixed slots to allocate. So it seems like it would be very easy to reduce this to a single load. For stores we would still have to worry about a write barrier, but we're trying to enforce those anyway.
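
The difference between the two access paths can be sketched with a toy object layout. The layout below (ToyShape, ToySlotObject, four inline slots) is an assumption made purely for illustration and is not the real JSObject layout or the code in jsfriendapi.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using ToyValue = std::uint64_t;  // stand-in for an 8-byte boxed value

struct ToyShape {
    std::uint32_t numFixedSlots;  // how many slots live inline in the object
};

struct ToySlotObject {
    const ToyShape* shape;    // shared metadata, likely on another cache line
    ToyValue* dynamicSlots;   // out-of-line storage for the remaining slots
    ToyValue fixedSlots[4];   // inline storage
};

// General path: load the shape pointer, load the fixed-slot count, then
// load the slot itself from inline or out-of-line storage.
inline ToyValue getSlotGeneric(const ToySlotObject& obj, std::size_t slot) {
    std::size_t nfixed = obj.shape->numFixedSlots;
    if (slot < nfixed) {
        return obj.fixedSlots[slot];
    }
    return obj.dynamicSlots[slot - nfixed];
}

// Fast path: the caller guarantees the slot is fixed, so a release build
// does one load at a constant offset (the assert is a debug-only check).
inline ToyValue getFixedSlotUnchecked(const ToySlotObject& obj, std::size_t slot) {
    assert(slot < obj.shape->numFixedSlots);
    return obj.fixedSlots[slot];
}
```

Generated binding code would be a natural caller of the unchecked variant, since it knows statically which slot indices it uses.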

> But also, plain reserved slots wouldn't get us far enough: some of
> our
> members here are not just single pointers but things like hashtables,
> linked lists, arrays, etc. So we'd need to basically code up jsapi
> implementations of these various data structures such that the gc
> knew
> about all the references involved. So we need to worry about the
> performance of all of those.

Well, I think this is something we could do gradually. For example, I looked through about half the NS_IMPL_CYCLE_COLLECTION_TRACE_BEGIN implementations. They all seem to trace a single field of a C++ object. So it seems like a lot of stuff is of the simple variety.

> The first problem is that the number of reserved slots required
> depends
> on information that the code generator does not have (implementation
> details of the object, whereas the code generator has the IDL), and
> the
> code generator is what's generating the JSClass declaration.
>
> We might be able to expose the needed number of reserved slots as a
> static method on the underlying object class. Except in some cases
> we
> have different C++ implementations, with different sets of members,
> but
> sharing a single API, for the same JSClass.

I don't understand very well how the new DOM bindings are generated. It seems to me like code generation is a *perfect* way to address these issues: it gives us a simple way to do experiments by changing the implementation for all the new DOM objects at once, and it allows us to specialize the implementation to take advantage of optimizations where they apply (like fast reserved slot access).

However, it sounds like a mixture of code generation and normal human-coded C++ is used. What's the dividing line?

-Bill

Boris Zbarsky

May 28, 2012, 2:14:09 AM

On 5/28/12 1:17 AM, Bill McCloskey wrote:
> We log this to the error console under "Mark Roots:". On my 6-year-old iMac I'm seeing 8ms for 11 tabs, which is a little higher than I expected, but it is an older computer. However, that includes all root marking, which encompasses a lot of things. I'm not sure what percentage of that time is for XPConnect roots. That would be interesting to check.

Indeed. My "Mark Roots" times, on an MBP that's less than 2 years old,
are in the 20ms range, but I have probably around 20-30 active tabs and
another 50-60 not-yet-loaded ones. But again, it's not clear what
fraction of that is the XPConnect roots.

> I guess it might be useful to know what a common worst-case scenario would be. For example, how much more memory would we use when loading a big static table?

That depends on how big the JSObjects end up being, in practice.
Basically at least 4 words plus 8 bytes (assuming one reserved slot),
plus whatever the amortized cost of shapes, etc, etc is, right?

> I don't even have a ballpark estimate for how many DOM objects a typical page uses per table cell. Is it closer to 1 or 10 or 50?

Actual DOM nodes, 1 per cell, plus one per row, plus 2 for the table. I
_think_ most of the other things involved lazily allocate their entire
DOM reflection, so wouldn't have DOM objects at all by default.

So figure 1 DOM object per cell.

> The code for getting a slot out is here:
> http://mxr.mozilla.org/mozilla-central/source/js/src/jsfriendapi.h#281

Yes, I'm well aware. ;)

> In the common case (loading from fixed slots), it's 3 loads from (likely) 2 distinct cache lines.

Yep.

> If we knew that we were accessing a fixed slot, which seems quite feasible, then it could be reduced to a single 8-byte load.

That would help significantly, yes. Would still be a noticeable perf
hit in some cases for 32-bit builds due to the increased cache pressure
(e.g. see https://twitter.com/bz_moz/status/73784940755566592 for a case
where that effect is visible with 64-bit vs 32-bit Gecko builds), but
would make this a lot more palatable.

> For stores we would still have to worry about a write barrier, but we're trying to enforce those anyway.

And stores are a lot less common anyway.

> Well, I think this is something we could do gradually. For example, I looked through about half the NS_IMPL_CYCLE_COLLECTION_TRACE_BEGIN implementations. They all seem to trace a single field of a C++ object. So it seems like a lot of stuff is of the simple variety.

Indeed. Is there a benefit to doing this partway? Seems like if we
need extra complexity to support the unconverted cases then we need it
anyway, and the other approach trades off cycle-collection complexity in
the C++ for complexity in implementing members...

>> We might be able to expose the needed number of reserved slots as a
>> static method on the underlying object class. Except in some cases
>> we
>> have different C++ implementations, with different sets of members,
>> but
>> sharing a single API, for the same JSClass.
>
> I don't understand very well how the new DOM bindings are generated.

The input is an IDL file.

The output is the following things:

1) A JSClass.
2) A bunch of JSNatives that know how to get the C++ object from
|this|, convert arguments to C++ things, and call a function on the
C++ objects.

The rest of the work is done in the C++ object. The functions called in
#2 above might be virtual functions, with different implementations in
different objects. Multiple different C++ classes can share a single
JSClass as described above, as long as they have a common superclass.

In fact, there are various cases in the DOM specs in which two objects
are required to have the same prototype (and hence the same JSNatives as
described above) and have totally different behavior for those
methods/properties. Right now we implement this via polymorphism in
C++. It _could_ be done with a single implementation class and branches
on something, in theory, but in practice trying to use a single impl
class for both nsComputedDOMStyle and DOMCSSDeclarationImpl, say, would
be ... difficult.
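
The split described above can be sketched as follows: one piece of glue per method, shared by every object with that prototype, dispatching through a virtual call to whichever hand-written implementation sits behind |this|. All names here (ToyCSSDeclaration, ToyReflector, getPropertyValue_native and so on) are hypothetical, and the glue signature is a simplification, not the real JSNative signature.

```cpp
#include <cassert>
#include <string>

// Hand-written implementation side: a common superclass with virtual methods.
class ToyCSSDeclaration {
public:
    virtual ~ToyCSSDeclaration() = default;
    virtual std::string GetPropertyValue(const std::string& name) const = 0;
};

class ToyComputedStyle final : public ToyCSSDeclaration {
public:
    std::string GetPropertyValue(const std::string& name) const override {
        return "computed:" + name;
    }
};

class ToyInlineStyle final : public ToyCSSDeclaration {
public:
    std::string GetPropertyValue(const std::string& name) const override {
        return "inline:" + name;
    }
};

// Stand-in for the JS reflection: it only carries a pointer to the C++ object.
struct ToyReflector {
    ToyCSSDeclaration* native;
};

// What the generated glue does in this model: unwrap |this|, convert the
// argument to a C++ type, and call into the implementation.
bool getPropertyValue_native(const ToyReflector* thisObj, const char* arg,
                             std::string* result) {
    if (!thisObj || !thisObj->native || !arg) {
        return false;  // a real binding would report a type error here
    }
    std::string name(arg);  // argument conversion
    *result = thisObj->native->GetPropertyValue(name);
    return true;
}
```

Both reflectors go through the same glue function, yet the behavior differs entirely, which is the situation with objects that must share a prototype but not an implementation.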

> It seems to me like code generation is a *perfect* way to address these issues: it gives us a simple way to do experiments by changing the implementation for all the new DOM objects at once

We don't code-generate the implementation. We code-generate glue code
that calls into already-existing implementations. That's why they're
called "bindings". ;)

> However, it sounds like a mixture of code generation and normal human-coded C++ is used. What's the dividing line?

The code generation is used to implement WebIDL, basically. It handles
conversion from JSAPI stuff to WebIDL-like types and invocation of the
actual implementation methods, using the IDL files as input and
processing them according to the WebIDL rules.

Since the behavior of the implementation methods themselves is typically
described in specification prose, not in a machine-readable format, it's
rather impossible to codegen those....

-Boris

Boris Zbarsky

May 28, 2012, 2:27:26 AM

On 5/28/12 1:17 AM, Bill McCloskey wrote:

> For example, how much more memory would we use when loading a big static table?

I just realized I never answered this question.

A "typical" big static table (insofar as such things exist) would
probably have on the order of 1e4-1e5 cells, so figure on a 32-bit
system the added overhead would be on the order of 0.25-2.5 MB. On a
64-bit system, 0.4-4MB. If I'm counting right. Not that bad, I guess,
nowadays.
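
For anyone who wants to redo the arithmetic, here is a small sketch. It assumes the per-object lower bound mentioned earlier in the thread (4 words plus one 8-byte reserved slot) and ignores the amortized cost of shapes and similar metadata; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Lower bound on the size of one JS reflection: 4 words plus one
// 8-byte reserved slot. Shape and other amortized costs are ignored.
constexpr std::size_t reflectorBytes(std::size_t wordSize) {
    return 4 * wordSize + 8;
}

// Extra bytes if every table cell gets one eagerly created reflection.
constexpr std::size_t addedBytes(std::size_t cells, std::size_t wordSize) {
    return cells * reflectorBytes(wordSize);
}
```

With 4-byte words that is 24 bytes per object, so 10,000 to 100,000 cells cost 240,000 to 2,400,000 bytes; with 8-byte words it is 40 bytes per object, so 400,000 to 4,000,000 bytes. Those match the rounded ranges quoted above.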

That's just for having objects there. Obviously the space taken up by
member fields of existing DOM objects would also increase on 32-bit
builds (but not on 64-bit ones, for pointer members). For that same
table, figure at least another 40 bytes extra per DOM node in a 32-bit
build, so another 0.4-4MB.

-Boris

P.S. Just to put all these numbers in perspective, my about:memory
right now lists a total of 135MB for all DOM + layout objects, and
looking at the details, "DOM" is about half of that. It also lists
145MB of heap-unclassified, which could quite likely include some DOM
stuff. It also lists about 400MB of JS stuff of various sorts (script
data, GC heaps, etc). This is a 64-bit build. As I said, I have about
20-30 active tabs right now, and a bunch of about:blank, plus the
various browser UI bits, so figure a 1MB per "real" page increase would
be about 30MB or so for me, or somewhere around 3%.