
Weak references


Jason Orendorff

Nov 1, 2013, 11:26:19 AM
to Brendan Eich, Allen Wirfs-Brock, David Herman, JS Internals list
This proposal is before TC39 for inclusion in the next ECMAScript spec
edition following ES6:
http://wiki.ecmascript.org/doku.php?id=strawman:weak_references

Mozilla GC hackers are opposed, for reasons they can articulate; I'm
opposed because I can't stick the nondeterminism and because the total
cost is out of all proportion with the benefit.

However. There are use cases. Please see:
https://mail.mozilla.org/pipermail/es-discuss/2013-February/028572.html
and subsequent messages.

Also, though I think this use case is weaker:
https://mail.mozilla.org/pipermail/es-discuss/2013-March/029423.html

If you're a GC hacker and you want to stop this train, your best bet is
to read those threads and speak up now.

-j

Bobby Holley

Nov 1, 2013, 12:42:58 PM
to Jason Orendorff, Allen Wirfs-Brock, Brendan Eich, David Herman, JS Internals list
From the proposal:

> Note that makeWeakRef is not safe for general access since it grants access
> to the non-determinism inherent in observing garbage collection.


What does that mean? That they don't expect this to be exposed to the web?
In that case, why bother speccing it, and why would we need to be concerned
with implementing it?

FWIW, I strongly believe that we should refuse to implement specs that make
GC effects any more observable than they already are on the web.

bholley

David Bruant

Nov 1, 2013, 1:07:32 PM
to Bobby Holley, Jason Orendorff, Allen Wirfs-Brock, Brendan Eich, David Herman, JS Internals list
Hi,

Steve Fink

Nov 1, 2013, 1:48:14 PM
to dev-tech-js-en...@lists.mozilla.org
On 11/01/2013 09:42 AM, Bobby Holley wrote:
> From the proposal:
>
>> Note that makeWeakRef is not safe for general access since it grants access
>> to the non-determinism inherent in observing garbage collection.
>
> What does that mean? That they don't expect this to be exposed to the web?
> In that case, why bother speccing it, and why would we need to be concerned
> with implementing it?

Yeah, that's some very critical weasel-wording in the strawman. "Let's
add this to the language, but not expose it to things it shouldn't be
exposed to." Huh?

> FWIW, I strongly believe that we should refuse to implement specs that make
> GC effects any more observable than they already are on the web.

Why? I agree, but only for some general reasons and some
dimly-remembered reasons that I've encountered in the past where the
implications turned out to be far worse than I would have initially
thought. I'd really like to have a crisp explanation of exactly *why*
exposing GC behavior is bad, because otherwise I feel like people will
end up deciding they can live with the minor drawbacks they can think
of, and proceed forward with something truly awful. (Like, for example,
exposing the current strawman to general web content.)

And there really are compelling use cases for having this sort of stuff.
As Kevin Gadd said (I think it was him), people are reimplementing
garbage collection over typed arrays, in JS, just to gain this level of
control. We need to know why, in order to provide something reasonable
for whatever those specific use cases happen to be.

Andrew McCreight

Nov 1, 2013, 2:34:34 PM
to dev-tech-js-en...@lists.mozilla.org

I have a few disjoint thoughts here.

1. Here's a link to a previous ES-discuss thread about somebody trying to create an exploit by combining observable weak references with conservative GC.
https://mail.mozilla.org/pipermail/es-discuss/2013-March/029489.html

I think the basic idea is that you stick an integer on the stack, then observe whether an object dies, and that will tell you if the integer is the address of the object.

Aside from that, weak references complicate the implementation in ways that may not be immediately obvious. For weak maps, we have special machinery to deal with:
- cross-compartment wrappers
- C++ reflectors
- cross-compartment wrappers of C++ reflectors
Mistakes in any of these could lead to critical security vulnerabilities.
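
A toy model of the address-guessing idea in point 1, under loud assumptions: the "heap", the "stack" and both function names below are invented for illustration, and no real engine works this simply. It only shows why a conservative collector combined with an observable weak reference can turn "did the object die?" into "did my integer equal its address?".

```javascript
// Toy model only: a "heap" of objects at numeric addresses and a "stack" of raw words.
// A conservative collector treats any stack word that equals a live address as a root.
function collectConservatively(heap, stackWords) {
  const survivors = new Map();
  for (const [address, object] of heap) {
    if (stackWords.includes(address)) {
      survivors.set(address, object); // the word looks like a pointer, so keep the object
    }
  }
  return survivors;
}

// Hypothetical attacker loop: put a guessed integer on the stack, drop every real
// reference, then check (as an observable weak reference would allow) whether the
// object survived the collection.
function guessAddress(secretAddress, candidates) {
  for (const guess of candidates) {
    const heap = new Map([[secretAddress, { name: "victim" }]]);
    const afterGc = collectConservatively(heap, [guess]);
    const weakRefStillAlive = afterGc.has(secretAddress);
    if (weakRefStillAlive) return guess; // survival reveals the address
  }
  return null;
}

const leaked = guessAddress(0x7f30, [0x1000, 0x7f30, 0x9000]);
```

In this model `leaked` ends up equal to the secret address, which is exactly the information a script should never be able to learn.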

2. With regard to the cycle collector, Bill and I discussed this, and I think weak references won't affect the cycle collector. If you consider two heap graphs, one with and one without weak references, the set of live objects in both graphs must be identical, so the CC shouldn't have to care. Only the JS engine, which has to decide when to null out the weak references, needs to figure that out.

3. Finally, though I'm generally opposed to weak references, as they are complex to implement, I do understand that there's a need for them. When working on the leaks in bug 893012, to fix some of the individual leaks, like bug 894147, we had to use weak references. You have iframes that want to listen to events, so the event listener has to hold onto the iframe, but you don't want to keep the iframe around if the only thing that is keeping the iframe alive is the event listener. So it feels a little crummy to say "hey B2G shows that JS is good enough for everything!" if you have to choose between leaking or using web-legal JS.

Andrew

David Bruant

Nov 1, 2013, 2:52:37 PM
to Jason Orendorff, Brendan Eich, Allen Wirfs-Brock, David Herman, JS Internals list
[sorry for previous early send error]

Hi,

Le 01/11/2013 16:26, Jason Orendorff a écrit :
> This proposal is before TC39 for inclusion in the next ECMAScript spec
> edition following ES6:
> http://wiki.ecmascript.org/doku.php?id=strawman:weak_references
>
> Mozilla GC hackers are opposed, for reasons they can articulate; I'm
> opposed because I can't stick the nondeterminism and because the total
> cost is out of all proportion with the benefit.
>
> However. There are use cases. Please see:
> https://mail.mozilla.org/pipermail/es-discuss/2013-February/028572.html
> and subsequent messages.
>
> Also, though I think this use case is weaker:
> https://mail.mozilla.org/pipermail/es-discuss/2013-March/029423.html
Both of these use cases are largely equivalent. I've been furiously
against weak refs based on this use case because I believe (admittedly
without concrete proof) that good API design can achieve the same
benefits without having to expose GC non-determinism.
I acknowledge that in large software, "good APIs" are hard to
retrofit and make good use of across multiple layers. Harder, it
seems, than retrofitting weakrefs, but doable anyway.

In any case, I've stopped being against weakrefs after a message by Mark
Miller and the use case he described
https://mail.mozilla.org/pipermail/es-discuss/2013-April/029598.html
At first, I didn't understand any of it ("distributed acyclic garbage
collector"? wat?!?), so I felt I needed to understand what it was about
before opposing it. After discussing with Mark Miller, my
understanding is as follows.
The problem Mark Miller describes is that of cross-vat object
references (a vat is a unit of computation with its own unshared memory,
so a process or a machine in practice). In object-oriented programming
and object capabilities (an extreme form of object orientation, most often
used as a security framework), objects and what they represent usually
exist in only one process. We usually resort to sending data
across units that don't share memory when such units need to
collaborate, but this breaks the notion of object reference and the
benefits that come with it (like solving the grant matcher problem [1]).
This is also true of RPC systems (what I know of them at least), where
one machine speaks to another machine without any finer
granularity (object granularity is lost).
To keep the object granularity across machines, in E, they've created a
protocol (CapTP) that, in essence, stretches object references across
machines. Cross-machine wrappers if you will.
When designing this protocol, at some point comes the problem of GC.
Distributed GC... Sometimes, a machine has a reference to a remote
object and at some point, all refs to this object on this machine are
dropped. The remote machine needs to know that this machine doesn't hold
a reference anymore. My understanding is that there are roughly 2 ways
to solve this problem. Either the protocol is built into the language
and it can all be taken care of under the hood, or one has to implement
it oneself, and that requires a form of weak ref (so you know when to
tell the remote machine what it needs to know).

I tried to be as brief as possible in explaining the use case, feel free
to ask me more if I was too brief or unclear.
I believe this use case is valuable (there is an interesting discussion
to have here; Mark Miller's message was a bit dry as is. Preferably, let's
have it on es-discuss, so other interested parties can participate...
and it'll re-happen there anyway). Adding a built-in cross-machine
protocol to JavaScript shouldn't happen: first, it's too much
work to get right; second, there might be other protocols with
different characteristics that make sense in different contexts, so
there is no need to force one into the language. It may be better to let
people implement their own and experiment with different protocols.

That requires weakrefs in some way, I'm afraid. I'm interested if other
solutions can be proposed to achieve object-granularity communication
across machines (I'm sure Mark Miller will be interested as well).

One use case of cross-vat communication is the remote debugger protocol
implemented in Firefox/OS. I haven't taken the time to run over related
bugs and follow its development (and probably won't for now, because
it'd be a lot of work), but it'd be interesting to think of how a
cross-vat protocol would have made its implementation easier/safer/less
error-prone/less leak-prone.
Does anyone know how leaky the remote debugger protocol is now?

David

[1] http://www.erights.org/elib/equality/grant-matcher/index.html

David Bruant

Nov 1, 2013, 3:10:29 PM
to Andrew McCreight, dev-tech-js-en...@lists.mozilla.org
Le 01/11/2013 19:34, Andrew McCreight a écrit :
> 3. Finally, though I'm generally opposed to weak references, as they are complex to implement, I do understand that there's a need for them. When working on the leaks in bug 893012, to fix some of the individual leaks, like bug 894147, we had to use weak references.
Can you provide more details on why you *had* to use weak refs?

> You have iframes that want to listen to events, so the event listener has to hold onto the iframe
Why so?

function f(e){}

(function(){
    var iframe = document.getElementsByTagName('iframe')[0];
    iframe.addEventListener('someEvent', f);
})();

In this case, f doesn't hold a reference to anything besides itself, not
to the iframe at least. In the case of DOM events, it can look at its
argument to find e.target which will dynamically (!), only for the time
of the listener call, refer to the iframe.

In any case, when an event doesn't need to be listened to anymore, there
is a point in your application where you can remove the listener. The
leak comes from not removing the listener, not from the listener itself.
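
A minimal sketch of such an explicit cleanup point, assuming a runtime that provides the standard `EventTarget` and `Event` globals (browsers, recent Node.js). The names `target`, `attach` and `detach` are hypothetical and only stand in for the iframe and the application code.

```javascript
// Hypothetical names throughout; `target` stands in for the iframe.
const target = new EventTarget();
let calls = 0;

function attach(eventTarget) {
  const f = (e) => { calls++; };
  eventTarget.addEventListener("someEvent", f);
  // Hand back the cleanup step so the application can run it once it knows
  // the event no longer needs to be listened to.
  return function detach() {
    eventTarget.removeEventListener("someEvent", f);
  };
}

const detach = attach(target);
target.dispatchEvent(new Event("someEvent")); // f runs
detach();                                     // the known cleanup point
target.dispatchEvent(new Event("someEvent")); // f no longer runs
```

Returning the cleanup function from the same place that registers the listener keeps the two steps paired, so the code that knows when the listener is obsolete does not need to know which function was registered.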

> So it feels a little crummy to say "hey B2G shows that JS is good enough for everything!" if you have to choose between leaking or using web-legal JS.
+1

David

Boris Zbarsky

Nov 1, 2013, 3:27:00 PM
On 11/1/13 3:10 PM, David Bruant wrote:
> Why so?
>
> function f(e){}
>
> (function(){
> var iframe = document.getElementsByTagName('iframe')[0];
> iframe.addEventListener('someEvent', f);
> })()

This is very fragile. It assumes that nowhere up the scopes that f is
closing over is there anyone holding a reference to that iframe object.

Simple to guarantee in simple cases, but ... fragile.

-Boris

Steve Fink

Nov 1, 2013, 4:06:17 PM
to David Bruant, Allen Wirfs-Brock, Brendan Eich, Jim Blandy, JS Internals list, Jason Orendorff, David Herman
On 11/01/2013 11:52 AM, David Bruant wrote:
> That requires weakrefs in some way, I'm afraid. I'm interested if
> other solutions can be proposed to achieve object-granularity
> communication across machines (I'm sure Mark Miller will be interested
> as well).
>

Good use case. What if you had a serializable but still not iterable
WeakMap? Or rather, a synchronizable WeakMap. The cross-vat (I'm
mentally translating this to "cross-runtime") references would be
registered in this map. The timing of synchronization updates would be a
little funky.

Honestly, I haven't thought this through at all.

> One use case of cross-vat communication is the remote debugger
> protocol implemented in Firefox/OS. I haven't taken the time to run
> over related bugs and follow its development (and probably won't for
> now, because it'd be a lot of work), but it'd be interesting to think
> of how a cross-vat protocol would have made its implementation
> easier/safer/less error-prone/less leak-prone.
> Anyone knows how leaky the remote debugger protocol is now?

CC'ing jimb

Andrew McCreight

Nov 1, 2013, 4:13:27 PM
to dev-tech-js-en...@lists.mozilla.org
----- Original Message -----
> Why so?
>
> function f(e){}
>
> (function(){
> var iframe = document.getElementsByTagName('iframe')[0];
> iframe.addEventListener('someEvent', f);
> })()
>
> In this case, f doesn't hold a reference to anything besides itself, not
> to the iframe at least. In the case of DOM events, it can look at its
> argument to find e.target which will dynamically (!), only for the time
> of the listener call, refer to the iframe.
>
> In any case, when an event doesn't need to be listened to anymore, there
> is a time in your application where you can remove the listener. The
> leak is not removing the listener, not the listener itself.

Well, you can look at the patch. I guess I described it poorly. You have some thing x (I said 'iframe' above, but I guess it isn't really one) that wants to listen to an event, so you create an event listener for it, and that listener keeps x alive. The situation is more like this:

function f(e){ tellSomebodyAboutTheEvent(x); }

(function(){
    var iframe = document.getElementsByTagName('iframe')[0];
    iframe.addEventListener('someEvent', f);
})();

f is now keeping x alive, and if f is the only thing keeping x alive, then that's basically a leak as far as we are concerned as JS programmers.

(There's also the problem that f stays alive forever if the event never fires, but that's a smaller leak...)

In case being concrete helps, in our case, x was the parent-process representation of some app, and the event was some kind of visibility change event, so you can tell the app if the screen is turned off or on. Obviously, you want to get rid of x rather than keep it around to ensure that we can tell this zombie app thing that the screen is going to turn off.
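
For comparison, here is a hedged sketch of what a weakly held listener can look like in a present-day runtime that ships `WeakRef` (standardized after this thread; it is not the strawman API discussed here). `addWeakListener` is a hypothetical helper, and the callback must not close over `x` itself or the weak reference achieves nothing.

```javascript
// Sketch: the listener reaches `x` only through `ref`, any object with a deref()
// method (a real WeakRef has one). Once deref() returns undefined, the listener
// removes itself instead of keeping a zombie alive.
function addWeakListener(target, type, ref, callback) {
  function listener(event) {
    const x = ref.deref();
    if (x === undefined) {
      target.removeEventListener(type, listener); // x is gone; stop listening
      return;
    }
    callback(x, event);
  }
  target.addEventListener(type, listener);
  return listener;
}

const target = new EventTarget();
const told = [];
let x = { name: "app" };
addWeakListener(target, "someEvent", new WeakRef(x), (live) => told.push(live.name));
target.dispatchEvent(new Event("someEvent")); // x is strongly held here, so deref() succeeds
```

Note that the small listener function itself still stays registered until the event fires once after `x` has been collected, which is the "smaller leak" mentioned above.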

Andrew

Terrence Cole

Nov 1, 2013, 4:26:14 PM
to Jason Orendorff, Allen Wirfs-Brock, Brendan Eich, David Herman, JS Internals list
On 11/01/2013 08:26 AM, Jason Orendorff wrote:
> This proposal is before TC39 for inclusion in the next ECMAScript spec
> edition following ES6:
> http://wiki.ecmascript.org/doku.php?id=strawman:weak_references
>
> Mozilla GC hackers are opposed, for reasons they can articulate; I'm
> opposed because I can't stick the nondeterminism and because the total
> cost is out of all proportion with the benefit.
>
> However. There are use cases. Please see:
> https://mail.mozilla.org/pipermail/es-discuss/2013-February/028572.html
> and subsequent messages.
>
> Also, though I think this use case is weaker:
> https://mail.mozilla.org/pipermail/es-discuss/2013-March/029423.html

I would agree that pretty much all the use cases I've seen have value:
GC is a very handy form of resource management and it is nice not to
have to re-invent it at the program level.

However, I think this proposal is a much more significant change than
most people realize. Right now JS GC is unspecified magic that happens
behind the scenes to prevent OOM. This proposal would change the JS GC
into a generic resource management framework. I'm actually fine with
this, but please understand that it will impose some significant
constraints on how we can evolve the GC.

First, performance: this particular proposal would force us to visit
objects that are being swept. Our nursery design is currently such that
this is not even possible. This is already problematic for weak maps. We
are able to get around this using a few hacks, but it depends on being
able to do full mark-and-sweep GCs at some point. If we moved to a more
distributed architecture like G1, this would be a severe limitation. I'm
not even sure how concurrent GC will handle weakmaps efficiently.

The performance of the web is vital. With gaming and video as
first-class citizens, we have to consider both the throughput and
latency of the GC as priorities. Any added complexity in the GC will
either directly make the web slower or will disproportionately divert
our energy from making other things faster.

Secondly, correctness. The GC is by definition a cross-cutting concern;
you cannot build anything in SpiderMonkey without considering GC. This
makes weak primitives a cross-cutting concern of a cross-cutting
concern. Our experience with combining generational GC and WeakMaps
reflects this.

When implementing generational GC, our first few iterations attempted to
deal with weak maps in an efficient way. Unfortunately, it turns out
this is actually impossible in the presence of nested weakmaps: we
cannot reach a fixed point while only visiting the nursery heap. Sadly,
even after we realized this, we still had to spend a tremendous amount
of effort merely proving to ourselves that our design is safe in the
presence of weak maps.
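
To illustrate why nested weak maps force an iteration to a fixed point, here is a toy marker. It is an assumption-laden sketch, not SpiderMonkey's algorithm: objects are plain string ids, `markLive` and its data layout are invented, and it does not model the nursery/tenured split at all.

```javascript
// Toy marker: `strongEdges` maps id -> [ids], and each weak map is
// { id, entries: [[keyId, valueId], ...] }.
function markLive(roots, strongEdges, weakMaps) {
  const marked = new Set();
  let passes = 0;

  function markFrom(id) {
    const pending = [id];
    while (pending.length > 0) {
      const current = pending.pop();
      if (marked.has(current)) continue;
      marked.add(current);
      for (const next of strongEdges.get(current) ?? []) pending.push(next);
    }
  }

  roots.forEach((id) => markFrom(id));

  // An entry keeps its value alive only if both the map and the key are live.
  // Marking a value can make another map or key live, so repeat until stable.
  let changed = true;
  while (changed) {
    changed = false;
    passes++;
    for (const map of weakMaps) {
      if (!marked.has(map.id)) continue;
      for (const [key, value] of map.entries) {
        if (marked.has(key) && !marked.has(value)) {
          markFrom(value);
          changed = true;
        }
      }
    }
  }
  return { marked, passes };
}

// Nested case: `outer` maps k1 -> inner (itself a weak map), and `inner` maps k2 -> v.
const weakMaps = [
  { id: "inner", entries: [["k2", "v"]] },
  { id: "outer", entries: [["k1", "inner"]] },
];
const { marked, passes } = markLive(["outer", "k1", "k2"], new Map(), weakMaps);
```

Here `v` only becomes live after `inner` has been discovered through `outer`, so a single sweep over the weak maps is not enough; each extra level of nesting can add another pass.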

The GC is a bad place to add complexity: any error in the GC leads to an
impossible-to-debug top-crasher with a sec-crit rating. We can certainly
deal with this -- we have so far -- but it takes a disproportionate
amount of work.

That said, we absolutely do want to create a resource management
framework that can enable the sort of neat implementations that people
are envisioning for this proposal. I just believe there /must/ be a
better way to bring that to the web than piggy-backing on the GC.

Cheers,
Terrence

> If you're a GC hacker and you want to stop this train, your best bet is
> to read those threads and speak up now.
>
> -j
>

Jason Orendorff

Nov 1, 2013, 6:17:04 PM
to David Bruant, Brendan Eich, Allen Wirfs-Brock, David Herman, JS Internals list
On 11/1/13 1:52 PM, David Bruant wrote:
> In any case, I've stopped being against weakrefs after a message by
> Mark Miller[...]
I am now going to try to convince you that you shouldn't have been
convinced by this use case. :)

> To keep the object granularity across machines, in E, they've created
> a protocol (CapTP) that, in essence, stretches object references across
> machines. Cross-machine wrappers if you will.
> When designing this protocol, at some point comes the problem of GC.
> Distributed GC...

First, read Terrence's first response in this thread. This is exactly
the kind of use case he is talking about, where GC is used as a general
resource-management workhorse.

I'm not convinced acyclic distributed GC is a good thing to support.

The idea is that by the magic of proxies, remote objects can be made to
look exactly like local objects. But then, you can't freely pass a local
object as an argument to a remote method. That would create a back edge.
The system must either block that with a runtime error or risk a cycle—a
silent memory leak on both sides of the boundary. So the boundary is not
transparent after all.

The transparency doesn’t extend to performance or ease of use anyway.

Sidebar: I think general distributed GC (with cycles) is considered
Hard. It's been studied. Mark would know better than me. But I think it
requires, at a minimum, a lot more cooperation from the GC, including
abstraction-busting APIs that we cannot safely expose (like those
Gecko's cycle collector uses to trace through the JS heap).

E is an incredible body of work. I really can't speak too highly of it;
it's amazing. But replicating E is not something I think people want to
use JS for.

> One use case of cross-vat communication is the remote debugger
> protocol implemented in Firefox/OS.

This is a really good point. I asked Jim Blandy about this. In the
debugger protocol, when you hit a breakpoint or debugger-statement, that
creates a "pause actor" in the protocol. Actors form a tree. Just about
everything you encounter while inspecting a paused debuggee is parented
by the pause actor. When you tell the debuggee to continue, the pause
actor and all its descendents are released (with a single protocol request).

The protocol lets you explicitly promote a reference to some debuggee
object from "pause lifetime" to "thread lifetime", but we don't
currently have any code that uses that feature.

So as far as I know it is *impossible* for the debugger to be leaking
stuff beyond the next time you hit the Continue button!

This supports your contention that good API design achieves better
results. If we had implemented some complicated distributed GC scheme
with weak references, it would be possible to leak things in the
debuggee simply by accidentally stashing a closure somewhere that closes
over some reference that we wish would be collected. Instead, we have
done something simple, so we can grep the code and see that there are no
leaks.

IPDL does something similar. Objects that can be referenced across the
process boundary form explicit hierarchies; when one is freed, its
descendents are also freed.
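
The tree-shaped lifetime described above can be sketched as follows. This is a hypothetical `Actor` class written for illustration, not the actual debugger server or IPDL code.

```javascript
// Hypothetical Actor class: every actor is parented, and releasing an actor
// releases its whole subtree in one step.
class Actor {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.children = new Set();
    this.released = false;
    if (parent) parent.children.add(this);
  }

  release() {
    // Copy the set first because children detach themselves while we iterate.
    for (const child of [...this.children]) child.release();
    if (this.parent) this.parent.children.delete(this);
    this.released = true;
  }
}

const thread = new Actor("thread");
const pause = new Actor("pause", thread);
const frame = new Actor("frame", pause);
const objectGrip = new Actor("object", frame);

pause.release(); // what "Continue" would trigger in this model
```

Because every reachable actor hangs off the pause actor, there is a single place to audit for leaks: nothing created during the pause can outlive the call to `release()`.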

-j

Niko Matsakis

Nov 2, 2013, 7:24:19 AM
to Terrence Cole, Jason Orendorff, Allen Wirfs-Brock, Brendan Eich, David Herman, JS Internals list
On Fri, Nov 01, 2013 at 01:26:14PM -0700, Terrence Cole wrote:
> First, performance: this particular proposal would force us to visit
> objects that are being swept.

This is an interesting ramification that should have been obvious in
retrospect.

> Our nursery design is currently such that this is not even
> possible. This is already problematic for weak maps. We are able to
> get around this using a few hacks, but it depends on being able to
> do full mark and sweep GC's at some point. If we moved to a more
> distributed architecture like G1, this would be a severe
> limitation. I'm not even sure how concurrent GC will handle weakmaps
> efficiently.

The Java VM (where G1 was developed) supports weak refs. Any idea what is
different there or how they managed it?


Niko

David Bruant

Nov 2, 2013, 10:28:26 AM
to Andrew McCreight, dev-tech-js-en...@lists.mozilla.org
Le 01/11/2013 21:13, Andrew McCreight a écrit :
> ----- Original Message -----
>> Why so?
>>
>> function f(e){}
>>
>> (function(){
>> var iframe = document.getElementsByTagName('iframe')[0];
>> iframe.addEventListener('someEvent', f);
>> })()
>>
>> In this case, f doesn't hold a reference to anything besides itself, not
>> to the iframe at least. In the case of DOM events, it can look at its
>> argument to find e.target which will dynamically (!), only for the time
>> of the listener call, refer to the iframe.
>>
>> In any case, when an event doesn't need to be listened to anymore, there
>> is a time in your application where you can remove the listener. The
>> leak is not removing the listener, not the listener itself.
> Well, you can look at the patch. I guess I described it poorly. You have some thing (I said 'iframe' above, but I guess it isn't really one) x that wants to listen to an event, so you create an event listener for it, that holds alive x. The situation is more like this:
>
> function f(e){tellSomebodyAboutTheEvent(x);}
>
> (function(){
> var iframe = document.getElementsByTagName('iframe')[0];
> iframe.addEventListener('someEvent', f);
> })()
>
> f is now keeping x alive, and if f is the only thing keeping x alive, then that's basically a leak as far as we are concerned as a JS programmer.
There is a point in your program in which you know that 'iframe' won't
be needed anymore and you do something about that. Maybe you clear a
given data structure, maybe you remove the iframe from the document
without assigning it to a variable, maybe you do something else, but you
do do something.
It's at this point that either you want to remove the listener or
reassign x to another value.

> (There's also the problem that f stays alive forever if the event never fires, but that's a smaller leak...)
It's only a leak if you *know* that the event will never fire. If the
event may never fire, that's not a leak, just bad luck.

> In case being concrete helps
(it does :-) )

> in our case, x was the parent-process representation of some app, and the event was some kind of visibility change event, so you can tell the app if the screen is turned off or on. Obviously, you want to get rid of x rather than keep it around to ensure that we can tell this zombie app thing that the screen is going to turn off.
Ok. I think I understand better the patch now. Thanks.
I noticed a bug in the patch [1]. Interestingly, it's been fixed as a
side effect of bug 899354 [2], but I'm sharing it for those interested.
In the patch [1], the event listener is
visibilityChangeHandler.bind(/*args*/). The listener that the patch
attempts to remove is visibilityChangeHandler, which is a different
function that was never itself registered as a listener, so the listener
was never removed and the removeEventListener call failed silently.
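
A small reproduction of that pitfall, assuming a runtime with the standard `EventTarget` and `Event` globals. Only the name visibilityChangeHandler comes from the patch; `target`, `bound` and `calls` are made up for the example.

```javascript
const target = new EventTarget();
let calls = 0;
function visibilityChangeHandler() { calls++; }

// bind() returns a new function object each time it is called.
const bound = visibilityChangeHandler.bind(null);
target.addEventListener("visibilitychange", bound);

// No effect: this exact function was never registered, and no error is thrown.
target.removeEventListener("visibilitychange", visibilityChangeHandler);
target.dispatchEvent(new Event("visibilitychange")); // the bound listener still runs

// Works: pass the same function object that was registered.
target.removeEventListener("visibilitychange", bound);
target.dispatchEvent(new Event("visibilitychange")); // nothing runs now
```

Keeping the result of `bind()` in a variable is the usual way to make the later removal possible.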

Even with the new version, as the application author, you know when you
no longer want a given browserElementParent (x), so you know when
to break all the references. The only thing lacking is API bits
all over the place that let you break these references. It's more work
than weak references, but it's doable.

David

[1] https://hg.mozilla.org/mozilla-central/rev/d23e2b6fb808
[2] https://hg.mozilla.org/mozilla-central/rev/602e8b21a0ff

David Bruant

Nov 2, 2013, 11:37:29 AM
to Jason Orendorff, Brendan Eich, Allen Wirfs-Brock, David Herman, JS Internals list
Le 01/11/2013 23:17, Jason Orendorff a écrit :
> On 11/1/13 1:52 PM, David Bruant wrote:
>> In any case, I've stopped being against weakrefs after a message by
>> Mark Miller[...]
> I am now going to try to convince you that you shouldn't have been
> convinced by this use case. :)
You can try :-p

>> To keep the object granularity across machines, in E, they've created
>> a protocol (CapTP) that, in essence, stretches object references across
>> machines. Cross-machine wrappers if you will.
>> When designing this protocol, at some point comes the problem of GC.
>> Distributed GC...
> First, read Terrence's first response in this thread. This is exactly
> the kind of use case he is talking about, where GC is used as a general
> resource-management workhorse.
> I'm not convinced acyclic distributed GC is a good thing to support.
Just to clarify, I don't care about ADGC in isolation, but more about
the idea of manipulating remote objects. It'll be hard to convince me
that this wouldn't be worth supporting.
And some form of GC is necessary for remote objects. ADGC seems like one
practical solution. I'm all ears for other solutions.

> The idea is that by the magic of proxies, remote objects can be made to
> look exactly like local objects. But then, you can't freely pass a local
> object as an argument to a remote method. That would create a back edge.
> The system must either block that with a runtime error or risk a cycle—a
> silent memory leak on both sides of the boundary. So the boundary is not
> transparent after all.
Not necessarily. If one side drops all references to the remote object,
it can tell the vat the object comes from, the cycle is broken and GC can
happen. This happens if the argument is not used beyond the method call.
The cycle leak occurs when only remote references to some objects remain
(while all local refs have been dropped).
The description of DAGC in the CapTP protocol [1] is nothing but a
cross-machine refcounting algorithm, with the same issues that are
inherent to this algorithm (cross-machine cycles), but temporary cycles
(as you seem to describe) are not a problem.
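
A toy model of cross-vat reference counting may make the difference between a released chain and a leaked cycle concrete. Everything here (`Vat`, `link`, `collect`, `collectAll`) is hypothetical and far simpler than CapTP: objects only hold remote proxies, and "collection" is an explicit call rather than something the GC reports.

```javascript
class Vat {
  constructor() {
    this.objects = new Map();      // id -> remote proxies { vat, id } held by that object
    this.roots = new Set();        // ids kept alive by local roots
    this.exportCounts = new Map(); // id -> number of proxies held in other vats
  }
  create(id) {
    this.objects.set(id, []);
    this.roots.add(id);
  }
  // Give local object `holderId` a proxy for `remoteId`, which lives in `remoteVat`.
  link(holderId, remoteVat, remoteId) {
    this.objects.get(holderId).push({ vat: remoteVat, id: remoteId });
    remoteVat.exportCounts.set(remoteId, (remoteVat.exportCounts.get(remoteId) ?? 0) + 1);
  }
  // Local collection: an object survives if a local root or a remote count holds it.
  collect() {
    let freed = false;
    for (const [id, proxies] of [...this.objects]) {
      if (this.roots.has(id) || (this.exportCounts.get(id) ?? 0) > 0) continue;
      this.objects.delete(id);
      freed = true;
      // Dropping the object drops its proxies: tell each owning vat to decrement.
      for (const p of proxies) p.vat.exportCounts.set(p.id, p.vat.exportCounts.get(p.id) - 1);
    }
    return freed;
  }
}

// Keep collecting in every vat until a full round frees nothing.
function collectAll(...vats) {
  while (vats.map((v) => v.collect()).some(Boolean)) { /* repeat */ }
}

const vatA = new Vat();
const vatB = new Vat();

// Chain: a1 -> b1. Once a1 is unrooted, the drop message lets b1 go too.
vatA.create("a1"); vatB.create("b1");
vatA.link("a1", vatB, "b1");
// Cycle: a2 -> b2 -> a2. Each side's export count keeps the other alive.
vatA.create("a2"); vatB.create("b2");
vatA.link("a2", vatB, "b2");
vatB.link("b2", vatA, "a2");

vatA.roots.clear();
vatB.roots.clear();
collectAll(vatA, vatB);
```

After `collectAll`, the chain objects are gone while `a2` and `b2` remain in their vats even though nothing local refers to them, which is the cross-machine cycle leak the algorithm accepts.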

> The transparency doesn’t extend to performance or ease of use anyway.
I disagree. I doubt promise pipelining [2] (which dramatically reduces
the impact of network latency in applications) can be made easier to
achieve without this sort of transparency.

> Sidebar: I think general distributed GC (with cycles) is considered
> Hard. It's been studied. Mark would know better than me. But I think it
> requires, at a minimum, a lot more cooperation from the GC, including
> abstraction-busting APIs that we cannot safely expose (like those
> Gecko's cycle collector uses to trace through the JS heap).
That, or an implementation of (or improvement upon) the idea of distributed
mark and sweep. I agree it's Hard. That doesn't make it impractical. From [1]:
"The advantage of taking this low road [ADGC] is that garbage collection
can proceed by separate uncoordinated local interactions, with no global
analysis. Of course, the disadvantage is that we leak live distributed
cycles. At such a time as this proves to be a significant problem, we
will revisit the issue of implementing a full Distributed Garbage
Collector."

So far, ADGC has proven to be enough in practice. It'll be time to
revisit when the time comes.

> E is an incredible body of work. I really can't speak too highly of it;
> it's amazing. But replicating E is not something I think people want to
> use JS for.
The Q library (and I guess the promise community) is moving more and
more in this direction. See promise.invoke for instance:
https://github.com/kriskowal/q/wiki/API-Reference#promiseinvokemethodname-args

Another piece of anecdotal evidence is the API exposed by Google Apps Script (the
server side is in JS):
https://developers.google.com/apps-script/guides/html/communication?hl=fr
You can define a server-side function f and call it on the client side with:
google.script.run.f(...args)
It works as if a promise for the result was returned.
The args are serialized as data, so that's where it all stops, but the
idea of transparently playing with objects defined on another machine is around.
I say "anecdotal" because I don't think the user base is massive, but
some people use it.

Q and Google Apps Script are not the entire JS community, but I feel a
growing appetite for transparent remote objects, especially since JS can
now be used on both the client and the server side.
A couple of years ago, I had a server-to-server communication use case. I
designed an HTTP protocol, reinventing the wheel here and there. A remote
object protocol might have sped things up if I had had one handy. I'll need
to rewrite it from scratch at some point. I'll try that.

>> One use case of cross-vat communication is the remote debugger
>> protocol implemented in Firefox/OS.
> This is a really good point. I asked Jim Blandy about this. In the
> debugger protocol, when you hit a breakpoint or debugger-statement, that
> creates a "pause actor" in the protocol. Actors form a tree. Just about
> everything you encounter while inspecting a paused debuggee is parented
> by the pause actor. When you tell the debuggee to continue, the pause
> actor and all its descendents are released (with a single protocol request).
>
> The protocol lets you explicitly promote a reference to some debuggee
> object from "pause lifetime" to "thread lifetime", but we don't
> currently have any code that uses that feature.
>
> So as far as I know it is *impossible* for the debugger to be leaking
> stuff beyond the next time you hit the Continue button!
Interesting. I should start reading about the debugger protocol in detail.

David

[1] http://erights.org/elib/distrib/captp/dagc.html
[2] http://erights.org/elib/distrib/pipeline.html

Brendan Eich

Nov 2, 2013, 2:06:58 PM11/2/13
to Steve Fink, dev-tech-js-en...@lists.mozilla.org
Steve Fink wrote:
> Yeah, that's some very critical weasel-wording in the strawman. "Let's
> add this to the language, but not expose it to things it shouldn't be
> exposed to." Huh?

To be fair to the author, Mark Miller, he knows well that this must be
specified. It's not, ergo (among other reasons), strawman status.

A privilege model that supports this kind of API is not yet available on
the web. The clown-car brigade of bundled permissions for packaged apps
on Android is a counterexample. Mark is of the Object Capability school,
and has done lots of work on ocap-for-JS (Caja, SES). So the way to read
this is: work in progress, made of straw, not ready for prime time.

The reason developers need this is clear, I hope. In any kind of MVC
framework without guest/host VM-like supervision and stratification,
it's easy for a subject-observes-object relationship to go wrong if the
subject holds a strong ref. The object often links back (parentNode) to the global
object, which links "down" (document, and onward to some kind of
observers list on an element) to the subject.

The subject does not want to extend the lifetime of the object in any
event, and does not always need to know exactly when the object becomes
garbage. Thus the idea of using event loop turns to schedule notifications.

Ad-hoc protocols in a single-hacker or small-world-hacker-network system
can suffice, but for a framework that is "third party" to the developers
consuming the framework, no general cycle-breaking protocol, with
mandatory dispose() calls, say, suffices. People forget, they leak
memory, they don't know how *not to* leak.
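
To make the failure mode concrete, here is a minimal sketch of the explicit cycle-breaking protocol described above. The names (`Model`, `View`, `subscribe`, `dispose`) are hypothetical and not taken from any particular framework; the only point is that the observers list keeps the subject alive until someone remembers to call `dispose()`.

```javascript
// Hypothetical names throughout; a sketch, not any real framework's API.
class Model {
  constructor() {
    this.observers = new Set(); // strong references to every subscriber
  }
  subscribe(fn) {
    this.observers.add(fn);
    // Hand back an unsubscribe function so the subscriber can break the link.
    return () => this.observers.delete(fn);
  }
  notify(value) {
    for (const fn of this.observers) fn(value);
  }
}

class View {
  constructor(model) {
    this.seen = [];
    // The closure captures `this`, so model -> closure -> view is a strong path.
    this.unsubscribe = model.subscribe((v) => this.seen.push(v));
  }
  dispose() {
    // The mandatory manual step: skip it and the view lives as long as the model.
    this.unsubscribe();
  }
}

const model = new Model();
const view = new View(model);
model.notify(1);
view.dispose();
model.notify(2); // not delivered: the view has unsubscribed
```

Nothing in the language forces the `dispose()` call, which is why a third-party framework cannot rely on it.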

None of this is "out of bounds" for JS, any more than for Obj-C or C++
or Java, where weak references are supported.

Yikes, I'm behind on this thread. I hope I didn't duplicate anything too
badly. Wanted to put in a word in defense of Mark, and of the reality of
the problem faced by framework developers.

/be

Brendan Eich

Nov 2, 2013, 3:11:29 PM11/2/13
to Allen Wirfs-Brock, Jason Orendorff, David Bruant, David Herman, JS Internals list
Allen Wirfs-Brock wrote:
> My experience is that Terrence is absolutely correct in this regard
> and that this position is shared by virtually all experienced GC
> implementors. A former colleague of mine, George Bosworth, expressed
> it this way in an experience report at an ISMM a number of years ago:

George is right on, and over the years with SpiderMonkey embedders,
we've fought people who wish, e.g., to have post-mortem finalization
just to close a file descriptor (!) or release a db cursor. Total
mismatch in terms of heuristics, resource scarcity, etc.

OTOH, the framework subject-observes-object (which might link back to
subject) problem is real. It does not involve managing external or
otherwise misaligned or not-affine resources. It's the main use-case I
see for which developers request weak refs. And it has nothing to do
with ADGC!

Can we support this use-case more narrowly? If so, we should.

/be

Brendan Eich

Nov 2, 2013, 3:41:58 PM11/2/13
to Terrence Cole, Jason Orendorff, Allen Wirfs-Brock, David Herman, JS Internals list
Terrence Cole wrote:
> The performance of the web is vital. With gaming and video as
> first-class citizens, we have to consider both the throughput and
> latency of the GC as priorities. Any added complexity in the GC will
> either directly make the web slower or will disproportionately divert
> our energy from making other things faster.

Excellent point, and of course native-uber-alles devs object to GC on
this basis (http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/).

So far, and certainly for V8 with the V8 benchmark suite and then Octane,
throughput is the only measured good, if either throughput or
latency can be said to be directly benchmarked. Latency needs benchmarks!

Let's say game developers tend toward C++/OpenGL for good reasons
(mainly cross-platform portability + best perf), and Emscripten+asm.js
(with JS evolving to keep asm.js a subset) are "enough". No GC there.

Then there are still cases where GC latency matters, of course. Even
with C++/OpenGL, there may be cases where the developer or our target
runtime wants to mix "host" (JS GC-heap allocated) objects and "guest"
(asm.js's typed array) objects. And of course, there's gmail.

At JSConf.eu, John McCutchan of Google and a colleague who works on
gmail gave a good talk on GC, showing off Chrome's memory profiling
tool. They showed historical plots revealing gmail bugs, V8 GC
regressions, all the interesting dirt. Their tooling is great, I think
dcamp and team are on the case. But when they called on JS developers to
manage GC pause time, they lost me.

How are JS developers supposed to manage pause time, even indirectly (by
avoiding unnecessary allocations and fixing leaks)? There's no way. We
won't be adding manually callable gc() built-ins to the standard.

While some want requestAnimationFrame to keep the GC at bay (is this
doable?), in general the developer should not have to hint or beg the JS
VM for lower latency or (really) no sudden loss of 60fps soft realtime
guarantees.

So +1 for giving latency a boost in GC design and benchmarking. I write
this acknowledging your point about weak refs going against this goal.
But perhaps we can have both, in their proper relation (pause-free GC
out of the box for code that doesn't use certain features, especially
for those gmail-to-game apps; others pay as they buy trouble by the yard).

/be

Igor Bukanov

Nov 2, 2013, 4:22:31 PM11/2/13
to Niko Matsakis, Allen Wirfs-Brock, Brendan Eich, Terrence Cole, JS Internals list, Jason Orendorff, David Herman
On 2 November 2013 12:24, Niko Matsakis <ni...@alum.mit.edu> wrote:
> The Java VM (where G1 was developed) supports weak refs. Any idea what is
> different there or how they managed it?

Java does not have WeakMap. Weak references, despite their
expose-GC-to-the-world semantics, are much easier to implement.
WeakMap is really a complexity and correctness evil.

David Bruant

Nov 2, 2013, 5:44:56 PM11/2/13
to Brendan Eich, Terrence Cole, Jason Orendorff, Allen Wirfs-Brock, David Herman, JS Internals list
Le 02/11/2013 20:41, Brendan Eich a écrit :
> At JSConf.eu, John McCutchan of Google and a colleague who works on
> gmail gave a good talk on GC, showing off Chrome's memory profiling tool.
Article version and longer talk at
http://www.html5rocks.com/en/tutorials/memory/effectivemanagement/

> They showed historical plots revealing gmail bugs, V8 GC regressions,
> all the interesting dirt. Their tooling is great
I wouldn't say "great". It's state-of-the-art in web development, it's
the best thing we have so far, but I still find the tool very hard to
use and I think I'm on the educated end of web developers when it comes
to memory management. Lots of WebKit/V8 internal machinery is exposed,
there is too much information and also too little of the needed information
(relationship to source code)... and... except for the high-level
indicator of memory, the information is displayed in tables and lists,
which is at best ineffective for the amount and variety of information to
show.
It appears the tool was excellent for the Gmail team because they had
the V8 team around to collaborate with. That is not the case for every webdev.

In my opinion, Chrome tooling is way too low-level (and
engine-specific!), as suggested by the name ("heap snapshot") to be
useful to the average webdev.


Apparently IE11 comes with some memory-related tooling. I haven't been
able to get my hands on it yet; it'd be interesting to see how it
differs from the V8 tool.

> I think dcamp and team are on the case.
I think what's being worked on now is
https://bugzilla.mozilla.org/show_bug.cgi?id=923275 (high level memory
indicator).
I'm also experimenting on something (and been teasing Twitter about it
[1][2]). I'll post on the devtools mailing-list when I have something
presentable.

> But when they called on JS developers to manage GC pause time, they
> lost me.
> How are JS developers supposed to manage pause time, even indirectly
> (by avoiding unnecessary allocations and fixing leaks)? There's no
> way. We won't be adding manually callable gc() built-ins to the standard.
They seem to take their experience with V8 as a generality. I think what
they meant was that allocating triggers GC which triggers copying
(compacting?), which costs, so be careful with allocations. But this
advice is hard to follow in practice.
This advice is also dependent on V8 allocation/GC scheduling strategy
and may be inappropriate in the future.

David

[1] https://twitter.com/DavidBruant/status/379033449438248960
[2] https://twitter.com/DavidBruant/status/379320523429117953

Brendan Eich

Nov 2, 2013, 5:59:46 PM11/2/13
to David Bruant, Jason Orendorff, Allen Wirfs-Brock, Terrence Cole, David Herman, JS Internals list
David Bruant wrote:
> Article version and longer talk at
> http://www.html5rocks.com/en/tutorials/memory/effectivemanagement/

Thanks.

>> They showed historical plots revealing gmail bugs, V8 GC regressions,
>> all the interesting dirt. Their tooling is great
> I wouldn't say "great". It's state-of-the-art in web development, it's
> the best things we have so far, but I still find the tool very hard to
> use and

You're right, I should leave superlatives to their marketing. It's all
relative => standards are low :-P. Good news: we can do better in
Firefox DevTools. Thanks for your help there.

>> But when they called on JS developers to manage GC pause time, they
>> lost me.
>> How are JS developers supposed to manage pause time, even indirectly
>> (by avoiding unnecessary allocations and fixing leaks)? There's no
>> way. We won't be adding manually callable gc() built-ins to the
>> standard.
> They seem to take their experience with V8 as a generality. I think
> what they meant was that allocating triggers GC which triggers copying
> (compacting?), which costs, so be careful with allocations. But this
> advice is hard to follow in practice.

It is useless advice.

Can we learn from pause-free or realtime GC work, e.g. Metronome by
David Bacon et al. at IBM Research?

/be

Bobby Holley

Nov 3, 2013, 5:29:01 AM11/3/13
to Andrew McCreight, Kyle Huey, JS Internals list
The problem with weak references is that they put the GC in the driver's
seat, which makes it very hard to avoid revealing GC secrets to the client.
Instead, I propose that we let the client code drive, and see how far we
get with hueyfix-style tools.

One of the major difficulties with leak prevention on the Web is that there
are often two phases of retirement for a given object. The first happens
when the object goes away semantically - when a window is closed, when a
node is removed from the DOM, when a connection is terminated, etc. The
second happens when the object is actually GC-able, which depends on the
graph of references in the VM, and can occur at an arbitrarily distant
time. The interval from the first to the second is generally the period in
which JS "leaks".

What if we give script the ability to say "this Foo is semantically dead -
please neuter cross-global references to it"? This is effectively what we
have with Cu.nukeSandbox, and it works well. It doesn't in any way expose
GC behavior, but it lets callers give the GC a much-needed boost, which the
GC may subsequently leverage if it turns out to be useful.

This won't solve esoteric cross-vat use cases, but I think it would be a
nice way to bulldoze the subtle gotchas that make it so easy to introduce
subtle leaks in large-scale JS.

Thoughts?
bholley

Jorge Chamorro

Nov 3, 2013, 6:49:27 AM11/3/13
to Bobby Holley, Kyle Huey, Andrew McCreight, JS Internals list
On 03/11/2013, at 11:29, Bobby Holley wrote:
>
>
> What if we give script the ability to say "this Foo is semantically dead -
> please neuter cross-global references to it"? This is effectively what we
> have with Cu.nukeSandbox, and it works well. It doesn't in any way expose
> GC behavior, but it lets callers give the GC a much-needed boost, which the
> GC may subsequently leverage if it turns out to be useful.

How would that be?

x= y= {}
[object Object]

o= {x:x}
[object Object]

free(x)
true

typeof x
??

typeof y
??

typeof o.x
??

Or:

(function () {
function ƒ () {}
setTimeout(ƒ, 1000)
free(ƒ)
})()

?

When would be the right time to free(something)?

What would `something` be after the call to free(), null?

What if `something` isn't the only reference to the thing?

Would it be ok to null somebody else's references?

If not, what exactly would free(thing) do?

Igor Bukanov

Nov 3, 2013, 6:51:10 AM11/3/13
to David Bruant, Andrew McCreight, JS Internals list
On 2 November 2013 15:28, David Bruant <brua...@gmail.com> wrote:
>> function f(e){tellSomebodyAboutTheEvent(x);}
>>
>> (function(){
>> var iframe = document.getElementsByTagName('iframe')[0];
>> iframe.addEventListener('someEvent', f);
>> })()
>>
> There is a point in your program in which you know that 'iframe' won't be
> needed anymore and you do something about that. Maybe you clear a given data
> structure, maybe you remove the iframe from the document without assigning
> it to a variable, maybe you do something else, but you do do something.
> It's at this point that either you want to remove the listener or reassign x
> to another value.

A library author may have little control over how iframe is used. A
caller may use another library that caches all iframes for some
purposes etc. However, I suspect that a lot of problems for library
authors can go away if JS provides API like removeAllEventListeners.
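
As a rough illustration of what such an API would buy, here is a userland approximation. `ListenerRegistry` is a hypothetical helper, not an existing platform API, and it can only remove listeners that were registered through it, which is exactly the limitation a built-in removeAllEventListeners would lift.

```javascript
// Hypothetical helper: remembers every listener it registers so that
// all of them can be removed in one call.
class ListenerRegistry {
  constructor() {
    this.entries = [];
  }
  add(target, type, listener) {
    target.addEventListener(type, listener);
    this.entries.push({ target, type, listener });
  }
  removeAll() {
    for (const { target, type, listener } of this.entries) {
      target.removeEventListener(type, listener);
    }
    this.entries.length = 0;
  }
}

// EventTarget and Event are available in browsers and in recent Node.js.
const target = new EventTarget();
const registry = new ListenerRegistry();
let calls = 0;
registry.add(target, 'someEvent', () => { calls++; });

target.dispatchEvent(new Event('someEvent')); // listener runs
registry.removeAll();
target.dispatchEvent(new Event('someEvent')); // listener no longer runs
```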

Igor Bukanov

Nov 3, 2013, 7:00:41 AM11/3/13
to Jorge Chamorro, Kyle Huey, Andrew McCreight, Bobby Holley, JS Internals list
On 3 November 2013 12:49, Jorge Chamorro <jo...@jorgechamorro.com> wrote:
> If not, what exactly would free(thing) do?

That is easy - it ensures that the thing does not reference any
other GC things (including references as keys in a WeakMap) so it
cannot be part of a reference cycle. Currently this cannot be
implemented even for pure JS objects, as a prototype link or slots in
a closure cannot be cleared, not to mention host objects.

David Bruant

Nov 3, 2013, 7:36:29 AM11/3/13
to Jorge Chamorro, Bobby Holley, Kyle Huey, Andrew McCreight, JS Internals list
Le 03/11/2013 12:49, Jorge Chamorro a écrit :
> When would be the right time to free(something)?
>
> What would `something` be after the call to free(), null?
>
> What if `something` isn't the only reference to the thing?
I don't think Bobby was suggesting the addition of a free function or
operator. This discussion already happened [1] and the outcome was that
this won't happen, precisely because the questions you ask have no good
answers.

Bobby was making a reference to Cu.nukeSandbox (which I don't know about
so I'll read about it before saying more).

David

[1] https://mail.mozilla.org/pipermail/es-discuss/2012-October/026007.html

David Bruant

Nov 3, 2013, 7:50:58 AM11/3/13
to Bobby Holley, Andrew McCreight, Kyle Huey, JS Internals list
Le 03/11/2013 11:29, Bobby Holley a écrit :
> The problem with weak references is that they put the GC in the driver's
> seat, which makes it very hard to avoid revealing GC secrets to the client.
> Instead, I propose that we let the client code drive, and see how far we
> get with hueyfix-style tools.
hueyfix-style tools for client code or implementors?
Client code will have one in ES6 in the form of revocable proxies.
Initial post on the topic (mentioning GC specifically) [1] and API [2]
(also available in the latest ES6 draft).

In a nutshell:
let { proxy, revoke } = Proxy.revocable(target, handler);
proxy.foo // traps
revoke() // always returns undefined
proxy.foo // throws TypeError: "proxy is revoked"

> One of the major difficulties with leak prevention on the Web is that there
> are often two phases of retirement for a given object. The first happens
> when the object goes away semantically - when a window is closed, when a
> node is removed from the DOM, when a connection is terminated, etc. The
> second happens when the object is actually GC-able, which depends on the
> graph of references in the VM, and can occur at an arbitrarily distant
> time. The interval from the first to the second is generally the period in
> which JS "leaks".
>
> What if we give script the ability to say "this Foo is semantically dead -
> please neuter cross-global references to it"? This is effectively what we
> have with Cu.nukeSandbox, and it works well. It doesn't in any way expose
> GC behavior, but it lets callers give the GC a much-needed boost, which the
> GC may subsequently leverage if it turns out to be useful.
I don't understand the "cross-global" part. I don't think that's usually
the main cause of leaks.
But revocable proxies allow this sort of expressiveness (at arbitrary
granularity, not just cross-global).

> This won't solve esoteric cross-vat use cases, but I think it would be a
> nice way to bulldoze the subtle gotchas that make it so easy to introduce
> subtle leaks in large-scale JS.
I agree, but can't help adding: "...at the cost of try/catch all over
the place".
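
A small sketch of what that cost looks like in practice, using only the standard `Proxy.revocable` API; the `readFoo` helper and the decision to map a revoked proxy to `undefined` are assumptions made for illustration.

```javascript
const target = { foo: 42 };
const { proxy, revoke } = Proxy.revocable(target, {});

// Any consumer that may outlive the revocation has to guard each access,
// because every operation on a revoked proxy throws a TypeError.
function readFoo(obj) {
  try {
    return obj.foo;
  } catch (e) {
    if (e instanceof TypeError) return undefined; // treat "revoked" as "gone"
    throw e;
  }
}

const before = readFoo(proxy); // forwarded to target
revoke();                      // the proxy no longer keeps target reachable
const after = readFoo(proxy);  // access throws internally, helper returns undefined
```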

David

[1] https://mail.mozilla.org/pipermail/es-discuss/2012-August/024344.html
[2] http://wiki.ecmascript.org/doku.php?id=strawman:revokable_proxies

Brendan Eich

Nov 3, 2013, 11:16:57 AM11/3/13
to JS Internals list
Bobby Holley wrote:
> This won't solve esoteric cross-vat use cases, but I think it would be a
> nice way to bulldoze the subtle gotchas that make it so easy to introduce
> subtle leaks in large-scale JS.

With an MVC framework, nukeSandbox is of no avail. The subject does not
want to extend the lifetime of the object it observes, but it need not
be notified exactly when that object "becomes garbage" semantically. It
could be at a later event turn -- just not so late that garbage piles
too high.

This suggests a solution, which I thought was standard in GCs with weak
refs (but I'm rusty): tenure any weak referent as soon as it is known to
be such (even if it might not be weakly referred to for its entire
lifetime). Let people buy trouble by the yard (or more, if they tie
combinatorially explosive knots) until the tenured generation is big
enough that a full M&S GC must be done.

Anyway, that's the theory (modulo bugs in my memory) and IIRC it
suffices for weak maps, which we already implement. Are weak refs any
worse (as Igor asks, are they not "better" by some measures)?

/be

Niko Matsakis

Nov 3, 2013, 4:07:20 PM11/3/13
to Igor Bukanov, Allen Wirfs-Brock, Brendan Eich, Terrence Cole, JS Internals list, Jason Orendorff, David Herman
Perhaps I am confused?

- Aren't we talking about weak refs? I thought a weakmap in JS was a
done deal? I believe Java's weak refs, and in particular weak
references when combined with [reference queues][1], are pretty
similar to what the strawman proposed.

- Why do you say Java doesn't have a weakmap? What is the
[WeakHashMap][2] class if not a WeakMap? Presumably we are using the
term differently? (I imagine there are many variations of weakmaps
which vary in subtle but significant ways)

In any case, Brendan's e-mail suggesting tenuring weakly referenced
objects offered one possible workaround for maintaining top nursery
performance in the face of weak refs. (I have no idea, of course, if
this is what the JVM does.)


Niko

[1]: http://docs.oracle.com/javase/7/docs/api/java/lang/ref/ReferenceQueue.html
[2]: http://docs.oracle.com/javase/6/docs/api/java/util/WeakHashMap.html

Till Schneidereit

Nov 3, 2013, 4:18:54 PM11/3/13
to Niko Matsakis, Allen Wirfs-Brock, Brendan Eich, Terrence Cole, Igor Bukanov, JS Internals list, Jason Orendorff, David Herman
On Sun, Nov 3, 2013 at 10:07 PM, Niko Matsakis <ni...@alum.mit.edu> wrote:

> On Sat, Nov 02, 2013 at 09:22:31PM +0100, Igor Bukanov wrote:
> > On 2 November 2013 12:24, Niko Matsakis <ni...@alum.mit.edu> wrote:
> > > The Java VM (where G1 was developed) supports weak refs. Any idea what
> is
> > > different there or how they managed it?
> >
> > Java does not have WeakMap. Weak references, despite their
> > expose-GC-to-the-world semantics, are much easier to implement.
> > WeakMap is really complexity and correctness evil.
>
> Perhaps I am confused?
>
> - Aren't we talking about weak refs? I thought a weakmap in JS was a
> done deal? I believe Java's weak refs, and in particular weak
> references when combined with [reference queues][1], are pretty
> similar to what the strawman proposed.
>
> - Why do you say Java doesn't have a weakmap? What is the
> [WeakHashMap][2] class if not a WeakMap? Presumably we are using the
> term differently? (I imagine there are many variations of weakmaps
> which vary in subtle but significant ways)
>
> In any case, Brendan's e-mail suggesting tenuring weakly referenced
> objects offered one possible workaround for maintaining top nursery
> performance in the face of weak refs. (I have no idea, of course, if
> this is what the JVM does.)
>

In addition to Java having a WeakMap, note also that one can be implemented
based on weak refs. I wrote up an implementation based on the weak refs
proposal here:
https://gist.github.com/tschneidereit/7294906
(With some bug fixes courtesy of Mark Miller.)


till

Igor Bukanov

Nov 3, 2013, 5:32:43 PM11/3/13
to Niko Matsakis, Allen Wirfs-Brock, Brendan Eich, Terrence Cole, JS Internals list, Jason Orendorff, David Herman
On 3 November 2013 22:07, Niko Matsakis <ni...@alum.mit.edu> wrote:
> - Aren't we talking about weak refs? I thought a weakmap in JS was a
> done deal? I believe Java's weak refs, and in particular weak
> references when combined with [reference queues][1], are pretty
> similar to what the strawman proposed.
>
> - Why do you say Java doesn't have a weakmap? What is the
> [WeakHashMap][2] class if not a WeakMap? Presumably we are using the
> term differently? (I imagine there are many variations of weakmaps
> which vary in subtle but significant ways)

Java's WeakHashMap holds a strong reference to the value. As such a
cycle from the value to the key is not collectable. One can mitigate
that via storing a weak reference to the value, but then the GC can
collect the value and it is observable.

Similarly reference queues do not cover the functionality provided by
the WeakMap when both the map and the key must be reachable for the
value to stay alive.

smaug

Nov 3, 2013, 6:20:20 PM11/3/13
to
On 11/01/2013 05:26 PM, Jason Orendorff wrote:
> This proposal is before TC39 for inclusion in the next ECMAScript spec
> edition following ES6:
> http://wiki.ecmascript.org/doku.php?id=strawman:weak_references
>
> Mozilla GC hackers are opposed, for reasons they can articulate; I'm
> opposed because I can't stick the nondeterminism and because the total
> cost is out of all proportion with the benefit.
>
> However. There are use cases. Please see:
> https://mail.mozilla.org/pipermail/es-discuss/2013-February/028572.html
> and subsequent messages.
>
> Also, though I think this use case is weaker:
> https://mail.mozilla.org/pipermail/es-discuss/2013-March/029423.html
>
> If you're a GC hacker and you want to stop this train, your best bet is
> to read those threads and speak up now.
>
> -j
>


I'm rather strongly opposed to exposing GC behavior in any way to the Web, since once that is
done in one API, it is hard to argue against exposing similar nondeterminism in other APIs, and
we soon start to rely on certain GC behavior... should we then specify exactly what kind of GC behavior
JS should have, and how GC should interact with browser internals (DOM etc)?


IIRC, the HTML spec currently exposes GC behavior directly in one place [1], but that is considered a bug.
(Luckily no one implements PortCollection)


-Olli



[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/web-messaging.html#broadcasting-to-many-ports

Brendan Eich

Nov 3, 2013, 7:24:18 PM11/3/13
to Niko Matsakis, Allen Wirfs-Brock, Terrence Cole, Igor Bukanov, JS Internals list, Jason Orendorff, David Herman
Niko Matsakis wrote:
> - Why do you say Java doesn't have a weakmap? What is the
> [WeakHashMap][2] class if not a WeakMap? Presumably we are using the
> term differently? (I imagine there are many variations of weakmaps
> which vary in subtle but significant ways)

Quoting from [2]:

"*Implementation note:* The value objects in a WeakHashMap are held by
ordinary strong references. Thus care should be taken to ensure that
value objects do not strongly refer to their own keys, either directly
or indirectly, since that will prevent the keys from being discarded."

In contrast, an ES6 WeakMap is an ephemeron table, where the keys and
values can be twisted into cycles, but so long as no live key object
remains, the whole knot can be cut and collected. See
http://en.wikipedia.org/wiki/Ephemeron (but Allen's testimony trumps
this page, given his being in the room when Ephemerons were discovered).
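
A minimal sketch of the shape in question, using the standard ES6 `WeakMap`; the `addEntry` helper and the `owner` property are illustrative names only. Collection itself cannot be observed from script, so the sketch only shows the value-to-key cycle that an ephemeron table is allowed to reclaim and that a WeakHashMap-style table with strong values would pin.

```javascript
const wm = new WeakMap();

function addEntry() {
  const key = {};
  const value = { owner: key }; // the value strongly refers to its own key
  wm.set(key, value);
  return key;
}

let key = addEntry();
// While the key is reachable, the entry is live and the cycle is intact.
const roundTrip = wm.get(key).owner === key;

// After this, the only path to the key runs through the map entry's value.
// Under ephemeron semantics that path does not count, so key and value
// become collectable together; when (or whether) that happens is up to the GC.
key = null;
```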

/be

Allen Wirfs-Brock

Nov 3, 2013, 6:15:28 PM11/3/13
to Niko Matsakis, Brendan Eich, Terrence Cole, Igor Bukanov, JS Internals list, Jason Orendorff, David Herman

On Nov 3, 2013, at 1:07 PM, Niko Matsakis wrote:

>
> In any case, Brendan's e-mail suggesting tenuring weakly referenced
> objects offered one possible workaround for maintaining top nursery
> performance in the face of weak refs. (I have no idea, of course, if
> this is what the JVM does.)
>

Tenured (if by that you mean stays around until an exhaustive GC) seems somewhat contrary to the whole point of using a weak map. Many apps may never trigger that level of GC in which case using a WeakMap (even with explicit deletes) under that policy would actually be worse than using a regular Map.

But the basic concept isn't all that crazy. One of the advantages of having a hierarchy of generations is that you can apply different policies at different levels of the hierarchy (in my experience an Ungar GS-style configuration is too simplistic).

For example, in the nursery you might treat WeakMaps as holding strong references, and likewise treat any remembered-set entries into the nursery as strong references. Hence, little, if any, WeakMap overhead is needed to scavenge the nursery.

Allen


Allen Wirfs-Brock

Nov 2, 2013, 12:48:56 PM11/2/13
to Terrence Cole, Jason Orendorff, Brendan Eich, David Herman, JS Internals list

On Nov 1, 2013, at 1:26 PM, Terrence Cole wrote:

> ...
>
> Secondly, correctness. The GC is by definition a cross-cutting concern;
> you cannot build anything in SpiderMonkey without considering GC. This
> makes weak primitives a cross-cutting concern of a cross-cutting
> concern. Our experience with combining generational GC and WeakMaps
> reflects this.
>
> When implementing generational GC, our first few instances attempted to
> deal with weak maps in an efficient way. Unfortunately, it turns out
> this is actually impossible in the presence of nested weakmaps: we
> cannot reach a fixed point while only visiting the nursery heap. Sadly,
> even after we realized this, we still had to spend a tremendous amount
> of effort merely proving to ourselves that our design is safe in the
> presence of weak maps.

I'm in whole-hearted agreement with your overall perspective in this message.

But I'm curious about the issues you have been having with WeakMaps and generational GC. I was one of the original designers of Ephemerons and our GCs were all generational and very highly optimized. Ephemerons certainly introduced additional work in the GC that was proportional to their use, but the generational nature of our collectors didn't significantly complicate our design. Of course, JS WeakMaps have a different granularity than Ephemerons (although Ephemerons could be used as a primitive for implementing WeakMaps).

It might be interesting to chat sometime and compare our experiences WRT generational collection and weak references.

Allen

Allen Wirfs-Brock

Nov 2, 2013, 1:24:57 PM11/2/13
to Jason Orendorff, David Bruant, Brendan Eich, David Herman, JS Internals list

On Nov 1, 2013, at 3:17 PM, Jason Orendorff wrote:

> On 11/1/13 1:52 PM, David Bruant wrote:
>> In any case, I've stopped being against weakrefs after a message by
>> Mark Miller[...]
> I am now going to try to convince you that you shouldn't have been
> convinced by this use case. :)
>
>> To keep the object granularity across machines, in E, they've created
>> a protocol (CapTP) that, in essence, stretches object references across
>> machines. Cross-machine wrappers if you will.
>> When designing this protocol, at some point comes the problem of GC.
>> Distributed GC...
>
> First, read Terrence's first response in this thread. This is exactly
> the kind of use case he is talking about, where GC is used as a general
> resource-management workhorse.

My experience is that Terrence is absolutely correct in this regard and that this position is shared by virtually all experienced GC implementors. A former colleague of mine, George Bosworth, expressed it this way in an experience report at an ISMM a number of years ago:

A modern GC is a heuristics-based resource manager. The resources it manages generally have very low individual value (a few dozen bytes of memory) and exist in vast numbers. There is a wide distribution of lifetimes of these resources, but the majority are highly ephemeral. The average latency of resource recovery is important but the recovery latency of any individual resource is generally unimportant. The heuristics of a great GC take all of these characteristics into account. When you piggy-back upon a GC (via finalization, or an equivalent mechanism) the management of a different kind of resource, you are applying the heuristics of memory resource management to the management of the piggy-backed resources. This is typically a poor fit. For example, the piggy-backed resource may be of high individual value and exist in limited numbers (George used file descriptors as an example). A good GC will be a bad resource manager for such resources.

There are many types of resources that need to be managed in complex systems. Thinking that a GC will serve as a good management foundation for most of those resources is just naive.

I previously made some other comments that relate to this issue at http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs#allen_wirfs-brock_20111219 In particular, see the "backstop" discussion.

>
> I'm not convinced acyclic distributed GC is a good thing to support.
>
> The idea is that by the magic of proxies, remote objects can be made to
> look exactly like local objects. But then, you can't freely pass a local
> object as an argument to a remote method. That would create a back edge.
> The system must either block that with a runtime error or risk a cycle—a
> silent memory leak on both sides of the boundary. So the boundary is not
> transparent after all.
>
> The transparency doesn’t extend to performance or ease of use anyway.
>
> Sidebar: I think general distributed GC (with cycles) is considered
> Hard. It's been studied. Mark would know better than me. But I think it
> requires, at a minimum, a lot more cooperation from the GC, including
> abstraction-busting APIs that we cannot safely expose (like those
> Gecko's cycle collector uses to trace through the JS heap).

Yes, indeed. And there is a lot of experience that fully transparent distributed object graphs are not a very good idea. The dream was that an application could ignore distribution boundaries. In practice you always want to know when you are about to traverse a high-latency reference rather than a normal, essentially zero-latency reference.

Allen

Bobby Holley

Nov 4, 2013, 5:39:52 AM11/4/13
to David Bruant, Kyle Huey, Andrew McCreight, JS Internals list
On Sun, Nov 3, 2013 at 1:50 PM, David Bruant <brua...@gmail.com> wrote:

> hueyfix-style tools for client code or implementors?
> Client code will have one in ES6 in the form of revocable proxies. Initial
> post on the topic (mentioning GC specifically) [1] and API [2] (also
> available in the latest ES6 draft).
>

Yeah, I think that's probably the way to go for my suggestion, assuming we
can make them performant enough. The arbitrary à la carte granularity there
is really nice.
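
For readers who have not used them, a minimal sketch of the revocable proxy API from the ES6 draft follows. `Proxy.revocable` is the real API; the `big` object and the variable names are made up for the example.

```javascript
// A large object we want to be able to cut loose explicitly.
const big = { payload: new Array(1000).fill(0), name: "big" };

// Proxy.revocable returns the proxy plus a function that severs it.
const { proxy, revoke } = Proxy.revocable(big, {});

// Client code only ever sees the proxy, which forwards to the target.
const before = proxy.name;

// After revoke(), the proxy no longer refers to its target, so a stale
// reference to the proxy cannot keep `big` alive.
revoke();

let errorName = null;
try {
  proxy.name;                 // any operation on a revoked proxy throws
} catch (e) {
  errorName = e.name;         // a TypeError
}
```

Because each proxy carries its own `revoke` function, the cut can be made per object rather than per sandbox, which is the granularity being praised here.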

bholley

Brendan Eich

Nov 4, 2013, 11:05:27 AM11/4/13
to Bobby Holley, Kyle Huey, David Bruant, Andrew McCreight, JS Internals list
Bobby Holley wrote:
> On Sun, Nov 3, 2013 at 1:50 PM, David Bruant <brua...@gmail.com> wrote:
>
>> > hueyfix-style tools for client code or implementors?
>> > Client code will have one in ES6 in the form of revocable proxies. Initial
>> > post on the topic (mentioning GC specifically) [1] and API [2] (also
>> > available in the latest ES6 draft).
>> >
>
> Yeah, I think that's probably the way to go for my suggestion, assuming we
> can make them performant enough. The arbitrary à la carte granularity there
> is really nice.

And all praise to khuey and his fix -- the ES6 proxy design evolved in a
way that risked losing the khueyfix property, and we added revocable
proxies to restore it.

/be

Terrence Cole

Nov 4, 2013, 11:53:19 AM11/4/13
to Brendan Eich, JS Internals list
On 11/03/2013 08:16 AM, Brendan Eich wrote:
> Bobby Holley wrote:
>> This won't solve esoteric cross-vat use cases, but I think it would be a
>> nice way to bulldoze the subtle gotchas that make it so easy to introduce
>> subtle leaks in large-scale JS.
>
> With an MVC framework, nukeSandbox is of no avail. The subject does not
> want to extend the lifetime of the object it observes, but it need not
> be notified exactly when that object "becomes garbage" semantically. It
> could be at a later event turn -- just not so late that garbage piles
> too high.
>
> This suggests a solution, which I thought was standard in GCs with weak
> refs (but I'm rusty): tenure any weak referent as soon as it is known to
> be such (even if it might not be weakly referred to for its entire
> lifetime). Let people buy trouble by the yard (or more, if they tie
> combinatorially explosive knots) until the tenured generation is big
> enough that a full M&S GC must be done.

This is exactly what we do.

> Anyway, that's the theory (modulo bugs in my memory) and IIRC it
> suffices for weak maps, which we already implement. Are weak refs any
> worse (as Igor asks, are they not "better" by some measures)?

I explicitly aliased all weak primitives (weakrefs, weakmaps with weak
values, weakmaps with strong values, etc) in my discussion. The issues
here are unbelievably subtle and complex: I think it is extremely
difficult to say "hard" or "easy" without looking at the specifics of
both the proposal and the GC and then spending a day just thinking about
the potential pitfalls (or just implementing and fuzzing it).

While, in general, weak refs are categorically less complicated -- and I
do think that holds for our current GC -- I would want to spend some
time thinking about the combination of weakrefs + JS weakmaps + future
GC plans before commenting.
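
As a point of comparison, here is a small sketch of the weak primitive already shipping, a `WeakMap` keyed by the observed object. The `observe` and `notify` helpers are hypothetical names for this illustration, not part of any framework mentioned in the thread.

```javascript
// Per-subject observer lists, keyed weakly by the subject.
const observers = new WeakMap();

function observe(subject, callback) {
  let list = observers.get(subject);
  if (!list) {
    list = [];
    observers.set(subject, list);
  }
  list.push(callback);
}

function notify(subject, change) {
  const list = observers.get(subject) || [];
  return list.map((callback) => callback(change));
}

let model = { title: "draft" };
observe(model, (change) => "saw " + change);
const seen = notify(model, "title");

// A WeakMap offers no size, iteration, or key listing, so a program cannot
// tell whether an entry has been collected; it can only ask about keys it
// still holds.
const enumerable = ["size", "keys", "values", "entries", "forEach"]
  .filter((name) => name in WeakMap.prototype);
```

The last statement is the relevant part: the map can only be queried with a key the caller still holds, so collection of an entry is never observable. A weak reference that can be dereferenced directly does not have that property.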

> /be

Terrence Cole

Nov 4, 2013, 12:00:08 PM11/4/13
to Allen Wirfs-Brock, Jason Orendorff, David Bruant, Brendan Eich, David Herman, JS Internals list
On 11/02/2013 10:24 AM, Allen Wirfs-Brock wrote:
>
> On Nov 1, 2013, at 3:17 PM, Jason Orendorff wrote:
>
>> On 11/1/13 1:52 PM, David Bruant wrote:
>>> In any case, I've stopped being against weakrefs after a message by
>>> Mark Miller[...]
>> I am now going to try to convince you that you shouldn't have been
>> convinced by this use case. :)
>>
>>> To keep the object granularity across machines, in E, they've created
>>> a protocol (CapTP) that, in essence, stretches object references across
>>> machines. Cross-machine wrappers if you will.
>>> When designing this protocol, at some point comes the problem of GC.
>>> Distributed GC...
>>
>> First, read Terrence's first response in this thread. This is exactly
>> the kind of use case he is talking about, where GC is used as a general
>> resource-management workhorse.
>
> My experience is that Terrence is absolutely correct in this regard and that this position is shared by virtually all experienced GC implementors. A former colleague of mine, George Bosworth, expressed it this way in an experience report at an ISMM a number of years ago:
>
> A modern GC is a heuristics-based resource manager. The resources it manages generally have very low individual value (a few dozen bytes of memory) and exist in vast numbers. There is a wide distribution of lifetimes of these resources, but the majority are highly ephemeral. The average latency of resource recovery is important but the recovery latency of any individual resource is generally unimportant. The heuristics of a great GC take all of these characteristics into account. When you piggy-back the management of a different kind of resource upon a GC (via finalization, or an equivalent mechanism), you are applying the heuristics of memory resource management to the management of the piggy-backed resources. This is typically a poor fit. For example, the piggy-backed resource may be of high individual value and exist in limited numbers (George used file descriptors as an example). A good GC will be a bad resource manager for such resources.
>
> There are many types of resources that need to be managed in complex systems. Thinking that a GC will serve as a good management foundation for most of those resources is just naive.

Thank you! That was exactly what I wanted to express, only with greater
pith and fewer words.

> I previously made some other comments that relate to this issue at http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs#allen_wirfs-brock_20111219 In particular, see the "backstop" discussion.
>
>>
>> I'm not convinced acyclic distributed GC is a good thing to support.
>>
>> The idea is that by the magic of proxies, remote objects can be made to
>> look exactly like local objects. But then, you can't freely pass a local
>> object as an argument to a remote method. That would create a back edge.
>> The system must either block that with a runtime error or risk a cycle—a
>> silent memory leak on both sides of the boundary. So the boundary is not
>> transparent after all.
>>
>> The transparency doesn’t extend to performance or ease of use anyway.
>>
>> Sidebar: I think general distributed GC (with cycles) is considered
>> Hard. It's been studied. Mark would know better than me. But I think it
>> requires, at a minimum, a lot more cooperation from the GC, including
>> abstraction-busting APIs that we cannot safely expose (like those
>> Gecko's cycle collector uses to trace through the JS heap).
>
> Yes, indeed. And there is a lot of experience showing that fully transparent distributed object graphs are not a very good idea. The dream was that an application could ignore distribution boundaries. In practice you always want to know when you are about to traverse a high-latency reference rather than a normal, essentially zero-latency reference.
>
> Allen

Terrence Cole

Nov 4, 2013, 1:10:48 PM11/4/13
to Brendan Eich, Jason Orendorff, Allen Wirfs-Brock, David Bruant, David Herman, JS Internals list
On 11/02/2013 02:59 PM, Brendan Eich wrote:
> David Bruant wrote:
>> Article version and longer talk at
>> http://www.html5rocks.com/en/tutorials/memory/effectivemanagement/
>
> Thanks.
>
>>> They showed historical plots revealing gmail bugs, V8 GC regressions,
>>> all the interesting dirt. Their tooling is great
>> I wouldn't say "great". It's state-of-the-art in web development, it's
>> the best thing we have so far, but I still find the tool very hard to
>> use and
>
> You're right, I should leave superlatives to their marketing. It's all
> relative => standards are low :-P. Good news: we can do better in
> Firefox DevTools. Thanks for your help there.

It's great to know that someone is working on this! GC tooling, leak
tooling in particular, is Hard, but we should still be doing better.

>>> But when they called on JS developers to manage GC pause time, they
>>> lost me.
>>> How are JS developers supposed to manage pause time, even indirectly
>>> (by avoiding unnecessary allocations and fixing leaks)? There's no
>>> way. We won't be adding manually callable gc() built-ins to the
>>> standard.
>> They seem to take their experience with V8 as a generality. I think
>> what they meant was that allocating triggers GC which triggers copying
>> (compacting?), which costs, so be careful with allocations. But this
>> advice is hard to follow in practice.
>
> It is useless advice.
>
> Can we learn from pause-free or realtime GC work, e.g. Metronome by
> David Bacon et al. at IBM Research?

I thought Bill worked on Metronome before coming to Mozilla? In any
case, his incremental GC gives us similar control over GC/mutator
utilization: slices are tied to screen refresh and have controllable
duration. Recently, Luke tried turning down the incremental slice time
knob to see how low we could go. He found that 2ms is easily doable,
except when traversing the browser C++ heap, which takes about 6ms.

Now that generational GC has added Heap<T> to the browser for
intercepting writes, we plan to incrementalize this phase as well and
target extremely low latency, low utilization GC when in "use asm".
Generational GC, obviously, is harder to schedule, but collection times
are generally less than 100us so we haven't bothered yet.
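
A toy sketch of what "slices with controllable duration" means for a marking phase follows. It is not SpiderMonkey's implementation: `IncrementalMarker` is a hypothetical class, the budget is counted in nodes rather than milliseconds so the example is deterministic, and the graph is assumed not to change between slices (a real incremental collector needs a write barrier to cope with mutation).

```javascript
// Incremental marker: does at most `budget` units of work per slice.
class IncrementalMarker {
  constructor(roots) {
    this.marked = new Set();
    this.worklist = [...roots];
  }
  // Returns true once marking has finished.
  slice(budget) {
    let work = 0;
    while (this.worklist.length > 0 && work < budget) {
      const node = this.worklist.pop();
      if (this.marked.has(node)) continue;
      this.marked.add(node);
      work += 1;
      for (const child of node.children) {
        if (!this.marked.has(child)) this.worklist.push(child);
      }
    }
    return this.worklist.length === 0;
  }
}

// Build a small graph: a chain of ten nodes plus an edge back to the head.
const nodes = Array.from({ length: 10 }, (_, i) => ({ id: i, children: [] }));
for (let i = 0; i < 9; i++) nodes[i].children.push(nodes[i + 1]);
nodes[9].children.push(nodes[0]);
// This node points into the graph but nothing points at it.
const unreachable = { id: 99, children: [nodes[0]] };

const marker = new IncrementalMarker([nodes[0]]);
let slices = 0;
while (!marker.slice(3)) {
  slices += 1;   // the mutator (for example, one frame of rendering) would run here
}
slices += 1;     // count the final slice that finished the work
```

Lowering the budget shortens each pause at the cost of more slices, which is the trade-off behind the slice time knob.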

The holy grail here is concurrent GC. Unlike Java, fully concurrent GC
for JS is probably not going to be practical on 32bit platforms because
of the danger of torn Values. However, in the medium term, limited
concurrent nursery GC may be possible by bootstrapping off of Brian's
compartment-local concurrency model and a sharded heap, à la bug 902174.
Haswell's TSX instructions also provide a route forward here, albeit
only on modern 64bit platforms. We'll definitely be looking harder at
these once we've reduced the current backlog.
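
One way to picture a "torn Value" is the simulation below. It assumes a value occupies two 32-bit words (a tag and a payload) that are stored one after the other; that layout, the tag constants, and the helper names are assumptions made for the illustration, and the interleaving is written out by hand rather than produced by real threads.

```javascript
// Hypothetical tags for a two-word value representation.
const TAG_INT32 = 1;
const TAG_OBJECT = 2;

// One value slot stored as two 32-bit words: [tag, payload].
const slot = new Uint32Array(2);

function writeWord(index, word) { slot[index] = word; }
function readValue() { return { tag: slot[0], payload: slot[1] }; }

// Start with the int32 value 42.
writeWord(0, TAG_INT32);
writeWord(1, 42);

// The mutator overwrites the slot with an object pointer, one word at a time.
writeWord(0, TAG_OBJECT);
// A concurrent reader that runs between the two stores sees a torn value:
// an object tag paired with the old integer payload.
const torn = readValue();
writeWord(1, 0xdeadbee0);
const settled = readValue();
```

A collector thread that traced `torn` would treat the integer 42 as an object address. On a platform where the whole value can be stored with a single 64-bit write, this particular interleaving cannot arise.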

> /be

Brendan Eich

Nov 4, 2013, 1:14:20 PM11/4/13
to Terrence Cole, Jason Orendorff, Allen Wirfs-Brock, David Bruant, David Herman, JS Internals list
Terrence Cole wrote:
> Recently, Luke tried turning down the incremental slice time
> knob to see how low we could go. He found that 2ms is easily doable,
> except when traversing the browser C++ heap, taking about 6ms.

I wonder how this scales on, say, a Nexus 4 (not to drag us all down to
a ZTE Open). Anyone know?

/be

Till Schneidereit

Nov 7, 2013, 5:38:27 AM11/7/13
to Brendan Eich, Allen Wirfs-Brock, David Bruant, Terrence Cole, JS Internals list, Jason Orendorff, David Herman
This topic has come up on es-discuss again[1], with Mark Miller
referencing[2] an older post of his to the list that contains a proposal
for mitigating the issues with making GCs visible[3].


[1]: https://mail.mozilla.org/pipermail/es-discuss/2013-November/034619.html
[2]: https://mail.mozilla.org/pipermail/es-discuss/2013-November/034630.html
[3]: https://mail.mozilla.org/pipermail/es-discuss/2013-January/028542.html

Andrew McCreight

Nov 7, 2013, 10:37:49 AM11/7/13
to JS Internals list
For completeness (and irrelevant to the particular message I am replying to), I should mention that Bill pointed out in email that the cycle collector can't be completely oblivious to weak references, as I said before. Right now, when the CC identifies a JS object as garbage, it doesn't do anything to it. Instead, killing any C++ references to JS is expected to make the object garbage from the perspective of the GC.

With weak references, those dead JS objects could still be reachable via a weak reference, which in turn could reach an unlinked C++ object, which is basically a C++ object with its pointers nulled out, which is not great. So to properly support weak references, the CC would have to do a callback for every JS object into the JS engine, which would then have to kill off any weak references to those dying objects. This seems similar to the issue with generational GC, where we go from not having to do anything on destruction to having to do some explicit cleanup.
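
The extra callback can be sketched as a simulation. `SimWeakRef`, `WeakRefTable`, and `objectIsDying` are hypothetical names, the table holds its keys strongly (so this models only the bookkeeping, not real weakness), and the `native` field stands in for a C++ object.

```javascript
// Hypothetical weak reference whose target the engine can clear.
class SimWeakRef {
  constructor(target, table) {
    this.target = target;
    table.register(target, this);
  }
  get() { return this.target; }        // undefined once cleared
}

// Hypothetical engine-side table: object -> weak references pointing at it.
class WeakRefTable {
  constructor() { this.refs = new Map(); }
  register(target, ref) {
    if (!this.refs.has(target)) this.refs.set(target, new Set());
    this.refs.get(target).add(ref);
  }
  // The callback the cycle collector would invoke for each dying JS object.
  objectIsDying(target) {
    for (const ref of this.refs.get(target) || []) ref.target = undefined;
    this.refs.delete(target);
  }
}

const table = new WeakRefTable();
const wrapper = { native: { unlinked: false } };
const weak = new SimWeakRef(wrapper, table);

// The collector decides `wrapper` is garbage: unlink the C++ side, then
// tell the engine so that no weak reference can reach the unlinked object.
wrapper.native.unlinked = true;
table.objectIsDying(wrapper);
const afterCollection = weak.get();
```

Without the `objectIsDying` step, `weak.get()` would still hand back `wrapper`, and with it the unlinked native object.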

Jim Blandy

Nov 15, 2013, 1:25:07 PM11/15/13
to dev-tech-js-en...@lists.mozilla.org
On 11/01/2013 10:48 AM, Steve Fink wrote:
> Why? I agree, but only for some general reasons and some
> dimly-remembered reasons that I've encountered in the past where the
> implications turned out to be far worse than I would have initially
> thought. I'd really like to have a crisp explanation of exactly *why*
> exposing GC behavior is bad, because otherwise I feel like people will
> end up deciding they can live with the minor drawbacks they can think
> of, and proceed forward with something truly awful. (Like, for example,
> exposing the current strawman to general web content.)

None of this is anything you couldn't think of yourself, but:

Our experiences exposing GC-sensitive APIs in Debugger have been quite
negative.

Debugger.prototype.findScripts scans arenas directly to build an array
of scripts that match a given query, so its return value depends on
which scripts have been collected at the time of the call. We've bumbled
into tests that begin to fail in response to no evidently relevant
change; problems we can reproduce in the browser but not the shell; etc.

The heart of the problem is that, while GC is unpredictable /in
principle/, in practice its behavior is often stable for long periods of
time, and perhaps even across platforms. If you write a test or some
code that happens to work despite being wrong in principle, it will
usually continue to work for quite some time. Perhaps even across
multiple platforms. Then, when some other change innocently shifts the
conditions, your test or code breaks.

Since the heap is this big shared thing, GC-sensitive APIs make it
possible for perfectly legitimate changes in one part of the browser to
affect the behavior of completely unrelated code.
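
The same trap applies to any weak reference API. The sketch below uses `WeakRef`, which ECMAScript later standardized (ES2021) in a different form from the strawman's makeWeakRef; `WeakCache` and `loadConfig` are hypothetical names. The point is that correct code treats "collected" and "still there" as equally possible on every call.

```javascript
// Cache whose entries do not keep their values alive.
class WeakCache {
  constructor() { this.refs = new Map(); }
  set(key, value) { this.refs.set(key, new WeakRef(value)); }
  // Returns the value, or undefined if it was never cached or has been
  // collected. Callers must handle both outcomes on every call.
  get(key) {
    const ref = this.refs.get(key);
    if (!ref) return undefined;
    const value = ref.deref();
    if (value === undefined) this.refs.delete(key);   // drop the dead entry
    return value;
  }
}

function loadConfig(cache, build) {
  // Correct in principle: a miss, for whatever reason, rebuilds the value.
  let config = cache.get("config");
  if (config === undefined) {
    config = build();
    cache.set("config", config);
  }
  return config;
}

const cache = new WeakCache();
let builds = 0;
const build = () => ({ id: ++builds });
const first = loadConfig(cache, build);
const second = loadConfig(cache, build);   // `first` is still strongly held here
```

A test that dropped `first`, did some allocation, and then asserted that the cache missed (or hit) would be exactly the kind of test described above: it may pass for months and then fail after an unrelated change to GC scheduling.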

Jim Blandy

Nov 15, 2013, 1:48:12 PM11/15/13
to dev-tech-js-en...@lists.mozilla.org
On 11/01/2013 11:52 AM, David Bruant wrote:
> One use case of cross-vat communication is the remote debugger
> protocol implemented in Firefox/OS. I haven't taken the time to run
> over related bugs and follow its development (and probably won't for
> now, because it'd be a lot of work), but it'd be interesting to think
> of how a cross-vat protocol would have made its implementation
> easier/safer/less error-prone/less leak-prone.
> Anyone know how leaky the remote debugger protocol is now?

Hopefully it isn't leaky at all. The protocol uses an explicit-free
model, with some help: all the actors are arranged in a tree, and
certain actions free entire subtrees at once.

So, it may leak for the same sorts of reasons malloc/free-based code
leaks, but the subtree-freeing rules are meant to reduce the amount of
bookkeeping required to the point that it can be managed.
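
A minimal sketch of the explicit-free, tree-structured model described above follows. It is not the protocol's actual code: the `Actor` class and the actor names are chosen for illustration only.

```javascript
// Hypothetical actor: owns its children and is released explicitly.
class Actor {
  constructor(name, parent = null) {
    this.name = name;
    this.children = new Set();
    this.parent = parent;
    this.released = false;
    if (parent) parent.children.add(this);
  }
  // Release this actor and its whole subtree in one action.
  release(log = []) {
    for (const child of [...this.children]) child.release(log);
    this.children.clear();
    if (this.parent) this.parent.children.delete(this);
    this.parent = null;
    this.released = true;
    log.push(this.name);
    return log;
  }
}

const root = new Actor("connection");
const thread = new Actor("thread", root);
const pause = new Actor("pause", thread);
const frame = new Actor("frame", pause);
const grip = new Actor("grip", pause);
const sources = new Actor("sources", thread);

// Releasing the pause actor frees everything created under it; no
// per-object bookkeeping is needed for `frame` or `grip`.
const freed = pause.release();
```

The client still has to remember to release `pause`, so malloc/free-style leaks remain possible, but there is one thing to remember instead of one per object.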

I have no experimental evidence either way. As a programmer, I have
confidence that the way I expect things to work is probably how they're
working in practice.
