weakref keywords RFC

8 views
Skip to first unread message

Kevin Downey

unread,
Jun 17, 2010, 10:58:20 PM6/17/10
to Clojure Dev
Keywords are currently stored in a ConcurrentHashMap<Symbol, Keyword>,
I have a patch that changes it to a WeakHashMap<Symbol,
WeakReference<Keyword>>

this allows keywords to be garbage collected. the weakhashmap does
require locking. the reason for both the weakhashmap and the
weakreference is that keywords store a ref to the corresponding symbol
(which is used for equality). if there is no reference to the symbol
(which cannot happen if the corresponding keyword exists) the whole
mapentry can be gc'ed. if there is no reference to the keyword, the
keyword can be gc'ed, in which case it no longer refs the symbol, and
if that was the last reference to the symbol, the weakhashmap can drop
the entry.

the current patch also unfortunately includes a lot of whitespace
changes, which may or may not be acceptable.

https://gist.github.com/4c9fce8ea6f611589ea3

--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?

Alex Miller

unread,
Jun 18, 2010, 1:29:16 AM6/18/10
to cloju...@googlegroups.com
I assume you have a test that verifies the before/after GC cleanup behavior? Is there some poster-child example program where keywords pollute the heap and OOME?

It's really painful to give up the Concurrent part of the CHM for this - this introduces a JVM-wide global lock in Keyword which is hit on every write AND read and that critical section is a lot wider. You're unlikely to see any difference with a small number of threads, but for certain multi-threaded programs, I can imagine this would be tragically bad. It strikes me that indeed the very programs that are most likely to suffer heap pollution issues are also most likely to need the concurrency here.

[The Map implementations in the JDK are actually my canonical example of why inheritance sucks and composition rules. No way to mix/match Concurrent, Identity, Weak key behavior. Google Collections did a far better job with MapMaker.]

Presuming Clojure doesn't want a dependency on google collections, it is possible to basically start from the public domain version of CHM and modify it to add the Weak key behavior to get a WeakConcurrentHashMap. That has the downside of introducing nasty mostly-copied collection code into clojure of course.

Alex


> href="https://gist.github.com/4c9fce8ea6f611589ea3" target=_blank
> >https://gist.github.com/4c9fce8ea6f611589ea3

--
And what is
> good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these
> things?

--
You received this message because you are subscribed to
> the Google Groups "Clojure Dev" group.
To post to this group, send email to
>
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
To
> unsubscribe from this group, send email to clojure-dev+
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
For
> more options, visit this group at
> http://groups.google.com/group/clojure-dev?hl=en.

Kevin Downey

unread,
Jun 18, 2010, 12:55:57 PM6/18/10
to cloju...@googlegroups.com
well

user=> (dotimes [_ 1e9] (keyword (name (gensym))))

has been running for almost 10 hours now, so I think keywords are getting gc'ed

> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.


> For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.
>
>

--

Kevin Downey

unread,
Jun 18, 2010, 12:57:17 PM6/18/10
to cloju...@googlegroups.com
without the patch it fails in a few minutes

Alex Miller

unread,
Jun 18, 2010, 2:38:58 PM6/18/10
to cloju...@googlegroups.com
Cool, always good to have a reproducible test. It would be nice to extend that to N threads and bash on it to see if the concurrency aspect is obvious.

Seems like we can take as requirements:
1) keywords must be interned (identity semantics, not just equality)
2) keywords are dynamic and may be introduced over the life of a running process
3) unreferenced keywords could be reclaimed without affecting programs
4) creating keyword instances should be fast and highly concurrent

We're trying to address #3 but the proposed solution steps backwards on #4 (by inspection at least).

I think one improvement would be to do a (safe) double-check (my apologies if I typo as I didn't try this in an IDE):

private static Map<Symbol, WeakReference<Keyword>> table = new WeakHashMap();
private static ReadWriteLock lock = new ReentrantReadWriteLock();


public static Keyword intern(Symbol sym) {
WeakReference<Keyword> existingKey = null;
try {
// do checks under the READ lock (allowing concurrent checks)
lock.readLock().lock();
existingKey = table.get(sym);
if(existingKey != null) {
Keyword k = existingKey.get();
if(k != null) {
return k; // common case of returning an existing keyword
}
} finally {
// you cannot upgrade a RRWL from read to write, must relinquish read
lock.readLock().unlock();
}

// create this outside the critical section
newKey = new WeakReference(new Keyword(sym));

// acquire WRITE lock and RE-check if some other thread beat us in the gap
lock.writeLock().lock();
try {
checkKey = table.get(sym);
if(checkKey != null) {
Keyword k = checkKey.get();
if(k != null) {
// another thread beat us, so use his keyword
return k;
}
}

// otherwise, we won and created the new key
table.put(sym,newKey);
return newKey;

} finally {
lock.writeLock().unlock();
}
}


This implementation at least gives you read concurrency on the checks and moves a bit more code outside the critical section. Writes on new keywords will still collide.

I think the ideal solution would be to create a ConcurrentWeakHashMap and use putIfAbsent though.

Alex

----- Original Message ----
> From: Kevin Downey <red...@gmail.com>
> To: cloju...@googlegroups.com
> Sent: Fri, June 18, 2010 11:57:17 AM
> Subject: Re: weakref keywords RFC
>
> without the patch it fails in a few minutes

On Fri, Jun 18, 2010 at 9:55
> AM, Kevin Downey <

> href="mailto:red...@gmail.com">red...@gmail.com> wrote:
>
> well
>
> user=> (dotimes [_ 1e9] (keyword (name
> (gensym))))
>
> has been running for almost 10 hours now, so I think
> keywords are getting gc'ed
>
> On Thu, Jun 17, 2010 at 10:29 PM,
> Alex Miller <

> href="mailto:alexd...@yahoo.com">alexd...@yahoo.com>

> ymailto="mailto:red...@gmail.com"
> href="mailto:red...@gmail.com">red...@gmail.com>
>>> To:
> Clojure Dev <
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com>

> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com">
> ymailto="mailto:cloju...@googlegroups.com"

> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
>>
> To
>>> unsubscribe from this group, send email to
> clojure-dev+
>>> ymailto="mailto:
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com"
>>>

> href="mailto:
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">


> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
>>
> For
>>> more options, visit this group at
>>>
> http://groups.google.com/group/clojure-dev?hl=en.
>>
>>
> --
>> You received this message because you are subscribed to the
> Google Groups "Clojure Dev" group.
>> To post to this group, send email
> to
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
>>
> To unsubscribe from this group, send email to clojure-dev+
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
>>
> For more options, visit this group at

> href="http://groups.google.com/group/clojure-dev?hl=en" target=_blank

> >http://groups.google.com/group/clojure-dev?hl=en.
>>
>>
>
>
>
>
> --
> And what is good, Phaedrus,
> And what is not good—
>
> Need we ask anyone to tell us these things?
>

--
And
> what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell
> us these things?

--

You received this message because you are
> subscribed to the Google Groups "Clojure Dev" group.
To post to this group,
> send email to
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
To
> unsubscribe from this group, send email to clojure-dev+
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
For
> more options, visit this group at

> href="http://groups.google.com/group/clojure-dev?hl=en" target=_blank
> >http://groups.google.com/group/clojure-dev?hl=en.

Kevin Downey

unread,
Jun 18, 2010, 3:56:14 PM6/18/10
to cloju...@googlegroups.com
I think adding an extra lock introduces more complexity. do you have
benchmarks where it makes a difference?

> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.


> For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.
>
>

--

Alex Miller

unread,
Jun 18, 2010, 11:56:32 PM6/18/10
to cloju...@googlegroups.com
I didn't add any locks - I just replaced the ReentrantReadLock with a ReentrantReadWriteLock.

I have not written any tests, I'm just going by the reasoning that a single global lock on both reads and writes will be an issue. I started working on a multi-threaded test for you but I don't know when I'll have time to complete it.

Alex

----- Original Message ----
> From: Kevin Downey <red...@gmail.com>
> To: cloju...@googlegroups.com
> Sent: Fri, June 18, 2010 2:56:14 PM
> Subject: Re: weakref keywords RFC
>

> I think adding an extra lock introduces more complexity. do you
> have
benchmarks where it makes a difference?

On Fri, Jun 18, 2010 at
> 11:38 AM, Alex Miller <

> Sent: Fri, June 18, 2010 11:57:17 AM
>> Subject: Re: weakref keywords
> RFC
>>
>> without the patch it fails in a few
> minutes
>
> On Fri, Jun 18, 2010 at 9:55
>> AM, Kevin
> Downey <
>> href="mailto:

> href="mailto:red...@gmail.com">red...@gmail.com">


> ymailto="mailto:red...@gmail.com"
> href="mailto:red...@gmail.com">red...@gmail.com>

> wrote:
>>
>> well
>>
>> user=> (dotimes
> [_ 1e9] (keyword (name
>> (gensym))))
>>
>> has been
> running for almost 10 hours now, so I think
>> keywords are getting
> gc'ed
>>
>> On Thu, Jun 17, 2010 at 10:29 PM,
>> Alex
> Miller <
>> href="mailto:

> href="mailto:alexd...@yahoo.com">alexd...@yahoo.com">
> ymailto="mailto:alexd...@yahoo.com"

> href="mailto:red...@gmail.com">red...@gmail.com"
>>
> href="mailto:
> href="mailto:red...@gmail.com">red...@gmail.com">


> ymailto="mailto:red...@gmail.com"
> href="mailto:red...@gmail.com">red...@gmail.com>
>>>>
> To:
>> Clojure Dev <
>> href="mailto:

> target=_blank >https://gist.github.com/4c9fce8ea6f611589ea3"
> target=_blank

> --
>>> And what is
>>>> good,
> Phaedrus,
>>> And what
>> is not good—
>>> Need
> we ask anyone to tell us these
>>>>
>>
> things?
>>>
>>> --
>>> You received this
> message because
>> you are subscribed to
>>>> the Google
> Groups "Clojure Dev"
>> group.
>>> To post to this group,
> send email
>> to
>>>>
>>>>
> href="mailto:
>> ymailto="mailto:
> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com"
>>

> href="mailto:

> ymailto="mailto:
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com"
>>
> href="mailto:


> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com">
> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
>>>
>>
> To
>>>> unsubscribe from this group, send email to
>>
> clojure-dev+
>>>> ymailto="mailto:
>> ymailto="mailto:
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com"
>>
> href="mailto:
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com"
>>>>
>>
> href="mailto:
>> href="mailto:
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">
>>

> ymailto="mailto:


> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com"
>>
> href="mailto:
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
>>>
>>
> For
>>>> more options, visit this group
> at
>>>>
>>
> http://groups.google.com/group/clojure-dev?hl=en.
>>>
>>>
>>
> --
>>> You received this message because you are subscribed to
> the
>> Google Groups "Clojure Dev" group.
>>> To post to
> this group, send email
>> to
>> href="mailto:

> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com">
> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
>>>
>>
> To unsubscribe from this group, send email to clojure-dev+
>>
> ymailto="mailto:

> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com"
>>
> href="mailto:
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
>>>
>>
> For more options, visit this group at

>> href="

> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com">
> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
>
> To
>> unsubscribe from this group, send email to
> clojure-dev+
>> ymailto="mailto:
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com"
>>
> href="mailto:
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com">
> ymailto="mailto:unsub...@googlegroups.com"
> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com.
>
> For
>> more options, visit this group at

>> href="
> href="http://groups.google.com/group/clojure-dev?hl=en" target=_blank

> >http://groups.google.com/group/clojure-dev?hl=en" target=_blank
>>
> >
> >http://groups.google.com/group/clojure-dev?hl=en.
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure Dev" group.
> To post to this group, send email to

> ymailto="mailto:cloju...@googlegroups.com"
> href="mailto:cloju...@googlegroups.com">cloju...@googlegroups.com.
>
> To unsubscribe from this group, send email to clojure-dev+

Kevin Downey

unread,
Jun 29, 2010, 5:01:59 PM6/29/10
to cloju...@googlegroups.com
here is the patch without all the whitespace munging

> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.


> For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.
>
>

--

weakref-keywords2.diff

Paul Stadig

unread,
Jul 13, 2010, 5:52:28 PM7/13/10
to cloju...@googlegroups.com
On Tue, Jun 29, 2010 at 5:01 PM, Kevin Downey <red...@gmail.com> wrote:
here is the patch without all the whitespace munging


What's the status of this?

I share Alex's concern about a single global lock. However, I think realistically throughput on creating Keywords wouldn't be an issue. Keywords have identity semantics precisely because they are meant to be write once and read many. Dynamic creation of keywords would generally happen with user input, which is probably a bad idea anyway. Also, the fact that this bug has taken so long to become an issue probably speaks to the fact that people aren't generally writing Clojure code that either creates lots of keywords, or is long running, or both.

All that said, I like Alex's idea of using a read/write lock and/or a WeakConcurrentHashMap. I don't think it is worth introducing a whole new data structure just for this one issue, so perhaps the read/write lock is the most practical. There is no reason to unnecessarily create a bottleneck.

This may not be a serious and frequent issue, but I think it definitely needs to be fixed. For each un-GC'ed Keyword you have: a Keyword object, a Symbol object, a hash map entry object (from the CHM), a String object, and a byte array.  So the memory piles up rather quickly.


Paul

Alex Miller

unread,
Jul 14, 2010, 10:52:22 PM7/14/10
to cloju...@googlegroups.com
I'd definitely like to help on this...just no free time for me right now!

I agree that the gc is a serious must-fix kind of issue but my history in scaling concurrent apps makes me wary of any code that creates a global lock on reads.  Maybe I'm weirdly biased in this way. :)    

Our app happens to use dynamic keywords (based on data, not user input) so it's the kind of thing we could probably see if we were actually running it in production somewhere long enough.  


From: Paul Stadig <pa...@stadig.name>
To: cloju...@googlegroups.com
Sent: Tue, July 13, 2010 4:52:28 PM

Subject: Re: weakref keywords RFC

Rich Hickey

unread,
Jul 16, 2010, 8:54:36 AM7/16/10
to Clojure Dev


On Jul 13, 5:52 pm, Paul Stadig <p...@stadig.name> wrote:
Fixed, thanks for the report and suggestions. Used concurrent map to
soft refs, no locks needed.

Rich
Reply all
Reply to author
Forward
0 new messages