Interner

33 views
Skip to first unread message

Blair Zajac

unread,
Jan 4, 2010, 6:36:57 PM1/4/10
to google-colle...@googlegroups.com
I'm in need of a general purpose interner, something like

public class Interner
{
public static <T> T intern(T o) {
// ...
}
}

A couple of questions.

1) Will Guava have the interner? If so, how soon?

If I need to write my own within the same week, would you mind saying
how you implemented yours?

2) Is there one Interner for all objects you would want to intern or do
you have different Interners for each class of object? I could see an
internal hash map getting pretty large if you wanted to intern a number
of different types of objects, so is there an advantage to partitioning
by object class?

3) Is the Google Interner static or does one need to instantiate an
Interner that then can be garbage collected? It seems like one would
want a single Interner across the entire application, so a static
approach would work if the interner could handle all object types.

These design questions would help lead my efforts to quickly throw one
together.

Thanks,
Blair

Dimitris Andreou

unread,
Jan 4, 2010, 7:03:47 PM1/4/10
to Blair Zajac, google-colle...@googlegroups.com
I would also be interested in implementation/performance details of
the Interner. In the meantime, the simplest solution is probably using
a MapMaper with weak keys and values (and implementing intern by
calling putIfAbsent).

Regards,
Dimitris

2010/1/5 Blair Zajac <bl...@orcaware.com>:

> --
> Google Collections Library - users list
> http://groups.google.com/group/google-collections-dev?hl=en
>
> To unsubscribe, send email to:
> google-collections...@googlegroups.com

Kevin Bourrillion

unread,
Jan 4, 2010, 7:53:16 PM1/4/10
to Dimitris Andreou, Blair Zajac, google-colle...@googlegroups.com
Sigh: how would you feel about an Interner that allocates one
throwaway instance on every single call to intern()? :-( Cause
that's the implementation we can release in any kind of soonish
timeframe. Note that several teams are using it and it does make
their lives better, even as it is suboptimal.

--
Kevin Bourrillion @ Google
internal: http://go/javalibraries
external: guava-libraries.googlecode.com

Blair Zajac

unread,
Jan 4, 2010, 8:11:11 PM1/4/10
to Dimitris Andreou, google-colle...@googlegroups.com
That's what I have now but with soft keys and for only a single class.
So yes, just looking how best to generalize it.

Blair

Dimitris Andreou

unread,
Jan 4, 2010, 8:22:06 PM1/4/10
to Kevin Bourrillion, Blair Zajac, google-colle...@googlegroups.com
So there isn't any difference to a MapMaker-based solution? Or the
latter is even more costly than that?

I had the (wrong? outdated?) impression that the problem of creating
throw-away objects was in the old ReferenceMap and was fixed in
MapMaker (since the CHM code itself was hacked, instead of merely used
as a black-box library). Anyway, it would be nice to have a
one-sentence version of the technical problem leading to this state of
affairs, so we all better understand what you are dealing with.

Dimitris

2010/1/5 Kevin Bourrillion <kev...@google.com>:

Kevin Bourrillion

unread,
Jan 4, 2010, 8:55:32 PM1/4/10
to Dimitris Andreou, Blair Zajac, google-colle...@googlegroups.com
On Mon, Jan 4, 2010 at 5:22 PM, Dimitris Andreou <jim.a...@gmail.com> wrote:
> So there isn't any difference to a MapMaker-based solution? Or the
> latter is even more costly than that?
>
> I had the (wrong? outdated?) impression that the problem of creating
> throw-away objects was in the old ReferenceMap and was fixed in
> MapMaker (since the CHM code itself was hacked, instead of merely used
> as a black-box library).

That's correct. However, I wanted to avoid building Interner on top
of ReferenceMap because, worse than just creating a *throwaway* extra
instance, that would require creating two separate weak references to
stick in the map (as key and value) indefinitely for every interned
instance.

I've roughly edited our code and attached it (apologies if it doesn't
totally compile; it's probably close to compiling at least). As you
can see, it's... nontrivial. Do you think it can be simplified?
Anyway, a future edition of MapMaker will allow for a custom
Equivalence<T> (and there was much rejoicing) so that may help.

Interner.java
Interners.java

Dimitris Andreou

unread,
Jan 4, 2010, 9:40:33 PM1/4/10
to Kevin Bourrillion, Blair Zajac, google-colle...@googlegroups.com
First of all, thanks for sharing!

2010/1/5 Kevin Bourrillion <kev...@google.com>:


> On Mon, Jan 4, 2010 at 5:22 PM, Dimitris Andreou <jim.a...@gmail.com> wrote:
>> So there isn't any difference to a MapMaker-based solution? Or the
>> latter is even more costly than that?
>>
>> I had the (wrong? outdated?) impression that the problem of creating
>> throw-away objects was in the old ReferenceMap and was fixed in
>> MapMaker (since the CHM code itself was hacked, instead of merely used
>> as a black-box library).
>
> That's correct.  However, I wanted to avoid building Interner on top
> of ReferenceMap because, worse than just creating a *throwaway* extra
> instance, that would require creating two separate weak references to
> stick in the map (as key and value) indefinitely for every interned
> instance.

I see. Whereas this implementation uses the same reference object both
as key and value.

>
> I've roughly edited our code and attached it (apologies if it doesn't
> totally compile; it's probably close to compiling at least).  As you
> can see, it's... nontrivial.  Do you think it can be simplified?
> Anyway, a future edition of MapMaker will allow for a custom
> Equivalence<T> (and there was much rejoicing) so that may help.

It does compile, and I like it very much. It might have been a bit
happier with an Equivalence, but, come on, this is the definition of a
non-escaping object that can safely be stack allocated (more
accurately: not allocated at all) in an appropriate VM (disclaimer: I
don't follow closely the state of escape analysis). I don't think this
is a problem that should be worrying you. Is this the reason you are
reluctant to release this?

I also agree with your approach of ignoring the possibility of soft
references - an interner doesn't have to also be a cache. But if this
becomes popular, people probably will ask for that option too :)

All in all, it looks like a very reasonable implementation to me, thanks.

Dimitris

Blair Zajac

unread,
Jan 4, 2010, 10:01:28 PM1/4/10
to Kevin Bourrillion, Dimitris Andreou, google-colle...@googlegroups.com
How about just posting the code to the mailing list for now and we can
see if it should go into Guava?

Blair

Kevin Bourrillion

unread,
Jan 4, 2010, 10:02:06 PM1/4/10
to Dimitris Andreou, Blair Zajac, google-colle...@googlegroups.com
On Mon, Jan 4, 2010 at 6:40 PM, Dimitris Andreou <jim.a...@gmail.com> wrote:

> It does compile, and I like it very much.

Thanks!


> It might have been a bit
> happier with an Equivalence, but, come on, this is the definition of a
> non-escaping object that can safely be stack allocated (more
> accurately: not allocated at all) in an appropriate VM (disclaimer: I
> don't follow closely the state of escape analysis). I don't think this
> is a problem that should be worrying you. Is this the reason you are
> reluctant to release this?

That's it entirely. :-) Guess I'll get it into Guava as it is then eh?


> I also agree with your approach of ignoring the possibility of soft
> references - an interner doesn't have to also be a cache. But if this
> becomes popular, people probably will ask for that option too :)

Yes, I don't think soft references make any sense here. Nor do
strong, especially, except that I expect that version to be more
efficient -- but one day, I'll test that and see.

Blair Zajac

unread,
Jan 4, 2010, 10:02:53 PM1/4/10
to Kevin Bourrillion, Dimitris Andreou, google-colle...@googlegroups.com
Never mind, just saw your other note with the code. Don't reply to any
threads until one has read them all :)

Blair

Dimitris Andreou

unread,
Jan 4, 2010, 10:28:46 PM1/4/10
to Kevin Bourrillion, Blair Zajac, google-colle...@googlegroups.com
2010/1/5 Kevin Bourrillion <kev...@google.com>:

> On Mon, Jan 4, 2010 at 6:40 PM, Dimitris Andreou <jim.a...@gmail.com> wrote:
>
>> It does compile, and I like it very much.
>
> Thanks!
>
>
>> It might have been a bit
>> happier with an Equivalence, but, come on, this is the definition of a
>> non-escaping object that can safely be stack allocated (more
>> accurately: not allocated at all) in an appropriate VM (disclaimer: I
>> don't follow closely the state of escape analysis). I don't think this
>> is a problem that should be worrying you. Is this the reason you are
>> reluctant to release this?
>
> That's it entirely. :-)  Guess I'll get it into Guava as it is then eh?

Yes, please do, angels will be singing songs of joy. :)

This is the magic bit that should be mentioned:
-XX:+DoEscapeAnalysis (under HotSpot/6u14)
(http://java.sun.com/javase/6/webnotes/6u14.html)

At some point someone, or me, should perhaps compare String.intern()
with a (weak) Interner<String>, to have a feel of its relative
performance. (Of course it's unfair, but still).

Regards,
Dimitris

Blair Zajac

unread,
Jan 4, 2010, 10:39:30 PM1/4/10
to Dimitris Andreou, Kevin Bourrillion, google-colle...@googlegroups.com
On 1/4/10 7:28 PM, Dimitris Andreou wrote:
> 2010/1/5 Kevin Bourrillion<kev...@google.com>:
>> On Mon, Jan 4, 2010 at 6:40 PM, Dimitris Andreou<jim.a...@gmail.com> wrote:
>>
>> That's it entirely. :-) Guess I'll get it into Guava as it is then eh?
>
> Yes, please do, angels will be singing songs of joy. :)
>
> This is the magic bit that should be mentioned:
> -XX:+DoEscapeAnalysis (under HotSpot/6u14)
> (http://java.sun.com/javase/6/webnotes/6u14.html)

It looks like escape analysis is being removed in 6u18:

http://mail.openjdk.java.net/pipermail/jdk6-dev/2009-December/001043.html

Blair

Blair Zajac

unread,
Jan 4, 2010, 10:53:02 PM1/4/10
to Dimitris Andreou, Kevin Bourrillion, google-colle...@googlegroups.com
On 1/4/10 7:28 PM, Dimitris Andreou wrote:
> 2010/1/5 Kevin Bourrillion<kev...@google.com>:
>> On Mon, Jan 4, 2010 at 6:40 PM, Dimitris Andreou<jim.a...@gmail.com> wrote:
>>

> At some point someone, or me, should perhaps compare String.intern()
> with a (weak) Interner<String>, to have a feel of its relative
> performance. (Of course it's unfair, but still).

Also, I understand once you use String#intern() it will never be GCed,
so even if Interner<String> is slower it won't consume all available memory.

Blair

Dimitris Andreou

unread,
Jan 4, 2010, 11:04:12 PM1/4/10
to Blair Zajac, Kevin Bourrillion, google-colle...@googlegroups.com
2010/1/5 Blair Zajac <bl...@orcaware.com>:

They are not "removing" it, just not enabling it by default. Though
the cryptic "Currently ... it could produce incorrect results" is not
too reassuring. Lets hope they manage to fix it soon.

Dimitris

Dimitris Andreou

unread,
Jan 4, 2010, 11:05:37 PM1/4/10
to Blair Zajac, Kevin Bourrillion, google-colle...@googlegroups.com
2010/1/5 Blair Zajac <bl...@orcaware.com>:

Various sources claim the opposite. In any case, I can't make the
following to throw an OOME:

public static void main(String[] args) {
int sum = 0;
for (int i = 0; i < 1000000; i++) {
char[] c = new char[640];
for (int j = 0; j < c.length; j++) {
c[j++] = (char)i;
}

sum += new String(c).intern().hashCode();
}
System.out.println(sum);
}

Dimitris

Blair Zajac

unread,
Jan 5, 2010, 12:20:25 AM1/5/10
to Kevin Bourrillion, Dimitris Andreou, google-colle...@googlegroups.com

Thanks Kevin.

One question, why can't the fakeReference just be the InternReference so
you don't have to create a dummy reference? Won't using the same
InternReference just work? I see they have equals() implemented
differently, but using InternReference for the first get looks like it
would work.

If you wanted multiple interners, which I'll need for different types, I
don't want to carry around too many, so I'm thinking of an interface
like this:

public interface Interner
{
<E> E intern(E sample);
}

Then InternReference would extend FinalizableWeakReference<Object> and
at the end of intern() there would be a check like this:

Object canonical = sneakyRef.get();
if (canonical != null) {
return sample.getClass().cast(canonical);
}

Regards,
Blair

Kevin Bourrillion

unread,
Jan 5, 2010, 12:25:12 AM1/5/10
to Blair Zajac, Dimitris Andreou, google-colle...@googlegroups.com
On Mon, Jan 4, 2010 at 9:20 PM, Blair Zajac <bl...@orcaware.com> wrote:

One question, why can't the fakeReference just be the InternReference so you don't have to create a dummy reference?  Won't using the same InternReference just work?  I see they have equals() implemented differently, but using InternReference for the first get looks like it would work.

Will think about this when my brain works again!

 
If you wanted multiple interners, which I'll need for different types, I don't want to carry around too many, so I'm thinking of an interface like this:

public interface Interner
{
   <E> E intern(E sample);
}

You're absolutely right!  I am dismayed to find out that the code I attached still uses the wrong API.  I thought I had updated it.  

 
Then InternReference would extend FinalizableWeakReference<Object> and at the end of intern() there would be a check like this:

         Object canonical = sneakyRef.get();
         if (canonical != null) {
           return sample.getClass().cast(canonical);
         }

again with the thinking, when the braining working is.


Blair Zajac

unread,
Jan 5, 2010, 12:35:20 AM1/5/10
to Kevin Bourrillion, Dimitris Andreou, google-colle...@googlegroups.com
On 1/4/10 5:55 PM, Kevin Bourrillion wrote:

Kevin,

Here's a small patch that returns the number of instances are currently
maintained by the interner. This lets me unit test the interner by
doing a this at the end of a check

// Scala code
val interner = Interners.newWeakInterner[String]

// Use interner
// ....

System.gc()
System.runFinalization()

Thread.sleep(1000)

assertEquals(0, interner.size)

This works on Sun's JVM on Linux and Mac's.

Blair

add-an-interner-size-method-so.txt

Pat Farrell

unread,
Jan 5, 2010, 2:57:39 AM1/5/10
to Blair Zajac, google-colle...@googlegroups.com
On Mon, Jan 4, 2010 at 6:36 PM, Blair Zajac <bl...@orcaware.com> wrote:
I'm in need of a general purpose interner, something like

public class Interner
1) Will Guava have the interner?  If so, how soon?


Can someone help me understand what this is about. Is this a typo of Iterator? or something related? or is there really such a thing as an "interner"?

Nanda Firdausi

unread,
Jan 5, 2010, 3:01:39 AM1/5/10
to Pat Farrell, Blair Zajac, google-colle...@googlegroups.com
It's like String.intern() for any object. The reason behind interner
is usually memory optimization.

CMIIW
--
Nanda Firdausi Muhammad
http://satukubik.com

Dimitris Andreou

unread,
Jan 5, 2010, 4:16:49 AM1/5/10
to Blair Zajac, Kevin Bourrillion, google-colle...@googlegroups.com
2010/1/5 Blair Zajac <bl...@orcaware.com>:

I don't see a reason why this wouldn't work, the effect would be the
same - looking up an existing InternReference via a get(). But it is
seems quite more work to create such an object (compared to a simple
2-field object). There are 4 constructors that need to be called, as
well as FinalizableReferenceQueue#cleanUp().

I also more skeptical about escape analysis than yesterday. Not
entirely sure how easy it is to find out that fakeReference will be
only used in CHM's get() and in CHM$Segment.get(), i.e. it reaches two
levels lower in the call-stack. Hopefully it will be smart enough -
but if someone knows the right people to ask whether this can happen,
please do. Also, the map should be declared with type of
ConcurrentHashMap, instead of ConcurrentMap, to help the system
understand the target of the get() method.

In any case, if escape analysis fails to help, the code can always be
fixed by an introduced Equivalence, same api, just faster.

>
> If you wanted multiple interners, which I'll need for different types, I
> don't want to carry around too many, so I'm thinking of an interface like
> this:
>
> public interface Interner
> {
>    <E> E intern(E sample);
> }
>
> Then InternReference would extend FinalizableWeakReference<Object> and at
> the end of intern() there would be a check like this:
>
>          Object canonical = sneakyRef.get();
>          if (canonical != null) {
>            return sample.getClass().cast(canonical);
>          }

Just a minor note here: getClass() in this expression yields Class<?
extends Object>, not Class<E>, so to make this work, the code will
eventually need to bypass an unchecked warning.

Regards,
Dimitris

>
> Regards,
> Blair
>

Reply all
Reply to author
Forward
0 new messages