A report on leaks in reflection

Eugene Burmako

unread,

Sep 22, 2012, 8:32:59 AM9/22/12

to scala-internals, martin....@epfl.ch, pa...@improving.org

Here's the list of top-level vals/vars/objects declared in
scala.reflect.internals and scala.reflect.runtime: https://gist.github.com/3760764.

Of those we have four things that are supposedly leaking:
1) Symbols.originalOwner (easy to fix => just don't track original
owners in reflection, because they are only used in GenASM/GenJVM)
2) Types.lubResults/glbResults (these are cleaned up by public lub/glb
functions, but there are non-public overloads of lub/glb, which don't
clean up the cache and are called by public stuff like <:< and
baseClasses)
3) Types.undoLog (only cleaned up in typer cleanup, which never
happens in reflection)
4) Types.uniques (we already discussed this one)

Toolboxes are currently leaking as well, but just a bit. Most caches
are cleaned up correctly, except for perRunCaches used in typeCheck.
perRunCaches are only flushed in Global.compileUnits, which isn't
called by typeCheck. That's a minor problem I think, since we can just
call perRunCaches.clear in typeCheck.

Would be great to hear your feedback.

Paul Phillips

unread,

Sep 22, 2012, 9:42:36 AM9/22/12

to Eugene Burmako, scala-internals, martin....@epfl.ch

On Sat, Sep 22, 2012 at 5:32 AM, Eugene Burmako <eugene....@epfl.ch> wrote:

Toolboxes are currently leaking as well, but just a bit. Most caches
are cleaned up correctly, except for perRunCaches used in typeCheck.
perRunCaches are only flushed in Global.compileUnits, which isn't
called by typeCheck. That's a minor problem I think, since we can just
call perRunCaches.clear in typeCheck.

I would use perRunCaches for all of these. uniques already does. perRunCaches lets you clear everything at once, in the same way, and makes it possible to track what's in all the caches - I have code which does a lot of that, including being signallable so you can get a dump of their current contents, but I never checked it in.

"Most caches are cleaned up correctly, except for perRunCaches" sounds a bit like "I can run faster than most people, except for those who have two legs." perRunCaches is a lot of caches.

Eugene Burmako

unread,

Sep 22, 2012, 10:15:30 AM9/22/12

to Paul Phillips, scala-internals, martin....@epfl.ch

Hehe, I wanted to say "most of the time caches are cleaned up correctly, except when perRunCaches are used in typeCheck".

But toolboxes are easy. To the contrast, we can't clean perRunCaches for reflection, because there's only a single run there.

Paul Phillips

unread,

Sep 22, 2012, 11:15:51 AM9/22/12

to Eugene Burmako, scala-internals, martin....@epfl.ch

On Sat, Sep 22, 2012 at 7:15 AM, Eugene Burmako <eugene....@epfl.ch> wrote:

But toolboxes are easy. To the contrast, we can't clean perRunCaches for reflection, because there's only a single run there.

I don't know what that means in detail, but I know that if we can't use perRunCaches, we can use something very like perRunCaches. The key characteristics are that caches are created centrally, indicate their lifetime and strong/soft/weak/correctness/performance/synchronization attributes at creation, and that they are cleared in a predictable way at predictable times.

Eugene Burmako

unread,

Sep 23, 2012, 1:05:26 PM9/23/12

to scala-internals, Aleksandar Prokopec

Here's some progress: https://github.com/scala/scala/pull/1385

Questions:

1) How do I deterministically trigger a full gc? Currently I just call
runtime.gc several times in a row, but this can't be an answer.

2) What standard data structure would be the best for implementing
`uniques`? Currently all I can think of is WeakHashMap[Type,
WeakReference[Type]]. I need the data structure to support a single
operation: findEntryOrUpdate(x: T), which finds and returns entry by
x.##, or updates the map with x and returns x.

On Sep 22, 5:15 pm, Paul Phillips <pa...@improving.org> wrote:

Paul Phillips

unread,

Sep 23, 2012, 1:27:02 PM9/23/12

to scala-i...@googlegroups.com, Aleksandar Prokopec

On Sun, Sep 23, 2012 at 10:05 AM, Eugene Burmako <eugene....@epfl.ch> wrote:

1) How do I deterministically trigger a full gc? Currently I just call
runtime.gc several times in a row, but this can't be an answer.

If you are determined enough, you could call it in a loop until a "Full GC" event is registered...

% scala3 -J-XX:+PrintGC

Welcome to Scala version 2.11.0-20120921-015043-4adfcfe6ff (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_06).

Type in expressions to have them evaluated.

Type :help for more information.

scala> System.gc

[GC 39021K->34121K(107456K), 0.0040590 secs]

[Full GC 34121K->32740K(107456K), 0.1231400 secs]

If you are having difficulty making it collect that way, you might throw in these flags:

% scala3 -J-XX:+UseParallelGC -J-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses -J-XX:+PrintGC

2) What standard data structure would be the best for implementing
`uniques`? Currently all I can think of is WeakHashMap[Type,
WeakReference[Type]]. I need the data structure to support a single
operation: findEntryOrUpdate(x: T), which finds and returns entry by
x.##, or updates the map with x and returns x.

I don't have a good answer to this, but messing with uniques may prove a challenge. I hope we're not adding any interesting amount of overhead this way; my intuition is that there is some better way to handle reflection leakage than making this a weak cache.

Rex Kerr

unread,

Sep 23, 2012, 1:37:46 PM9/23/12

to scala-i...@googlegroups.com

System.gc() invokes a full GC every time in my hands (though I don't think it's guaranteed). If you have the concurrent GC running, you might be confused by asking for gc and then doing things before it completes. So I'd take Paul's advice _without_ the +ExplicitGCInvokesConcurrent, since you don't want the concurrent GC. +UseParallelGC is a good idea.

--Rex

Eugene Burmako

unread,

Sep 23, 2012, 2:07:49 PM9/23/12

to scala-i...@googlegroups.com, Aleksandar Prokopec

I would be terrified to touch scala.reflect.internal.Types.uniques.

The `unique` method is already overriden in ReflectedSymbols, and I'm creating and using my own uniques there.

Eugene Burmako

unread,

Sep 24, 2012, 8:32:58 AM9/24/12

to scala-internals

Paolo Giarrusso

unread,

Sep 30, 2012, 5:00:30 PM9/30/12

to scala-i...@googlegroups.com

I'm forwarding, with permissions of the participants, a side
discussion which branched off-list. We ended up finding a small leak
in WeakHashSet (which affects mostly the presentation compiler, and
potential future users who aren't as alert as Eugene).

---------- Forwarded message ----------
From: Eugene Burmako <eugene....@epfl.ch>
Date: Mon, Sep 24, 2012 at 10:07 AM
Subject: Re: A report on leaks in reflection
To: "Paolo G. Giarrusso" <p.gia...@gmail.com>

Isn't WeakHashSet leaking? It looks like weak references will remain
there forever, even though their contents will be collected.

On 24 September 2012 02:23, Paolo G. Giarrusso <p.gia...@gmail.com> wrote:

>
>
>
> Il giorno domenica 23 settembre 2012 19:05:27 UTC+2, Eugene Burmako ha scritto:
>>
>> Here's some progress: https://github.com/scala/scala/pull/1385
>>
>> Questions:
>>

>> 2) What standard data structure would be the best for implementing
>> `uniques`? Currently all I can think of is WeakHashMap[Type,
>> WeakReference[Type]]. I need the data structure to support a single
>> operation: findEntryOrUpdate(x: T), which finds and returns entry by
>> x.##, or updates the map with x and returns x.
>

> You seem to be describing WeakHashSet, which according to Google has been written many times (in Java&Scala).
> I'd add findEntryOrUpdate to src/reflect/scala/reflect/internal/util/WeakHashSet.scala. Probably you'd need the same method in FlatHashTable (without exposing it to the world, unlike existing methods which are part of the public API of HashSet even if they shouldn't); it seems you could mix addEntry and containsEntry, following src/reflect/scala/reflect/internal/util/HashSet.scala as a template.
>
> Best,
> Paolo

--
Paolo G. Giarrusso - Ph.D. Student, Philipps-University Marburg
http://www.informatik.uni-marburg.de/~pgiarrusso/

Paolo Giarrusso

unread,

Sep 30, 2012, 5:01:07 PM9/30/12

to scala-i...@googlegroups.com

---------- Forwarded message ----------
From: iulian dragos <jagu...@gmail.com>
Date: Tue, Sep 25, 2012 at 2:16 PM
Subject: Re: A report on leaks in reflection

To: Paolo Giarrusso <p.gia...@gmail.com>
Cc: Eugene Burmako <eugene....@epfl.ch>

Hi Paolo,

On Mon, Sep 24, 2012 at 1:10 PM, Paolo Giarrusso <p.gia...@gmail.com> wrote:

>
> On Mon, Sep 24, 2012 at 10:07 AM, Eugene Burmako <eugene....@epfl.ch> wrote:
> > Isn't WeakHashSet leaking? It looks like weak references will remain there
> > forever, even though their contents will be collected.

I didn't see the previous message, so I don't know if there is any
ongoing discussion on our implementation of WeakHashSet.

>
>
> I believe you're right, so it seems WeakHashSet does _not_ implement
> the intended specification - the one suggested by its name. It should
> get fixed, right? Isn't then WeakHashSet a (quite small) disaster
> waiting to happen, since it's used (see
> 568546ef3f1073febc9bfc6baf63eceaf92213b6)? It's used in a perRunCache,
> but the commit strongly suggests that at least when it was introduced
> the cache was not cleaned often enough, so a working weak hash set is
> still required and there is probably a (smaller) memory leak (maybe
> restricted to the presentation compiler).

AFAIK it leaks entries in the backing hashset, but they are indeed
much smaller the original leak. PerRunCaches wasn't working well, I
believe it does what it says it does, but we still needed the weak set
or otherwise a single run would need too much memory.

>
> Iulian, since you wrote the code, could you confirm or deny our suspects?
>
> For the existing use case (but not for Eugene's one, I fear) the
> safest implementation for WeakHashSet would be to wrap this Java code
> [1]:
>
> Set<T> weakHashSet = Collections.newSetFromMap(
> new WeakHashMap<T, Boolean>());
>
> [1] http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29
>

I totally agree. This is the correct way to do it, I should have read
up on it before rolling my own.

>
> For Eugene's use case, I guess that at this point in the release cycle
> your original idea is the safest - this isn't the right time to write
> a new implementation for efficiency (if efficiency is needed at all),
> so I retract my original suggestion.
> WeakHashMap[T, WeakReference[T]] sounds fine for your case. Be careful
> not to expose it as a too general utility: if the key and the value
> pointed to different values, entries and keys would be leaked when
> values are collected; if that were a problem for you (it shouldn't
> be), you might want to look into Guava's or Apache Commons maps
> (http://stackoverflow.com/a/264591/53974).

Not sure what's the use case at hand, but make sure you can recover
when the key is in the cache, but it's value has been collected.

iulian

>
>
> Best,
> Paolo

>
> > On 24 September 2012 02:23, Paolo G. Giarrusso <p.gia...@gmail.com>
> > wrote:
> >>
> >>
> >>
> >> Il giorno domenica 23 settembre 2012 19:05:27 UTC+2, Eugene Burmako ha
> >> scritto:
> >>>

> >>> Here's some progress: https://github.com/scala/scala/pull/1385
> >>>
> >>> Questions:
> >>>

> >>> 2) What standard data structure would be the best for implementing
> >>> `uniques`? Currently all I can think of is WeakHashMap[Type,
> >>> WeakReference[Type]]. I need the data structure to support a single
> >>> operation: findEntryOrUpdate(x: T), which finds and returns entry by
> >>> x.##, or updates the map with x and returns x.
> >>

> >> You seem to be describing WeakHashSet, which according to Google has been
> >> written many times (in Java&Scala).
> >> I'd add findEntryOrUpdate to
> >> src/reflect/scala/reflect/internal/util/WeakHashSet.scala. Probably you'd
> >> need the same method in FlatHashTable (without exposing it to the world,
> >> unlike existing methods which are part of the public API of HashSet even if
> >> they shouldn't); it seems you could mix addEntry and containsEntry,
> >> following src/reflect/scala/reflect/internal/util/HashSet.scala as a
> >> template.
> >>
> >> Best,
> >> Paolo
> >
> >
>
>
>
> --
> Paolo G. Giarrusso - Ph.D. Student, Philipps-University Marburg
> http://www.informatik.uni-marburg.de/~pgiarrusso/

--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais

Paolo Giarrusso

unread,

Sep 30, 2012, 5:00:53 PM9/30/12

to scala-i...@googlegroups.com

---------- Forwarded message ----------
From: Paolo Giarrusso <p.gia...@gmail.com>
Date: Mon, Sep 24, 2012 at 1:10 PM
Subject: Re: A report on leaks in reflection
To: Eugene Burmako <eugene....@epfl.ch>, Iulian Dragos <jagu...@gmail.com>

On Mon, Sep 24, 2012 at 10:07 AM, Eugene Burmako <eugene....@epfl.ch> wrote:
> Isn't WeakHashSet leaking? It looks like weak references will remain there
> forever, even though their contents will be collected.

I believe you're right, so it seems WeakHashSet does _not_ implement
the intended specification - the one suggested by its name. It should
get fixed, right? Isn't then WeakHashSet a (quite small) disaster
waiting to happen, since it's used (see
568546ef3f1073febc9bfc6baf63eceaf92213b6)? It's used in a perRunCache,
but the commit strongly suggests that at least when it was introduced
the cache was not cleaned often enough, so a working weak hash set is
still required and there is probably a (smaller) memory leak (maybe
restricted to the presentation compiler).

Iulian, since you wrote the code, could you confirm or deny our suspects?

For the existing use case (but not for Eugene's one, I fear) the
safest implementation for WeakHashSet would be to wrap this Java code
[1]:

Set<T> weakHashSet = Collections.newSetFromMap(
new WeakHashMap<T, Boolean>());

[1] http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29

For Eugene's use case, I guess that at this point in the release cycle
your original idea is the safest - this isn't the right time to write
a new implementation for efficiency (if efficiency is needed at all),
so I retract my original suggestion.
WeakHashMap[T, WeakReference[T]] sounds fine for your case. Be careful
not to expose it as a too general utility: if the key and the value
pointed to different values, entries and keys would be leaked when
values are collected; if that were a problem for you (it shouldn't
be), you might want to look into Guava's or Apache Commons maps
(http://stackoverflow.com/a/264591/53974).

Best,
Paolo

> On 24 September 2012 02:23, Paolo G. Giarrusso <p.gia...@gmail.com>
> wrote:
>>
>>
>>
>> Il giorno domenica 23 settembre 2012 19:05:27 UTC+2, Eugene Burmako ha
>> scritto:
>>>

>>> Here's some progress: https://github.com/scala/scala/pull/1385
>>>
>>> Questions:
>>>

>>> 2) What standard data structure would be the best for implementing
>>> `uniques`? Currently all I can think of is WeakHashMap[Type,
>>> WeakReference[Type]]. I need the data structure to support a single
>>> operation: findEntryOrUpdate(x: T), which finds and returns entry by
>>> x.##, or updates the map with x and returns x.
>>

Paolo Giarrusso

unread,

Sep 30, 2012, 5:01:21 PM9/30/12

to scala-i...@googlegroups.com

---------- Forwarded message ----------
From: Paolo Giarrusso <p.gia...@gmail.com>
Date: Tue, Sep 25, 2012 at 7:38 PM
Subject: Re: A report on leaks in reflection

To: iulian dragos <jagu...@gmail.com>
Cc: Eugene Burmako <eugene....@epfl.ch>

Hi Iulian, hi Eugene,
can I forward your messages to scala-internals?
Iulian, do I need to open a ticket against WeakHashSet to fix the
remaining leak or can you take care of that?

On Tue, Sep 25, 2012 at 2:16 PM, iulian dragos <jagu...@gmail.com> wrote:
> Hi Paolo,
>
> On Mon, Sep 24, 2012 at 1:10 PM, Paolo Giarrusso <p.gia...@gmail.com>
> wrote:
>>

>> On Mon, Sep 24, 2012 at 10:07 AM, Eugene Burmako <eugene....@epfl.ch>
>> wrote:
>> > Isn't WeakHashSet leaking? It looks like weak references will remain
>> > there
>> > forever, even though their contents will be collected.
>
>

> I didn't see the previous message, so I don't know if there is any ongoing
> discussion on our implementation of WeakHashSet.

Sorry, not sure why this isn't on scala-internals! I suggested to use
WeakHashSet when Eugene asked:

>>> 2) What standard data structure would be the best for implementing
>>> `uniques`? Currently all I can think of is WeakHashMap[Type,
>>> WeakReference[Type]]. I need the data structure to support a single
>>> operation: findEntryOrUpdate(x: T), which finds and returns entry by
>>> x.##, or updates the map with x and returns x.

>> I believe you're right, so it seems WeakHashSet does _not_ implement
>> the intended specification - the one suggested by its name. It should
>> get fixed, right? Isn't then WeakHashSet a (quite small) disaster
>> waiting to happen, since it's used (see
>> 568546ef3f1073febc9bfc6baf63eceaf92213b6)? It's used in a perRunCache,
>> but the commit strongly suggests that at least when it was introduced
>> the cache was not cleaned often enough, so a working weak hash set is
>> still required and there is probably a (smaller) memory leak (maybe
>> restricted to the presentation compiler).
>
>

> AFAIK it leaks entries in the backing hashset, but they are indeed much
> smaller the original leak. PerRunCaches wasn't working well, I believe it
> does what it says it does, but we still needed the weak set or otherwise a
> single run would need too much memory.
>
>>

>> Iulian, since you wrote the code, could you confirm or deny our suspects?
>>
>> For the existing use case (but not for Eugene's one, I fear) the
>> safest implementation for WeakHashSet would be to wrap this Java code
>> [1]:
>>
>> Set<T> weakHashSet = Collections.newSetFromMap(
>> new WeakHashMap<T, Boolean>());
>>
>> [1]
>> http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29
>>
>

> I totally agree. This is the correct way to do it, I should have read up on
> it before rolling my own.
>
>
>>

>> For Eugene's use case, I guess that at this point in the release cycle
>> your original idea is the safest - this isn't the right time to write
>> a new implementation for efficiency (if efficiency is needed at all),
>> so I retract my original suggestion.
>> WeakHashMap[T, WeakReference[T]] sounds fine for your case. Be careful
>> not to expose it as a too general utility: if the key and the value
>> pointed to different values, entries and keys would be leaked when
>> values are collected; if that were a problem for you (it shouldn't
>> be), you might want to look into Guava's or Apache Commons maps
>> (http://stackoverflow.com/a/264591/53974).
>
>

> Not sure what's the use case at hand, but make sure you can recover when the
> key is in the cache, but it's value has been collected.

Good point, luckily it's not a problem for this use case (since key
and value are always the same).

Best regards,

Reply all

Reply to author

Forward