Soft references for memoize?

73 views
Skip to first unread message

Mikera

unread,
Mar 8, 2011, 3:41:21 PM3/8/11
to Clojure Dev
Hi all,

I was looking around in the source for memoize and noticed that it
uses a map with hard references.

This could trip people up - since it makes memoize risky/unsuitable
for situations where the cache might grow very large. It's also a
shame because it precludes the safe use of memoize in what would
otherwise be quite nice use cases (e.g. memory-sensitive caching of
query results in a web application). Since memoize is intended for
referentially transparent functions only, it should always be safe to
discard the cache in the event of memory pressure - which is exactly
the semantics that the Java SoftReference class provides.

Would there be an interest in a lightweight, SoftReference based
replacement for memoize? Happy to contribute one if it would be
useful.

Mike.

Cosmin Stejerean

unread,
Mar 8, 2011, 4:21:27 PM3/8/11
to cloju...@googlegroups.com, Mikera
On Tue, Mar 8, 2011 at 2:41 PM, Mikera <mike_r_...@yahoo.co.uk> wrote:
Would there be an interest in a lightweight, SoftReference based
replacement for memoize? Happy to contribute one if it would be
useful.

+1 on a replacement for memoize that uses soft references.


--
Cosmin Stejerean
http://offbytwo.com

Chas Emerick

unread,
Mar 8, 2011, 4:37:56 PM3/8/11
to cloju...@googlegroups.com

On Mar 8, 2011, at 4:21 PM, Cosmin Stejerean wrote:

On Tue, Mar 8, 2011 at 2:41 PM, Mikera <mike_r_...@yahoo.co.uk> wrote:
Would there be an interest in a lightweight, SoftReference based
replacement for memoize? Happy to contribute one if it would be
useful.

+1 on a replacement for memoize that uses soft references.

David Powell

unread,
Mar 8, 2011, 5:00:29 PM3/8/11
to cloju...@googlegroups.com

I think that memoization tends to be somewhat misunderstood.

My understanding of a general purpose memoize funciton, is that when you call a memoized function, the parameter/results mapping will be stored forever, and when the function is called again with the same parameters, the memoized result can be returned in constant time, without having to re-execute the function.  This isn't a shortcoming of memoize - it is just what memoize is designed to do.

Memoization is sometimes used to take recursive O(2^n) algorithms and convert them to O(x^2) algorithms.  To do this, you don't memoize the function and stick the function in a top-level def - you just ephemerally construct a memoized copy of the function, and arrange for the recursive calls to be made on that copy, discarding the copy when the function finally returns.  For example, the longest-common-subsequence problem [1] can be solved using dynamic programming or it can be solved using memoization.



That isn't to say that a more intelligent, tunable, application-level cache isn't a useful thing to have - it is, and it is probably more useful than memoize.  But they are different things, with different purposes.  Adding soft references to memoize wouldn't improve memoize - it would change it into something else.

That a memoized function keeps input/output pairs around forever isn't a memory leak in memoize - it is a memory leak in whatever application doesn't properly manage the life of the function that memoize returns.  For optimizing recursive algorithms you may need to throw it away, for lazy initialisation applications you probably don't care if the range of inputs is controlled.

-- 
Dave

Justin Balthrop

unread,
Mar 8, 2011, 7:19:14 PM3/8/11
to Clojure Dev
Also worth a read:

http://kotka.de/blog/2010/03/memoize_done_right.html

On Mar 8, 1:37 pm, Chas Emerick <cemer...@snowtide.com> wrote:
> On Mar 8, 2011, at 4:21 PM, Cosmin Stejerean wrote:
>

Mikera

unread,
Mar 9, 2011, 11:33:15 AM3/9/11
to Clojure Dev
On Mar 8, 10:00 pm, David Powell <djpow...@djpowell.net> wrote:
> I think that memoization tends to be somewhat misunderstood.

Almost certainly! I count myself humbly among the number :-)

>
> My understanding of a general purpose memoize funciton, is that when you
> call a memoized function, the parameter/results mapping will be stored
> forever, and when the function is called again with the same parameters, the
> memoized result can be returned in constant time, without having to
> re-execute the function.  This isn't a shortcoming of memoize - it is just
> what memoize is designed to do.

This is fine as a definition, however it fails in a low memory
condition.

The SoftReference alternative has the same semantics except that
instead of
throwing OOM, it discards some or all cached data in preference to
failing.
Some values may then need to be recalculated, but that should be safe
since
your function is meant to be referentially transparent anyway.

>
> Memoization is sometimes used to take recursive O(2^n) algorithms and
> convert them to O(x^2) algorithms.  To do this, you don't memoize the
> function and stick the function in a top-level def - you just ephemerally
> construct a memoized copy of the function, and arrange for the recursive
> calls to be made on that copy, discarding the copy when the function finally
> returns.  For example, the longest-common-subsequence problem [1] can be
> solved using dynamic programming or it can be solved using memoization.
>
> [1]http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
>

Agreed. But a SoftReference solution would also provide the same
performance,
at least until it hits a low memory condition, in which case the
O(x^2)
algorithm would have failed anyway so you can hardly complain....

> That isn't to say that a more intelligent, tunable, application-level cache
> isn't a useful thing to have - it is, and it is probably more useful than
> memoize.  But they are different things, with different purposes.  Adding
> soft references to memoize wouldn't improve memoize - it would change it
> into something else.
>
> That a memoized function keeps input/output pairs around forever isn't a
> memory leak in memoize - it is a memory leak in whatever application doesn't
> properly manage the life of the function that memoize returns.  For
> optimizing recursive algorithms you may need to throw it away, for lazy
> initialisation applications you probably don't care if the range of inputs
> is controlled.

Perhaps we should keep memoize as it is and provide a new function
("cachify"?)
that has the same usage as memoize apart from the memory sensitive
caching
behaviour. That would obviously work.

But then I can't think of a single situation where I would actually
use the
original memoize above cachify. Hence I wonder whether it wouldn't be
better
just to upgrade the original memoize? This would still be consistent
with
how memoize is documented in the Clojure 1.2 API, and would help
prevent less
experienced users of memoize accidentally creating "memory leaks".

Guess it comes down to what you think is the most reasonable default
behaviour.
Reply all
Reply to author
Forward
0 new messages