bounded memoize


Eugen Dück

unread,
Mar 7, 2010, 11:31:44 PM3/7/10
to Clojure
In many real applications (I guess that rules out fibonacci), memoize
will consume a lot of memory and create OutOfMemoryErrors sooner or
later. Often you want to only keep the latest items in the cache and
forget about older ones.

I've seen a variant of memoize that evicts items based on a time-to-
live bound in this group (http://groups.google.com/group/clojure/
browse_thread/thread/402c489f805ade94/ ).

And here's a variant that evicts elements when the size of the cache
exceeds some limit. In that case, the first item that was put in it
will be dissoc'ed. I'm using an array-map to accomplish this:

(defn bounded-memoize
  [f bound]
  (let [mem (atom (array-map))]
    (fn [& args]
      (if-let [e (find @mem args)]
        (val e)
        (let [ret (apply f args)]
          (swap! mem #(let [new-mem (assoc % args ret)]
                        (if (> (count new-mem) bound)
                          (dissoc new-mem (first (first new-mem)))
                          new-mem)))
          ret)))))

Michał Marczyk

unread,
Mar 8, 2010, 9:22:50 PM3/8/10
to clo...@googlegroups.com
On 8 March 2010 05:31, Eugen Dück <eu...@dueck.org> wrote:
> And here's a variant that evicts elements when the size of the cache
> exceeds some limit. In that case, the first item that was put in it
> will be dissoc'ed. I'm using an array-map to accomplish this:

I don't think this will work as you expect it to. There are two reasons:

(1) Array maps are transparently converted to hash maps when they grow
beyond a certain size:

user> (class (array-map :a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8))
clojure.lang.PersistentArrayMap
user> (class (assoc (array-map :a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8) :i 9))
clojure.lang.PersistentArrayMap
user> (class (assoc (array-map :a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8) :i 9 :j 10))
clojure.lang.PersistentHashMap

(2) More importantly, if you dissoc the first key from an array map,
then assoc a new key onto the array map, the newly assoc'ed key will
take the first slot:

user> (first (assoc (dissoc (array-map :a 1 :b 2 :c 3) :a) :d 4))
[:d 4]

I'd suggest a vector instead; they're countable in constant time and
you can use, say, conj and rest to add to the end of the queue /
eject from the front of the queue.

The idea is certainly a good one, though, so with the above mentioned
issues fixed, this will be a nice utility function. Thanks for
sharing!

Sincerely,
Michał

André Ferreira

unread,
Mar 8, 2010, 10:00:26 PM3/8/10
to Clojure
On 8 mar, 23:22, Michał Marczyk <michal.marc...@gmail.com> wrote:
> I'd suggest a vector instead; they're countable in constant time and
> you can use, say, conj and rest for add to end of queue / eject from
> front of queue.

Conj adds to the end of a vector in constant time, but rest will not
return another vector, but a sequence. Converting that sequence into
another vector will take O(n). If you want queueing behaviour, you
should use a clojure.lang.PersistentQueue:
(-> clojure.lang.PersistentQueue/EMPTY (conj 5) (conj 7) pop (conj 8)
peek)

Michał Marczyk

unread,
Mar 8, 2010, 10:20:34 PM3/8/10
to clo...@googlegroups.com
2010/3/9 André Ferreira <grei...@gmail.com>:

> Conj adds to the end of a vector in constant time, but rest will not
> return another vector, but a sequence. Converting that sequence into
> another vector will take O(n). If you want queueing behaviour, you
> should use a clojure.lang.PersistentQueue:
> (-> clojure.lang.PersistentQueue/EMPTY (conj 5) (conj 7) pop (conj 8)
> peek)

rest is definitely a bad choice, thanks for pointing that out! In this
particular case, it could be replaced with subvec (called with two
arguments: (subvec v 1)), which the docs describe as being "O(1) and
very fast", but since it is precisely the semantics of a queue that I
had in mind, your suggestion is probably better.
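For the record, a quick REPL sketch (my illustration, not from the original message) of why subvec would work here where rest doesn't: it returns an actual vector, so count stays O(1) and conj keeps appending at the end.

```clojure
;; subvec returns a real vector (a SubVector view), while rest returns
;; a seq, so a subvec-based "queue" keeps O(1) count and conj-at-the-end.
(def v [1 2 3 4])

(vector? (rest v))     ;=> false (it's a seq)
(vector? (subvec v 1)) ;=> true
(subvec v 1)           ;=> [2 3 4]
(conj (subvec v 1) 5)  ;=> [2 3 4 5]
```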

Sincerely,
Michał

Eugen Dück

unread,
Mar 8, 2010, 10:41:13 PM3/8/10
to Clojure
Good points! Testing array-map briefly led me to believe they can be
used as the Clojure equivalent of Java's LinkedHashMaps.

Here's a version that uses a vector to remember order of insertion - I
guess I have to use refs and transactions now:

(defn bounded-memoize
  [f bound]
  (let [mem (ref {})
        v (ref [])]
    (fn [& args]
      (if-let [e (find @mem args)]
        (val e)
        (let [ret (apply f args)]
          (dosync
           (when (= (count @v) bound)
             (alter mem dissoc (first @v))
             (alter v subvec 1))
           (alter mem assoc args ret)
           (alter v conj args))
          ret)))))

Haven't looked at Clojure's queues yet; they might make the code more
concise, but judging from that other post, they don't seem to be
exposed in a clojurey way (you have to use a Java class name).
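For what it's worth, only obtaining the empty instance requires naming the Java class; after that the queue works with the ordinary collection functions. A small sketch (mine, not from the thread):

```clojure
;; Only the empty instance needs the Java class name; after that
;; conj/peek/pop/count behave like on any other Clojure collection.
(def q (into clojure.lang.PersistentQueue/EMPTY [1 2 3]))

(peek q)         ;=> 1, the front of the queue
(seq (pop q))    ;=> (2 3)
(seq (conj q 4)) ;=> (1 2 3 4), conj adds at the back
(count q)        ;=> 3
```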

Meikel Brandmeyer

unread,
Mar 9, 2010, 2:51:48 AM3/9/10
to Clojure
Hi,

On Mar 9, 4:41 am, Eugen Dück <eu...@dueck.org> wrote:
> Good points! Testing array-map briefly led me to believe they can be
> used as the Clojure equivalent of Java's LinkedHashMaps.
>
> Here's a version that uses a vector to remember order of insertion - I
> guess I have to use refs and transactions now:
>
> (defn bounded-memoize
>   [f bound]
>   (let [mem (ref {})
>         v (ref [])]
>     (fn [& args]
>       (if-let [e (find @mem args)]
>         (val e)
>         (let [ret (apply f args)]
>           (dosync
>            (when (= (count @v) bound)
>              (alter mem dissoc (first @v))
>              (alter v subvec 1))
>            (alter mem assoc args ret)
>            (alter v conj args))
>           ret)))))
>
> Haven't looked at clojure's queues yet, they might make the code more
> concise, but by looking at that other post, they don't seem to be
> exposed in a clojurey way (using a java class name).

How about generalising memoize to allow different strategies?

(defn bound-cache-strategy
  "Implements a bound cache strategy for memoize. At most bound number
  of argument lists are kept in the cache. They are dropped in order
  of insertion."
  [bound]
  [find
   (let [values (ref clojure.lang.PersistentQueue/EMPTY)]
     (fn [cache args]
       (alter values conj args)
       (if (> (count @values) bound)
         (do
           (alter values pop)
           (dissoc cache args))
         cache)))])

(defn lru-cache-strategy
  "Implements LRU cache strategy for memoize. At most bound number of
  argument lists are kept in the cache. They are dropped in LRU
  order."
  [bound]
  (let [values (ref {})]
    [(fn lru-cache-strategy-lookup
       [cache args]
       (when-let [e (find cache args)]
         (let [now (System/currentTimeMillis)]
           (dosync (alter values assoc args now))
           e)))
     (fn lru-cache-strategy-update
       [cache args]
       (let [now (System/currentTimeMillis)
             k (min-key @values (keys @values))]
         (alter values dissoc k)
         (alter values assoc args now)
         (dissoc cache k)))]))

(defn most-used-cache-strategy
  "Implements MU cache strategy for memoize. At most bound number of
  argument lists are kept in the cache. They are dropped in LU order.
  In case elements have the same usage count, the order of drop is
  unspecified."
  [bound]
  (let [values (ref {})]
    [(fn most-used-cache-strategy-lookup
       [cache args]
       (when-let [e (find cache args)]
         (dosync (alter values update-in [args] inc))
         e))
     (fn most-used-cache-strategy-update
       [cache args]
       (let [k (min-key @values (keys @values))]
         (alter values dissoc k)
         (alter values assoc args 1)
         (dissoc cache k)))]))

(defn memoize
  "Returns a memoized version of a referentially transparent function.
  The memoized version of the function keeps a cache of the mapping
  from arguments to results and, when calls with the same arguments
  are repeated often, has higher performance at the expense of higher
  memory use.

  Optionally a cache strategy might be supplied. A strategy is a pair
  of functions:
   - one for accessing the cache, returning the map entry on success
     or nil (cf. find)
   - one which takes the cache and the argument vector and might
     modify the cache.
  Possible implementations could be a bounded cache or an LRU
  strategy. Default is a naive strategy keeping all encountered
  argument lists forever. The cache update function is called in a
  transaction, the cache lookup function not necessarily so."
  ([f] (memoize f [find (fn [c _] c)]))
  ([f [cache-lookup cache-update]]
   (let [cache (ref {})]
     (fn [& args]
       (if-let [e (cache-lookup @cache args)]
         (val e)
         (dosync
          (if-let [e (cache-lookup @cache args)]
            (val e)
            (let [result (apply f args)]
              (alter cache assoc args result)
              (alter cache cache-update args)
              result))))))))

Usage Examples:

(memoize f)
(memoize f (bound-cache-strategy 5))
(memoize f (lru-cache-strategy 5))
(memoize f (most-used-cache-strategy 5))

(Note: I have no clue about whether lru is well implemented or mu
makes sense. Just some examples...)

Sincerely
Meikel

Meikel Brandmeyer

unread,
Mar 9, 2010, 3:57:29 AM3/9/10
to Clojure
Woops. And of course there should be the check for bound in the update
functions of the lru and mu strategies...

(defn lru-cache-strategy
  "Implements LRU cache strategy for memoize. At most bound number of
  argument lists are kept in the cache. They are dropped in LRU
  order."
  [bound]
  (let [values (ref {})]
    [(fn lru-cache-strategy-lookup
       [cache args]
       (when-let [e (find cache args)]
         (let [now (System/currentTimeMillis)]
           (dosync (alter values assoc args now))
           e)))
     (fn lru-cache-strategy-update
       [cache args]
       (let [now (System/currentTimeMillis)]
         (alter values assoc args now)
         (if (> (count @values) bound)
           (let [k (min-key @values (keys @values))]
             (alter values dissoc k)
             (dissoc cache k))
           cache)))]))

(defn most-used-cache-strategy
  "Implements MU cache strategy for memoize. At most bound number of
  argument lists are kept in the cache. They are dropped in LU order.
  In case elements have the same usage count, the order of drop is
  unspecified."
  [bound]
  (let [values (ref {})]
    [(fn most-used-cache-strategy-lookup
       [cache args]
       (when-let [e (find cache args)]
         (dosync (alter values update-in [args] inc))
         e))
     (fn most-used-cache-strategy-update
       [cache args]
       (alter values assoc args 1)
       (if (> (count @values) bound)
         (let [k (min-key @values (keys @values))]
           (alter values dissoc k)
           (dissoc cache k))
         cache))]))

Meikel Brandmeyer

unread,
Mar 9, 2010, 6:06:35 AM3/9/10
to Clojure
Hi again,

On Mar 9, 8:51 am, Meikel Brandmeyer <m...@kotka.de> wrote:

>            (dissoc cache args))

And of course this is also wrong. Bleh. I will clean this up and do a
short blog post this evening...

Sincerely
Meikel

Eugen Dück

unread,
Mar 9, 2010, 12:28:02 PM3/9/10
to Clojure
I totally agree, having different eviction strategies would be great
and having memoizers with max capacities in clojure.contrib would
probably be useful to a lot of developers.

Also, a memory sensitive memoizer (using soft references?) would be
nice. :)

Meikel Brandmeyer

unread,
Mar 9, 2010, 5:17:10 PM3/9/10
to Clojure
As threatened here a writeup. For this thread the Summary section is
probably most interesting.

http://kotka.de/blog/2010/03/The_Rule_of_Three.html#summary

Sincerely
Meikel

Michał Marczyk

unread,
Mar 9, 2010, 6:22:15 PM3/9/10
to clo...@googlegroups.com

In the way of early feedback -- that looks super neat! I've got this
instant feeling that this would make a great clojure.contrib.memoize.
:-)

Sincerely,
Michał

Steve Purcell

unread,
Mar 10, 2010, 5:33:36 AM3/10/10
to clo...@googlegroups.com
On 9 Mar 2010, at 23:22, Michał Marczyk wrote:

> In the way of early feedback -- that looks super neat! I've got this
> instant feeling that this would make a great clojure.contrib.memoize.


+1

That would be wonderful.

Laurent PETIT

unread,
Mar 10, 2010, 5:45:54 AM3/10/10
to clo...@googlegroups.com
2010/3/10 Steve Purcell <st...@sanityinc.com>:

Well, in the way of early feedback too (alas, not much time to argue
at length), there are some points which annoy me:

 * usage of refs: I had a bad feeling, and cgrand confirmed this to
me by pointing out an even more interesting counter-argument. Me: using
refs is not mandatory, since you do not need to synchronize this change
with anything else. Christophe: and by using refs, you synchronize the
change with a potential uber STM transaction, and if this uber STM
transaction retries, you will not benefit from the memoization, since
the memoization itself will be discarded by the retry.
 * lookup function responsibilities: I cannot (yet) offer a better
way to approach the problem, but I have a bad feeling about the lookup
function changing things "behind the scenes".

My 0.02 euros,

--
Laurent

Meikel Brandmeyer

unread,
Mar 11, 2010, 2:42:49 AM3/11/10
to Clojure
Hello Laurent,

On Mar 10, 11:45 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:

> * usage of refs : I had a bad feeling, and cgrand confirmed this to
> me by pointing an even more interesting counter-argument. Me: using
> refs is not mandatory since you do not need to synchronize this change
> with anything else.

I don't think that this is entirely true! You have to synchronise the
cache with the state in the strategy. This can be done with atoms only
if the cache and the strategy state are contained in the same atom and
all updates happen in a single call to swap! (or reset!). As soon as
you have multiple calls, you need transactions again, because the atom
might change between the two calls. And you can't swap! and return a
result at the same time.

Say you have the LRU strategy. You deref the cache and find the result
cached. Now you swap! the new access time into your state. But
between the deref and the swap!, another call might trigger the removal
of the entry. So you always have to guard against the atom suddenly
changing underneath - even if it's only one atom.

Something that annoys me more is that the computation may be fired off
several times. Since f is supposedly expensive, I'd rather avoid
that.

> * lookup function responsibilities: I cannot (yet) offer a better
> way to approach the problem, but I have a bad feeling with the lookup
> function changing things "behind the scene".

Can you explain a little more what you think the problem is? Things
like LRU need some bookkeeping on access. Otherwise one just does not
have the required information at hand and the strategy is not possible.
So I think you mean something else, or you consider LRU a bad strategy.
I have no experience with the different strategies, but I tend to
believe the first possibility.

If this is true, you refer to the TTL strategy, where the lookup
function removes old entries. I also don't really like this. What are
the possibilities?

* removal in cache-update: may cause items to linger quite long in the
cache depending on the call structure, possibly much longer than the
TTL; still needs special handling in cache-lookup or out-of-date
entries will be returned
* removal in cache-lookup: happens (possibly) much more often
* a dedicated thread, which is created when a new entry is made, then
sleeps for its TTL, then wakes and removes the entry. This will
however potentially create a lot of threads just sleeping around. I
don't know how expensive this is.
* a cron-like thread, which sleeps until the next removal time, removes
the out-of-date item for that point, sleeps again, and so on. This
would be only one thread, but a more difficult implementation, and it
requires cross-thread synchronisation.

The cache-lookup approach provides the most bang for the buck if you
want to keep it simple.
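To make that trade-off concrete, here is a hypothetical sketch (names and shape are my own, not from the thread) following the [cache-lookup cache-update] strategy pair used earlier: lookup merely refuses stale entries, while update (which runs inside the memoize dosync) actually removes them.

```clojure
;; Hypothetical TTL strategy sketch. cache-lookup refuses entries older
;; than ttl-ms; cache-update, called inside the transaction, removes them.
(defn ttl-cache-strategy
  [ttl-ms]
  (let [times (ref {})] ; args -> insertion time in ms
    [(fn ttl-lookup [cache args]
       (let [inserted (get @times args)]
         (when (and inserted
                    (< (- (System/currentTimeMillis) inserted) ttl-ms))
           (find cache args))))
     (fn ttl-update [cache args]
       (let [now  (System/currentTimeMillis)
             dead (for [[k t] @times
                        :when (>= (- now t) ttl-ms)]
                    k)]
         ;; record the fresh entry, forget the expired ones
         (alter times #(apply dissoc (assoc % args now) dead))
         (apply dissoc cache dead)))]))
```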

> Christophe: And by using refs, you synchronize the
> change with a potential uber STM transaction, and if this uber STM
> transaction retries, you will not benefit from the memoization, since
> the memoization itself will be discarded by the retry.

This is a problem the STM cannot solve. We would need some
dont-nest-me-dosync, which does not merge with a possibly
surrounding dosync. (This was actually part of the famous
discussion between Rich and Cliff Click: how do you define the
borders of a transaction region?)

I gave it another run and came up with another solution. Even more
hairy. :P

* a warm cache is fast
* a cache miss is potentially slow, but the missing value is computed
only once; multiple requests for the same item wait for the initially
triggered computation
* the cache can be updated (almost) in parallel for different items

(defn memoize
  ([f] (memoize f naive-strategy))
  ([f [cache-lookup cache-update]]
   (let [cache-state (atom {})]
     (fn [& args]
       (if-let [e (cache-lookup cache-state args)]
         @(val e)
         @(locking cache-state
            (if-let [e (cache-lookup cache-state args)]
              (val e)
              (let [result (promise)]
                (.start (Thread. #(deliver result (apply f args))))
                (swap! cache-state assoc-in [:cache args] result)
                (cache-update cache-state args)
                result))))))))

But with this you can't reliably implement strategies like LRU and the
like.

But this is really an interesting problem. I'll mull a little more
about it. :)

Sincerely
Meikel

Christophe Grand

unread,
Mar 12, 2010, 2:27:15 PM3/12/10
to clo...@googlegroups.com
Hi Meikel,

Since Laurent dragged me into the discussion:

On Thu, Mar 11, 2010 at 8:42 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
Hello Laurent,

On Mar 10, 11:45 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:

>  * usage of refs : I had a bad feeling, and cgrand confirmed this to
> me by pointing an even more interesting counter-argument. Me: using
> refs is not mandatory since you do not need to synchronize this change
> with anything else.

I don't think that this is entirely true! You have to synchronise the
cache with the state in the strategy. This can be done with atoms only
if the cache and the strategy state are contained in the same atom and
all updates happen in a single call to swap! (or reset!). As soon as
you have multiple calls, you need transactions again, because the atom
might change between the two calls. And you can't swap! and return a
result at the same time.

Say you have the LRU strategy. You deref the cache and find the result
cached. Now you swap! the new access time into your state. But
between the deref and the swap!, another call might trigger the removal
of the entry. So you always have to guard against the atom suddenly
changing underneath - even if it's only one atom.

I agree.
 
Something that annoys me more is that the computation may be fired off
several times. Since f is supposedly expensive, I'd rather avoid
that.

See my memoize5: the call isn't computed inside the swap!s
 

Since you use a lock I think some clever combination of memoized functions can create a deadlock.
 
But this is really an interesting problem. I'll mull a little more
about it. :)

I agree, it's interesting and here is my entry:
http://gist.github.com/330644

Christophe

Eugen Dück

unread,
Mar 12, 2010, 9:28:10 PM3/12/10
to Clojure
Laurent, Meikel, Christophe,

I guess I must be missing something obvious, but can't we just put
more than one thing into an atom in order to get atomic behavior?
Using, say, a vector. Using the simple bounded memoizer as an example,
this looks to me like it works:

(defn bounded-memoize
  [f capacity]
  (let [mem (atom [{} []])]
    (fn [& args]
      (if-let [e (find (first @mem) args)]
        (val e)
        (let [ret (apply f args)]
          (swap! mem #(let [cache (assoc (first %) args ret)
                            v (conj (second %) args)]
                        (if (> (count v) capacity)
                          [(dissoc cache (first v)) (subvec v 1)]
                          [cache v])))
          ret)))))

And if this works, it should be applicable to the strategy-ed
memoizers, too.

Eugen

Christophe Grand

unread,
Mar 13, 2010, 2:51:42 AM3/13/10
to clo...@googlegroups.com
Hi Eugen,

On Sat, Mar 13, 2010 at 3:28 AM, Eugen Dück <eu...@dueck.org> wrote:
I guess I must be missing something obvious, but can't we just put
more than one thing into an atom in order to get atomic behavior?

My variations on memoize use a single atom: your bounded-memoize is roughly equivalent to my memoize2 + fifo-strategy, see http://gist.github.com/330644#LID19.
 

Using, say, a vector. Using the simple bounded memoizer as an example,
this looks to me like it works:

(defn bounded-memoize
  [f capacity]
  (let [mem (atom [{} []])]
    (fn [& args]
      (if-let [e (find (first @mem) args)]
        (val e)
        (let [ret (apply f args)]
          (swap! mem #(let [cache (assoc (first %) args ret)
                            v (conj (second %) args)]
                        (if (> (count v) capacity)
                          [(dissoc cache (first v)) (subvec v 1)]
                          [cache v])))
          ret)))))

Two quick notes:
* a PersistentQueue is a better fit there (conj at one end, pop/peek
at the other) than a vector,
* by padding it to the desired length (capacity) with dummy items, you
ensure that the queue (or the vector) is always full, so you always
have to remove an item: no more need for the (> (count v) capacity)
test.
 
 
And if this works, it should be applicable to the strategy-ed
memoizers, too.

Your implementation suffers from the same problems as my memoize2 (and they become worse in memoize3), which are caused by using the atom twice (once when reading, once when swapping).
Say you have:
(def g (bounded-memoize identity 3))
(g 0) (g 1) (g 2)

so at this point the value in mem is [{(0) 0, (1) 1, (2) 2} [(0) (1) (2)]].
Now, concurrently, two threads call (g 3); each one sees that 3 isn't memoized yet and swap!s the new entry in.
After the first swap!, mem holds [{(1) 1, (2) 2, (3) 3} [(1) (2) (3)]], which is expected.
However, after the second swap!, mem holds [{(2) 2, (3) 3} [(2) (3) (3)]].

That's why Meikel tests twice whether the value is already in the cache (the first time outside of a transaction, the second time inside) and why I introduced hit-or-assoc in memoize4/memoize5.
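The lost update is easy to replay deterministically without any threads: pull the swap! function out of bounded-memoize and apply it twice, as happens when both threads have decided that 3 is missing. (step, start, and the explicit args lists are my reconstruction.)

```clojure
;; Replaying the race from bounded-memoize without threads: `step` is the
;; function passed to swap!, applied twice for the same missing key (3).
(defn step [capacity [cache v] args ret]
  (let [cache (assoc cache args ret)
        v     (conj v args)]
    (if (> (count v) capacity)
      [(dissoc cache (first v)) (subvec v 1)]
      [cache v])))

(def start [{'(0) 0, '(1) 1, '(2) 2} ['(0) '(1) '(2)]])

(step 3 start '(3) 3)
;=> [{(1) 1, (2) 2, (3) 3} [(1) (2) (3)]], first swap!, as expected
(step 3 (step 3 start '(3) 3) '(3) 3)
;=> [{(2) 2, (3) 3} [(2) (3) (3)]], second swap! evicts (1)
;   and leaves a duplicate (3) behind
```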

Have a nice week-end!

Christophe

Meikel Brandmeyer

unread,
Mar 13, 2010, 4:51:38 PM3/13/10
to clo...@googlegroups.com
Hello Christophe,

On Fri, Mar 12, 2010 at 08:27:15PM +0100, Christophe Grand wrote:

> See my memoize5: the call isn't computed inside the swap!s

That doesn't mean that it is not computed several times!

user=> (defn f
         [x]
         (println "Got" x "from" (Thread/currentThread))
         (Thread/sleep 5000)
         (case x
           3 (f 2)
           2 (f 1)
           1 :done))
#'user/f
user=> (def f (memoize5 f))
#'user/f
user=> (-> #(do (f 3) (println "Done for" (Thread/currentThread))) Thread. .start)
(Thread/sleep 2500)
(-> #(do (f 3) (println "Done for" (Thread/currentThread))) Thread. .start)
Got 3 from #<Thread Thread[Thread-2,5,main]>
Got 3 from #<Thread Thread[Thread-3,5,main]>
Got 2 from #<Thread Thread[Thread-2,5,main]>
Got 2 from #<Thread Thread[Thread-3,5,main]>
Got 1 from #<Thread Thread[Thread-2,5,main]>
Got 1 from #<Thread Thread[Thread-3,5,main]>
Done for #<Thread Thread[Thread-2,5,main]>
Done for #<Thread Thread[Thread-3,5,main]>

Hmm? Wasn't f supposed to be memoized? The problem is that the call to
f is not guarded.

> Since you use a lock I think some clever combination of memoized functions
> can create a deadlock.

No. In the protected area we simply create a promise and fire off a
thread, which does the computation. Then we return immediately. So no
deadlock possibility here. However, we trade one lock for the „other"
(the promise). What is the possibility of a deadlock here? Well, the
computation of f never completes. But this is not a problem of memoize.
The only way memoize could cause a problem here is if the computation
of f somehow calls f again with the same arguments. Then it would
deadlock on the promise, but without memoize we would also have an
infinite loop here...

Sincerely
Meikel

Christophe Grand

unread,
Mar 13, 2010, 7:26:10 PM3/13/10
to clo...@googlegroups.com
Hi Meikel,


On Sat, Mar 13, 2010 at 10:51 PM, Meikel Brandmeyer <m...@kotka.de> wrote:
On Fri, Mar 12, 2010 at 08:27:15PM +0100, Christophe Grand wrote:

> See my memoize5: the call isn't computed inside the swap!s

That doesn't mean, that it is not computed several times!


I agree: it can be concurrently computed several times (but a given thread would only compute it at most once).
I'm sorry: I read too quickly and didn't understand that you were concerned about the computation happening only once across all threads.
 


> Since you use a lock I think some clever combination of memoized functions
> can create a deadlock.

No. In the protected area we simply create a promise and fire off a
thread, which does the computation. Then we return immediately. So no
deadlock possibility here. However, we trade one lock for the „other"
(the promise). What is the possibility of a deadlock here? Well, the
computation of f never completes. But this is not a problem of memoize.
The only way memoize could cause a problem here is if the computation
of f somehow calls f again with the same arguments. Then it would
deadlock on the promise, but without memoize we would also have an
infinite loop here...


You're right of course. I apologize.
As a minor note: couldn't a future replace your promise? Or couldn't you get rid of the other thread and deliver the promise in the same thread, but outside of the (locking ..) form?
Hmm, or even a delay!?

Ok, I tried to use delays. So I slightly modified my memoize5 (see memoize6) and built on that a memoize7, which enforces (through delays) that a value is only computed once (except if the strategy previously removed it from the cache, of course).

Do you see a problem in my latest implementation? http://gist.github.com/330644#LID153

Thank you,

Christophe

Eugen Dück

unread,
Mar 13, 2010, 9:57:03 PM3/13/10
to Clojure
On Mar 13, 4:51 pm, Christophe Grand <christo...@cgrand.net> wrote:
> My variations on memoize use a single atom: your bounded-memoize is roughly
> equivalent to my memoize2 + fifo-strategy, see http://gist.github.com/330644#LID19.

I finally found the time to fully read your gist, and I see you are
indeed doing the same there. It's just a little less obvious due to
the indirection introduced by strategies.

> Two quick notes:
> * a PersistentQueue is a better fit there (conj at one end, pop/peek at the
> other) than a vector,
> * by padding it to the desired length (capacity) with dummy items, you are
> sure that the queue (or the vector) is always full, so you always have to
> remove an item: no more need for the (> (count v) capacity) test.

Agree, this simplifies the code a bit, and once we reach full
capacity, the "if" will always be true anyway.

> That's why Meikel tests twice if the value is already in the cache (the
> first time outside of a transactio, the second time inside) and why I
> introduce hit-or-assoc in memoize4/memoize5.

Still, with bad luck or just enough concurrency, simultaneous calls to
memoize6 with the same args can result in multiple computations, as
the computation (apply f args) is done outside of the swap!:

(let [ret (if-let [e (find (cached @mem) args)] (val e) (apply f args))
      m (swap! mem hit-or-assoc args ret)]

This gap shrinks when using memoize7 thanks to the use of delays, but
it is not completely closed and can still lead to multiple delays of
the same computation. If we want to get rid of this gap and make it
truly atomic, we have to make the decision whether or not to compute
inside the swap! block.

My memoizer with the hard-coded fifo strategy thus turns into
something like this:

(defn bounded-memoize
  [f capacity]
  (let [mem (atom [{} []])]
    (fn [& args]
      (deref ((first
               (swap! mem #(if (contains? (first %) args)
                             %
                             (let [new-cache (assoc (first %) args
                                                    (delay (apply f args)))
                                   new-v (conj (second %) args)]
                               (if (> (count new-v) capacity)
                                 [(dissoc new-cache (first new-v)) (subvec new-v 1)]
                                 [new-cache new-v])))))
              args)))))

Using a padded queue (and getting rid of the anonymous % param names),
it can be (slightly) simplified to:

(defn bounded-memoize
  [f capacity]
  (let [mem (atom [{} (into clojure.lang.PersistentQueue/EMPTY
                            (repeat capacity :dummy))])]
    (fn [& args]
      (deref ((first
               (swap! mem (fn [[m q :as cache]]
                            (if (contains? m args)
                              cache
                              [(dissoc (assoc m args (delay (apply f args)))
                                       (peek q))
                               (-> q pop (conj args))]))))
              args)))))

Eugen Dück

unread,
Mar 13, 2010, 10:01:49 PM3/13/10
to Clojure
On Mar 14, 11:57 am, Eugen Dück <eu...@dueck.org> wrote:

> This gap shrinks when using memoize7 thanks to the use of delays, but
> it is not completely closed and can still lead to multiple delays of
> the same computation. If we want to get rid off this gap and make it

Actually, I take that back. The delay might be created multiple times,
but only one of them will be deref'ed (multiple times).
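That hinges on the semantics of delay, which can be checked in isolation (the counter sketch is mine, not from the thread): the body runs at most once, on first deref, and a duplicate delay that loses the swap! race and is never stored or deref'ed never runs at all.

```clojure
;; delay runs its body once, on first deref; an unused duplicate never runs.
(def calls (atom 0))

(def winner (delay (swap! calls inc) :result))
(def loser  (delay (swap! calls inc) :result)) ; created, never deref'ed

[@winner @winner @calls]
;=> [:result :result 1], one computation despite two delays and two derefs
```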

Eugen Dück

unread,
Mar 14, 2010, 12:51:25 AM3/14/10
to Clojure
Hi Christophe,

your fifo-strategy (the one that uses "identity" as the hit method)
does not work:

user=> (def g (memoize7 identity (fifo-strategy 3)))
#'user/g
user=> (g 1)
1
user=> (g 1)
java.lang.IllegalArgumentException: Wrong number of args passed to:
core$identity (NO_SOURCE_FILE:0)

You have to use something like #(first %&) instead of identity.

memoize7 looks fine to me in terms of avoiding extra computation. I'm
not sure though why memoize6 exists as a fn in its own right. I guess
you wouldn't use it without something like delays, as the whole point
of memoize is to avoid multiple costly computations of the same thing,
and memoize6 does not always do that, unless you give it a fn f that
takes care of it, as memoize7 does.

If you don't rely on memoize6, you can write memoize7 even more
compactly than memoize6, like this:

(defn memoize7-variant
  ([f] (memoize7-variant f [{} identity (fn [mem args] mem) assoc]))
  ([f [init cached hit assoc]]
   (let [mem (atom init)]
     (fn [& args]
       (deref ((cached
                (swap! mem (fn [mem]
                             (if (contains? (cached mem) args)
                               (hit mem args)
                               (assoc mem args (delay (apply f args)))))))
               args))))))

It's more compact, as we do the contains? (or in your original
version: "find") check only once, inside the swap! function. This has
the (admittedly not really huge) added benefit that we don't create
multiple delays in the case of "bad luck", as mentioned in my earlier
post.

Any downsides to this?

Cheers
Eugen

Meikel Brandmeyer

unread,
Mar 14, 2010, 4:42:02 AM3/14/10
to clo...@googlegroups.com
Hi,

On Sun, Mar 14, 2010 at 01:26:10AM +0100, Christophe Grand wrote:

> I agree: it can be concurrently computed several times (but a given thread
> would only compute it at most once).

I must confess I was insisting a little on a point which probably has
only little impact in real life. But a) I prefer to have a correct
solution and b) it was a nice exercise. :)

> You're right of course. I apologize.

^^^^^^^^^
Huh? What for? This was a really enlightening discussion! Who would have
thought that memoize is so non-trivial. :)

> As a minor note: couldn't a future replace your promise? Or couldn't you get
> rid of the other thread and deliver the promise in the same thread but
> outside of the (locking ..) form?

The problem with the future is that it starts immediately. A similar
issue arises with the same thread: the behaviour after the swap! must be
identical for hits and misses. So I needed the locking to really start
the computation only once.

> Hmm or even a delay!?

And of course, this is the solution to get rid of the locking! I should
have thought of delay much earlier.

> Do you see a problem in my latest implementation?
> http://gist.github.com/330644#LID153

No. This one looks fine.

Christophe, Eugen! I will summarise this discussion in a blog post. It
really shows that concurrent programming is not trivial. Not even for
„trivial" things like a memoised function.

Sincerely
Meikel

Eugen Dück

unread,
Mar 14, 2010, 6:32:11 AM3/14/10
to Clojure
On Mar 14, 5:42 pm, Meikel Brandmeyer <m...@kotka.de> wrote:
> really shows, that concurrent programming is not trivial. Not even for
> „trivial“ things like a memoised function.

True. It's not too hard to be correct, but being correct and
performant at the same time is a different issue...

Thinking about your earlier comment:

> discussion between Rich and Cliff Click: How to define the
> borders of a transaction region?)

That's what packing more than one value into a single atom kind of
gives us, although in a limited way. This is one of my take-aways from
this discussion. But you can't mix and match arbitrary references
into such a "transaction", like you can with refs.

Cheers
Eugen

Christophe Grand

unread,
Mar 14, 2010, 6:59:42 AM3/14/10
to clo...@googlegroups.com
Hi Eugen!


On Sun, Mar 14, 2010 at 6:51 AM, Eugen Dück <eu...@dueck.org> wrote:
your fifo-strategy (the one that uses "identity" as the hit method)
does not work:

user=> (def g (memoize7 identity (fifo-strategy 3)))
#'user/g
user=> (g 1)
1
user=> (g 1)
java.lang.IllegalArgumentException: Wrong number of args passed to:
core$identity (NO_SOURCE_FILE:0)

You have to use something like #(first %&) instead of identity.

Yeah, you're right: I only tested with lru-strategy and forgot to replace identity with (fn [mem args] mem) (like for the default strategy).
 

(defn memoize7-variant
  ([f] (memoize7-variant f [{} identity (fn [mem args] mem) assoc]))
  ([f [init cached hit assoc]]
   (let [mem (atom init)]
     (fn [& args]
       (deref ((cached
                 (swap! mem (fn [mem]
                              (if (contains? (cached mem) args)
                                (hit mem args)
                                (assoc mem args (delay (apply f args)))))))
               args))))))

It's more compact, as we do the contains? check (or, in your original
version, the "find" check) only once, inside the swap! function. This
has the (admittedly not really huge) added benefit that we don't create
multiple delays in the "bad luck" case mentioned in my earlier post.

Any downsides to this?

Well, the fn passed to swap! can be retried, so in the "bad luck" case you'll still create several delays.
However, your version is interesting: while evolving one memoize into the next, I hadn't realized that the first read was no longer needed (now that we have delays).
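The retry behaviour is easy to observe directly (this demonstration is mine, not from the thread): count how often the fn given to swap! actually runs under deliberate contention.

```clojure
;; Demonstration (mine): the fn passed to swap! may run more than once.
;; A side effect inside a swap! fn is normally a bug -- here the counter
;; is the whole point: any excess of @runs over 4000 is retries.
(def runs (atom 0))
(def counter (atom 0))

(let [workers (doall
                (for [_ (range 4)]
                  (future
                    (dotimes [_ 1000]
                      (swap! counter (fn [v]
                                       (swap! runs inc)
                                       (inc v)))))))]
  (doseq [w workers] @w))   ; wait for all four workers to finish

;; @counter ends up exactly 4000; @runs can be larger.
```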

I updated my gist http://gist.github.com/330644

Thanks,

Christophe

Christophe Grand

unread,
Mar 14, 2010, 7:43:57 AM3/14/10
to clo...@googlegroups.com
Hi Meikel,

On Sun, Mar 14, 2010 at 9:42 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
> I agree: it can be concurrently computed several times (but a given thread
> would only compute it at most once).

On Sun, Mar 14, 2010 at 01:26:10AM +0100, Christophe Grand wrote:
I must confess I was insisting a little on a point which probably has
only little impact in real life. But a) I prefer to have a correct
solution and b) it was a nice exercise. :)

Better safe than sorry :-)

 

> You're right of course. I apologize.
                           ^^^^^^^^^
Huh? What for?

For not reading your code with enough scrutiny.

 
This was a really enlightening discussion!

I concur: I really enjoyed this conversation.

 

> As a minor note: couldn't a future replace your promise? Or couldn't you get
> rid of the other thread and deliver the promise in the same thread but
> outside of the (locking ..) form?

The problem with the future is that it starts immediately.

I don't see where the problem is with a future as long as you are in the locking form.

 

Christophe, Eugen! I will summarise this discussion in a blog post. It
really shows that concurrent programming is not trivial. Not even for
„trivial“ things like a memoised function.

Great! Looking forward to reading it!

Christophe

Eugen Dück

unread,
Mar 14, 2010, 9:06:08 AM3/14/10
to Clojure
On Mar 14, 7:59 pm, Christophe Grand <christo...@cgrand.net> wrote:
> Well the fn passed to swap! can be retried so in case of "bad luck" you'll
> still create several delays.

You're absolutely right!

I wonder whether we can make any statement about the likelihood of
this happening in memoize6/7 vs. memoize7-variant or your memoize8,
although it's unlikely to occur in any of the scenarios in which I've
wanted to use memoize so far.

But nevertheless, it got me thinking: does swap! always finish every
retry it does? That is, it won't stop in the middle because it
discovered that someone updated the atom in the meantime, correct? I
guess a conflict is only discovered by the final compare-and-set.

Answering myself, after checking Atom.java: Yes, that's how it works.
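In pseudo-Clojure, that logic looks roughly like this (a sketch of what Atom.java does, not a transcription of the actual Java source):

```clojure
;; Sketch of swap!'s retry loop: the fn always runs to completion; a
;; concurrent update is only noticed by compare-and-set!, which then
;; triggers a full re-run from the freshly read value.
(defn swap-sketch [a f & args]
  (loop []
    (let [oldval @a
          newval (apply f oldval args)]   ; runs fully every time
      (if (compare-and-set! a oldval newval)
        newval
        (recur)))))                       ; conflict -> start over
```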

So the likelihood of multiple delays being created might actually have
increased in the latest memoize versions, as the "transaction" can now
take longer when it has to create the delay :) As mentioned, this
potential difference is probably of minor importance in most cases;
more important is to keep in mind that retries can happen...

Thanks Christophe, and also Meikel, Michał, Laurent, André for all the
input. That was very useful!
Eugen

Christophe Grand

unread,
Mar 14, 2010, 9:37:45 AM3/14/10
to clo...@googlegroups.com
On Sun, Mar 14, 2010 at 2:06 PM, Eugen Dück <eu...@dueck.org> wrote:
I wonder whether we can make any statement about the likelihood of
this happening in memoize6/7 vs. memoize7-variant or your memoize8,
although it's unlikely to occur in any of the scenarios in which I've
wanted to use memoize so far.

Anyway, delay is cheap: it creates an object (the delay itself) and a closure.
user=> (time (dotimes [_ 1e10] (delay (+ 2 2))))
"Elapsed time: 1552.607315 msecs"
nil
user=> (time (dotimes [_ 1e10] [#(+ 2 2)]))
"Elapsed time: 1734.104963 msecs"
nil

I don't think it's worth worrying about where the delay is created.

Christophe

Meikel Brandmeyer

unread,
Mar 16, 2010, 3:13:06 AM3/16/10
to Clojure
Hi,

as threatened previously I wrote a summary blog post about this
discussion: http://kotka.de/blog/2010/03/memoize_done_right.html.
Please let me know in case I missed something crucial.

I'm also working on a protocol/type based implementation here:
http://paste.pocoo.org/show/190179. More to come this evening.

Sincerely
Meikel

Meikel Brandmeyer

unread,
Mar 17, 2010, 4:12:46 AM3/17/10
to Clojure
Hi,

On Mar 16, 8:13 am, Meikel Brandmeyer <m...@kotka.de> wrote:

> as threatened previously I wrote a summary blog post about this
> discussion: http://kotka.de/blog/2010/03/memoize_done_right.html.
> Please let me know in case I missed something crucial.

And updated with a protocol/type version for bleeding edge:
http://kotka.de/blog/2010/03/memoize_done_right.html#protocols

Sincerely
Meikel

Meikel Brandmeyer

unread,
Mar 17, 2010, 6:22:53 AM3/17/10
to Clojure
Pfff... ttl strategy still broken...