Memoize improvement

85 views
Skip to first unread message

Stefan Arentz

unread,
Oct 30, 2009, 7:57:48 PM10/30/09
to Clojure

This is some of my first Clojure code so it might not be the
greatest ... yet!

Here is an improved memoize function that also takes a time-to-live
argument. Useful when you want to expire results from the cache.

I'm using it to cache query results from a database. This probably
breaks some rules about functional programming though :-)

(defn expire-cached-results [cached-results time-to-live]
"Expire items from the cached function results."
(into {} (filter (fn [[k v]] (> time-to-live (- (System/
currentTimeMillis) (:time v)))) cached-results)))

(defn my-memoize
"Returns a memoized version of a referentially transparent
function. The
memoized version of the function keeps a cache of the mapping from
arguments
to results and, when calls with the same arguments are repeated
often, has
higher performance at the expense of higher memory use. Cached
results are
removed from the cache when their time to live value expires."
[function time-to-live]
(let [cached-results (atom {})]
(fn [& arguments]
(swap! cached-results expire-cached-results time-to-live)
(println cached-results)
(if-let [entry (find @cached-results arguments)]
(:result (val entry))
(let [result (apply function arguments)]
(swap! cached-results assoc arguments { :result
result :time (System/currentTimeMillis)})
result)))))

Sample usage:

(defn calculation [n]
(println "Doing some bistromath")
(* n 42))

(def foo (my-memoize calculation 5000))

;; First execution
user> (foo 10)
Doing some bistromath
420

;; Executed immediately after first time - cached result
user> (foo 10)
420

;; Executed after 5 seconds - not cached anymore
user> (foo 10)
Doing some bistromath
420


John Harrop

unread,
Oct 31, 2009, 9:17:51 AM10/31/09
to clo...@googlegroups.com
On Fri, Oct 30, 2009 at 7:57 PM, Stefan Arentz <ste...@arentz.ca> wrote:
This is some of my first Clojure code so it might not be the
greatest ... yet!

It's quite interesting.

(defn my-memoize
  "Returns a memoized version of a referentially transparent
function. The
  memoized version of the function keeps a cache of the mapping from
arguments
  to results and, when calls with the same arguments are repeated
often, has
  higher performance at the expense of higher memory use. Cached
results are
  removed from the cache when their time to live value expires."
  [function time-to-live]
  (let [cached-results (atom {})]
    (fn [& arguments]
      (swap! cached-results expire-cached-results time-to-live)
      (println cached-results)
      (if-let [entry (find @cached-results arguments)]
        (:result (val entry))
        (let [result (apply function arguments)]
          (swap! cached-results assoc arguments { :result
result :time (System/currentTimeMillis)})
          result)))))

I'd make a couple of changes, though. First:
 
(defn my-memoize
  "Returns a memoized version of a referentially transparent
   function. The memoized version of the function keeps a cache of
   the mapping from arguments to results and, when calls with the
   same arguments are repeated often, has higher performance at
   the expense of higher memory use. Cached results are removed
   from the cache when their time to live value expires."
  [function time-to-live]
  (let [cached-results (atom {})]
    (fn [& arguments]
      (swap! cached-results expire-cached-results time-to-live)
      (let [result (if-let [r (get @cached-results arguments)]
                     (:result r)
                     (apply function arguments))]
        (swap! cached-results assoc arguments
          {:result result :time (System/currentTimeMillis)})
        result))))

This resets the expiry clock on a result if the same arguments occur again. So instead of a result expiring n seconds after being put in, it only does so if there's an n-second period during which that result isn't used at all.

A more sophisticated change is to make expire-cached-results happen even if the function stops being called for a long time. That requires threading. Most efficient might be to dispense with the {:result result :time time} and just map arguments onto results, but whenever one does an (apply function arguments) one also assoc's the result into the map in the atom AND starts a thread:

(.start (Thread. #(Thread/sleep time-to-live) (swap! cached-results dissoc arguments)))

Poof! After time-to-live that particular result disappears. This reverts to having the result expire n seconds after being put in, though, and it also potentially spawns large numbers of threads, albeit mostly all sleeping at once.

The "result goes away if unused for n seconds" behavior and accumulation of threads can both be nixed, though:

(defn my-memoize
  "Returns a memoized version of a referentially transparent
   function. The memoized version of the function keeps a cache of
   the mapping from arguments to results and, when calls with the
   same arguments are repeated often, has higher performance at
   the expense of higher memory use. Cached results are removed
   from the cache when their time to live value expires."
  [function time-to-live]
  (let [cached-results (ref {})
        last-time (ref (System/currentTimeMillis))
        expiry-queue (ref [])
        process-queue #(when-not (empty? @expiry-queue)
                         (Thread/sleep
                           (:wait (get @cached-results (first @expiry-queue))))
                           (dosync
                             (let [args (first (ensure expiry-queue))
                                   item (get (ensure cached-results) args)
                                   t (- (:time item) 100)]
                               (if (<= t (System/currentTimeMillis))
                                 (alter cached-results dissoc args))
                               (ref-set expiry-queue
                                 (vec (rest @expiry-queue)))))
                         (recur))
        ts-agent (agent nil)]
    (fn [& args]
      (let [result (if-let [r (get @cached-results args)]
                     (:result r)
                     (apply function args))]
        (dosync
          (let [t (+ time-to-live (System/currentTimeMillis))
                l (max (ensure last-time) (System/currentTimeMillis))
                w (- t l)
                q (ensure expiry-queue)]
            (println t w q @cached-results)
            (alter cached-results assoc args
              {:result result :time t :wait w})
            (ref-set last-time t)
            (alter expiry-queue conj args)
            (if (empty? q)
              (send ts-agent (fn [_] (.start (Thread. process-queue)))))))
        result))))

This is tested and works:

user=> (def x (my-memoize #(do (println %) (+ % 10)) 1000))
#'user/x
user=> (doall (for [n [1 2 4 1 2 3 1 2 3 2 3 2 3 2 3 1 4]] (do (Thread/sleep 300) (x n))))
1
1256994331877 999 [] {}
2
1256994332178 301 [(1)] {(1) {:result 11, :time 1256994331877, :wait 999}}
4
1256994332478 300 [(1) (2)] {(2) {:result 12, :time 1256994332178, :wait 301}, (1) {:result 11, :time 1256994331877, :wait 999}}
1256994332778 300 [(1) (2) (4)] {(4) {:result 14, :time 1256994332478, :wait 300}, (2) {:result 12, :time 1256994332178, :wait 301}, (1) {:result 11, :time 1256994331877, :wait 999}}
1256994333078 300 [(2) (4) (1)] {(4) {:result 14, :time 1256994332478, :wait 300}, (2) {:result 12, :time 1256994332178, :wait 301}, (1) {:result 11, :time 1256994332778, :wait 300}}
3
1256994333378 300 [(4) (1) (2)] {(4) {:result 14, :time 1256994332478, :wait 300}, (2) {:result 12, :time 1256994333078, :wait 300}, (1) {:result 11, :time 1256994332778, :wait 300}}
1256994333679 301 [(1) (2) (3)] {(3) {:result 13, :time 1256994333378, :wait 300}, (2) {:result 12, :time 1256994333078, :wait 300}, (1) {:result 11, :time 1256994332778, :wait 300}}
1256994333979 300 [(2) (3) (1)] {(3) {:result 13, :time 1256994333378, :wait 300}, (2) {:result 12, :time 1256994333078, :wait 300}, (1) {:result 11, :time 1256994333679, :wait 301}}
1256994334279 300 [(3) (1) (2)] {(3) {:result 13, :time 1256994333378, :wait 300}, (2) {:result 12, :time 1256994333979, :wait 300}, (1) {:result 11, :time 1256994333679, :wait 301}}
1256994334579 300 [(1) (2) (3)] {(3) {:result 13, :time 1256994334279, :wait 300}, (2) {:result 12, :time 1256994333979, :wait 300}, (1) {:result 11, :time 1256994333679, :wait 301}}
1256994334879 300 [(2) (3) (2)] {(3) {:result 13, :time 1256994334279, :wait 300}, (2) {:result 12, :time 1256994334579, :wait 300}}
1256994335179 300 [(3) (2) (3)] {(3) {:result 13, :time 1256994334879, :wait 300}, (2) {:result 12, :time 1256994334579, :wait 300}}
1256994335479 300 [(2) (3) (2)] {(3) {:result 13, :time 1256994334879, :wait 300}, (2) {:result 12, :time 1256994335179, :wait 300}}
1256994335779 300 [(3) (2) (3)] {(3) {:result 13, :time 1256994335479, :wait 300}, (2) {:result 12, :time 1256994335179, :wait 300}}
1256994336079 300 [(2) (3) (2)] {(3) {:result 13, :time 1256994335479, :wait 300}, (2) {:result 12, :time 1256994335779, :wait 300}}
1
1256994336379 300 [(3) (2) (3)] {(3) {:result 13, :time 1256994336079, :wait 300}, (2) {:result 12, :time 1256994335779, :wait 300}}
4
1256994336679 300 [(2) (3) (1)] {(1) {:result 11, :time 1256994336379, :wait 300}, (3) {:result 13, :time 1256994336079, :wait 300}, (2) {:result 12, :time 1256994335779, :wait 300}}
(11 12 14 11 12 13 11 12 13 12 13 12 13 12 13 11 14)

You can see how the expiry queue is consumed at one end by the expiry thread and appended to at the other end. The shortest time until the next item might expire is stored in the :wait key, and the thread peeks at the queue, then sleeps that long before popping the queue. If the queue is empty it quits, but if the queue is empty the enqueueing of a new item (re)starts it. The thread start is launched via an empty agent so that it is only started if the transaction in the memoized fn succeeds. When the expiry thread consumes a queue item it checks if the expiry time has actually expired or gotten bumped, and only in the former case does it remove from the results cache. As a result, the println in the memoized fn only executes for 1 and 4 early on, and then after both have gone unused for long enough -- otherwise, the expiry time keeps getting bumped. (Each bump appends another copy to the end of the queue.) The (- (:time item) 100) is to allow a little slop in the timing (you'll notice wait keys getting off-by-one numbers like 999 and 301 in the transcript above).

Actually, the above 35 lines of code (I didn't count the doc-string lines) supplies a very general expiring-cache mechanism that might be very useful in general. As usual, I offer it into the public domain. (You'll want to get rid of the println before using it in production code though.)

Hopefully the above also serves as a nice example of the use of refs and agents, and also how closures allow one to encapsulate an entire little ref-world in a self-contained package.

John Harrop

unread,
Oct 31, 2009, 9:34:57 AM10/31/09
to clo...@googlegroups.com
Actually, the code I posted might behave less well if the timing is less uniform. It's better to store wait times with queue entries rather than with cache entries:

(defn my-memoize
  "Returns a memoized version of a referentially transparent
   function. The memoized version of the function keeps a cache of
   the mapping from arguments to results and, when calls with the
   same arguments are repeated often, has higher performance at
   the expense of higher memory use. Cached results are removed
   from the cache when their time to live value expires."
  [function time-to-live]
  (let [cached-results (ref {})
        last-time (ref (System/currentTimeMillis))
        expiry-queue (ref [])
        process-queue #(when-not (empty? @expiry-queue)
                         (Thread/sleep
                           (:wait (first @expiry-queue)))
                           (dosync
                             (let [args (:item (first (ensure expiry-queue)))
                                   item (get (ensure cached-results) args)
                                   t (- (:time item) 100)]
                               (if (<= t (System/currentTimeMillis))
                                 (alter cached-results dissoc args))
                               (ref-set expiry-queue
                                 (vec (rest @expiry-queue)))))
                         (recur))
        ts-agent (agent nil)]
    (fn [& args]
      (let [result (if-let [r (get @cached-results args)]
                     (:result r)
                     (apply function args))]
        (dosync
          (let [t (+ time-to-live (System/currentTimeMillis))
                l (max (ensure last-time) (System/currentTimeMillis))
                w (- t l)
                q (ensure expiry-queue)]
            (alter cached-results assoc args
              {:result result :time t})
            (ref-set last-time t)
            (alter expiry-queue conj {:item args :wait w})
            (if (empty? q)
              (send ts-agent (fn [_] (.start (Thread. process-queue)))))))
        result))))

This version also drops the println.

user=> (def x (my-memoize #(do (println %) (+ % 10)) 1000))
#'user/x
user=> (doall (for [n [1 2 4 1 2 3 1 2 3 2 3 2 3 2 3 1 4]] (do (Thread/sleep 300) (x n))))
1
2
4 ; pause
3 ; long pause
1
4
Reply all
Reply to author
Forward
0 new messages