Deep recursion, continuation passing style, trampolining and memoization

Christian Schuhegger

unread,

Mar 22, 2011, 3:09:53 AM3/22/11

to Clojure

Hello all,

I've implemented the levenshtein measure in clojure and am quite happy
with it except that I run into stack overflows if I hand over long
strings. My current solution looks like this:
-- snip start --
(declare levenshtein-rec)
(declare change-cost)
(defn levenshtein
"Compute Levenshtein distance between 2 strings"
[a b]
(binding [change-cost (fn [x y] (if (= x y) 0 1))
levenshtein-rec (memoize
(fn [x y]
(cond
(empty? x) (count y)
(empty? y) (count x)
:default
(min
(+ 1 (levenshtein-rec (rest x) y)) ; deletion
(+ 1 (levenshtein-rec x (rest y))) ; addition
(+ (change-cost (first x) (first y))
(levenshtein-rec (rest x) (rest y))) ; change
))
)
)
]
(levenshtein-rec a b)))
;;;(levenshtein "kitten" "sitting")
;;;(time (levenshtein (apply str (repeat 300 "kitten ")) (apply str
(repeat 300 "sitting"))))
-- snip end --

Now what I've done to overcome that limitation was to transform the
function from above manually into continuation passing style (CPS),
adapted it to be ready for trampolining (return in certain places
functions instead of evaluating them) and added memoization. The
result is extremely ugly and can be seen here:
https://gist.github.com/880873

I did not look yet into the clojure libraries for automatic CPS
transformation, but I am nearly certain that these libraries will not
take care of adapting the result to be ready for trampolining and
definitely will not take care of memoization.

Would somebody on the list have a proposal on how to create such a
solution simpler and nicer? Especially how would you apply memoization
easily in CPS style?

I've found an article on the web where somebody did something in that
direction in F#:
http://www.patrickdewane.com/2009/02/continuation-passing-style-and-memoization-part-2.html

Many thanks and best regards,
Christian

Tim Webster

unread,

Mar 22, 2011, 9:42:47 AM3/22/11

to Clojure

A few things:
--clojure does not do automatic TCO, so you might want to look at
recur to handle your stack issue
--recur won't work unless you rewrite your function so that the
recursive call is in tail position
--you probably can do that rewrite by passing around the levenshtein
grid as an accumulator (not quite the same, or as hard to figure out,
as cps)
--once you have the simple recursive function with an accumulators for
the grid and the result, wrap that up in a function with only the
parameters you want to expose
--I am not sure this is a great use case for memoize--if your function
is properly recursive it will not take very long to run--but if you
want to use it, wrap the simple-signature function with memoize

On Mar 22, 3:09 am, Christian Schuhegger

> direction in F#:http://www.patrickdewane.com/2009/02/continuation-passing-style-and-m...

Mark Engelberg

unread,

Mar 22, 2011, 12:30:23 PM3/22/11

to clo...@googlegroups.com

I don't believe Tim's comments are correct. Since there are three recursive calls, I don't think there is a straightforward transformation to tail position using an accumulator. Also, due to the interleaved nature of the recursive calls, I would expect memoization to be essential.

So, I think the CPS transformation is in fact what is needed to make this top-down recursive implementation execute without blowing the stack. I looked at Christian's code briefly, and it looks to me that the CPS transformation was well done. Aside from a couple minor stylistic things (e.g., I think most would use [arg1 arg2] rather than (list arg1 arg2)), I don't see any obvious improvements to be made. If anyone sees a better way to make this particular approach work, I'd also be interested in seeing it.

Of course, in most languages the Levenshtein distance is computed using the bottom-up Dynamic Programming strategy of filling in a matrix left-to-right, one row at a time, thus ensuring that all the entries needed to compute a given entry are already available. There is pseudocode for this approach on the Wikipedia, and I think I've seen both a Perl and Scheme solution that improve upon that by storing only one row's worth of data, rather than the full matrix. Clojure certainly would support such a strategy, and in fact, that's probably the better way to solve this problem. One way to implement the DP solution would be to pass the grid in an accumulator, and for such a strategy you wouldn't need memoization, so perhaps that's what Tim had in mind.

But even though DP would be effective here, DP does not work for all recursive problems, and therefore, Christian's original question about how best to get the *recursive formulation* of this algorithm to run under Clojure using a methodical transformation is a valid one.

Chas Emerick

unread,

Mar 22, 2011, 12:45:06 PM3/22/11

to clo...@googlegroups.com

On Mar 22, 2011, at 12:30 PM, Mark Engelberg wrote:

> Of course, in most languages the Levenshtein distance is computed using the bottom-up Dynamic Programming strategy of filling in a matrix left-to-right, one row at a time, thus ensuring that all the entries needed to compute a given entry are already available. There is pseudocode for this approach on the Wikipedia, and I think I've seen both a Perl and Scheme solution that improve upon that by storing only one row's worth of data, rather than the full matrix. Clojure certainly would support such a strategy, and in fact, that's probably the better way to solve this problem. One way to implement the DP solution would be to pass the grid in an accumulator, and for such a strategy you wouldn't need memoization, so perhaps that's what Tim had in mind.

A pretty sizable set of example LD implementations are available here:

http://www.merriampark.com/ld.htm (applet warning! :-/)

The advantage of the single row of data is that you're not limited in the size of strings you can compare -- constructing a full matrix will very large strings will blow your heap in most cases. I implemented a flavour of LD that uses an array instead of a matrix to get around this problem years ago; I *think* it's in apache commons-lang still. In any case, the implementation is here, with some explanatory verbiage:

http://www.merriampark.com/ldjava.htm

- Chas

Jonathan Smith

unread,

Mar 22, 2011, 7:40:18 PM3/22/11

to Clojure

I'm wondering if it wouldn't be better to simply implement it using a
mutable 2d Java array? (the standard imperative implementation).

It wouldn't be a 'purely functional' answer, but the array wouldn't
leak out of the levenshtein-distance function.

On Mar 22, 3:09 am, Christian Schuhegger
<christian.schuheg...@gmail.com> wrote:

> direction in F#:http://www.patrickdewane.com/2009/02/continuation-passing-style-and-m...

Christian Schuhegger

unread,

Mar 23, 2011, 1:32:56 AM3/23/11

to Clojure

Actually Mark is right. My point is not about the Levenshtein
distance. I've even found a quite nice and concise implementation here
on the list a few weeks ago:
"generic (works for any seq) levenshtein distance"
https://groups.google.com/group/clojure/browse_frm/thread/c5da3ac1b6704eda

My point is about what the subject says: deep recursion, CPS,
trampolining and memoization.

My question is: how to do that in idiomatic clojure in general?

Tim Webster

unread,

Mar 23, 2011, 4:20:54 PM3/23/11

to Clojure

@Mark, yeah, I meant using the bottom-up approach and passing that
matrix as the accumulator. (I thought that I remembered seeing such an
implementation in the goopy library, but when I looked today I didn't
find any lev implementation at all.)

@Christian, yes, I missed the point of your question. When I hear the
words "stack overflow" I reach for recur.

On Mar 23, 1:32 am, Christian Schuhegger

<christian.schuheg...@gmail.com> wrote:
> Actually Mark is right. My point is not about the Levenshtein
> distance. I've even found a quite nice and concise implementation here
> on the list a few weeks ago:

> "generic (works for any seq) levenshtein distance"https://groups.google.com/group/clojure/browse_frm/thread/c5da3ac1b67...

Reply all

Reply to author

Forward