Concat implementation without stack overflow

2,421 views
Skip to first unread message

Jon Distad

unread,
Aug 15, 2015, 9:03:05 PM8/15/15
to Clojure Dev
After learning of the left-recursive concat stack overflow gotcha, or LeReConStOG as it is more commonly known, I figured I'd see if I could resolve it. I suspected that the process would be similar to one I hacked together to enable arbitrary-depth tail calls in Wombat-J (https://github.com/jondistad/wombat-j).

More information on the original issue is on Stuart Sierra's blog: http://stuartsierra.com/2015/04/26/clojure-donts-concat

I've posted a gist with the good bits here: https://gist.github.com/jondistad/2a4971fe8948ca2f6ba0

And attached a patch, if you'd like to try it out. It applies cleanly to clojure-1.7.0 and master (as of today anyway).

You can think of Concat as a special Cons, only for sequences. Each Concat cell has a head
and a tail. The head is the first sequence in the concatenation, and the tail is
everything that follows. It could be a vanilla sequence, another Concat, or nil (null).

The basic idea is that if a Concat object is nested within another, they both actually
represent the same depth. So, we can flatten the left-recursive concats out, rather than
expand them recursively.

The basic flattening out (extrusion) algorithm is:

Extrude C:
  let
Head := C.Head
  let
Tail := C.Tail
 
while Head is a Concat object:
    let
Tail := Concat(Head.Tail, Tail)
    let
Head := Head.Head
 
return Concat(Head, Tail)


And then attempting to get a sequence from the head of the extruded Concat:


AttemptSeq C:
 
if Seq(C.Head) is not Null:
   
return Cons(Head.First, Concat(Head.Rest, Tail))
 
else:
   
return Null


And finally the process which gets the real sequence value:

Seq C:
  let
CExt := Extrude(C)
  let S
:= AttemptSeq(CExt)
 
if S is Null: // The head is an empty sequence
   
while S is Null AND CExt.Tail is a Concat object:
      let
CExt := Extrude(CExt.Tail)
      let S
:= AttemptSeq(CExt)
   
if S is Null: // We're either out of sequences, or CExt.Tail is a normal sequence
      let S
:= Seq(SExt.Tail)
 
return S


That last one is a bit awkward, but the general idea is you start with something like
this:

(concat (concat my-seq nil) another-seq)


and extrusion turns it into this:

(concat my-seq (concat nil another-seq))


And if the new Head is not empty or Null, then AttemptSeq gives you this:

(cons (first my-seq) (concat (rest my-seq) (concat nil another-seq)))


But what if we started with this:

(concat (concat () my-seq) another-seq)


Then extrusion gives us:

(concat () (concat my-seq another-seq))


So when we AttemptSeq, we get nil. But, we can ignore the head anyway because empty lists
are the concat identity. So we run extrude over the tail and get

(concat my-seq another-seq) // There was no change this time


And then run AttemptSeq on that and get

(cons (first my-seq) (concat (rest my-seq) another-seq))


The actual code ends up a bit more complicated because we want to cache the sequence
results rather than build everything from scratch every time, but it's essentially the
same.

TL;DR-
Without patch:
user=> (time (count (reduce #(concat (list %2) %1) nil (range 10000000)))) ; right-recursive, so it works
"Elapsed time: 11658.052987 msecs"
10000000
user
=> (time (count (reduce #(concat %1 (list %2)) nil (range 10000000)))) ; BOOM
StackOverflowError   clojure.lang.LazySeq.seq (LazySeq.java:49)

With patch:

user=> (time (count (reduce #(concat (list %2) %1) nil (range 10000000))))
"Elapsed time: 5343.70402 msecs"
10000000
user
=> (time (count (reduce #(concat %1 (list %2)) nil (range 10000000)))) ; Ta-da!
"Elapsed time: 7497.264373 msecs"
10000000


I hope some of you found this interesting. I had fun implementing it!

Questions and comments are encouraged!

Jon
0001-Define-clojure.lang.Concat.patch

Stuart Halloway

unread,
Sep 11, 2015, 9:45:50 AM9/11/15
to cloju...@googlegroups.com
Hi Jon,

Are you aware of other languages that do this?

Stu

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.

Jon Distad

unread,
Sep 11, 2015, 5:34:29 PM9/11/15
to cloju...@googlegroups.com
Hi Stu!

I'm not aware of any other language that does this. There's a paper about building trampolines in scala which is similar (http://blog.higher-order.com/assets/trampolines.pdf), but its not in the core language.
You received this message because you are subscribed to a topic in the Google Groups "Clojure Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure-dev/ewBuyloeiFs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure-dev...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages