After learning of the left-recursive concat stack overflow gotcha, or LeReConStOG as it is more commonly known, I figured I'd see if I could resolve it. I suspected that the process would be similar to one I hacked together to enable arbitrary-depth tail calls in Wombat-J (
).
And attached a patch, if you'd like to try it out. It applies cleanly to clojure-1.7.0 and master (as of today anyway).
You can think of Concat as a special Cons, only for sequences. Each Concat cell has a head
and a tail. The head is the first sequence in the concatenation, and the tail is
everything that follows. It could be a vanilla sequence, another Concat, or nil (null).
The basic idea is that if a Concat object is nested within another, they both actually
represent the same depth. So, we can flatten the left-recursive concats out, rather than
expand them recursively.
The basic flattening out (extrusion) algorithm is:
Extrude C:
let Head := C.Head
let Tail := C.Tail
while Head is a Concat object:
let Tail := Concat(Head.Tail, Tail)
let Head := Head.Head
return Concat(Head, Tail)
And then attempting to get a sequence from the head of the extruded Concat:
AttemptSeq C:
if Seq(C.Head) is not Null:
return Cons(Head.First, Concat(Head.Rest, Tail))
else:
return Null
And finally the process which gets the real sequence value:
Seq C:
let CExt := Extrude(C)
let S := AttemptSeq(CExt)
if S is Null: // The head is an empty sequence
while S is Null AND CExt.Tail is a Concat object:
let CExt := Extrude(CExt.Tail)
let S := AttemptSeq(CExt)
if S is Null: // We're either out of sequences, or CExt.Tail is a normal sequence
let S := Seq(SExt.Tail)
return S
That last one is a bit awkward, but the general idea is you start with something like
this:
(concat (concat my-seq nil) another-seq)
and extrusion turns it into this:
(concat my-seq (concat nil another-seq))
And if the new Head is not empty or Null, then AttemptSeq gives you this:
(cons (first my-seq) (concat (rest my-seq) (concat nil another-seq)))
But what if we started with this:
(concat (concat () my-seq) another-seq)
Then extrusion gives us:
(concat () (concat my-seq another-seq))
So when we AttemptSeq, we get nil. But, we can ignore the head anyway because empty lists
are the concat identity. So we run extrude over the tail and get
(concat my-seq another-seq) // There was no change this time
And then run AttemptSeq on that and get
(cons (first my-seq) (concat (rest my-seq) another-seq))
The actual code ends up a bit more complicated because we want to cache the sequence
results rather than build everything from scratch every time, but it's essentially the
same.
TL;DR-
Without patch:
user=> (time (count (reduce #(concat (list %2) %1) nil (range 10000000)))) ; right-recursive, so it works
"Elapsed time: 11658.052987 msecs"
10000000
user=> (time (count (reduce #(concat %1 (list %2)) nil (range 10000000)))) ; BOOM
StackOverflowError clojure.lang.LazySeq.seq (LazySeq.java:49)
With patch:
user=> (time (count (reduce #(concat (list %2) %1) nil (range 10000000))))
"Elapsed time: 5343.70402 msecs"
10000000
user=> (time (count (reduce #(concat %1 (list %2)) nil (range 10000000)))) ; Ta-da!
"Elapsed time: 7497.264373 msecs"
10000000
I hope some of you found this interesting. I had fun implementing it!
Questions and comments are encouraged!
Jon