Cons.count overflows stack (with patch)

5 views
Skip to first unread message

Chouser

unread,
Jan 6, 2009, 11:26:41 PM1/6/09
to clo...@googlegroups.com
As discovered by cmvkk in IRC:

user=> (count (reduce #(cons %2 %) [0] (range 4601)))
java.lang.StackOverflowError (NO_SOURCE_FILE:0)

This is because Cons.count() is recursive via RT.count(). Presumably
it's to allow efficient counting of heterogeneous seqs, like:

user=> (def big-vec (into [] (range 10000000))) ; this is slow
#'user/big-vec
user=> (count (cons :a big-vec)) ; but this is fast
10000001

I've attached a patch that keeps this latter efficiency, but won't
blow the stack for pure chains of Cons's. However, it has
Cons.count() still defer to RT.count() for non-Cons rests. Since I
couldn't find any other class that uses this kind of recursion for
count(), it may be impossible to build a seq that would still cause
Cons.count() to overflow the stack, but I'm not completely certain.

An alternative, of course, would be to do a simple loop as
ASeq.count() does, at the cost of the efficient vector counting
demonstrated above.

If you have comments or questions, don't hesitate or I'll move this to
the issues page!

--Chouser

cons-count.patch

Christian Vest Hansen

unread,
Jan 7, 2009, 2:41:09 AM1/7/09
to clo...@googlegroups.com
On Wed, Jan 7, 2009 at 5:26 AM, Chouser <cho...@gmail.com> wrote:
> As discovered by cmvkk in IRC:
>
> user=> (count (reduce #(cons %2 %) [0] (range 4601)))
> java.lang.StackOverflowError (NO_SOURCE_FILE:0)
>
> This is because Cons.count() is recursive via RT.count(). Presumably
> it's to allow efficient counting of heterogeneous seqs, like:
>
> user=> (def big-vec (into [] (range 10000000))) ; this is slow
> #'user/big-vec
> user=> (count (cons :a big-vec)) ; but this is fast
> 10000001
>
> I've attached a patch that keeps this latter efficiency, but won't
> blow the stack for pure chains of Cons's. However, it has
> Cons.count() still defer to RT.count() for non-Cons rests. Since I
> couldn't find any other class that uses this kind of recursion for
> count(), it may be impossible to build a seq that would still cause
> Cons.count() to overflow the stack, but I'm not completely certain.

We're all allowed to implement our own IPersistentCollections, but if
they break 'count, then that would be a different bug, no?

>
> An alternative, of course, would be to do a simple loop as
> ASeq.count() does, at the cost of the efficient vector counting
> demonstrated above.
>
> If you have comments or questions, don't hesitate or I'll move this to
> the issues page!
>
> --Chouser
>
> >
>



--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Chouser

unread,
Jan 7, 2009, 8:10:05 AM1/7/09
to clo...@googlegroups.com
On Wed, Jan 7, 2009 at 2:41 AM, Christian Vest Hansen
<karma...@gmail.com> wrote:
>
> On Wed, Jan 7, 2009 at 5:26 AM, Chouser <cho...@gmail.com> wrote:
>> Since I couldn't find any other class that uses this kind of
>> recursion for count(), it may be impossible to build a seq that
>> would still cause Cons.count() to overflow the stack, but I'm not
>> completely certain.
>
> We're all allowed to implement our own IPersistentCollections, but if
> they break 'count, then that would be a different bug, no?

Good question.

If your new collection (FooColl) implements seq() by returning a new
kind of Seq (FooColl$Seq) which in turn implements FooColl$Seq.count()
recursively as Cons did, it will break for long chains of FooColl$Seq.
This already strikes me as very unlikely, because it would only be
tempting to do in collections that accept an unmodified ISeq as part
of their contents. But regardless, this would be entirely your own
new code and there's nothing I can do about it.

However, if FooColl$Seq avoids that error by using code like in this
patch, then we would have a more subtle issue. A FooColl$Seq that
includes a Cons which in turn was cons'ed onto a FooColl$Seq that
includes a Cons ... and continues alternating like that for sufficient
depth would defeat this patch's fix. Cons.count() would see that it's
'rest' is not a Cons, and would delegate to RT.count(). FooColl$Seq,
doing the same, would create a mutual recursion situation that could
overflow the stack.

I think we're now firmly off in the weeds far enough to declare "we
can cross that bridge when we get to it," happy in the belief that we
never will. But even so, FooColl$Seq could push things a little
further by checking for Cons *or* FooColl$Seq and avoid recursing into
either one. This would then postpone the problem until someone makes
an alternating chain of FooColl$Seq and BarColl$Seq.

I suppose the right way to handle this would be a signal Interface (or
perhaps just a method?) declaring if a seq's count() is "fast" or not.
Then Cons.count() could check this: if 'rest' is fast, defer to
RT.count(), otherwise inc in it's own loop. This sounds to me like a
lot of work and a big patch for a problem nobody's likely to have.

--Chouser

Christian Vest Hansen

unread,
Jan 7, 2009, 8:43:22 AM1/7/09
to clo...@googlegroups.com
Or those people doing tricksy stuff like this could make a careful and
considered decision to design their Seq types such that they are kinds
of Cons, which, logically, I'd say they are.

But considering your patch, my spidey-senses have also failed to
unearth a way to use this approach to blow the stack.

>
> I suppose the right way to handle this would be a signal Interface (or
> perhaps just a method?) declaring if a seq's count() is "fast" or not.
> Then Cons.count() could check this: if 'rest' is fast, defer to
> RT.count(), otherwise inc in it's own loop. This sounds to me like a
> lot of work and a big patch for a problem nobody's likely to have.

And when it does happen in a couple of years time, we can point to
this discussion and ask the guy to kindly provide a patch :-P

it reminds of a JRE bug that remain undescovered for something like
five years, until someone had the audacity to try and Arrays.sort an
array larger than half of Integer.MAX_VALUE :)
Reply all
Reply to author
Forward
0 new messages