Lazy map over java.util.Iterator skips second item

Stuart Sierra

Aug 19, 2008, 12:27:20 PM
to Clojure
Not sure what's going on here. It appears that calling map on an
Iterator skips the second item, unless you call seq on the Iterator
first.
-Stuart

user=> (def a (new java.util.ArrayList))
#'user/a
user=> (doseq n (range 1 5) (.add a n))
nil
user=> (seq a)
(1 2 3 4)
user=> (seq (.iterator a))
(1 2 3 4)
user=> (map inc a)
(2 3 4 5)
user=> (map inc (.iterator a))
(2 4 5)
user=> (map inc (seq (.iterator a)))
(2 3 4 5)

Chouser

Aug 19, 2008, 2:10:24 PM
to clo...@googlegroups.com
On Tue, Aug 19, 2008 at 12:27 PM, Stuart Sierra
<the.stua...@gmail.com> wrote:
>
> Not sure what's going on here.

Wow. Scary. After playing with it a bit, it looks like it's because
Java iterators mutate:

user=> (def i (.iterator a))
#'user/i
user=> (seq i)
(1 2 3 4)
user=> (seq i)
nil
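The same one-shot behavior shows up in plain Java, with no Clojure involved. This is a minimal sketch; the class `IteratorIsOneShot` and its `drain` helper are hypothetical names used only for illustration. Draining one `java.util.Iterator` twice yields the elements once and then nothing, while asking the `ArrayList` (an `Iterable`) for a new iterator starts over:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: a java.util.Iterator is a single-pass cursor.
class IteratorIsOneShot {
    // Copies whatever is left in the iterator into a new list.
    // Side effect: the iterator is left exhausted.
    static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> i = a.iterator();
        System.out.println(drain(i));            // [1, 2, 3, 4]
        System.out.println(drain(i));            // [] : the shared cursor is already at the end
        System.out.println(drain(a.iterator())); // [1, 2, 3, 4] : a fresh iterator starts over
    }
}
```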

Clearly the Clojure seq library isn't expecting this. Besides the map
weirdness, we also get:

user=> (filter (constantly true) a)
(1 2 3 4)
user=> (filter (constantly true) (.iterator a))
(2 4)
user=> (take 4 a)
(1 2 3 4)
user=> (take 4 (.iterator a))
(1 3 4)

...and so on.

The Clojure seq functions could wrap a single (seq) call around their
collection arguments at the top, and then deal only with that one seq,
but I'm not sure if that's the right solution or not. For example,
here's a version of filter that works:

(defn myfilter
  "Returns a lazy seq of the items in coll for which
  (pred item) returns true. pred must be free of side-effects."
  [pred coll]
  (let [s (seq coll)]
    (when s
      (if (pred (first s))
        (lazy-cons (first s) (filter pred (rest s)))
        (recur pred (rest s))))))
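For comparison, the usual Java way to filter a single-pass source follows the same principle as `myfilter`: hold on to one source and pull each element from it exactly once. This is a minimal sketch, not code from Clojure's runtime; `FilteringIterator` and its `toList` helper are hypothetical names, and it assumes Java 8 or later for `java.util.function.Predicate`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical lazy filter over a single-pass Iterator. Every element is
// pulled from `source` exactly once, so no item can be skipped by accident.
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> pred;
    private T pending;           // next matching element, once found
    private boolean hasPending;  // whether `pending` holds a value

    FilteringIterator(Iterator<T> source, Predicate<? super T> pred) {
        this.source = source;
        this.pred = pred;
    }

    @Override
    public boolean hasNext() {
        // Advance the source only until one matching element is buffered.
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            if (pred.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }

    // Demo helper: filter the remaining elements of `source` into a list.
    static <E> List<E> toList(Iterator<E> source, Predicate<? super E> pred) {
        List<E> out = new ArrayList<>();
        new FilteringIterator<>(source, pred).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(toList(a.iterator(), x -> true));       // [1, 2, 3, 4]
        System.out.println(toList(a.iterator(), x -> x % 2 == 0)); // [2, 4]
    }
}
```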

--Chouser

Stuart Sierra

Aug 19, 2008, 2:29:43 PM
to Clojure
On Aug 19, 2:10 pm, Chouser <chou...@gmail.com> wrote:
> Wow. Scary.  After playing with it a bit, it looks like it's because
> Java iterators mutate:
...
>
> The Clojure seq functions could wrap a single (seq) call around their
> collection arguments at the top, and then deal only with that one seq,
> but I'm not sure if that's the right solution or not.  

I think this has something to do with clojure/lang/IteratorSeq.java,
but I can't figure it out.
In the meantime, a quick-n-dirty fix:

(defn iterator-to-vector [iter]
  (loop [coll (vector)]
    (if (.hasNext iter)
      (recur (conj coll (.next iter)))
      coll)))


-Stuart

Chouser

Aug 19, 2008, 2:41:56 PM
to clo...@googlegroups.com
On Tue, Aug 19, 2008 at 2:29 PM, Stuart Sierra
<the.stua...@gmail.com> wrote:
>
> I think this has something to do with clojure/lang/IteratorSeq.java,
> but I can't figure it out.
> In the meantime, a quick-n-dirty fix:
>
> (defn iterator-to-vector [iter]
>   (loop [coll (vector)]
>     (if (.hasNext iter)
>       (recur (conj coll (.next iter)))
>       coll)))

Oh, no need for all that. Just wrapping a (seq) around the iterator
works fine as long as you don't use the iterator again, which is a
restriction of your function as well:

user=> (def i (.iterator [1 2 3 4]))
#'user/i
user=> (iterator-to-vector i)
[1 2 3 4]
user=> (iterator-to-vector i)
[]

I think the problem is that first and rest (which are used by map,
filter, take, etc.) automatically take the seq of their arg (RT.java
line 510) before actually getting the first or rest values. As you
suggest, this creates an IteratorSeq. Then when its .first method is
called, it has to advance the underlying Iterator to get that first
value. This is fine as long as you keep using the same IteratorSeq.
However, if you create a new IteratorSeq on the same original
Iterator, you now have two things advancing the Iterator, and chaos
ensues:

user=> (def i (.iterator [1 2 3 4]))
#'user/i
user=> i
clojure.lang.APersistentVector$2@1ef4b2b
user=> (first i)
1
user=> (first i)
2
user=> (first i)
3
user=> (first i)
4
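The mechanism described above can be reproduced with a small Java model. This is a simplified, single-threaded sketch under stated assumptions, not the actual `clojure.lang.IteratorSeq` or `RT.java` source: the hypothetical `CachingNode` stands in for an IteratorSeq (it pulls its element from the Iterator once and caches it), and the hypothetical `ImplicitSeqModel.seq` plays the role of an implicit seq that wraps a raw Iterator in a brand-new node on every call. A map that calls `seq`, `first` and `rest` on its argument separately then loses the second element, while a map that calls `seq` once at the top does not:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a caching seq node over a shared Iterator.
final class CachingNode {
    private final Iterator<Integer> iter;
    private boolean firstDone, restDone;
    private Integer value;     // cached element pulled by first()
    private CachingNode next;  // cached node returned by rest()

    private CachingNode(Iterator<Integer> iter) { this.iter = iter; }

    // Returns null for an exhausted iterator, otherwise a new unrealized node.
    static CachingNode of(Iterator<Integer> iter) {
        return iter.hasNext() ? new CachingNode(iter) : null;
    }

    Integer first() {
        if (!firstDone) { value = iter.next(); firstDone = true; } // advances the shared Iterator
        return value;
    }

    CachingNode rest() {
        if (!restDone) { first(); next = of(iter); restDone = true; }
        return next;
    }
}

final class ImplicitSeqModel {
    // Model of implicit seq: a node (or null) is returned as-is, but a raw
    // Iterator is wrapped in a NEW node on every call.
    @SuppressWarnings("unchecked")
    static CachingNode seq(Object coll) {
        if (coll == null || coll instanceof CachingNode) return (CachingNode) coll;
        return CachingNode.of((Iterator<Integer>) coll);
    }

    static Integer first(Object coll) { return seq(coll).first(); }
    static CachingNode rest(Object coll) { return seq(coll).rest(); }

    // Calls seq, first and rest on coll separately, so on the first step a
    // raw Iterator is wrapped three times and two nodes each pull an element.
    static List<Integer> naiveMap(UnaryOperator<Integer> f, Object coll) {
        List<Integer> out = new ArrayList<>();
        while (seq(coll) != null) {
            out.add(f.apply(first(coll)));
            coll = rest(coll);
        }
        return out;
    }

    // Calls seq once, then walks only that chain of cached nodes.
    static List<Integer> seqOnceMap(UnaryOperator<Integer> f, Object coll) {
        List<Integer> out = new ArrayList<>();
        for (CachingNode s = seq(coll); s != null; s = s.rest()) {
            out.add(f.apply(s.first()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(naiveMap(x -> x + 1, a.iterator()));   // [2, 4, 5]
        System.out.println(seqOnceMap(x -> x + 1, a.iterator())); // [2, 3, 4, 5]
        Iterator<Integer> i = a.iterator();
        System.out.println(first(i) + " " + first(i));           // 1 2
    }
}
```

In this model the `[2, 4, 5]` result matches the `(map inc (.iterator a))` output from the original report: the node created for `first` pulls 1, the throwaway node created for `rest` pulls 2 and discards it, and only then does the walk continue on properly cached nodes.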

I guess the moral is, don't pass around Iterator objects!

I wonder if the right way to fix this in Clojure is for first and rest
to refuse to work directly on Iterators, forcing you to manually call
(seq i). This would at least allow the seq library to work, although
it still can't save you from yourself if you're insistent enough:

user=> (def i (.iterator [1 2 3 4]))
#'user/i
user=> (first (seq i))
1
user=> (first (seq i))
2

--Chouser

Rich Hickey

Aug 25, 2008, 4:21:29 PM
to Clojure


On Aug 19, 12:27 pm, Stuart Sierra <the.stuart.sie...@gmail.com>
wrote:
Yes, I can't support implicit seq on Iterators or Enumerations as I
have been, due to the fact that code (such as map's) might
incidentally create multiple seqs on the same mutable iterator.

So, I've removed implicit seq support for Iterators and Enumerations,
and added iterator-seq and enumeration-seq explicit constructor
functions.

Note that most collections providing iterators implement Iterable and
thus support seq directly.
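On the Java side, the distinction drawn here is between `java.lang.Iterable`, which hands out a fresh `Iterator` on every request, and a bare `Iterator` or `Enumeration`, which can only be consumed once. The sketch below is illustrative only (the class and method names are hypothetical): it traverses an `ArrayList` twice through its `Iterable` interface, and converts an `Enumeration` explicitly with the JDK's `Collections.list`, an eager and only loosely analogous counterpart to an explicit constructor such as `enumeration-seq`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

// Hypothetical demo: an Iterable can be traversed repeatedly; an Enumeration cannot.
class IterableVsEnumeration {
    // Each for-each loop calls coll.iterator() again, so both passes see every element.
    static int sumTwice(Iterable<Integer> coll) {
        int total = 0;
        for (int x : coll) total += x;
        for (int x : coll) total += x;
        return total;
    }

    // An Enumeration is not Iterable; Collections.list copies its remaining
    // elements into a new ArrayList (exhausting the Enumeration), and that
    // list can then be traversed any number of times.
    static List<Integer> snapshot(Enumeration<Integer> e) {
        return Collections.list(e);
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(sumTwice(a));                      // 20
        Vector<Integer> v = new Vector<>(a);
        System.out.println(sumTwice(snapshot(v.elements()))); // 20
    }
}
```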

Also, FYI, most Java collections such as ArrayList support
construction from collections, and thus one-step initialization from
Clojure data structures, so 'a' above can be created like this:

(def a (java.util.ArrayList. (range 1 5)))

Thanks for the report,

Rich