Converting seqs to vectors & caching lazyseqs

79 views
Skip to first unread message

Garth Sheldon-Coulson

unread,
Aug 1, 2009, 12:37:57 PM8/1/09
to clo...@googlegroups.com
Hi there,

I have a program that produces multidimensional LazySeqs, i.e. seqs of seqs of seqs of...

I would like to write a function to convert these multidimensional LazySeqs to vectors. This is in case a user needs constant lookup time.

The following function will do it:

(defn vectorize [obj]
  (if-not (seq? obj)
    obj
    (vec (map vectorize obj))))

(def foo (list 1 2 3 (list 1 2 4 (list 1 4)) 2))

(vectorize foo)
>> [1 2 3 [1 2 4 [1 4]] 2]

However, since there is no recur there, the function is liable to blow the stack on a very deep data structure (or so I gather from reading about Clojure's lack of TCO).

I cannot figure out how to write the function using recur. Any ideas?

More broadly, this would not be necessary if I could tell my LazySeqs to cache accesses and they cached them using a constant-lookup-time data structure. I've seen mention that LazySeqs cache access, but they don't seem to for me:

user=> (def foo (lazy-seq (range 10000000)))
user=> (time (nth foo 9999999))
"Elapsed time: 655.924286 msecs"
9999999
user=> (time (nth foo 9999999))
"Elapsed time: 675.42013 msecs"
9999999

Maybe I'm misunderstanding what is meant by caching.

Any thoughts on the vectorize function or caching?
Thanks.

Garth

ataggart

unread,
Aug 1, 2009, 5:05:12 PM8/1/09
to Clojure
The results of realizing the seq are cached, but its still a list, so
the lookup is done in linear time. I would guess that the times are
similar because the dominant factor is nth's traversal of the seq, not
range's generation of incremental ints; that or the hotspot hasn't
kicked in.

Meikel Brandmeyer

unread,
Aug 1, 2009, 5:53:34 PM8/1/09
to clo...@googlegroups.com
Hi,

Am 01.08.2009 um 18:37 schrieb Garth Sheldon-Coulson:

> I have a program that produces multidimensional LazySeqs, i.e. seqs
> of seqs of seqs of...
>
> I would like to write a function to convert these multidimensional
> LazySeqs to vectors. This is in case a user needs constant lookup
> time.
>
> The following function will do it:
>
> (defn vectorize [obj]
> (if-not (seq? obj)
> obj
> (vec (map vectorize obj))))
>
> (def foo (list 1 2 3 (list 1 2 4 (list 1 4)) 2))
>
> (vectorize foo)
> >> [1 2 3 [1 2 4 [1 4]] 2]
>
> However, since there is no recur there, the function is liable to
> blow the stack on a very deep data structure (or so I gather from
> reading about Clojure's lack of TCO).
>
> I cannot figure out how to write the function using recur. Any ideas?

This function can obviously be not tail-recursive, because you need
it's return value to do further computations. What you could maybe use
is different "stack".

(defn vectorize
[s]
(loop [s s
v []
stack nil]
(if-let [s (seq s)]
(let [fst (first s)]
(if (seq? fst)
(recur fst [] (conj stack [(next s) v]))
(recur (next s) (conj v fst) stack)))
(if (seq stack)
(let [[s v-s] (peek stack)]
(recur s (conj v-s v) (pop stack)))
v))))

This function runs in constant stack space. It uses a separate list to
store the state needed for the function to continue. This is just what
happens in the call frame on the stack. I'm not sure that this is
better, but it will never cause a stack overflow.

user=> (vectorize '(1 2 (3 4 5) ((6) 7 8) () 9 10))
[1 2 [3 4 5] [[6] 7 8] [] 9 10]

> More broadly, this would not be necessary if I could tell my
> LazySeqs to cache accesses and they cached them using a constant-
> lookup-time data structure. I've seen mention that LazySeqs cache
> access, but they don't seem to for me:

This is what's meant with "caching":

user=> (def x (lazy-seq (cons (do (Thread/sleep 10000) :x) nil)))
#'user/x
user=> (time (first x))
"Elapsed time: 10000.646 msecs"
:x
user=> (time (first x))
"Elapsed time: 0.085 msecs"
:x

They calculate the value of an element only once and then cache the
result.

Hope this helps.

Sincerely
Meikel

Garth Sheldon-Coulson

unread,
Aug 1, 2009, 7:22:06 PM8/1/09
to clo...@googlegroups.com
Wow, that's really good. That's what I was trying to come up with all afternoon. I knew I needed some way to keep track of the function's state, but I couldn't figure it out.

I came up with my own version using Rich's zipper library, traversing the data structure like a tree, but yours is much faster.

Thank you.



More broadly, this would not be necessary if I could tell my LazySeqs to cache accesses and they cached them using a constant-lookup-time data structure. I've seen mention that LazySeqs cache access, but they don't seem to for me:

Reply all
Reply to author
Forward
0 new messages