Questions & workaround regarding Clojure, true laziness, and nil-terminated sequences

12 views
Skip to first unread message

Jason

unread,
Dec 24, 2008, 2:45:38 AM12/24/08
to Clojure
Hi all,

I've been trying to incorporate lazy seqs into my project, but kept
running into troubles where things were evaluated that I didn't think
should be. After thinking a bit and looking into the Clojure source
for the first time, I think I've realized the source of the issue: the
empty seq == nil. This means that every "lazy" function or macro that
returns a seq -- for, lazy-cat, filter, etc. -- must always do enough
pre-emptive evaluation to determine whether the return value should be
nil or an object. So, my first question is, is this analysis
correct? If not, please enlighten me (and either way, maybe consider
devoting a FAQ question to this...).

Now, assuming this is correct, my follow-up question is: why do things
this way (rather than using "empty?" to tell if a seq is empty, like a
collection)? I'm very new to the community and don't mean to suggest
that I know better, I'm just curious about the rationale since I'm
sure it will shed more light on the language for me.

Finally, the workaround. A little background: I was trying to build a
lazy stack data structure (where entire lazy seqs, incidentally
involving a filter or (for :when) could be pushed on the front), and
found that elements kept being unexpectedly evaluated. In particular,
any element ever at the top of the stack was always evaluated, even if
it was never popped and was subsequently buried by other elements.
I wanted to avoid this extra evaluation, without introducing "delay"
and "force" everywhere, or writing my own versions of the built-in seq
functions.

To this end, I designed a new Java class that creates a truly lazy
*collection* from a seq-able expression. Since collections don't have
the same contract as seqs, it does not need to preemptively evaluate
arguments to determine if it is empty, and won't do so until actually
needed (e.g., you call seq on the collection). You can download the
Java class [1], and use it with the following Clojure wrapper:

(defmacro delay-seq "Create a collection representing this seq-able
expr, *really* without evaluating it until needed."
[expr] `(DelayedSeq. (fn [] (seq ~expr))))

Then, if you want an operation to be *really* lazy, you just wrap it
with a call to delay-seq. For instance, (delay-seq (concat .....))
will never evaluate the concat call until actually needed. Note that
the result of delay-seq is a seq-able collection, and no corresponding
force call is ever needed.

[1] http://w01fe.com/?attachment_id=48

Comments about this are very welcome; this was my first voyage into
the internals of Clojure, and I still have lots to learn.

Cheers, Jason

Chouser

unread,
Jan 4, 2009, 1:21:58 AM1/4/09
to clo...@googlegroups.com
On Wed, Dec 24, 2008 at 2:45 AM, Jason <jaw...@berkeley.edu> wrote:
>
> I've been trying to incorporate lazy seqs into my project, but kept
> running into troubles where things were evaluated that I didn't think
> should be. After thinking a bit and looking into the Clojure source
> for the first time, I think I've realized the source of the issue: the
> empty seq == nil. This means that every "lazy" function or macro that
> returns a seq -- for, lazy-cat, filter, etc. -- must always do enough
> pre-emptive evaluation to determine whether the return value should be
> nil or an object. So, my first question is, is this analysis
> correct? If not, please enlighten me (and either way, maybe consider
> devoting a FAQ question to this...).

First, I'd like to point out that it seems very rare that one needs to
understand lazy seqs to the level of detail discussed in the rest of
this post. Generally, built in functions don't consume any more than
they need, and in fact the most natural implementations of many
user-created functions don't either. When functions do take more from
a seq than strictly necessary, this cost is hardly ever high enough to
matter.

That said, what you're suggesting goes to the heart of the seq
abstraction, and so I thought would be worth my time to try to
understand exactly what's going on. I find it a bit difficult to
express the concepts involved -- I don't know if this a failure of
vocabulary or general fogginess in my thinking, but I'll try to get
across what I've found. Please excuse the sluggish pace and
verbosity.

Consider a lazy seq that would print as:

(alice bob)

It is true that in order to get the 'rest' of the above seq, you'd
have to determine if there is a "bob" vs. that "alice" is the end of
the seq. But you do NOT have to compute the value of "bob". The fact
that both the 'first' and 'rest' parts of 'lazy-cons' are delayed is
crucial.

To help experiment with these concepts I wrote a function that reports
each step required of a seq you give it:

(defn report-seq [msg coll]
(lazy-cons
(do (println "(first" msg (first coll) ")") (first coll))
(do (println "(rest after" msg (first coll) ")")
(when (rest coll)
(report-seq msg (rest coll))))))

The function can be used like this:

user=> (empty? (rest (report-seq "friend" '(alice bob))))
(first friend alice )
(rest after friend alice )
false

This shows that using the normal builtin seq functions 'rest' and
'empty?', a minimal amount of the seq is consumed. "alice" is
computed and cached. Then the 'rest' is called by our expression, and
this is report as "(rest after friend alice)" But the value "bob" is
not required in order for 'empty?' to make its determination, so "bob"
is not computed and thus no "(first friend bob)" is seen.

> Finally, the workaround. A little background: I was trying to build a
> lazy stack data structure (where entire lazy seqs, incidentally
> involving a filter or (for :when) could be pushed on the front), and
> found that elements kept being unexpectedly evaluated. In particular,
> any element ever at the top of the stack was always evaluated, even if
> it was never popped and was subsequently buried by other elements.

Without seeing your implementation I can't know exactly what was
causing your extra seq consumption. In order to try to understand your
dilemma, I created my own stack API based on your description. Here's
what I came up with:

(defn single-pop [mstk]
(if (rfirst mstk)
(cons (rfirst mstk) (rest mstk))
(rest mstk)))

(defn single-peek [mstk]
(ffirst mstk))

(defn push-seq [mstk coll]
(if (empty? coll)
mstk
(cons coll mstk)))

A populated mstk might look like: ((5) (4 3) (2 1))
An empty mstk is just an empty list or nil. These functions guarantee
that no empty list or nil is ever in the mstk, so 'single-peek' can be
guaranteed to find the next item right where it expects it.

Now lets put a report-seq on an mstk and see when things get
evaluated:

(defn test-stack []
(-> ()
(push-seq (report-seq "" [2 1]))
(push-seq [3])))

We would hope that doing a single-peek on a test-stack would return 3.
If we pop nothing, we really shouldn't see anything demanded from the
report-seq.

user=> (-> (test-stack) (single-peek))
3

Good enough, but that doesn't prove much. So let's pop once, then
peek. This should return the number 2, so we should see a report that
it had to be produced:

user=> (-> (test-stack) (single-pop) (single-peek))
(first 2 )
2

And indeed we do. Note that the "rest after 2" was not reported,
which is good. However it could be that "first 2" was demanded too
early, the moment that 2 was at the top of the stack, instead of
waiting until 'single-peek' was called. To prove this was not that
case, we can pop the 3 off (causing 2 to be at the top of the stack)
and then push something else on instead.

user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-peek))
9

We know 2 must have been at the top of the stack, but none of the
functions involved caused it to be computed. Just to make sure this
isn't an edge-case, let's pop one level deeper.

user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-pop)
(single-pop) (single-peek))
(first 2 )
(rest after 2 )
(first 1 )
1

Popping off the 2 and then doing a peek of course requires that 1 be
computed. But again note that no extra work was done to determine if
1 was the end of the seq or not. Similarly, if we don't actually ask
for the value at the top of the stack, but simply ask if the seq is
empty, the value 1 isn't computed:

user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-pop)
(single-pop) (empty?))
(first 2 )
(rest after 2 )
false

> I wanted to avoid this extra evaluation, without introducing "delay"
> and "force" everywhere, or writing my own versions of the built-in seq
> functions.

Unless I've completely misunderstood your scenario, I think this
demonstrates that the seq abstraction as it currently exists is
sufficient for providing the kind of laziness you need.

--Chouser

Jason Wolfe

unread,
Jan 4, 2009, 5:51:25 PM1/4/09
to Clojure
Thanks a lot for your very detailed response Chouser! report-seq is a
very useful way to look at this stuff.

> ...
> Unless I've completely misunderstood your scenario, I think this
> demonstrates that the seq abstraction as it currently exists is
> sufficient for providing the kind of laziness you need.

That does satisfy my requirements as a lazy stack data structure.
However, it does so by adding an inner level of cons objects; this is
more or less equivalent to just calling (delay) on each level of the
seq and then having to explicitly (force) each time an element is
extracted. With either workaround, the built-in seq functions are no
longer directly applicable. I can't call filter directly on an mstk,
for example.

This isn't a big deal, I was mostly just curious why the seq
abstraction was set up this way.

Thanks, Jason

Chouser

unread,
Jan 4, 2009, 7:47:20 PM1/4/09
to clo...@googlegroups.com
On Sun, Jan 4, 2009 at 5:51 PM, Jason Wolfe <jaw...@berkeley.edu> wrote:
>
> That does satisfy my requirements as a lazy stack data structure.
> However, it does so by adding an inner level of cons objects; this is
> more or less equivalent to just calling (delay) on each level of the
> seq and then having to explicitly (force) each time an element is
> extracted.

Except that I introduce no new object at all. I don't create any seq
or delay object. Every object I used was given to me from the user of
'push-seq' function, so I'm not quite sure what you mean.

> With either workaround, the built-in seq functions are no
> longer directly applicable. I can't call filter directly on an mstk,
> for example.

This would be true of any compound data structure built out of clojure
collections. 'filter', 'map' etc. don't automatically traverse trees,
graphs, or other nested structures. But a function that returns a
lazy seq over an mstk is pretty easy:

(defn mstk-seq [mstk]
(lazy-cons (single-peek mstk) (single-pop mstk)))

I think you'll find this has the same laziness profile as the raw
calls to single-peek and single-pop I presented earlier.

--Chouser

Reply all
Reply to author
Forward
0 new messages