First, I'd like to point out that it seems very rare that one needs to
understand lazy seqs to the level of detail discussed in the rest of
this post. Generally, built in functions don't consume any more than
they need, and in fact the most natural implementations of many
user-created functions don't either. When functions do take more from
a seq than strictly necessary, this cost is hardly ever high enough to
matter.
That said, what you're suggesting goes to the heart of the seq
abstraction, and so I thought would be worth my time to try to
understand exactly what's going on. I find it a bit difficult to
express the concepts involved -- I don't know if this a failure of
vocabulary or general fogginess in my thinking, but I'll try to get
across what I've found. Please excuse the sluggish pace and
verbosity.
Consider a lazy seq that would print as:
(alice bob)
It is true that in order to get the 'rest' of the above seq, you'd
have to determine if there is a "bob" vs. that "alice" is the end of
the seq. But you do NOT have to compute the value of "bob". The fact
that both the 'first' and 'rest' parts of 'lazy-cons' are delayed is
crucial.
To help experiment with these concepts I wrote a function that reports
each step required of a seq you give it:
(defn report-seq [msg coll]
(lazy-cons
(do (println "(first" msg (first coll) ")") (first coll))
(do (println "(rest after" msg (first coll) ")")
(when (rest coll)
(report-seq msg (rest coll))))))
The function can be used like this:
user=> (empty? (rest (report-seq "friend" '(alice bob))))
(first friend alice )
(rest after friend alice )
false
This shows that using the normal builtin seq functions 'rest' and
'empty?', a minimal amount of the seq is consumed. "alice" is
computed and cached. Then the 'rest' is called by our expression, and
this is report as "(rest after friend alice)" But the value "bob" is
not required in order for 'empty?' to make its determination, so "bob"
is not computed and thus no "(first friend bob)" is seen.
> Finally, the workaround. A little background: I was trying to build a
> lazy stack data structure (where entire lazy seqs, incidentally
> involving a filter or (for :when) could be pushed on the front), and
> found that elements kept being unexpectedly evaluated. In particular,
> any element ever at the top of the stack was always evaluated, even if
> it was never popped and was subsequently buried by other elements.
Without seeing your implementation I can't know exactly what was
causing your extra seq consumption. In order to try to understand your
dilemma, I created my own stack API based on your description. Here's
what I came up with:
(defn single-pop [mstk]
(if (rfirst mstk)
(cons (rfirst mstk) (rest mstk))
(rest mstk)))
(defn single-peek [mstk]
(ffirst mstk))
(defn push-seq [mstk coll]
(if (empty? coll)
mstk
(cons coll mstk)))
A populated mstk might look like: ((5) (4 3) (2 1))
An empty mstk is just an empty list or nil. These functions guarantee
that no empty list or nil is ever in the mstk, so 'single-peek' can be
guaranteed to find the next item right where it expects it.
Now lets put a report-seq on an mstk and see when things get
evaluated:
(defn test-stack []
(-> ()
(push-seq (report-seq "" [2 1]))
(push-seq [3])))
We would hope that doing a single-peek on a test-stack would return 3.
If we pop nothing, we really shouldn't see anything demanded from the
report-seq.
user=> (-> (test-stack) (single-peek))
3
Good enough, but that doesn't prove much. So let's pop once, then
peek. This should return the number 2, so we should see a report that
it had to be produced:
user=> (-> (test-stack) (single-pop) (single-peek))
(first 2 )
2
And indeed we do. Note that the "rest after 2" was not reported,
which is good. However it could be that "first 2" was demanded too
early, the moment that 2 was at the top of the stack, instead of
waiting until 'single-peek' was called. To prove this was not that
case, we can pop the 3 off (causing 2 to be at the top of the stack)
and then push something else on instead.
user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-peek))
9
We know 2 must have been at the top of the stack, but none of the
functions involved caused it to be computed. Just to make sure this
isn't an edge-case, let's pop one level deeper.
user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-pop)
(single-pop) (single-peek))
(first 2 )
(rest after 2 )
(first 1 )
1
Popping off the 2 and then doing a peek of course requires that 1 be
computed. But again note that no extra work was done to determine if
1 was the end of the seq or not. Similarly, if we don't actually ask
for the value at the top of the stack, but simply ask if the seq is
empty, the value 1 isn't computed:
user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-pop)
(single-pop) (empty?))
(first 2 )
(rest after 2 )
false
> I wanted to avoid this extra evaluation, without introducing "delay"
> and "force" everywhere, or writing my own versions of the built-in seq
> functions.
Unless I've completely misunderstood your scenario, I think this
demonstrates that the seq abstraction as it currently exists is
sufficient for providing the kind of laziness you need.
--Chouser
Except that I introduce no new object at all. I don't create any seq
or delay object. Every object I used was given to me from the user of
'push-seq' function, so I'm not quite sure what you mean.
> With either workaround, the built-in seq functions are no
> longer directly applicable. I can't call filter directly on an mstk,
> for example.
This would be true of any compound data structure built out of clojure
collections. 'filter', 'map' etc. don't automatically traverse trees,
graphs, or other nested structures. But a function that returns a
lazy seq over an mstk is pretty easy:
(defn mstk-seq [mstk]
(lazy-cons (single-peek mstk) (single-pop mstk)))
I think you'll find this has the same laziness profile as the raw
calls to single-peek and single-pop I presented earlier.
--Chouser