On a related note, about a month ago, I posted comments about
Clojure's laziness. Rich's response was:
"The argument you've made, however, is theoretical.
I've tried what you said. In fact, it's sitting there in the
implementation right now - just uncomment lazy-seq. cached-seq is
there, as is a non-caching LazySeq class implementing ISeq, and all of
the library functions can be defined in terms of it. I also did that,
and then tried it and some common code.
You should too, and report back your actual experience."
So my blog post has a dual purpose. First, I explain the "gotcha"
that Stuart and I discussed. Second, I report back to the community
about the actual experience I had in the past month, exploring
laziness in Clojure. I decided to blog it rather than post it here
primarily due to its length.
If you're at all interested in laziness, check it out:
http://programming-puzzler.blogspot.com/
> So my blog post has a dual purpose. First, I explain the "gotcha"
> that Stuart and I discussed. Second, I report back to the community
> about the actual experience I had in the past month, exploring
> laziness in Clojure. I decided to blog it rather than post it here
> primarily due to its length.
An interesting analysis. Looking my own applications, I tend to agree
with your observations, but then I also have a similar code base as
you have: almost all of it is purely functional code using Clojure's
basic data structures.
I also agree with your conclusion that the critical point is lazy
sequences whose constructors are not referentially transparent. I
wonder if there is any way to have them detected reliably (at compile
time or at run time), but I expect this can't be done. Another
solution would be to detect lazy sequences guaranteed to be purely
functional because of the exclusive use of safe functions, and make
them uncached by default. That should be doable but perhaps with an
unreasonable effort or a high runtime penalty.
Konrad.
> difference to the consumers of your sequence" is simply untrue. The
> promise of the abstraction is not merely that the nth item/rest will
> be equal - it is that it will be identical. I.e. a seq is persistent
> and immutable. There's nothing wrong with Lisp's list abstraction nor
> its realization as seqs in Clojure.
>
> Making seqs 'non-caching' makes no sense because they are no longer
> the same abstraction. Lazy != ephemeral. Seqs and streams are
> different abstractions.
That makes sense, but I am probably not the only one who has used
lazy sequences to implement streams. It's the closest abstraction
currently available in Clojure.
> I've been holding off on integrating this as it is a fairly
> substantial change (under the hood, no change at all for consumers),
> introduces a new abstraction (though no impact until you use it), and
> might delay 1.0.
>
> Do people want it now?
I'd use it right away in a few places in my code, so for me the
answer is yes. On the other hand, lazy sequences work pretty well for
me as well at the moment. Perhaps the better question is how
important it is to get a version 1.0 out rapidly. Personally I don't
care at all, but I can understand those who do.
Konrad.
I get that Clojure is making a promise of identity here, which is not
possible if things uncached. But I look at this from a pragmatic
standpoint and wonder how much people are relying on that promise? I
notice that I haven't written any code that uses identical? And if I
did, I don't think I'd mind the overhead of explicitly wrapping the
sequence in a call to cache-seq. If people don't need this promise as
a rule, maybe it's okay to rethink the abstraction, and start thinking
of it as just a way to get the "first" and "rest" of a bunch of items,
possibly generated by need, and with no other promises implied.
As to your question, I'd vote for streams now. As you can tell from
my posts, my own code centers around generating long sequences of
things, and the current seq abstraction is just not quite the right
fit. I'd really like to get my hands on streams and see whether they
work better for my needs.
>
> On Thu, Jan 8, 2009 at 5:55 AM, Rich Hickey <richh...@gmail.com>
> wrote:
>> The
>> promise of the abstraction is not merely that the nth item/rest will
>> be equal - it is that it will be identical. I.e. a seq is persistent
>> and immutable.
>
> I get that Clojure is making a promise of identity here, which is not
> possible if things uncached. But I look at this from a pragmatic
> standpoint and wonder how much people are relying on that promise? I
> notice that I haven't written any code that uses identical? And if I
> did, I don't think I'd mind the overhead of explicitly wrapping the
> sequence in a call to cache-seq. If people don't need this promise as
> a rule, maybe it's okay to rethink the abstraction, and start thinking
> of it as just a way to get the "first" and "rest" of a bunch of items,
> possibly generated by need, and with no other promises implied.
I think the real test of non-cached seqs is to swap them in for
regular seqs, rebuild Clojure and some user libs and see what breaks
and why. Then you'll see the dependencies on caching that exist in the
real world. Your purely functional code might not care, other than
perf, but that is not the only user base Clojure and I need to support.
Rich
I tried to do that, swapping lazy-cons in core.clj to use the
commented out code from the lazy-seq macro, but I couldn't get Clojure
to build. Clearly something in the bootstrapping code relied on
lazy-cons working exactly the way it works. So yes, it is clear that
there are currently some dependencies on lazy-cons being cached. What
is less clear is whether those depedencies can be easily isolated and
rewritten to explicitly use a separate caching version of lazy-cons,
leaving the rest of the codebase free to use an uncached variation of
lazy-cons by default. I didn't understand the code well enough to
find the conflict that was preventing the code from building, so I
ended up separating out the uncached variations as a separate library
so I could at least experiment within my own code. But I agree that
rebuilding Clojure in this way would be very enlightening. Maybe I'll
take another stab at it...
Stuart
I know this is nitpicking, but if this is really the promise that the
seq abstraction is supposed to fulfill, then Clojure already violates
this:
(def a (seq [1 2 3]))
(identical? (rest a) (rest a)) yields false
Presumably what you mean is that the nth element should be identical
(but not necessarily the nth rest).
> The sequence abstraction in Clojure is the list abstraction of Lisp.
> No more, no less.
Well, in Lisp, lists are concrete, finite things that are mutable (and
therefore don't provide any kind of guarantee that you'll get the same
element with multiple traversals). So to say that Clojure's sequence
abstraction is no more and no less than the list abstraction of Lisp
is a bit misleading. You have a certain concept of what properties of
this abstraction you want to model, and what promises you want to
keep. And that's great -- honestly, I think it's fantastic that you
have such a clear, coherent design vision that you've maintained
throughout the creation of Clojure. I'm just saying that your
interpretation of what constitutes an abstraction of a Lisp list is
more subjective than your statement implies.
I'm pretty sure my concept of the list abstraction differs from yours.
In my mind the essential abstract concept of a list is anything that
can be explained in this way:
A list has a first and a rest, where first is some kind of element,
and rest is another list or nil.
In other words, I would say a sequence abstraction should apply to
anything that has this sort of nested, self-referential recursive
structure.
There is an elegance to the way that algorithms on self-referential
data structures can be expressed via recursion. To me, a big part of
Clojure's beauty lies in the recognition that so many things can be
converted to this view. Strictly speaking, a vector is not structured
in this way, but you can convert it into something that has this
property, allowing you to program vectors with this sort of recursive
elegance.
When I think about something like the sequence of whole numbers, it
seems like a perfect embodiment of what I consider to be the essential
abstraction of a sequence. You've got a first thing, and the rest
thing is another sequence that has similarity to the original.
Because it has a recursive structure, I want to be able to operate on
it using first and rest. To me, a sequence feels like the "right
abstraction" for the whole numbers, and writing algorithms that
operate on them. And this feels like the right abstraction to me
regardless of implementation details such as whether the traversed
numbers should reside in memory all at once.
> Making seqs 'non-caching' makes no sense because they are no longer
> the same abstraction.
I get that your design vision revolves around the idea that sequences
promise identity, and that there are real technical challenges
surrounding the idea of non-cached sequences. But I don't consider
"non-caching sequences" to be an oxymoron. If it were possible to
implement them in a way that meshed with the rest of Clojure's design,
I think there would be real value to being able to manipulate
non-cached recursive structures with the existing sequence interface.
It's clear you don't think this fits with Clojure's design, but I
haven't yet given up hope of finding a clever way to do this, and
trying to convince you that it is worthwhile :) .
I know that over email, it is all too common for the tone of a post to
be misconstrued. So just in case, I want to emphasize here that these
posts are in no way intended as bashing Clojure or Rich's design
sensibilities. It is precisely because I admire the design, and
already care about Clojure, that I raise these points in the hope that
community discussion can elevate the language even further. I strive
to make my comments as constructive in tone as possible, and I hope
that they are perceived in that light.
I'd certainly make use of lazy-seq if it were public. But assuming
it's not made public, are you thinking right now that the underlying
clojure.lang.LazySeq will eventually go away, or can power users use
it and count on it being there, although not explicitly exposed
through the api?