Is this behavior due to some artifact of destructuring I'm not aware
of (or something else I'm missing), or is there a bug? If it sounds
like a bug, can anyone else reproduce?
Thanks!
Right now I can't see how loop can be made to support both cases.
Hopefully someone else will.
Let's macro-expand Christophe's example with both the old loop and the
new loop.
(loop [[x & xs] s
       y xs]
  ...)
So with the old loop we get this:
(loop*
  [G__13702 s
   G__13703 xs]
  (let [[x & xs] G__13702
        y G__13703]
    ...))
See the problem? "xs" is used before it's defined.
The new loop uses the outer-let to get around this:
(let [G__13697 s
      [x & xs] G__13697
      y xs]
  (loop* [G__13697 G__13697
          y y]
    (let [[x & xs] G__13697
          y y]
      ...)))
What initially occurs to me is to move the outer loop into loop*'s vector:
(loop*
  [G__13702 s
   G__13703 (let [[x & xs] G__13702] xs)]
  (let [[x & xs] G__13702
        y G__13703]
    x))
A problem with that is that we'd have to insert a destructuring let of
all the previous arguments for each loop* argument, so the expansion
will blow up in size pretty quickly, and I suspect the JVM won't
optimize the unused bindings away, since it can't be sure that the
(first s) and (rest s) that [[x & xs] s] expands into are side-effect
free.
I understand the pragmatism of your approach, but it's really
unfortunate. Seqs are a really convenient abstraction, and the ability
to model arbitrarily large or infinite ones (with laziness) is really
useful. In my opinion, only using seqs when all of the data fits
into memory really undermines the value of the abstraction (by
narrowing its usages so severely), and also makes laziness far less
useful (except possibly as a way to amortize costs over time, rather
than as a way to model infinite things).
This path is well-trodden, but the danger of hanging on to the
head of the list is due to the caching behavior of lazy seqs, which is
important for consistency - otherwise, walking the same seq twice
might yield different results.
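To sketch that caching behavior (the names `calls` and `coll` here are
made up for illustration): a lazy seq realizes each element at most
once, so even an impure step function runs only once per element, no
matter how many times the seq is walked:

```clojure
;; Illustrative sketch: a lazy seq caches each realized element,
;; so repeated walks see consistent values and the (impure) mapped
;; fn is not re-run.
(def calls (atom 0))

(def coll (map (fn [x] (swap! calls inc) x) (range 3)))

(doall coll)   ; realizes all three elements
(doall coll)   ; walks the cached values; no new calls
@calls         ;=> 3, not 6
```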
As with most engineering efforts, there are trade-offs, but I've been
willing to accept the extra caution I need to employ when dealing with
lazy seqs. I've run into a few of these kinds of bugs over time, and
I'm guessing it's generally because in my uses, I'm dealing with
millions of records, and far more data than I can fit in memory. I'm
not sure that this indicates that seqs are the wrong tool in this
instance (as you seem to say), but the answer isn't clear to me.
In Clojure's early days, I complained about this and described some of
my own experiments with uncached sequences. Rich said he was
experimenting with another model for uncached iterator-like
constructs, I think he called them streams. As far as I know, none of
that has ever made it into Clojure. So I still feel there's a need
here that should eventually be addressed.
Clojure's built-in "range" function (last time I looked) essentially
produces an uncached sequence. And that makes a lot of sense.
Producing the next value in a range on-demand is way more efficient
and practical than caching those values. I think that Clojure
programmers should have an easy way to make similarly uncached
sequences if that's what they really want/need. (Well, obviously you
can drop down into Java and use some of the same tricks that range
uses, but I mean it should be easy to do this from within Clojure).
> The new loop uses the outer-let to get around this:
> (let [G__13697 s
>       [x & xs] G__13697
>       y xs]
>   (loop* [G__13697 G__13697
>           y y]
>     (let [[x & xs] G__13697
>           y y]
>       ...)))
Well, in truth, there is a way to fix the loop macro, but it's too
ugly: the idea is to wrap the outer let in a closure and replace the
loop with a function, so that we benefit from the locals clearing that
happens on a tail call.
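To sketch the idea with a made-up example (this is not the actual
macro expansion, and `sum-first` is a hypothetical name): the loop body
becomes a self-recursive fn, so the seq travels as an argument and the
reference to it is cleared on each tail call instead of being pinned by
an outer let:

```clojure
;; Hypothetical illustration of the closure idea: because recur
;; rebinds the fn's parameters, the previous seq head is no longer
;; referenced after the tail call, so it can be collected.
(defn sum-first [n coll]
  ((fn step [[x & xs] acc n]
     (if (zero? n)
       acc
       (recur xs (+ acc x) (dec n))))
   coll 0 n))

(sum-first 5 (range))   ;=> 10, even on an infinite seq
```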
The real solution would be pervasive locals clearing but I seem to
remember Rich saying he'd like to delay such work until clojure in
clojure.
> What's the procedure for creating a ticket for this? Is it at least
> acknowledged that this IS a bug?
It's better to wait for Rich's opinion on this problem before creating a ticket.
'range' has since changed and now produces a chunked lazy seq
(master branch post-1.0).
> Producing the next value in a range on-demand is way more efficient
> and practical than caching those values. I think that Clojure
> programmers should have an easy way to make similarly uncached
> sequences if that's what they really want/need.
This can be done by implementing the ISeq interface, today with
proxy, in the future with newnew/reify/deftype/etc.
(defn incs [i]
  (proxy [clojure.lang.ISeq] []
    (seq [] this)
    (first [] i)
    (next [] (incs (inc i)))))
user=> (let [r (range 1e9)] [(first r) (last r)])
java.lang.OutOfMemoryError: GC overhead limit exceeded (NO_SOURCE_FILE:0)
user=> (let [r (incs 10)] [(first r) (nth r 1e9)])
[10 1000000010]
Both those examples retain the head, but since 'incs' isn't
a lazy-seq, the intermediate values can be garbage-collected.
Note the difference between the seq abstraction and the lazy-seq
implementation.
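For what it's worth, once reify is available the same uncached seq
might look something like this sketch (`incs2` is a made-up name, and
only the subset of ISeq needed for first/next/take is implemented;
unimplemented interface methods would throw if called):

```clojure
;; Sketch: an uncached, on-demand seq via reify. Like the proxy
;; version, nothing is cached, so intermediate values can be GC'd
;; even when the head is retained.
(defn incs2 [i]
  (reify clojure.lang.ISeq
    (seq   [this] this)
    (first [_] i)
    (next  [_] (incs2 (inc i)))
    (more  [this] (or (next this) ()))))

(take 3 (incs2 10))   ;=> (10 11 12)
```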
--Chouser
I've been following this thread, and I must say I'm puzzled that Rich
hasn't said anything at all about this issue yet. It seems important
enough to hear his own opinion.
Right - pervasive locals clearing will definitely do the trick here.

Interestingly, when I was at Microsoft and asked them about handling
this issue for the CLR they stated plainly it wasn't an issue at all -
their system can fully detect that any such references are in fact
unreachable and are subject to GC. So, in a sense, all locals clearing
on my part is a workaround for a JVM weakness in this area.