Lazy sequences (once again?)

104 views

Skip to first unread message

Edward Knyshov

unread,

Jan 8, 2018, 2:20:57 AM1/8/18

to Clojure

Hi I recently started to dig into lazy sequences processing in clojure because I need to process huge amount of data that doesn't fit in memory.
I found a few articles and examples describing the way lazy seqs work in clojure. But so far all I got myself is a paranoia.
Now everywhere I look in the code I have a feeling that it's gonna retain my sequence in memory.
It seems like realization of lazy clojure sequences is very unclear sometimes and you can't reason about it.

For example:

(let [r (range 1e9)]

(first r)

(last r))

;;=> 999999999

(let [r (range 1e9)]

(last r)

(first r))

;;=> java.lang.OutOfMemoryError: GC overhead limit exceeded

I got this example from here https://github.com/danielmiladinov/joy-of-clojure/blob/master/src/joy-of-clojure/chapter6/laziness.clj
I tried it myself and it works as it's commented above.
But how in the world does it work internally?
How can I reason about this code in more complex example where instead of function first there is some function x and instead of function last there is some function y.
I can't come up with some clear rule of thumb that would help me to be confident that my code works well on lazy collections without running it and checking whether it throws OOME or not..
I would appreciate if anyone could point me to any discussion/article/paper that makes things more clear?

Gary Verhaegen

unread,

Jan 8, 2018, 4:03:39 AM1/8/18

to clo...@googlegroups.com

It really boils down to understanding garbage collection: memory that cannot be reached anymore is freed. In the first code sample, when running (last r), the compiler has detected that r is never used afterwards, so it clears the local variable. This means that while last is walking through the sequence one element at a time, the JVM can detect that all of the elements it has seen so far cannot be accessed anymore (Clojure seqs do not generally have any "back" pointer), so the GC is free to delete them. On the other hand, in the second code sample, when the call to (last r) is running, Clojure needs to keep a reference to r around so that it can later on call (first r), which prevents any garbage collection from happening, as r points to the first element, which points to the second, etc. up until wherever (last r) has gone so far. So you need to realize the full seq in memory.

Now to answer your question: lazy seqs are a functional programming tool. Mixing them with side effecting code is dangerous. How is that relevant? Well, your question only makes sense if you assume that your function x is side-effecting; if it isn’t the answer is simply to not call it at all and call y directly, as you wouldn’t be doing anything with the result of x anyway.

As long as you have a clear idea of the order of evaluation, you should be fine.

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward

0 new messages