You can test if this is the problem by doing something like:

(dorun (map cseq (range 1 1000000)))

which removes the max-key from the computation entirely.
You'll probably need to reformulate cseq with lazy lists. Even then,
you will likely find that this program will be too slow without
further optimizations (e.g., memoization or dynamic programming).
That's the nature of Project Euler problems; the "obvious" way to
solve the problem is often too slow, and further cleverness is
required.
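For reference, a lazy reformulation might look something like this
sketch (the original cseq isn't quoted in this excerpt, so its name
and shape here are assumptions):

(defn cseq [n]
  ;; Lazily build the Collatz chain starting at n and ending at 1.
  (lazy-seq
    (cons n
          (when (> n 1)
            (cseq (if (even? n)
                    (quot n 2)
                    (inc (* 3 n))))))))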
You're right, the chains aren't as long as I thought. I can't think
of any good explanation for why max-key blows the heap when cseq is
non-lazy but doesn't when cseq is lazy. (I confirmed that on my
system I get the same behavior as you, even with the heap size set
to 1600MB.)
Interestingly, this:
(apply max (map count (map cseq (range 1 1000000))))
works just fine with the non-lazy cseq.
The only real difference between this and the max-key version is that
reducing with max-key requires keeping around the longest list seen
so far, whereas this one keeps around only the count. But keeping
around a 525-element list shouldn't be enough to overflow the heap.
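(For concreteness, the max-key version under discussion is presumably
along the lines of

(apply max-key count (map cseq (range 1 1000000)))

though the OP's exact expression isn't quoted in this excerpt.)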
I've tried to figure out:
- Could max-key be holding on to the heads of these lists?
- Could this be an effect of chunked sequences?
The chunked sequences explanation seems almost plausible. You could
in theory have 32 lists realized at once, plus the biggest one you've
seen so far. But even in the worst-case scenario, you're talking
about 15000 elements, and I don't see how that could overflow the
heap. Just out of curiosity, I tried this with an unchunked range,
and still got the heap overflow, so I don't see how it could be a
chunking issue.
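For anyone wanting to reproduce the unchunked test, one standard
idiom is to rewrap the seq one element at a time (a sketch; my exact
unchunking code isn't shown here):

(defn unchunk [s]
  ;; Realize s one element at a time instead of 32 per chunk.
  (lazy-seq
    (when (seq s)
      (cons (first s) (unchunk (rest s))))))

applied as (apply max-key count (map cseq (unchunk (range 1 1000000)))).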
Furthermore, if either of these were the problem, you'd expect to see
it with the lazy cseq as well: count ultimately has to realize the
lazy cseq, so if too many heads were being retained at once, the lazy
version would exhibit the same heap space problem.
This leads to the more disturbing possibility that maybe there's a
garbage collection flaw relating to non-lazy lists.
I'd love to see some more people investigate this and see if we can
come up with a good explanation as to why the original poster's code
overflows the heap space.
Odd. I'd expect the JVM to run a GC immediately before reporting that
the heap has been exhausted.
https://groups.google.com/group/clojure/browse_thread/thread/df4ae16ab0952786?tvc=2&q=memory
It isn't a GC problem; it's an issue in the Clojure compiler.
The issue seems to affect only top-level defs. At the top level:

(reduce + (range 10000000))
- runs quickly without using lots of memory

(def x (#(reduce + (range 10000000))))
- also runs quickly without using lots of memory

(def x (reduce + (range 10000000)))
- uses lots of memory because the sequence gets forced
The problem seems to be caused by the Clojure compiler forcing the sequence while trying to figure out how to compile it,
not by the compiled code retaining the sequence at runtime.
It probably isn't a good idea to put expensive top-level defs in your code anyway, because they run whenever you AOT-compile
your code; but if you need to, the workaround at the moment is to wrap your value in a function and call it, as above.
I suspect that it is possible for us to fix the compiler not to force these sequences. I think it is happening in the
InvokeExpr class.
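Applied to the Euler 14 code in this thread, the workaround would be
something like this sketch (cseq and the max-key expression are
assumed from context):

(def answer
  ;; The thunk keeps the compiler from forcing the seq at compile time.
  ((fn [] (apply max-key count (map cseq (range 1 1000000))))))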
--
Dave
Here is a link to the exact source code of the program I was running
for my recently published results in the sister thread titled "Problem
with garbage collection? Was Euler 14".
https://github.com/jafingerhut/clojure-benchmarks/blob/master/collatz/collatz.clj-1.clj
Is this code also exhibiting the same problem, and if so, which
top-level def is causing it? The only top-level defs in my code are
three functions defined with defn.
Thanks,
Andy
Andreas, how are you running that? Also, what do you see in the heap
dump, and what is the JVM heap size?
I definitely exhausted the heap running your program. I was using
Clojure 1.1, Java 1.6.0_21 with -server -Xmx1600M flags, running via
Clojure Box on Windows XP.
"Fine grained locals clearing" was added in clojure 1.2, so it's
likely that your entire (range 1 1000000) list has to be processed
before it can be GC'ed.
max-key uses destructuring, which was one of the culprits for
unexpectedly holding onto the head of lists before locals clearing was
added.
See http://groups.google.com/group/clojure/msg/9b4e268b85c20cd6
This part of what I said is garbage; I'm sorry. I looked at max-key
too quickly, but there isn't any destructuring there.
That said, I still think it was probably the addition of locals
clearing in 1.2 that explains what you guys are seeing (or not
seeing).
I wrote my own implementation of max-key that doesn't use
destructuring before reading the later part of this thread. :)
It is here:
(defn my-max-key [f & xs]
  ;; Deliberately no-frills: recomputes f of the current max on every
  ;; comparison (see below) and avoids destructuring entirely.
  (reduce
   (fn [v1 v2]
     (let [k1 (f v1)
           k2 (f v2)]
       (if (> k1 k2)
         v1
         v2)))
   xs))
If the original code had blown the heap on my machine, I was going to
substitute my-max-key and see whether it still did, and if so, tweak
my-max-key to see whether there was a way to make the problem go
away. This my-max-key is less than perfectly efficient in that it
keeps recomputing f of the current max; it's a no-frills version
intended to be sure not to hold onto the head of anything, and to be
simple and easy to tweak and debug. I'd have added saving f of the
current max later (a sketch of that variant is below) and seen
whether it still avoided blowing the heap on the original problem,
and, if so, submitted it as a possible replacement for the core
implementation of max-key. But that's moot now. It looks like one of
the improvements to locals clearing in 1.2 fixed the OP's problem.
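For what it's worth, the variant that saves f of the current max
would be roughly this sketch (hypothetical, and untested against the
original problem, as noted above):

(defn my-max-key-cached [f & xs]
  ;; Carry a [value key] pair so f is computed once per element.
  (first
    (reduce (fn [best x]
              (let [k (f x)]
                (if (> k (second best))
                  [x k]
                  best)))
            [(first xs) (f (first xs))]
            (rest xs))))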