Euler 14

Andreas Liljeqvist

unread,
Jan 17, 2011, 10:53:06 AM1/17/11
to clo...@googlegroups.com
Hi.

I am using max-key to get the longest sequence from a couple of sequences.

(defn cseq [n]
  (if (= 1 n)
    [1]
    (cons n (cseq (if (even? n)
                    (/ n 2)
                    (+ (* 3 n) 1))))))

(apply max-key count (map cseq (range 1 1000000)))

This gives me a heap error.

To my understanding, max-key should keep at most two sequences in memory.
I could certainly solve this by working around the problem, but I want to understand where my logic fails.
Thanks in advance for any help.


Mark Engelberg

unread,
Jan 17, 2011, 1:27:35 PM1/17/11
to clo...@googlegroups.com
Your cseq is not lazy, and some of the sequences can be quite long, so
it wouldn't surprise me if that's the source of your problem.

You can test whether this is the problem by doing something like:
(dorun (map cseq (range 1 1000000)))
which removes max-key from the computation entirely.

You'll probably need to reformulate cseq with lazy lists. Even then,
you will likely find that this program will be too slow without
further optimizations (e.g., memoization or dynamic programming).
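
For instance, a memoized chain-length counter might look something like
this (a sketch, not code from this thread; it counts lengths directly
instead of building the sequences, and memoize keeps an unbounded cache):

;; the recursive call goes through the var, so it hits the cache
(def chain-length
  (memoize
    (fn [n]
      (if (= 1 n)
        1
        (inc (chain-length (if (even? n)
                             (quot n 2)
                             (+ (* 3 n) 1))))))))

(apply max-key chain-length (range 1 1000000))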

That's the nature of Project Euler problems; the "obvious" way to
solve the problem is often too slow, and further cleverness is
required.


Andreas Liljeqvist

unread,
Jan 17, 2011, 2:55:51 PM1/17/11
to clo...@googlegroups.com
I don't see why the cseqs have to be lazy; they are at most 525 elements long.
Shouldn't each sequence be produced only when it is reduced in max-key, and then discarded?
....
But it makes a difference:


(defn cseq [n]
  (if (= 1 n)
    [1]
    (lazy-seq (cons n (cseq (if (even? n)
                              (/ n 2)
                              (+ (* 3 n) 1)))))))


(apply max-key count (map cseq (range 1 1000000)))

gives me results.

(def a (apply max-key count (map cseq (range 1 1000000))))
gives a heap error.

Thanks for your answer.

2011/1/17 Mark Engelberg <mark.en...@gmail.com>

Mark Engelberg

unread,
Jan 17, 2011, 4:21:36 PM1/17/11
to clo...@googlegroups.com
On Mon, Jan 17, 2011 at 11:55 AM, Andreas Liljeqvist <bon...@gmail.com> wrote:
> I don't see why the cseqs have to be lazy; they are at most 525
> elements long.
> Shouldn't each sequence be produced only when it is reduced in max-key,
> and then discarded?

You're right, the chains aren't as long as I thought. I can't think
of any good explanation for why max-key blows the heap when cseq is
non-lazy but doesn't when cseq is lazy. (I confirmed that I get the
same behavior as you on my system, even with the heap size set to
1600MB.)

Interestingly, this:
(apply max (map count (map cseq (range 1 1000000))))
works just fine with the non-lazy cseq.

The only real difference between this and the max-key version is that
reducing with max-key requires keeping around the longest list so far,
whereas this one keeps around only the count. But keeping around a
525-element list shouldn't be enough to overflow the heap.

I've tried to figure out:
Could max-key be holding on to the heads of these lists?
Could this be an effect of chunked sequences?

The chunked sequences explanation seems almost plausible. You could
in theory have 32 lists realized at once, plus the biggest one you've
seen so far. But even in the worst-case scenario, you're talking
about 15000 elements, and I don't see how that could overflow the
heap. Just out of curiosity, I tried this with an unchunked range,
and still got the heap overflow, so I don't see how it could be a
chunking issue.

Furthermore, if either of these were the problem, you'd expect to see
the same problem with lazy cseq, because ultimately count has to
realize the lazy cseq, so if too many heads were being retained at
once, the lazy version would exhibit the same heap space problem.

This leads to the more disturbing possibility that maybe there's a
garbage collection flaw relating to non-lazy lists.

I'd love to see some more people investigate this and see if we can
come up with a good explanation as to why the original poster's code
overflows the heap space.

ka

unread,
Jan 19, 2011, 9:48:16 PM1/19/11
to Clojure
To me this looks totally fine; max-key should keep at most two
sequences in memory. I don't think there should be any difference
between the non-lazy and lazy versions, as the depth of cseq is ~500.
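
For reference, the definition of max-key in clojure.core at the time
was essentially this; it reduces pairwise, so in principle only the
current best and the next candidate should be live at any point:

(defn max-key
  ([k x] x)
  ([k x y] (if (> (k x) (k y)) x y))
  ([k x y & more]
   (reduce #(max-key k %1 %2) (max-key k x y) more)))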

The non-lazy version works for me (no heap error) for inputs of 1M,
2M, and 4M, though at 4M the Java process did start to consume a very
large amount of memory.

I did some tests with the original non-lazy cseq and (apply max-key
count (map cseq (take 4000000 (iterate inc 1)))) [free of chunking, I
think].

Running in VisualVM suggests that the % of used heap is actually quite
low. So the problem is that the GC isn't running often enough, and the
JVM has to keep allocating more memory.

Running with -Xmx200M -XX:+UseParallelGC kept the heap under 200MB and
produced results for 4M and 10M (ans = 686) without any problems. It
consumes a lot of CPU and time, though.
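
A full invocation along these lines should reproduce this (the
classpath and script name here are assumptions, not a literal
transcript):

java -Xmx200M -XX:+UseParallelGC -cp clojure.jar clojure.main collatz.clj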



Mark Engelberg

unread,
Jan 19, 2011, 10:01:43 PM1/19/11
to clo...@googlegroups.com
On Wed, Jan 19, 2011 at 6:48 PM, ka <sanc...@gmail.com> wrote:
> Running in VisualVM suggests that the % of used heap is actually quite
> low. So the problem is that the GC isn't running often enough, and the
> JVM has to keep allocating more memory.

Odd. I'd expect the JVM to run a GC immediately before reporting that
the heap has been exhausted.

ka

unread,
Jan 20, 2011, 3:10:58 AM1/20/11
to Clojure
>> Running in VisualVM suggests that the % of used heap is actually quite
>> low. So the problem is that the GC isn't running often enough, and the
>> JVM has to keep allocating more memory.

By "problem" I meant, in my case, too much memory consumption.

> Odd. I'd expect the JVM to run a GC immediately before reporting that
> the heap has been exhausted.

That certainly is odd; I would expect the same. But quick googling
doesn't turn up any JVM references that mention this behavior.

Can you take a heap dump of the chunked and non-chunked versions
(-XX:+HeapDumpOnOutOfMemoryError)?

David Powell

unread,
Jan 20, 2011, 5:10:26 AM1/20/11
to clo...@googlegroups.com

This same problem was raised recently:

https://groups.google.com/group/clojure/browse_thread/thread/df4ae16ab0952786?tvc=2&q=memory

It isn't a GC problem; it is an issue in the Clojure compiler.


The issue seems to affect only top-level defs. At the top level:

(reduce + (range 10000000))

- runs quickly without using lots of memory


(def x (#(reduce + (range 10000000))))

- also runs quickly without using lots of memory


(def x (reduce + (range 10000000)))

- uses lots of memory due to the sequence getting forced


The problem seems to be caused by the Clojure compiler forcing the
sequence while trying to figure out how to compile it, not by the
compiled code retaining the sequence at runtime.

It probably isn't a good idea to put expensive top-level defs in your
code, because they get run when you merely try to AOT-compile your
code; but if you need to, a workaround for the moment is to wrap your
value in a function and call it, as above.
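
Applied to the code from this thread, the workaround would look
something like this (the name is just for illustration):

;; hypothetical name; wrapping the expression in a zero-arg fn and
;; immediately calling it means the compiler only sees a function
;; call, so it doesn't force the sequence at compile time
(def longest-chain
  (#(apply max-key count (map cseq (range 1 1000000)))))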

I suspect that it is possible for us to fix the compiler so that it
doesn't force these sequences. I think it is happening in the
InvokeExpr class.

--
Dave

Andy Fingerhut

unread,
Jan 20, 2011, 5:17:58 AM1/20/11
to clo...@googlegroups.com
David:

Here is a link to the exact source code of the program I was running
for my recently published results in the sister thread titled "Problem
with garbage collection? Was Euler 14".

https://github.com/jafingerhut/clojure-benchmarks/blob/master/collatz/collatz.clj-1.clj

Is this code also exhibiting the same problem, and if so, which
top-level def is causing it? The only top-level defs in my code are
three functions defined with defn.

Thanks,
Andy

Andreas Liljeqvist

unread,
Jan 20, 2011, 7:19:54 AM1/20/11
to clo...@googlegroups.com
Thank you, that explains the failure of the lazy cseq in a top-level def.
I really hope it gets fixed; Clojure is going to be a hard sell if I have to explain things like this to my coworkers :(

Anyhow, there is still the problem of the non-lazy cseq blowing the heap.


(defn cseq [n]
  (if (= 1 n)
    [1]
    (cons n (cseq (if (even? n)
                    (/ n 2)
                    (+ (* 3 n) 1))))))

(apply max-key count (map cseq (range 1 1000000)))

*BOOM*

2011/1/20 David Powell <djpo...@djpowell.net>

ka

unread,
Jan 20, 2011, 8:32:49 AM1/20/11
to Clojure
Andreas, how are you running that? Also, what do you see in the heap
dump, and what is the JVM heap size?

Andreas Liljeqvist

unread,
Jan 20, 2011, 9:51:58 AM1/20/11
to clo...@googlegroups.com
I am sorry, I can't seem to reproduce the behavior at the moment :(
Mark, please tell me that I am not delusional...

Will try at home also.

2011/1/20 ka <sanc...@gmail.com>
Andreas, how are you running that? Also, what do you see in the heap
dump, and what is the JVM heap size?


Mark Engelberg

unread,
Jan 20, 2011, 11:57:07 PM1/20/11
to clo...@googlegroups.com
On Thu, Jan 20, 2011 at 6:51 AM, Andreas Liljeqvist <bon...@gmail.com> wrote:
> I am sorry, I can't seem to reproduce the behavior at the moment :(
> Mark, please tell me that I am not delusional...

I definitely exhausted the heap running your program. I was using
Clojure 1.1, Java 1.6.0_21 with -server -Xmx1600M flags, running via
Clojure Box on Windows XP.

Ken Wesson

unread,
Jan 21, 2011, 12:34:28 AM1/21/11
to clo...@googlegroups.com
FWIW, no heap error on my system (Clojure 1.2, Java 1.6.0_13 -server).

Aaron Cohen

unread,
Jan 21, 2011, 6:27:34 PM1/21/11
to clo...@googlegroups.com

"Fine grained locals clearing" was added in clojure 1.2, so it's
likely that your entire (range 1 1000000) list has to be processed
before it can be GC'ed.

max-key uses destructuring, which was one of the culprits for
unexpectedly holding onto the heads of lists before locals clearing
was added.

See http://groups.google.com/group/clojure/msg/9b4e268b85c20cd6
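
A rough illustration of the kind of head retention locals clearing
addresses (a sketch, not code from this thread):

;; on pre-1.2 Clojure, the local s stays reachable for the whole call
;; below, so every element realized so far is retained; with
;; fine-grained locals clearing, s is nulled after its last use and
;; already-consumed elements can be collected
(let [s (map cseq (range 1 1000000))]
  (apply max-key count s))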

Aaron Cohen

unread,
Jan 21, 2011, 6:43:04 PM1/21/11
to clo...@googlegroups.com
> max-key uses destructuring, which was one of the culprits for
> unexpectedly holding onto the heads of lists before locals clearing
> was added.

This part of what I said is garbage, I'm sorry. I looked at max-key
too quickly, but there isn't any destructuring there.

That doesn't change my view that it was probably the addition of
locals clearing in 1.2 that accounts for what you guys are seeing (or
not seeing).

Ken Wesson

unread,
Jan 21, 2011, 7:24:09 PM1/21/11
to clo...@googlegroups.com
On Fri, Jan 21, 2011 at 6:43 PM, Aaron Cohen <aa...@assonance.org> wrote:
>> max-key uses destructuring, which was one of the culprits for
>> unexpectedly holding onto the heads of lists before locals clearing
>> was added.
>
> This part of what I said is garbage, I'm sorry. I looked at max-key
> too quickly, but there isn't any destructuring there.

I wrote my own implementation of max-key that doesn't use
destructuring before reading the later part of this thread. :)

It is here:

(defn my-max-key [f & xs]
  (reduce
    (fn [v1 v2]
      (let [k1 (f v1)
            k2 (f v2)]
        (if (> k1 k2)
          v1
          v2)))
    xs))

If the original code had blown the heap on my machine I was going to
substitute my-max-key and see if it still did, and if it did tweak
my-max-key to see if there was a way to make the problem go away.

This my-max-key is less than perfectly efficient in that it keeps
recomputing f of the current maximum; it's a no-frills version
intended to be sure not to hold onto the head of anything, and to be
simple and easy to tweak and debug. I'd have added caching of f of
the current maximum later and seen if it still avoided blowing the
heap on the original problem, and, if so, submitted it as a possible
replacement for the core implementation of max-key. But that's moot
now. It looks like one of the improvements to locals clearing in 1.2
fixed the OP's problem.
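
For completeness, the caching refinement described above might be
sketched like this (still destructuring-free; the name is just for
illustration):

;; carries [value (f value)] pairs through the reduce so that f is
;; called exactly once per input element; ties go to the later
;; element, matching core max-key
(defn my-max-key-cached [f & xs]
  (first
    (reduce (fn [best x]
              (let [k (f x)]
                (if (>= k (peek best))
                  [x k]
                  best)))
            [(first xs) (f (first xs))]
            (rest xs))))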
