OutOfMemoryError using clojure.contrib.duck-streams


Alexander Stoddard

Jul 24, 2009, 10:28:11 AM
to the.stua...@gmail.com, clo...@googlegroups.com
I am a very new clojure user but I believe I have found a bug when
using the clojure.contrib.duck-streams library.

My attempt to stream process a very big file blows up with
"java.lang.OutOfMemoryError: Java heap space".

I can reproduce the problem with the following simple code which I
think rules out most of my own (nearly unlimited) ignorance.

(use '[clojure.contrib.duck-streams :only(reader write-lines)])
(write-lines "test.out" (line-seq (reader "ReallyBigFile")))

Can anyone enlighten me as to what might be going wrong, and/or suggest
an alternative?

My original code looked like:
(write-lines "test.out" (map my-line-processing-function (line-seq
(reader "ReallyBigFile"))))

Thank you and kind regards,
Alex Stoddard

Further details below:

I am using clojure and clojure contrib built from the head of the git
repository:
richhickey-clojure-3e60eff602652e753a54ba88b25dbdd2615c3b2e
richhickey-clojure-contrib-e20e8effe977640592b1f285d6c666492d74df00

My java details are:
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b19, mixed mode)

Stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
(test_read_write.clj:0)
at clojure.lang.Compiler.eval(Compiler.java:4617)
at clojure.lang.Compiler.load(Compiler.java:4931)
at clojure.lang.Compiler.loadFile(Compiler.java:4898)
at clojure.main$load_script__6637.invoke(main.clj:210)
at clojure.main$init_opt__6640.invoke(main.clj:215)
at clojure.main$initialize__6650.invoke(main.clj:243)
at clojure.main$null_opt__6672.invoke(main.clj:268)
at clojure.main$legacy_script__6687.invoke(main.clj:299)
at clojure.lang.Var.invoke(Var.java:359)
at clojure.main.legacy_script(main.java:32)
at clojure.lang.Script.main(Script.java:20)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at java.io.BufferedReader.readLine(BufferedReader.java:345)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at clojure.core$line_seq__4708$fn__4710.invoke(core.clj:1790)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:56)
at clojure.lang.LazySeq.first(LazySeq.java:78)
at clojure.lang.RT.first(RT.java:549)
at clojure.core$first__3817.invoke(core.clj:43)
at clojure.contrib.duck_streams$write_lines__117.invoke(duck_streams.clj:221)
at user$eval__298.invoke(test_read_write.clj:3)
at clojure.lang.Compiler.eval(Compiler.java:4601)
... 10 more

Stuart Sierra

Jul 24, 2009, 11:29:57 AM
to Clojure
I'm afraid I can't reproduce this error, Alexander. I can run

(write-lines "/tmp/out" (line-seq (reader "/tmp/bigfile")))

on a 4.5 GB file with no problem, and I don't have that much memory.

Out-of-memory errors like this usually occur when your code is
"holding on to the head" of the sequence. For example, this will
fail:

(def lines (line-seq (reader "/tmp/bigfile")))
(write-lines "/tmp/out" lines)

because the "lines" var holds a reference to the first item in the
sequence, so the entire sequence gets cached in memory.
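As a sketch of the head-safe pattern (using the same duck-streams API as above): either build the seq directly in the call, or wrap it in a function so a fresh seq is produced each time and nothing retains a reference to its head:

```clojure
;; Head-safe: the lazy seq is created inside the call, so no var
;; retains its first element while write-lines walks it.
(use '[clojure.contrib.duck-streams :only (reader write-lines)])

(write-lines "/tmp/out" (line-seq (reader "/tmp/bigfile")))

;; If you need a name, use a function rather than a var -- each call
;; produces a fresh seq, so nothing pins the head in memory:
(defn big-lines [] (line-seq (reader "/tmp/bigfile")))
(write-lines "/tmp/out" (big-lines))
```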

Another possibility is that your big file doesn't have any line
breaks, or that it has extremely long lines. In that case, you'll
have to increase the Java heap size or manually read the file in
smaller chunks.
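A sketch of the manual-chunking approach for files with very long (or no) line breaks, using plain Java interop and a fixed-size char buffer so memory use stays bounded regardless of line length (no duck-streams involved):

```clojure
(import '(java.io BufferedReader FileReader BufferedWriter FileWriter))

(defn copy-in-chunks
  "Copy in-path to out-path in fixed-size chunks, never materializing
  a whole line in memory."
  [in-path out-path]
  (with-open [r (BufferedReader. (FileReader. in-path))
              w (BufferedWriter. (FileWriter. out-path))]
    (let [buf (make-array Character/TYPE 8192)]
      (loop []
        (let [n (.read r buf)]            ; returns -1 at end of stream
          (when (pos? n)
            (.write w buf 0 n)
            (recur)))))))
```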

-Stuart Sierra



Stuart Sierra

Jul 24, 2009, 11:52:44 AM
to Clojure
I should admit that there may be something else I'm missing here.
write-lines is not a lazy sequence function, so it may be responsible
for holding the head of the sequence. I can't reproduce the error,
though.
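If write-lines does turn out to hold the head, a workaround sketch using only clojure.core and java.io (the name write-lines-lazily is hypothetical, not part of duck-streams): doseq does not retain the head of the seq it iterates, so this should stay in roughly constant memory.

```clojure
(import '(java.io BufferedWriter FileWriter))

(defn write-lines-lazily
  "Write each string in lines to out-path, one per line, without
  retaining the head of the seq."
  [out-path lines]
  (with-open [w (BufferedWriter. (FileWriter. out-path))]
    (doseq [line lines]
      (.write w line)
      (.newLine w))))

;; usage, with reader from duck-streams as in the original code:
;; (write-lines-lazily "test.out" (line-seq (reader "ReallyBigFile")))
```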
-SS

