Clojure 1.3 head-holding bug

Gerrard McNulty

Nov 26, 2011, 8:59:02 PM
to Clojure
Hi,

I've a head-holding problem that I believe is a bug in Clojure 1.3. I
wrote the following function to split a lazy seq of strings across
files of a given size:

(defn split-file
  ([path strs size]
     (trampoline split-file path (seq strs) size 0))
  ([path strs size part]
     (with-open [f (clojure.java.io/writer (str path "." part))]
       (loop [written 0, ss strs]
         (when ss
           (if (>= written size)
             #(split-file path ss size (inc part))
             (let [s (first ss)]
               (.write f s)
               (recur (+ written (.length s)) (next ss)))))))))

If I call the 3-arg version of the function:
(split-file "foo" (repeat 100000000 "blah blah blah") 100000000)

I see memory usage climb as I'm writing each file, with the usual GC
slowdown, and then drop back down again as I get to a new split file.

Memory usage is fine if I call the 4-arg version (which only writes
one part of the split file):
(split-file "foo" (repeat 100000000 "blah blah blah") 100000000 0)

I can also avoid the head-holding problem by removing trampoline and
calling split-file recursively instead, but then the recursive calls
use up stack and the files aren't closed until all of the calls
complete.
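
For reference, the direct-recursion variant I mean is roughly the
following (split-file-direct is just an illustrative name):

(defn split-file-direct
  ([path strs size]
     (split-file-direct path (seq strs) size 0))
  ([path strs size part]
     (with-open [f (clojure.java.io/writer (str path "." part))]
       (loop [written 0, ss strs]
         (when ss
           (if (>= written size)
             ;; direct recursive call: no closure captures ss, but this
             ;; frame (and its open writer) stays live until the nested
             ;; call returns
             (split-file-direct path ss size (inc part))
             (let [s (first ss)]
               (.write f s)
               (recur (+ written (.length s)) (next ss)))))))))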

Alan Malloy

Nov 28, 2011, 3:57:45 PM
to Clojure
Interesting. It seems to me like locals-clearing should take care of
this for you, by preparing a call to trampoline, then setting the
locals to nil, then calling trampoline. But you can solve this easily
enough yourself, in this particular case, by splitting the strings up
into chunks before you open any files (you can do this lazily so it's
not a head-holding issue). Then inside the loop body, you open up a
part file, write all the strings you planned to write, close the file,
and recur with the next chunk of strings. Such a chunking function
would look a bit like:

(defn chunk-strings [size strs]
  ((fn chunk [pending strs written]
     (lazy-seq
       (if (>= written size)
         (cons pending (chunk [] strs 0))
         (if-let [ss (seq strs)]
           (let [s (first ss)
                 len (count s)]
             (chunk (conj pending s)
                    (rest ss)
                    (+ len written)))
           ;; keep the final, partially-filled chunk instead of dropping it
           (when (seq pending)
             (list pending))))))
   [] strs 0))

I'm sure it can be done more cleanly with reductions, adding up length
as you go, but I had trouble holding that in my head, so primitive
recursion won out.
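
Wired into your split-file, that plan would look something like this
untested sketch (split-file-chunked is just an illustrative name):

(defn split-file-chunked [path strs size]
  ;; one writer per pre-computed chunk; no closure captures the rest of
  ;; the original seq, so its head isn't retained across parts
  (doseq [[part chunk] (map-indexed vector (chunk-strings size strs))]
    (with-open [f (clojure.java.io/writer (str path "." part))]
      (doseq [^String s chunk]
        (.write f s)))))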

Juha Arpiainen

Nov 28, 2011, 4:55:54 PM
to Clojure
On Nov 27, 3:59 am, Gerrard McNulty <gerrard.mcnu...@gmail.com> wrote:
> [...]
>
> (defn split-file
>   ([path strs size]
>      (trampoline split-file path (seq strs) size 0))
>   ([path strs size part]
>      (with-open [f (clojure.java.io/writer (str path "." part))]
>        (loop [written 0, ss strs]
>          (when ss
>            (if (>= written size)
>              #(split-file path ss size (inc part))
>              (let [s (first ss)]
>                (.write f s)
>                (recur (+ written (.length s)) (next ss)))))))))

The Clojure compiler can't in general clear closed-over variables such
as 'ss in #(split-file path ss ...), because the closure could be
called more than once.
You could try using an undocumented, compiler-internal feature to give
the compiler more information: write
(^:once fn* [] (split-file path ss size (inc part)))
instead of #(split-file ...), and change the trampoline call to
(trampoline (^:once fn* [] (split-file path (seq strs) size 0))),
since the multi-argument version of trampoline doesn't appear to use
^:once.
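
Applied to your whole function, the change would look roughly like
this (untested):

(defn split-file
  ([path strs size]
     ;; wrap the call in a ^:once thunk so trampoline itself doesn't
     ;; hold on to strs as an argument
     (trampoline (^:once fn* [] (split-file path (seq strs) size 0))))
  ([path strs size part]
     (with-open [f (clojure.java.io/writer (str path "." part))]
       (loop [written 0, ss strs]
         (when ss
           (if (>= written size)
             ;; ^:once lets the compiler clear the closed-over locals,
             ;; including ss, once the thunk has been called
             (^:once fn* [] (split-file path ss size (inc part)))
             (let [s (first ss)]
               (.write f s)
               (recur (+ written (.length s)) (next ss)))))))))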

--
Juha Arpiainen

Alan Malloy

Nov 28, 2011, 5:35:10 PM
to Clojure

But it doesn't need to clear those, because the closure goes out of
scope after being called once, and thus its locals are out of scope as
well. Am I missing something?
