I have a head-holding problem that I believe is a bug in Clojure 1.3.
I wrote the following function to split a lazy seq of strings across
files of x size:
(defn split-file
  ([path strs size]
     (trampoline split-file path (seq strs) size 0))
  ([path strs size part]
     (with-open [f (clojure.java.io/writer (str path "." part))]
       (loop [written 0, ss strs]
         (when ss
           (if (>= written size)
             #(split-file path ss size (inc part))
             (let [s (first ss)]
               (.write f s)
               (recur (+ written (.length s)) (next ss)))))))))
If I call the 3 arg version of the function:
(split-file "foo" (repeat 100000000 "blah blah blah") 100000000)
I see memory usage increase while each file is being written, with the
usual GC slowdown, and then drop again when it moves on to the next
split file.
Memory usage is fine if I call the 4 arg version (which only writes
one part of the split file):
(split-file "foo" (repeat 100000000 "blah blah blah") 100000000 0)
I can also avoid the head-holding problem by removing trampoline and
calling split-file recursively and directly, but then those recursive
calls use up stack and don't close their files until all calls complete.
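For reference, here is a minimal sketch of the non-trampolined variant described above. The name split-file-direct is made up for illustration, and this is a reconstruction from the description rather than the poster's actual code. The nested call is evaluated inside with-open, which is why every earlier part's writer stays open (and its stack frame stays live) until the last part is finished.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical name; a reconstruction of "remove trampoline and recurse directly".
(defn split-file-direct
  ([path strs size]
   (split-file-direct path (seq strs) size 0))
  ([path strs size part]
   (with-open [f (io/writer (str path "." part))]
     (loop [written 0, ss strs]
       (when ss
         (if (>= written size)
           ;; Direct call instead of returning a thunk: this runs while
           ;; the enclosing with-open still holds its writer open.
           (split-file-direct path ss size (inc part))
           (let [s (first ss)]
             (.write f s)
             (recur (+ written (.length s)) (next ss)))))))))
```

With one nested call per part, stack depth grows with the number of split files rather than with the number of strings.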
(defn chunk-strings [size strs]
  ((fn chunk [pending strs written]
     (lazy-seq
      (if (>= written size)
        (cons pending (chunk [], strs, 0))
        (if-let [ss (seq strs)]
          (let [s (first ss)
                len (count s)]
            (chunk (conj pending s)
                   (rest ss)
                   (+ len written)))
          ;; Input exhausted: emit the final, partially filled chunk
          ;; if there is one.
          (when (seq pending)
            (list pending))))))
   [], strs, 0))
I'm sure it can be done more cleanly with reductions, adding up length
as you go, but I had trouble holding that in my head, so primitive
recursion won out.
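One way the reductions idea might look, as a rough sketch only: the name chunk-strings-r and the [written idx] state pair are invented here, and its memory behavior on very large inputs has not been measured. It threads a running "characters written in the current chunk" count and a chunk index through reductions, then groups the strings by chunk index.

```clojure
;; Hypothetical reductions-based variant; emits the trailing partial chunk too.
(defn chunk-strings-r [size strs]
  (let [;; State after placing string s: [chars in its chunk, chunk index].
        ;; A new chunk starts once the current one has reached size.
        step   (fn [[written idx] s]
                 (if (>= written size)
                   [(count s) (inc idx)]
                   [(+ written (count s)) idx]))
        ;; Drop the initial [0 0] so states line up one-to-one with strs.
        states (rest (reductions step [0 0] strs))]
    (->> (map vector strs states)
         (partition-by (fn [[_ [_ idx]]] idx))
         (map #(mapv first %)))))

(chunk-strings-r 4 ["aa" "bb" "cc"])
;; => (["aa" "bb"] ["cc"])
```

Each chunk closes on the string that takes it to size or beyond, so chunks can overshoot size by up to one string's length, the same as in the recursive version.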
The Clojure compiler can't in general clear closed-over variables such
as ss in #(split-file path ss ...) because the closure could be called
more than once.
You could try using an (undocumented, compiler-internal) feature to
give it more information. Use
(^:once fn* [] (split-file path ss size (inc part)))
instead of #(split-file ...), and change the call to trampoline to
(trampoline (^:once fn* [] (split-file path (seq strs) size 0)))
since the multi-argument version of trampoline doesn't appear to use
^:once.
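Put together, the two changes might look like the sketch below. The name split-file-once is made up to keep it apart from the original, and ^:once on fn* is a compiler-internal detail (the same mechanism the lazy-seq and delay macros rely on), so treat this as an experiment rather than a supported idiom. Whether it actually removes the memory growth is not confirmed here.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical name; the original split-file with both thunks marked ^:once.
(defn split-file-once
  ([path strs size]
   ;; One-argument trampoline, so the initial thunk is also a ^:once fn.
   (trampoline (^:once fn* [] (split-file-once path (seq strs) size 0))))
  ([path strs size part]
   (with-open [f (io/writer (str path "." part))]
     (loop [written 0, ss strs]
       (when ss
         (if (>= written size)
           ;; ^:once promises the compiler this fn is invoked at most once,
           ;; which lets it clear the closed-over ss when the fn runs.
           (^:once fn* [] (split-file-once path ss size (inc part)))
           (let [s (first ss)]
             (.write f s)
             (recur (+ written (.length s)) (next ss)))))))))
```

A thunk marked ^:once must never be called a second time, since its closed-over values may already have been cleared.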
--
Juha Arpiainen
But it doesn't need to clear those, because the closure goes out of
scope after being called once, and thus its locals are out of scope as
well. Am I missing something?