chunk-file function


Brian Doyle

Nov 12, 2008, 5:20:41 PM
to clo...@googlegroups.com
I had to process each line of a very large file, 120MB,
and did not want to read in the whole file at once.   I
wrote this function, chunk-file, that allows me to pass
in a function and args that will process each line. 
It works great for me, but I would welcome any feedback on
coding style, or a pointer to anything that already does
this.  Thanks.

(defn chunk-file
  "Takes a file, number of lines, a function and args.
  Reads in line-size from the file and passes each line
  and the args to the given function."
  ([file line-size f & args]
    (with-open r file
      (loop [l (.readLine r)
             tlines []]
        (let [end? (nil? l)
              lines (if (not end?) (conj tlines l) tlines)
              chunk? (zero? (rem (count lines) line-size))]
          (if (or chunk? end?)
            (do
              (doseq line lines (apply f line args))
              (if (not end?)
                (recur (.readLine r) [])))
            (recur (.readLine r) lines)))))))
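
A note for readers on current Clojure: the function above uses the early binding forms, while current Clojure requires a binding vector for with-open and doseq. A minimal sketch of the same loop in that syntax follows. It assumes, as the original does, that the first argument is an already-open java.io.BufferedReader, which with-open closes on exit. The name chunk-file-current is hypothetical, chosen so it does not clash with the original.

```clojure
(import '(java.io BufferedReader StringReader))

(defn chunk-file-current
  "Reads lines from rdr (an open BufferedReader). Every line-size lines,
  and once more at end of input, calls (apply f line args) on each
  buffered line. Closes rdr when done."
  [rdr line-size f & args]
  (with-open [r rdr]
    (loop [l (.readLine r)
           tlines []]
      (let [end?   (nil? l)                        ; readLine returns nil at EOF
            lines  (if end? tlines (conj tlines l))
            chunk? (zero? (rem (count lines) line-size))]
        (if (or chunk? end?)
          (do
            ;; Flush the buffered lines through f.
            (doseq [line lines]
              (apply f line args))
            (when-not end?
              (recur (.readLine r) [])))
          (recur (.readLine r) lines))))))

;; Usage: an in-memory reader stands in for the 120MB file.
(def seen (atom []))
(chunk-file-current (BufferedReader. (StringReader. "a\nb\nc"))
                    2
                    (fn [line prefix] (swap! seen conj (str prefix line)))
                    "> ")
```

The behaviour is meant to match the original: lines are buffered in groups of line-size and each one is passed to f together with the extra args.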

Stephen C. Gilardi

Nov 12, 2008, 6:02:22 PM
to clo...@googlegroups.com

On Nov 12, 2008, at 5:20 PM, Brian Doyle wrote:

> It works great for me, but I would welcome any feedback on
> coding style, or a pointer to anything that already does
> this. Thanks.
>
> (defn chunk-file
>   "Takes a file, number of lines, a function and args.
>   Reads in line-size from the file and passes each line
>   and the args to the given function."
>   ([file line-size f & args]
>     (with-open r file
>       (loop [l (.readLine r)
>              tlines []]
>         (let [end? (nil? l)
>               lines (if (not end?) (conj tlines l) tlines)
>               chunk? (zero? (rem (count lines) line-size))]
>           (if (or chunk? end?)
>             (do
>               (doseq line lines (apply f line args))
>               (if (not end?)
>                 (recur (.readLine r) [])))
>             (recur (.readLine r) lines)))))))


A few quick comments:

- When a function has only one set of arguments (a single arity), you
can drop the extra pair of parens that opens before the argument vector.

- Clojure has "line-seq". It returns a lazy sequence of lines read
from a reader. To use it with a file, you can use code like this:

(ns my-ns
  (:import (java.io BufferedReader FileReader)))

(defn file-lines
  [file-name]
  (line-seq (BufferedReader. (FileReader. file-name))))

(doseq line (file-lines my-file)
  ...)

- Names ending in ? are usually functions (predicates) rather than
flags such as end?.

--Steve

Brian Doyle

Nov 12, 2008, 6:14:04 PM
to clo...@googlegroups.com
In your example using line-seq, should it really be wrapped in
with-open so that the reader gets closed?   Thanks.

Brian Doyle

Nov 12, 2008, 7:28:41 PM
to clo...@googlegroups.com
I tried line-seq and it did not run out of memory.   I wrapped it in
with-open and it works great now.   Thanks Stephen!
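
The combination described above might look like the sketch below in current Clojure syntax. It assumes clojure.java.io/reader to open the file, and process-lines is a hypothetical name, not something posted in this thread. Because line-seq is lazy, the lines have to be consumed while the reader is still open. doseq does that here, and it does not hold on to lines it has already processed.

```clojure
(require '[clojure.java.io :as io])

(defn process-lines
  "Calls (apply f line args) for each line of file-name, then closes
  the reader. Returns nil."
  [file-name f & args]
  (with-open [r (io/reader file-name)]
    ;; doseq walks the lazy seq inside with-open, before the reader closes.
    (doseq [line (line-seq r)]
      (apply f line args))))

;; Usage with a throwaway temp file.
(def tmp (java.io.File/createTempFile "process-lines" ".txt"))
(spit tmp "one\ntwo\nthree\n")
(def total (atom 0))
;; Adds each line's character count plus the extra arg (1) to total.
(process-lines tmp (fn [line n] (swap! total + n (count line))) 1)
(.delete tmp)
```

Returning the lazy sequence out of with-open instead of consuming it inside would fail once the reader is closed, which is why the loop lives inside the with-open body.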

Mark H.

Nov 15, 2008, 12:46:10 AM
to Clojure
On Nov 12, 2:20 pm, "Brian Doyle" <brianpdo...@gmail.com> wrote:
> I had to process each line of a very large file, 120MB,
> and did not want to read in the whole file at once.   I
> wrote this function, chunk-file, that allows me to pass
> in a function and args that will process each line.

What happens if some mean person strips all the line endings out of
your file? ;-)

mfh
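
The point behind the question: readLine and line-seq keep buffering until they see a line terminator, so a file with none comes back as one 120MB string. One way to bound memory regardless of line endings is to read fixed-size character chunks instead. This is a minimal sketch, not something proposed in this thread; process-chars and chunk-size are hypothetical names.

```clojure
(require '[clojure.java.io :as io])

(defn process-chars
  "Calls (f s) on successive strings of at most chunk-size characters
  read from source (anything clojure.java.io/reader accepts)."
  [source chunk-size f]
  (with-open [^java.io.Reader r (io/reader source)]
    (let [^chars buf (char-array chunk-size)]
      (loop []
        (let [n (.read r buf 0 chunk-size)]
          ;; Reader.read returns -1 at end of stream.
          (when (pos? n)
            (f (String. buf 0 n))
            (recur)))))))

;; Usage: an in-memory reader with no line endings at all.
(def pieces (atom []))
(process-chars (java.io.StringReader. "no line endings here") 8
               #(swap! pieces conj %))
```

A chunk boundary can fall anywhere, so f has to cope with a record that is split across two calls.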