Stackoverflow on a function listing files in a directory recursively

407 views
Skip to first unread message

Ramesh

unread,
May 20, 2013, 4:12:26 PM5/20/13
to clo...@googlegroups.com
Hi all,

I have the following function to list all the files recursively under a directory. 

(defn list-all-files2 [path]
(let [basepath (java.io.File. path)]
(loop [origlist [basepath] finallist []]
(if-let [cur (first origlist)]
;This line is not required (if (.isDirectory cur)
(recur (concat (rest origlist) (.listFiles cur)) (conj finallist (.getCanonicalPath cur)))
finallist))))


It runs fine, when the number of files are quite less, 10-20.

But when I recurse through even moderately large amount of files, it fails with stackoverflow! The total number of files are not that much (50114). So even if each of the filename is 100 chars long, it would roughly take (5M bytes, which is 5MB). But I get a stack overflow error, and I have no clue. I think I am doing something wrong with the lazy functions.

Here is the stacktrace:
Exception in thread "main" java.lang.RuntimeException: java.lang.StackOverflowError
        at clojure.lang.Util.runtimeException(Util.java:165)
        at clojure.lang.Compiler.eval(Compiler.java:6476)
        at clojure.lang.Compiler.eval(Compiler.java:6455)
        at clojure.lang.Compiler.eval(Compiler.java:6431)
        at clojure.core$eval.invoke(core.clj:2795)
        at clojure.main$eval_opt.invoke(main.clj:296)
        at clojure.main$initialize.invoke(main.clj:315)
        at clojure.main$null_opt.invoke(main.clj:348)
        at clojure.main$main.doInvoke(main.clj:426)
        at clojure.lang.RestFn.invoke(RestFn.java:421)
        at clojure.lang.Var.invoke(Var.java:405)
        at clojure.lang.AFn.applyToHelper(AFn.java:163)
        at clojure.lang.Var.applyTo(Var.java:518)
        at clojure.main.main(main.java:37)
Caused by: java.lang.StackOverflowError
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:466)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$concat$fn__3528.invoke(core.clj:661)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java


Thanks,
ramesh

Ben Wolfson

unread,
May 20, 2013, 4:21:27 PM5/20/13
to clo...@googlegroups.com
Basically, you can't use "concat" in a loop like that. You could wrap it in a (doall ...) to avoid the stack overflow.


--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry]

Jim - FooBar();

unread,
May 20, 2013, 4:28:48 PM5/20/13
to clo...@googlegroups.com
have you considered using mapcat recursively? something like this perhaps?

(defn list-all-files [^java.io.File dir]
  (let [children (.listFiles dir)
        directories (filter #(.isDirectory %) children)
        files (filter #(.isFile %) children)]
    (concat files (mapcat list-all-files directories))))

Jim

ps: I tried this on my 2T external drive which is almost full and I got tired of waiting after 15 minutes! No overflow, but no answer either... :)

Gary Trakhman

unread,
May 20, 2013, 4:31:51 PM5/20/13
to clo...@googlegroups.com

Jim - FooBar();

unread,
May 20, 2013, 4:39:45 PM5/20/13
to clo...@googlegroups.com
actually yes it is! :) I thought file-seq was going down one level only but looking at the implementation it seems it goes down all levels...I'm trying it out now

Jim

Jim - FooBar();

unread,
May 20, 2013, 4:43:04 PM5/20/13
to clo...@googlegroups.com
well no it doesn't descent into directories...I just tried it...

Jim

Ramesh

unread,
May 20, 2013, 4:45:29 PM5/20/13
to clo...@googlegroups.com
So, I think concat is the problem here. I wish there were recommendation for other options and reason in the Stacktrace to help me code better!

And file-seq is exactly what I'm looking for :). Thanks all!


Thanks,
ramesh

Gary Trakhman

unread,
May 20, 2013, 4:46:02 PM5/20/13
to clo...@googlegroups.com
yes it does..
user> (clojure.pprint/pprint (take 20 (file-seq (clojure.java.io/file "/"))))
(#<File />
 #<File /tmp>
 #<File /tmp/.ICE-unix>
 #<File /tmp/.ICE-unix/2262>
 #<File /tmp/CRX_75DAF8CB7768>
 #<File /tmp/CRX_75DAF8CB7768/manifest.json>
 #<File /tmp/CRX_75DAF8CB7768/crl-set>
 #<File /tmp/hsperfdata_gary>
 #<File /tmp/hsperfdata_gary/5728>
 #<File /tmp/hsperfdata_gary/5776>
 #<File /tmp/ksocket-gary>
 #<File /tmp/ksocket-gary/KSMserver__0>
 #<File /tmp/ksocket-gary/kdeinit4__0>
 #<File /tmp/ksocket-gary/klauncherhX2223.slave-socket>
 #<File /tmp/pulse-PKdhtXMmr18n>
 #<File /tmp/.com.google.Chrome.JWnc7T>
 #<File /tmp/.com.google.Chrome.JWnc7T/SingletonSocket>
 #<File /tmp/.com.google.Chrome.JWnc7T/SingletonCookie>
 #<File /tmp/kde-gary>
 #<File /tmp/kde-gary/xauth-1000-_0>)

Jim - FooBar();

unread,
May 20, 2013, 4:49:00 PM5/20/13
to clo...@googlegroups.com
aaa yes it does...I hadn't pretty printed it and it wasn't easy to see...apologies :)

Jim

Gary Trakhman

unread,
May 20, 2013, 4:50:07 PM5/20/13
to clo...@googlegroups.com
Ramesh, I think the main problem is you're trying to do a recursion over a tree, so you're blowing a stack if you're not careful, concat is lazy so I think you're just shifting the problem around, file-seq gets around this by using tree-seq, which uses 'walk', which is a linearization of the tree.

Ramesh

unread,
May 20, 2013, 11:18:21 PM5/20/13
to clo...@googlegroups.com
Tree-seq is so elegant :)! 

(defn tree-seq
02   "Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
03    branch? must be a fn of one arg that returns true if passed a node
04    that can have children (but may not).  children must be a fn of one
05    arg that returns a sequence of the children. Will only be called on
06    nodes for which branch? returns true. Root is the root node of the
07   tree."
08   {:added "1.0"
09    :static true}
10    [branch? children root]
11    (let [walk (fn walk [node]
12                 (lazy-seq
13                  (cons node
14                   (when (branch? node)
15                     (mapcat walk (children node))))))]
16      (walk root)))


-ramesh

Ramesh

unread,
Jun 23, 2013, 9:24:06 PM6/23/13
to clo...@googlegroups.com
Hi All,

Thanks a lot for your replies. So, I tried to imitate the file-seq, and wrote a function like this

(defn iter-all-files-lazy [path]
  (lazy-seq 
    (cons path 
      (when (.isDirectory (java.io.File. path)) 
        (mapcat 
          (comp iter-all-files-lazy #(. % getCanonicalPath)) 
          (.listFiles (java.io.File. path)))))))

And it works lazily, and I can take a few elements or a few thousand elements or just read everything. "lazy-seq" macro is important here, as it is the one which returns a Seqable object. Without "lazy-seq" this fails with stack overflow. 

(It was successfully able to list all the files without running out of memory)

This makes me wonder.. why does mapcat which is supposed to return a lazy sequence not return a lazy sequence?


Thanks,
ramesh

Gary Verhaegen

unread,
Jul 20, 2013, 4:47:36 PM7/20/13
to clo...@googlegroups.com
Even though mapcat itself is lazy, you do call mapcat immediately to resolve the call to cons. As this is an evaluation of a lazy sequence, it has to produce (at least) the first element (potentially the 32 first elements if the seq is chunked, but let's not go there). But to produce the first element of the lazy sequence yielded by mapcat, the interpreter has to recursively call your function on the first element of the list of files, and there goes your laziness.

In terms of tree, without the lazy-seq (and without chunking), your function would basically be lazy in breadth but eager in depth - you would produce the full left-most branch of the tree of files, never recursing to the second child of any node, but still going all the way down to the first leaf.

If by chance the lazy seq is chunked - I've not been bitten by it yet so I never had to look up the rules for that - you further evaluate 31 more elements at each level, and all of their children. It seems quite likely that you would get something very close to the whole tree that way.
Reply all
Reply to author
Forward
0 new messages