Too many open files with Topic LDA example and 33320 files


Janek Bogucki

Sep 5, 2016, 7:04:03 AM
to Factorie
Hi,

Any suggestions for getting past this exception?

java.io.FileNotFoundException: (Too many open files)

This works the first time:

$ bin/fac lda --read-dirs $mytextdir --num-topics 20 --num-iterations 100

This variation throws the exception:

$ bin/fac lda --read-dirs $mytextdir --num-topics 100 --num-iterations 1000

After that, the first, less ambitious invocation fails every time once about 15,000 files have been read:

$ bin/fac lda --read-dirs $mytextdir --num-topics 20 --num-iterations 100

Full output:

$ bin/fac lda --read-dirs $mytextdir --num-topics 20 --num-iterations 100
java -Xmx6g -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath ./src/main/resources:./target/classes:./target/factorie_2.11-1.3-SNAPSHOT-jar-with-dependencies.jar
Reading files from directory <redacted>
 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
 11000 12000 13000 14000 15000
Exception in thread "main" java.io.FileNotFoundException: <redacted>/2015-05-21-08503-b121d690-6f96-4d11-97c1-187554b704d2.txt (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at cc.factorie.app.topics.lda.Document$.fromFile(Document.scala:141)
at cc.factorie.app.topics.lda.LDACmd$$anonfun$main$4$$anonfun$apply$1$$anonfun$apply$mcV$sp$2.apply(LDA.scala:333)
at cc.factorie.app.topics.lda.LDACmd$$anonfun$main$4$$anonfun$apply$1$$anonfun$apply$mcV$sp$2.apply(LDA.scala:331)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at cc.factorie.app.topics.lda.LDACmd$$anonfun$main$4$$anonfun$apply$1.apply$mcV$sp(LDA.scala:331)
at scala.util.control.Breaks.breakable(Breaks.scala:38)
at cc.factorie.app.topics.lda.LDACmd$$anonfun$main$4.apply(LDA.scala:331)
at cc.factorie.app.topics.lda.LDACmd$$anonfun$main$4.apply(LDA.scala:328)
at scala.collection.immutable.List.foreach(List.scala:381)
at cc.factorie.app.topics.lda.LDACmd.main(LDA.scala:328)
at cc.factorie.app.topics.lda.LDA.main(LDA.scala)


$mytextdir contains 33,320 files, and this is the system-wide maximum number of open files on my Ubuntu 14.04 machine:

$ cat /proc/sys/fs/file-max
1641084
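Note that /proc/sys/fs/file-max is a system-wide ceiling; the limit a single JVM runs into is normally the per-process one, which `ulimit -n` reports for the shell that launches bin/fac. One way to confirm a descriptor leak is to watch the JVM's own descriptor count while documents load. A minimal Linux-only sketch, assuming /proc is mounted (the object name FdCount is made up for illustration): each entry under /proc/self/fd is one descriptor the current process holds open.

```scala
import java.io.File

// Hypothetical diagnostic helper, Linux-only: counts entries in /proc/self/fd,
// i.e. the file descriptors currently open in this JVM (the count includes the
// one briefly used to list the directory itself).
object FdCount {
  def openDescriptors(): Int = {
    val fds = new File("/proc/self/fd").listFiles()
    if (fds == null) -1 else fds.length // -1 when /proc is unavailable (e.g. macOS)
  }

  def main(args: Array[String]): Unit =
    println(s"open file descriptors: ${openDescriptors()}")
}
```

If the number printed grows by roughly one for every file read and never drops, the reader is holding each file open rather than closing it.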


Built from source at this commit:

$ git log -1
commit e848f78d664e19aeb081e8beceb9d2599503cab7
Merge: 0bd6478 6c3fab9
Author: Emma Strubell <emma.s...@gmail.com>
Date:   Mon Jun 13 11:10:04 2016 -0700

    Merge pull request #371 from JamesSullivan/classify
    
    Fix for --write-classifications see this thread:

Any tip or insight is very much appreciated.

Janek

Emma Strubell

Sep 7, 2016, 12:37:12 PM
to Factorie
My guess is that a file isn't being closed. Looking at LDA.scala, this line (331) in particular:

for (file <- new File(directory).listFiles; if file.isFile)

is suspect: new File(directory).listFiles may be unnecessarily opening a new file on each iteration (and there may be other issues as well). I'll try to take a look sometime this week (or you are welcome to!)
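For what it's worth, the java.io.File objects returned by listFiles are only path wrappers; a descriptor is held once a stream is opened on one, as with the FileInputStream created at Document.scala:141 in the trace. A minimal sketch of the loop at line 331 rewritten so that each file's stream is closed before the next one is opened. ReadDir, readText and readDir are hypothetical names, and this is not the actual Document.fromFile code, which presumably also tokenizes the text.

```scala
import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object ReadDir {
  // Hypothetical helper: reads a whole file and always closes the stream,
  // so at most one descriptor is open at a time while looping over a directory.
  def readText(file: File): String = {
    val reader = new BufferedReader(
      new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))
    try {
      val sb = new StringBuilder
      var line = reader.readLine()
      while (line != null) {
        sb.append(line).append('\n')
        line = reader.readLine()
      }
      sb.toString
    } finally reader.close() // closing the outer reader also closes the FileInputStream
  }

  // Same shape as the loop in LDA.scala: iterate regular files in a directory.
  def readDir(directory: String): Seq[String] = {
    val files = Option(new File(directory).listFiles()).getOrElse(Array.empty[File])
    for (file <- files.toSeq if file.isFile) yield readText(file)
  }
}
```

The try/finally is the important part: without it, each FileInputStream stays open until the garbage collector happens to finalize it, which would explain a failure point that depends on memory pressure rather than an exact file count.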

Thanks,

Emma

--
Factorie Discuss group.
To post, email: dis...@factorie.cs.umass.edu
To unsubscribe, email: discuss+u...@factorie.cs.umass.edu

Janek Bogucki

Sep 13, 2016, 5:34:46 AM
to Factorie
I have a commit that fixes this specific case.

I will raise a PR for further discussion, namely who in general should be responsible for closing Readers (the caller or the callee), and whether this should be addressed systematically.
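One common convention for the caller-versus-callee question is "whoever opens it closes it": a method that receives a Reader only consumes it, and the code that opened the file closes it, typically through a loan-pattern helper. A minimal sketch under that assumption; Loan, countTokens and withFileReader are hypothetical names, not existing Factorie APIs.

```scala
import java.io.{BufferedReader, File, FileInputStream, InputStreamReader, Reader}
import java.nio.charset.StandardCharsets

object Loan {
  // Callee: consumes the Reader but never closes it; the caller owns it.
  def countTokens(reader: Reader): Int = {
    val buffered = new BufferedReader(reader)
    var count = 0
    var line = buffered.readLine()
    while (line != null) {
      count += line.split("\\s+").count(_.nonEmpty)
      line = buffered.readLine()
    }
    count
  }

  // Caller side (loan pattern): open the file, lend the Reader to f, always close it.
  def withFileReader[A](file: File)(f: Reader => A): A = {
    val reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)
    try f(reader) finally reader.close()
  }
}
```

Usage: `Loan.withFileReader(file)(Loan.countTokens)`. The benefit of this split is that the callee also works with readers that must not be closed (for example a StringReader in tests). On Scala 2.13+, `scala.util.Using.resource` provides the same open-lend-close behaviour without a hand-written helper.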

Does Factorie have a CLA that needs signing?

Regards,
Janek