Process, ProcessBuilder, ProcessLogger: Too many open files


William Harvey

Feb 21, 2012, 5:59:38 PM
to scala-user
Dear All,

I'm writing some code to read a bunch of file names from an input
file, then run a Linux program on each of them in parallel. I am
using scala.sys.process.Process to create a ProcessBuilder, and I'm
using a custom ProcessLogger to pluck out the console output that I
need from the Linux program. The relevant bits of my code look like
this:

val accInfo = new Array[Float](numConformations)

final class DsspOutputParser(conformationID: Int) {
  var numLinesRead = 0
  var residueIdx = 0

  @inline
  def processLine(line: String): Unit = {
    numLinesRead += 1
    if (numLinesRead > 25) {
      val acc = line.substring(34, 38).trim.toInt
      accInfo(conformationID) = acc
      residueIdx += 1
    }
  }
}

val tasks = withBufferedReader(new File(config.datasetDir,
    "conformation_filenames.txt")) { br =>
  Iterator.continually(br.readLine()).takeWhile(_ != null).toList.zipWithIndex.map {
    case (conformationFilename, i) =>
      future {
        println((i + 1) + " of " + numConformations)

        val pb = Process(dsspCommand + " " + conformationFilename)
        val outputParser = new DsspOutputParser(i)
        val procLog = ProcessLogger(outputParser.processLine(_))
        pb.!(procLog)
      }
  }
}

tasks.grouped(10).foreach { group =>
  scala.actors.Futures.awaitAll(Long.MaxValue / 2L, group: _*)
}
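(For comparison, the grouped-batch pattern above can also be sketched with a
fixed thread pool, which keeps the number of in-flight tasks — and hence open
pipes — bounded the whole time rather than per group. This is only a sketch;
`work` stands in for the Process invocation and the names are placeholders:)

```scala
import java.util.concurrent.Executors

// Sketch: run `work` over all filenames with at most poolSize tasks
// in flight at once, so the number of live child processes (and their
// pipe file descriptors) stays bounded. Names here are placeholders.
def runAll(filenames: List[String],
           work: (String, Int) => Unit,
           poolSize: Int = 10): Unit = {
  val pool = Executors.newFixedThreadPool(poolSize)
  val futures = filenames.zipWithIndex.map { case (name, i) =>
    pool.submit(new Runnable { def run(): Unit = work(name, i) })
  }
  try futures.foreach(_.get()) // wait for all tasks, surface failures
  finally pool.shutdown()
}
```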

The problem is that after running for a while, the program barfs with
the following error:

<function0>: caught java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:68)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:99)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
        at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:64)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:147)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
[et cetera]

So it looks like something might not be cleaning up after itself, or
the garbage collector might not be aggressive enough (i.e. it looks
like the process streams aren't getting properly closed). The output
of lsof lists a bunch of lines that look like this:

java 23555 harveywi 7693w FIFO 0,8 0t0 866166108 pipe

I ran into a similar problem a few years back (using
scala.io.Source.fromFile(_) in small parallel batches), and the
solution was to invoke System.gc() periodically to make sure that the
offending streams were closed and cleaned up. However, that
workaround doesn't seem to be effective anymore.

I am using Sun Java 1.6.0_29 with the VM arguments "-Xmx1G -server",
and the default garbage collector, which might be part of the problem.

Am I doing something really silly? If it is up to me to close the
streams manually, how do I do that? I poked through the scala
standard library source and didn't see anything obvious. If it's not
my responsibility to close the streams, and the scala standard library
is properly closing them, do you have any good ideas on how I might be
able to get around this issue?

Thank you!

-William Harvey
http://www.cse.ohio-state.edu/~harveywi

Alex Cruise

Feb 21, 2012, 6:01:38 PM
to William Harvey, scala-user
On Tue, Feb 21, 2012 at 2:59 PM, William Harvey <harv...@cse.ohio-state.edu> wrote:
> The problem is that after running for a while, the program barfs with
> the following error:
>
> <function0>: caught java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files

Daniel Sobral

Feb 21, 2012, 7:02:07 PM
to William Harvey, scala-user
You forgot to mention the version of Scala you are using.

By the way, scala.io.Source must be closed explicitly. At the time I
started using Scala, however, most examples didn't do that, because it
would break the wonderful one-liners. Mind you, I was as guilty of
that as anyone else, but, at any rate, if you are not closing your
scala.io.Source, you should revise that.
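For instance, the explicit-close version of the usual one-liner looks
something like this (just a sketch, with a placeholder path):

```scala
import scala.io.Source

// Read every line, and always release the underlying file descriptor
// even if reading throws. The path argument is just a placeholder.
def readAllLines(path: String): List[String] = {
  val src = Source.fromFile(path)
  try src.getLines().toList
  finally src.close()
}
```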

The process library was leaking file descriptors. A fix to that went
in just last week, which I believe took care of all cases. If,
however, you can reproduce the problem with a recent Scala from trunk,
I'd be most interested in hearing about it. Also, if you are passing a
ProcessIO, then your code is responsible for closing the streams that
are passed to it.
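The ProcessIO case looks roughly like this (a sketch only — the command
at the bottom is a placeholder): each handler should close its stream in
a finally block, or the pipe's descriptor leaks.

```scala
import java.io.InputStream
import scala.sys.process._

// Drain an input stream line by line, then close it no matter what.
def drainAndClose(handle: String => Unit)(in: InputStream): Unit =
  try scala.io.Source.fromInputStream(in).getLines().foreach(handle)
  finally in.close()

val io = new ProcessIO(
  stdin => stdin.close(),               // nothing to feed the child
  drainAndClose(line => println(line)), // stdout
  drainAndClose(_ => ())                // stderr, discarded here
)
// Process("some-command").run(io).exitValue()
```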

--
Daniel C. Sobral

I travel to the future all the time.

William Harvey

Feb 21, 2012, 7:26:07 PM
to scala-user
(Accidentally posted reply to just Daniel rather than the group, so
I'll summarize here.)

I'm using the Scala IDE for Eclipse, which uses 2.9.1.final. I
checked out the latest Scala from trunk, and so far things are running
very well! The number of open files stays right around 100 using 64
threads. This is great!

Thank you all for taking the time and effort to help me out, and for
fixing that issue. I really appreciate it! The scala community is
very lucky to have you!

Cheers,

William Harvey
http://www.cse.ohio-state.edu/~harveywi