reading a file one line at a time (not iterating)


Russ P.

Apr 3, 2012, 3:10:53 PM
to scala-user
I am trying to read a short file one line at a time. I know how to
iterate over the lines, but that is not what I want to do now. I
simply want to read three or four lines one at a time (using readLine
or some such method) and extract a different kind of information from
each line. I asked about this a couple of years ago, and I was told to
use scala.io.Source.fromPath. However, fromPath does not seem to exist
anymore. What is the simplest way to do this? Thanks.

Russ P.

Vlad Patryshev

Apr 3, 2012, 4:04:07 PM
to Russ P., scala-user
scala.io.Source.fromPath, fromUrl, etc.


Thanks,
-Vlad

Russ P.

Apr 3, 2012, 4:25:41 PM
to scala-user
OK, I see that fromPath has apparently been replaced with fromFile,
but I'm still baffled about how to read a single line. Where is
"readLine" or its equivalent? Thanks.

--Russ P.



Alex Cruise

Apr 3, 2012, 4:33:29 PM
to Russ P., scala-user
On Tue, Apr 3, 2012 at 12:10 PM, Russ P. <russ.p...@gmail.com> wrote:
> I am trying to read a short file one line at a time. I know how to
> iterate over the lines, but that is not what I want to do now. I
> simply want to read three or four lines one at a time (using readLine
> or some such method) and extract a different kind of information from
> each line. I asked about this a couple of years ago, and I was told to
> use scala.io.Source.fromPath. However, fromPath does not seem to exist
> anymore.

If the number of lines you want to see in each "stanza" is fixed, you can use getLines and grouped:

val src = io.Source.fromFile("/usr/share/dict/words")
src.getLines.grouped(3).take(5).foreach(println)

List(s, abattoir, abattoirs)
List(abattu, abattue, Abatua)
List(abature, abaue, abave)
List(abaxial, abaxile, abay)
List(abayah, abaze, abb)

If, on the other hand, you want to use data from the lines themselves to decide whether any given stanza is done yet, this is the way I usually do it--it works with a Seq rather than an Iterator, but you mentioned your file is short, so I guess you're not worried about fitting it all in memory.

  /**
   * Splits a collection into groups based on a break predicate.  Elements for which the break predicate returns true
   * terminate a stanza and are not included by default, but if includeBreaks is true, they'll be included at the end of
   * each stanza.
   */
  def break[T](in: Seq[T], includeBreaks: Boolean = false)(break: (T) => Boolean): Seq[Seq[T]] = {
    @annotation.tailrec
    def breakLikeTheWind(in: Seq[T], break: (T) => Boolean, currentStanza: Seq[T], out: Seq[Seq[T]]): Seq[Seq[T]] = {
      in match {
        case Seq() =>
          out :+ currentStanza

        case Seq(x, xs @ _*) =>
          val broke = break(x)

          val stanza = if (!broke || includeBreaks) {
            currentStanza :+ x
          } else {
            currentStanza
          }

          if (xs.isEmpty) {
            out :+ stanza
          } else {
            if (broke)
              breakLikeTheWind(xs, break, Vector(), out :+ stanza)
            else
              breakLikeTheWind(xs, break, stanza, out)
          }
      }
    }

    breakLikeTheWind(in, break, Vector[T](), Vector[Vector[T]]())
  }
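
For the common case where a single line (say, a blank one) ends each stanza, a shorter fold may be enough. This is only a sketch of an alternative, not Alex's function: the names `Stanzas`, `split` and `isBreak` are made up for illustration, break lines are always dropped, and a trailing break line yields a final empty stanza.

```scala
object Stanzas {
  // Splits `lines` into stanzas. A line for which `isBreak` returns true
  // ends the current stanza and is not included in the output.
  def split(lines: Seq[String])(isBreak: String => Boolean): Vector[Vector[String]] = {
    val start = (Vector.empty[Vector[String]], Vector.empty[String])
    val (done, current) = lines.foldLeft(start) {
      case ((acc, cur), line) =>
        if (isBreak(line)) (acc :+ cur, Vector.empty[String]) // close the stanza
        else (acc, cur :+ line)                               // keep collecting
    }
    done :+ current
  }
}
```

It would be called as Stanzas.split(lines)(_.trim.isEmpty) on the result of getLines().toVector.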

 

Russ P.

Apr 3, 2012, 5:55:32 PM
to scala-user
Thanks, but that's way too complicated. I can do it trivially in
Python in a line or two. After screwing around for a while, I came up
with something like this:

val data = scala.io.Source.fromFile("myFile.dat").getLines.toArray

I can then access the lines separately as elements of an Array (or I
could have used a List). That's not too bad, but for crying out loud
shouldn't I be able to just write

val file = scala.io.Source.fromFile("myFile.dat")

file.getLine
...



√iktor Ҡlang

Apr 3, 2012, 6:05:49 PM
to Russ P., scala-user
On Tue, Apr 3, 2012 at 11:55 PM, Russ P. <russ.p...@gmail.com> wrote:
> Thanks, but that's way too complicated. I can do it trivially in
> Python in a line or two. After screwing around for a while, I came up
> with something like this:
>
>        val data = scala.io.Source.fromFile("myFile.dat").getLines.toArray
>
> I can then access the lines separately as elements of an Array (or I
> could have used a List). That's not too bad, but for crying out loud
> shouldn't I be able to just write
>
>        val file = scala.io.Source.fromFile("myFile.dat")
>
>        file.getLine
>        ...

val file = scala.io.Source.fromFile("myFile.dat")
val line0 = file.getLine(0)

?

Cheers,



--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Russ Paielli

Apr 3, 2012, 6:18:23 PM
to √iktor Ҡlang, scala-user
That looks good to me ... so why is getLine deprecated?

Actually, I'd prefer a getLine that just reads the next line rather
than taking an index argument.

--Russ P.



--
http://RussP.us

√iktor Ҡlang

Apr 3, 2012, 6:25:26 PM
to Russ Paielli, scala-user


2012/4/4 Russ Paielli <russ.p...@gmail.com>

> That looks good to me ... so why is getLine deprecated?
>
> Actually, I'd prefer a getLine that just reads the next line rather
> than taking an index argument.

val s = new java.util.Scanner( new java.io.File("myFile.dat") )
s.nextLine()
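
A slightly fuller sketch of the Scanner route, checking hasNextLine before each read and closing the scanner afterwards. The `ScannerLines` object and `firstN` helper are hypothetical names used only for this example.

```scala
import java.io.File
import java.util.Scanner

object ScannerLines {
  // Reads up to `n` lines one at a time, stopping early at end of file.
  def firstN(path: String, n: Int): List[String] = {
    val scanner = new Scanner(new File(path))
    try {
      val buf = List.newBuilder[String]
      var i = 0
      while (i < n && scanner.hasNextLine()) {
        buf += scanner.nextLine() // one explicit read per line
        i += 1
      }
      buf.result()
    } finally scanner.close()     // releases the underlying file handle
  }
}
```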




Lanny Ripple

Apr 3, 2012, 9:14:07 PM
to scala-user
Scala is statically typed so what should .getLine return when the file
is out of lines?

.getLine is called `next` and blows up nicely when you try to go
beyond the last line. You can easily wrap it (see below).

$ echo 'hi
> there' > zz
$ scala
...

scala> val f = scala.io.Source.fromFile("zz").getLines
f: Iterator[String] = non-empty iterator

scala> f.next
res0: String = hi

scala> f.next
res1: String = there

scala> f.hasNext
res2: Boolean = false

scala> f.next
java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$3.next(Iterator.scala:28)
at ...

scala> def pythonOpenForRead(path: String): Iterator[String] = new Iterator[String] {
| val iter = scala.io.Source.fromFile(path).getLines
| def hasNext: Boolean = true
| def next: String = if (iter.hasNext == false) "" else iter.next + "\n"
| }
pythonOpenForRead: (path: String)Iterator[String]

scala> val f = pythonOpenForRead("zz")
f: Iterator[String] = non-empty iterator

scala> f.next
res3: String =
"hi
"

scala> f.next
res4: String =
"there
"

scala> f.next
res5: String = ""

scala> f.next
res6: String = ""

-ljr

PS - `pythonOpenForRead` is only provided as a party-trick. As the
docs (http://www.scala-lang.org/api/current/index.html#scala.io.Source)
suggest, you should use .toIndexedSeq (or your .toArray solution).
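
Another way to answer the "what should it return when the file is out of lines" question, without exceptions or empty-string sentinels, is to return an Option. A minimal sketch, assuming a hypothetical `LineReader` wrapper (it is not part of scala.io):

```scala
import scala.io.Source

// Hypothetical wrapper, not a standard-library class.
final class LineReader(path: String) {
  private val source = Source.fromFile(path)
  private val lines = source.getLines()

  // Some(line) while lines remain, None once the file is exhausted.
  def readLine(): Option[String] =
    if (lines.hasNext) Some(lines.next()) else None

  // Call when done so the underlying file is released.
  def close(): Unit = source.close()
}
```

The caller then pattern-matches on Some/None instead of checking hasNext or comparing against "".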


Daniel Sobral

Apr 3, 2012, 9:27:05 PM
to Russ P., scala-user
On Tue, Apr 3, 2012 at 18:55, Russ P. <russ.p...@gmail.com> wrote:
> Thanks, but that's way too complicated. I can do it trivially in
> Python in a line or two. After screwing around for a while, I came up
> with something like this:
>
>        val data =
> scala.io.Source.fromFile("myFile.dat").getLines.toArray
>
> I can then access the lines separately as elements of an Array (or I
> could have used a List). That's not too bad, but for crying out loud
> shouldn't I be able to just write
>
>        val file = scala.io.Source.fromFile("myFile.dat")
>
>        file.getLine


val file = scala.io.Source.fromFile("myFile.dat")

val lines = file.getLines
val line = lines.next // remember, it's an Iterator!
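
Applied to the original question, that could look like the sketch below, which reads three lines one at a time and parses each one differently. The file layout (a name, an integer count, then comma-separated numbers) and the names `HeaderReader`, `Header` and `readHeader` are assumptions invented for the example, not anything from Russ's file.

```scala
import scala.io.Source

object HeaderReader {
  // Hypothetical record: one field per line of the file.
  final case class Header(name: String, count: Int, values: List[Double])

  // Reads exactly three lines, one at a time, from the line iterator.
  // Throws NoSuchElementException if the file has fewer than three lines.
  def readHeader(path: String): Header = {
    val source = Source.fromFile(path)
    try {
      val lines = source.getLines()                 // Iterator[String]
      val name = lines.next().trim                  // line 1: free text
      val count = lines.next().trim.toInt           // line 2: an integer
      val values = lines.next().split(",").map(_.trim.toDouble).toList // line 3: numbers
      Header(name, count, values)
    } finally source.close()
  }
}
```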

--
Daniel C. Sobral

I travel to the future all the time.

Ido Tamir

Apr 5, 2012, 6:40:33 AM
to scala...@googlegroups.com
Whatever else you do, don't forget to close the source in a long-running program.


val file = scala.io.Source.fromFile("myFile.dat")
val data = file.getLines.toArray
...
file.close


best,
ido

Rex Kerr

Apr 5, 2012, 6:50:46 AM
to Ido Tamir, scala...@googlegroups.com
On Thu, Apr 5, 2012 at 6:40 AM, Ido Tamir <idom...@gmail.com> wrote:
> whatever you do before, don't forget to close the source in a long running program.

Indeed.  Methods like the following are helpful in this regard:

  def cleanly[A,B](resource: => A)(cleanup: A => Unit)(code: A => B): Either[Exception,B] = {
    try {
      val r = resource
      try { Right(code(r)) } finally { cleanup(r) }
    }
    catch { case e: Exception => Left(e) }
  }

used like:

  val text = cleanly(io.Source.fromFile("myFile.dat"))(_.close)(_.getLines().toArray)

(One can write versions that allow the exception to escape or that wrap in Option instead, if one wants, of course.)
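
As a rough sketch of those two variants (the names `Loan`, `withResource`, `withResourceOption` and `readLines` are made up for this example and are not Rex's code):

```scala
import scala.io.Source

object Loan {
  // Same shape as `cleanly`, but any exception escapes to the caller.
  def withResource[A, B](resource: A)(cleanup: A => Unit)(code: A => B): B =
    try code(resource) finally cleanup(resource)

  // Variant that collapses any Exception (including a failed open) into None.
  def withResourceOption[A, B](resource: => A)(cleanup: A => Unit)(code: A => B): Option[B] =
    try Some(withResource(resource)(cleanup)(code))
    catch { case _: Exception => None }

  // Example use: all lines of a file, or None if it cannot be read.
  def readLines(path: String): Option[Vector[String]] =
    withResourceOption(Source.fromFile(path))(_.close())(_.getLines().toVector)
}
```

On Scala 2.13 or later, scala.util.Using covers much the same ground out of the box.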

--Rex
 

Russ P.

Apr 5, 2012, 3:10:27 PM
to scala-user
On Apr 5, 3:40 am, Ido Tamir <idomta...@gmail.com> wrote:
> whatever you do before, don't forget to close the source in a long running
> program.
>
> val file = scala.io.Source.fromFile("myFile.dat")
> val data = file.getLines.toArray
> ...
> file.close

I understand why output files need to be closed (to flush the buffer
to the file), but I don't understand why input files need to be closed
(assuming that they will only be read once). Am I missing something?

--Russ P.

Michael Swierczek

Apr 5, 2012, 3:29:12 PM
to Russ P., scala-user

Normally the operating system gives a file handle to the program
reading the file, so it knows to prevent other programs from modifying
the file or deleting the file while it is still being read. By
closing the input file, your program signals to the operating system
that it is finished using the file and other programs can change it.

-Mike

Lanny Ripple

Apr 5, 2012, 9:17:07 PM
to scala-user
Well I won't speak to MSWin but that's not the case in many OSs.
(Unix-based come to mind since that's what I work on.)

If you have a trivially small program then there is no reason other
than good hygiene to close your files. The system provides your
program a file descriptor to keep track of the file you are messing
with. Once you get a file descriptor and open it, it's yours. The file
it references can be deleted out from under you, changed, etc. (A
file on unix is a tree of disk blocks. Someone modifying the file
doesn't change your tree structure and a delete changes the inode the
filename is pointing at but again doesn't change your tree of blocks.)

The system allows each program only a limited number of open file
descriptors. Today you get a large number of them (try running
`ulimit -a` at a terminal prompt). Mine says 256. If I open
257 files in a program without closing any then I'll get an error.
[Well Scala's smarter than I am on this point. I tried variations on
List.fill(300){scala.io.Source.fromFile("xx")} and then mapping and
such over it to open them but Scala happily just gives me an answer.
I'm sure if I worked a bit harder I could make it happen but let's
leave it as "an exercise for the reader".] If I open 1M files one at
a time and close them before the next the system won't have any
problem.

So, you close your open resources when you are done with them to
prevent resource exhaustion.
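
The "one at a time, close before the next" pattern might be sketched like this. The `FirstLines` object and `firstLines` helper are hypothetical names for illustration; the point is only that each source is closed in a finally block before the next file is opened.

```scala
import scala.io.Source

object FirstLines {
  // Opens each file, reads its first line (if any), and closes it before
  // moving on, so at most one descriptor is held by this code at a time.
  def firstLines(paths: Seq[String]): Seq[Option[String]] =
    paths.map { path =>
      val source = Source.fromFile(path)
      try {
        val lines = source.getLines()
        if (lines.hasNext) Some(lines.next()) else None
      } finally source.close()
    }
}
```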

-ljr
