write method benchmarking

marten luter

Jun 1, 2013, 11:44:01 AM
to scala-i...@googlegroups.com
I ran some benchmarks on writing a 2000-byte array and was surprised by the results:

[info]      benchmark       us linear runtime
[info]      _writeIO     58.8 =
[info]  _writeIOBulk 191834.4 =========
[info] _writeIOBulk2 604051.9 ==============================
[info]  _writeIOJava     99.9 =

I do a calculation over the SA array and write the transformed data to the file "fname" using the methods below.
But I can't understand why _writeIOBulk2 is slower than _writeIOBulk, or why _writeIOJava is almost 2x slower than _writeIO. And in any case, what is the fastest and cleanest way to do this operation:
iterate over one object and, based on it, write another iterated object to a file?

  var SA = new Array[Int](n)
  var chr = new Array[Byte](n)

  def writeIOJava(fname:String):Unit = {
    val output = new java.io.FileOutputStream(fname)
    val bwt= new Array[Byte](n)
    for ( i<- 0 until n) {
      val pIdx = SA(i)-1
      bwt(i) = chr(if (pIdx >= 0) pIdx else pIdx+n ).toByte
    }
    output.write(bwt)
    output.close()
  }
  def writeIO(fname:String):Unit = {
    val output:Output = Resource.fromFile(fname)
    val bwt= new Array[Byte](n)
    for ( i<- 0 until n) {
      val pIdx = SA(i)-1
      bwt(i) = chr(if (pIdx >= 0) pIdx else pIdx+n ).toByte
    }
    output.write(bwt)
  }
  def writeIOBulk(fname:String):Unit = {
    val output:Output = Resource.fromFile(fname)
    //val bwt= new Array[Byte](n)
    for ( i<- 0 until n) {
      val pIdx = SA(i)-1
      output.write(chr(if (pIdx >= 0) pIdx else pIdx+n ).toByte)
    }
  }
  def writeIOBulk2(fname:String):Unit = {
    val output = Path.fromString(fname).outputStream(WriteTruncate:_*)
    //val bwt= new Array[Byte](n)
    for ( i<- 0 until n) {
      val pIdx = SA(i)-1
      output.write(chr(if (pIdx >= 0) pIdx else pIdx+n ).toByte)
    }
  }

Jesse Eichar

Jun 4, 2013, 12:36:52 AM
to scala-i...@googlegroups.com
Hi Marten,

Thanks for this report. I am going to file a bug report so that I can look into this in more detail. Looking at my performance page, I see that there are no tests for individual small writes, so I probably need to add those.

When benchmarking Scala, and Scala-IO in particular, you have to be extremely careful, because Scala's autoboxing will kill you in very surprising ways. I know this because I spent a decent amount of time writing performance tests for Scala-IO.

writeIO is going to be fast, of course, because it just writes an array of bytes to a file. It should be virtually identical to your writeIOJava, except that a FileChannel is used. There is also some logic for determining how to open the FileChannel so that reads and writes are maximally efficient.

I have to express some surprise at your other results, although when I wrote my own tests I got similar results at first (for reading, at least), until I did some profiling to see where the slowdowns were.

I can see a few potential issues:

1.  I usually write to the underlying stream with a ByteBuffer, so maybe I am allocating one for each write, which would be very expensive
2.  There is the OutputConverter class that is responsible for writing and it might be a little stupid regarding raw datatypes.
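
To check whether per-write overhead dominates, rather than the loop itself, one option is to compare a single buffered stream with a stream that is opened and closed for every byte. The following is only a minimal sketch using plain java.io; it does not reproduce Scala-IO's internals, and the names PerByteVsBuffered and transform are hypothetical stand-ins for the benchmark code.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream}

object PerByteVsBuffered {
  // Hypothetical stand-in for the per-element transformation in the benchmark.
  def transform(i: Int): Byte = (i % 251).toByte

  // Opens the file once; the BufferedOutputStream batches the single-byte
  // writes into larger writes to the underlying file.
  def writeBuffered(file: File, n: Int): Unit = {
    val out = new BufferedOutputStream(new FileOutputStream(file))
    try {
      var i = 0
      while (i < n) {
        out.write(transform(i).toInt) // write(Int) writes the low 8 bits
        i += 1
      }
    } finally out.close()
  }

  // Opens the file in append mode, writes one byte, and closes it again
  // for every element, so each byte pays the full open/close cost.
  def writePerByte(file: File, n: Int): Unit = {
    new FileOutputStream(file).close() // create or truncate once up front
    var i = 0
    while (i < n) {
      val out = new FileOutputStream(file, true) // true = append
      try out.write(transform(i).toInt) finally out.close()
      i += 1
    }
  }
}
```

Both methods should produce the same file contents, so any timing difference between them comes from how the stream is opened and flushed, not from the transformation.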

You wrote the tests pretty intelligently in that you tried to get the same for loop and boxing for each test so I don't see too many cases where autoboxing should be an issue.

Thanks again.

Jesse




Jesse Eichar

Jun 4, 2013, 1:02:29 AM
to scala-i...@googlegroups.com
Oh, one thing I realized in your examples is that when you write in bulk you are truncating the file and then writing the byte, over and over again. That could be another source of the slowdown.
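
The effect of truncating on every write can be illustrated with plain java.io, where `new FileOutputStream(file)` truncates the file each time it is opened. This is a sketch by analogy only: whether Scala-IO really reopens the resource with truncation on each `write` call is an assumption here, and the object and method names are hypothetical.

```scala
import java.io.{File, FileOutputStream}

object TruncatePerWrite {
  // Reopens the file in truncate mode for every byte; each open discards
  // whatever the previous iteration wrote.
  def writeTruncatingEachTime(file: File, data: Array[Byte]): Unit =
    for (b <- data) {
      val out = new FileOutputStream(file) // append = false, so this truncates
      try out.write(b.toInt) finally out.close()
    }

  // Opens (and truncates) the file once, then writes the whole array.
  def writeOnce(file: File, data: Array[Byte]): Unit = {
    val out = new FileOutputStream(file)
    try out.write(data) finally out.close()
  }
}
```

If the same thing happens in the benchmark, the output file would hold only the last byte after the loop, which is a quick way to confirm the diagnosis by checking the file size.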

marten luter

Jun 4, 2013, 7:00:06 AM
to scala-i...@googlegroups.com
Yes, that's an error in the function naming: the "bulk" writes should be named "non-bulk".