I am working with a JSON tokenizer and comparing an algorithm in Scala with a very similar approach in Java. The Scala version is a full 5x slower.This particular algorithm mostly works with large quantities of primitive types char, int, and byte. I read in Scala these are objects where Java uses unwrapped primitives. Would this account for a significant difference in speed, i.e. is Scala spending time creating lots of wrapper objects for these primitives?
--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
2015-02-18 1:30 GMT+01:00 Greg Zoller <gzo...@gmail.com>:
Profiling was a huge help. Cut time solidly in half after finding some things like calling functions passing large arrays. Got my performance to on-par with Jackson's parser. Most of the remaining time according to the profiler seems to be accessing the large arrays by index, i.e. myArr(i). There's something in the profiler about time spent somewhere in mutable Array in Scala (not using mutable Arrays, directly anyway). This might be a minor bit of wrapping to pimp the java Arrays with the Scala Arrays, but that may be what's costing me a bit.The amazing/depressing thing is I was trying to get close to a tokenizer designed by jenkov in Java. Where mine and Jackson's would tokenize a 5m file in about 75ms, jenkov's consistently parsed less than half that. Truly an impressive piece of work.
This may be a bit more work, but if you’re after primitive performance, the miniboxing plugin can provide some useful warnings:
$ git clone https://github.com/miniboxing/miniboxing-example.git
Cloning into 'miniboxing-example'...
$ cd miniboxing-example/
$ sed -i 's/Nil/"-P:minibox:warn-all"::Nil/' project/Build.scala # get it to display warnings for the library as well
$ sbt console
...
scala> 3::Nil
<console>:8: warning: The method scala.collection.immutable.List.:: would benefit from miniboxing type parameter B, since it is instantiated by a primitive type.
3::Nil
^
res0: List[Int] = List(3)
Miniboxing is something like @specialized but more efficient and user-friendly, and you’ll get these warnings whenever the code is going to box.
The website for the plugin is scala-miniboxing.org and there’s a presentation there explaining these warnings.
HTH,
Vlad
I think I'd have to rebuild Scala itself with the miniboxed plugin to see how much improvement I'd get. I do believe this is probably where the last bit of "lost" performance is (vs Java).
Primitives are currently a "hole" in Scala performance and while I understand why we don't want them in the language, the minibox adaptation is a great optimization.
2015-02-18 9:14 GMT+01:00 Sonnenschein <peter...@arcor.de>:
But collections must box alike. So what are the concrete collections you are using on Java respectively Scala side?
Btw, you can use special collections or just primitive arrays to deal with primitive types both in Java and Scala.
When using generic Array[T], both element access and update cause boxing:
$ cat array.scala
import scala.reflect.ClassTag
class Test[T: ClassTag](t: T) {
val array = new Array[T](3)
array(0)
array(0) = t
}
$ scalac array.scala -Xprint:erasure
[[syntax trees at end of erasure]] // array.scala
package <empty> {
class Test extends Object {
...
runtime.this.ScalaRunTime.array_apply(Test.this.array(), 0);
runtime.this.ScalaRunTime.array_update(Test.this.array(), 0, Test.this.t)
}
}
And you can see array_apply and array_update in the Scala repo: they return and receive Any, which is erased to Object.
Not going into why this is done, but it’s the only correct way to do it.
Vlad
--
i think/hope he means that if you generically create an array (Array[T]), every element has to be boxed, but if you create a primitive array directly (Array[Int]), it does not have to happen.
Array[T], you need a ClassTag[T], which allows the compiler to create the correct unboxed array type (int[], long[], …). So values are stored unboxed in the array. Yet, when you access this array as a generic Array[T], the method doing the access (array_apply/array_update in ScalaRuntime) will return/accept a boxed value. For example:
def firstElement[T](arr: Array[T]): T = arr(0) // bytecode: ScalaRuntime.array_apply(arr, 0), which returns an Object
val array = new Array[Int](100) // bytecode: classTag[Int].newArray(...), which creates an int[]
val x: Int = array(0) // bytecode: access using iaload bytecode instruction, returning an int
val y: Int = firstElement(array) // bytecode: calling firstElement, which returns an object
HTH,
Vlad
To summarize:
Array[T] requires a ClassTag[T] in order to instantiate the unboxed variant of the array (e.g. int[], long[])Array[Int]) is done without boxing via bytecode instructions (e.g. iaload, laload etc)Array[T]) is done by calling ScalaRuntime, which pays for the dispatch and for boxing the result (or unboxing the argument)These are the three golden rules of arrays. Then, collections always take the 3rd path, since they’re not specialized.
I hope this clarifies what I meant…
Vlad
--
2015-02-18 11:52 GMT+01:00 Sonnenschein <peter...@arcor.de>:
No, an array of primitive types (e.g.Array[Int]) is always represented as Java’s unboxed array (e.g.int[]) and reads/updates are on par with Java operations.
However, if you do:
def firstElement[T](arr: Array[T]): T = arr(0)
the body of firstElement will go through the generic dispatch in ScalaRuntime, since unboxed arrays (e.g.int[],long[], etc) have no common ancestor other than Object, which does not expose methods for reading/updating.
By the way, all operations mixing collections and arrays are generic, so they go through ScalaRuntime. :(
Vlad
--
--
It's atrociously slow in my tests also. I'm not sure why. The C++ source doesn't contain any clues, but I haven't found source that precisely matches up with what I expect to see. Maybe something to do with clobbering registers? If you have to push and pop the full set of registers on the stack for each access (like you do with JNI) then you could possibly end up with the 70 ns/call that you see. It shouldn't be that bad on average; 5 branches cached in L2 should be ~20 ns if branch prediction is wrong and less if prediction is right.Also, with non-native code the JVM might unroll the loop and notice that there's a pattern. What happens if you permute the array?