(Apologies in advance if this is not the right mailing list for this, it wasn't too obvious if this should go here or scala-user)
Over the last few weeks I have been doing some digging on poor performance of String concatenations. The java.lang.StringBuilder (and thus scala.collection.mutable.StringBuilder) is more or less terrible at least when it comes to initial capacity (16) and resizing logic, which creates a lot of excess garbage and array copies in regular usage.
However, it turns out there is a very important but underdocumented JVM optimization, OptimizeStringConcat, which is enabled by default. This optimization, which is pretty much only documented by its source code (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/opto/stringopts.cpp), is capable of turning something of approximately the form new StringBuilder().append(x).append(y)....append(z).toString() (about what javac would generate if you use x + y + ... + z) into a single char[] allocation of the optimal length. That is a big performance boost over what the normal Java code does (2 char[] allocations at the absolute minimum if the initial size is correct, one for the builder's buffer and one for the copy made by toString, and normally more).
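To make the pattern concrete, here is a small Java sketch of the two forms. The class and method names (ConcatForms, viaPlus, viaBuilder) are made up for illustration. On a JDK 8 era javac the first method compiles to roughly the second; newer javac releases (JDK 9 and later) emit an invokedynamic call instead, so this describes the toolchain discussed here rather than every JDK.

```java
// Hypothetical illustration; these names are not from any library.
final class ConcatForms {
    // What you write: plain + concatenation.
    static String viaPlus(String x, int y, String z) {
        return x + y + z;
    }

    // Roughly what a JDK 8 era javac emits for the method above:
    // the constructor, the appends and toString() all sit in one method body,
    // which is the shape the optimization is described as recognizing.
    static String viaBuilder(String x, int y, String z) {
        return new StringBuilder().append(x).append(y).append(z).toString();
    }

    public static void main(String[] args) {
        System.out.println(viaPlus("id-", 42, "-suffix"));
        System.out.println(viaBuilder("id-", 42, "-suffix"));
    }
}
```

Both methods return the same String; the difference the optimization makes is only in how many intermediate char[] arrays get allocated along the way.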
The issue is that this same optimization doesn't work for the identical Scala code. Why? Well, I wish I could tell you I understood enough of the C++ to give you an exact answer, but I can only approximate. To my understanding, the analysis is only done within the scope of the function where java.lang.StringBuilder.toString is called, which here is scala.collection.mutable.StringBuilder.toString. All of the other bits (the constructor and append calls) occur elsewhere, so it more or less gives up. (This is from looking at the debug output from -XX:+PrintOptimizeStringConcat in a debug JDK, as well as adding my own debugging output in a few places in that file.)
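A rough Java analogue of that shape, as I understand it: WrappedBuilder below is a hypothetical stand-in for a wrapper class that delegates to java.lang.StringBuilder. It is not the real scala.collection.mutable.StringBuilder, just a sketch of how the constructor, append and toString calls on the underlying builder end up in different methods.

```java
// Hypothetical stand-in for a delegating wrapper; NOT the real Scala class.
final class WrappedBuilder {
    // The java.lang.StringBuilder constructor call lives in the wrapper's initializer...
    private final StringBuilder underlying = new StringBuilder();

    WrappedBuilder append(String s) {
        underlying.append(s); // ...each append lives in this method...
        return this;
    }

    @Override
    public String toString() {
        return underlying.toString(); // ...and toString() lives in yet another method.
    }
}

final class WrapperDemo {
    // Direct form: constructor, appends and toString() in a single method body.
    static String direct(String x, String y) {
        return new StringBuilder().append(x).append(y).toString();
    }

    // Wrapped form: same result, but the java.lang.StringBuilder calls
    // are spread across WrappedBuilder's methods.
    static String wrapped(String x, String y) {
        return new WrappedBuilder().append(x).append(y).toString();
    }

    public static void main(String[] args) {
        System.out.println(direct("1234", "-suffix"));
        System.out.println(wrapped("1234", "-suffix"));
    }
}
```

The two methods produce identical Strings. Whether the JIT treats them the same is exactly the open question; my reading of the debug output suggests only the direct form matches.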
It's hard to properly quantify the performance improvement of this, particularly with respect to fewer allocations. I attempted it anyway by writing a benchmark that appends a suffix to a String holding a random number, millions of times, with plenty of warmup, in a JVM with a very small heap (32 MB) to better see the effects of less GC. The unscientific results look something like this: (sorry for the odd formatting)
|                  | Optimization Enabled | Optimization Disabled |
| Java             | 5.9s                 | 15.2s                 |
| Scala            | 16.5s                | 16.5s                 |
| Scala w/ Java SB | 6.1s                 | 16.0s                 |
(Note that here Optimization Enabled means -XX:+OptimizeStringConcat, and Optimization Disabled means -XX:-OptimizeStringConcat)
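The actual benchmark code isn't included here. A minimal sketch of the same shape, covering only the Java row, might look like the following. The names (ConcatBench, run), the seeds and the iteration counts are made up for illustration, and a hand-rolled timing loop like this is much cruder than a proper harness such as JMH.

```java
import java.util.Random;

// Hypothetical benchmark sketch; not the code that produced the numbers above.
final class ConcatBench {
    // Appends a fixed suffix to a String holding a random number, `iterations` times.
    // Returns the total length produced so the work cannot be discarded as dead code.
    static long run(int iterations, long seed) {
        Random rnd = new Random(seed);
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            String number = Integer.toString(rnd.nextInt());
            String s = number + "-suffix"; // the concatenation under test
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        run(2_000_000, 1L); // warmup so the loop gets JIT-compiled first

        long start = System.nanoTime();
        long total = run(20_000_000, 2L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total length " + total + ", " + elapsedMs + " ms");
    }
}
```

Running it once as java -Xmx32m -XX:+OptimizeStringConcat ConcatBench and once with -XX:-OptimizeStringConcat gives the two columns; absolute numbers will of course differ by machine and JDK.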
I'm not quite sure what the right follow-up is here. This is a sizeable performance regression relative to Java, and enough to make us want to replace +s with calls to java.lang.StringBuilder in some hot parts of our code. This seems potentially bug-worthy, but I'm not sure how it can ever be resolved short of scalac discontinuing the use of scala.collection.mutable.StringBuilder for +s. (The C++ more or less looks for java.lang.StringBuilder / StringBuffer, and will only attempt to optimize those two classes.)
Thoughts?
-Jackson