scala's StringBuilder and -XX:+OptimizeStringConcat


Jackson Davis

Aug 11, 2015, 12:27:33 PM
to scala-language

(Apologies in advance if this is not the right mailing list for this, it wasn't too obvious if this should go here or scala-user)

Over the last few weeks I have been doing some digging on poor performance of String concatenations. java.lang.StringBuilder (and thus scala.collection.mutable.StringBuilder, which wraps it) is more or less terrible, at least when it comes to initial capacity (16) and resizing logic, which creates a lot of excess garbage and array copies in regular usage.
However, it turns out there is a very important but underdocumented JVM optimization, OptimizeStringConcat, which is enabled by default. This optimization, which is pretty much only documented by its source code (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/opto/stringopts.cpp), is capable of turning something of approximately the form new StringBuilder().append(x).append(y)....append(z).toString() (about what javac would generate if you use x + y + ... + z) into a single char[] allocation of the optimal length. That is a big performance boost over what the normal Java code does (2 char[] allocations at the absolute minimum if the initial size is correct, normally more).
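A minimal sketch of the shape in question, for readers who have not looked at the bytecode. The class and method names (ConcatShape, withPlus, withBuilder) are made up for illustration; the second method is only an approximation of what the post says javac emits for the first, not a dump of real compiler output.

```java
public class ConcatShape {
    // Source-level concatenation with '+'.
    static String withPlus(String x, int y, String z) {
        return x + y + z;
    }

    // Approximately the chain described above: constructor, appends and
    // toString() all sit in one method, on a java.lang.StringBuilder.
    static String withBuilder(String x, int y, String z) {
        return new StringBuilder().append(x).append(y).append(z).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("id-", 42, "-suffix"));
        System.out.println(withBuilder("id-", 42, "-suffix"));
    }
}
```

Both methods return the same String; the difference under discussion is only in how many intermediate char[] arrays get allocated once the JIT has compiled them.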
The issue is that this same optimization doesn't work for the identical Scala code. Why? Well, I wish I could tell you I understood enough of the C++ to give you an exact answer, but I can only approximate. To my understanding, the analysis is only done at the scope of the function where java.lang.StringBuilder.toString is called, which is scala.collection.mutable.StringBuilder.toString. All of the other bits (constructor and append calls) occur elsewhere, so it more or less gives up (judging from the debug output of -XX:+PrintOptimizeStringConcat in a debug JDK, as well as from my own debugging output added in a few places in that file).
It's hard to properly quantify the performance improvement of this, particularly with respect to fewer allocations. I attempted it anyway by writing a benchmark that appends a suffix to a String holding a random number millions of times, with plenty of warmup, in a JVM with a very small heap (32 MB) to better see the effects of less GC. The unscientific results look something like this (sorry for the odd formatting):
 

                    Optimization Enabled    Optimization Disabled
Java                5.9s                    15.2s
Scala               16.5s                   16.5s
Scala w/ Java SB    6.1s                    16.0s

 
(Note that here Optimization Enabled means -XX:+OptimizeStringConcat, and Optimization Disabled means -XX:-OptimizeStringConcat)
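For anyone wanting to try something similar, here is a rough sketch of a benchmark in the spirit of the one described. It is not the original benchmark: the class name ConcatBench, the iteration counts and the "-suffix" string are all assumptions, and a hand-rolled timing loop like this is only indicative. It uses the explicit builder chain rather than + so that the shape under test does not depend on how a given javac compiles string concatenation.

```java
import java.util.Random;

public class ConcatBench {
    // Appends a suffix to a String holding a random number, using the
    // explicit java.lang.StringBuilder chain discussed in this thread.
    static String appendSuffix(String number, String suffix) {
        return new StringBuilder().append(number).append(suffix).toString();
    }

    // Runs `iterations` concatenations and returns a checksum so the JIT
    // cannot discard the work as dead code.
    static long run(int iterations, long seed) {
        Random random = new Random(seed);
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            String number = Long.toString(random.nextLong());
            checksum += appendSuffix(number, "-suffix").length();
        }
        return checksum;
    }

    public static void main(String[] args) {
        run(2_000_000, 1L); // warmup (iteration count is an arbitrary choice)
        long start = System.nanoTime();
        long checksum = run(20_000_000, 2L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + checksum + " elapsed=" + elapsedMs + "ms");
    }
}
```

Running it once with java -Xmx32m -XX:+OptimizeStringConcat ConcatBench and once with -XX:-OptimizeStringConcat should give a comparison of the same kind as the Java row above, though the absolute numbers will differ by machine and JVM version.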

I'm not quite sure what the right follow-up is here. This is a sizeable performance regression relative to Java, and enough to make us want to replace +s with calls to java.lang.StringBuilder in some hot parts of our code. This seems potentially bug-worthy, but I'm not sure how it can ever be resolved short of scalac discontinuing the use of scala.collection.mutable.StringBuilder for +s. (The C++ more or less looks for java.lang.StringBuilder / StringBuffer, and will only attempt to optimize those two classes.)

Thoughts?
-Jackson


Som Snytt

Aug 11, 2015, 8:34:01 PM
to scala-l...@googlegroups.com


Jackson Davis

Aug 12, 2015, 1:31:00 PM
to scala-language
Ah, I missed that when I searched. Hope someone looks into it eventually :)

Thanks,
-Jackson

ijuma

Aug 12, 2015, 6:46:02 PM
to scala-language
On Wednesday, 12 August 2015 18:31:00 UTC+1, Jackson Davis wrote:
Ah, I missed that when I searched. Hope someone looks into it eventually :)

Yes, the improvement seems more beneficial given your findings.

Ismael

Simon Ochsenreither

Aug 13, 2015, 12:08:42 PM
to scala-language
Hi Jackson,

I will look into it next week.

Thanks,

Simon

Eric Richardson

Nov 19, 2015, 6:24:44 PM
to scala-language
Hi Simon,

Are you aware of this?


Eric

Simon Ochsenreither

Nov 19, 2015, 9:53:28 PM
to scala-language
No, thanks for the hint! If this ships, we can adapt our code accordingly.

Eric Richardson

Nov 20, 2015, 7:37:31 PM
to scala-language
Sorry, GA is not scheduled until late next year, but I thought it would be good to keep in mind - http://openjdk.java.net/projects/jdk9/