ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Alexandru Nedelcu

Dec 27, 2016, 12:01:19 PM
to scala-i...@googlegroups.com

Hi folks,


In Scala 2.11.8 the scala.concurrent.forkjoin.ForkJoinPool implementation is basically a fork of Doug Lea's JSR-166 implementation, kept in order to support older Java versions. In Scala 2.12.x, however, it is now an alias for java.util.concurrent.ForkJoinPool, given its availability in Java 8 and Scala 2.12's requirement of Java 8 as a target.

Unfortunately these two implementations are NOT the same: the old Scala 2.11 implementation has better throughput in testing, and the difference is significant. I found a scenario in which the old implementation has twice the throughput. I measured this with a personal JMH benchmark, for which I ported the old implementation to Scala 2.12, so that the ForkJoinPool implementation is the only difference.

To be clear, given that Scala's own global execution context is backed by this ForkJoinPool implementation, this affects most code using Scala's Future.
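A rough sketch of the kind of comparison described above: the same chain of small Futures runs on ExecutionContext.global and on an ExecutionContext built with ExecutionContext.fromExecutorService from whatever pool you want to test. PoolComparison and its workload are hypothetical, and the timing is crude wall-clock measurement, not the JMH benchmark mentioned in the post.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolComparison {
  // Chains `n` tiny Futures so that scheduling overhead dominates. The workload
  // is identical for every ExecutionContext, leaving the pool as the only variable.
  def workload(n: Int)(implicit ec: ExecutionContext): Future[Long] =
    (1 to n).foldLeft(Future.successful(0L)) { (acc, i) =>
      acc.flatMap(sum => Future(sum + i))
    }

  // Crude wall-clock timing; unlike JMH it does no warm-up and no forking.
  def timeMillis[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Runs the workload on `ec`, prints the elapsed time and returns the sum.
  def run(label: String, ec: ExecutionContext, n: Int): Long = {
    val (sum, millis) = timeMillis(Await.result(workload(n)(ec), 60.seconds))
    println(s"$label: $millis ms")
    sum
  }
}
```

For example, compare PoolComparison.run("global", ExecutionContext.global, 100000) with the same call on ExecutionContext.fromExecutorService(new java.util.concurrent.ForkJoinPool(4)); a ported 2.11 pool would be passed to fromExecutorService in the same way.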

Is this a known problem?


--
Alexandru Nedelcu

Viktor Klang

Dec 27, 2016, 12:23:12 PM
to scala-i...@googlegroups.com
Hi Alexandru,

Could you also try taking the old jsr166 version used for Scala 2.11, placing it under the java.util.concurrent package name, and putting it on the Scala 2.12 classpath using -Xbootclasspath?

(Technically there's no single Java 8 FJP, since it depends on vendor and version, so performance will differ.)

--
Cheers,


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-internals+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jason Zaugg

Dec 27, 2016, 6:13:02 PM
to scala-i...@googlegroups.com
I suspect you are seeing the same degradation in benchmarks that was reported by the Akka team recently and is being investigated under https://issues.scala-lang.org/browse/SI-10083

The version of FJ in 2.11 used busy waiting more aggressively, which helped on benchmarks with more cores than tasks. However, this comes at the expense of other workloads on the process or machine, especially because the JVM doesn't let the busy waiting signal its intention to the CPU with a spin loop hint. This tradeoff was found while testing FJ as it was integrated into parallel j.u.stream.Streams, and is discussed in JDK-8080623.
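To illustrate the tradeoff, here is a minimal spin-then-park wait: the waiter busy-waits for a bounded number of iterations (fast wake-up, but it burns a core) before parking (cheap for other workloads, but slower to wake). SpinThenParkSignal is a hypothetical helper, not FJ's actual idle strategy, and it assumes JDK 9 or later, where Thread.onSpinWait() (JEP 285) provides the spin loop hint that was missing at the time of this thread.

```scala
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.locks.LockSupport

// Hypothetical one-shot signal for a single waiter thread.
final class SpinThenParkSignal(spinLimit: Int) {
  @volatile private var fired = false
  private val waiter = new AtomicReference[Thread](null)

  def await(): Unit = {
    // Phase 1: bounded busy waiting, reacting quickly if the signal comes soon.
    var spins = 0
    while (!fired && spins < spinLimit) {
      Thread.onSpinWait() // JDK 9+ hint telling the CPU this is a spin loop
      spins += 1
    }
    // Phase 2: give the core back. Publish ourselves, then re-check `fired`,
    // so a signal racing with this registration is never lost.
    if (!fired) {
      waiter.set(Thread.currentThread())
      while (!fired) LockSupport.park(this) // loop guards against spurious wake-ups
    }
  }

  def signal(): Unit = {
    fired = true
    val t = waiter.get()
    if (t != null) LockSupport.unpark(t)
  }
}
```

A larger spinLimit favors latency on an otherwise idle machine; a smaller one favors neighbouring workloads, which is the same tension described above.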

To restore the performance of the benchmark, you would need to implement your own ExecutionContext in terms of the jsr166 backport of ForkJoin. This would be a useful library to make available for others to use.
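A minimal sketch of such an ExecutionContext, assuming the backported pool implements java.util.concurrent.ExecutorService. PoolBackedExecutionContext and its newPool factory are hypothetical names; the JDK ForkJoinPool serves as a stand-in where a backport jar would supply its own pool class. The standard ExecutionContext.fromExecutorService already performs this adaptation; spelling it out shows the two methods an ExecutionContext needs.

```scala
import java.util.concurrent.{ExecutorService, ForkJoinPool, TimeUnit}
import scala.concurrent.ExecutionContextExecutor

// Hypothetical: `newPool` would build the backport's pool; here it is any ExecutorService.
final class PoolBackedExecutionContext(newPool: () => ExecutorService)
    extends ExecutionContextExecutor {

  private val pool: ExecutorService = newPool()

  // Every Future body and callback scheduled on this context ends up here.
  override def execute(task: Runnable): Unit = pool.execute(task)

  // Receives failures thrown by callbacks; a real library would use a logger.
  override def reportFailure(cause: Throwable): Unit = cause.printStackTrace()

  // Stops accepting tasks and waits for queued ones to finish.
  def shutdown(timeoutMillis: Long): Boolean = {
    pool.shutdown()
    pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)
  }
}

object PoolBackedExecutionContext {
  // Stand-in factory using the JDK pool; swap in the backported class here.
  def withParallelism(n: Int): PoolBackedExecutionContext =
    new PoolBackedExecutionContext(() => new ForkJoinPool(n))
}
```

Unlike ExecutionContext.global, this sketch installs no BlockContext on its threads, so scala.concurrent.blocking gets no special treatment.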

-jason

Viktor Klang

Dec 27, 2016, 6:39:23 PM
to scala-i...@googlegroups.com
Thanks Jason!

From a user's point of view, using -Xbootclasspath/p:jsr166.jar is a non-intrusive (code-wise) fix for Scala 2.12.
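In an sbt build this might look like the fragment below, a sketch assuming sbt 1.x and Java 8; jsr166.jar is a placeholder for a jar containing the old jsr166 classes under the java.util.concurrent package. JVM options only take effect in a forked JVM, hence fork := true.

```scala
// build.sbt (sketch)
// Fork separate JVMs for run and test so that javaOptions are applied.
fork := true

// Prepend the jar to the boot classpath so its java.util.concurrent classes
// win over the JDK's own. "jsr166.jar" is a placeholder path.
// -Xbootclasspath/p exists only up to Java 8; JDK 9 removed it.
javaOptions += "-Xbootclasspath/p:jsr166.jar"
```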

Worth mentioning: Scala used to ship with an embedded version because we collaborated with Doug to create the more scalable version of FJ that was later introduced in Java 8. At the time, Java 7's FJ had a single external task submission queue, which created a major bottleneck, so we couldn't use it. However, embedding the JSR166 code made it a burden to maintain, not to mention the duplication and incompatibility of ManagedBlocker, ForkJoinTask, etc. between java.util.concurrent.* and scala.concurrent.forkjoin.*.
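With Scala 2.12 using java.util.concurrent.ForkJoinPool directly, there is a single ManagedBlocker interface to implement. The sketch below is a Scala adaptation of the QueueTaker example from the ForkJoinPool.ManagedBlocker Javadoc; QueueTaker.take is a hypothetical convenience wrapper.

```scala
import java.util.concurrent.{BlockingQueue, ForkJoinPool}

// Takes an element from a queue while letting a ForkJoinPool compensate
// (e.g. by activating a spare worker) for the blocked thread.
final class QueueTaker[A <: AnyRef](queue: BlockingQueue[A])
    extends ForkJoinPool.ManagedBlocker {

  @volatile private var item: A = null.asInstanceOf[A]

  // Called only when the pool decides the thread really has to block.
  override def block(): Boolean = {
    if (item == null) item = queue.take()
    true // no further blocking is needed
  }

  // Non-blocking attempt; returning true means block() can be skipped.
  override def isReleasable(): Boolean = {
    if (item == null) item = queue.poll()
    item != null
  }

  // Valid only after ForkJoinPool.managedBlock has returned.
  def getItem: A = item
}

object QueueTaker {
  def take[A <: AnyRef](queue: BlockingQueue[A]): A = {
    val taker = new QueueTaker(queue)
    ForkJoinPool.managedBlock(taker)
    taker.getItem
  }
}
```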

The fairer behavior of the Java 8 FJ implementation also happens to suit scala.concurrent.ExecutionContext.global: since it is a global pool, it should definitely not create fairness issues.


--
Cheers,

Alexandru Nedelcu

Dec 28, 2016, 3:10:49 AM
to scala-i...@googlegroups.com
OK, it makes sense.

Thanks Viktor and Jason for these details.

--
Alexandru Nedelcu

Alexandru Nedelcu

Dec 30, 2016, 5:47:50 PM
to scala-i...@googlegroups.com
FYI, I just published those files as a project, mainly for my own immediate purposes, but it's being synchronized to Maven Central, so others can reuse it:


Cheers,

--
Alexandru Nedelcu
