Parallelism in Scala 2.12: parallel collections vs. Java 8 Streams

Rex Kerr

unread,

Jan 11, 2015, 5:57:20 PM1/11/15

to scala-i...@googlegroups.com

Java Streams, new in Java 8, implement parallelism by requesting that collections create a "Spliterator", which is like a regular iterator except that it can split itself into disjoint sub-iterators upon request. To run things in parallel, the Streams code repeatedly requests a split on the spliterator until it has enough sources of data to fill its thread pool.

Scala collections could hook into this mechanism easily enough as long as they can create a Spliterator for themselves. I tried this out by adapting scala.collection.immutable.Vector's VectorIterator; the resulting VectorSpliterator has nearly identical logic except on the split step (where it needs to update start/end index points). To test how useful this was, I compared a Java Stream wrapper around said Spliterator against a scala.collection.parallel.immutable.ParVector.

I had three test cases:

(1) Tiny Vector (4 elements) with a cheap operation. No reason to parallelize this, but it's good for measuring overhead.

(2) Not so tiny Vector (1024 elements) with a cheap operation. Still mostly dominated by overhead of parallelizing, but not completely.

(3) Tiny Vector (4 elements) with an expensive operation.

In all cases the operation was side-effecting (i.e. that's how it saved the output) and ran with foreach (ParVector) or forEach(j.u.s.Stream). I left the defaults in place for degree of parallelism.

In every case, the Spliterator outperformed the ParVector one:

(1) Spliterator 2.5x faster

(2) Spliterator 3x faster

(3) Spliterator 20% faster

Carefully-deployed scala.concurrent.Future (with a final map Await.ready step) was faster yet on light-workload cases, but only modestly so (maybe 20%).

If other people have been using Spliterators to push Scala collections into Java 8 Streams, does this match what you've been seeing?

Regardless, 2.12 is supposed to have better support for Java 8 Streams, but what form ought that support take? One could write a few generic Spliterator wrappers for different types of collection (Set, Map, Seq), or have every collection create its own optimized Spliterator, and/or do all that but have an implementation of Scala collections operations on top of Java Streams so you could collect etc. in parallel using the Java framework.

(TBD: how to handle concurrent modification with non-thread-safe Scala mutable collections where there's often no way for the spliterator to detect that things are mutating out from under it.)

Specifically--since more is usually better than less, but prioritizing is important!--which of these things would you really like to use, as opposed to it just being kinda reassuring that they're there in case you need them?

--Rex

Dmitry Petrashko

unread,

Jan 12, 2015, 3:40:13 PM1/12/15

to scala-i...@googlegroups.com

Hi Rex,
can you please share your benchmarks?

It would be interesting to augment them with ScalaBlitz, that uses it's own spliterators+kernels.
I've been comparing ScalaBlitz with standard Scala Parallel Collections and I see substantial speedups(100x) on primitive-based tests with cheap operations, and substantial, but smaller(~15x) on operations with boxed elements(eg value classes).

My experience tells that scala parallel collections create enormous overhead for small operations(eg array.par.sum), so I'm interested to see which operations did you measure. I can hardly make dirrect comparison as ScalaBlitz doesn't implement vectors, but the numbers that I get for ranges, arrays, (mutable hashmaps&hashsets) and immutable maps&sets backed by hash tries suggest that Scala Parallel Collections have ennormous overhead.

Speaking of what could be improved, there's a simple week spot: In case time required to perform action is different for different elements, Scala Parallel Collections could be arbitary bad, as it's scheduler is primitive and does fixed size blocking in advance. I believe that the state of art here is workstealing tree scheduling, that Aleksandar Prokopec implemented in ScalaBlitz.

Acutally, using WS-tree should help to substantially lower overhead of orchestrating small operations, and also solve a problem with non-uniform workload.

Is it part of contract that it should? I would say no, if user wants a snapshot he should do something explicitly about it or use datastructure that supports this(ctrie for example).

Specifically--since more is usually better than less, but prioritizing is important!--which of these things would you really like to use, as opposed to it just being kinda reassuring that they're there in case you need them?

--Rex

Cheers,
Dmitry

Rex Kerr

unread,

Jan 12, 2015, 3:53:05 PM1/12/15

to scala-i...@googlegroups.com

On Mon, Jan 12, 2015 at 12:40 PM, Dmitry Petrashko <darkd...@gmail.com> wrote:

Hi Rex,
can you please share your benchmarks?

They were all typed live into a REPL session, so not really at this point. I just wanted to get a rough idea of whether Java 8 Streams had reasonable scheduling and overhead compared to scala.collection.parallel.

As soon as I write something more reproducible, I'll share it.

It would be interesting to augment them with ScalaBlitz, that uses it's own spliterators+kernels.

It would! I was planning on getting to that eventually.

My experience tells that scala parallel collections create enormous overhead for small operations(eg array.par.sum), so I'm interested to see which operations did you measure.

So far, just updating of a mutable field in a simple object.

I can hardly make dirrect comparison as ScalaBlitz doesn't implement vectors, but the numbers that I get for ranges, arrays, (mutable hashmaps&hashsets) and immutable maps&sets backed by hash tries suggest that Scala Parallel Collections have ennormous overhead.

Overhead is around 15-20 microseconds on my machine. Futures are usually down around 4-5. I'm working on a mutable HashMap spliterator, so I'll have some benchmarks for that soon.

I believe that the state of art here is workstealing tree scheduling, that Aleksandar Prokopec implemented in ScalaBlitz.

Acutally, using WS-tree should help to substantially lower overhead of orchestrating small operations, and also solve a problem with non-uniform workload.

Indeed.

On Sunday, 11 January 2015 23:57:20 UTC+1, Rex Kerr wrote:

(TBD: how to handle concurrent modification with non-thread-safe Scala mutable collections where there's often no way for the spliterator to detect that things are mutating out from under it.)
Is it part of contract that it should? I would say no, if user wants a snapshot he should do something explicitly about it or use datastructure that supports this(ctrie for example).

It is still nice to notice when the user is doing something they shouldn't instead of producing arbitrarily bad behavior and unexpectedly wrong results. There's a whole hierarchy of levels of niceness described on the Spliterator javadocs page (which I basically agree with).

The answer might be, "You're totally on your own there," but I want to evaluate if there's anything we could do.

--Rex

Reply all

Reply to author

Forward