Java 8 Stream support in Scala


Rex Kerr

Feb 24, 2015, 5:57:57 AM
to scala-i...@googlegroups.com
I am currently envisioning Scala's Java 8 stream support as follows (for 2.12, most likely):

(1) Every Scala sequential collection will have two extension methods:
  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped Java Stream created from that spliterator; it will take one boolean argument specifying whether the stream should be run in parallel (default: parallel)
  Note: in 2.13 they could potentially be moved from extension methods to integrated methods if there is clear value in doing so.  (I'd rather move more things out, honestly, but we should evaluate the option then.)
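For concreteness, point (1) might look roughly like the following; this is a minimal sketch assuming the proposed names `spliterator` and `streamed`, with the value-class wrapper collapsed to a bare `java.util.stream.Stream` for brevity:

```scala
import java.util.{Spliterator, Spliterators}
import java.util.stream.{Stream, StreamSupport}
import scala.collection.JavaConverters._

object StreamedOps {
  // Extension methods from point (1). The real design wraps the result in a
  // value class; here we return the bare Java stream for brevity.
  implicit class SeqStreamOps[A](private val xs: Seq[A]) extends AnyVal {
    def spliterator: Spliterator[A] =
      Spliterators.spliterator(xs.iterator.asJava, xs.size.toLong,
        Spliterator.ORDERED | Spliterator.SIZED)
    // boolean parameter as originally proposed; later in the thread the
    // default flips to sequential and separate methods are considered
    def streamed(parallel: Boolean = true): Stream[A] =
      StreamSupport.stream(spliterator, parallel)
  }
}

import StreamedOps._
val n = List("salmon", "herring", "perch").streamed(parallel = false).count()
```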

(2) Java streams will have one extension method, called either extended or asScala, which wraps them in that same value class.  (I favor extended because it won't _really_ be asScala, at least in the usual sense.)

(3) The value class will contain an asJava method to switch back to the plain unwrapped stream, but otherwise will present all the familiar Scala methods that you expect to find on a TraversableOnce.  I do not think I'll be able to make the value class fit into the hierarchy, but you'll have your full set of toIterator etc. methods if you really want to go fully into Scala-land.  I plan on implementing even really inefficient methods, but these will be gated by an implicit that you will have to import separately.

(4) There will be a custom parallel builder optimized for maximum performance at collecting the results of Java streams; you can use it with collect (or build off of the value class) to either just grab all the elements or, optionally, to create a new collection with a Scala builder which will be used sequentially.  I am not yet sure whether the builder itself can be in the usual collections hierarchy without sacrificing too much performance at its primary task, but if not there will be a wrapper class that is.
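The sequential half of point (4) can be illustrated today by draining a Java stream into an ordinary Scala builder; the proposed parallel builder would instead accumulate combinable per-split chunks. None of these names are from the actual design:

```scala
import java.util.stream.Stream

// Hypothetical illustration: funnel a Java stream's elements into a Scala
// builder, sequentially. A parallel collect would need a combiner step.
def toScalaVector[A](s: Stream[A]): Vector[A] = {
  val b = Vector.newBuilder[A]
  s.forEach(a => { b += a; () })  // sequential drain
  b.result()
}

val v = toScalaVector(Stream.of("cod", "haddock"))
```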

(5) I am trying to make the value class actually a value class and not a wrapper class, but I would also like specialization to handle the annoying distinctions between the different specialized Java stream types.  I have yet to come up with a scheme that can use a value class and get full benefit of specialization (nor have I fully worked out how to make specialization work).  I think if I can only have one, transparent specialization is the more important.  (But I am not yet positive I can get that to work well enough even alone.)

Comments are welcome.  My goal is to make the Java 8 compatibility layer in Scala the best place to use Java 8 Streams (better than Java), and to make it as easy and performant as possible to switch back and forth between Scala collections and Java 8 streams.

  --Rex

Sébastien Doeraene

Feb 24, 2015, 6:18:32 AM
to scala-internals
Hi,

Here are a couple concerns for Scala.js.

Please use different methods to differentiate sequential from parallel, instead of a boolean parameter. A boolean parameter is a very bad thing for Scala.js, because it cannot decide to eliminate the parallel version. The result will be that the program will not link, because there is no parallelism in Scala.js. With different methods, the reachability analysis of Scala.js can detect that you never call the parallel method, and hence that part of the codebase can be completely ignored, which allows the whole thing to link. (Even better, the parallel versions would be an implicit pimp, in a different artifact.)
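In code, the shape Sébastien is asking for might look like this (method names here are hypothetical, not the proposal's): two separate entry points, so dead-code elimination can discard the parallel one when it is never called.

```scala
import java.util.Spliterators
import java.util.stream.{Stream, StreamSupport}
import scala.collection.JavaConverters._

object SplitEntryPoints {
  implicit class SeqStreamOps[A](private val xs: Seq[A]) extends AnyVal {
    private def split = Spliterators.spliteratorUnknownSize(xs.iterator.asJava, 0)
    // Separate methods: Scala.js reachability analysis can drop parStream
    // entirely if no call site in the program ever uses it.
    def seqStream: Stream[A] = StreamSupport.stream(split, false)
    def parStream: Stream[A] = StreamSupport.stream(split, true)
  }
}

import SplitEntryPoints._
val c = List(1, 2, 3, 4).seqStream.count()
```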

With different methods, there's no need for a default. If there must be a default, it should either be sequential, or somehow Scala.js should be able to make it sequential. I am not even sure parallel is the right default on the JVM anyway, but that is out of my area of expertise.

Cheers,
Sébastien




Simon Ochsenreither

Feb 24, 2015, 7:41:58 AM
to scala-i...@googlegroups.com

  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped java Stream created from that spliterator; it will take one argument which is whether the stream should be run in parallel or not (default value: it should be)

What's the reason for going with the Java naming in one case, but not in the other? That seems inconsistent.
(Parallel by default also seems like a questionable choice.)
What's the reasoning behind that wrapping?
 
(2) Java streams will have one extension method, called either extended or asScala, which wraps them in that same value class.  (I favor extended because it won't _really_ be asScala, at least in the usual sense.)

"extended" ... feels like a very non-descriptive name to me.
 
(3) The value class will contain an asJava method to switch back to the plain unwrapped stream, but otherwise will present all the familiar Scala methods that you expect to find on a TraversableOnce.  I do not think I'll be able to make the value class fit into the hierarchy, but you'll have your full set of toIterator etc. methods if you really want to go fully into Scala-land.  I plan on implementing even really inefficient methods, but these will be gated by an implicit that you will have to import separately.

What will be the semantics behind this? What's the use-case? Having a Java collection and thinking "these transformations are too fast, let's make them an order of magnitude slower by converting them to Scala operations" ... (half joking, half serious).
 
Comments are welcome.  My goal is to make the Java 8 compatibility layer in Scala the best place to use Java 8 Streams (better than Java), and to make it as easy and performant as possible to switch back and forth between Scala collections and Java 8 streams.

Don't Scala collections and Java collections serve different use-cases? Scala collections are pleasant to use and Java collections are fast. Hosting one on top of the other will most likely only combine the drawbacks, not the advantages.

Rex Kerr

Feb 24, 2015, 12:38:53 PM
to scala-i...@googlegroups.com
On Tue, Feb 24, 2015 at 3:18 AM, Sébastien Doeraene <sjrdo...@gmail.com> wrote:
Hi,

Here are a couple concerns for Scala.js.

Please use different methods to differentiate sequential from parallel, instead of a boolean parameter. A boolean parameter is a very bad thing for Scala.js, because it cannot decide to eliminate the parallel version.

I had thought exactly the opposite!  Nothing in the Java interface promises any sort of parallelism; you're always entitled to run sequentially.  Isn't it easier for Scala.js to simply ignore the status of the flag and always run the sequential version?  (Not that forwarding pstreamed to streamed would be all that hard for anyone.)
 
The result will be that the program will not link, because there is no parallelism in Scala.js. With different methods, the reachability analysis of Scala.js can detect that you never call the parallel method, and hence that part of the codebase can be completely ignored, which allows the whole thing to link. (Even better, the parallel versions would be an implicit pimp, in a different artifact.)

I could do that, but I don't think it's necessary, because the parallelism is handled _entirely_ by the Java library, for which Scala.js would only need to supply a sequential version.  Also, couldn't you just elect not to support any of the Java 8 Streams stuff at all?  It _is_ specifically for Java interop.
 

With different methods, there's no need for a default. If there must be a default, it should either be sequential, or somehow Scala.js should be able to make it sequential. I am not even sure parallel is the right default on the JVM anyway, but that is out of my area of expertise.

I am not sure either, but the reasoning is as follows:

In Scala, iterators are as fast as sequential Java 8 streams for non-primitives (or primitives which must be boxed in Java 8), and are strictly more powerful.  So the primary use case is the parallel one.

  --Rex

Sébastien Doeraene

Feb 24, 2015, 12:47:08 PM
to scala-internals
Hi,

Oh, I didn't notice the Java API did not promise parallelism. In that case, I guess it doesn't matter to Scala.js, indeed. We'll just always return a sequential stream when we implement the underlying Java stream.

We could indeed choose not to support Java 8 Streams. If it is indeed only used for Java interop, that's what we'll do. (For example, we don't support Java collections.) Maybe I was misled by all the fuss about Java 8 Streams into thinking that suddenly everyone was going to start using them, including some important parts of Scala. If that's not the case, I'm happy not to support them. That's less work for me.

Thanks for your answer :-)

Cheers,
Sébastien

Rex Kerr

Feb 24, 2015, 12:49:34 PM
to scala-i...@googlegroups.com
On Tue, Feb 24, 2015 at 4:41 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped java Stream created from that spliterator; it will take one argument which is whether the stream should be run in parallel or not (default value: it should be)

What's the reason for going with the Java naming in one case, but not in the other?

In one case you get a plain java Spliterator.  In the other case, you don't.  Hence the different naming.
 
That seems inconsistent.
(Parallel by default also seems like a questionable choice.)

As I mentioned in my response to Sebastien, parallel seems like the primary use case from within Scala.
 
What's the reasoning behind that wrapping?

To enable papering over the differences between the various manually specialized implementations.  I'm not 100% sure it will work, but it would be much nicer to just write the method that you want and have it be fast than to have to remember that you'd better use mapToInt or your performance will go down by 10x.
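The trap being papered over can be shown directly on raw Java streams called from Scala: the generic `map` keeps elements boxed, and only the manually specialized `mapToInt` switches to the primitive `IntStream`. The proposed wrapper would pick the specialized variant automatically.

```scala
import java.util.stream.Stream

// Generic map: every length is boxed into a java.lang.Integer.
val boxed: Stream[Integer] =
  Stream.of("salmon", "herring", "perch").map[Integer](s => Integer.valueOf(s.length))

// Manually specialized map: an unboxed IntStream all the way through.
val total: Int =
  Stream.of("salmon", "herring", "perch").mapToInt(s => s.length).sum()
```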
 
 
(2) Java streams will have one extension method, called either extended or asScala, which wraps them in that same value class.  (I favor extended because it won't _really_ be asScala, at least in the usual sense.)

"extended" ... feels like a very non-descriptive name to me.

Agreed, that's why I am not sure about it.  Do you have any other ideas?
 
 
(3) The value class will contain an asJava method to switch back to the plain unwrapped stream, but otherwise will present all the familiar Scala methods that you expect to find on a TraversableOnce.  I do not think I'll be able to make the value class fit into the hierarchy, but you'll have your full set of toIterator etc. methods if you really want to go fully into Scala-land.  I plan on implementing even really inefficient methods, but these will be gated by an implicit that you will have to import separately.

What will be the semantics behind this? What's the use-case? Having a Java collection and thinking "these transformations are too fast, let's make them a magnitude slower by converting them to Scala operations" ... (half joking, half serious).

The use case is convenience.  Maybe it's really important to you to do a reverse.  You could do a .build(Vector.newBuilder) (or maybe a .to[Vector] if I can get inference and type classes to work well enough) followed by a .reverse followed by a .streamed again, but if you are sure you don't really care about the speed and just need to interop with Java code, you can use an import and unlock all the slow operations.  Also, you don't have to worry about the different details of what Java and Scala methods are called and which arguments they expect; you just write in Scala style and it works.  (I haven't benchmarked enough cases to tell how often an extra forwarding will mess up performance; preliminary tests seemed encouraging, but this could also sink the idea.)
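The round trip described above, spelled out with the conversions that exist today (the proposed wrapper would compress the middle steps into something like `.to[Vector].reverse.streamed`):

```scala
import java.util.stream.{Collectors, Stream}
import scala.collection.JavaConverters._

// Java stream -> Scala Vector -> reverse in Scala-land -> Java stream again.
val vec = Stream.of("salmon", "herring", "perch")
  .collect(Collectors.toList[String]())
  .asScala
  .toVector
val first = vec.reverse.asJava.stream().findFirst().get()
```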
 
 
Comments are welcome.  My goal is to make the Java 8 compatibility layer in Scala the best place to use Java 8 Streams (better than Java), and to make it as easy and performant as possible to switch back and forth between Scala collections and Java 8 streams.

Don't Scala collections and Java collections serve different use-cases? Scala collections are pleasant to use and Java collections are fast. Hosting one on top of the other will most likely only combine the drawbacks, not the advantages.

Java collections are only fast in parallel or when specialized.  Otherwise, the streams are no faster than and sometimes (because of the awkwardness of fully removing closures in tryAdvance) noticeably slower than Iterator while having only a small fraction of the capability.

But one can easily build a good fraction of the Scala capabilities on top of the Java ones and retain the parallelism advantage.

Still not 100% sure about making manual specialization look automatic, though.  If it ends up being unworkable, I'll probably favor different decisions.

  --Rex
 

Scott Carey

Feb 24, 2015, 2:51:50 PM
to scala-i...@googlegroups.com


On Tuesday, February 24, 2015 at 9:49:34 AM UTC-8, Rex Kerr wrote:


On Tue, Feb 24, 2015 at 4:41 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped java Stream created from that spliterator; it will take one argument which is whether the stream should be run in parallel or not (default value: it should be)

What's the reason for going with the Java naming in one case, but not in the other?

In one case you get a plain java Spliterator.  In the other case, you don't.  Hence the different naming.
 
That seems inconsistent.
(Parallel by default also seems like a questionable choice.)

As I mentioned in my response to Sebastien, parallel seems like the primary use case from within Scala.

Parallel is atrociously slower for most use cases I've run into.   You need a rather large collection for parallel to be worth it.  Most collections are small.  I feel that parallelism for java streams should be the same for java and scala -- not parallel by default, and explicitly requested when desired.  Deviation from what the defaults are between the two will only add extra information that a developer must keep in their head to use them.   Getting a stream from a java collection and one from a scala collection should be as similar as possible, in my opinion.

Rex Kerr

Feb 24, 2015, 4:47:26 PM
to scala-i...@googlegroups.com
On Tue, Feb 24, 2015 at 11:51 AM, Scott Carey <scott...@gmail.com> wrote:


On Tuesday, February 24, 2015 at 9:49:34 AM UTC-8, Rex Kerr wrote:


On Tue, Feb 24, 2015 at 4:41 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped java Stream created from that spliterator; it will take one argument which is whether the stream should be run in parallel or not (default value: it should be)

What's the reason for going with the Java naming in one case, but not in the other?

In one case you get a plain java Spliterator.  In the other case, you don't.  Hence the different naming.
 
That seems inconsistent.
(Parallel by default also seems like a questionable choice.)

As I mentioned in my response to Sebastien, parallel seems like the primary use case from within Scala.

Parallel is atrociously slower for most use cases I've run into.

Well, it's atrociously slower than sequential unless there's heavy lifting to be done.  However, it is difficult to find a case where Scala's parallel collections are faster than parallel Java streams.

But this raises an interop point that I probably haven't given enough consideration: there will be increasing numbers of Java libraries that expect streams as input, and we want to easily back those with Scala collections.

So, I think you're right: sequential should be the default, though parallel should be really easy to ask for (either as a parameter or separate method).
 
You need a rather large collection for parallel to be worth it.  Most collections are small.  I feel that parallelism for java streams should be the same for java and scala -- not parallel by default, and explicitly requested when desired.  Deviation from what the defaults are between the two will only add extra information that a developer must keep in their head to use them.   Getting a stream from a java collection and one from a scala collection should be as similar as possible, in my opinion.

Indeed, though I don't want to obscure important differences or throw away ease of use unnecessarily.

If I can't end up making the specialization transparent, then I'd go for just copying .stream, .parallelStream, and .spliterator as Java collections do.

  --Rex

Rex Kerr

Feb 24, 2015, 7:53:58 PM
to scala-i...@googlegroups.com
Incidentally, transparent specialization via value class wrapping and type classes is looking very promising syntactically.  For instance, I can already do something like

  List("salmon", "herring", "perch").streamed.map(_.length).asJava

and have a java.util.stream.IntStream out with no extra boxing of the Int values.

Whether this will preserve adequate performance is another issue.  Aggressive inlining from scalac could certainly do it, but in its current incarnation I'm a bit doubtful that scalac + JVM will be up to the task.

  --Rex

Simon Ochsenreither

Feb 24, 2015, 11:36:12 PM
to scala-i...@googlegroups.com
I'm not sure that trying to hook into these manually specialized classes isn't a waste of time. They were obsolete before they were even introduced. No idea what the Oracle developers were thinking when they added these classes.

First they let everyone deal with 15 years of boxed-everything, assuring users that they don't need anything better; and then they turn around and suddenly it's so important that they can't wait the additional 3/4 years until specialization ships ...

Rex Kerr

Feb 25, 2015, 2:35:14 AM
to scala-i...@googlegroups.com


On Tuesday, February 24, 2015, Simon Ochsenreither <simon.och...@gmail.com> wrote:
> I'm not sure that trying to hook into these manually specialized classes isn't a waste of time. They were obsolete before they were even introduced. No idea what the Oracle developers were thinking when they added these classes.

Maybe they thought that they would make a whole bunch of people think that streams are fast. You have said so on several occasions. The only time when they are impressively fast in a sequential context is when they are specialized.

So as marketing it has already worked. Some people will presumably rely on that speed also (when converting from indexed arrays).

 --Rex


>
> First they let everyone deal with 15 years of boxed-everything, assuring users that they don't need anything better; and then they turn around and suddenly it's so important that they can't wait the additional 3/4 years until specialization ships ...
>

Rex Kerr

Feb 25, 2015, 5:26:43 AM
to scala-i...@googlegroups.com
On Tue, Feb 24, 2015 at 4:53 PM, Rex Kerr <ich...@gmail.com> wrote:
Incidentally, transparent specialization via value class wrapping and type classes is looking very promising syntactically.  For instance, I can already do something like

  List("salmon", "herring", "perch").streamed.map(_.length).asJava

and have a java.util.stream.IntStream out with no extra boxing of the Int values.

Whether this will preserve adequate performance is another issue.  Aggressive inlining from scalac could certainly do it, but in its current incarnation I'm a bit doubtful that scalac + JVM will be up to the task.

Actually, preliminary testing indicates a pretty minor performance hit for hiding the manual specialization: under 1 ns / element for long streams, and maybe 10-15 ns added to creating a stream in the first place.  But that works out to under a 10% and 20% performance penalty respectively, which seems quite reasonable to me.  (Then again, my test shows a 3x penalty for using streams vs. just iterating on an array, so maybe it wouldn't be so tolerable if there are cases where streams have a more negligible penalty.)

  --Rex


Dmitry Petrashko

Feb 25, 2015, 7:02:49 AM
to scala-i...@googlegroups.com


On Tuesday, 24 February 2015 20:51:50 UTC+1, Scott Carey wrote:


On Tuesday, February 24, 2015 at 9:49:34 AM UTC-8, Rex Kerr wrote:


On Tue, Feb 24, 2015 at 4:41 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped java Stream created from that spliterator; it will take one argument which is whether the stream should be run in parallel or not (default value: it should be)

What's the reason for going with the Java naming in one case, but not in the other?

In one case you get a plain java Spliterator.  In the other case, you don't.  Hence the different naming.
 
That seems inconsistent.
(Parallel by default also seems like a questionable choice.)

As I mentioned in my response to Sebastien, parallel seems like the primary use case from within Scala.

Parallel is atrociously slower for most use cases I've run into.   You need a rather large collection for parallel to be worth it.  Most collections are small.  I feel that parallelism for java streams should be the same for java and scala -- not parallel by default, and explicitly requested when desired.  Deviation from what the defaults are between the two will only add extra information that a developer must keep in their head to use them.   Getting a stream from a java collection and one from a scala collection should be as similar as possible, in my opinion.
This is indeed true if the overhead that parallelism introduces is high.
Though we already have examples showing that a careful implementation can reduce it substantially: ScalaBlitz is slower than sequential collections only when the operation is extremely fast (e.g. the sum of unboxed integers) and the collection has fewer than 10 elements.

Dmitry Petrashko

Feb 25, 2015, 7:07:36 AM
to scala-i...@googlegroups.com


On Tuesday, 24 February 2015 22:47:26 UTC+1, Rex Kerr wrote:


On Tue, Feb 24, 2015 at 11:51 AM, Scott Carey <scott...@gmail.com> wrote:


On Tuesday, February 24, 2015 at 9:49:34 AM UTC-8, Rex Kerr wrote:


On Tue, Feb 24, 2015 at 4:41 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

  1. spliterator, which will return an ordinary java.util.Spliterator
  2. streamed, which will return a value-class-wrapped java Stream created from that spliterator; it will take one argument which is whether the stream should be run in parallel or not (default value: it should be)

What's the reason for going with the Java naming in one case, but not in the other?

In one case you get a plain java Spliterator.  In the other case, you don't.  Hence the different naming.
 
That seems inconsistent.
(Parallel by default also seems like a questionable choice.)

As I mentioned in my response to Sebastien, parallel seems like the primary use case from within Scala.

Parallel is atrociously slower for most use cases I've run into.

Well, it's atrociously slower than sequential unless you have heavy lifting needed.  However, it is difficult to find a case where Scala's parallel collections are faster than parallel Java streams.
Hi Rex. Yes, Scala parallel collections are known to have a lot of performance problems, most of which are fixed in ScalaBlitz.
Do you have a set of automated benchmarks? If so, I'd volunteer to add ScalaBlitz to the comparison.

But this raises an interop point that I probably haven't given enough consideration: there will be increasing numbers of Java libraries that expect streams as input, and we want to easily back those with Scala collections.

So, I think you're right: sequential should be the default, though parallel should be really easy to ask for (either as a parameter or separate method).
 
You need a rather large collection for parallel to be worth it.  Most collections are small.  I feel that parallelism for java streams should be the same for java and scala -- not parallel by default, and explicitly requested when desired.  Deviation from what the defaults are between the two will only add extra information that a developer must keep in their head to use them.   Getting a stream from a java collection and one from a scala collection should be as similar as possible, in my opinion.

Indeed, though I don't want to obscure important differences or throw away ease of use unnecessarily.

If I can't end up making the specialization transparent, then I'd go for just copying .stream, .parallelStream, and .spliterator as Java collections do.

Also, maybe it would be good to know that ScalaBlitz has spliterators that, at least as of a year ago, were faster than Java's, were specialized, and were written in Scala (plus macros, to be honest).
So maybe, instead of doing the specialization by hand, you can use the one that we've implemented in ScalaBlitz?


Cheers,
Dmitry

  --Rex

Rex Kerr

Feb 25, 2015, 3:17:37 PM
to scala-i...@googlegroups.com
Hi Dmitry,

I've been meaning to add ScalaBlitz to my tests, as it's really an improvement (from what I can see) over Scala parallel collections.

The reason I haven't is that I'm mostly focusing on providing good interoperability with Java 8 Streams.  I'm not trying to say that Java 8 Streams are the best possible framework for either sequential or parallel processing of collections, just that a lot of people will use them because of Java 8, and they improve upon the Scala parallel collections.

The reason for manual specialization is purely for Java 8 interop; otherwise I'd just use Scala specialization.  One has to write a bunch of manual glue to link up Scala specialization with Java 8 manually specialized cases.  There aren't so many cases in Java that it's worth writing macros (esp. given potential bugs with e.g. hygiene, and the potential for less informative error messages).

I'll take a look at the ScalaBlitz spliterators, though.  Can you fix the link to the API on the website?  Right now it points to a stub of a web page for Aleksandar Prokopec.  I'll get in touch with you if it is not obvious how to extend my benchmarks to ScalaBlitz, and I'll let you know when I've added them if it is obvious how to do it.

Also, we should perhaps discuss draft implementations more here or elsewhere, because it would be better if I do not make arbitrary decisions that end up making it more awkward to use ScalaBlitz.

Thanks!

  --Rex


Scott Carey

Feb 25, 2015, 4:28:31 PM
to scala-i...@googlegroups.com


On Tuesday, February 24, 2015 at 8:36:12 PM UTC-8, Simon Ochsenreither wrote:
I'm not sure that trying to hook into these manually specialized classes isn't a waste of time. They were obsolete before they were even introduced. No idea what the Oracle developers were thinking when they added these classes.

In my legacy Java codebase, I already use Java 8 streams in many critical places, and the lack of boxing is a huge performance gain in some.  Java 10 is a LONG ways away.   Not as long away as replacing 400,000 lines of Java code with Scala or something else better, but a long ways away.  I expect to be living in a mixed Scala/Java world for a long time.  I'd like mixing the two to be as pleasant and productive as possible.

Rex Kerr

Feb 25, 2015, 5:13:01 PM
to scala-i...@googlegroups.com
On Wed, Feb 25, 2015 at 4:07 AM, Dmitry Petrashko <darkd...@gmail.com> wrote:

Also, maybe it would be good to know that ScalaBlitz has spliterators that, at least as of a year ago, were faster than Java's, were specialized, and were written in Scala (plus macros, to be honest).

The "Stealers" in ScalaBlitz are specialized but don't implement any functionality, and don't really fit that well into the Java framework: lots of methods of no help to the Java Spliterators, and Java Spliterators require state flags that cannot be inferred from the methods in Stealer.
 
So maybe, instead of doing the specialization by hand, you can use the one that we've implemented in ScalaBlitz?

Doesn't look like it would make things much easier; use cases are too different.

Also, the coverage of the Scala library isn't very extensive.

It's kind of a shame to maintain two ways of doing essentially the same thing, but I assume that the Java guys need what Spliterators have got, and ScalaBlitz machinery needs what Stealers have got, and it's pretty clunky to combine the two.

I'll think about whether it's practical to construct a unified core that can then be decorated as either a Stealer or a Spliterator.

  --Rex
 

Dmitry Petrashko

Feb 25, 2015, 6:21:36 PM
to scala-i...@googlegroups.com
Hi Rex,

On Wednesday, 25 February 2015 23:13:01 UTC+1, Rex Kerr wrote:


On Wed, Feb 25, 2015 at 4:07 AM, Dmitry Petrashko <darkd...@gmail.com> wrote:

Also, maybe it would be good to know that ScalaBlitz has spliterators that, at least as of a year ago, were faster than Java's, were specialized, and were written in Scala (plus macros, to be honest).

The "Stealers" in ScalaBlitz are specialized but don't implement any functionality, and don't really fit that well into the Java framework: lots of methods of no help to the Java Spliterators, and Java Spliterators require state flags that cannot be inferred from the methods in Stealer.
I assume you mean `characteristics`. They can be easily inferred, as all stealers are collection-specific and know the value of every flag from the underlying collection (i.e. whether the collection is mutable or sorted).

Speaking of methods, all Spliterator methods are implementable in terms of Stealer (the only tricky one is getExactSizeIfKnown; to implement it, one would need to check whether the current stealer is a PreciseStealer). But not all Stealer methods can be efficiently implemented in terms of Spliterator.
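The easy direction of that claim can be sketched: `tryAdvance` built from a hasNext/next pair. `SimpleStealer` below is a simplified stand-in, not the actual ScalaBlitz Stealer trait (which also has `nextBatch`, splitting, and so on).

```scala
import java.util.Spliterator
import java.util.function.Consumer
import java.util.stream.StreamSupport

trait SimpleStealer[A] { def hasNext: Boolean; def next(): A }

final class StealerSpliterator[A](st: SimpleStealer[A]) extends Spliterator[A] {
  def tryAdvance(action: Consumer[_ >: A]): Boolean =
    if (st.hasNext) { action.accept(st.next()); true } else false
  def trySplit(): Spliterator[A] = null      // no splitting in this sketch
  def estimateSize(): Long = Long.MaxValue   // size unknown
  def characteristics(): Int = 0
}

// Drive it with a stealer backed by a plain Scala iterator.
val it = Iterator("a", "b", "c")
val st = new SimpleStealer[String] { def hasNext = it.hasNext; def next() = it.next() }
val count = StreamSupport.stream(new StealerSpliterator(st), false).count()
```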
 
So maybe, instead of doing the specialization by hand, you can use the one that we've implemented in ScalaBlitz?

Doesn't look like it would make things much easier; use cases are too different.
Indeed, ScalaBlitz wasn't targeted at interop with Java Streams. But it seems that the Oracle guys either followed what Alex was working on or even had direct discussions with him. Or the same ideas were floating in the air :-). So implementing one in terms of the other is not that hard. ScalaBlitz here has the advantage of being both transparent for Scala and more performant on small collections.

Also, the coverage of the Scala library isn't very extensive.
AFAIK, the only commonly used collection that isn't covered is Vector (arrays and ranges are covered, as are both mutable and immutable maps and sets). It wouldn't be too hard to implement spliterators for other collections like stacks and buffers.
While this indeed isn't full coverage, it's a wonderful starting ground.

It's kind of a shame to maintain two ways of doing essentially the same thing, but I assume that the Java guys need what Spliterators have got, and ScalaBlitz machinery needs what Stealers have got, and it's pretty clunky to combine the two.

I'll think about whether it's practical to construct a unified core that can then be decorated as either a Stealer or a Spliterator.
Most of the methods in Spliterator are there for a reason, though most of those reasons are performance-oriented.

I believe that the methods in Stealer are a lot better thought out, thanks to the substantial experience that Alex has in this area.

Just to illustrate: instead of having `tryAdvance` as Spliterators do, Stealers have three methods, `nextBatch`, `next`, and `hasNext`. The advantage is that, because `tryAdvance` is effectively higher-order, it is a lot harder to generate optimized code for it, as one would have a huge megamorphic dispatch inside.
This is an example that illustrates the "not all Stealer methods can be efficiently implemented in terms of Spliterator" claim, and I believe it is a good argument to go with Stealers for Scala, as they are better than Spliterator and will also be enough for Java interop.

If you are up to it, I'd propose having a Skype meeting at some point; I believe we could have a good discussion and settle on this minimal core.


  --Rex
 

Cheers,
Dmitry

Rex Kerr

Feb 25, 2015, 8:12:53 PM
to scala-i...@googlegroups.com
On Wed, Feb 25, 2015 at 3:21 PM, Dmitry Petrashko <darkd...@gmail.com> wrote:
Hi Rex,
On Wednesday, 25 February 2015 23:13:01 UTC+1, Rex Kerr wrote:


On Wed, Feb 25, 2015 at 4:07 AM, Dmitry Petrashko <darkd...@gmail.com> wrote:

Also, maybe it would be good to know that ScalaBlitz has spliterators that, at least as of a year ago, were faster than Java's, were specialized, and were written in Scala (+ macros, to be honest).

The "Stealers" in ScalaBlitz are specialized but don't implement any functionality, and don't really fit that well into the Java framework: lots of methods are of no help to the Java Spliterators, and Java Spliterators require state flags that cannot be inferred from the methods in Stealer.
I assume you mean `characteristics`. They can be easily inferred, as all stealers are collection-specific and know the value of every flag from the underlying collection (i.e. whether the collection is mutable or sorted).

I meant can't be inferred with essentially zero cost. You can compute each flag, at least with the help of instanceof, but you either have to do it once and store it (adding 0 or 8 bytes to every instance, depending on memory packing in that class) or compute every flag each time any flag is asked for. Otherwise you can often just make it a static value of the class.
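To make the "static value of the class" option concrete, here is a minimal sketch (hypothetical, in Java — the class name and layout are illustrative, not from either library) of a primitive Spliterator whose characteristics are a compile-time constant, so querying any flag costs nothing per instance:

```java
import java.util.Spliterator;
import java.util.function.IntConsumer;

// Hypothetical sketch: a Spliterator over an int array slice whose
// characteristics flags are a static constant, neither recomputed on each
// call nor stored per instance.
final class IntArraySpliterator implements Spliterator.OfInt {
    // Flags are known statically for this collection type: no per-instance
    // field, no instanceof checks at query time.
    private static final int CHARS =
        Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED | Spliterator.IMMUTABLE;

    private final int[] a;
    private int i;          // current index
    private final int end;  // one past the last index

    IntArraySpliterator(int[] a, int i, int end) { this.a = a; this.i = i; this.end = end; }

    @Override public int characteristics() { return CHARS; }
    @Override public long estimateSize() { return end - i; }

    @Override public boolean tryAdvance(IntConsumer action) {
        if (i < end) { action.accept(a[i++]); return true; }
        return false;
    }

    @Override public Spliterator.OfInt trySplit() {
        int mid = (i + end) >>> 1;
        if (mid <= i) return null;  // too small to split
        Spliterator.OfInt prefix = new IntArraySpliterator(a, i, mid);
        i = mid;                    // per the trySplit contract, this one keeps the suffix
        return prefix;
    }
}
```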
 

Speaking of methods, all Spliterator methods are implementable in terms of Stealer (the only tricky one is getExactSizeIfKnown; to implement it, one would need to check whether the current stealer is a PreciseStealer). But not all Stealer methods can be efficiently implemented in terms of Spliterator.

Mostly that's because you need to use tryAdvance to push an element into a cache.  Everything else seems to be extra functionality that you can layer on top easily enough.  But I agree: it's hard to do as efficiently.

Note that this only is true if the implementation of Stealer takes care not to duplicate work (at least not in a way that can't be elided by the JIT compiler) between hasNext and next.  Now that I've written several dozen Spliterators, I can say that not having to worry about this makes it quite a bit easier compared to Iterator.
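A minimal sketch of the duplicated-work hazard (names hypothetical): over an array with null gaps, as in an open-addressing hash table, hasNext must scan forward and cache what it found so that next() doesn't redo the scan, whereas a tryAdvance-style method folds the scan and the consumption into one pass with no cached state:

```java
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical illustration: iterating an array with null gaps.
final class GappyIterator {
    private final String[] slots;
    private int i = 0;
    private String cached = null;  // lookahead found by hasNext

    GappyIterator(String[] slots) { this.slots = slots; }

    // hasNext/next style: the forward scan's result must be cached so that
    // next() doesn't repeat it -- a side effect the JIT may fail to elide.
    boolean hasNext() {
        while (cached == null && i < slots.length) cached = slots[i++];
        return cached != null;
    }

    String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String v = cached;
        cached = null;  // side effect: invalidate the cache
        return v;
    }

    // tryAdvance style: seek and consume in a single pass, no cache needed.
    boolean tryAdvance(Consumer<String> action) {
        if (cached != null) { String v = cached; cached = null; action.accept(v); return true; }
        while (i < slots.length) {
            String v = slots[i++];
            if (v != null) { action.accept(v); return true; }
        }
        return false;
    }
}
```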
 
 
So maybe, instead of doing specialization by hand, you could use the one that we've implemented in ScalaBlitz?

Doesn't look like it would make things much easier; use cases are too different.
Indeed, ScalaBlitz wasn't targeted at interop with Java Streams. But it seems that the Oracle guys either followed what Alex was working on or even had direct discussions with him. Or the same ideas were floating in the air :-). So implementing one in terms of the other is not that hard. ScalaBlitz here has the advantage of being both transparent for Scala and more performant in the case of small collections.

It's not surprising that they're more performant, as they're lower-level.  Good inlining should remove most/all of the difference in most cases (or even make it faster, as the result of hasNext never needs to be stored in the class, removing a side-effect that often can't be neglected).  But right now I agree it's inadequate a lot of the time.
 

Also, the coverage of the Scala library isn't very extensive.
AFAIK, the only commonly used collection that isn't covered is Vector (arrays and ranges are covered; both mutable and immutable maps and sets are covered). It wouldn't be too hard to implement spliterators for other collections like stacks and buffers.
While this indeed isn't full coverage, it's a wonderful starting ground.

ArrayBuffer gets a non-negligible amount of use, but in any case it's a nice usable subset.  The issue is more whether all the extra implementation details should get imported into the Java 8 compatibility layer, or whether it should be more minimal.
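As a hedged aside on coverage: for collections without a bespoke implementation, the JDK helpers can already derive a workable (if not optimally splittable) Spliterator from any sized Iterator, so filling the remaining gaps is cheap. A sketch (class and method names are illustrative):

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.StreamSupport;

// Hypothetical fallback: wrap any sized Iterator. The JDK helper splits by
// batching elements into arrays -- workable, though not as good as a
// collection-specific trySplit.
final class FallbackSpliterator {
    static <T> Spliterator<T> of(Iterator<T> it, long size) {
        return Spliterators.spliterator(it, size, Spliterator.ORDERED);
    }

    // Example use: drive a sequential Stream off the fallback spliterator.
    static long count(Iterator<Integer> it, long size) {
        return StreamSupport.stream(of(it, size), false).count();
    }
}
```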

The merging bits are much trickier and if _those_ could be reused it would be a wonderful starting ground.  I can't tell right now how separable the merging logic is from the scheduling logic.  My approach has been to just build everything sequentially, threading a builder through the head-most collector and using arrays to buffer the results in the other places until the builder reaches them.
 

It's kind of a shame to maintain two ways of doing essentially the same thing, but I assume that the Java guys need what Spliterators have got, and ScalaBlitz machinery needs what Stealers have got, and it's pretty clunky to combine the two.

I'll think about whether it's practical to construct a unified core that can then be decorated as either a Stealer or a Spliterator.
Most of the methods in Spliterator are there for a reason, though most of those reasons are performance-oriented.

I believe that methods in Spliterator are a lot better thought-out, thanks to substantial experience that Alex has in this area,

You must mean Stealer.
 

just to illustrate: instead of having `tryAdvance` as spliterators do, stealers have three methods: `nextBatch`, `next` & `hasNext`. The advantage here is that, due to `tryAdvance` being effectively higher-order, it is a lot harder to generate optimized code for it, as one would have a huge megamorphic dispatch inside.

Indeed, but if one can inline and avoid that megamorphic dispatch, you can save on logic needed to push results between the various methods.  So for now, I completely agree.  But it might not remain true if the JVM gets better at recognizing megamorphic dispatch and trying to specialize its way out of it by inlining where it normally wouldn't bother.  (The compiler may also be able to do some of this.)

Also, nextBatch is a prime example of a potentially problematic method: it mixes the concerns of simply getting more elements with the particular work-stealing scheme ScalaBlitz uses.  I'm not saying it's not a good scheme.  But now all the logic has to think about CAS blocks and--as implemented now--pad the collections with dummy variables so the unsafe access will hit the right part in memory.  (Obeys DRY, but you have to say a lot more the first time.)
 
This is an example that illustrates the "not all Stealer methods can be efficiently implemented in terms of Spliterator" claim. And I believe that this is a good argument to go with Stealers for Scala, as they are better than Spliterator and will also be enough for Java interop.

Stealers are better in some ways, but they have a different separation of concerns than Spliterators do.  You can think about Spliterators without thinking about concurrent anything: it's the job of the code that uses Spliterators to only do sane things.  Stealers, right now anyway, have the concurrent stealing logic mixed in with the let's-visit-the-elements logic.

If those can be kept separate, I think it'd be a much clearer decision.  Otherwise, in order to support Spliterators with any particular collection, you must also think hard about whether it fits the ScalaBlitz concurrent work-stealing model.  (The part I most fundamentally don't understand is why the Stealers steal work.  Why not instead have Stealables with simple logic that get stolen by whoever isn't busy?)
 

If you are up to it, I'd propose having a Skype meeting at some point; I believe we could have a good discussion and settle on this minimal core.

Probably a good idea eventually, but as I'm less familiar with ScalaBlitz than I'd like to be, and because someone else might want to chime in, I'd like to go another round or two on email.  I have a feeling that a Skype meeting would very quickly turn into me saying, "that sounds reasonable, but I want to check the details" over and over.

  --Rex
 

Haoyi Li

Feb 25, 2015, 8:24:17 PM
to scala-internals
What's the status of scala-blitz right now? Last I saw, the mailing list has had two questions in the last forever (the last one unanswered) and the last commit was 7 months ago. Are people using it? As an ignorant outsider, social proof (XXX person is using it for YYY) is much easier to understand than technical reasoning =P

--

Dmitry Petrashko

Feb 26, 2015, 4:09:04 AM
to scala-i...@googlegroups.com
ScalaBlitz could be thought of as two projects in one:
  • parallel operations (`toPar`) for collections, with small overhead and good scheduling
  • `optimize`, which removes the overhead of sequential operations without rewriting code

While I would say that the first one is stable, `optimize` has several limitations that make it hard to proceed:

  • most collections are boxed; until this overhead is removed there's no win in optimizing HashSet.filter, as most of the overhead is unboxing-reboxing
  • in order to make some decisions, non-local knowledge is required (e.g. for fusion), and this promotes the bad practice of putting all methods inside optimize{}, i.e. in a place where they aren't reusable anymore.
I've been contacted by several users of `toPar`, though `optimize` seems to get substantially more attention.

Currently, I'm working on Dotty, with the end goal of making whatever `optimize` was doing apply to all programs, even across separate compilation. That's one of the reasons I joined Eugene's proposal on TASTY. TASTY also opens huge possibilities for removing boxing inside collections.

So, to summarize: I would say that `toPar` is indeed well thought out and is a good thing to build on, but `optimize` has some weak spots that require a huge amount of work as a compiler plugin, similar to what miniboxing is doing now. In Dotty this is simpler to implement and feels more natural, as `optimize` relies on blackbox macros, which are in a somewhat interesting state for me: AFAIK they are out of scope for current scala.meta.

Cheers,
Dmitry

Rex Kerr

Feb 26, 2015, 5:11:23 AM
to scala-i...@googlegroups.com
On Wed, Feb 25, 2015 at 3:21 PM, Dmitry Petrashko <darkd...@gmail.com> wrote:
just to illustrate: instead of having `tryAdvance` as spliterators do, stealers have three methods: `nextBatch`, `next` & `hasNext`. The advantage here is that, due to `tryAdvance` being effectively higher-order, it is a lot harder to generate optimized code for it, as one would have a huge megamorphic dispatch inside.

I've just finished some exploratory performance testing of different iterator-like frameworks.  They're all rather vulnerable to megamorphic dispatch problems.  But it seems as though the hasNext/next form is the best overall, even if you can't really count on it.  (Sometimes it's as good as a while loop, but it can eventually get megamorphized down to 20x worse performance.)

Some implementations are a bit simpler if you go backwards: get the value first, then detect whether it was an actual value or if you finished instead.  The performance is sometimes even better.  But the difference is never so clear as to make it worth the different post-hoc way of doing things, and it only works if it is deployed optimistically (e.g. map won't check whether it should apply, it just does it, and then you have to check if the result is meaningful) which is difficult/surprising to reason about.
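The "backwards" protocol above might look like the following sketch (hypothetical names; not code from either library). Note that because any int is a legal value, the placeholder returned after exhaustion cannot be a sentinel, so validity has to be tracked separately:

```java
// Hypothetical sketch of the fetch-first, check-after protocol: fetch()
// always returns something, and the caller asks afterwards whether that
// something was a real element. Downstream stages must run optimistically
// and discard meaningless results.
final class OptimisticSource {
    private final int[] a;
    private int i = 0;
    private boolean valid = false;  // did the last fetch produce a real value?

    OptimisticSource(int[] a) { this.a = a; }

    int fetch() {
        if (i < a.length) { valid = true; return a[i++]; }
        valid = false;
        return 0;  // arbitrary placeholder, NOT a sentinel: 0 is a legal element
    }

    boolean wasValid() { return valid; }
}
```

The driving loop then fetches before it checks: `int v = s.fetch(); while (s.wasValid()) { use(v); v = s.fetch(); }` — which is exactly the post-hoc style that is awkward to reason about.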

Most importantly, hasNext/next usually does as well at implementing a tryAdvance-like method as does direct creation of the method.  So I'm tentatively on board with a "Stealer"-type splitter (i.e. hasNext/next not tryAdvance semantics).

But all the concurrency logic still bothers me.

Also, I don't see how it could possibly be more efficient to use Stealer.split than Spliterator.trySplit.  Maybe a couple extra object creations are insignificant (probably so).

So I think there could be a common core, but I'm still doubtful that it's just stealing the Stealers from ScalaBlitz.

  --Rex

Dmitry Petrashko

Feb 26, 2015, 5:58:01 AM
to scala-i...@googlegroups.com


On Thursday, 26 February 2015 11:11:23 UTC+1, Rex Kerr wrote:
On Wed, Feb 25, 2015 at 3:21 PM, Dmitry Petrashko <darkd...@gmail.com> wrote:
just to illustrate: instead of having `tryAdvance` as spliterators do, stealers have three methods: `nextBatch`, `next` & `hasNext`. The advantage here is that, due to `tryAdvance` being effectively higher-order, it is a lot harder to generate optimized code for it, as one would have a huge megamorphic dispatch inside.

I've just finished some exploratory performance testing of different iterator-like frameworks.  They're all rather vulnerable to megamorphic dispatch problems.  But it seems as though the hasNext/next form is the best overall, even if you can't really count on it.  (Sometimes it's as good as a while loop, but it can eventually get megamorphized down to 20x worse performance.)

Some implementations are a bit simpler if you go backwards: get the value first, then detect whether it was an actual value or if you finished instead.  The performance is sometimes even better.  But the difference is never so clear as to make it worth the different post-hoc way of doing things, and it only works if it is deployed optimistically (e.g. map won't check whether it should apply, it just does it, and then you have to check if the result is meaningful) which is difficult/surprising to reason about.
It's also a bit harder if primitives are considered, as they effectively ban all `sentinel` values. So this post-checking is in most cases either some computation (wasNext?) or a field that takes memory.

Most importantly, hasNext/next usually does as well at implementing a tryAdvance-like method as does direct creation of the method.  So I'm tentatively on board with a "Stealer"-type splitter (i.e. hasNext/next not tryAdvance semantics).

But all the concurrency logic still bothers me.

Also, I don't see how it could possibly be more efficient to use Stealer.split than Spliterator.trySplit.  Maybe a couple extra object creations are insignificant (probably so).
 
Here the difference indeed lies in scheduling.  Stealers form a tree that is used to schedule computations. Note that in the hardest examples of irregular workload we haven't seen a tree with more than 170 nodes (including leaves).

Unlike this, Spliterator.trySplit effectively maintains only the leaves of this tree, which makes efficient work-stealing and scheduling a lot harder, as you miss the inner `summary` nodes. These summary nodes help join computations for new threads and help gather the result at the end of the operation.

There's also another slight problem that I see with scheduling in `trySplit`'s model. One would want to have non-blocking scheduling. Then the question becomes what happens if multiple threads want to `split` the node. To be more precise, in the Spliterator model, what happens at the moment when somebody has already succeeded in `trySplit`ting the node, but hasn't yet succeeded in publishing the returned value in the data structure used for scheduling? It seems to me that there's no lock freedom here, as a thread could get suspended forever at this moment.

Unlike this, in the ScalaBlitz scheme `split`ting the same node can be done several times, and results are published to the underlying work-stealing tree in a first-to-publish-wins manner, so everyone makes progress without locks.
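A minimal sketch of first-to-publish-wins publication (hypothetical and heavily simplified, not the actual ScalaBlitz code): several threads may each compute a split of the same node, but only the first compareAndSet installs its result; the losers discard their speculative work, read the winner's children, and move on, so no thread ever blocks on another thread's suspension:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical tree node whose children are published race-free.
final class SplitNode {
    final AtomicReference<SplitNode[]> children = new AtomicReference<>(null);

    SplitNode[] splitAndPublish(Supplier<SplitNode[]> doSplit) {
        SplitNode[] existing = children.get();
        if (existing != null) return existing;           // someone already won
        SplitNode[] mine = doSplit.get();                // speculative split
        if (children.compareAndSet(null, mine)) return mine;  // we won the race
        return children.get();                           // a racer won; use theirs
    }
}
```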


So I think there could be a common core, but I'm still doubtful that it's just stealing the Stealers from ScalaBlitz.

  --Rex


Cheers,
Dmitry

Rex Kerr

Feb 26, 2015, 6:34:30 AM
to scala-i...@googlegroups.com
On Thu, Feb 26, 2015 at 2:58 AM, Dmitry Petrashko <darkd...@gmail.com> wrote:


On Thursday, 26 February 2015 11:11:23 UTC+1, Rex Kerr wrote:
On Wed, Feb 25, 2015 at 3:21 PM, Dmitry Petrashko <darkd...@gmail.com> wrote:
just to illustrate: instead of having `tryAdvance` as spliterators do, stealers have three methods: `nextBatch`, `next` & `hasNext`. The advantage here is that, due to `tryAdvance` being effectively higher-order, it is a lot harder to generate optimized code for it, as one would have a huge megamorphic dispatch inside.

I've just finished some exploratory performance testing of different iterator-like frameworks.  They're all rather vulnerable to megamorphic dispatch problems.  But it seems as though the hasNext/next form is the best overall, even if you can't really count on it.  (Sometimes it's as good as a while loop, but it can eventually get megamorphized down to 20x worse performance.)

Some implementations are a bit simpler if you go backwards: get the value first, then detect whether it was an actual value or if you finished instead.  The performance is sometimes even better.  But the difference is never so clear as to make it worth the different post-hoc way of doing things, and it only works if it is deployed optimistically (e.g. map won't check whether it should apply, it just does it, and then you have to check if the result is meaningful) which is difficult/surprising to reason about.
It's also a bit harder if primitives are considered, as they effectively ban all `sentinel` values. So this post-checking is in most cases either some computation (wasNext?) or a field that takes memory.

Indeed.

However, I just tested a real case (Scala List) with an actual java.util.Spliterator instead of my mocked-up minimal interface for testing, and the results were not encouraging: a 60% penalty for going through the extra hasNext/next layer (or 30%, depending on how much dispatch had to be handled).

If you are required to implement a forNext method that the spliterator forwards to, the penalty is less (~15%) in this particular case (if you actually supply it).  If it actually _is_ a Spliterator also with its own implementation there's no penalty, but there's also little to no advantage in code duplication over just having two different frameworks.

But that brings up another point.  I like benchmarking as much as anyone, but I'm not sure I really feel like benchmarking every particular case just to pin down the difference in performance between just-a-Spliterator and a Stealer.  I'll do another two or three, I guess, to see what it looks like.


Most importantly, hasNext/next usually does as well at implementing a tryAdvance-like method as does direct creation of the method.  So I'm tentatively on board with a "Stealer"-type splitter (i.e. hasNext/next not tryAdvance semantics).

But all the concurrency logic still bothers me.

Also, I don't see how it could possibly be more efficient to use Stealer.split than Spliterator.trySplit.  Maybe a couple extra object creations are insignificant (probably so).
 
Here the difference indeed lies in scheduling.  Stealers form a tree that is used to schedule computations. Note that in the hardest examples of irregular workload we haven't seen a tree with more than 170 nodes (including leaves).

Unlike this, Spliterator.trySplit effectively maintains only the leaves of this tree, which makes efficient work-stealing and scheduling a lot harder, as you miss the inner `summary` nodes. These summary nodes help join computations for new threads and help gather the result at the end of the operation.

Fair enough.  You can push the work to the Stealer instead of keeping track yourself.
 

There's also another slight problem that I see with scheduling in `trySplit`'s model. One would want to have non-blocking scheduling. Then the question becomes what happens if multiple threads want to `split` the node.

I haven't actually looked at the Java Stream code in depth, but I'd tend to pass Spliterators around atomically.  You are non-blocking because the Spliterator is just passive data.  Use a ring buffer or something to toss them around.
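That scheme might be sketched as follows (hypothetical; a ConcurrentLinkedQueue standing in for the ring buffer, and only two fixed workers for brevity). The Spliterators are passive data: a worker takes one atomically, splits off a piece if it is big enough (publishing the piece back to the queue), and processes the rest, so no thread ever waits on another:

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical scheduler: Spliterators handed off through a lock-free queue.
final class QueueScheduler {
    static long sum(int[] data, int threshold) {
        ConcurrentLinkedQueue<Spliterator.OfInt> q = new ConcurrentLinkedQueue<>();
        q.add(Arrays.spliterator(data));
        AtomicLong total = new AtomicLong();
        Runnable worker = () -> {
            Spliterator.OfInt s;
            while ((s = q.poll()) != null) {       // take a spliterator atomically
                if (s.estimateSize() > threshold) {
                    Spliterator.OfInt half = s.trySplit();
                    if (half != null) q.add(half); // publish the split-off prefix
                }
                s.forEachRemaining((int x) -> total.addAndGet(x));
            }
            // A worker that finds the queue momentarily empty simply exits;
            // whoever still holds work finishes it, so the result stays correct.
        };
        Thread t1 = new Thread(worker), t2 = new Thread(worker);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return total.get();
    }
}
```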

But given that trySplit is already the hard part of the operation (usually), and Stealers make it considerably harder by adding concurrency issues that Spliterators don't have to worry about, I continue to be doubtful that it's actually saving effort to try to use Stealers and wrap Spliterators around them.

  --Rex
 

Aleksandar Prokopec

Feb 26, 2015, 5:27:43 PM
to scala-i...@googlegroups.com


>>> Most importantly, hasNext/next usually does as well at implementing a
>>> tryAdvance-like method as does direct creation of the method. So I'm
>>> tentatively on board with a "Stealer"-type splitter (i.e. hasNext/next not
>>> tryAdvance semantics).
>>>
>>> But all the concurrency logic still bothers me.
>>>
>>> Also, I don't see how it could possibly be more efficient to use
>>> Stealer.split than Spliterator.trySplit. Maybe a couple extra object
>>> creations are insignificant (probably so).
>>>

Hi Rex,

Here is some additional information about how the work-stealing tree works:

http://axel22.github.io/resources/docs/lcpc2013_submission_6.pdf

and an old 2013 tech-report on Stealers:

http://infoscience.epfl.ch/record/186071/files/workstealing-collections.pdf

Although ScalaBlitz in its current design relies on the
`tryAdvance`/`trySteal`/`tryComplete` methods, note that these could
easily be made optional.

This means that much of the concurrency logic could be eliminated from
most data structures -- by default, most collections can just return a
Stealer that only supports the `split` method in addition to
`next`/`hasNext` (same as Parallel Collection Splitters work).
Scheduling a data-parallel operation would then proceed similarly to how
it is done now in parallel collections.

Only specific collections like arrays, strings, array buffers or hash
tables can, in addition to `split`, provide `tryX` methods from the full
Stealer interface. For such data structures, irregular workloads would
then be parallelized much more efficiently.

>>
>> Here the difference indeed lies in scheduling. Stealers form a tree, that
>> is used to schedule computations. Note that in hardest examples of
>> irragular workload we haven't see a tree with more that 170 nodes(including
>> leaves).
>>
>> Unlike this Spliterator.trySplit effectively maintains only leaves of this
>> tree, which makes efficient workstealing and scheduling a lot harder as you
>> miss inner `summary` nodes. This summarty nodes help to join computations
>> for new threads and help to gather the result in the end of operation.
>>
>
> Fair enough. You can push the work to the Stealer instead of keeping track
> yourself.
>
>
>>
>> There's also other slight problem that I see with scheduling in
>> `trySplit`s model. One would want to have non-blocking scheduling. Than
>> question comes what happens if multiple threads want to `split` the node.
>>
>
> I haven't actually looked at the Java Stream code in depth, but I'd tend to
> pass Spliterators around atomically. You are non-blocking because the
> Spliterator is just passive data. Use a ring buffer or something to toss
> them around.
>
> But given that trySplit is already the hard part of the operation
> (usually), and Stealers make it considerably harder by adding concurrency
> issues that Spliterators don't have to worry about, I continue to be
> doubtful that it's actually saving effort to try to use Stealers and wrap
> Spliterators around them.
>
> --Rex
>

--
Aleksandar Prokopec

Rex Kerr

Feb 26, 2015, 7:49:19 PM
to scala-i...@googlegroups.com
Thanks for the pointers to the papers, Aleks.  It helps me understand the reasoning behind the data structure (though it seems not incredibly hard to provide hooks to build the same tree using a self-mutating trySplit as long as you don't have heavy contention for the nodes to split).

I still am not entirely certain, though, whether we would be making more or less work for everyone involved to try to have a common basis for Stealers and Spliterators.

There is, of course, the problem of duplicating code (up to 2x) by having two different ways to concurrently visit all the elements of a collection.

But the requirements are otherwise rather different.  trySplit, for instance, is _required_ to have the current Spliterator jump ahead (leaving the skipped part for the new Spliterator), while split is presumably required to _not_ alter the state of a Stealer (its purpose is to sit there, stolen).  hasNext works well on indexed collections where it's a simple check, but less well on e.g. HashTables where you have to write separate advancing and executing logic instead of just seeking and consuming if you find something.  Stealers need CAS logic; Spliterators need flags.

So I still think they're similar but uncomfortable bedfellows, especially since I can't really recapitulate Java 8 stream performance if I'm obliged to go through a Stealer-type interface (in the sequential case).  I get sequential penalties ranging from 25% on Iterator (singly wrapped in Spliterator or doubly wrapped in Stealer then Spliterator) to 80% on Array, at least when going through tryAdvance.  (Haven't tried forEachRemaining path yet.)

This makes me think that stealing the core parts of the algorithms that are common between the approaches is ultimately more efficient and maintainable, at least from the perspective of providing Java 8 compatibility, than to actually try for code reuse.  The performance depends too much on the details.

  --Rex



Aleksandar Prokopec

Feb 27, 2015, 5:31:50 PM
to scala-i...@googlegroups.com

I just read your complete proposal. Some comments:

(1)

We previously had a lot of pain with configuring parameters such as the parallelism level in Parallel Collections.
Mutating the tasksupport field is nothing short of ugly.
We even came close to changing it at some point - the idea was to give the `par` method an implicit scheduler parameter.
Due to source compatibility issues, it never happened.

In ScalaBlitz, we gave each parallel method an implicit scheduler parameter.
Thus, the ScalaBlitz `toPar` method could really return a thin wrapper around the collection, which did not

I would suggest to:
1. either use an implicit scheduler parameter for each combinator method
2. or use an implicit scheduler parameter for the `streamed` method

(3)
Be careful about providing all the methods from TraversableOnce (or extending it). Some of those methods are naturally not parallelizable, and having them on an interface which is supposed to be parallel violates the principle of least surprise.
Confused users yelled at us more than once about this.

(5)
A single object allocation for the wrapper is not a high price to pay, compared to invoking a parallel operation.
In ScalaBlitz, I avoided value classes in some places precisely for this reason.



On 2/27/2015 1:49 AM, Rex Kerr wrote:
Thanks for the pointers to the papers, Aleks.  It helps me understand the reasoning behind the data structure (though it seems not incredibly hard to provide hooks to build the same tree using a self-mutating trySplit as long as you don't have heavy contention for the nodes to split).

I still am not entirely certain, though, whether we would be making more or less work for everyone involved to try to have a common basis for Stealers and Spliterators.

There is, of course, the problem of duplicating code (up to 2x) by having two different ways to concurrently visit all the elements of a collection.

Maybe there could be a common basis that does not cause a lot of work.
Please see my comment further below about this - duplication might not be necessary.



But the requirements are otherwise rather different.  trySplit, for instance, is _required_ to have the current Spliterator jump ahead (leaving the skipped part for the new Spliterator), while split is presumably required to _not_ alter the state of a Stealer (its purpose is to sit there, stolen).

Note that the contract is: `split` leaves the Parallel Collection Splitter (or a ScalaBlitz Stealer) in an undefined state - it is not presumed to alter the state of the original, or leave it intact. The implementation is free to do whatever it wants.

To my understanding, this is a good thing:
- if you need to convert (wrap) a Spliterator to a Parallel Collections Splitter, a Splitter can return `(original splitter, new splitter)` from the perspective of Parallel Collections

class WrapperSplitter[T](s: Spliterator[T]) extends Splitter[T] {
  def split: (Splitter[T], Splitter[T]) = (s, s.trySplit())
}

- if you need to convert (wrap) a Splitter to a Spliterator, the Spliterator can update its internal Splitter reference:

class WrapperSpliterator[T](var s: Splitter[T]) extends Spliterator[T] {
  def trySplit(): Spliterator[T] = {
    val (a, b) = s.split
    s = a
    b
  }
}

It seems to me that the two should be compatible.


hasNext works well on indexed collections where it's a simple check, but less well on e.g. HashTables where you have to write separate advancing and executing logic instead of just seeking and consuming if you find something.  Stealers need CAS logic; Spliterators need flags.


I think that a Stealer can be something like this (simplified):

trait Stealer[T] extends Splitter[T] {
  def state: State
  def trySteal(): Boolean
  def tryComplete(): Boolean
  def tryAdvance(): Boolean
}

A collection must return a Splitter.
If it happens to also return a Stealer, all the better.
E.g. an array could easily implement a stealer.
HashTable Stealers are not super-complicated, but they don't have to be provided.
Tree Stealers should not be implemented.

A data-parallel scheduler expects to get at least a Splitter.
If a data-parallel scheduler gets a Stealer, all the better - it can schedule irregular workloads and nested parallelism more efficiently.
A default scheduler can just ignore the Stealer, and use it as if it were a Splitter.
A more complex, fancy work-stealing tree scheduler might do something fancier.


So I still think they're similar but uncomfortable bedfellows, especially since I can't really recapitulate Java 8 stream performance if I'm obliged to go through a Stealer-type interface (in the sequential case).  I get sequential penalties ranging from 25% on Iterator (singly wrapped in Spliterator or doubly wrapped in Stealer then Spliterator) to 80% on Array, at least when going through tryAdvance.  (Haven't tried forEachRemaining path yet.)


I would recommend using macros for inlining, to eliminate iterator penalties altogether. At least for arrays, or array buffers.

Rex Kerr

Feb 27, 2015, 7:26:36 PM
to scala-i...@googlegroups.com
Thanks for the comments!

On Fri, Feb 27, 2015 at 2:31 PM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:

I just read your complete proposal. Some comments:

(1)

We previously had a lot of pain with configuring parameters such as the parallelism level in Parallel Collections.
Mutating the tasksupport field is nothing short of ugly.

This is why I was planning to ignore the issue entirely.  There are not-very-attractive ways to get a desired parallelism with Java 8 Streams (e.g. http://stackoverflow.com/questions/21163108/custom-thread-pool-in-java-8-parallel-stream) and I was going to leave that to the user.
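For reference, the not-very-attractive workaround from that link looks roughly like this. It relies on the implementation detail that a parallel stream started from inside a ForkJoinPool task runs on that pool rather than on the common pool, which is exactly why I'd leave it to the user:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Controlling parallel-stream parallelism via a custom ForkJoinPool
// (an implementation detail of the JDK, not a documented guarantee).
final class CustomPoolStream {
    static long sumSquares(int n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() ->
                IntStream.rangeClosed(1, n).parallel()
                         .mapToLong(i -> (long) i * i)
                         .sum()
            ).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();  // don't leak the pool's worker threads
        }
    }
}
```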

If one wants to run using a different parallel framework, one will need to use an extra or different method, and that can take the implicit parameter.  Otherwise I think it's beyond the scope of interop with Java.
 

(3)
Be careful about providing all the methods from TraversableOnce (or extending it). Some of those methods are naturally not parallelizable, and having them on an interface which is supposed to be parallel violates the principle of least surprise.
Confused users yelled at us more than once about this.

I was going to try gating these with an implicit that was not normally in scope.  I agree about not just extending TraversableOnce.  (Though I'd point out that TraversableOnce supports a lot of operations that are naturally not possible with a single traversal, and the choice is always to buffer everything and do the operation instead of not providing it.)
 

(5)
A single object allocation for the wrapper is not a high price to pay, compared to invoking a parallel operation.
In ScalaBlitz, I avoided value classes in some places precisely for this reason.

Agreed, but others have pointed out that interop is for sequential operations also, since Streams are the only Java 8 way to have high-level operations on collections.  There I do need to pay a bit more attention to wrapper costs.  I'll benchmark it all, of course, and drop the value classes if performance is worse or if it is simply unnecessary for adequate performance.
 


On 2/27/2015 1:49 AM, Rex Kerr wrote:
Thanks for the pointers to the papers, Aleks.  It helps me understand the reasoning behind the data structure (though it seems not incredibly hard to provide hooks to build the same tree using a self-mutating trySplit as long as you don't have heavy contention for the nodes to split).

I still am not entirely certain, though, whether we would be making more or less work for everyone involved to try to have a common basis for Stealers and Spliterators.

There is, of course, the problem of duplicating code (up to 2x) by having two different ways to concurrently visit all the elements of a collection.

Maybe there could be a common basis that does not cause a lot of work.
Please see my comment further below about this - duplication might not be necessary.


But the requirements are otherwise rather different.  trySplit, for instance, is _required_ to have the current Spliterator jump ahead (leaving the skipped part for the new Spliterator), while split is presumably required to _not_ alter the state of a Stealer (its purpose is to sit there, stolen).

Note that the contract is: `split` leaves the Parallel Collection Splitter (or a ScalaBlitz Stealer) in an undefined state - it is not presumed to alter the state of the original, or leave it intact. The implementation is free to do whatever it wants.

To my understanding, this is a good thing:
- if you need to convert (wrap) a Spliterator into a Parallel Collections Splitter, the wrapping Splitter's `split` can return `(original splitter, new splitter)` from the perspective of Parallel Collections

class WrapperSplitter[T](s: Spliterator[T]) extends Splitter[T] {
  // trySplit() splits off a prefix (null if it cannot split); wrap both
  // halves so each side is again a Splitter, not a raw Spliterator.
  def split: (Splitter[T], Splitter[T]) =
    (new WrapperSplitter(s), new WrapperSplitter(s.trySplit()))
}

- if you need to convert (wrap) a Splitter to a Spliterator, the Spliterator can update its internal Splitter reference:

class WrapperSpliterator[T](var s: Splitter[T]) extends Spliterator[T] {
  def trySplit(): Spliterator[T] = {
    val (a, b) = s.split
    s = a
    new WrapperSpliterator(b)  // wrap the split-off half so it is a Spliterator too
  }
}

It seems to me that the two should be compatible.

The problem with any wrapper is that there are runtime penalties for wrapping.  Anything that gets called every iteration (e.g. trySplit or hasNext) very easily ends up costing more wrapped than unwrapped.  Yes, there are microbenchmarks you can write where the JVM manages to fully unwrap the logic (and I wrote some), but once you deploy those into more realistic uses (take an actual collection, run actual operations), the JVM is likely to give up and leave you with some pretty noticeable penalty.

So I think that everything has to actually be a Spliterator if performance is not going to suffer.  It could be something else as well, but it must at _least_ be a Spliterator.
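For indexed collections the JDK already hands you unwrapped Spliterators, which is the kind of fast path I want to preserve; e.g. (sketch):

```scala
import java.util.Arrays
import java.util.function.IntConsumer

val arr = Array(1, 2, 3, 4)
// A Spliterator.OfInt directly over the array: no Iterator wrapper in
// the hot path, so tryAdvance/forEachRemaining stay cheap.
val sp = Arrays.spliterator(arr)
var sum = 0
sp.forEachRemaining(new IntConsumer { def accept(i: Int): Unit = sum += i })
// sum == 10
```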
 

hasNext works well on indexed collections where it's a simple check, but less well on e.g. HashTables where you have to write separate advancing and executing logic instead of just seeking and consuming if you find something.  Stealers need CAS logic; Spliterators need flags.


I think that a Stealer can be something like this (simplified):

trait Stealer[T] extends Splitter[T] {
  def state: State
  def trySteal(): Boolean
  def tryComplete(): Boolean
  def tryAdvance(): Boolean
}

A collection must return a Splitter.
If it happens to also return a Stealer, all the better.
E.g. an array could easily implement a stealer.
HashTable Stealers are not super-complicated, but they don't have to be implemented.
Tree Stealers should not be implemented.

That's probably workable.  I do worry about the maintainability of the entire Java 8 compatibility project if people have to keep track of and maintain Stealers.
 

A data-parallel scheduler expects to get at least a Splitter.
If a data-parallel scheduler gets a Stealer, all the better - it can schedule irregular workloads and nested parallelism more efficiently.
A default scheduler can just ignore the Stealer, and use it as if it were a Splitter.
A more complex, fancy work-stealing tree scheduler might do something fancier.
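i.e., the dispatch could look like this, with minimal stand-in traits (hypothetical names, just enough to show the runtime check):

```scala
// Stand-in traits, deliberately minimal; real signatures would follow
// the Stealer sketch earlier in this message.
trait Splitter[T] { def split: (Splitter[T], Splitter[T]) }
trait Stealer[T] extends Splitter[T] { def trySteal(): Boolean }

def chooseScheduler[T](s: Splitter[T]): String = s match {
  case _: Stealer[_] => "work-stealing tree"  // exploit the richer interface
  case _             => "plain fork/join"     // fall back to split-only
}
```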

So I still think they're similar but uncomfortable bedfellows, especially since I can't really recapitulate Java 8 stream performance if I'm obliged to go through a Stealer-type interface (in the sequential case).  I get sequential penalties ranging from 25% on Iterator (singly wrapped in Spliterator or doubly wrapped in Stealer then Spliterator) to 80% on Array, at least when going through tryAdvance.  (Haven't tried forEachRemaining path yet.)


I would recommend using macros for inlining, to eliminate iterator penalties altogether. At least for arrays, or array buffers.

Well, that presumes that writing a method that uses macros is less than twice as hard to write and maintain as writing two separate methods without macros.  I'm not really sure that's true.  The tradeoff is not between easier-to-maintain code and better performance; it's between two different ways of getting the same performance.

  --Rex
 