
A Fork-Join Calamity


Martin Thompson

Apr 13, 2014, 3:43:31 AM
to
The following article proposes that the FJ framework and parallel collections could be a calamity for Java:

http://coopsoft.com/ar/Calamity2Article.html

I've been uncomfortable with FJ for some time. I see people struggling to design for it, debug it, and more often fail to get performance benefits. Other approaches such as pipelining tasks can often be way more effective and easier to reason about.
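
For readers following along, this is the recursive-decomposition style under debate; a minimal, illustrative sketch (SumTask and the threshold are mine, not from the thread):

    // Classic fork/join: split until the range is small, then compute directly.
    class SumTask extends RecursiveTask<Long> {
        static final int THRESHOLD = 10_000;
        final long[] data;
        final int from, to;

        SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++)
                    sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            left.fork();                                    // schedule left half
            long right = new SumTask(data, mid, to).compute();
            return right + left.join();                     // join left half
        }
    }

    // Usage: long total = ForkJoinPool.commonPool().invoke(new SumTask(a, 0, a.length));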

I also find it amusing that after years of trying to hide databases behind ORMs, to use the parallel collections effectively you need to understand set theory to write good queries.

The following blog shows just how easily bloggers can misuse parallel collections by having no sympathy for the CPU resources on a system. I think this is only the tip of the iceberg.

http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/

I'm curious to know if others have doubts about either Fork-Join or parallel collections, or if these are really good ideas and somehow the penny has not dropped for me? I'd really like to see a good evidence-based debate on this subject.

Regards,
Martin...


Peter Lawrey

Apr 13, 2014, 4:09:48 AM
to mechanica...@googlegroups.com

I agree with everything you have said. Even for simpler frameworks, there is still a surprising number of ways to misuse them. To some degree this was Java's brilliance: a minimum of features, which minimises edge cases. To be fair, my own libraries are *really* bad in this regard. ;)

I recently had cause to migrate some C# code to Java and have seen some cool uses of closures but also some really dire ones.

    List<String> list = new ArrayList<>();
    List<String> listA = new ArrayList<>();
    List<String> listB = new ArrayList<>();
    // A stream pipeline used as a glorified for-each, mutating external state:
    list.stream().forEach(p -> listA.add(p));
    list.stream().forEach(p -> listB.add(p));
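
For contrast, a minimal sketch of the same copying done idiomatically (assuming java.util.stream.Collectors is imported):

    // Plain collections API: a copy needs no stream machinery.
    List<String> listA = new ArrayList<>(list);

    // If a pipeline is genuinely wanted, collect rather than mutate from forEach.
    List<String> listB = list.stream().collect(Collectors.toList());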

One of the big things is that stream() methods like sum() don't work on BigDecimal or BigInteger. Why is using BigDecimal in Java so painful? :(

On 13 Apr 2014 08:38, "Martin Thompson" <mjp...@gmail.com> wrote:
The following article proposes that the FJ framework and parallel collections could be a calamity for Java:

http://coopsoft.com/ar/Calamity2Article.html

I've been uncomfortable with FJ for some time. I see people struggling to design for it, debug it, and more often fail to get performance benefits. Other approaches such as pipelining tasks can often be way more effective and easier to reason about.

I also find it amusing that after years of trying to hide databases behind ORMs, to use the parallel collections effectively you need to understand set theory to write good queries.

The following blog shows just how easily bloggers can misuse parallel collections by having no sympathy for the CPU resources on a system. I think this is only the tip of the iceberg.

http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/

I'm curious to know if others have doubts about either Fork-Join or parallel collections, or if these are really good ideas and somehow the penny has not dropped for me? I'd really like to see a good evidence-based debate on this subject.

Regards,
Martin...



Kirk Pepperdine

Apr 13, 2014, 4:37:27 AM
to mechanica...@googlegroups.com
I would have to say that every highly scalable system I’ve been involved with has employed pipelining. FJ is interesting, but IMHO its use cases are limited and questionable. Unfortunately I fear that FJ nepotism has influenced Java’s implementation of Lambdas. I smell a “Spring”-like opportunity here.

Regards,
Kirk

Remi Forax

Apr 13, 2014, 8:01:38 AM
to mechanica...@googlegroups.com
On 04/13/2014 09:38 AM, Martin Thompson wrote:
> The following article proposes that the FJ framework and parallel
> collections could be a calamity for Java.
>
> http://coopsoft.com/ar/Calamity2Article.html
>
> I've been uncomfortable with FJ for some time. I see people struggling
> to design for it, debug it, and more often fail to get performance
> benefits. Other approaches such as pipelining tasks can often be way
> more effective and easier to reason about.
>
> I also find it amusing that after years of trying to hide databases
> behind ORMs that to use the parallel collections effectively you need
> to understand set theory for writing good queries.
>
> The following blog shows just how bloggers can so easily misuse
> parallel collections by having no sympathy for CPU resource on a
> system. I think this is only the tip of the iceberg.
>
> http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/
>
> I'm curious to know if others have doubts about either Fork-Join or
> parallel collections, or if these are a really good ideas and somehow
> the penny has not dropped for me? I'd really like to see a good
> evidence based debate on this subject.
>
> Regards,
> Martin...

Like any worker pool, the default fork/join pool (ForkJoinPool.commonPool())
used by the parallel Stream API has to be configured globally for the
whole application. The default configuration considers that all cores are
available, which is obviously wrong if you have a server.
So what?
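
For context (a fact about the Java 8 implementation, not from the thread): the common pool defaults to one worker fewer than Runtime.getRuntime().availableProcessors(), and you can inspect it directly:

    // Typically availableProcessors() - 1 on Java 8:
    System.out.println(ForkJoinPool.commonPool().getParallelism());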

Rémi


Remi Forax

Apr 13, 2014, 8:20:10 AM
to mechanica...@googlegroups.com
On 04/13/2014 10:37 AM, Kirk Pepperdine wrote:
> I would have to say that every highly scalable system I’ve been
> involved with has employed pipe-lining. FJ is interesting but IMHO
> it’s use cases are limited and questionable. Unfortunately I fear that
> FJ nepotism has influenced Java’s implementation of Lambda’s.

Hi Kirk,
You're right that the implementation of the parallel part of the Stream
API is fully FJ-based.
You can use the word "nepotism" to describe this fact; I prefer to talk
about a shortage of skilled people willing to help.

> I smell a “Spring” like opportunity here.

Having an SPI mechanism for the Stream API is now scheduled for Java 9
(that's why StreamSupport exists).
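
For the curious, StreamSupport is already usable today for adapting a custom source; a minimal sketch (the list is just a stand-in for a real Spliterator implementation):

    // Adapt any Spliterator into a Stream; the boolean selects parallel execution.
    Spliterator<String> source = Arrays.asList("a", "b").spliterator();
    Stream<String> s = StreamSupport.stream(source, false);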

>
> Regards,
> Kirk

cheers,
Rémi


Kirk Pepperdine

Apr 13, 2014, 8:40:49 AM
to mechanica...@googlegroups.com
Hi Remi,

>
> Hi Kirk,
> You're right that the implementation of the parallel part of the Stream API is fully FJ-based.
> You can use the word "nepotism" to describe this fact; I prefer to talk about a shortage of skilled people willing to help.

There are certainly cases where the community would have liked to contribute, but conditions on the ground (i.e. how Oracle treats the community) make it difficult. There are way too many behind-closed-doors conversations for OpenJDK to be… well.. open.
As for the lack of skilled people… the JVM logging JEP is a fine example of that.. the JEP defenders know nothing about messaging… I know because I spoke with them. They are ill-equipped to implement, let alone propose, a logging API… Our own Peter Lawrey has Java Chronicle, which could be used as a model to support that JEP, but I fear a NIH attitude would prevent that from happening… better to copy the worst example of the logging frameworks that Java developers have been forced to live with for years. So yeah, there’s a bit of despair in the air about Oracle’s inability to interact with a real community.. ;-)

First and foremost I really don’t want to cast a negative air over the great effort that Brian and his minions have put forth to get Lambdas out the door.... that said, when you get past the shock-and-awe demos that have been making the conference circuit and try to convert that to something in the large…. you get a sense of the limitation that FJ has placed on Lambdas. Initially Lambdas make Java feel more like what OO programming was supposed to be like.. then you immediately feel the limitations in the streams and filters. They are meant to work for about one use case, and suggesting I might want to use them in another way didn’t quite get the response I’d anticipated..

>
>> I smell a "Spring" like opportunity here.
>
> Having a SPI mechanism for the Stream API is now scheduled for Java 9 (that's why StreamSupport exists).

A Streams SPI will be appreciated!!!

Regards,
Kirk

Martin Thompson

Apr 13, 2014, 8:49:27 AM
to mechanica...@googlegroups.com

Like any worker pool, the default fork/join pool (ForkJoinPool.commonPool())
used by the parallel Stream API has to be configured globally for the
whole application. The default configuration considers that all cores are
available, which is obviously wrong if you have a server.
So what?

The streaming API is aimed at general use from what I've heard on the conference circuit of late. That means it is sharing a machine with lots of other threads involved in "general use" within applications, e.g. a web container with many threads in its pool. If the default is to assume exclusive access to the system resources then I'd say that is somewhat naive. The same can be said for any component/framework that starts its own "inconsiderate" thread pool. For map-reduce on a system like Hadoop people assume exclusive access to system resources. I'd argue that streams are not the same. If they are to be used in general programming and co-exist with larger solutions then there is something significant missing.

Let's take a step back and consider usability. It does not matter what any of our own personal views are. If any technology causes users to make usability mistakes then it has low affordance. We have somewhat learned our lessons here in UX design but we seem to be in the dark ages about usability when it comes to lower level components.

Remi Forax

Apr 13, 2014, 9:24:59 AM
to mechanica...@googlegroups.com
On 04/13/2014 02:40 PM, Kirk Pepperdine wrote:
> Hi Remi,
>
>> Hi Kirk,
>> You're right that the implementation of the parallel part of the Stream API is fully FJ based.
>> You can use the word "nepotism" to describe this fact, I prefer to talk about short of skilled people willing to help.
> There are certainly cases where the community would have liked to have contributed but conditions on the ground (ie how Oracle treats community) make it difficult. There are way too many behind closed door conversations for OpenJDK to be... well.. open.

It doesn't match my experience. I think that things have improved.
For example, the mailing list for invokedynamic was private, while the
ones for lambdas and streams are public.

> As for lack of skilled people... the JVM logging JEP is a fine example of that.. the JEP defenders know nothing about messaging... I know because I spoke with them. They are ill equipped to implement, let alone propose a logging API... Our own Peter Lawrey has Java Chronicle which could be used as a model to support that JEP but I fear a NIH attitude would prevent that from happening... better to copy the worst example of logging frameworks that java developers have been forced to live with for years. So yeah, theres a bit of despair in the air about Oracle's inability to interact with a real community.. ;-)

Yes, the current JEP process is mostly one-way. The ways to improve the
process were discussed in the open, I think at the latest FOSDEM, and
Mark Reinhold recently proposed a new JEP process.
http://cr.openjdk.java.net/~mr/jep/jep-2.0.html

>
> First and foremost I really don't want to cast a negative air over the great effort that Brian and his minions have put forth to get Lambda's out the door....

:)

> that said, when you get past the shock and awe demo's that have been making the conference circuit and try to convert that to something in the large.... you get a sense of the limitation that FJ has placed on Lambdas. Initially Lambda's make Java feel more like what OO programming was suppose to be like.. then you immediately feel the limitation in the streams and filters. They are meant to work for about 1 use case and to suggest i might want to use them in another way didn't quite result in the response I'd anticipated..

Could you be a little more specific? I have a hard time figuring out
what you are talking about.

cheers,
Rémi

Remi Forax

Apr 13, 2014, 10:24:20 AM
to mechanica...@googlegroups.com
On 04/13/2014 02:49 PM, Martin Thompson wrote:
>
> Like any workers pool, the default fork/join pool
> (ForkPools.commonPool)
> used by the parallel Stream API has to be configured globally for the
> whole application. The default configuration consider that all
> cores are
> available which is obviously wrong if you have a server.
> So what ?
>
>
> The streaming API is aimed at general use from what I've heard on the
> conference circuit of late. That means it is sharing a machine with
> lots of other threads involved in "general use" within applications,
> e.g. a web container with many threads in its pool. If the default is
> to assume exclusive access to the system resources then I'd say that
> is somewhat naive. The same can be said for any component/framework
> that starts its own "inconsiderate" thread pool. For map-reduce on a
> system like Hadoop people assume exclusive access to system resources.
> I'd argue that streams are not the same. If they are to be used in
> general programming and co-exist with larger solutions then there is
> something significant missing.

This is a little different here because this is part of the JDK and not
a framework.
When web containers support Java 8, most of them will either set
the property "java.util.concurrent.ForkJoinPool.common.parallelism" or
use a specific fork/join pool instead of the default one [1].
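
A minimal sketch of both workarounds (the pool size and the list are placeholders; the second relies on forked tasks running in the enclosing ForkJoinPool rather than the common one):

    // 1. Cap the common pool globally, before it is first touched:
    System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");

    // 2. Or run the parallel stream inside a dedicated pool:
    ForkJoinPool pool = new ForkJoinPool(4);
    long count = pool.submit(() -> list.parallelStream()
                                       .filter(s -> s.length() > 3)
                                       .count())
                     .join();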

BTW, this was discussed on the open lambda-lib mailing list,
http://cs.oswego.edu/pipermail/lambda-lib/2011-March/000122.html
and you can verify for yourself that the current implementation is no
different from what was decided:
http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/7eea0aa4c7ad/src/share/classes/java/util/concurrent/ForkJoinPool.java#l3300
http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/7eea0aa4c7ad/src/share/classes/java/util/concurrent/ForkJoinTask.java#l691

>
> Let's take a step back and consider usability. It does not matter what
> any of our own personal views are. If any technology causes users to
> make usability mistakes then it has low affordance. We have somewhat
> learned our lessons here in UX design but we seem to be in the dark
> ages about usability when it comes to lower level components.

I agree with you that it should work out of the box and I think the
current state is mostly due to the fact that most web containers are not
yet configured to work with Java 8.

cheers,
Rémi
[1]
http://stackoverflow.com/questions/21163108/custom-thread-pool-in-java-8-parallel-stream


Kirk Pepperdine

Apr 13, 2014, 10:26:24 AM
to mechanica...@googlegroups.com
Hi Remi,

>
> It doesn't match my experience. I think that things improved.

In certain areas, certainly.. but in a number of cases one can only sort things out after the fact.. which is often too late to effect meaningful change.. without seeming like a disruptive force…
>
> Yes, JEP current process is mostly one way, the ways to improve the process was discussed in the open, I think it was at latest FOSDEM, Mark Reinhold recently proposed a new JEP process.
> http://cr.openjdk.java.net/~mr/jep/jep-2.0.html

Right, while making an announcement @ some random conference is ok.. what about those that weren’t able to be @ FOSDEM? Mark, of all people, should know that the Java community is very fragmented as opposed to OSG.. a model in which Oracle is trying to borg all of the JUGs into… But this is really getting off-topic so I’ll take this one off-line.
>
>>
>> First and foremost I really don't want to cast a negative air over the great effort that Brian and his minions have put forth to get Lambda's out the door....
>
> :)
>
>> that said, when you get past the shock and awe demo's that have been making the conference circuit and try to convert that to something in the large.... you get a sense of the limitation that FJ has placed on Lambdas. Initially Lambda's make Java feel more like what OO programming was suppose to be like.. then you immediately feel the limitation in the streams and filters. They are meant to work for about 1 use case and to suggest i might want to use them in another way didn't quite result in the response I'd anticipated..
>
> could you be a little more specific, I have hard time to figure out what you are talking about ?

Sorry for being so terse…. I’d like to fork streams, as one example.. or have filters to split streams into multiple streams based on some criteria, like a tap? The response to fork() was: you could run your JVM out of memory should you have a slow consumer, so we don’t want to provide that feature.

To Martin’s point, having the parallelism fed from a common configurable thread pool might help. I’ve certainly run into situations where competing threading requirements/pools completely crushed a server.

Regards,
Kirk

Aleksey Shipilev

Apr 13, 2014, 11:29:56 AM
to mechanica...@googlegroups.com
On 04/13/2014 11:38 AM, Martin Thompson wrote:
> The following article proposes that the FJ framework and parallel
> collections could be a calamity for Java.
>
> http://coopsoft.com/ar/Calamity2Article.html

Oh, this article is still around... If you read it carefully, you will
notice that after listing all the bad things about FJP, it does not
offer a sound alternative. The "alternative" presented is to scrap the
current highly-tuned and bug-fixed implementation and start over, as if
that magically guarantees the second attempt will be better.

I think that FJP is the only viable option for the divide-and-conquer
style of tasks. Of course it has shortcomings, because that's how the
world works: there are always tradeoffs. If you know some other
implementation that makes better tradeoffs, then you should totally come
forward with it and advocate its inclusion into the JDK. Or, at least,
make that claim substantial by backing it with sound comparative
research. Listing the shortcomings of a particular framework is hardly
research. Oh wait, this feels like something I said before...

http://mail.openjdk.java.net/pipermail/lambda-dev/2012-October/006169.html
http://mail.openjdk.java.net/pipermail/lambda-dev/2012-October/006177.html

Saying one framework is bad without bringing up a viable alternative
is as constructive as a rally against the Second Law of
Thermodynamics. It sure feels like you can reverse entropy and bring
happiness to a dying Universe, but that belief quickly diminishes as you
actually try to do it, because you haven't considered *why* those
pitfalls are there.


> I've been uncomfortable with FJ for some time. I see people struggling
> to design for it, debug it, and more often fail to get performance
> benefits. Other approaches such as pipelining tasks can often be way
> more effective and easier to reason about.

I am comfortable with FJP, and I am happy seeing its use in JDK 8
Streams, because, frankly, you don't frequently see the execution
frameworks with that kind of performance magic: striped submission
queues, in-submitter execution while pool threads ramp up, false sharing
avoidance in thread queues, randomized balancing with super-fast PRNGs,
lock-free/relaxed-ops work queues, avoiding multiword-cas/locked
implementations of control words, branch prediction considerations, etc.

FJP tackles the problem of exploiting the internal parallelism without
sacrificing the external one. How successful is pipelining at those
things? I mean, surely, you can do something like Disruptor with
busy-wait handoffs, but in my mind, it is even more "non-sympathetic" to
other code than running a few additional pools full of threads.

The performance model for parallel execution is very hard and in most
cases context-dependent, and this is why JDK 8 *did not* make parallel()
implicit. You might find it a cowardly move to burden the developers
with the choice to make the computation parallel or sequential, but once
again, consider the alternatives: either an implicit parallel() which
blows up unexpectedly and nothing can be done to turn it off, or no
parallel() at all.

> I also find it amusing that after years of trying to hide databases
> behind ORMs that to use the parallel collections effectively you need to
> understand set theory for writing good queries.
>
> The following blog shows just how bloggers can so easily misuse parallel
> collections by having no sympathy for CPU resource on a system. I think
> this is only the tip of the iceberg.

>
> http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/

Breaking news: not a silver bullet again! You can't actually run faster
with #Threads > #CPUs! That parallel() thing is a lie!

If you look at the benchmarks there... well... um... I would just say it
is a good exercise for seasoned benchmark guys to spot the mistakes
which make the results questionable. Anyway, if we want to *speculate*
that the experimental setup is miraculously giving us sane performance data:

* Sorting is now only 20% faster – a 23X decline.
* Filtering is now only 20% faster – a 25X decline.
* Grouping is now 15% slower.

(That 23X-25X decline is a red herring because it compares the results of
two different tests.)

Am I reading it right? You put 10 client threads submitting the same
task to the pool, and you are *still* 20% faster on the parallel tests?
And that is on an 8-hardware-thread machine (which is a funny pitfall of
its own)? That means that even when external parallelism is present, you
can still enjoy the benefits of the internal one? Or is that the fever
dream of an overloaded machine?

-Aleksey.

Remi Forax

Apr 13, 2014, 11:44:43 AM
to mechanica...@googlegroups.com
On 04/13/2014 04:26 PM, Kirk Pepperdine wrote:
> Hi Remi,
>
>> It doesn't match my experience. I think that things improved.
> In certain areas certainly.. but in a number of cases one can only sort out things after the fact.. which is often too late to affect meaningful change.. without seeming like a disruptive force...
>> Yes, JEP current process is mostly one way, the ways to improve the process was discussed in the open, I think it was at latest FOSDEM, Mark Reinhold recently proposed a new JEP process.
>> http://cr.openjdk.java.net/~mr/jep/jep-2.0.html
> Right, while making an announcement @ some random conference is ok.. what about those that weren't able to be @ FOSDEM? Mark, of all people, should know the the Java community is very fragmented as apposed to OSG.. a model in which Oracle is trying to borg all of the JUGs into... But this is really getting off topic so I'll take this one off-line.

Just to be clear, Mark did not make any announcement. After Mark's
presentation on another topic, a guy asked why the JEPs have no mailing
list (or bug tracking, I can't remember), a discussion sprang up, and as
a result Mark proposed a new JEP process.


>>> First and foremost I really don't want to cast a negative air over the great effort that Brian and his minions have put forth to get Lambda's out the door....
>> :)
>>
>>> that said, when you get past the shock and awe demo's that have been making the conference circuit and try to convert that to something in the large.... you get a sense of the limitation that FJ has placed on Lambdas. Initially Lambda's make Java feel more like what OO programming was suppose to be like.. then you immediately feel the limitation in the streams and filters. They are meant to work for about 1 use case and to suggest i might want to use them in another way didn't quite result in the response I'd anticipated..
>> could you be a little more specific, I have hard time to figure out what you are talking about ?
> Sorry for being so terse.... I'd like to fork streams as one example.. or have filters to split streams into multiple streams based on some criteria, like a tap? the response to fork() was, you could run your JVM out of memory should you have a slow consumer. so we don't want to provide that feature.

You cannot split a stream into multiple streams because of the way the
API is crafted. It works like this: you construct your pipeline using
lazy operations, and at the end you call a terminal operation that
starts the pump which takes values and pushes them through the pipeline.
Because it's the terminal operation that starts the process, you cannot
have more than one tail.

Now, if you want to implement fork(), i.e. using another thread to
process a sub-stream with a queue in the middle, you can use
Stream.peek(), which is equivalent to what you call a tap.
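
A minimal sketch of that peek()-as-tap idea (the BlockingQueue, the "EOF" sentinel, and the consumer thread are all illustrative, not part of the Stream API):

    BlockingQueue<String> tap = new LinkedBlockingQueue<>();

    Thread consumer = new Thread(() -> {
        try {
            for (String s; !(s = tap.take()).equals("EOF"); )
                System.out.println("side channel: " + s);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
    consumer.start();

    long count = Arrays.asList("a", "bb", "ccc").stream()
            .peek(tap::add)               // every element is also offered to the queue
            .filter(s -> s.length() > 1)
            .count();

    tap.add("EOF");                       // sentinel so the consumer terminates
    consumer.join();                      // throws InterruptedException; handle in real code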

>
> To Martin's point, having the parallelism feed from a common configurable thread pool might help. I've certainly run into situations where competing threading requirements/pools completely crushed a server.

see my answer to Martin.

>
> Regards,
> Kirk
>

cheers,
Rémi

Kirk Pepperdine

Apr 13, 2014, 11:46:58 AM
to mechanica...@googlegroups.com

On Apr 13, 2014, at 5:44 PM, Remi Forax <fo...@univ-mlv.fr> wrote:

> On 04/13/2014 04:26 PM, Kirk Pepperdine wrote:
>> Hi Remi,
>>
>>> It doesn't match my experience. I think that things improved.
>> In certain areas certainly.. but in a number of cases one can only sort out things after the fact.. which is often too late to affect meaningful change.. without seeming like a disruptive force...
>>> Yes, JEP current process is mostly one way, the ways to improve the process was discussed in the open, I think it was at latest FOSDEM, Mark Reinhold recently proposed a new JEP process.
>>> http://cr.openjdk.java.net/~mr/jep/jep-2.0.html
>> Right, while making an announcement @ some random conference is ok.. what about those that weren't able to be @ FOSDEM? Mark, of all people, should know the the Java community is very fragmented as apposed to OSG.. a model in which Oracle is trying to borg all of the JUGs into... But this is really getting off topic so I'll take this one off-line.
>
> Just to be clear, Mark did not do any announcement, after Mark's presentation on another topic, a guy ask why the JEPs have no mailing list (or bug tracking, I can't remember), a discussion spring up and as a result Mark proposes a new JEP process.

I think you’ve made my point… thanks ;-)

>
>
>>>> First and foremost I really don't want to cast a negative air over the great effort that Brian and his minions have put forth to get Lambda's out the door....
>>> :)
>>>
>>>> that said, when you get past the shock and awe demo's that have been making the conference circuit and try to convert that to something in the large.... you get a sense of the limitation that FJ has placed on Lambdas. Initially Lambda's make Java feel more like what OO programming was suppose to be like.. then you immediately feel the limitation in the streams and filters. They are meant to work for about 1 use case and to suggest i might want to use them in another way didn't quite result in the response I'd anticipated..
>>> could you be a little more specific, I have hard time to figure out what you are talking about ?
>> Sorry for being so terse.... I'd like to fork streams as one example.. or have filters to split streams into multiple streams based on some criteria, like a tap? the response to fork() was, you could run your JVM out of memory should you have a slow consumer. so we don't want to provide that feature.
>
> You can not split a stream into multiple streams because of the way the API is crafted,

Again, you’ve made my point… and thanks again ;-)

Regards,
Kirk

Martin Thompson

Apr 13, 2014, 12:02:54 PM
to mechanica...@googlegroups.com
I'm trying to stimulate a healthy debate and increase the understanding in our community. My primary goal is to see software developed that delivers real value for a business.

Divide-and-conquer is one way to address parallel computing. You are right that it is a shame this paper is very one-sided. However, I think the core focus on parallelism within the Java community is very one-sided towards shared memory designs and FJ. From personal experience on high-volume systems I've seen a lot of success with pipeline and actor models, plus shared-nothing is significantly easier to reason about and tune. I'd like to see a lot more open thinking in this area from the core Java team.

A very valid alternative is to get better at writing single-threaded code. It is amazing what can be achieved on a single thread when code is not grossly inefficient. Also, without going parallel and/or concurrent, the code is so much easier to reason about. But this has a big drawback: no company can market writing better code as a product they can sell at any scale.

Every design has benefits and consequences. Let's discuss both sides freely so all can learn and make informed decisions.

> I've been uncomfortable with FJ for some time. I see people struggling
> to design for it, debug it, and more often fail to get performance
> benefits. Other approaches such as pipelining tasks can often be way
> more effective and easier to reason about.

I am comfortable with FJP, and I am happy seeing its use in JDK 8
Streams, because, frankly, you don't frequently see the execution
frameworks with that kind of performance magic: striped submission
queues, in-submitter execution while pool threads ramp up, false sharing
avoidance in thread queues, randomized balancing with super-fast PRNGs,
lock-free/relaxed-ops work queues, avoiding multiword-cas/locked
implementations of control words, branch prediction considerations, etc.

FJP tackles the problem of exploiting the internal parallelism without
sacrificing the external one. How successful is pipelining at those
things? I mean, surely, you can do something like Disruptor with
busy-wait handoffs, but in my mind, it is even more "non-sympathetic" to
other code than running a few additional pools full of threads.

I would not suggest the Disruptor for general purpose use. It is far too specialised. I have said this openly many times. Busy spin strategies are best suited to environments where the number of available cores is greater than the number of threads wanting to run.
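
For readers unfamiliar with the term, a busy-spin handoff in its crudest form (a sketch; the AtomicBoolean flag is illustrative):

    AtomicBoolean ready = new AtomicBoolean(false);

    // Producer, on some other thread:  ready.set(true);

    // Consumer burns a core polling instead of parking the thread; only
    // sensible when runnable threads never outnumber available cores:
    while (!ready.get()) {
        // spin
    }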
 
>
> The following blog shows just how bloggers can so easily misuse parallel
> collections by having no sympathy for CPU resource on a system. I think
> this is only the tip of the iceberg.

>
>   http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/

Breaking news: not a silver bullet again! You can't actually run faster
with #Threads > #CPUs! That parallel() thing is a lie!

If you look at the benchmarks there... well... um... I would just say it
is a good exercise for seasoned benchmark guys to spot the mistakes
which make the results questionable. Anyway, if we want to *speculate*
that the experimental setup is miraculously giving us sane performance data:

 * Sorting is now only 20% faster – a 23X decline.
 * Filtering is now only 20% faster – a 25X decline.
 * Grouping is now 15% slower.

(That 23X-25X decline is a red herring because it compares the results of
two different tests.)

Am I reading it right? You put 10 client threads submitting the same
task to the pool, and you are *still* 20% faster on the parallel tests?
And that is on an 8-hardware-thread machine (which is a funny pitfall of
its own)? That means that even when external parallelism is present, you
can still enjoy the benefits of the internal one? Or is that the fever
dream of an overloaded machine?

I hope you do not think I'm advocating this blog!!! Quite the opposite. I think it is an example of how easy it is to get a mental model very wrong with parallel streams, never mind how broken the benchmark is.

Loads of people are going to want to use this cool new feature. Well, at least those of us who earn a living from consulting can look forward to increased sources of revenue as people dig some very deep holes.

Martin...

Kirk Pepperdine

Apr 13, 2014, 12:39:38 PM
to mechanica...@googlegroups.com
I don’t see FJP as being mutually exclusive with pipelining. With the ability to selectively fork a stream they should be able to work hand in hand.

As for single-threaded performance, 3 weeks ago we took an app performing work on FPML documents up to 5.5 million TPS on a laptop by simply focusing on single-thread performance. Unfortunately, using lambdas (in their current form) we would never have been able to reach this number, as my own benchmarking shows a 20x performance hit when moving from classical imperative code to lambdas. I only wish I could share the code but...

Regards,
Kirk

Richard Warburton

Apr 13, 2014, 12:56:00 PM
to mechanica...@googlegroups.com
Hi,

One of the big things is that stream() methods like sum() don't work on BigDecimal or BigInteger. Why is using BigDecimal in Java so painful? :(

Well, they've focused on the primitive specialised stream variants, which really needed core library support. You can implement sum as reduce(BigDecimal.ZERO, BigDecimal::add), so it's not that painful.
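
Spelled out as a minimal sketch (the values are illustrative):

    List<BigDecimal> prices = Arrays.asList(new BigDecimal("19.99"),
                                            new BigDecimal("0.01"));

    // ZERO is the identity element, add the accumulator function.
    BigDecimal total = prices.stream()
                             .reduce(BigDecimal.ZERO, BigDecimal::add);

    System.out.println(total); // 20.00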

regards,

  Richard Warburton

Richard Warburton

Apr 13, 2014, 12:58:39 PM
to mechanica...@googlegroups.com
Hi,

I would have to say that every highly scalable system I’ve been involved with has employed pipelining. FJ is interesting, but IMHO its use cases are limited and questionable. Unfortunately I fear that FJ nepotism has influenced Java’s implementation of Lambdas. I smell a “Spring”-like opportunity here.

Not so much the implementation of Lambdas; more the implementation of Streams. I do agree with you though Kirk: it would be easy for a 3rd party library to provide an alternative.

Aleksey Shipilev

Apr 13, 2014, 1:06:56 PM
to mechanica...@googlegroups.com
On 04/13/2014 08:02 PM, Martin Thompson wrote:
> I'm trying to stimulate a healthy debate and increase the understanding
> in our community. My primary goal to to see software developed that
> delivers real value for a business.

Ok, the real value for business is code brevity, which means more
readability, more expressiveness, fewer bugs, less maintenance burden.
You seem to be leaning towards peak performance, and that thing is at
odds with usability. For 99.9999% of businesses peak application
performance is a second-order concern. If there is an easy performance
boost with minimal effort, business will go there as well.

> Divide-and-conquer is one way to address parallel computing. You are
> right in that it is a shame this paper is very one sided. However I
> think the core focus on parallelism within the Java community is very
> one sided towards shared memory designs and FJ.

Ummm. How would you do otherwise with a language which embraces shared
memory? Anyway, that statement is invalidated by Akka (which is an
obvious departure from the shared memory model), which is driven by FJP.
Why? Because the metal is shared memory, and to have close-to-bare-metal
performance, you have to face shared memory at some level.

> From personal experience
> on high volume systems I've seen a lot of success with pipeline and
> actor models, plus shared nothing is significantly easier to reason
> about and tune. I'd like to see a lot more open thinking in this area
> from the core Java team.

Good. The existence of these libraries, and their high performance
paired with maintainability, is the direct consequence of many core
developers from both those libraries and Java core facing the music
*for* users. This is why JDK 8 Streams can be exposed to users: users
should not be burdened with low-level stuff for their code to "just work".

> A very valid alternative is to get better at writing single threaded
> code. It is amazing what can be achieved on a single thread when code is
> not grossly inefficient. Also without going parallel, and/or concurrent,
> the code is so much easier to reason about. But this has a big drawback,
> no company can market writing better code as a product they can sell on
> any scale.

Sure, you can write single-threaded code, except for the cases where you
can't. That seems a very generic and self-evident claim, so I can't
follow that point any further.

> I hope you do not think I'm advocating this blog!!! Quite the opposite.
> I think it is an example of how easy it is to get a mental model very
> wrong with parallel streams, never mind how broken the benchmark is.

Every technology and every improvement complicates the mental model
(even those trying to simplify the model, surprisingly, because they do
it by shaving off the unnecessary details, which are, at times, very
necessary -- and there's no way out of this, because the Universe is
complicated and you can only re-balance complexity, not hide it).
Parallel streams are no exception to this rule.

-Aleksey.

Richard Warburton

Apr 13, 2014, 1:07:18 PM
to mechanica...@googlegroups.com
Hi,

The streaming API is aimed at general use from what I've heard on the conference circuit of late. That means it is sharing a machine with lots of other threads involved in "general use" within applications, e.g. a web container with many threads in its pool. If the default is to assume exclusive access to the system resources then I'd say that is somewhat naive. The same can be said for any component/framework that starts its own "inconsiderate" thread pool.

I was under the impression that things work a little differently from that, actually. My understanding was that when your JVM is running in a Java EE container, in order to stop threads from being blocked from making progress while waiting on results from an overly contended FJP, things are sequential even if you run .parallelStream().

Peter Lawrey

Apr 13, 2014, 1:40:41 PM
to mechanica...@googlegroups.com

Thank you for the tip.


Martin Thompson

Apr 13, 2014, 1:43:21 PM
to mechanica...@googlegroups.com
On 13 April 2014 18:06, Aleksey Shipilev <aleksey....@gmail.com> wrote:
On 04/13/2014 08:02 PM, Martin Thompson wrote:
> I'm trying to stimulate a healthy debate and increase the understanding
> in our community. My primary goal to to see software developed that
> delivers real value for a business.

Ok, the real value for business is code brevity, which means more
readability, more expressiveness, fewer bugs, less maintenance burden.
You seem to be leaning towards peak performance, and that thing is at
odds with usability. For 99.9999% of businesses peak application
performance is a second-order concern. If there is an easy performance
boost with minimal effort, business will go there as well.

How did I give the impression I'm leaning towards peak performance? I'm only exploring whether parallel streams and FJ meet their goals.

Performance is a misdirection in this context. Going parallel in this context is about increasing the utilisation of our modern multicore hardware.

Code brevity itself does not necessarily lead to increased business value. Just look at Perl against the criteria you list. Here you are saying business value comes from parallel streams making things easier, then later you say every technology "complicates the mental model". This feels like a contradiction.

For code to be maintainable it must be clear and easy to reason about. I think many would argue that larger-scale apps built with FJ or Map-Reduce are not easy to maintain or debug.

I hope the point of this discussion is for people to understand the benefits and be able to avoid the pitfalls as their understanding increases.

> Divide-and-conquer is one way to address parallel computing. You are
> right in that it is a shame this paper is very one sided. However I
> think the core focus on parallelism within the Java community is very
> one sided towards shared memory designs and FJ.

Ummm. How would you do otherwise with a language which embraces shared
memory? Anyway, that statement is invalidated by Akka (which is an
obvious departure from the shared memory model), which is driven by FJP.
Why? Because the metal is shared memory, and to have close-to-bare-metal
performance, you have to face shared memory at some level.

The statement is not invalidated by Akka. Akka is from the Scala community and is not to be found in the JDK or JEE. Also, FJP is only one of many possible ways of scheduling actors.

Real performance and utilisation come from cores working on memory in their core-local caches, free from contention with other cores. Shared memory is an illusion afforded to each core via a messaging protocol to achieve cache coherence. From a bare-metal perspective it is very easy to argue that shared memory is an illusion and a very leaky abstraction. One only has to consider the NUMA effects on modern servers with many sockets.

When I go for bare-metal performance I only use shared memory as a means of message passing, as this maps very cleanly onto the cache coherence model I'm actually sitting on as a non-leaky abstraction.
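
A minimal sketch of that style (illustrative, not Martin's code): a single writer publishes to a field and a single reader polls it, so the cache line flows one way between cores like a message:

    // Single-writer handoff: the volatile store/load pair is the "message".
    class Handoff {
        volatile long sequence;   // written by exactly one producer thread
        long lastSeen;            // private to the single reader thread

        boolean poll() {
            long s = sequence;    // volatile read: "receive"
            if (s == lastSeen)
                return false;
            lastSeen = s;
            return true;
        }
    }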
 
Martin...

Aleksey Shipilev

Apr 13, 2014, 2:12:38 PM
to mechanica...@googlegroups.com
On 04/13/2014 09:43 PM, Martin Thompson wrote:
> How did I give the impression I'm leaning towards peak performance? I'm
> only exploring the subject of parallel streams and FJ for if they meet
> their goals.

Parallel streams obviously meet their goals of providing accessible
parallelism to users. FJP obviously meets its goals of providing the
foundation for that parallel work (validated by JDK 8 itself, Akka,
GPars, etc).

> Performance is a misdirection in this context. Going parallel is this
> context is about increasing utilisation of our modern multicore hardware.

Wait, what? Which context? I don't care about the utilization, I don't
think anyone cares about increasing the utilization unless you run the
power company billing the datacenter. I do care about performance though.

> Code brevity itself does not necessarily lead to increased business
> value. Just look at Perl against the criteria you list.

Your counter-argument is incorrect in assuming business value is linear
in code brevity. It is not: there is a sweet spot of brevity which pays
off maximally well. Would you argue this:

public void printGroups(List<Person> people) {
    Set<Group> groups = new HashSet<>();
    for (Person p : people) {
        if (p.getAge() >= 65)
            groups.add(p.getGroup());
    }
    List<Group> sorted = new ArrayList<>(groups);
    Collections.sort(sorted, new Comparator<Group>() {
        public int compare(Group a, Group b) {
            return Integer.compare(a.getSize(), b.getSize());
        }
    });
    for (Group g : sorted)
        System.out.println(g.getName());
}

...is more readable than this?

public void printGroups(List<Person> people) {
    people.stream()
          .filter(p -> p.getAge() >= 65)
          .map(p -> p.getGroup())
          .distinct()
          .sorted(comparing(g -> g.getSize()))
          .forEach(g -> System.out.println(g.getName()));
}


> Here you are saying business value is coming from parallel streams
> making things easier then later you say every technology "complicates
> the mental model". This feels like a contradiction.

If you re-read that thought carefully: every technology *does*
complicate the mental model by sweeping unnecessary things under the
rug, adding to the under-the-rug mess. The "common" usages, however,
are simplified at the expense of increased complexity elsewhere.

This is what I see in this thread: it is harder to bend parallel streams
to do *exactly* what you want low-level-wise, but that's only the price
for exposing the entire realm of Java developers to readable and
maintainable parallel code.

> For code to be maintainable it must be clear and easy to reason about.
> I think many would argue that larger scale apps built with FJ or
> Map-Reduce are not easy to maintain or debug.

And I would argue programming is hard. Not easy to maintain or debug
compared to what? Is there an option which makes solving the problems
FJ/MR systems are facing easier *without* sacrificing the benefits of
FJ/MR? (Hint-hint: you are not in the single-threaded Texas anymore).

> The statement is not invalidated by Akka. Akka is from the Scala
> community and not to be found in the JDK or JEE. Also FJP is only one of
> many possible ways of scheduling actors.

...and yet, FJP is their default high-performance executor.

> When I go for bare metal performance I only used shared memory as a
> means of message passing as this maps very cleanly to the cache
> coherence model I'm actually sitting on as a non-leaky abstraction.

That accurately describes the we-care-about-performance approach for
modern Java today: using, providing, and improving lightweight
inter-thread communication primitives [see e.g. the entire j.u.c.*,
other lock-free stuff, fences, enhanced volatiles, etc]. Does that mean
the Java community and core Java team are "open thinking in this area",
contrary to what you seem to be implying?

-Aleksey

Edward Harned

Apr 13, 2014, 2:28:12 PM
to mechanica...@googlegroups.com
I wrote the article and I stand by it today.

For the life of me, I’ve never been able to figure out why Oracle pushes this F/J framework so hard. F/J is just a niche in parallel computing. Recursive decomposition is just a tiny niche in F/J. Yet they push this F/J framework as if it were the Queen of the Ball.

The JVM will probably never have a parallel engine like the one found in Microsoft’s .NET Framework for parallel programming. The Sumatra project may be ready for Java 9, but integrating parallel streams to use GPUs is not going to be easy, probably won’t be ready for Java 9, and certainly won’t be usable by many laptop/desktop users. So look at the next best thing.

Instead of a fork-and-join F/J framework, use a scatter-gather F/J framework. No submission queues. No “continuation threads/continuation fetching.” No horrific complexity. In application development frameworks, flexibility, simplicity, error recovery, accountability, and documentation are the primary properties. Speed comes later.

If I can build a scatter-gather engine in a few months then so can they. In 2010 I built a proof-of-concept that work-sharing is simpler and works just as well as work-stealing without submission queues and an intermediate join(). The good professor ignored it and suggested I read the research papers on work-stealing. I took the parallel engine out of another F/J project I maintain and dropped in a scatter-gather engine. Today it performs superbly as the TymeacDSE project on SourceForge.
http://sourceforge.net/projects/tymeacdse/

This is an application server/service and not suitable for inclusion into the JDK, which is why I never proposed it.

Dynamic decomposition works where recursive decomposition fails. Dynamic decomposition greatly simplifies stream parallelization. As you can see, it’s not that hard and it doesn’t take that long. If you build it so the caller, server, and application code are all separate then you can easily make the server local or remote.

For Java EE just have a Remote Procedure Call version. Set up a separate JVM just for RPC. With system properties in EE (and even SE) you can offload the multi-threading to a separate machine. The ability to do that would solve many problems in both places.

In a nutshell, start from the beginning. Build a parallel engine that can sustain Java into the future. A tiny niche product with severe restrictions trying to emulate a better methodology is not the way to go.

ed




On Sunday, April 13, 2014 3:38:49 AM UTC-4, Martin Thompson wrote:
The following article proposes that the FJ framework and parallel collections could be a calamity for Java:

http://coopsoft.com/ar/Calamity2Article.html

I've been uncomfortable with FJ for some time. I see people struggling to design for it, debug it, and more often fail to get performance benefits. Other approaches such as pipelining tasks can often be way more effective and easier to reason about.

I also find it amusing that after years of trying to hide databases behind ORMs, to use the parallel collections effectively you need to understand set theory to write good queries.

The following blog shows just how easily bloggers can misuse parallel collections by having no sympathy for the CPU resources on a system. I think this is only the tip of the iceberg.

http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/

I'm curious to know if others have doubts about either Fork-Join or parallel collections, or if these are really good ideas and somehow the penny has not dropped for me? I'd really like to see a good evidence-based debate on this subject.

Regards,
Martin...


Martin Thompson

Apr 13, 2014, 2:44:09 PM
to mechanica...@googlegroups.com
On 13 April 2014 19:12, Aleksey Shipilev <aleksey....@gmail.com> wrote:
On 04/13/2014 09:43 PM, Martin Thompson wrote:
> How did I give the impression I'm leaning towards peak performance? I'm
> only exploring the subject of parallel streams and FJ for if they meet
> their goals.

Parallel streams obviously meet their goals of providing accessible
parallelism to users. FJP obviously meets its goals of providing the
foundation for that parallel work (validated by JDK 8 itself, Akka,
GPars, etc).

"Obviously", where is the evidence? You may be right but you cannot make that statement yet.


> Performance is a misdirection in this context. Going parallel is this
> context is about increasing utilisation of our modern multicore hardware.

Wait, what? Which context? I don't care about the utilization, I don't
think anyone cares about increasing the utilization unless you run the
power company billing the datacenter. I do care about performance though.

Without efficient utilisation you do not get performance. You need to efficiently utilise the other cores to get the parallel speedup, provided that the speedup attempt does not suffer a costly coherency penalty, as described by the Universal Scalability Law.
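
For reference, the law cited here, Gunther's Universal Scalability Law, models relative capacity at N cores as

    C(N) = N / (1 + a(N - 1) + b*N*(N - 1))

where a is the contention penalty (the serial fraction of Amdahl's law) and b the coherency penalty; the quadratic b term is what punishes careless sharing across cores.
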
Streams can absolutely improve code clarity for those who embrace set theory. This is a very separate argument from the use of parallel streams and FJ. When *parallel*, it is much harder to reason about and debug.

As I've said repeatedly, and you don't acknowledge, being able to reason about and debug code is a huge part of the maintenance cost. Syntax is one part, but only one part, of that.
 
> Here you are saying business value is coming from parallel streams
> making things easier then later you say every technology "complicates
> the mental model". This feels like a contradiction.

If you re-read that thought carefully: every technology *does*
complicate the mental model by sweeping unnecessary things under the
rug, adding to the under-the-rug mess. The "common" usages, however,
are simplified at the expense of increased complexity elsewhere.

This is what I see in this thread: it is harder to bend parallel streams
to do *exactly* what you want low-level-wise, but that's only the price
for exposing the entire realm of Java developers to readable and
maintainable parallel code.

Going parallel on the same data can have benefits and does have consequences. There are alternatives that are sometimes a better fit. It is healthy to have diversity.
 
> For code to be maintainable it must be clear and easy to reason about.
>  I think many would argue that larger scale apps built with FJ or
> Map-Reduce are not easy to maintain or debug.

And I would argue programming is hard. Not easy to maintain or debug
compared to what? Is there an option which makes solving the problems
FJ/MR systems are facing easier *without* sacrificing the benefits of
FJ/MR? (Hint-hint: you are not in the single-threaded Texas anymore).

Absolutely, programming is hard. It gets a lot harder when we go parallel or concurrent. That is why single-threaded actors, or pipeline stages, are much easier to program. By using actors and pipelines we have an option for staying in single-threaded Texas (Kansas).

> The statement is not invalidated by Akka. Akka is from the Scala
> community and not to be found in the JDK or JEE. Also FJP is only one of
> many possible ways of scheduling actors.

...and yet, FJP is their default high-performance executor.

Without core-local memory, thread affinity, or other such primitives, that is the best they can hope for.
 
> When I go for bare metal performance I only used shared memory as a
> means of message passing as this maps very cleanly to the cache
> coherence model I'm actually sitting on as a non-leaky abstraction.

That accurately describes the we-care-about-performance approach for
modern Java today: using, providing, and improving lightweight
inter-thread communication primitives [see e.g. the entire j.u.c.*,
other lock-free stuff, fences, enhanced volatiles, etc]. Does that mean
the Java community and core Java team are "open thinking in this area",
contrary to what you seem to be implying?

What you list above is all about concurrent programming and dealing with the issues of sharing memory. To have less contention on shared memory, things like stack allocation, core-local memory, and memory layout control could be argued as belonging on the list.

Martin Thompson

Apr 13, 2014, 2:57:36 PM
to mechanica...@googlegroups.com
How do you ensure all the thread pools in the various app servers, 3rd-party libs, and the parallel streams play well together? I think it is a tricky problem, because once we have a centralised pool it is a contention point, plus issues like unhandled exceptions and rejected executions become tricky. We need to work out a clean way of surfacing the issues so developers don't fall into a trap that makes their life worse rather than better.

I've seen a lot of talks on how we can use streaming APIs, which are great, and it can be really nice. Your own talk is a great example of it done well :-) I just think it is fair to explore the implications and consequences, which is more boring but just as important.

Aleksey Shipilev

Apr 13, 2014, 3:30:47 PM
to mechanica...@googlegroups.com
On 04/13/2014 10:44 PM, Martin Thompson wrote:
> On 13 April 2014 19:12, Aleksey Shipilev <aleksey....@gmail.com> wrote:
> "Obviously", where is the evidence? You may be right but you cannot make
> that statement yet.

The blog links you were posting are the evidence for that: users get
parallel speedups with parallelStream(). Since that code uses FJP to
achieve those speedups, it validates the use of FJP.

But you want something else? You want it to deliver speedups in all the
cases? (To quote yourself, being unable to "so easily misuse parallel
collections by having no sympathy for CPU resource on a system").

Now if you think there are better options, the burden of proof is on
you. Can you beat the FJP-backed parallelStream() performance with
non-FJP-backed actors and/or pipelines in similar scenarios?

> Without efficient utilisation you do not get performance. You need to
> efficiently utilise the other cores to get the parallel speedup.

Um, no? Utilization is tangential to performance. I don't have to
"efficiently" utilize the cores to get the speedup (note you mix
"speedup" and "parallel speedup" freely, but these are not the same), I
just have to use the cores... sometimes. For example, the non-obvious
thing for FJP and Streams is that there are clear cases where it is
better *not* to use the cores and to stay local for short tasks (this is
where the execute-in-submitter thing was born -- contrary to the belief
that those bookworm academicians are here to kill us all).

> Streams can absolutely improve code clarity for those who embrace set
> theory.

Oh. I guess programming is even harder for alphabet deniers. Seriously,
Martin! I stopped reading after this line.

-Aleksey.

Martin Thompson

Apr 13, 2014, 4:46:49 PM
to mechanica...@googlegroups.com
On 13 April 2014 20:30, Aleksey Shipilev <aleksey....@gmail.com> wrote:
On 04/13/2014 10:44 PM, Martin Thompson wrote:
> On 13 April 2014 19:12, Aleksey Shipilev <aleksey....@gmail.com
> "Obviously", where is the evidence? You may be right but you cannot make
> that statement yet.

The blog links you were posting are the evidence for that: users get
parallel speedups with parallelStream(). Since that code uses FJP to
achieve those speedups, it validates the use of FJP.

But you want something else? You want it to deliver speedups in all the
cases? (To quote yourself, being unable to "so easily misuse parallel
collections by having no sympathy for CPU resource on a system").

We have been talking about how people will be able to cope with this. That is development effort and maintenance. It is all about how one can reason about code.

The evidence I'm looking for is whether people can easily reason about this code compared to the alternatives, where sometimes the alternative is just single-threaded code. Let us be clear: this is not about streams and their API; it is about the potential scale-up of parallel streams and FJ, and the implications of adopting such techniques in the full context of software delivery.
 
Now if you think there are better options, the burden of proof is on
you. Can you beat the FJP-backed parallelStream() performance with
non-FJP-backed actors and/or pipelines in similar scenarios?

Erlang
 
> Without efficient utilisation you do not get performance. You need to
> efficiently utilise the other cores to get the parallel speedup.

Um, no? Utilization is tangential to performance. I don't have to
"efficiently" utilize the cores to get the speedup (note you mix
"speedup" and "parallel speedup" freely, but these are not the same), I
just have to use the cores... sometimes. For example, the non-obvious
thing for FJP and Streams is that there are clear cases where it is
better *not* to use the core and stay local for short tasks (this is
where execute-in-submitter thing was born from -- contrary to the belief
that those bookworm academicians are here to kill us all).

If you think utilisation is not directly related to performance then Amdahl, Erlang, Little, Gunther and many others must all be wrong and you are right.
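For reference, Amdahl's law is the bound in play here: if a fraction p of the work can be parallelised across n cores, the best possible speedup is

    S(n) = 1 / ((1 - p) + p/n)

so with p = 0.9 and n = 8 that is S ≈ 4.7, well short of 8x -- and the bound assumes the n cores are actually kept busy, which is where utilisation enters.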
 
> Streams can absolutely improve code clarity for those who embrace set
> theory.

Oh. I guess programming is even harder for alphabet deniers. Seriously,
Martin! I stopped reading after this line.

It seems we will have to differ. I think there is value in FJ and parallel streams, but there are also implications and consequences. You seem to think it is the only thing, and that is fine.

Michael Barker

unread,
Apr 13, 2014, 5:15:02 PM4/13/14
to mechanica...@googlegroups.com
Is anyone on the list using FJ or parallel streams in production to get real speed-ups on solving parallel problems (and not just as a better executor, a-la Akka)?  I'd be interested in hearing real-world experiences.

Personally, we've just removed the only piece of software in our production environment that used FJ and replaced it with a message-passing/request-based concurrency solution, but the problem in question didn't really fit the parallel model particularly well.  So that is not really a comment on FJ, more about using the wrong tool for the job.  I'd like to hear about the cases where it is the right tool.

Mike.



Aleksey Shipilev

unread,
Apr 13, 2014, 5:54:52 PM4/13/14
to mechanica...@googlegroups.com
On 04/14/2014 12:46 AM, Martin Thompson wrote:
>
> On 13 April 2014 20:30, Aleksey Shipilev <aleksey....@gmail.com
> <mailto:aleksey....@gmail.com>> wrote:
>
> On 04/13/2014 10:44 PM, Martin Thompson wrote:
> > On 13 April 2014 19:12, Aleksey Shipilev
> <aleksey....@gmail.com <mailto:aleksey....@gmail.com>
> > "Obviously", where is the evidence? You may be right but you
> cannot make
> > that statement yet.
>
> The blog links you were posting are the evidence for that: users get
> parallel speedups with parallelStream(). Since that code uses FJP to
> achieve those speedups, it validates the use of FJP.
>
> But you want something else? You want it to deliver speedups in all the
> cases? (To quote yourself, being unable to "so easily misuse parallel
> collections by having no sympathy for CPU resource on a system").
>
>
> We have been talking about how people will be able to cope with this.
> That is development effort and maintenance. It is all about how one can
> reason about code.

Ah, you conflate reasoning about code correctness (I submit this is
what most people mean by "reasoning about the code") with understanding
the performance model of the code (this *may* be called reasoning, but
it will not resonate with most people, including me -- since most
people don't care about performance provided it is not horrendously
bad).

I submit that reasoning about the code is greatly simplified with
Streams. The performance model? Not so much. As I keep saying, the
performance model for *any* parallel application is rather complex, and
gets even more complex as you try to abstract things. I further submit
that no abstraction is immune from that -- parallel streams, actors,
pipelines, or whatever other thing there is -- it is the tradeoff for
making the common use case more appealing.

> The evidence I'm looking for is whether people can easily reason about
> this code compared to the alternatives, where sometimes the alternative
> is just single-threaded code. Let us be clear: this is not about
> streams and their API; it is about the potential scale-up of parallel
> streams and FJ, and the implications of adopting such techniques in the
> full context of software delivery.

This problem (i.e. predicting performance and guiding optimization) is
largely unsolved even for sequential code (algorithmic complexity vs.
real-life performance, anyone?). It is not magically solved for parallel
code either, but the approach is the same: you start off gathering
empirical data, construct the models which fit that data, and from there
decide if the behavior the model predicts is what you need.


> Now if you think there are better options, the burden of proof is on
> you. Can you beat the FJP-backed parallelStream() performance with
> non-FJP-backed actors and/or pipelines in similar scenarios?
>
>
> Erlang

That's not the answer, Martin. Verifiable experiments, please. You know,
those things that produce the "evidence" you want in this thread. P.S.
Erlang/OTP is a funny non-argument in this discussion for two reasons:
a) it uses work-stealing executors as well, hm; b) somehow Akka is not
an example, but Erlang is, hm.
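For a flavour of what "verifiable" could mean here, a minimal JMH-style sketch -- the class, method, and parameter names are illustrative, and the run configuration is left out deliberately:

    import java.util.concurrent.TimeUnit;
    import java.util.stream.LongStream;

    import org.openjdk.jmh.annotations.*;

    // Sequential vs. parallel stream sum, across sizes, so the crossover
    // point (if any) shows up in the data rather than in the rhetoric.
    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class StreamSumBench {

        @Param({"1000", "1000000"})
        public int size;

        @Benchmark
        public long sequential() {
            return LongStream.range(0, size).sum();
        }

        @Benchmark
        public long parallel() {
            return LongStream.range(0, size).parallel().sum();
        }
    }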

> > Streams can absolutely improve code clarity for those who embrace set
> > theory.
>
> Oh. I guess programming is even harder for alphabet deniers. Seriously,
> Martin! I stopped reading after this line.
>
>
> It seems we will have to differ. I think there is value in FJ and
> parallel streams, there are also implications and consequences. You seem
> to think it is the only thing and that is fine.

I fail to see how my "of course it has shortcomings, because that's how
the world works: there are always tradeoffs", "every technology and
every improvement complicates the mental model", and "it is harder to
bend parallel streams to do *exactly* what you want low-level-wise, but
that's only the price for exposing the entire realm of Java developers
to readable and maintainable parallel code" -- can be treated as me
thinking "it is the only thing".

-Aleksey.

Jin Mingjian

unread,
Apr 13, 2014, 11:42:53 PM4/13/14
to mechanica...@googlegroups.com
I am coming late :) I think the discussion is turning into a battle of rhetoric, although perhaps that is part of this kind of discussion.

I read the first version of that FJ calamity article (OK, Edward, come in) last year. My own understanding is that some of the article's wording is rather subjective, such as "academic exercise", while other parts do hit real points.

Again, wording such as "For 99.9999% of businesses peak application performance is a second-order concern" should be avoided because of its strongly subjective colour, unless you have at least a public poll to support it. As Joshua Bloch "suggested", performance is always important for library authors. There are real trade-offs, but when your business needs performance, it matters. And if you look at performance over a product's whole timeline rather than at a single static point, peak application performance becomes a first-order concern for many businesses at some point in time, IMHO.

One word from Martin I really like is "measuring". Microbenchmarks have their problems, indeed. But if we have no measurements at all, it makes little sense to draw any conclusions from the discussion.

I did some experimental investigation into FJ a while ago. Here are my 5c:

1. The detailed implementation of FJ is very, very complex.
I personally feel only Doug Lea has the ability to maintain it. This is arguably an engineering flaw of FJ: if you hit a bug in FJ in your critical system, it may be hard to work around or fix quickly -- and yet it sits in your critical core...

2. I benchmarked external task submission to FJ in async mode.
FJ was around 5x slower (at worst 10x, though not consistently) than my lock-free MPMC-backed thread pool in a simple benchmark that measures how long a batch of trivial tasks takes from submission to completion, i.e. the overhead of the pool framework itself (a minimal sketch of such a harness follows after this list). The design is only a prototype, without any optimization, and far from my ideal; but the result makes me more open to alternatives to FJ.

3. Akka's FJ usage cannot show that FJ is the best common-pool design.
That is because there are no strong competitors in concurrency-library design: new library authors just say "you see, FJ-based Akka is trendy, we can use it", but we have not seen any measurements saying why.

4. Finally, #3 uncovers a key point: the diversity of current Java concurrency libraries is not sufficient, as Martin hinted. I'd like to see more work in this field in the future.
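Regarding the harness mentioned in item 2, a minimal sketch of the idea -- not the actual code described there, and not JMH-grade (no warmup), so indicative at best:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ForkJoinPool;

    public class PoolOverheadSketch {
        // Time a batch of trivial tasks from submission to completion,
        // which mostly measures the pool framework's own overhead.
        static long timeBatch(ExecutorService pool, int tasks)
                throws InterruptedException {
            CountDownLatch done = new CountDownLatch(tasks);
            long start = System.nanoTime();
            for (int i = 0; i < tasks; i++) {
                pool.execute(done::countDown); // the "task" is just the signal
            }
            done.await();
            return System.nanoTime() - start;
        }

        public static void main(String[] args) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();
            // asyncMode = true gives FIFO processing of submitted tasks,
            // the mode mentioned in item 2.
            ExecutorService fjp = new ForkJoinPool(cores,
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
            ExecutorService fixed = Executors.newFixedThreadPool(cores);
            System.out.println("FJP (async): " + timeBatch(fjp, 100_000) + " ns");
            System.out.println("Fixed pool : " + timeBatch(fixed, 100_000) + " ns");
            fjp.shutdown();
            fixed.shutdown();
        }
    }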


best regards,
Jin...






Aleksey Shipilev

unread,
Apr 14, 2014, 3:24:20 AM4/14/14
to mechanica...@googlegroups.com
On 04/14/2014 07:42 AM, Jin Mingjian wrote:
> One word from Martin I really like is "measuring". Microbenchmarks
> have their problems, indeed. But if we have no measurements at all, it
> makes little sense to draw any conclusions from the discussion.

If your benchmarks are flawed, you can not draw *any* conclusion from
them, because you are looking at garbage data. I understand people may
see things even in the white noise, but this is not "measuring", this is
"guessing".

> I did some experimental investigation into FJ a while ago. Here are
> my 5c:
>
> 1. The detailed implementation of FJ is very, very complex.
> I personally feel only Doug Lea has the ability to maintain it. This
> is arguably an engineering flaw of FJ: if you hit a bug in FJ in your
> critical system, it may be hard to work around or fix quickly -- and
> yet it sits in your critical core...

That's true for any other system which abstracts things. Write the code
straight in machine assembly. Wait, there are also bugs in CPUs
themselves...

Of course, FJP code is complex, but that is only because it struggles to
attain consistent performance on platforms where it is hard to do
otherwise. (Unfortunately, this includes Java and the variety of JVMs --
which means most of the code there is heavily optimized, rather than
relying on a smart compiler coming along and fixing everything). This is
hardly an "engineering flaw"; it is engineering excellence to hide all
these complexities in the library -- that's what libraries are for.

But you can only appreciate this if/when you are dealing with these
issues yourself in, say, writing your own thread pool. This is a good
exercise, and if diversity matters for you, get on writing the FJP
killer. If that FJP killer is close in performance for the similar
scenarios, but is much less complex, it would be silly not to adopt it.

> 2. I benchmarked external task submission to FJ in async mode.
> FJ was around 5x slower (at worst 10x, though not consistently) than
> my lock-free MPMC-backed thread pool in a simple benchmark that
> measures how long a batch of trivial tasks takes from submission to
> completion, i.e. the overhead of the pool framework itself. The design
> is only a prototype, without any optimization, and far from my ideal;
> but the result makes me more open to alternatives to FJ.

Two things:
a) Where are your benchmarks? Are your results peer-reviewed? ;)
b) Comparisons of prototype code, which obviously has bugs due to a lack
of wide testing, against production-grade code that is used everywhere
and hardened against lots of non-obvious real-world quirks may not tell
you anything useful at all.

> 3. Akka's FJ usage cannot show that FJ is the best common-pool design.

This is from the bookworm me: in the open world, you cannot show
*anything* is the best unless you try all the other infinite
possibilities. Hence, this is one of those questions which is
unanswerable by construction.

> That is because there are no strong competitors in concurrency-library
> design: new library authors just say "you see, FJ-based Akka is
> trendy, we can use it", but we have not seen any measurements saying
> why.

I dunno, this?
http://letitcrash.com/post/17607272336/scalability-of-fork-join-pool

Or, this?
http://shipilev.net/talks/jeeconf-May2013-forkjoin.pdf (in Russian, but
graphs are pretty much self-explanatory).

> 4. Finally, #3 uncovers a key point: the diversity of current Java
> concurrency libraries is not sufficient, as Martin hinted. I'd like to
> see more work in this field in the future.

The mere wish is not enough, and the "lack of competition" may as well
indicate that the competition cannot even compete. But you can prove me
wrong by providing an implementation that makes better tradeoffs than
FJP does; since I know how much engineering has gone into FJP, I doubt
you can provide one in the next 3 years, even if you have the experience
of Doug Lea himself.

-Aleksey.