I agree with everything you have said. Even for simpler frameworks, there are still a surprising number of ways to misuse them. To some degree this was Java's brilliance: a minimum of features, which minimises edge cases. To be fair, my own libraries are *really* bad in this regard. ;)
I recently had cause to migrate some C# code to Java and have seen some cool uses of closures, but also some really dire ones, e.g.:
List<String> list = new ArrayList<>();
// side-effecting forEach into captured lists, rather than collect():
List<String> listA = new ArrayList<>();
List<String> listB = new ArrayList<>();
list.stream().forEach(p -> listA.add(p));
list.stream().forEach(p -> listB.add(p));
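For contrast, a sketch of what the idiomatic version might look like (the class and method names here are just for illustration): building the target lists inside the pipeline with collect() avoids the side-effecting lambdas, which also matters if anyone later flips the stream to parallel, since concurrent add() calls on an unsynchronised ArrayList would be a data race.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectInsteadOfForEach {
    // Build a copy via collect() rather than a side-effecting forEach:
    // the target list is created inside the pipeline, so a later switch
    // to parallelStream() cannot race on shared mutable state.
    static List<String> copyOf(List<String> source) {
        return source.stream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        List<String> listA = copyOf(list);
        List<String> listB = copyOf(list);
        System.out.println(listA + " " + listB); // [a, b, c] [a, b, c]
    }
}
```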
The following article proposes that the FJ framework and parallel collections could be a calamity for Java.

I've been uncomfortable with FJ for some time. I see people struggling to design for it, debug it, and more often than not fail to get performance benefits. Other approaches, such as pipelining tasks, can often be way more effective and easier to reason about.

I also find it amusing that after years of trying to hide databases behind ORMs, to use the parallel collections effectively you need to understand set theory for writing good queries.
The following blog shows just how easily bloggers can misuse parallel collections by having no sympathy for CPU resources on a system. I think this is only the tip of the iceberg.
I'm curious to know if others have doubts about either Fork-Join or parallel collections, or if these are really good ideas and somehow the penny has not dropped for me? I'd really like to see a good evidence-based debate on this subject.

Regards,
Martin
Like any worker pool, the default fork/join pool (ForkJoinPool.commonPool())
used by the parallel Stream API has to be configured globally for the
whole application. The default configuration considers that all cores are
available, which is obviously wrong if you are on a shared server.
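One way to bound this (a sketch, not the only option): the common pool's parallelism can be capped JVM-wide with the system property `-Djava.util.concurrent.ForkJoinPool.common.parallelism=N` (set before the pool is first used), or a parallel stream can be run from inside a dedicated pool. Note the second trick relies on parallel-stream tasks executing in the submitting fork/join pool, which is observed behaviour rather than a documented guarantee.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

public class BoundedParallelism {
    // Run a parallel stream inside a dedicated pool so it does not
    // assume every core on the machine belongs to it.
    static int sumInPool(int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism); // cap worker threads
        try {
            return pool.submit(() ->
                Arrays.asList(1, 2, 3, 4).parallelStream()
                      .mapToInt(Integer::intValue)
                      .sum()
            ).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInPool(2)); // 10
    }
}
```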
So what?
> I've been uncomfortable with FJ for some time. I see people struggling
> to design for it, debug it, and more often fail to get performance
> benefits. Other approaches such as pipelining tasks can often be way
> more effective and easier to reason about.
I am comfortable with FJP, and I am happy seeing its use in JDK 8
Streams, because, frankly, you don't frequently see execution
frameworks with that kind of performance magic: striped submission
queues, in-submitter execution while pool threads ramp up, false-sharing
avoidance in thread queues, randomized balancing with super-fast PRNGs,
lock-free/relaxed-ops work queues, avoidance of multiword-CAS/locked
implementations of control words, branch-prediction considerations, etc.
FJP tackles the problem of exploiting the internal parallelism without
sacrificing the external one. How successful is pipelining at those
things? I mean, surely, you can do something like Disruptor with
busy-wait handoffs, but in my mind, it is even more "non-sympathetic" to
other code than running a few additional pools full of threads.
>
> The following blog shows just how bloggers can so easily misuse parallel
> collections by having no sympathy for CPU resource on a system. I think
> this is only the tip of the iceberg.
>
> http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/
Breaking news: not a silver bullet again! You can't actually run faster
with #Threads > #CPUs! That parallel() thing is a lie!
If you look at the benchmarks there... well... um... I would just say it
is a good exercise for seasoned benchmark guys to spot the mistakes
which make the results questionable. Anyway, if we want to *speculate*
that the experimental setup is miraculously giving us sane performance data:
* Sorting is now only 20% faster – a 23X decline.
* Filtering is now only 20% faster – a 25X decline.
* Grouping is now 15% slower.
(That 23X-25X decline is a red herring because it compares the results of
two different tests.)
Am I reading it right? You put 10 client threads submitting the same
task in the pool, and you are *still* 20% faster on parallel tests? And
that is on an 8-hardware-thread machine (which is a funny pitfall on its
own)? That means, even when external parallelism is present, you can
still enjoy the benefits of the internal one? Or is that a fever dream
of an overloaded machine?
One of the big things is that Stream methods like sum() don't work on BigDecimal or BigInteger; sum() only exists on the primitive IntStream/LongStream/DoubleStream specialisations. Why is using BigDecimal in Java so painful? :(
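The standard workaround (a sketch; the class and method names are just for illustration) is to reduce with BigDecimal.ZERO as the identity and BigDecimal::add as the accumulator, which is also safe under parallel execution because addition is associative:

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

public class BigDecimalSum {
    // Stream<BigDecimal> has no sum(); reducing with the ZERO identity
    // and BigDecimal::add fills the gap, and is parallel-safe because
    // BigDecimal addition is associative.
    static BigDecimal sum(List<BigDecimal> values) {
        return values.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<BigDecimal> prices = Arrays.asList(
                new BigDecimal("1.10"),
                new BigDecimal("2.20"),
                new BigDecimal("3.30"));
        System.out.println(sum(prices)); // 6.60
    }
}
```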
I would have to say that every highly scalable system I've been involved with has employed pipelining. FJ is interesting but IMHO its use cases are limited and questionable. Unfortunately I fear that FJ nepotism has influenced Java's implementation of lambdas. I smell a "Spring"-like opportunity here.
The streaming API is aimed at general use from what I've heard on the conference circuit of late. That means it is sharing a machine with lots of other threads involved in "general use" within applications, e.g. a web container with many threads in its pool. If the default is to assume exclusive access to the system resources then I'd say that is somewhat naive. The same can be said for any component/framework that starts its own "inconsiderate" thread pool.
Thank you for the tip.
On 04/13/2014 08:02 PM, Martin Thompson wrote:
> I'm trying to stimulate a healthy debate and increase the understanding
> in our community. My primary goal is to see software developed that
> delivers real value for a business.

Ok, the real value for business is code brevity, which means more
readability, more expressiveness, fewer bugs, less maintenance burden.
You seem to be leaning towards peak performance, and that thing is at
odds with usability. For 99.9999% of businesses peak application
performance is a second-order concern. If there is an easy performance
boost with minimal effort, business will go there as well.
> Divide-and-conquer is one way to address parallel computing. You are
> right in that it is a shame this paper is very one sided. However I
> think the core focus on parallelism within the Java community is very
> one sided towards shared memory designs and FJ.

Ummm. How would you do otherwise with a language which embraces shared
memory? Anyway, that statement is invalidated by Akka (which is an
obvious departure from the shared memory model), which is driven by FJP.
Why? Because the metal is shared memory, and to have close-to-bare-metal
performance, you have to face shared memory at some level.
On 04/13/2014 09:43 PM, Martin Thompson wrote:
> How did I give the impression I'm leaning towards peak performance? I'm
> only exploring the subject of parallel streams and FJ to see if they
> meet their goals.

Parallel streams obviously meet their goals of providing accessible
parallelism to users. FJP obviously meets its goals of providing the
foundation for that parallel work (validated by JDK 8 itself, Akka,
GPars, etc.)
> Performance is a misdirection in this context. Going parallel in this
> context is about increasing utilisation of our modern multicore hardware.

Wait, what? Which context? I don't care about the utilization; I don't
think anyone cares about increasing the utilization unless you run the
power company billing the datacenter. I do care about performance, though.
> Here you are saying business value is coming from parallel streams
> making things easier, then later you say every technology "complicates
> the mental model". This feels like a contradiction.

If you re-read that thought carefully: every technology *does*
complicate the mental model, by sweeping unnecessary things under the
rug and adding to the under-the-rug mess. The "common" usages, however,
are simplified at the expense of increased complexity elsewhere.
This is what I see in this thread: it is harder to bend parallel streams
to do *exactly* what you want low-level-wise, but that's only the price
for exposing the entire realm of Java developers to readable and
maintainable parallel code.
> For code to be maintainable it must be clear and easy to reason about.
> I think many would argue that larger scale apps built with FJ or
> Map-Reduce are not easy to maintain or debug.

And I would argue programming is hard. Not easy to maintain or debug
compared to what? Is there an option which makes solving the problems
FJ/MR systems are facing easier *without* sacrificing the benefits of
FJ/MR? (Hint-hint: you are not in single-threaded Texas anymore.)
> The statement is not invalidated by Akka. Akka is from the Scala
> community and not to be found in the JDK or JEE. Also FJP is only one of
> many possible ways of scheduling actors.

...and yet, FJP is their default high-performance executor.
> When I go for bare metal performance I only use shared memory as a
> means of message passing, as this maps very cleanly to the cache
> coherence model I'm actually sitting on, as a non-leaky abstraction.

That accurately describes the we-care-about-performance approach for
modern Java today: using, providing, and improving light-weight
inter-thread communication primitives [see e.g. the entire j.u.c.*, other
lock-free stuff, fences, enhanced volatiles, etc.]. Does that mean the
Java community and core Java team are "open thinking in this area",
contrary to what you were saying?
On 04/13/2014 10:44 PM, Martin Thompson wrote:
> On 13 April 2014 19:12, Aleksey Shipilev <aleksey....@gmail.com> wrote:
> "Obviously", where is the evidence? You may be right but you cannot make
> that statement yet.

The blog links you were posting are the evidence for that: users get
parallel speedups with parallelStream(). Since that code uses FJP to
achieve those speedups, it validates the use of FJP.
But you want something else? You want it to deliver speedups in all the
cases? (To quote yourself, being unable to "so easily misuse parallel
collections by having no sympathy for CPU resource on a system").
Now if you think there are better options, the burden of proof is on
you. Can you beat the FJP-backed parallelStream() performance with
non-FJP-backed actors and/or pipelines in similar scenarios?
> Without efficient utilisation you do not get performance. You need to
> efficiently utilise the other cores to get the parallel speedup.
Um, no? Utilization is tangential to performance. I don't have to
"efficiently" utilize the cores to get the speedup (note you mix
"speedup" and "parallel speedup" freely, but these are not the same), I
just have to use the cores... sometimes. For example, the non-obvious
thing about FJP and Streams is that there are clear cases where it is
better *not* to use the cores and to stay local for short tasks (this is
where the execute-in-submitter thing was born, contrary to the belief
that those bookworm academicians are here to kill us all).
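A toy illustration of that stay-local point (a sketch, not a benchmark; the class and method names are just for illustration): both forms below compute the same answer, but for a workload this small the splitting, task scheduling, and result merging of the parallel form can easily exceed the cost of the work itself, which is exactly why executing short tasks in the submitting thread can be the win.

```java
import java.util.stream.IntStream;

public class ShortTaskSketch {
    // Sequential sum of 0..99: no fork/join bookkeeping at all.
    static long sumSequential() {
        return IntStream.range(0, 100).asLongStream().sum();
    }

    // Same result via the parallel form: correct, but for ~100 trivial
    // additions the splitting and merging overhead dominates the work.
    static long sumParallel() {
        return IntStream.range(0, 100).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumSequential() + " " + sumParallel()); // 4950 4950
    }
}
```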
> Streams can absolutely improve code clarity for those who embrace set
> theory.

Oh. I guess programming is even harder for alphabet deniers. Seriously,
Martin! I stopped reading after this line.