I agree with everything you have said. Even for simpler frameworks, there is still a surprising number of ways to misuse them. To some degree this was Java's brilliance. A minimum of features which minimise edge cases. To be fair, my own libraries are *really* bad in this regard. ;)
I recently had cause to migrate some C# code to Java and have seen some cool uses of closures, but also some really dire ones, e.g.:

    List<String> list = new ArrayList<>();
    list.stream().forEach(p -> listA.add(p));
    list.stream().forEach(p -> listB.add(p));
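For contrast, a sketch of the idiomatic Java 8 versions of that copy (`listA`/`listB` in the snippet above are assumed to be plain lists): mutating an external list from `forEach` defeats the point of the stream and is unsafe the moment someone makes the stream parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CopyIdioms {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // No per-element transformation happening? Then no stream is needed:
        List<String> listA = new ArrayList<>(list);

        // If a pipeline is genuinely needed, collect instead of mutating
        // an external list from forEach (which breaks under parallel()):
        List<String> listB = list.stream().collect(Collectors.toList());

        System.out.println(listA.equals(list) && listB.equals(list)); // prints true
    }
}
```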
The following article proposes that the FJ framework and parallel collections could be a calamity for Java.

I've been uncomfortable with FJ for some time. I see people struggling to design for it, debug it, and more often fail to get performance benefits. Other approaches such as pipelining tasks can often be way more effective and easier to reason about.

I also find it amusing that, after years of trying to hide databases behind ORMs, to use the parallel collections effectively you need to understand set theory for writing good queries.

The following blog shows just how easily bloggers can misuse parallel collections by having no sympathy for CPU resource on a system. I think this is only the tip of the iceberg.

I'm curious to know if others have doubts about either Fork-Join or parallel collections, or if these are really good ideas and somehow the penny has not dropped for me? I'd really like to see a good evidence-based debate on this subject.

Regards,
Martin...
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Like any worker pool, the default fork/join pool (ForkJoinPool.commonPool()) used by the parallel Stream API has to be configured globally for the whole application. The default configuration assumes that all cores are available, which is obviously wrong if you are on a server shared with other workloads.

So what?
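For illustration, the common pool's size can be capped globally with the JDK system property shown below; this only takes effect if it is set before the pool is first touched anywhere in the JVM (the value 4 is arbitrary):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolConfig {
    public static void main(String[] args) {
        // Real JDK property; it is only honored if set before the common
        // pool is initialized (in practice, pass it as -D on the command line).
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
        // Prints 4 in a fresh JVM where nothing has touched the pool yet.
        System.out.println("parallelism = " + ForkJoinPool.commonPool().getParallelism());
    }
}
```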
> I've been uncomfortable with FJ for some time. I see people struggling
> to design for it, debug it, and more often fail to get performance
> benefits. Other approaches such as pipelining tasks can often be way
> more effective and easier to reason about.
I am comfortable with FJP, and I am happy seeing its use in JDK 8
Streams, because, frankly, you don't frequently see the execution
frameworks with that kind of performance magic: striped submission
queues, in-submitter execution while pool threads ramp up, false sharing
avoidance in thread queues, randomized balancing with super-fast PRNGs,
lock-free/relaxed-ops work queues, avoiding multiword-cas/locked
implementations of control words, branch prediction considerations, etc.
FJP tackles the problem of exploiting the internal parallelism without
sacrificing the external one. How successful is pipelining at those
things? I mean, surely, you can do something like Disruptor with
busy-wait handoffs, but in my mind, it is even more "non-sympathetic" to
other code than running a few additional pools full of threads.
>
> The following blog shows just how bloggers can so easily misuse parallel
> collections by having no sympathy for CPU resource on a system. I think
> this is only the tip of the iceberg.
>
> http://www.takipiblog.com/2014/04/03/new-parallelism-apis-in-java-8-behind-the-glitz-and-glamour/
Breaking news: not a silver bullet again! You can't actually run faster with #Threads > #CPUs! That parallel() thing is a lie!

If you look at the benchmarks there... well... um... I would just say it is a good exercise for seasoned benchmark people to spot the mistakes which make the results questionable. Anyway, if we want to *speculate* that the experimental setup is miraculously giving us sane performance data:

 * Sorting is now only 20% faster: a 23x decline.
 * Filtering is now only 20% faster: a 25x decline.
 * Grouping is now 15% slower.

(That 23x-25x decline is a red herring, because it compares the results of two different tests.)
Am I reading it right? You put 10 client threads submitting the same task into the pool, and you are *still* 20% faster on the parallel tests? And that is on an 8-hardware-thread machine (which is a funny pitfall on its own)? That means even when external parallelism is present, you can still enjoy the benefits of the internal one? Or is that a fever dream of an overloaded machine?
One of the big things is that Stream methods like sum() don't work on BigDecimal or BigInteger. Why is using BigDecimal in Java so painful? :(
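For anyone hitting this: the usual workaround is `reduce` with `BigDecimal::add`, since there is no `sum()` on a `Stream<BigDecimal>`. A minimal sketch:

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

public class BigDecimalSum {
    public static void main(String[] args) {
        List<BigDecimal> values =
            Arrays.asList(new BigDecimal("1.10"), new BigDecimal("2.20"));
        // Reduce from ZERO; works for sequential and parallel streams,
        // because BigDecimal addition is associative and immutable.
        BigDecimal total = values.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        System.out.println(total); // prints 3.30
    }
}
```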
I would have to say that every highly scalable system I've been involved with has employed pipelining. FJ is interesting, but IMHO its use cases are limited and questionable. Unfortunately I fear that FJ nepotism has influenced Java's implementation of lambdas. I smell a "Spring"-like opportunity here.
The streaming API is aimed at general use from what I've heard on the conference circuit of late. That means it is sharing a machine with lots of other threads involved in "general use" within applications, e.g. a web container with many threads in its pool. If the default is to assume exclusive access to the system resources then I'd say that is somewhat naive. The same can be said for any component/framework that starts its own "inconsiderate" thread pool.
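One mitigation people use for exactly this sharing problem is to run the terminal operation inside a dedicated ForkJoinPool, so a greedy pipeline at least does not starve the shared common pool. A sketch; note the pool-inheritance behavior is an implementation detail of the current JDK, not a documented guarantee:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class DedicatedPool {
    public static void main(String[] args) throws Exception {
        // Size the pool explicitly instead of assuming all cores are ours.
        ForkJoinPool pool = new ForkJoinPool(2);
        // When the terminal op runs inside a FJP task, the parallel stream
        // currently uses that pool's workers rather than the common pool.
        int sum = pool.submit(
            () -> IntStream.rangeClosed(1, 1_000).parallel().sum()
        ).get();
        System.out.println(sum); // prints 500500
        pool.shutdown();
    }
}
```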
Thank you for the tip.
On 04/13/2014 08:02 PM, Martin Thompson wrote:
> I'm trying to stimulate a healthy debate and increase the understanding
> in our community. My primary goal is to see software developed that
> delivers real value for a business.

Ok, the real value for business is code brevity, which means more readability, more expressiveness, fewer bugs, less maintenance burden.
You seem to be leaning towards peak performance, and that thing is at
odds with usability. For 99.9999% of businesses peak application
performance is the second order concern. If there is an easy performance
boost with minimal effort, business will go there as well.
> Divide-and-conquer is one way to address parallel computing. You are
> right in that it is a shame this paper is very one sided. However I
> think the core focus on parallelism within the Java community is very
> one sided towards shared memory designs and FJ.

Ummm. How would you do otherwise with a language which embraces shared memory? Anyway, that statement is invalidated by Akka (an obvious departure from the shared memory model) being driven by FJP. Why? Because the metal is shared memory, and to get close-to-bare-metal performance, you have to face shared memory at some level.
On 04/13/2014 09:43 PM, Martin Thompson wrote:
> How did I give the impression I'm leaning towards peak performance? I'm
> only exploring the subject of parallel streams and FJ to see if they meet
> their goals.

Parallel streams obviously meet their goals of providing accessible parallelism to users. FJP obviously meets its goals of providing the foundation for that parallel work (validated by JDK 8 itself, Akka, GPars, etc.).
> Performance is a misdirection in this context. Going parallel in this
> context is about increasing utilisation of our modern multicore hardware.

Wait, what? Which context? I don't care about utilization; I don't think anyone cares about increasing utilization unless you run the power company billing the datacenter. I do care about performance, though.
> Here you are saying business value is coming from parallel streams
> making things easier, then later you say every technology "complicates
> the mental model". This feels like a contradiction.

If you re-read that thought carefully: every technology *does* complicate the mental model, by sweeping unnecessary things under the rug and adding to the under-the-rug mess. The "common" usages, however, are simplified at the expense of increased complexity elsewhere.
This is what I see in this thread: it is harder to bend parallel streams
to do *exactly* what you want low-level-wise, but that's only the price
for exposing the entire realm of Java developers to readable and
maintainable parallel code.
> For code to be maintainable it must be clear and easy to reason about.
> I think many would argue that larger scale apps built with FJ or
> Map-Reduce are not easy to maintain or debug.

And I would argue programming is hard. Not easy to maintain or debug compared to what? Is there an option which makes solving the problems FJ/MR systems are facing easier *without* sacrificing the benefits of FJ/MR? (Hint: you are not in single-threaded Texas anymore.)
> The statement is not invalidated by Akka. Akka is from the Scala
> community and not to be found in the JDK or JEE. Also FJP is only one of
> many possible ways of scheduling actors.

...and yet, FJP is their default high-performance executor.
> When I go for bare metal performance I only use shared memory as a
> means of message passing as this maps very cleanly to the cache
> coherence model I'm actually sitting on as a non-leaky abstraction.

That accurately describes the we-care-about-performance approach for modern Java today: using, providing, and improving light-weight inter-thread communication primitives [see e.g. the entire j.u.c.*, other lock-free stuff, fences, enhanced volatiles, etc]. Does that mean the Java community and core Java team are "open thinking in this area", contrary to what…
On 04/13/2014 10:44 PM, Martin Thompson wrote:
> "Obviously", where is the evidence? You may be right but you cannot make
> that statement yet.

The blog links you were posting are the evidence for that: users get parallel speedups with parallelStream(). Since that code uses FJP to achieve those speedups, it validates the use of FJP.
But you want something else? You want it to deliver speedups in all the
cases? (To quote yourself, being unable to "so easily misuse parallel
collections by having no sympathy for CPU resource on a system").
Now if you think there are better options, the burden of proof is on
you. Can you beat the FJP-backed parallelStream() performance with
non-FJP-backed actors and/or pipelines in similar scenarios?
> Without efficient utilisation you do not get performance. You need to
> efficiently utilise the other cores to get the parallel speedup.

Um, no? Utilization is tangential to performance. I don't have to "efficiently" utilize the cores to get the speedup (note you mix "speedup" and "parallel speedup" freely, but these are not the same); I just have to use the cores... sometimes. For example, the non-obvious thing about FJP and Streams is that there are clear cases where it is better *not* to use another core and to stay local for short tasks (this is where the execute-in-submitter thing was born, contrary to the belief that those bookworm academicians are here to kill us all).
> Streams can absolutely improve code clarity for those who embrace set
> theory.

Oh. I guess programming is even harder for alphabet deniers. Seriously, Martin! I stopped reading after this line.
--
-Aleksey.
Andy, I think your point about micro-parallelism is well made, but it misses at least one possibility. SIMD can provide micro-parallelism without the coherence costs. Many of the examples of divide-and-conquer doing well are filtering, sorting, reducing, etc. SIMD can add a lot of value here without the scheduling issues of going across threads. Each generation of our CPUs can do more work per cycle using vectorization instructions. Even languages like C#, Dart, and JavaScript are getting implicit and explicit support.
Just a guess but high sys% may be due to use of sleeping wait strategy (I realize that's what original benchmark used). Out of curiosity, can you try using the busy spin one?
double post, see below. Google groups worst webapp ever.
Sorry, I'd argue the Fork Join benchmark implementation misses the point.

The point of the benchmark is to measure the speed of processing 1 million individual, independent processing jobs. I simulate processing by computing a slice of Pi as a placeholder for business logic.

However, your test basically creates one job per thread:

    for (int i = 0; i < Shared.THREADS; i++) {
        PiForkJoinTask task = new PiForkJoinTask(Shared.SLICES / Shared.THREADS);
        task.fork();
        tasks.add(task);
    }

and then computes the slices in a big loop:

    protected Double compute() {
        double acc = 0D;
        for (int s = 0; s < slices; s++) {
            acc += Shared.calculatePi(s);
        }
        return acc;
    }

so it basically does 4 jobs, each computing a 25,000,000-iteration slice of Pi, instead of processing 1 million jobs, each computing a 100-iteration slice of Pi. A correct implementation would have to submit 1 million jobs from the main loop. In an event-sourced system, e.g. when parallelizing decoding, you can't do stuff like that.

If implemented like this, FJ will be at a huge disadvantage just because one has to create a lot of FJ jobs, while the Disruptor just recycles the ring buffer objects. This kind of micro-parallelization is very common in real applications, where a process receives hundreds of thousands of events or requests per second with very low processing time for each request/event.
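For concreteness, "submit 1 million jobs from the main loop" would look roughly like this sketch, scaled down; `sliceOfPi` here is a hypothetical stand-in for `Shared.calculatePi` using the Leibniz series:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class ManySmallTasks {
    // Hypothetical stand-in for Shared.calculatePi(s): 100 Leibniz terms.
    static double sliceOfPi(int s) {
        double acc = 0;
        for (int i = s * 100; i < (s + 1) * 100; i++) {
            acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);
        }
        return acc;
    }

    public static void main(String[] args) {
        int slices = 10_000; // scaled down from 1 million for the example
        ForkJoinPool pool = new ForkJoinPool();
        List<ForkJoinTask<Double>> tasks = new ArrayList<>(slices);
        for (int s = 0; s < slices; s++) {
            final int slice = s;
            tasks.add(pool.submit(() -> sliceOfPi(slice))); // one task per job
        }
        double pi = 0;
        for (ForkJoinTask<Double> t : tasks) pi += t.join();
        System.out.printf(java.util.Locale.ROOT, "%.4f%n", pi); // prints 3.1416
    }
}
```

The per-task allocation and submission cost this incurs is exactly the overhead being debated.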
On Tuesday, 15 April 2014 22:46:52 UTC+2, Aleksey Shipilev wrote:
On my i7 laptop (T=3) [did not make the q size adjustment]:

JDK 1.7:

    Benchmark          Mode  Samples     Mean  Mean error  Units
    n.s.Disruptor.run  avgt       60  355,577      13,418  ms/op

JDK 1.8:

    Benchmark                      Mode  Samples     Mean  Mean error  Units
    n.s.Disruptor.run              avgt       60  518,855      14,611  ms/op
    n.s.ForkJoinRecursiveDeep.run  avgt       60  280,507       1,534  ms/op

FJ is an amazing piece of work, I am somewhat baffled... but how can I leverage this for streaming input?
On Wednesday, 16 April 2014 03:06:50 UTC+2, mikeb01 wrote:
On 04/16/2014 03:04 AM, Rüdiger Möller wrote:
> Sorry, I'd argue the Fork Join benchmark implementation misses the point.
>
> The point of the benchmark is to measure the speed of processing 1
> million individual, independent processing jobs. I simulate processing
> by computing a slice of PI as a place holder for business logic.
That is why we have ForkJoinRecursiveDeep.
> This kind of micro-parallelization is very common in real application
> where a process receives hundreds of thousands of events or requests per
> second with very low processing time for each request/event.
The original benchmark states nothing about that. You wanted to compute N slices of M iterations each; ForkJoinRecursiveDeep does just that. True, FJP may not be useful when you face a specific use case which pushes you to submit millions of tasks into the pool, but there are obviously multiple ways you can tackle the problem. Solving the particular problem (computing Pi in slices) in the F/J-natural way brings lots of performance, and therefore this particular experiment serves as a perfect counter-example to this:
On 04/15/2014 10:32 PM, Rüdiger Möller wrote:
> If I want to multithread for performance, I am better off using a
> pipelining scheme, as the intrinsic overhead of lambda/FJ is high
> compared to a pipelining (or well implemented actor) approach.
...is generally wrong.
I think point-wise measures are not enough, btw. We need to measure this with different numbers of threads in the pool, and different numbers of threads submitting the work. The JMH API can simplify coding up a scenario like that.
Yeah, I did not mean to say "narrow", sorry (if you are going to write mail just after waking up, you gonna have a bad time). But I did want to say that in the scenarios FJP was designed for (that is, massive *data* parallelism, which fits the use case of JDK 8 Streams), it works remarkably well.
> Any possibility to backport the FJ bench to 1.7 ? I noticed a
> significant performance degradation of disruptor with 1.8.
It is possible, but the caveat would be FJP missing lots of very
profitable optimizations done for Streams work. Pick your poison I
guess. That said, I think something like this will be runnable on JDK 7:
Not only do I think a single client *is* a special case (most of the businesses I know run for multiple clients; HFT is different in this regard), but benchmarking-wise it could introduce the unwarranted effect of clients being unable to saturate the executors. See below.
I pushed a few tiny changes to the workspace, including one which does a JDK 8 Stream-ish Pi calculation (still arguably fair, since it will decompose sensibly, not down to the last slice). Here's the result on my laptop:
Hi,

I finally ran the original benchmark with an adaptation of your ForkJoinRecursiveDeep on my Opteron 2x8c8t. The results are amazing. FJ scales perfectly even on this machine (notorious for descaling whenever cross-socket traffic comes into play)! The descaling of the Disruptor does not occur on Intel. I'll do the Intel run tomorrow.

The standalone 1.7 test (copy-pasteable) is here:

I had to adapt your impl to actually compute Pi, which required adding a second int to FJTask. I ran it using 1.7.0_45 (to keep everything the same as the original benchmarks).
On 04/17/2014 01:15 AM, Rüdiger Möller wrote:
> <https://lh5.googleusercontent.com/-6QzYX6x9LOg/U07y1Le-JDI/AAAAAAAAAOI/VVNZUkUODRQ/s1600/fjointable.png>
>
> <https://lh3.googleusercontent.com/-f1rS4z9ftec/U07yxFAEcSI/AAAAAAAAAOA/io7KY_u3Z5U/s1600/ForkJoin.png>
(checks the thread subject)
So... This is still a thread on "Fork-Join *Calamity*", right?
* Error bounds would be nice to have on that graph.
* Disruptor results seem very worrying: what happens past the 9 cores?
-Aleksey.
The Google Groups interface sucks hard; I had to move to windoze in order to be able to comment on the post... JS is the future of programming, btw.
> * Disruptor results seem very worrying: what happens past the 9 cores?

Don't know. Mike Barker also had no clue. It's an Opteron artefact; it does not happen on Intel machines.
I think the reason the Disruptor behaves this way on AMD for this is due to a floating point unit being shared between pairs of cores on Opteron. Intel cores have their own FP execution unit. I suspect, but unfortunately don't have time to verify, that the FJP does a much better job of coping when insufficient CPU resource exists compared to the Disruptor. The Disruptor is not a good general purpose framework to be used when more threads want to run than cores are available.
Work stealing pools can be a good solution for throughput focused use cases. The Disruptor is designed for response time focused use cases.
I think people would be much better off not saying "general" in 99.99%
of the cases. There's nothing "general" at all.
Before you jump to conclusions, make sure you do three things:

a) Proper benchmarking: I already did the JMH stub for FJP and the Disruptor, not only because it estimates run-to-run variance, computes error bounds, et cetera, but also because it is more easily reviewable.

b) From the usages you have for FJP and the previous requests, I infer you are running JDK 7? If so, I specifically call this experiment potentially misleading, because it misses the major FJP optimizations in JDK 8.

c) Understanding the performance model includes researching how all implementations react to changing the duration of those "events", in order to quantify what each implementation considers to be "unfitting".
> Like 90% of real world (server) applications are processing
> independent, short running events. Splitting up big computational
> tasks is somewhat rare.
As someone else said in this thread, mentioning percents of real world
"should be avoided as its strong subjective color unless you have at
less a public poll to support this". JDK 8 Streams is the major evident
counter-example of this.
> Therefore (besides more fundamental considerations scratched in OP) I
> think adding a pipelining based concurrency component to JDK would be
> reasonable (e.g. something like Disruptor).
Cool, advocate the need for that implementation with better due
diligence (see above), push it through the community process (e.g. JEP),
and actually work on coding/adjusting it for JDK. This would be real
community work :)
On a side note (also to Mr Harned): we are all here to gain insight; there is no point in devaluing each other's work or insulting other people.
I'm not interested in a benchmark pissing contest. Benchmarks are great but only when comparing like-for-like. Maybe my little brain cannot see where I best use them in typical applications and I'm just too comfortable with existing tool kit of pipelines, actors, and work pools; each used where appropriate.
Rüdiger Möller: I know the protocol is to go after the object not the man. However, in any investigation there comes a time when one needs to look at the perpetrators. These articles have been around for four years now scrutinizing the object in detail and today most people don’t even want to look at the crime scene much less the participants.
The professor made an honest mistake in repeatedly trying to port Cilk to Java but Oracle has gravely exacerbated that mistake by using the F/J framework as their parallel engine in Java8. Perhaps they were looking for a cheap, readily available multitasking program rather than building one themselves. In any case, the framework has serious flaws that have the potential to cause harm in the coming years.
ed
If we're done with benchmarks, let me respond to the original subject: an evidence-based debate on the design of the F/J framework.
Fork/Join means to split the work into fragments and join the results. Specifically, fork repeatedly and when done, join the results. This framework doesn’t do that. It uses a recursive technique that is fork-and-join. The join does not, cannot, and will never work outside of a closed environment. The professor tried and tried again to make it work in the open environment of Java but it is fatally flawed; it can never work. Therefore, the framework’s basic design is a failure.
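For readers following along, the recursive fork-and-join shape being described looks like this; a generic textbook sketch (not the professor's or the JDK's code; names and threshold are illustrative):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer sum: fork the left half, compute the right half
// in the current thread, then join the forked result.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // illustrative cutoff
    private final long[] a;
    private final int lo, hi;

    SumTask(long[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

    @Override protected Long compute() {
        if (hi - lo <= THRESHOLD) {           // small enough: do it directly
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(a, lo, mid);
        left.fork();                                     // schedule left half
        long right = new SumTask(a, mid, hi).compute();  // right half, here
        return left.join() + right;                      // join and combine
    }

    public static void main(String[] args) {
        long[] a = new long[100_000];
        for (int i = 0; i < a.length; i++) a[i] = i + 1;
        Long total = new ForkJoinPool().invoke(new SumTask(a, 0, a.length));
        System.out.println(total); // prints 5000050000
    }
}
```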
However, in real systems the dominant use-case is the latter (e.g. Akka). Another example: F/J is very fast at sorting a 100 million element array, but slow when I have to sort 10 million arrays of length 10 without being allowed to batch them into large chunks.
I've been following this discussion from the sidelines and I think there are a lot of assumptions being fired around about the popularity of use cases. I can think of a fair few examples where people have jobs that are suited to data parallelism, and jobs that aren't, where a task-parallel approach would be better.

Sitting on a mailing list denying that data parallel problems exist in real systems is the kind of claim you would really need to back with a proper empirical analysis in order to convince me. If you've done such an analysis then by all means link me to it.
If you haven't done that then a much more constructive approach would be to take a look at core Java APIs and identify things that need to be fixed to support pipeline based task parallelism. To me this is an actual issue, unlike the introduction of support for data parallel operations on collections.
Here's an example of the kind of thing I mean: JDBC is an inherently synchronous design. Its use doesn't fit at all well into a pipelined/reactive/async world view, although it does work fine in a blocking servlet-container model. I appreciate that a lot of the HFT people on this list are immediately rolling their eyes at the use of a SQL database, but this is something that's commonly used and, if you're wanting to evolve Java, it's a viable target for improvement.
> However in real systems the dominant use-case is the latter (e.g. Akka). Another example: F/J is very fast at sorting a 100 million element array, but slow when I have to sort 10 million arrays of length 10 without being allowed to batch them into large chunks.

[citation needed]
> Sitting on a mailing list denying that data parallel problems exist in real systems is the kind of claim that you would really need to do a proper empirical analysis in order to convince me of. If you've done such an analysis then by all means link me to it.

+1
Why would someone use FJ for 10 million 10 entry arrays? There is nothing to FJ. You would use a simple partitioned work queue.
On Saturday, 19 April 2014 15:01:17 UTC+2, Robert Engels wrote:
> Why would someone use FJ for 10 million 10 entry arrays? There is nothing to FJ. You would use a simple partitioned work queue.

The array example is fuzzy as it's not a diamond pattern anymore (consider something like having to add up the sizes of all the tiny sorted arrays). But anyway, the overhead of scheduling/enqueuing using the JDK's thread pools is too high. You won't see any scaling effects.
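For concreteness, the "simple partitioned work queue" idea could be sketched like this: statically partition the small arrays across a fixed set of worker threads, so there is no per-task scheduling, queuing, or work stealing at all. A toy, with sizes scaled down and a static stride partition standing in for a real queue:

```java
import java.util.Arrays;
import java.util.Random;

public class PartitionedSort {
    public static void main(String[] args) throws InterruptedException {
        int nArrays = 100_000, len = 10;  // scaled down from 10 million
        int[][] data = new int[nArrays][];
        Random r = new Random(42);
        for (int i = 0; i < nArrays; i++) data[i] = r.ints(len).toArray();

        int threads = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Each worker owns a fixed stride of indices: no contention,
                // no per-array task objects, no stealing.
                for (int i = id; i < nArrays; i += threads) Arrays.sort(data[i]);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(isSorted(data[0])); // prints true
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }
}
```

Static partitioning like this only balances well when the tasks are uniform, which is exactly the case for millions of equally tiny arrays.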