Hi Gabriel,
Most of the discussion has taken place in phone calls, in email, and on
the IRC channel (#pjs on irc.mozilla.org). I'm not aware of much having
been posted to blogs or mailing lists.
Let me briefly summarize my view of the matter. (I'm only one of several
engineers working on PJS, and I came late to it, so this is not
necessarily a consensus view; also, this is about PJS as it was
implemented in SpiderMonkey, not necessarily about the idea in general.)
I'll use the present tense even though the code is gone now.
Consider a simple PJS calculation where a is some Array or TypedObject
array:

  var b = a.mapPar((x) => x*x)

The kernel function that's the argument to mapPar can be written in a
fairly large subset of JavaScript. b is always a fresh array of the same
type as a.
Two significant problems with this API are predictability and performance.
First, it turns out to be hard to describe the subset of JavaScript that
can be parallelized reliably, because that subset tends to depend on
engine details. That's not to say it couldn't be done (and failing to do
so would block standardization), but it appears difficult to make that
portable subset large enough to be interesting without very significant
work.
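For example, in our implementation the kernel was expected to be
effectively free of side effects; a kernel that mutated captured state
would typically bail out to sequential execution. The exact rules are the
engine-dependent part, so the sketch below is an illustration, not a
specification:

  var log = [];

  // Likely to bail out of parallel execution: the kernel mutates
  // shared state (the captured 'log' array) on every iteration.
  var b1 = a.mapPar((x) => { log.push(x); return x*x; });

  // Parallelizes cleanly: the kernel only reads its argument and
  // returns a value, so every iteration is independent.
  var b2 = a.mapPar((x) => x*x);

Whether less obvious constructs (string building, allocation inside the
kernel, calls into builtins) stayed parallel depended on engine
internals, which is exactly the portability problem above.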
Second, that API has poor performance in many cases. A fresh array is
returned every time, driving up memory management costs, and if the kernel
function is small then the overhead of setting up the parallel computation,
which is unfortunately significant, is only recovered if the array is quite
large. (The overhead includes: warming up and recompiling the kernel code
for parallel computation; setting up and tearing down memory management for
the parallel section; and distributing the work to the workers.) We see
good parallel speedup on large computations that are done repeatedly, but
it is usually a chore to structure the computation to get that speedup.
Casual use of PJS with that API will therefore not be useful in many
cases: effectively nobody with some random array that they're currently
mapping would benefit from mapping it in parallel instead.
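To make the amortization point concrete, here is a sketch of the
favorable and unfavorable shapes (the sizes and iteration counts are
illustrative guesses, not measured thresholds):

  // Favorable: a large input and repeated invocations, so warmup,
  // recompilation, and setup costs are amortized across many
  // elements and many runs.
  var big = Array.from({ length: 1 << 22 }, (_, i) => i);
  var out;
  for (var i = 0; i < 100; i++)
    out = big.mapPar((x) => Math.sqrt(x*x + 1));

  // Unfavorable: a small array mapped once. Setup and teardown
  // dominate, and a fresh result array is allocated for nothing;
  // a plain sequential map would be faster here.
  var out2 = [1, 2, 3, 4].mapPar((x) => x*x);
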
As a consequence of the performance problem in particular (if we couldn't
fix that, the subset description issue would not matter), we started to
investigate alternative APIs, including a parallel pipeline pattern and a
more imperative OpenCL-like pattern; these would perhaps have allowed the
performance to be controlled better, by reducing per-iteration overhead
and controlling storage use. Both saw some work but had a ways to go. We
also worked hard at reducing the setup overhead, and had some success;
more would have to be done.
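To give a flavor of the imperative direction (a hypothetical sketch with
invented names, not the API we actually prototyped): the caller owns the
output storage, so no fresh array is allocated per call, and the kernel
receives an index range rather than being called once per element, which
cuts per-iteration overhead:

  // 'forkJoin' here is an invented name for an operation that splits
  // the range [0, n) into chunks and runs the kernel on a pool of
  // workers; the signature is for illustration only.
  var out = new Float64Array(a.length);
  forkJoin(a.length, function (lo, hi) {
    // Each worker fills its own disjoint slice of the caller-owned
    // output buffer: no per-call allocation, no per-element call.
    for (var i = lo; i < hi; i++)
      out[i] = a[i] * a[i];
  });
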
In the end, we had used about all the resources we could afford to spend
on PJS, and instead of letting the code languish (and in the process
complicate the engine) we decided to remove it.
--lars