Hi Niko,
thanks for your great blog post [1] on the evolution of the Parallel JavaScript API. It was really time to finally write this up somewhere. While I agree on the general theme, I have some comments to flesh out the remainder of the API.
In your blog post, the parallel methods are provided on instances of ArrayType, like the ImageType in your example. This enables the use of the type information from such an instance to define the iteration space and result type for pbuild. However, at the same time, it forces applications of pbuild to always parallelize up to the first non-array element in a type. Yet, a reasonable use case would be to compute a matrix of values where each row is computed sequentially but all the rows can be computed in parallel. Using ParallelArray, this could be expressed by providing only a 1-element size vector to the constructor and computing an array value as the result of the elemental function.
One solution to enable this in the binary data version of the API would be to add an (optional) depth argument to pbuild to give the programmer a handle on the iteration depth. Or, we could make the type creation implicit in pbuild. This, however, would require making pbuild a method on ArrayType itself rather than on its instances. Here is what this could look like using your example:
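To make the row-parallel case concrete, here is a rough sequential sketch in plain JavaScript. The helper buildOuter and the function computeRow are made up for illustration and are not part of ParallelArray or the binary data proposal; they only model the shape of the computation, where the outer loop is the part that could run in parallel and each row is produced by an ordinary sequential loop.

```javascript
// Hypothetical helper, not part of ParallelArray or the binary data API.
// Models a build over a 1-element size vector [rows]: only the outer
// dimension is iterated by the library (and could run in parallel).
function buildOuter(rows, elemental) {
  var result = new Array(rows);
  for (var i = 0; i < rows; i++) {
    // Each call is independent of the others, so iterations could be
    // distributed across workers; here they simply run in order.
    result[i] = elemental(i);
  }
  return result;
}

// The elemental function computes a whole row sequentially and returns
// it as an array value.
function computeRow(i) {
  var row = [];
  var acc = 0;
  for (var j = 0; j < 4; j++) {
    acc += i * j; // running sum: each column depends on the previous one
    row.push(acc);
  }
  return row;
}

var matrix = buildOuter(3, computeRow);
```

The running sum inside computeRow is the kind of dependency that makes the inner dimension unsuitable for parallel iteration, which is why one would want to stop the parallel iteration space at the rows.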
function computePixel(x, y) {
...
return PixelType({ r:..., g:..., b:..., a:... });
}
var myImage = ArrayType.pbuild([W, H], (x, y) => computePixel(x, y))
Here, the result would be an object of type ArrayType(ArrayType(ObjectPointer, H), W), as we did not tell the system what the actual result of computePixel looks like. This can be fixed by providing an optional type specification for the elements like so:
var myImage = ArrayType.pbuild([W, H], (x, y) => computePixel(x, y), PixelType)
Now, pbuild will convert whatever computePixel returns into an object of type PixelType (which it happens to already be in this example) and we get the expected overall result of type ArrayType(ArrayType(PixelType, H), W).
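To illustrate the intended semantics, here is a minimal sequential model in plain JavaScript. Everything in it is an assumption for the sake of the example: pbuildModel and PixelModel are hypothetical names, the element type is modeled as a plain conversion function, and the masking used for the channels is only a stand-in for whatever conversion rules binary data ends up specifying.

```javascript
// Hypothetical sequential model of the proposed pbuild(shape, fn, elemType).
// elemType is modeled as a plain conversion function; real binary data
// types would also fix the memory layout, which is not modeled here.
function pbuildModel(shape, fn, elemType) {
  function build(dim, indices) {
    if (dim === shape.length) {
      var value = fn.apply(null, indices);
      // Without a type specification the value is stored as-is
      // (the untyped, object-pointer case).
      return elemType ? elemType(value) : value;
    }
    var out = new Array(shape[dim]);
    for (var i = 0; i < shape[dim]; i++) {
      out[i] = build(dim + 1, indices.concat([i]));
    }
    return out;
  }
  return build(0, []);
}

// Stand-in for PixelType: wraps each channel into the range 0..255.
function PixelModel(p) {
  return { r: p.r & 0xff, g: p.g & 0xff, b: p.b & 0xff, a: p.a & 0xff };
}

var W = 2, H = 3;
var image = pbuildModel([W, H], function (x, y) {
  return { r: x * 100, g: y * 100, b: 300, a: 255 };
}, PixelModel);
```

In this model the outer array has length W and each inner array has length H, mirroring ArrayType(ArrayType(PixelType, H), W), and every element passes through the conversion regardless of what the elemental function returned.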
While it may seem weird at first glance to have pbuild create the type and an instance of that type in the same step, this is the natural semantics of map: given some object of some existing type, it creates a new object of a potentially new type. A possible signature could be:
myImage.map(2, computeGrayscale, uint8)
where computeGrayscale is the usual function from PixelType to a single grayscale value. The first argument to map provides the depth and the last argument specifies the type of each new element computed. Here, we would end up with an overall type of ArrayType( ArrayType( uint8, H), W). This is very similar to the proposal in [2].
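As a rough sketch of how the depth argument and the element type could interact, here is a sequential model on nested plain arrays. The names mapModel and uint8Model are hypothetical, the averaging in computeGrayscale is just one possible choice, and the truncate-and-wrap conversion is an assumed stand-in for the real uint8 coercion.

```javascript
// Hypothetical sequential model of map(depth, fn, elemType) on nested arrays.
function mapModel(data, depth, fn, elemType) {
  if (depth === 0) {
    var value = fn(data);
    return elemType ? elemType(value) : value;
  }
  return data.map(function (sub) {
    return mapModel(sub, depth - 1, fn, elemType);
  });
}

// Stand-in for uint8: truncate to an integer and wrap into 0..255.
function uint8Model(v) {
  return v & 0xff;
}

// One possible grayscale function: the plain average of the channels.
function computeGrayscale(p) {
  return (p.r + p.g + p.b) / 3;
}

var pixels = [
  [{ r: 30, g: 60, b: 90 }, { r: 255, g: 255, b: 255 }],
  [{ r: 0, g: 0, b: 0 }, { r: 10, g: 20, b: 31 }]
];

// Depth 2: fn sees individual pixels, the result keeps both dimensions.
var gray = mapModel(pixels, 2, computeGrayscale, uint8Model);

// Depth 1: fn sees a whole row instead of a single pixel.
var rowLengths = mapModel(pixels, 1, function (row) { return row.length; });
```

With depth 2 the result has the same two outer dimensions as the input but a different element type, while depth 1 hands each row to the elemental function as a unit.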
For pbuild, just using a depth and deriving the result type from the ArrayType instance that pbuild is called on is an option. For map, however, requiring it to always produce results of the type of the object it was called on is too restrictive. So does it make sense to unify the API of pbuild and map to use type specifications, or should pbuild be different?
Unlike in [2], when adding the API to binary data, it no longer makes sense to specify an element type for filter, as this can easily be derived, in particular as filter does not alter elements. For scatter and scan, however, one might want to convert to a more flexible representation. So these two might use a signature similar to that of map.
In the interest of keeping the discussion focused, I'll address some other points I have in a later email.
Stephan
[1] http://smallcultfollowing.com/babysteps/blog/2013/05/29/integrating-binary-data-and-pjs/
[2] http://wiki.ecmascript.org/doku.php?id=strawman:data_parallelism#support_for_binary_data