Integrating Binary Data and PJs: pbuild and map

Herhut, Stephan A

Jun 4, 2013, 9:05:05 PM

to Niko Matsakis, dev-tech-js-en...@lists.mozilla.org

Hi Niko,

thanks for your great blog post [1] on the evolution of the Parallel JavaScript API. It was really time to finally write this up somewhere. While I agree on the general theme, I have some comments to flesh out the remainder of the API.

In your blog post, the parallel methods are provided on instances of ArrayType, like the ImageType in your example. This enables the use of the type information from such an instance to define the iteration space and result type for pbuild. However, at the same time, it forces applications of pbuild to always parallelize up to the first non-array element in a type. Yet, a reasonable use case would be to compute a matrix of values where each row is computed sequentially but all the rows can be computed in parallel. Using ParallelArray, this could be expressed by providing only a 1-element size vector to the constructor and computing an array value as the result of the elemental function.
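
To make the difference between the two iteration depths concrete, here is a minimal sequential sketch in plain JavaScript. It assumes nothing about the real ParallelArray or binary data implementations: `buildToDepth` is a hypothetical helper that only models which indices the elemental function is called with, and it runs everything sequentially.

```javascript
// Hypothetical helper (not part of any proposed API): calls fn once per
// index tuple in the iteration space given by `shape` and nests the results.
function buildToDepth(shape, fn, indices = []) {
  if (indices.length === shape.length) {
    return fn(...indices);
  }
  const dim = shape[indices.length];
  const out = new Array(dim);
  for (let i = 0; i < dim; i++) {
    out[i] = buildToDepth(shape, fn, indices.concat(i));
  }
  return out;
}

const W = 3, H = 2;

// Full depth: the elemental function is called once per element.
const perElement = buildToDepth([W, H], (x, y) => x * 10 + y);

// Depth 1 (a 1-element size vector): the elemental function is called once
// per row and computes that whole row sequentially.
const perRow = buildToDepth([W], (x) => {
  const row = new Array(H);
  for (let y = 0; y < H; y++) {
    row[y] = x * 10 + y;
  }
  return row;
});
```

Both calls produce the same nested values; they differ only in how much of the work falls inside a single call of the elemental function, which is the part a parallel implementation would distribute.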

One solution to enable this in the binary data version of the API would be to add an (optional) depth argument to pbuild to give the programmer a handle on the iteration depth. Or, we could make the type creation implicit in pbuild. This, however, would require making pbuild a method on ArrayType. Here is what this could look like using your example:

function computePixel(x, y) {
  ...
  return PixelType({ r:..., g:..., b:..., a:... });
}
var myImage = ArrayType.pbuild([W, H], (x, y) => computePixel(x, y));

Here, the result would be an object of type ArrayType(ArrayType(ObjectPointer, H), W), as we did not tell the system what the actual result of computePixel looks like. This can be fixed by providing an optional type specification for the elements like so:

var myImage = ArrayType.pbuild([W, H], (x, y) => computePixel(x, y), PixelType)

Now, pbuild will convert whatever computePixel returns into an object of type PixelType (which it happens to already be in this example) and we get the expected overall result of type ArrayType(ArrayType(PixelType, H), W).

While it may seem weird at first glance to have pbuild create the type and an instance of that type in the same step, this is the natural semantics of map: Given some object of some existing type, it creates a new object of a potentially new type. A possible signature could be

myImage.map(2, computeGrayscale, uint8)

where computeGrayscale is the usual function from PixelType to a single grayscale value. The first argument to map provides the depth and the last argument specifies the type of each new element computed. Here, we would end up with an overall type of ArrayType( ArrayType( uint8, H), W). This is very similar to the proposal in [2].
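
The role of the depth and element-type arguments can be illustrated with a small sequential model in plain JavaScript. This is only a sketch under stated assumptions: `mapToDepth`, `toUint8`, the averaging `computeGrayscale`, and the sample `image` are all hypothetical, and plain nested arrays stand in for binary data objects.

```javascript
// Hypothetical sequential model of map(depth, fn, elementType): descend
// `depth` array levels, apply fn there, and convert each result.
function mapToDepth(value, depth, fn, convert = (v) => v) {
  if (depth === 0) {
    return convert(fn(value));
  }
  return Array.from(value, (child) => mapToDepth(child, depth - 1, fn, convert));
}

// Stand-in for the uint8 element type: truncate and wrap to 0..255.
const toUint8 = (v) => v & 0xff;

// Assumed grayscale function: plain average of the color channels.
const computeGrayscale = (p) => (p.r + p.g + p.b) / 3;

// Sample 2x2 image of pixel records (illustrative data only).
const image = [
  [{ r: 30, g: 60, b: 90, a: 255 }, { r: 255, g: 255, b: 255, a: 255 }],
  [{ r: 0, g: 0, b: 0, a: 255 }, { r: 10, g: 20, b: 31, a: 255 }],
];

// Depth 2 reaches the individual pixels of the 2-D image.
const gray = mapToDepth(image, 2, computeGrayscale, toUint8);
```

With a depth of 1 the function would instead receive a whole row, so the depth argument decides what counts as an element and the conversion argument decides what type the results are stored as.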

While for pbuild, just using a depth and deriving the result type from the ArrayType instance that pbuild is called on is an option, requiring map to always produce results of the type of the object it was called on is too restrictive. So does it make sense to unify the API of pbuild and map to use type specifications or should pbuild be different?

Unlike in [2], when adding the API to binary data, it no longer makes sense to specify an element type for filter, as this can easily be derived, in particular as filter does not alter elements. For scatter and scan, however, one might want to convert to a more flexible representation. So these two might use a signature similar to that of map.

In the interest of keeping the discussion focused, I'll address some other points I have in a later email.

Stephan

[1] http://smallcultfollowing.com/babysteps/blog/2013/05/29/integrating-binary-data-and-pjs/
[2] http://wiki.ecmascript.org/doku.php?id=strawman:data_parallelism#support_for_binary_data

Niko Matsakis

Jun 5, 2013, 5:45:15 AM

to Herhut, Stephan A, dev-tech-js-en...@lists.mozilla.org

> While for pbuild, just using a depth and deriving the result type
> from the ArrayType instance that pbuild is called on is an option,
> requiring map to always produce results of the type of the object it
> was called on is too restrictive. So does it make sense to unify the
> API of pbuild and map to use type specifications or should pbuild be
> different?

This is an interesting point. I had assumed (but not stated in that
post) that `pbuild` would take a depth argument (defaulting to the
number of dimensions present on the array type). However you are
correct that there is an analogy to map that could be made.

I've been wanting to write up a strawman of the binary-data-based API
so that we can all be talking about the same concrete document. I will
try to come up with something today and post it.

Niko

Hudson, Rick

Jun 5, 2013, 9:52:17 AM

to Niko Matsakis, Herhut, Stephan A, dev-tech-js-en...@lists.mozilla.org

"Every block is associated with a fixed block type, which describes the permanent shape, size, and interpretation of the block, somewhat like a runtime type tag. All references to a given block in the program store are associated with the same block type. Consequently, implementations can allocate blocks as untagged memory buffers (e.g., raw C data structures) without violating memory safety."
- http://wiki.ecmascript.org/doku.php?id=harmony:binary_data (as of 6/5/2013)

One question we did have related to binary data was whether partition and flatten are doable (without copying the data) given binary data's statement above. We also think that typed arrays have similar issues since they allow type punning of the same backing store (ArrayBuffer). The concern above is that punning on types might be unsafe but in the case of flatten / partition and typed arrays this isn't an issue w.r.t. memory safety.

So is binary data going to live beside typed arrays or is there going to be sharing of the prototype chain?

- Rick


Niko Matsakis

Jun 5, 2013, 9:59:23 AM

to Hudson, Rick, Herhut, Stephan A, dev-tech-js-en...@lists.mozilla.org, Niko Matsakis

> One question we did have related to binary data was whether
> partition and flatten are doable (without copying the data) given
> binary data's statement above. We also think that typed arrays have
> similar issues since they allow type punning of the same backing
> store (ArrayBuffer). The concern above is that punning on types
> might be unsafe but in the case of flatten / partition and typed
> arrays this isn't an issue w.r.t. memory safety.

Although this is not spelled out in the wiki, my understanding is that
you can "reinterpret cast" by re-using the same backing store. For example:

var Array2d = new ArrayType(new ArrayType(uint8, W), H);
var Array1d = new ArrayType(uint8, W * H);
var data2d = new Array2d();
var data1d = new Array1d(data2d.buffer, 0);

We could retain partition and flatten as functions, but they are
trivially implementable in terms of reinterpret casting.
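
As a rough illustration of the same idea with an API that exists today, the sketch below uses typed arrays rather than binary data: several views share one ArrayBuffer, so "partition" and "flatten" only create new views and never copy elements. The names `cols`, `rows`, `flat`, `partitioned`, and `flattened` are chosen for illustration, and a plain JavaScript array of row views stands in for a real 2-D array type.

```javascript
const cols = 4, rows = 3;

// 1-D view that allocates the backing ArrayBuffer.
const flat = new Uint8Array(cols * rows);

// "partition": one Uint8Array view per row, all over flat.buffer.
const partitioned = [];
for (let r = 0; r < rows; r++) {
  // Arguments: buffer, byte offset, length in elements.
  partitioned.push(new Uint8Array(flat.buffer, r * cols, cols));
}

// A write through a row view is visible through the 1-D view.
partitioned[1][2] = 42;

// "flatten": a fresh 1-D view over the buffer the row views share.
const flattened = new Uint8Array(partitioned[0].buffer, 0, cols * rows);

// A write through the flattened view is visible through the row views.
flattened[11] = 7;
```

Because every view aliases the same bytes, element (1, 2) of the partitioned form and element 1 * cols + 2 of the flat form are the same storage location.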

> So is binary data going to live beside typed arrays or is there going to be sharing of the prototype chain?

I am not 100% sure on this point.

Niko

Hudson, Rick

Jun 5, 2013, 10:51:27 AM

to Niko Matsakis, Herhut, Stephan A, dev-tech-js-en...@lists.mozilla.org, Niko Matsakis

The other issue was whether to use p (or parallel) as a prefix or a postfix.
Parallel: this seems a bit verbose and in the long term is likely to be annoying.
p as a postfix has a Lisp tradition of indicating a predicate.
p as a prefix remains the color of my boatshed.

- Rick

Felix S. Klock II

Jun 5, 2013, 10:58:21 AM

to Hudson, Rick, Herhut, Stephan A, dev-tech-js-en...@lists.mozilla.org, Niko Matsakis, Niko Matsakis

I have sympathy for the "Lisp tradition" argument, but I think the
tooling argument trumps it.

(As I understand the situation, code-completion tools match based on
prefix-matching, so that's why adding 'p' as a suffix rather than a
prefix matters. Then again, I don't use code-completion tools too much,
so maybe I am mistaken.)

There was some recent email about this between Niko and a person I did
not recognize; unfortunately I cannot find the e-mail. (I really need
to drop Thunderbird as my e-mail client. Or maybe someone can tell me
if its search engine can be replaced with something decent.)

-Felix

--
irc: pnkfelix on irc.mozilla.org
email: {fklock, pnkfelix}@mozilla.org

Niko Matsakis

Jun 5, 2013, 11:08:01 AM

to Felix S. Klock II, Herhut, Stephan A, dev-tech-js-en...@lists.mozilla.org, Hudson, Rick, Niko Matsakis

I think at this point I prefer "buildPar", "mapPar", and so forth,
mostly because of the tooling argument.

Niko