Dear all,
Just to summarize our discussion during out last meeting:
Nomenclature: I will use the concept of cell and frame shape in the following discussion. Here is an informal definition: The frame shape describes the structure that is created by the array operation. For example, in a comprehension constructor new ParallelArray([2,4], f) the frame shape would be [2,4]. This is essentially the scaffolding that the array operation creates to hold the values produced by the elemental function. I will use cell shape to refer to the shape of those values, i.e., the values produced by the elemental function. Thus, the cell shape is a property of the elemental function.
The key idea in the binary data proposal for ParallelArray then is to allow the programmer to provide type descriptions for the cell shape as an optional argument _to every array operation_. There is no implicit propagation nor inference. Also, the annotations are designed such that they are fully optional, i.e., the programmer can simply erase them and the code will still work, but might produce slightly different results due to lack of data coercions. Type annotations may only contain fixed size types, i.e., we do not allow ArrayType with dynamic length.
An example would be:
new ParallelArray([2,3], function (j,k) { return [j,k]; }, ArrayType(uint8, 2)}
Here, the resulting ParallelArray object would have a backing store of type ArrayType(ArrayType(ArrayType(uint8, 2), 3), 2). On write, we use the existing coercion rules to convert the result of the elemental function into the annotated cell type. If that fails (not even sure it can, depends on the final binary data spec), we throw the same exception that sequential code would throw in such situations.
Interaction with dynamic shape: In the above example, the result would have a dynamic shape of [2,3], as the leaf elements are not ParallelArray structures. If the programmer instead writes
new ParallelArray([2,3], function (j,k) { return new ParallelArray([j,k], ArrayType(uint8, 2)); }, ArrayType(uint8, 2)}
the dynamic shape will be [2,3,2]. Note the explicit type annotation for the inner ParallelArray. The backing store in this case would be the same as before.
Shu's criticism was that in the above case, the type annotations can no longer be understood as coercions on write, as the ParallelArray object returned by the elemental function is not actually coerced into a ArrayType(uint8, 2) but remains a ParallelArray object with a matching backing store. I have not come up with a solution to this. In the end, if we want to keep type annotations optional and therefore orthogonal to the rest of ParallelArray's workings (which I am strongly in favor of), we have to keep the same dynamic shape semantics as in the untyped case.
To allow programmers to make direct use of the binary data store, we will add a new method to ParallelArray objects
asBinary()
that returns a frozen view of the binary data backing store. If no type annotations were supplied, this view will be a nesting of ArrayType up to the length of the dynamic shape with Any as the element type. If called on data that was not created using type annotations, it is perfectly valid to copy the data to create the requested view.
I hope this covers the results of our discussion.
Stephan
> I see basically two options here:
>
>
> a) Use the full binary data proposal to describe the entire storage layout
>
> b) Support only the numerical data types to describe the storage layout of
> leafs but not the results overall structure
>
> If we go for a, which I think has been discussed here but is not what I aimed
> at initially, I think it is only feasible to have programmers specify the storage
> layout for every operation. So, for instance,
>
> const Color = StructType({r:uint8, g:uint8, b:uint8, a:uint8}); var a = new
> ParallelArray([10], function () { return new Color({r:1, g:2, b:3, a:255});},
> ArrayType(Color, 10)); var b = a.map(toGrayScale, ArrayType(uint8, 10)); //
> where toGrayScale converts rgba to a single luminance value
>
> This tells the engine exactly how to represent data and also gives enough
> information to preallocate. In this setting, it also makes sense to reject values
> that cannot be converted into the specified storage layout. The drawback is
> that it is more difficult to use as the programmer has to do the type inference
> in his head. Then again, if he wants a certain storage layout, he probably has
> a reason for that. In the end, if binary data takes off and JS developers learn
> to write code that way, why shouldn’t they learn to write River Trail that way.
>
> Option b is less restrictive and allows for dynamic inference. It won’t support
> the use of structures or similar but the user will at least be able to define a
> storage class for the values. The above example would then be
>
> var a = new ParallelArray([10], function () { return [1,2,3,255];}, uint8); //
> inferred as ArrayType(ArrayType(uint8, 4), 10) var b = a.map(toGrayScale); //
> uint8 inherited, structure inferred as ArrayType(uint8, 10)
>
> We only tell the engine what type the leaves should have. The existing rules
> for dynamic shape etc. apply. Here, we should not fail but convert e.g.
> “hello” to 0. However, we could fail if the leaf of the dynamic shape is not a
> primitive. So
>
> var a = new ParallelArray([4], function ( return “hello”;}, uint8) //-> <0,0,0,0>
> var b = new ParallelArray([4], function ( return [1,2];}, uint8) //->
> <<1,2>,…,<1,2>>
>
> but
>
> var x = new ParallelArray([4], function(i) { return new Array(i); } //-> fail, as
> the dynamic shape is [4] and leaves are not primitives.
>
> I still think option b is easier to use (less specification overhead) and powerful
> enough. Option a is more expressive but it feels like writing typed code.
>
> Sent: Wednesday, October 10, 2012 7:38 AM
> To: Shu-yu Guo
> Cc: Herhut, Stephan A;
dhe...@mozilla.com; Shpeisman, Tatiana; dev-
>
tech-js-engi...@lists.mozilla.org; Hudson, Rick; Sreeram, Jaswanth
> Subject: Re: Adding support for different storage formats to ParallelArray
> (again)
>
> To summarize some discussion between Shu and I on IRC:
>
> In some sense, we *can* have a "logical backing store". But the *physical
> backing store* that comes out of map() and friends would be "any", since we
> must be prepared for any return value. But this seems to defeat the whole
> point of this exercise. As I understood it, the goal was to allow us to allocate
> something that is appropriately sized and to give us permission to force the
> return value into this spot, so that the user had some guarantees. So if the
> user says uint8 but returns 257 we get to store 1 (257 % 256).
>
> We would not have this permission with the "logical backing store" model,
> instead we'd have to infer a backing store type that accommodates all return
> values without losing any information. If we're back at the place where we
> have to use inference to decide when we can allocate a tighter backing store
> without losing information, which it sounds like we are, then we haven't
> really gotten very far. In that case I might prefer to just scrap the idea
> entirely and say "if you would like a uint8 backing store, coerce your return
> values to uint8 and cross your fingers that your engine implementors took
> the time to optimize that case".
>
> Moreover, given that the binary data spec errors when an inappropriate
> value is used, it seems mildly inconsistent that the *constructor* form yields
> an error if you specify uint8 but return strings, but *map* silently changes
> backing store type to any.
>
>
> Niko
>
>
>
> October 10, 2012 6:48 AM
> [cid:image0...@01CDA6D2.AFC579F0]
> Shu-yu Guo<mailto:
s...@rfrn.org>
> October 10, 2012 3:16 AM
> graphics programming. :)
>
>
>
>
> --
> shu
> [cid:image0...@01CDA6D2.AFC579F0]
> Niko Matsakis<mailto:
ni...@alum.mit.edu>
> October 9, 2012 8:10 PM
> Yes, I was wondering about this question as well on the train ride home. In
> general it seems that multidimensional map, dynamic shape, and user-
> specified backing stores interact in a rather strange way that is not especially
> intuitive to me.
>
> For example, if you create an uint8[NxNx4] array and then use map, you will
> be mapping over an uint8[Nx4] PA elements. We said that the array which
> results from higher-order operations carries the same backing store type as
> its origin... but that doesn't quite seem to make sense in this scenario. What
> happens if you then return uint16[N] array? I know this is the same question
> you were asking (or I think it was) but I don't quite understand what answer
> you proposed. :)
>
>
> Niko
> [cid:image0...@01CDA6D2.AFC579F0]
> October 9, 2012 5:24 PM
> Then the question is what that means for multi-dimensional arrays. Should
> we expose that they use a flattened representation or can one only have 1D
> ParallelArray objects when using other backing stores than any? If we expose
> that they have a flattened representation, than we will also need to define
> how deep we flatten, which in essence makes the current dynamic shape a
> static property of the layout.
>
> Consider processing canvas data on a per pixel level. The map operation will
> want to use uint8clamped but the elemental function will return 4 element
> arrays. Are these converted to 0 when writing to the backing store or do we
> add 4 elements to the backing store and make the PA object an array of 4-
> tuples?
>
> The beauty of dynamic shape is that we can always revert to a nested
> backing store yet, after the fact, create an n-dimensional array. If the
> constructor only defines storage semantics, we can still do that. If it gives the
> actual backing store, we can’t as an array of type T cannot simply be nested.
>
> Another wild idea would be to have the programmer specify the exact type
> of the outcome (the one he wants). For the above example, that would be
> ArrayType( ArrayType(4, uint8clamped), <outer dim size>). This would then
> also specify a template for conversion of the results of the elemental
> function into what the ParallelArray object actually stores. Somewhat
> awkward to use but maybe good enough for a performance programmer?
> This should always be optional, though!
>
> I currently still prefer dynamic shape with user defined backing-store
> semantics but might convince myself otherwise…
>
> <
tatiana....@intel.com<mailto:
tatiana....@intel.com>>,"de
>
v-tech-js-eng...@lists.mozilla.org<mailto:
dev-tech-js-engine-
>
river...@lists.mozilla.org>" <dev-tech-js-engine-
>
river...@lists.mozilla.org<mailto:
dev-tech-js-engine-
>
river...@lists.mozilla.org>>,"Hudson, Rick"
> <
rick....@intel.com<mailto:
rick....@intel.com>>,"Sreeram,
> Jaswanth"
> <
jaswanth...@intel.com<mailto:
jaswanth...@intel.com>>,"Shu
> Typed arrays are generalized by and superseded by the ES6 binary data types
> library. This library provides a full generalization of first-class type descriptors,
> via a little combinator library. So instead of taking typed array constructors, a
> better API would take type descriptor values representing the element type:
>
> var paContainingUint8s = new ParallelArray(/* whatever */, uint8);
> var paContainingFloat32s = new ParallelArray(/* whatever */, float32);
>
> Dave
>
> On Oct 9, 2012, at 1:55 PM, "Herhut, Stephan A"
> <
stephan....@intel.com<mailto:
stephan....@intel.com>>
> dev-tech-js-engine-rivertrail mailing list dev-tech-js-engine-
>
river...@lists.mozilla.org<mailto:
dev-tech-js-engine-
>
river...@lists.mozilla.org>
> October 9, 2012 4:56 PM
> I am wondering if we can just say that the backing store is an array of some
> type T where T is provided in the constructor. It could default to "any" to get
> the current behavior, essentially.
>
>
> Niko
>
> (Sent from a cramped, unreliable keyboard)
>
>
> -------- Original message --------
> Subject: Re: Adding support for different storage formats to ParallelArray
> (again)
> From: David Herman
> <
dhe...@mozilla.com><mailto:
dhe...@mozilla.com>
> To: "Herhut, Stephan A"
> <
stephan....@intel.com><mailto:
stephan....@intel.com>
> CC: "Shpeisman, Tatiana"
> <
tatiana....@intel.com><mailto:
tatiana....@intel.com>,"de
>
v-tech-js-eng...@lists.mozilla.org"<mailto:
dev-tech-js-engine-
>
river...@lists.mozilla.org> <dev-tech-js-engine-
>
river...@lists.mozilla.org><mailto:
dev-tech-js-engine-
>
river...@lists.mozilla.org>,"Hudson, Rick"
> <
rick....@intel.com><mailto:
rick....@intel.com>,"Sreeram,
> Jaswanth"
> <
jaswanth...@intel.com><mailto:
jaswanth...@intel.com>,"Shu
> Typed arrays are generalized by and superseded by the ES6 binary data types
> library. This library provides a full generalization of first-class type descriptors,
> via a little combinator library. So instead of taking typed array constructors, a
> better API would take type descriptor values representing the element type:
>
> var paContainingUint8s = new ParallelArray(/* whatever */, uint8);
> var paContainingFloat32s = new ParallelArray(/* whatever */, float32);
>
> Dave
>
> On Oct 9, 2012, at 1:55 PM, "Herhut, Stephan A"
> <
stephan....@intel.com><mailto:
stephan....@intel.com>
> dev-tech-js-engine-rivertrail mailing list dev-tech-js-engine-
>
river...@lists.mozilla.org<mailto:
dev-tech-js-engine-
>
river...@lists.mozilla.org>
>
https://lists.mozilla.org/listinfo/dev-tech-js-engine-rivertrail