Even in the case of WebGL textures/etc, to do it right you need to detect
the byte order and select the proper resource from the server that already
has that byte order when you fetch it via XHR or WS. It would be easier to
know that I am fetching a little-endian texture from the server, and then
when I pass it to something that needs big-endian data it gets converted at
that point (or perhaps when I create a Uint16Array from a Uint16LEArray).
Leaving it unspecified and requiring developers to figure it out so they
can fetch the proper resources means they are more likely to just assume
the native byte order is the 99% case and not worry about the rest.
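For illustration, a minimal sketch of the detect-and-select step described above. The function names and the ".le.bin" / ".be.bin" naming scheme are hypothetical; the sketch assumes the server stores one copy of each asset per byte order.

```javascript
// Detect the host byte order by writing a known 16-bit value through a
// typed array view and looking at which byte comes first in memory.
function hostIsLittleEndian() {
  const probe = new Uint16Array([0x0102]);
  const bytes = new Uint8Array(probe.buffer);
  return bytes[0] === 0x02; // low-order byte stored first => little-endian
}

// Hypothetical naming scheme: one pre-swapped copy of each asset per
// byte order is assumed to exist on the server.
function textureUrl(baseName) {
  return baseName + (hostIsLittleEndian() ? ".le.bin" : ".be.bin");
}
```

A page would then request textureUrl("brick") via XHR with responseType "arraybuffer" and could hand the result to WebGL without any swapping.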
If you were designing a mechanism for JS to manipulate binary data from
scratch, you would not design it this way. However, we are likely stuck
with what we have, so we can only talk about enhancements -- perhaps adding
byte-order-specific views would be sufficient, or maybe extending DataView
to support array views (with strides, perhaps) would be better.
--
John A. Tamplin
Software Engineer (GWT), Google
The top priority should be to implement DataView universally. DataView
is specifically designed for correct, portable manipulation of binary
data coming from or going to files or the network. Fortunately,
DataView is supported in nearly every actively developed UA; once
https://bugzilla.mozilla.org/show_bug.cgi?id=575688 is fixed, it
should be present in every major UA -- even the forthcoming IE 10! See
http://blogs.msdn.com/b/ie/archive/2011/12/01/working-with-binary-data-using-typed-arrays.aspx
.
Once DataView is available everywhere then the top priority should be
to write educational materials regarding binary I/O. It should be
possible to educate the web development community about correct
practices with only a few high profile articles.
Changing the endianness of Uint16Array and the other multi-byte typed
arrays is not a feasible solution. Existing WebGL programs already
work correctly on big-endian architectures specifically because the
typed array views use the host's endianness. If the typed array views
were changed to be explicitly little-endian, it would be a requirement
to introduce new big-endian views, and all applications using typed
arrays would have to be rewritten, not just those which use WebGL.
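A small sketch of the distinction drawn above, using only standard typed array and DataView calls: DataView reads take an explicit byte order, while a Uint16Array view over the same bytes uses whatever the host uses.

```javascript
// Two bytes as they might arrive from a file or the network.
const buffer = new ArrayBuffer(2);
const bytes = new Uint8Array(buffer);
bytes[0] = 0x34;
bytes[1] = 0x12;

// DataView: the byte order is stated by the caller, so these results
// are the same on every host.
const view = new DataView(buffer);
const asLittle = view.getUint16(0, true);  // 0x1234
const asBig = view.getUint16(0, false);    // 0x3412

// Typed array view: the byte order is the host's, so this value is
// 0x1234 on a little-endian host and 0x3412 on a big-endian one.
const hostValue = new Uint16Array(buffer)[0];
```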
Finally, to reiterate one point: the typed array design was informed
by prior experience with the design and performance characteristics of
a similar API, specifically Java's New I/O (NIO) Buffer classes. NIO
merged the two distinct use cases of file and network I/O, and
interaction with graphics and audio devices, into one API. The result
was increased polymorphism at call sites, which defeated the Java VM's
optimizing compiler and led to 10x slowdowns in many common
situations. It was so difficult to fix these performance pitfalls that
they remained for many years, and I don't know how robust the
solutions are in current Java VMs. To avoid these issues the typed
array spec consciously treats these use cases separately. It is
possible to make incorrect assumptions leading to non-portable code,
but at some level this is possible with nearly any API that extends
beyond a small, closed world. I believe the focus should be on
educating developers about correct use of the APIs, developing
supporting libraries to ease development, and advancing the ECMAScript
language with constructs like struct types
(http://wiki.ecmascript.org/doku.php?id=harmony:binary_data).
-Ken
FWIW, here is a way to do this that will always work and won't rely on "luck". The key idea is that by the time one draws stuff, all the information about how vertex attributes use buffer data must be known.
1. In webgl.bufferData implementation, don't call glBufferData, instead just cache the buffer data.
2. In webgl.vertexAttribPointer, record the attributes structure (their types, how they use buffer data). Do not convert/upload buffers yet.
3. In the first WebGL draw call (like webgl.drawArrays) since the last bufferData/vertexAttribPointer call, do the conversion of buffers and the glBufferData calls. Use some heuristics to drop the buffer data cache, as most WebGL apps will not have a use for it anymore.
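The three steps above could be sketched roughly as follows. This is not code from any UA: DeferredBuffer, flush, and the upload callback (standing in for the real glBufferData call) are hypothetical names, the sketch always swaps (as a big-endian UA given little-endian content would), and it keeps the cache rather than applying any drop heuristic.

```javascript
// Hypothetical UA-side state for one WebGL buffer object.
class DeferredBuffer {
  constructor() {
    this.cached = null; // copy of the data given to bufferData
    this.attribs = [];  // layouts recorded by vertexAttribPointer
    this.dirty = false; // true when the GPU copy is out of date
  }

  // Step 1: cache the data instead of uploading it.
  bufferData(arrayBuffer) {
    this.cached = arrayBuffer.slice(0);
    this.dirty = true;
  }

  // Step 2: record how the attribute uses the buffer; no conversion yet.
  vertexAttribPointer(componentBytes, componentCount, stride, offset) {
    this.attribs.push({ componentBytes, componentCount, stride, offset });
    this.dirty = true;
  }

  // Step 3: called from the draw call; swap each component, then upload.
  flush(upload) {
    if (!this.dirty || this.cached === null) return;
    const out = new Uint8Array(this.cached.slice(0));
    for (const a of this.attribs) {
      const span = a.componentBytes * a.componentCount;
      const stride = a.stride || span; // stride 0 means tightly packed
      for (let base = a.offset; base + span <= out.length; base += stride) {
        for (let c = 0; c < a.componentCount; c++) {
          const start = base + c * a.componentBytes;
          // subarray shares memory with out, so this reverses in place.
          out.subarray(start, start + a.componentBytes).reverse();
        }
      }
    }
    upload(out.buffer);
    this.dirty = false;
  }
}
```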
> In practice, if forced to implement a UA on a big-endian system today, I
> would likely pick option (C).... I wouldn't classify that as a victory
> for standardization, but I'm also not sure what we can do at this point
> to fix the brokenness.
I agree that seems to be the only way to support universal webgl content on big-endian UAs. It's not great due to the memory overhead, but at least it shouldn't incur a significant performance overhead, and it typically only incurs a temporary memory overhead as we should be able to drop the buffer data caches quickly in most cases. Also, buffers are typically 10x smaller than textures, so the memory overhead would typically be ~ 10% in corner cases where we couldn't drop the caches.
In conclusion: WebGL is not the worst here; there is a pretty reasonable avenue for big-endian UAs to implement it in a way that allows running the same unmodified content as little-endian UAs.
Benoit
It would never be possible to drop the CPU side buffer data cache. A
subsequent draw call may set up the vertex attribute pointers
differently for the same buffer object, which would necessitate going
back through the buffer's data and generating new, appropriately
byte-swapped data for the GPU.
>> In practice, if forced to implement a UA on a big-endian system today, I
>> would likely pick option (C).... I wouldn't classify that as a victory
>> for standardization, but I'm also not sure what we can do at this point
>> to fix the brokenness.
>
> I agree that seems to be the only way to support universal webgl content on big-endian UAs. It's not great due to the memory overhead, but at least it shouldn't incur a significant performance overhead, and it typically only incurs a temporary memory overhead as we should be able to drop the buffer data caches quickly in most cases. Also, buffers are typically 10x smaller than textures, so the memory overhead would typically be ~ 10% in corner cases where we couldn't drop the caches.
Our emails certainly crossed, but please refer to my other email.
WebGL applications that assemble vertex data for the GPU using typed
arrays will already work correctly on big-endian architectures. This
was a key consideration when these APIs were being designed. The
problems occur when binary data is loaded via XHR and uploaded to
WebGL directly. DataView is supposed to be used in such cases to load
the binary data, because the endianness of the file format must
necessarily be known.
The possibility of forcing little-endian semantics was considered when
typed arrays were originally being designed. I don't have absolute
performance numbers to quote you, but based on previous experience
with Java's NIO Buffer classes, I am positive that the performance
impact for WebGL applications on big-endian architectures would be
very large. It would prevent applications which manipulate vertices in
JavaScript from running acceptably on big-endian machines.
-Ken
As Ken pointed out, if you are populating your arrays from javascript or a
JSON file or something similar this is a non-issue. The problem only occurs
when you are attempting to load a binary blob directly into a typed array.
Unless that blob is entirely homogeneous (i.e. all Float32s or all Int16s,
etc.) it's impossible to trivially swap endianness without being provided a
detailed breakdown of the data patterns contained within the blob.
Consider this example (using WebGL, but the same could apply elsewhere): I
download a binary file containing tightly packed interleaved vertices that
I want to pass directly to a WebGL buffer. The data contains little endian
vertex positions, texture coordinates, texture ID's and a 32 bit color per
vertex, so the data looks something like this:
    struct {
        Float32[3] pos,
        Float32[4] uv,
        Uint16 textureId,
        Uint32 color
    };
I will receive this data from XHR as an opaque ArrayBuffer, and if the
platform is little endian I can pass it directly to the GPU. But on big
endian systems, a translation needs to be done somewhere:
    xhr.responseType = "arraybuffer";
    xhr.onload = function() {
        var vertBuffer = gl.createBuffer();
        gl.bindBuffer(gl.ARRAY_BUFFER, vertBuffer);
        // If bigEndian then... magic!
        gl.bufferData(gl.ARRAY_BUFFER, this.response, gl.STATIC_DRAW);
    };
So the question is: What exactly are we expecting that "magic" to be? We
can't just swizzle every 4 bytes. Either the graphics driver must do the
endian swap as it processes the buffer, which is possible but entirely out
of the browser's control, or we would have to provide data packing
information to the browser so that it could do the appropriate swap for us.
And if I'm going to have to build up a data definition and pass that
through to the browser anyway... well I've just destroyed the whole "don't
make me care about endianness" ideal, haven't I? I might as well just do
the swap in my own code via a DataView, or better yet cache a big endian
version of the same file on the server side if I'm worried about
performance.
So yeah, it sucks that we have to plan for devices that are practically
nonexistent and difficult to test for, but I don't really see a nicer
(practical) solution.
That said, one thing that DataView doesn't handle too nicely right now is
arrays. You're basically stuck for-looping over your data, even if it's all
the same type. I would fully support having new DataView methods available
like:
    Int32Array getInt32Array(unsigned long byteOffset, unsigned long elements,
                             optional boolean littleEndian);
Which would be a nice, sensible optimization since I'm pretty sure the
browser backend could do that faster than a JS loop.
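Until something like that exists, the JS loop in question might look like the sketch below. getInt32Array is written here as a hypothetical free function that takes the DataView as its first argument; it is not part of DataView.

```javascript
// Hypothetical helper mirroring the proposed signature: reads `elements`
// 32-bit integers starting at byteOffset, in the stated byte order, and
// returns them in a new Int32Array (host order).
function getInt32Array(view, byteOffset, elements, littleEndian) {
  const result = new Int32Array(elements);
  for (let i = 0; i < elements; i++) {
    result[i] = view.getInt32(byteOffset + i * 4, littleEndian);
  }
  return result;
}
```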
--Brandon
I would suggest that you pass down the schema of the data to the
client application along with the raw binary file, and always iterate
down it with DataView, reading each individual value and storing it
into one of multiple typed array views of a new ArrayBuffer. Then
upload the new ArrayBuffer to WebGL. This way, if you get the code
working on one platform, you are guaranteed that it will work on all
platforms.
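A rough sketch of that schema-driven approach follows. The schema format, FIELD_TYPES, hostLittleEndian, and convertToHostOrder are all hypothetical names. One deliberate difference from the description above: the sketch writes through a second DataView in host order instead of through typed array views, because typed array views require aligned offsets and a packed record may not provide them.

```javascript
// Accessor names and byte sizes for the field types used in the schema.
const FIELD_TYPES = {
  float32: { size: 4, get: "getFloat32", set: "setFloat32" },
  uint16: { size: 2, get: "getUint16", set: "setUint16" },
  uint32: { size: 4, get: "getUint32", set: "setUint32" },
};

// True when typed array views on this host store the low-order byte first.
function hostLittleEndian() {
  return new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;
}

// Reads every value of every record in the source byte order and writes
// it back in host order into a new ArrayBuffer, ready for upload.
function convertToHostOrder(source, schema, sourceLittleEndian) {
  const recordSize = schema.reduce(
    (sum, field) => sum + FIELD_TYPES[field.type].size * field.count, 0);
  if (recordSize === 0) throw new Error("schema must not be empty");
  const records = Math.floor(source.byteLength / recordSize);
  const input = new DataView(source);
  const output = new DataView(new ArrayBuffer(records * recordSize));
  const hostLE = hostLittleEndian();
  let offset = 0;
  for (let r = 0; r < records; r++) {
    for (const field of schema) {
      const t = FIELD_TYPES[field.type];
      for (let i = 0; i < field.count; i++) {
        const value = input[t.get](offset, sourceLittleEndian);
        output[t.set](offset, value, hostLE);
        offset += t.size;
      }
    }
  }
  return output.buffer;
}
```

For a schema such as [{ type: "float32", count: 3 }, { type: "uint32", count: 1 }], the returned buffer has the same layout as the source and differs only in byte order when the host and source disagree.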
As one simple concrete example, please look at
http://code.google.com/p/webglsamples/source/browse/hdr/hdr.js#235 .
This demo downloads high dynamic range textures as binary files
containing floating-point values. The data is copied from the XHR's
ArrayBuffer using a DataView, knowing that the source data is in
little endian format, and stored into a Float32Array for upload to
WebGL. This code works identically on big-endian and little-endian
architectures.
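The pattern described, reduced to its core, might look like this. It is a sketch of the technique and not the code from hdr.js; the function name is hypothetical.

```javascript
// Copies little-endian 32-bit floats out of an ArrayBuffer (for example
// an XHR response) into a Float32Array, which holds them in host order.
function float32ArrayFromLittleEndian(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const count = Math.floor(arrayBuffer.byteLength / 4);
  const floats = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    floats[i] = view.getFloat32(i * 4, true); // true = little-endian source
  }
  return floats;
}
```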
> So yeah, it sucks that we have to plan for devices that are practically
> nonexistent and difficult to test for, but I don't really see a nicer
> (practical) solution.
>
> That said, one thing that DataView doesn't handle too nicely right now is
> arrays. You're basically stuck for-looping over your data, even if it's all
> the same type. I would fully support having new DataView methods available
> like:
>
> Int32Array getInt32Array(unsigned long byteOffset, unsigned long elements,
> optional boolean littleEndian);
>
> Which would be a nice, sensible optimization since I'm pretty sure the
> browser backend could do that faster than a JS loop.
Definitely agree that adding array readers and writers to DataView is
worth considering; it's even mentioned in the typed array spec at
http://www.khronos.org/registry/typedarray/specs/latest/#11 . I would
however like to work on optimizing DataView's single-element accessors
first so that we could do a good measurement of the potential speedup.
Right now DataView is completely unoptimized in WebKit's
implementation, but the typed array views have had the benefit of
months of optimization work in both the JavaScriptCore and V8 engines.
-Ken
> The top priority should be to implement DataView universally.
>
DataView isn't relevant to the discussion. People are using
ArrayBufferViews in ways that assume little-endian access, and that isn't
going to go away. If any production browser is ever released on a
production big-endian system, it'll expose ArrayBuffers as little-endian,
even if that causes performance problems with WebGL, because it will
minimize the amount of breakage.
It's also unreasonable to expect web developers to go to extra effort for
big-endian systems (eg. by serving different, big-endian resources from
XHR). You're expecting every developer using ArrayBuffer to jump through hoops,
adding significant complexity and testing requirements, in order to support
an optimization on systems with essentially zero market share. I'm
certainly not going to do that, and I doubt many other web developers will,
either; big endian systems aren't worth our time.
And for all realistic purposes, the world has standardized on
little-endian. If people choose to develop a big-endian system today, the
challenges of doing so are their problem. Just as new APIs (such as
WebSockets and Server-Sent Events) no longer spend time supporting legacy
encodings, simplifying everyone's life by only supporting UTF-8, new APIs
should stop pretending big endian is important and optimize for
little-endian.
All that said, to partially sidestep the issue, I'd propose adding
separate, explicitly big-endian and little-endian view types for each view
type. That is,
    Int16LEArray, Uint16LEArray
    Int32LEArray, Uint32LEArray
    Float32LEArray, Float64LEArray
    Int16BEArray, Uint16BEArray
    Int32BEArray, Uint32BEArray
    Float32BEArray, Float64BEArray
I think adding these is a no-brainer, since it's trivial and obviously
useful, and lets us at least pretend the issue doesn't exist...
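To give a feel for the semantics such views would have, here is a sketch built on DataView. ExplicitUint16View is a hypothetical class; it uses get/set methods because plain JS cannot give a user-defined class indexed access like a[i] without a Proxy, and it says nothing about how a native implementation would perform.

```javascript
// Hypothetical explicit-endian 16-bit view: every access goes through
// DataView with a fixed byte order chosen at construction time.
class ExplicitUint16View {
  constructor(buffer, littleEndian) {
    this.view = new DataView(buffer);
    this.littleEndian = littleEndian;
    this.length = Math.floor(buffer.byteLength / 2);
  }
  get(index) {
    return this.view.getUint16(index * 2, this.littleEndian);
  }
  set(index, value) {
    this.view.setUint16(index * 2, value, this.littleEndian);
  }
}
```

Two such views over the same ArrayBuffer, one little-endian and one big-endian, see byte-swapped versions of each other's values on any host.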
On Wed, Mar 28, 2012 at 4:27 PM, Brandon Jones <toj...@gmail.com> wrote:
> So the question is: What exactly are we expecting that "magic" to be? We
> can't just swizzle every 4 bytes. Either the graphics driver must do the
> endian swap as it processes the buffer, which is possible but entirely out
> of the browsers control,
Please see Benoit's mail, for at least one possible approach. It's hard to
do, and harder to do efficiently, but if you're creating a platform with
unusual endianness, it's you that needs to jump through the hoops to make it
work, not every web developer in the world.
--
Glenn Maynard
> Once DataView is available everywhere then the top priority should be
> to write educational materials regarding binary I/O. It should be
> possible to educate the web development community about correct
> practices with only a few high profile articles.
>
We can give that a try. I would be astounded if it works. Nothing like that
has ever worked before for features I was involved in.
I think we should set some success criteria for that strategy. How about
this: six months after all shipping browsers that support typed arrays also
support DataViews, let's evaluate a sample of the very latest applications
using typed arrays to see if they're using DataViews correctly and would
work on big-endian systems. If more than 10% would be broken for
big-endian, we'll declare that strategy to have failed and spec
little-endian. Fair?
> Changing the endianness of Uint16Array and the other multi-byte typed
> arrays is not a feasible solution. Existing WebGL programs already
> work correctly on big-endian architectures specifically because the
> typed array views use the host's endianness. If the typed array views
> were changed to be explicitly little-endian, it would be a requirement
> to introduce new big-endian views, and all applications using typed
> arrays would have to be rewritten, not just those which use WebGL.
>
It seems to me that to support little-endian semantics on big-endian
machines, what's needed is to byteswap on JS typed array accesses, and
either to have a modified GL driver that byteswaps before sending data to
the GPU, or preferably to have a GPU that can be switched into
little-endian mode (on a per-context basis, say). None of that seems
technically challenging to me. I imagine that every GPU part will have a
little-endian mode anyway, in order to work with the vast installed base of
little-endian CPUs.
> The result
> was increased polymorphism at call sites, which defeated the Java VM's
> optimizing compiler and led to 10x slowdowns in many common
> situations.
I don't see any need for increased polymorphism at call sites in our case.
> It is possible to make incorrect assumptions leading to non-portable code,
> but at some level this is possible with nearly any API that extends
> beyond a small, closed world.
Yes, but we work extremely hard in the Web platform to minimize the
possibility.
Rob
--
“You have heard that it was said, ‘Love your neighbor and hate your enemy.’
But I tell you, love your enemies and pray for those who persecute you,
that you may be children of your Father in heaven. ... If you love those
who love you, what reward will you get? Are not even the tax collectors
doing that? And if you greet only your own people, what are you doing more
than others?” [Matthew 5:43-47]
> And for all realistic purposes, the world has standardized on
> little-endian.
Other than network protocols, which are all big-endian.
> On Wed, Mar 28, 2012 at 7:30 PM, Glenn Maynard <gl...@zewt.org> wrote:
>
>> And for all realistic purposes, the world has standardized on
>> little-endian.
>
>
> Other than network protocols, which are all big-endian.
>
I'm talking about the endianness of systems, not of file or network formats.
(Nobody should be designing new binary networking protocols in big-endian,
either, as the only difference is extra byte swaps, but anyway...)
> The result was increased polymorphism at call sites, which defeated the
> Java VM's optimizing compiler and led to 10x slowdowns in many common
> situations.
FWIW, I think this has no bearing at all on JS.
--
Glenn Maynard
> > The result was increased polymorphism at call sites, which defeated the
> > Java VM's optimizing compiler and led to 10x slowdowns in many common
> > situations.
>
> FWIW, I think this has no bearing at all on JS.
>
Check out this blog post about optimization in V8, in particular the
difference between monomorphic and polymorphic calls and the effect on
inlining:
http://floitsch.blogspot.com/2012/03/optimizing-for-v8-hydrogen.html
> On Wed, Mar 28, 2012 at 8:04 PM, Glenn Maynard <gl...@zewt.org> wrote:
>
>> > The result was increased polymorphism at call sites, which defeated the
>> > Java VM's optimizing compiler and led to 10x slowdowns in many common
>> > situations.
>>
>> FWIW, I think this has no bearing at all on JS.
>>
>
> Check out this blog post about optimization in V8, in particular the
> difference between monomorphic and polymorphic calls and the effect on
> inlining:
Maybe I misunderstood what he was referring to; I was thinking about the
comparative cost of making, say, a C++ function virtual (added dispatch
cost). It wouldn't make calls to functions any more polymorphic--you
already have many view types that you can pass around, and that wouldn't go
up.
Anyway, I'd expect the only difference in the ArrayBuffer implementation
would be to make it look like this:
    int16_t Int16Array::get_item(int index)
    {
        int16_t val = this->buf[index];
    #ifdef BIG_ENDIAN
        val = byte_swap(val);
    #endif
        return val;
    }
or equivalent JITted assembly. The cost of this should be small--certainly
not 10x.
(Dealing with it in the GPU may be harder, but as others have pointed out,
it should be possible to put any GPU in little-endian mode, even if it
requires some cooperation from the vendor to accomplish it.)
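For reference, the swap itself is only a few shifts and masks. A sketch in JS follows; byteSwap16 and byteSwap32 are hypothetical helpers that operate on unsigned values, unlike the signed int16_t in the snippet above.

```javascript
// Swap the two bytes of an unsigned 16-bit value.
function byteSwap16(value) {
  return ((value & 0xff) << 8) | ((value >> 8) & 0xff);
}

// Reverse the four bytes of an unsigned 32-bit value. The final >>> 0
// converts the signed 32-bit result of the bitwise ops back to unsigned.
function byteSwap32(value) {
  return (
    ((value & 0xff) << 24) |
    ((value & 0xff00) << 8) |
    ((value >>> 8) & 0xff00) |
    ((value >>> 24) & 0xff)
  ) >>> 0;
}
```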
--
Glenn Maynard
> Maybe I misunderstood what he was referring to; I was thinking about the
> comparative cost of making, say, a C++ function virtual (added dispatch
> cost). It wouldn't make calls to functions any more polymorphic--you
> already have many view types that you can pass around, and that wouldn't go
> up.
>
> Anyway, I'd expect the only difference in the ArrayBuffer implementation
> would be to make it look like this:
>
> int16_t Int16Array::get_item(int index)
> {
> int16_t val = this->buf[index];
> #ifdef BIG_ENDIAN
> val = byte_swap(val);
> #endif
> return val;
> }
>
> or equivalent JITted assembly. The cost of this should be
> small--certainly not 10x.
>
> (Dealing with it in the GPU may be harder, but as others have pointed out,
> it should be possible to put any GPU in little-endian mode, even if it
> requires some cooperation from the vendor to accomplish it.)
I assumed the point Kenneth was making, and the one you objected to, was
that if you had both Uint16BEArray and Uint16LEArray (so that you could
have both native endianness, with Uint16Array being one of them, and a
specific endianness), you would be introducing polymorphic calls.
I.e.,
    a = someFunctionReturningUint16ArrayWhichMightBeLEorBE();
    a[4] = 100;
the store into a might need to call Uint16LEArray.storeInto or
Uint16BEArray.storeInto under the hood. If you only ever use one, then the
JIT'er can just assume that one and inline it. Otherwise, it either has to
make a virtual call based on the actual type or it needs to test the type
-- either way, it is going to be a lot slower than *((uint16_t*)
addressOf(a)) = 100 where addressOf is likely a simple offset calculation.
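The shape of the concern can be shown with existing view types. In this sketch fill and viewOfUnknownKind are hypothetical, and Uint16Array/Int16Array merely stand in for the LE/BE pair. Whether the second pattern is actually slower, and by how much, depends on the engine; nothing here measures it.

```javascript
// One store site, shared by every caller of fill().
function fill(array, value) {
  for (let i = 0; i < array.length; i++) {
    array[i] = value; // the store whose target type the JIT must know
  }
  return array;
}

// Monomorphic use: this call pattern only ever passes Uint16Array, so
// an optimizing compiler can specialize the store for that one type.
fill(new Uint16Array(8), 100);

// Polymorphic use: the array type is decided at run time, so the same
// store site now sees two element kinds and needs a type check or a
// dispatch before each specialized store.
function viewOfUnknownKind(useFirst) {
  return useFirst ? new Uint16Array(8) : new Int16Array(8);
}
fill(viewOfUnknownKind(true), 100);
fill(viewOfUnknownKind(false), 100);
```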
> I assumed the point Kenneth was making that you objected to was that if
> you had Uint16BEArray and Uint16LEArray, so you could have both native
> (with Uint16Array being one of them) and specific endian-ness was that you
> are introducing polymorphic calls.
>
Well, I hadn't suggested having both BE and LE versions of views yet at
that point, but I don't think it matters there, either. Introducing those
types won't automatically cause code to start using them--code that only
needs a particular endianness will still only need one.
But making the existing views always LE wouldn't increase the amount of
polymorphic calls at all, it'd just change the behavior of the existing
calls.
--
Glenn Maynard
> a = someFunctionReturningUint16ArrayWhichMightBeLEorBE();
> a[4] = 100;
>
> the call to store into a might need to call Uint16LEArray or
> Uint16BEArray.storeInto under the hood. If you only ever use one, then the
> JIT'er can just assume that one and inline it.
Can you give a more concrete example? I can't think of a situation where
we'd have a function that toggles between returning little-endian and
returning big-endian buffers.