Four recent emscripten updates

Alon Zakai

Mar 3, 2013, 7:07:47 PM

to emscripte...@googlegroups.com

1. LTO off by default

We've been seeing more LLVM LTO bugs recently. They are probably not
new bugs; it just looks like the chance of hitting them rises with
bigger codebases. (Bugs have been filed on LLVM, of course.) So I
disabled LLVM LTO by default; it's too dangerous to keep it on. It
will now only run in -O3 by default (-O3 contains other unsafe
optimizations).

You might notice slower execution and larger code sizes as a result of
LTO being disabled. If you want to re-enable it, build with
--llvm-lto 1

2. EMCC_OPTIMIZE_NORMALLY

By default, Emscripten does *not* optimize individual bitcode files;
instead, it optimizes when converting the total combined bitcode into
final JS. See:

https://github.com/kripken/emscripten/wiki/Building-Projects

But this has the downside that LLVM opt and link take longer in that
final conversion to JS. For huge projects, this can be significant.
The environment variable EMCC_OPTIMIZE_NORMALLY changes that, so that
optimization works like in a "normal" build system: emcc -O2 a.cpp -o
a.bc will run LLVM opt on that individual file, and LLVM optimizations
are *not* run during the final link+conversion to JS. (LLVM LTO might
be, if you enabled it.)

This can speed up the final build stage, with minor slowdowns when
optimizing individual files (but that parallelizes); overall, this
should be faster to build. It is however unsafe - it only works if you
build the individual files and the final JS with *exactly* the same
optimization and emscripten flags (-s stuff, -Ox, etc. etc.), and we
can't check that without inventing a metadata format that we store in
each object file, which has not been done. Any inconsistency could
cause errors or suboptimal results. So this is not recommended (and
just an env var, not a normal option).

3. FULL_ES2

We now have 3 'levels' of GL support:

a. WebGL-friendly subset of OpenGL
b. GLES2 emulation
c. GL emulation, which attempts to emulate as many desktop GL features as possible

(a) is the default, and (c) was written to support codebases like
Sauerbraten which need us to emulate older GL features. (b) is new and
was just written by vlad. It attempts to emulate GLES2 as much as
possible, so it supports things like client-side data (however it does
not do things like GL_INT in places where WebGL supports only GL_BYTE
or GL_SHORT, so it isn't 100% complete).

The idea of GLES2 emulation isn't new; I believe LCID_Fire argued in
favor of it a while back. I was negative on it then, but more
codebases have appeared that really need it, so apologies for not
focusing on this earlier.

This should change nothing for existing code, and (a), the
WebGL-friendly subset, is still the recommended option, since it is
more efficient. But if you do have GLES2 code that you are porting,
then the new GLES2 emulation might help you. To use it, build with -s
FULL_ES2=1. Note that you have to set this compiler flag yourself;
there is no way to check at compile time whether it is needed or not.
If your code needs it but you didn't set the flag, the browser error
console should show warnings about array buffers not being bound etc.

4. ASM_JS close to ready for general use

asm.js code generation mode (-s ASM_JS=1) no longer shows the "this is
experimental/not recommended" warning. It's fairly robust at this
point, appears to withstand fuzzing, generates faster code in most
cases even without special JS engine optimizations (but with such
optimizations is significantly faster still), and we've used it on
some large and complex codebases successfully.

It doesn't yet support C++ exceptions or setjmp/longjmp, but aside
from that should be considered ready for general use. It will likely
become the default code generation mode once those limitations are
fixed and we have a minifier for it.

- azakai

Floh

Mar 4, 2013, 11:31:02 AM

to emscripte...@googlegroups.com

Thanks for the update! Unfortunately I'm seeing new hangs of my most complex demo in Google Chrome. Might be related to this bug, which hopefully will be fixed soon: https://code.google.com/p/chromium/issues/detail?id=177883

I didn't have the time yet to look into the problem in more detail, just wanted to give you a quick heads-up, but I'll keep you posted as soon as I find out more.

I'll give the new GLES2 layer a try, since our code is also actually derived from OpenGLES2. Can you elaborate a bit on the "client-side data" stuff?

Another thing I noted is that some GL functions with a high call frequency like uniformMatrix4fv do a makeHEAPView every time they are called. I'm not sure about the performance implications, or if there is a better/faster way though... but my dsomapviewer demo is doing close to a thousand uniformMatrix4fv and uniform4fv calls per frame (I optimized this recently by grouping all transform matrix updates into a single matrix array update).
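A minimal sketch of the grouping idea mentioned above, not the actual demo code. It assumes the matrices can live in one uniform array in the shader, so they can be sent with a single call that has count > 1. `FakeGL::uploadMatrices` is a hypothetical stand-in for glUniformMatrix4fv (the uniform location is left out) that only counts calls, so the sketch runs without a GL context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;

// Hypothetical stand-in for glUniformMatrix4fv(location, count, GL_FALSE, data).
// It only records how many calls were made and how many matrices were sent.
struct FakeGL {
    int calls = 0;
    std::size_t matricesSent = 0;

    void uploadMatrices(std::size_t count, const float* data) {
        assert(data != nullptr || count == 0);
        ++calls;
        matricesSent += count;
    }
};

// Naive path: one upload call per matrix.
inline void uploadOneByOne(FakeGL& gl, const std::vector<Mat4>& matrices) {
    for (const Mat4& m : matrices) {
        gl.uploadMatrices(1, m.data());
    }
}

// Batched path: flatten all matrices into one float array, then upload once.
inline void uploadBatched(FakeGL& gl, const std::vector<Mat4>& matrices) {
    if (matrices.empty()) return;
    std::vector<float> flat;
    flat.reserve(matrices.size() * 16);
    for (const Mat4& m : matrices) {
        flat.insert(flat.end(), m.begin(), m.end());
    }
    gl.uploadMatrices(matrices.size(), flat.data());
}
```

With N matrices, the naive path issues N calls into the GL layer and the batched path issues one, at the cost of a copy into the flat array.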

And finally I noticed the new vertex array object functions. Looks like this is currently completely emulated. Are there plans to seamlessly support the OES_vertex_array_object extension if available? This would be really cool :)

By the way: Especially the EnableVertexAttribArray functions could benefit from a global redundant-state-check (not enabling vertex attrib arrays again if they are already enabled, same for disabling). In complex demos this might save hundreds of WebGL calls per frame.

Apologies for mixing all those topics into a single post ;)

Cheers,

-Floh.

Jukka Jylänki

Mar 4, 2013, 12:46:57 PM

to emscripte...@googlegroups.com

I played the role of GL spec lawyer a bit and dug through the various specs:

- Desktop OpenGL has supported GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT and
GL_UNSIGNED_INT in calls to glDrawElements since OpenGL 1.1.
- GLES2 (and GLES1) supports only GL_UNSIGNED_BYTE and
GL_UNSIGNED_SHORT in glDrawElements. That is, GLES2 core doesn't
support 32-bit index buffers.
- If a GLES2 implementation has the GL_OES_element_index_uint
extension, then GL_UNSIGNED_INT is also supported.
http://www.khronos.org/registry/gles/extensions/OES/OES_element_index_uint.txt
- GLES3 brings the abovementioned extension into core, so
GL_UNSIGNED_INT is supported as well.
- WebGL does not support GL_UNSIGNED_INT in core either. WebGL also
has the OES_element_index_uint extension to allow 32-bit indices (
http://www.khronos.org/registry/webgl/extensions/ ), so WebGL and
GLES2 are identical in this respect.

(I stated on IRC that desktop GL doesn't support U8 index buffers
without an extension, but I was wrong.)
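The rules in the list above can be written down as a small lookup. This is only a sketch: the `Api` enum and `indexTypeSupported` are made-up names for illustration, while the three constants are the standard GL enum values for the index types.

```cpp
#include <cassert>
#include <cstdint>

// Standard GL enum values for the index types accepted by glDrawElements.
constexpr std::uint32_t kUnsignedByte  = 0x1401; // GL_UNSIGNED_BYTE
constexpr std::uint32_t kUnsignedShort = 0x1403; // GL_UNSIGNED_SHORT
constexpr std::uint32_t kUnsignedInt   = 0x1405; // GL_UNSIGNED_INT

// Hypothetical enum naming the API flavours compared in the list above.
enum class Api { DesktopGL, GLES2, GLES3, WebGL };

// Returns true if `type` is a valid index type for glDrawElements.
// `hasElementIndexUint` says whether the OES_element_index_uint
// extension is available; it only matters for GLES2 and WebGL.
inline bool indexTypeSupported(Api api, std::uint32_t type,
                               bool hasElementIndexUint) {
    if (type == kUnsignedByte || type == kUnsignedShort) return true;
    if (type != kUnsignedInt) return false; // not an index type at all
    switch (api) {
        case Api::DesktopGL:
        case Api::GLES3:
            return true;                // 32-bit indices are core
        case Api::GLES2:
        case Api::WebGL:
            return hasElementIndexUint; // 32-bit indices need the extension
    }
    return false;
}
```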

Floh: Client-side data is when you render vertex data directly
from CPU memory, and not from a VBO in GPU memory. When rendering with
client-side data, you pass a pointer to CPU memory in
glVertexAttribPointer (while having VBO 0 bound). When you render with
VBOs, you pass an offset cast to a pointer in glVertexAttribPointer
(while having a non-zero VBO bound).
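To make the two meanings of that last glVertexAttribPointer argument concrete, here is a small sketch that runs without a GL context. `AttribSource` and `interpretPointer` are hypothetical names; the code only mirrors the rule described above and is not how any particular GL layer is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Source { ClientMemory, BufferObject };

// Hypothetical record of where an attribute's data comes from.
struct AttribSource {
    Source source;
    const void* clientPointer; // meaningful only for ClientMemory
    std::size_t bufferOffset;  // meaningful only for BufferObject
};

// Mirrors how the `pointer` argument of glVertexAttribPointer is read,
// depending on which buffer is bound to GL_ARRAY_BUFFER at call time.
inline AttribSource interpretPointer(unsigned boundArrayBuffer,
                                     const void* pointer) {
    if (boundArrayBuffer == 0) {
        // No VBO bound: the argument is a real address in CPU memory.
        return {Source::ClientMemory, pointer, 0};
    }
    // A VBO is bound: the argument is a byte offset into that buffer,
    // merely passed through a pointer-typed parameter.
    return {Source::BufferObject, nullptr,
            static_cast<std::size_t>(
                reinterpret_cast<std::uintptr_t>(pointer))};
}
```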

"By the way: Especially the EnableVertexAttribArray functions could
benefit from a global redundant-state-check (not enabling vertex
attrib arrays again if they are already enabled, same for disabling).
In complex demos this might save hundreds of WebGL calls per frame."

One of the more tedious things with OpenGL is that there are no
performance guarantees for any of the functions. As a result, a lot of
sources instruct C++ engine coders to implement these redundant state
change checks manually in their codebases, since you just can't know.
I can imagine the GL implementation itself doing redundant state
checking. If emscripten adds its own custom redundant state tracking
layer, that'll bring one more source of potential state caching, and
in the worst case we could have the application, the emscripten
GLES2->WebGL glue, the browser GL implementation/backend, and the GL
driver implementation all doing their own redundant state change
tracking on top of each other. Therefore, I would recommend adding
only a minimal amount of state change tracking in emscripten, unless
profiling proves it is needed, and recommend that app developers do
the state change tracking in their application-level codebases, since
there it can work at the highest possible layer.
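A minimal sketch of what such application-level tracking could look like for vertex attrib arrays. `AttribArrayCache` is a made-up class, and the limit of 16 attributes is an assumption for the example. The two callbacks stand in for glEnableVertexAttribArray and glDisableVertexAttribArray, so the sketch can be run and counted without a GL context.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical application-side filter for redundant enable/disable calls.
class AttribArrayCache {
public:
    static constexpr std::size_t kMaxAttribs = 16; // assumed upper bound

    AttribArrayCache(std::function<void(unsigned)> enableFn,
                     std::function<void(unsigned)> disableFn)
        : enable_(std::move(enableFn)), disable_(std::move(disableFn)) {}

    // Forwards to the real call only when the cached state changes.
    void setEnabled(unsigned index, bool on) {
        assert(index < kMaxAttribs);
        if (enabled_[index] == on) return; // redundant, skip the GL call
        enabled_[index] = on;
        if (on) {
            enable_(index);
        } else {
            disable_(index);
        }
    }

private:
    // All bits start cleared, matching GL's initial state
    // (vertex attrib arrays are disabled by default).
    std::bitset<kMaxAttribs> enabled_;
    std::function<void(unsigned)> enable_;
    std::function<void(unsigned)> disable_;
};
```

The cache is only correct if every enable/disable call in the application goes through it; any code that calls GL directly will leave the cached bits stale.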

Jukka

Floh

Mar 4, 2013, 2:40:31 PM

to emscripte...@googlegroups.com

Thanks for the info! I totally agree that redundant-state-filtering is best done up in the application, however specifically in the vertex array object emulation it might be worth considering it in the emulation layer as a special case. I had a closer look at the implementation because I've spent some time optimizing my own vertex attrib array wrapper recently in the case that the vertex array object extension is not available, and the emscripten layer looks almost exactly like my pre-optimization code.

Let's say you have 2 different vertex array objects which only differ by the bound vertex and index buffer but have the same vertex layout (apologies for talking in D3D terms). If I understand the current code right, Enable/DisableVertexAttribArray and EnableClientState calls will be made for each vertex attribute, but if the vertex layout doesn't change, these would be redundant. I might be overlooking something though (like mixing rendering from buffer objects and client-side arrays).

In my case, those Enable/DisableVertexAttribArray calls were the highest-frequency calls (5 to 10 per geometry switch), and by redundancy-filtering those alone I was able to reduce the overall number of WebGL calls per frame from 7k down to about 4.3k (compared to 3.1k with the vertex array object extension supported). If I used the VAO emulation layer, the number of actual GL calls would go back up towards 7k because of the hidden calls inside the emulation layer.

I didn't do precise measurements yet, but I have a gut feeling that all WebGL calls come with a fairly static, not-too-trivial call overhead (of course differing from browser to browser). The traditional advantage of OpenGL was always that call overhead is usually much lower than in D3D9, but I think we have to be much more careful in WebGL, since each call first goes through the emscripten emulation layer, then through the JS engine, then (at least in Chrome) to another process, and then through the ANGLE emulation layer. What's extremely cheap in a native GL app might blow up in WebGL if called many thousands of times per frame, so we might need different optimization strategies in a WebGL app compared to a native app. I think that all extensions which reduce the number of WebGL client-side calls (like OES_vertex_array_object, ANGLE_instanced_arrays, or some extension which can group glUniform calls) are many times more valuable than in a native GL app.

Phew, ok, enough for now. Sorry for derailing the thread, but I've been thinking a lot about function call overhead lately ;)

Cheers,

-Floh.

Floh

Mar 4, 2013, 3:34:21 PM

to emscripte...@googlegroups.com

...on further thought... scratch that if it means adding overhead to the actual glEnable/DisableVertexAttribArray functions for the redundancy checks (which I think would be necessary in case someone mixes vertex-array-object and non-vertex-array-object code for whatever reason).

Having said that, "official" OES_vertex_array_object support in the WebGL layer would be niiice, so C code would check for the existence of the extension and then call the vertex array object functions with the OES postfix, and use fallback code if the extension is not present (and in this case do the redundant-state-filtering itself at the application level).
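A sketch of the "check for the extension, else fall back" step, assuming the extension list is the usual space-separated string that glGetString(GL_EXTENSIONS) returns in GLES2. `hasExtension` is a hypothetical helper. How emscripten spells the extension name in that string is not stated in this thread, so the name in the usage note below is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if `name` appears as a whole token in a space-separated
// extension list. A plain strstr is not enough, because one extension
// name can be a prefix of another.
inline bool hasExtension(const char* extensionList, const char* name) {
    if (!extensionList || !name || *name == '\0') return false;
    const std::size_t len = std::strlen(name);
    const char* p = extensionList;
    while ((p = std::strstr(p, name)) != nullptr) {
        const bool startsToken = (p == extensionList) || (p[-1] == ' ');
        const bool endsToken = (p[len] == ' ') || (p[len] == '\0');
        if (startsToken && endsToken) return true;
        ++p; // keep searching after this partial match
    }
    return false;
}
```

Application code would call something like `hasExtension(list, "GL_OES_vertex_array_object")` once at startup, then pick either the OES entry points or its own fallback path with application-level state filtering.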

Cheers,

-Floh.

Alon Zakai

Mar 4, 2013, 10:35:27 PM

to emscripte...@googlegroups.com

On Mon, Mar 4, 2013 at 8:31 AM, Floh <flo...@gmail.com> wrote:
> Thanks for the update! Unfortunately I'm seeing new hangs of my most complex
> demo in Google Chrome. Might be related to this bug, which hopefully will be
> fixed soon: https://code.google.com/p/chromium/issues/detail?id=177883

New in what sense - new code output, or new hangs on the same output
from before?

>
> I didn't have the time yet to look into the problem in more detail, just
> wanted to give you a quick heads-up, but I'll keep you posted as soon as I
> find out more.
>
> I'll give the new GLES2 layer a try, since our code is also actually derived
> from OpenGLES2. Can you elaborate a bit on the "client-side data" stuff?

Basically what Jukka said - you can call glDrawArrays etc. without a
bound buffer, and with a pointer into memory. The glue will set up the
buffer for you.

>
> Another thing I noted is that some GL functions with a high call frequency
> like uniformMatrix4fv do a makeHEAPView every time they are called. I'm not
> sure about the performance implications, or if there is a better/faster way
> though... but my dsomapviewer demo is doing close to a thousand
> uniformMatrix4fv and uniform4fv calls per frame (I optimized this recently
> by grouping all transform matrix updates into a single matrix array update).

makeHEAPView does .subarray, which creates a new object. So the cost
is creation and collection. It should not be too bad, and JS engines
could optimize it out with escape analysis, but I don't think they do.

Sadly the WebGL API doesn't let you avoid this kind of thing. One
option is to cache them, but that might not work in general. If this
shows up high in profiles though, we should investigate that.

>
> And finally I noticed the new vertex array object functions. Looks like this
> is currently completely emulated. Are there plans to seamlessly support the
> OES_vertex_array_object extension if available? This would be really cool :)
>

Yes. Right now we added VAOs just because some project needed it
urgently, but it isn't going to be fast without using that extension
;) So hopefully someone will find time soon.

> By the way: Especially the EnableVertexAttribArray functions could benefit
> from a global redundant-state-check (not enabling vertex attrib arrays again
> if they are already enabled, same for disabling). In complex demos this
> might save hundreds of WebGL calls per frame.
>

Do you mean that if C++ code calls it redundantly we should optimize
that at runtime?

- azakai

Alon Zakai

Mar 4, 2013, 10:36:41 PM

to emscripte...@googlegroups.com

Yeah, I really don't want to add any emulation code unless absolutely
necessary. It always adds some constant overhead and some chance of
odd bugs.

- azakai

Floh

Mar 5, 2013, 5:01:56 AM

to emscripte...@googlegroups.com

On Tuesday, March 5, 2013 at 04:35:27 UTC+1, azakai wrote:

> On Mon, Mar 4, 2013 at 8:31 AM, Floh <flo...@gmail.com> wrote:
> > Thanks for the update! Unfortunately I'm seeing new hangs of my most complex
> > demo in Google Chrome. Might be related to this bug, which hopefully will be
> > fixed soon: https://code.google.com/p/chromium/issues/detail?id=177883
>
> New in what sense - new code output, or new hangs on the same output
> from before?

No big source code changes; I merged several glUniformMatrix4fv calls into one. I also played around with compile options. I'll try to properly analyse this ASAP (restoring the original compile options, and stepping back through incoming versions).

Cheers,

-Floh.

Mark Callow

Mar 5, 2013, 5:32:28 AM

to emscripte...@googlegroups.com, Floh

On 2013/03/05 1:31, Floh wrote:

> I'll give the new GLES2 layer a try, since our code is also actually derived from OpenGLES2. Can you elaborate a bit on the "client-side data" stuff?

Avoid client-side data, if at all possible. Data has to be copied from the client-side arrays to a buffer object. The GLES2 layer will have to copy the data every time the array is used in a draw call.
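A rough sketch of the cost difference, under stated assumptions: `FakeBuffer` is a hypothetical stand-in for a GL buffer object that just counts uploaded bytes, and the two draw functions only model the copy behaviour described above (presumably the layer copies each time because it cannot tell whether the client array changed). This is not the emscripten implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a GPU buffer; counts bytes copied into it.
struct FakeBuffer {
    std::vector<unsigned char> storage;
    std::size_t bytesUploaded = 0;

    void upload(const void* data, std::size_t size) {
        storage.resize(size);
        if (size > 0) std::memcpy(storage.data(), data, size);
        bytesUploaded += size;
    }
};

// Emulated client-side path: the data is copied into a scratch buffer
// before every draw call.
inline void drawFromClientMemory(FakeBuffer& scratch, const void* data,
                                 std::size_t size) {
    scratch.upload(data, size);
    // ... the real draw call would be issued here ...
}

// Buffer-object path: the data was uploaded once up front, so drawing
// needs no copy.
inline void drawFromBuffer(const FakeBuffer& /*vbo*/) {
    // ... the real draw call would be issued here ...
}
```

For static geometry drawn over F frames, the client-side path copies F times the array size, while the buffer-object path copies the array once.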

Regards

-Mark

Floh

Mar 5, 2013, 8:55:05 AM

to emscripte...@googlegroups.com, Floh

Yeah no worries :)
