Emscripten with SDL2 performance on browsers

Rob Probin

Feb 24, 2020, 5:05:23 PM

to emscripten-discuss

I've been using Emscripten with SDL2 to do some 2D game work. Concerned that my code was the problem, I wrote a test program, and it turns out it gives some weird results: software renderers are faster than hardware renderers, and accelerated SDL_FillRect and SDL_CopyRect are about 120x faster on a native build than on a web build.

Old build uploaded for your excitement (instant results!)

http://robprobin.com/SDL2_emscripten_tests/

Test code:

https://github.com/robzed/SDL2_emscripten_tests

Some results here:

https://discourse.libsdl.org/t/emscripten-sdl2-performance/27236/

Comments or thoughts welcome.

Yeah, fewer sprites or rects would make it work better :-)

Floh

Feb 25, 2020, 8:02:06 AM

to emscripten-discuss

I don't know how the 2D operations in the emscripten SDL shim are implemented versus 'native SDL', but I would actually expect massively different performance behaviour, since I'm seeing the same for WebGL vs native OpenGL, and the same might be true for HTML canvas vs whatever SDL is doing in the native implementation (assuming the 2D operations of the emscripten SDL "emulation" go through HTML canvas).

To give you an idea of how bad it is on WebGL vs GL: the "number of trivial draw calls (16 byte uniform update plus glDrawArrays()) before dropping below 60fps" on WebGL in desktop browsers is around 5000. In a good native GL implementation (like NVIDIA on Windows), that number is about half a million, although it varies widely there too (e.g. Intel GPUs in common laptops only manage around 15k).
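
To put those figures in per-call terms, here is a rough back-of-the-envelope sketch. It assumes (as a simplification, not something measured in this thread) that the whole 1/60 s frame is spent issuing draw calls; the helper names `per_call_budget_us` and `print_budgets` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: time available per draw call, in microseconds,
 * if `calls_per_frame` calls have to fit into one frame at `fps`.
 * Assumption: the entire frame is spent issuing draw calls. */
static double per_call_budget_us(double calls_per_frame, double fps)
{
    assert(calls_per_frame > 0.0 && fps > 0.0);
    double frame_us = 1e6 / fps;        /* length of one frame in microseconds */
    return frame_us / calls_per_frame;
}

/* Prints the per-call budget for the three figures quoted above. */
static void print_budgets(void)
{
    printf("WebGL, desktop browser (~5000 calls): %.3f us/call\n",
           per_call_budget_us(5000.0, 60.0));
    printf("Intel laptop GPU (~15k calls):        %.3f us/call\n",
           per_call_budget_us(15000.0, 60.0));
    printf("NVIDIA on Windows (~500k calls):      %.3f us/call\n",
           per_call_budget_us(500000.0, 60.0));
}
```

Under that assumption, the quoted numbers work out to roughly 3.3 microseconds per call on WebGL against about 0.03 microseconds on the fast native driver, which is the same factor of 100 seen in the raw call counts.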

I would expect similar differences for your situation.

So basically: just compiling the code usually isn't enough, the code must also be changed for the specific performance characteristics of the web platform (e.g. for WebGL vs GL this means that "old school" batching tricks must be used to reduce the number of calls into the GL API as much as possible).

Cheers,

-Floh.

Floh

Feb 25, 2020, 8:16:31 AM

to emscripten-discuss

PS: if I'm seeing it right, then SDL2 is using GLES2/WebGL under the hood, and an SDL_CopyRect() might be this under the hood:

https://github.com/emscripten-ports/SDL2/blob/ca38e297ce62322ea9bcde2ab78afab6aa93f423/src/render/opengles2/SDL_render_gles2.c#L1820-L1858

...and if this is true, then each CopyRect is one dynamic buffer update (via glBufferData() or glBufferSubData()) and a draw call, and that's for each rectangle (so it looks like no batching happens at all).

This sort of stuff is very fast in native GL implementations, and very slow in WebGL (e.g. it's even more expensive than the 'trivial draw call' example in my post above). IME this explains the performance difference you are seeing, and also why the software renderer is faster: the WebGL calling overhead is much more expensive than the actual drawing operations.

The way to make this scenario fast in WebGL is via "sprite batching": have two big vertex buffers (alternating each frame, one 'in flight' for rendering, the other currently being filled by the CPU), write all the rectangle vertices for one frame into a memory chunk, do a single glBufferSubData() to copy into a GL vertex buffer, followed by a single glDrawArrays(). Ideally use a single texture atlas which has the images for all used sprites, or if that's not possible, have very few atlases, and sort sprites within a "depth layer" by texture. I.e. do everything to minimize the number of draw calls as much as possible.
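
The CPU side of that batching scheme could be sketched roughly as follows. This is only an illustration, not SDL2's or anyone's actual renderer: every name (`Sprite`, `Vertex`, `SpriteBatch`, `build_batch`, ...) is hypothetical, all sprites are assumed to sit in one depth layer, and the GL upload and the two alternating vertex buffers are left as a comment so the sketch compiles without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration only. */
typedef struct {
    float x, y, w, h;        /* screen rectangle */
    float u0, v0, u1, v1;    /* sub-rectangle inside the texture atlas */
    int   texture;           /* atlas id */
} Sprite;

typedef struct { float x, y, u, v; } Vertex;

enum { MAX_SPRITES = 1024, VERTS_PER_SPRITE = 6 };

typedef struct {
    Vertex verts[MAX_SPRITES * VERTS_PER_SPRITE]; /* CPU-side memory chunk */
    size_t vert_count;
} SpriteBatch;

/* qsort comparator: sprites that share a texture end up contiguous. */
static int by_texture(const void *a, const void *b)
{
    const Sprite *sa = a, *sb = b;
    return (sa->texture > sb->texture) - (sa->texture < sb->texture);
}

/* Append two triangles (6 vertices) covering one sprite. */
static void push_quad(SpriteBatch *batch, const Sprite *s)
{
    assert(batch->vert_count + VERTS_PER_SPRITE
           <= (size_t)MAX_SPRITES * VERTS_PER_SPRITE);
    float x0 = s->x, y0 = s->y, x1 = s->x + s->w, y1 = s->y + s->h;
    const Vertex quad[VERTS_PER_SPRITE] = {
        {x0, y0, s->u0, s->v0}, {x1, y0, s->u1, s->v0}, {x1, y1, s->u1, s->v1},
        {x0, y0, s->u0, s->v0}, {x1, y1, s->u1, s->v1}, {x0, y1, s->u0, s->v1},
    };
    for (int i = 0; i < VERTS_PER_SPRITE; i++)
        batch->verts[batch->vert_count++] = quad[i];
}

/* Sort `sprites` in place by texture, write all their vertices into the
 * batch, and return how many draw calls the frame needs: one per run of
 * sprites that share a texture. */
static size_t build_batch(SpriteBatch *batch, Sprite *sprites, size_t count)
{
    size_t draws = 0;
    batch->vert_count = 0;
    qsort(sprites, count, sizeof sprites[0], by_texture);
    for (size_t i = 0; i < count; i++) {
        if (i == 0 || sprites[i].texture != sprites[i - 1].texture)
            draws++;
        push_quad(batch, &sprites[i]);
    }
    /* A real renderer would now upload batch->verts with a single
     * glBufferSubData() into this frame's vertex buffer and issue one
     * glDrawArrays() per texture run. */
    return draws;
}
```

With a single atlas the returned draw count is 1 regardless of how many sprites are on screen, which is the point of the technique: the number of calls into the GL API stops scaling with the number of rectangles.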

Floh

Feb 25, 2020, 8:28:58 AM

to emscripten-discuss

> then each CopyRect is one dynamic buffer update

PS: *two* buffer updates, so even worse ;)

Rob Probin

Feb 26, 2020, 1:21:46 PM

to emscripten-discuss

Thank you for the detailed reply!
