WebGL2: looking for advice on dynamic uniform buffer usage

Andre Weissflog

Aug 24, 2016, 2:19:34 PM
to WebGL Dev List
Hi,

I'm currently implementing WebGL2/GLES3 support in my 3D framework (via emscripten), and I'd like to implement dynamic uniform updates similar to how I do it in Metal and D3D12: have one big uniform buffer where I'm recording the uniform updates for the entire frame, and then before each draw call, only set an offset into this uniform buffer for the next draw call.

In pseudo code it would look like this:

    // first record all uniform updates into a linear buffer, and store offsets
    for dc in draw_calls {
        dc.uniform_offset = copy_to_linear_buffer(dc.uniform_data, dc.uniform_size);
    }

    // flush uniform data for entire frame in one go to GL
    glBufferSubData(GL_UNIFORM_BUFFER, 0, max_uniform_offset, linear_buffer);

    // draw loop, only bind offset into big uniform buffer
    for dc in draw_calls {
        glBindBufferRange(GL_UNIFORM_BUFFER, bind_point, uniform_buffer, dc.uniform_offset, dc.uniform_size);
        glDrawElements(...);
    }

Does that make sense, and would it be fast in current WebGL2 implementations? It basically depends on glBindBufferRange() being a very fast operation: it should just 'record' the argument values and not perform an actual data copy (the big copy would happen in glBufferSubData()).

In real code I would alternate between 2 big uniform buffers from frame to frame. That's what I'm doing for dynamic vertex and index data, and it looks like this is the fastest way in WebGL.
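A minimal C sketch of the CPU-side recording step described above, for illustration only. The names `frame_uniforms_t`, `frame_uniforms_begin` and `FRAME_UB_CAPACITY` are hypothetical and not taken from any framework; `copy_to_linear_buffer` only mirrors the pseudocode name. It contains no GL calls and shows just the linear append plus the per-frame flip between two GL buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-frame budget for uniform data. */
#define FRAME_UB_CAPACITY ((size_t)64 * 1024)

typedef struct {
    uint8_t staging[FRAME_UB_CAPACITY]; /* CPU-side linear buffer */
    size_t  cursor;                     /* bytes recorded so far this frame */
    int     active;                     /* which of the 2 GL buffers this frame uses */
} frame_uniforms_t;

/* Start a new frame: rewind the cursor and flip to the other GL buffer. */
static void frame_uniforms_begin(frame_uniforms_t* fu) {
    assert(fu != NULL);
    fu->cursor = 0;
    fu->active ^= 1;
}

/* Append one draw call's uniform data.
   Returns its byte offset, or (size_t)-1 if the staging buffer is full. */
static size_t copy_to_linear_buffer(frame_uniforms_t* fu, const void* data, size_t size) {
    assert(fu != NULL && data != NULL);
    if (size > FRAME_UB_CAPACITY - fu->cursor) {
        return (size_t)-1;
    }
    size_t offset = fu->cursor;
    memcpy(fu->staging + offset, data, size);
    fu->cursor += size;
    return offset;
}
```

After recording, the first `cursor` bytes of `staging` would be uploaded with a single glBufferSubData() into the GL buffer selected by `active`, and the returned offsets would be passed to glBindBufferRange(). The sketch leaves out alignment: for GL_UNIFORM_BUFFER, the offset passed to glBindBufferRange() must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.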

Thanks!
-Floh.

Corentin Wallez

Aug 24, 2016, 6:57:18 PM
to WebGL Dev List
Hey Floh,

This would work as you expect on an OpenGL-backed WebGL2 context, however from my understanding, on an ANGLE D3D11-backed one, things are a bit more complex.

Because D3D buffers are less flexible than WebGL's, ANGLE keeps a cache of backing buffers for each WebGL buffer. If the D3D11 device supports binding uniform buffers with an offset, then the copy from system memory to constant buffer memory is done once, at the next draw call using that buffer. If offsets are not supported, then ANGLE uses a tiny cache of buffers per offset it has seen, which would result in a data copy per draw call.

This is linked to the ConstantBufferOffsetting member of D3D11_FEATURE_DATA_D3D11_OPTIONS. From this Wikipedia page (search for offsetting) it seems to affect only old AMD and NVIDIA cards as well as Ivy Bridge Intel GPUs. I'm not sure what proportion of users that represents though.

If you really need it, you could speed things up a little bit by making the interface block an array and updating the offset at a lower frequency, but that's a lot of effort.
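One possible reading of the array suggestion, sketched in C as plain index arithmetic: several draw calls share one bound range, and each draw call selects its element of the block array by index, so the range only has to be rebound when a draw call crosses into the next block. The names `block_slot_t` and `slot_for_draw` are hypothetical, and how the array index reaches the shader (for example through a regular integer uniform) is an assumption that the post does not spell out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t bind_offset;  /* offset for glBindBufferRange(), changes once per block */
    size_t bind_size;    /* size of the whole array block */
    size_t array_index;  /* element of the block array this draw call reads */
} block_slot_t;

/* Map a draw call index to the block it lives in and its index inside that block. */
static block_slot_t slot_for_draw(size_t draw_index, size_t element_size,
                                  size_t elements_per_block) {
    assert(elements_per_block > 0);
    size_t block_size = element_size * elements_per_block;
    block_slot_t slot;
    slot.bind_offset = (draw_index / elements_per_block) * block_size;
    slot.bind_size   = block_size;
    slot.array_index = draw_index % elements_per_block;
    return slot;
}
```

With 256-byte elements and 64 elements per block, draw calls 0 to 63 share the range at offset 0 and draw call 64 is the first to need a rebind at offset 16384. The block size would have to stay within GL_MAX_UNIFORM_BLOCK_SIZE and be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT for the offsets to remain valid.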

Hope this helps,

Corentin

Andre Weissflog

Aug 25, 2016, 2:36:39 AM
to WebGL Dev List
This is exactly the advice I'm looking for :)

I knew that buffer offsetting exists in some D3D11 feature level, but wasn't sure how ANGLE handles it. Once I have a first implementation running I'll see how it performs on different systems. But I know now that I'm generally heading in the right direction.

Thanks!

Jamie Madill

Sep 7, 2016, 1:09:18 AM
to webgl-d...@googlegroups.com
Couple notes:

- buffer offsetting works well on Windows 8 and 10 machines, since they have the new features of D3D11.1. Windows 7 uses the emulation Corentin mentioned above.
- Intel currently doesn't handle UBOs perfectly due to some issues with how ANGLE does caching like Corentin explained. We work OK for some cases, but not all, especially Win7 buffer offsetting emulation.
- If you call BufferData with GL_UNIFORM_BUFFER, ANGLE will try to do the right thing and init the ID3D11Buffer immediately with constant buffer usage.

--
You received this message because you are subscribed to the Google Groups "WebGL Dev List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to webgl-dev-list+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andre Weissflog

Sep 7, 2016, 5:25:01 AM
to WebGL Dev List
Thanks for the info! I have this generally working now, and indeed Intel GPUs on Windows have problems (on Chrome it even seems to crash the graphics driver, on Firefox I just get a black canvas), see the ticket URLs below.

I also found a few other things that might be interesting for others:

- GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT can be pretty big, I'm seeing 256 bytes on the configs I tested on (I'm also seeing this restriction in other APIs like Metal or D3D12), this means that for simple scenarios with just a few uniforms per drawcall, a lot of empty space needs to be reserved and transferred via glBufferSubData(). I wonder if the traditional granular glUniform calls have the same 'wastage' under the hood.

- the size argument in glBindBufferRange() must be a multiple of GL_UNIFORM_BLOCK_DATA_SIZE, Chrome correctly warns about this, Firefox doesn't, and also all native implementations I tested on don't care or warn about this (see: https://bugs.chromium.org/p/chromium/issues/detail?id=642304 and https://bugzilla.mozilla.org/show_bug.cgi?id=1300078)

- in order to update the whole uniform buffer before the first draw call I had to change my GL renderer backend to record draw commands into my own little command buffer, and I need to copy granular uniform buffer updates into a separate linear memory buffer before copying it in one big chunk via glBufferSubData(); this is a situation where glMapBuffer() would be really handy

- the big per-frame uniform-buffer is double-buffered (alternated each frame) to prevent sync-stalls, I do this already for other sorts of buffer updates and this seems to work well in browsers (the other trick of 'buffer orphaning' doesn't make sense in WebGL as far as I'm aware).
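The alignment overhead from the first point can be put into numbers with a small C sketch. `align_up` and `frame_upload_size` are hypothetical helper names, and the 256-byte alignment is just the value reported above; the real value has to be queried with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...).

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment (alignment must be > 0). */
static size_t align_up(size_t offset, size_t alignment) {
    assert(alignment > 0);
    return ((offset + alignment - 1) / alignment) * alignment;
}

/* Bytes uploaded per frame when every draw call's block starts on an aligned offset. */
static size_t frame_upload_size(size_t num_draws, size_t block_size, size_t alignment) {
    return num_draws * align_up(block_size, alignment);
}
```

For example, 1000 draw calls with a 64-byte block each (a single 4x4 float matrix) at 256-byte alignment give `frame_upload_size(1000, 64, 256) == 256000`, so 256000 bytes are uploaded for 64000 bytes of actual uniform data.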

I'm trying to come up with a demo which compares traditional glUniform() calls vs my method with the one big uniform buffer update and glBindBufferRange() per draw call. Because of the extra copy and alignment requirements I'm a bit concerned whether the uniform buffer scenario even makes sense (despite the fewer WebGL calls for complex shaders with many uniforms).

Cheers,
-Floh.

Andre Weissflog

Sep 7, 2016, 5:29:23 AM
to WebGL Dev List
Oops, correction: GL_UNIFORM_BLOCK_DATA_SIZE must be queried from GL for each uniform block, and glBindBufferRange() must be called with this value. The size argument is *not* a 'multiple of GL_UNIFORM_BLOCK_DATA_SIZE'.

Zhenyao Mo

Sep 7, 2016, 2:14:05 PM
to webgl-d...@googlegroups.com
On Wed, Sep 7, 2016 at 2:25 AM, Andre Weissflog <flo...@gmail.com> wrote:
> Thanks for the info! I have this generally working now, and indeed Intel GPU
> on Windows has problems (on Chrome it even seems to crash the graphics
> driver, on Firefox I just get a black canvas), see the tickets URLs below.
>
> I also found a few other things that might be interesting for others:
>
> - GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT can be pretty big, I'm seeing 256 bytes
> on the configs I tested on (I'm also seeing this restriction in other APIs
> like Metal or D3D12), this means that for simple scenarios with just a few
> uniforms per drawcall, a lot of empty space needs to be reserved and
> transferred via glBufferSubData(). I wonder if the traditional granular
> glUniform calls have the same 'wastage' under the hood.
>
> - the size argument in glBindBufferRange() must be a multiple of
> GL_UNIFORM_BLOCK_DATA_SIZE, Chrome correctly warns about this, Firefox
> doesn't, and also all native implementations I tested on don't care or warn
> about this (see:
> https://bugs.chromium.org/p/chromium/issues/detail?id=642304 and
> https://bugzilla.mozilla.org/show_bug.cgi?id=1300078)

To clarify: WebGL2 doesn't require the size to be a multiple of
UNIFORM_BLOCK_DATA_SIZE, it simply requires that the data store is big
enough to hold the entire uniform block. Native ES3 doesn't
enforce this, but according to its spec the behavior is undefined,
i.e., the driver could crash or the shader could get random uniform data.

Zhenyao Mo

Sep 7, 2016, 2:15:17 PM
to webgl-d...@googlegroups.com
On Wed, Sep 7, 2016 at 2:29 AM, Andre Weissflog <flo...@gmail.com> wrote:
> Oops, correction: GL_UNIFORM_BLOCK_DATA_SIZE must be queried from GL when
> for each uniform block and glBindBufferRange() must be called with this
> value. The size argument is *not* a 'multiple of
> GL_UNIFORM_BLOCK_DATA_SIZE'.


Right, just replied then I saw this. The size can be
UNIFORM_BLOCK_DATA_SIZE or bigger, but smaller would trigger an
INVALID_OPERATION at draw calls.
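
The rules discussed in this thread can be collected into a small application-side pre-check, sketched here in C. `uniform_range_ok` is a hypothetical helper and not the validation the browsers actually run; it assumes `block_data_size` was obtained via glGetActiveUniformBlockiv() with GL_UNIFORM_BLOCK_DATA_SIZE and `offset_alignment` via glGetIntegerv() with GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a range (offset, size) inside a buffer of buffer_size bytes
   looks safe to bind for a uniform block of block_data_size bytes. */
static bool uniform_range_ok(size_t buffer_size, size_t offset, size_t size,
                             size_t block_data_size, size_t offset_alignment) {
    /* The offset must be a multiple of the implementation's alignment. */
    if (offset_alignment == 0 || offset % offset_alignment != 0) {
        return false;
    }
    /* The bound range must be at least as big as the uniform block. */
    if (size < block_data_size) {
        return false;
    }
    /* The range must lie entirely inside the buffer's data store. */
    if (offset > buffer_size || size > buffer_size - offset) {
        return false;
    }
    return true;
}
```

Running a check like this in debug builds before glBindBufferRange() would flag on native GL the cases that Chrome warns about.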