Diffs between WebGL implementations in Linux and in ChromeOS


Leon Han

Oct 26, 2020, 12:24:10 PM
to WebGL Dev List
Hi, 

Regarding the WebGL implementation in the GPU process: by my current understanding [1], it is based on native EGL/GLES2 on ChromeOS, but on EGL/GL implemented using ANGLE on Linux. Is that right?

And, could anyone help clarify whether this could bring some perf/power diff? We are analyzing an issue about higher power consumption in Linux compared to ChromeOS, any hints for that would be appreciated... Thanks much for your help.


BR,
Han Leon

Ken Russell

Oct 26, 2020, 5:17:48 PM
to WebGL Dev List
Hi Leon,

Yes, Chromium's WebGL implementation on Linux was just changed to run on top of ANGLE. On ChromeOS the browser is still binding directly to EGL / GLES2 / GLES3.

Of course, anything is possible with a large architectural change like this. In general we have seen significant performance improvements from the switch; see for example http://crbug.com/1045643#c9 .

I think the Chromium-specific group https://groups.google.com/a/chromium.org/g/graphics-dev would be a better forum for discussing this topic. Would you consider posting your question there? Also, please include some sort of reproduction scenario for the problem you're seeing.

Thanks,

-Ken



--
You received this message because you are subscribed to the Google Groups "WebGL Dev List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to webgl-dev-lis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/webgl-dev-list/cec98a44-4414-4735-9890-024d3409a162n%40googlegroups.com.

Andre Weissflog

Nov 2, 2020, 9:56:41 AM
to WebGL Dev List
> ...we have seen significant performance improvements...

Ooooh that's very interesting, because I'm seeing some strange performance differences around bufferSubData() between the WebGL implementation on ChromeOS (and also Android) versus other platforms.

Here's an example (crank up the slider):


This runs great on Windows where ANGLE is running on top of D3D, runs absolutely terrible on ChromeOS and Android (single-digit fps), and "so-so" on Linux and MacOS (drops to 30fps and wobbly framerate).

This is using an update pattern where new geometry is appended to a buffer via bufferSubData and then rendered, repeating several times per frame, with the next frame switching to a different buffer:

frame0:
append(vbuf0); append(ibuf0); draw() => append(vbuf0); append(ibuf0); draw() => ...

frame1:
append(vbuf1); append(ibuf1); draw() => append(vbuf1); append(ibuf1); draw() => ...

frame2:
append(vbuf0); append(ibuf0); draw() => append(vbuf0); append(ibuf0); draw() => ...
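
For illustration, the pattern above could be sketched in JavaScript roughly as follows. This is only a minimal sketch and not the code behind the example: `drawFrameInterleaved`, the `buffers` array and the `chunks` objects are hypothetical names, vertex attribute setup is omitted, and each chunk's indices are assumed to already be absolute into the vertex buffer.

```javascript
// Sketch of the interleaved append/draw pattern (hypothetical helper).
// `gl` is a WebGLRenderingContext; `buffers` holds two {vbuf, ibuf} pairs
// that were created with enough capacity up front.
function drawFrameInterleaved(gl, frameIndex, buffers, chunks) {
  // Alternate between the buffer pairs from one frame to the next.
  const { vbuf, ibuf } = buffers[frameIndex % buffers.length];
  gl.bindBuffer(gl.ARRAY_BUFFER, vbuf);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ibuf);

  let vertexByteOffset = 0;
  let indexByteOffset = 0;
  for (const chunk of chunks) {
    // append(vbuf); append(ibuf);
    gl.bufferSubData(gl.ARRAY_BUFFER, vertexByteOffset, chunk.vertices);
    gl.bufferSubData(gl.ELEMENT_ARRAY_BUFFER, indexByteOffset, chunk.indices);

    // draw() only the indices that were just appended.
    gl.drawElements(gl.TRIANGLES, chunk.indices.length,
                    gl.UNSIGNED_SHORT, indexByteOffset);

    vertexByteOffset += chunk.vertices.byteLength;
    indexByteOffset += chunk.indices.byteLength;
  }
}
```

Here `chunk.vertices` is assumed to be a Float32Array and `chunk.indices` a Uint16Array.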

Any suggestions how to improve this update-draw-pattern across all WebGL platforms would be very welcome ;)

Cheers,
-Floh.


Ken Russell

Nov 3, 2020, 2:13:05 PM
to WebGL Dev List
Hi Andre,

Thanks for your example - on macOS, it runs at 60 FPS with Chrome's older "validating" command decoder, and unevenly with ANGLE and the new "passthrough" command decoder. You can toggle between these on the command line:


/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --user-data-dir=/tmp/t1 --use-cmd-decoder=validating


vs:


/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --user-data-dir=/tmp/t1 --use-cmd-decoder=passthrough



There's no good reason the passthrough command decoder should be slower. I filed http://crbug.com/1145248 to track this. Please feel free to follow up on it with any additional information. I haven't tried profiling or running a trace yet.

Is there any chance you could provide another smaller test case in JavaScript? It would make it a lot easier to try different permutations of the test case.

On first glance I would try to avoid doing repeated bufferSubData/draw pairs to and from the same buffer in the same frame. Is there any way for you to batch up all of your buffer updates and issue a single draw call encompassing all of them at the end of the frame?

Thanks,

-Ken




Andre Weissflog

Nov 5, 2020, 10:33:18 AM
to WebGL Dev List
> Is there any chance you could provide another smaller test case in JavaScript? It would make it a lot easier to try different permutations of the test case.

In this case it's a bit tricky, but I'll see if I can cobble something similar together in JS, will take a while though.
 
> On first glance I would try to avoid doing repeated bufferSubData/draw pairs to and from the same buffer in the same frame. Is there any way for you to batch up all of your buffer updates and issue a single draw call encompassing all of them at the end of the frame?

Sort of. I can batch up all the vertex updates from Dear ImGui and upload them in a single glBufferData call before drawing from that buffer. The actual drawing needs to be split into separate draw calls though, because there are also scissor rect updates in between.

Reportedly, and surprisingly, the original Dear ImGui GL example code works much better in WebGL on Android and Chromebooks. It simply does many small glBufferData() updates of varying sizes into the same buffer in the same frame, followed by glDrawElements() calls:


...I'd have thought that this is pretty much the worst case *shrugs*

Thanks & Cheers!
-Floh.

Jeff Gilbert

Nov 5, 2020, 12:33:55 PM
to webgl-d...@googlegroups.com
I should probably add it to the WebGL best practices, but generally
for graphics you'll indeed want a two-phase update/draw split, even
if you have multiple updates and draws. Interleaved draw/update is
likely to cause stalls at some point or another in the pipelines down
to the hardware, even if not at the browser level (new updates
wanting to wait on old updates to finish).

Fewer updates, like fewer draw calls, are better, but just splitting
them into phases instead of interleaving should be generally
beneficial!

Ken Russell

Nov 5, 2020, 8:51:56 PM
to WebGL Dev List
Agreed - as I understand it, issuing a draw call which sources data from a buffer and then updating the buffer requires shadow copies to be made inside the driver.

If you can boil this test case down further, it would be very interesting to know whether it's the interleaved bufferSubData/draw calls that are adding the cost, or the updating of the scissor rectangles. If the latter, then perhaps those rectangles could be passed down as per-vertex (or instanced) data, and the discard done inside the fragment shader instead of using the scissor rectangle, again to get better batching.
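
As a rough illustration of the second idea, the clip rectangle could travel with each vertex and the fragment shader could discard everything outside it. This is a minimal sketch under stated assumptions, not a tested replacement for gl.scissor: the attribute and varying names (`a_clipRect`, `v_clipRect`), the six-float vertex layout and the `packVerticesWithClip` helper are all hypothetical. It relies on gl_FragCoord being in window coordinates with a lower-left origin, the same convention gl.scissor uses.

```javascript
// GLSL ES 1.00 (WebGL 1) shader sources. Names are illustrative only.
const vertexShaderSource = `
attribute vec2 a_position;  // already in clip space, for brevity
attribute vec4 a_clipRect;  // x, y, width, height in window pixels
varying vec4 v_clipRect;
void main() {
  v_clipRect = a_clipRect;
  gl_Position = vec4(a_position, 0.0, 1.0);
}`;

const fragmentShaderSource = `
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;   // pixel coordinates can exceed mediump precision
#else
precision mediump float;
#endif
varying vec4 v_clipRect;
void main() {
  vec2 lo = v_clipRect.xy;
  vec2 hi = v_clipRect.xy + v_clipRect.zw;
  // Emulate the scissor test: drop fragments outside the rectangle.
  if (gl_FragCoord.x < lo.x || gl_FragCoord.y < lo.y ||
      gl_FragCoord.x >= hi.x || gl_FragCoord.y >= hi.y) {
    discard;
  }
  gl_FragColor = vec4(1.0);  // placeholder colour
}`;

const FLOATS_PER_VERTEX = 6; // 2 position floats + 4 clip-rect floats

// Interleave each command's clip rectangle into its vertices so that all
// commands can share one buffer and, potentially, one draw call.
function packVerticesWithClip(commands) {
  let vertexCount = 0;
  for (const cmd of commands) vertexCount += cmd.positions.length / 2;

  const out = new Float32Array(vertexCount * FLOATS_PER_VERTEX);
  let o = 0;
  for (const cmd of commands) {
    for (let i = 0; i < cmd.positions.length; i += 2) {
      out[o++] = cmd.positions[i];
      out[o++] = cmd.positions[i + 1];
      out.set(cmd.clipRect, o); // [x, y, width, height]
      o += 4;
    }
  }
  return out;
}
```

Whether the per-fragment discard ends up cheaper than the extra draw calls and scissor changes is exactly the open question here, so this would need measuring on the affected devices.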

Looking forward to seeing what you come up with and to optimizing this scenario!

-Ken



Andre Weissflog

Nov 8, 2020, 11:13:54 AM
to WebGL Dev List
My (somewhat naive, I guess) assumption was that WebGL implementations essentially build a render-command list under the hood, the same way it is exposed in modern 3D APIs like Metal and WebGPU. For instance, the same update-draw pattern in my Metal backend looks like this:

- memcpy new chunk of data to mapped buffer content
- on macOS only: call didModifyRange to mark the dirty data
- 'bind' buffer with offset to newly copied data
- issue a draw call for the newly copied data

The last two steps are just recordings into the RenderCommandEncoder, so no synchronization with the buffer updates needs to happen until the end of the frame when the recorded rendering commands are committed.

But I guess as soon as the WebGL wrapper code passes the GL calls to the underlying GL implementation, all bets are off (e.g. when ANGLE runs on top of D3D, the same pattern doesn't seem to incur any penalties).

Cheers!
-Floh.


James Darpinian

Nov 9, 2020, 3:12:03 PM
to webgl-d...@googlegroups.com
The performance issue comes in the second frame when you call bufferSubData on the same buffer you used in the first frame (or, if you overwrite part of a buffer that has already been used for rendering in the same frame). Because of pipelining the first frame has not actually been rendered yet, and the old contents of the buffer need to be preserved. So the driver makes a copy, and for various reasons it may copy more than strictly necessary. For example, it may copy the entire buffer instead of a small part, or it may copy data even though the ranges you updated were not actually overlapping previously used ranges. Some platforms will handle this better than others.

When you use bufferData instead of bufferSubData, you're not overwriting the buffer's existing memory. Internally the driver simply keeps the old memory around until rendering actually happens and then discards it automatically. This is called "orphaning". So the old data doesn't need to be copied, and the new data is placed in newly allocated memory. This does require allocation, but in a good driver the allocation will be optimized, for example by maintaining a pool of preallocated chunks of memory of the right size, and letting orphaned memory return to the pool after use. You can also maintain a queue of buffers manually yourself, however you don't know how deep the pipeline is so you don't know how many buffers you need in your pool. Probably the best thing is to let the driver handle this, but help it out by making your bufferData calls the same size every frame as much as possible so it can easily reuse orphaned chunks of memory as they return to the pool.
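
One way to keep every bufferData call the same size could look like the sketch below. It is an assumption-laden illustration rather than a recommendation from the thread: `createStreamUploader` and its fixed-capacity staging array are hypothetical, and whether a given driver actually recycles the orphaned memory is not something the page can observe.

```javascript
// Returns an upload function that always hands bufferData a staging array
// of the same byte size, so each call replaces (orphans) the old data store
// with one of identical size. Hypothetical helper, Float32 data only.
function createStreamUploader(gl, target, capacityFloats) {
  const staging = new Float32Array(capacityFloats);

  return function upload(buffer, data) {
    if (data.length > staging.length) {
      throw new RangeError("data exceeds the staging capacity");
    }
    // Copy the new data to the front; stale floats behind it are never
    // referenced as long as draws stay within data.length.
    staging.set(data);
    gl.bindBuffer(target, buffer);
    gl.bufferData(target, staging, gl.STREAM_DRAW);
    return data.length; // number of valid floats now in the buffer
  };
}
```

The trade-off is that the full capacity is transferred on every call even when only a few floats changed, so the capacity should be sized close to the typical per-upload amount.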

So the imgui example code using bufferData may be the right thing to do in this case. As a bonus it is also simpler :)


Andre Weissflog

Nov 10, 2020, 6:51:51 AM
to WebGL Dev List
> This is called "orphaning"

I actually tried the explicit orphaning trick (calling glBufferData with a nullptr and the buffer size) in WebGL several times over the years without ever seeing a difference in behaviour. It works fine in native GL implementations but seems to do nothing in WebGL (via the emscripten GL shim; I haven't checked so far how the shim handles a glBufferData call with a nullptr argument). Instead, the fact that the Dear ImGui example renderer works "well enough" in WebGL seems to imply that some sort of implicit orphaning happens in a glBufferData call even without a nullptr/size pair.

One orphaning-variation that I tried together with my appending glBufferSubData pattern I described above was "unlinking" the inflight-buffer from the last frame at the start of a new frame by calling glBufferData with nullptr and size once before the glBufferSubData+Draw sequence starts for the new frame, but that didn't make any difference.

It would actually be nice to have some sort of "magic call sequence" which guarantees to trigger buffer orphaning in WebGL, but it's a bit late and a bit much to ask ;)

-Floh.

James Darpinian

Nov 10, 2020, 1:15:58 PM
to webgl-d...@googlegroups.com
You're right, calling glBufferData *always* orphans the buffer, null is not required. I'm not sure why that page recommends calling glBufferData with null specifically.

If you are drawing from the same buffer multiple times in a frame with bufferSubData calls in between, then orphaning the buffer at the beginning of a frame will not help much. It's best to orphan the buffer each time you upload to it after a draw, regardless of frame boundaries. Using bufferData instead of bufferSubData accomplishes that automatically.


Ken Russell

Nov 10, 2020, 3:22:19 PM
to WebGL Dev List
It'd be interesting to know whether the major speed loss comes from the multiple bufferSubData + draw calls per frame, or the changing of the scissor rectangle between each draw. Can another code path be added which assembles the buffer's data so that one bufferData call per frame is made, followed by the multiple draw calls and scissor rect changes?
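
Such a code path might look roughly like the following sketch. It is not taken from the actual test case: `drawFrameBatched`, the `commands` objects (`vertices`, `indices`, `scissor`) and the `floatsPerVertex` parameter are hypothetical, each command's indices are assumed to be local to that command, and vertex attribute setup is omitted.

```javascript
// Assemble all geometry on the CPU, upload it with one bufferData call per
// buffer, then issue the scissor changes and draw calls (hypothetical helper).
function drawFrameBatched(gl, vbuf, ibuf, commands, floatsPerVertex) {
  let vertexFloats = 0;
  let indexCount = 0;
  for (const cmd of commands) {
    vertexFloats += cmd.vertices.length;
    indexCount += cmd.indices.length;
  }

  // 16-bit indices limit the combined batch to 65536 vertices.
  const vertices = new Float32Array(vertexFloats);
  const indices = new Uint16Array(indexCount);
  const draws = [];
  let vPos = 0;
  let iPos = 0;
  for (const cmd of commands) {
    const baseVertex = vPos / floatsPerVertex;
    vertices.set(cmd.vertices, vPos);
    // Rebase local indices so they point into the combined vertex array.
    for (let i = 0; i < cmd.indices.length; i++) {
      indices[iPos + i] = cmd.indices[i] + baseVertex;
    }
    draws.push({
      scissor: cmd.scissor, // [x, y, width, height]
      count: cmd.indices.length,
      byteOffset: iPos * Uint16Array.BYTES_PER_ELEMENT,
    });
    vPos += cmd.vertices.length;
    iPos += cmd.indices.length;
  }

  // Update phase: exactly one upload per buffer.
  gl.bindBuffer(gl.ARRAY_BUFFER, vbuf);
  gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STREAM_DRAW);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ibuf);
  gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indices, gl.STREAM_DRAW);

  // Draw phase: scissor changes and draws only, no buffer writes.
  for (const d of draws) {
    gl.scissor(d.scissor[0], d.scissor[1], d.scissor[2], d.scissor[3]);
    gl.drawElements(gl.TRIANGLES, d.count, gl.UNSIGNED_SHORT, d.byteOffset);
  }
}
```

Comparing this path against the interleaved one, with and without the gl.scissor calls, should help separate the cost of the buffer updates from the cost of the scissor changes.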



Andre Weissflog

Nov 14, 2020, 8:04:49 AM
to WebGL Dev List
I have created two new variations:

This doesn't do any scissor rect updates between the buffer updates and draw calls:


...and this does a glBufferData(tgt, buf_size, nullptr, GL_STREAM_DRAW) right before the glBufferSubData() calls:


I also added a comment with those links to the crbug tickets.

I haven't created a version yet which batches all geometry updates into a single glBufferData() call, I'll try this next because all other options seem to be exhausted :)
