Intercepting WebGL in a webview, serializing, and rendering natively with ANGLE?


Simon Taylor

Feb 11, 2022, 3:17:56 PM2/11/22
to angleproject
Hi all,

We're building a product called "Zapbox" that leverages smartphones to deliver VR and video-see-through MR at a really affordable price (a la Google Cardboard) but fully 6-DoF. It also has 2x 6-DoF controllers, tracked via computer vision on the smartphone but also incorporating an accelerometer / gyro for better tracking and lower latency, along with Oculus Quest-compatible inputs.

We're aiming for somewhere around the $60 price point at retail, which we think should make for a pretty compelling product. I've attached a render to give a better idea.

Native apps will be able to embed our SDK directly, but we're also keen to allow users to run content from the web, i.e. by somehow exposing WebXR.

The SDK / runtime part will have to live in a native app, so the web content would run in a webview. We could try making the webview background transparent, polyfilling just the stereo-rendering side of WebXR, and passing across the inputs and tracking updates, but we'd ideally like full control over exactly when rendering happens, to allow for things like late warping to minimise latency.

So I think the only viable approach is to intercept and serialise WebGL calls, and deserialise and render them on the native side. 

I've successfully put together a quick proof-of-concept that replayed a subset of WebGL 1 into a native ES 2.0 context, which was sufficient to run a few simple three.js demos. The need to cover the full WebGL 2 API surface, combined with Apple's continued keenness to deprecate OpenGL ES on iOS, is what brings me to ANGLE...
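
To make the interception idea a bit more concrete, here's a minimal sketch of the encoding side (the wire format below is invented for illustration, not exactly what the PoC used):

```ts
// Minimal sketch of the call encoder: each intercepted WebGL call becomes
// [u16 function id][u8 arg count][f64 args...] in an ArrayBuffer that is
// flushed to the native side each frame. (Invented wire format; growth /
// overflow handling and non-numeric argument types are elided.)
class CallEncoder {
  private buffer = new ArrayBuffer(64 * 1024);
  private view = new DataView(this.buffer);
  private offset = 0;

  call(functionId: number, args: number[]): void {
    this.view.setUint16(this.offset, functionId, true); this.offset += 2;
    this.view.setUint8(this.offset, args.length); this.offset += 1;
    for (const a of args) {
      this.view.setFloat64(this.offset, a, true); this.offset += 8;
    }
  }

  flush(): ArrayBuffer {
    const frame = this.buffer.slice(0, this.offset);
    this.offset = 0; // reuse the buffer for the next frame
    return frame;    // hand this off to the native side for replay
  }
}
```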

My understanding is that, rather than replaying into a native context, I could replay into an ANGLE-provided context to make use of the Metal backend on iOS and avoid needing to manually handle things like UNPACK_FLIP_Y_WEBGL, the GL_DEPTH_STENCIL_OES to GL_DEPTH24_STENCIL8_OES mapping, etc.?
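
To illustrate the kind of fix-up I mean, here's a sketch of one mapping a raw ES 2.0 replay target has to hand-roll (the constants are the standard GL enum values; the mapping function itself is just illustrative, not ANGLE's implementation):

```ts
// WebGL's DEPTH_STENCIL renderbuffer format has to be mapped to the sized
// OES format before calling renderbufferStorage on ES 2.0.
const GL_DEPTH_STENCIL = 0x84f9;        // WebGL DEPTH_STENCIL / GL_DEPTH_STENCIL_OES
const GL_DEPTH24_STENCIL8_OES = 0x88f0; // sized format ES 2.0 expects

function mapRenderbufferFormat(webglFormat: number): number {
  return webglFormat === GL_DEPTH_STENCIL ? GL_DEPTH24_STENCIL8_OES : webglFormat;
}
```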

Secondly I wondered if there was some existing command buffer stuff that I could reuse, at least on the decoding side. Is that code in ANGLE or somewhere else in Chromium?

Any thoughts or pointers gratefully received!

Thanks everyone,

Simon

[Attachment: full_anzb_kit.jpg]

Simon Taylor

Feb 18, 2022, 8:06:23 AM2/18/22
to angleproject
I realize that was probably a bit too open-ended a question. Having done some more reading of the ANGLE docs and code, I think the parts I'm really interested in understanding are elsewhere in Chromium - specifically the WebGL -> OpenGL ES mapping and the command buffer encoding / decoding.

Are there any high-level overview docs for the whole Chromium WebGL stack that someone can point me towards?

Feel free to stop reading here, but I also thought I'd provide a bit more background on my previous proof-of-concept. I've put a video here showing a demo of this three.js example.

The video shows an iOS WKWebView semi-transparent on top of a native OpenGL ES 2.0 view. The camera frame is from ARKit, so is pretty high-res and 60 FPS - rendered directly in the native context with YUV conversion in a shader.

When ARKit reports a new pose for the anchor, we send a message across to the webview, which does whatever rendering it wants into a webgl canvas as normal, but with the webgl context wrapped into a "SerializingWebGLContext".

Most webgl calls through that class are just encoded into an ArrayBuffer with a simple JS binary serializer. There's still a real underlying WebGL context too, so creation commands can generate real WebGL objects; we add a property to each object with a unique integer ID, which we use when serializing calls that reference those objects. The underlying context also allows an easy passthrough mode for debugging. Shader compilation and linking always go through the underlying context as well as being serialized, and before serialising the linkProgram call we query the locations of all the active attributes and serialize fake bindAttribLocation calls for each of them, to ensure consistency between the WebGL and native contexts.
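
The linkProgram handling looks roughly like this (a paraphrase of the PoC; encode and __serialId are illustrative names rather than the real ones):

```ts
// Link on the real context first to discover attribute locations, then
// serialize explicit bindAttribLocation calls so the native context's own
// link assigns identical locations. ("encode" and "__serialId" are
// illustrative names, not the PoC's actual ones.)
function serializeLinkProgram(gl: WebGLRenderingContext,
                              program: WebGLProgram,
                              encode: (name: string, args: unknown[]) => void): void {
  gl.linkProgram(program);
  const count = gl.getProgramParameter(program, gl.ACTIVE_ATTRIBUTES) as number;
  const programId = (program as any).__serialId as number;
  for (let i = 0; i < count; i++) {
    const info = gl.getActiveAttrib(program, i);
    if (!info) continue;
    const location = gl.getAttribLocation(program, info.name);
    // These land before the native linkProgram, pinning the locations.
    encode("bindAttribLocation", [programId, location, info.name]);
  }
  encode("linkProgram", [programId]);
}
```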

The native side then deserializes the calls and plays them back on a native OpenGL ES 2.0 context. The deserializer also handles the required WebGL -> OpenGL ES mapping (e.g. it tracks UNPACK_FLIP_Y_WEBGL state and flips pixel data before calling texImage2D, etc.).
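
The flip itself is simple; here's the equivalent logic in TypeScript for brevity (the PoC actually does this in native code before its texImage2D call):

```ts
// Mirror the pixel rows vertically when UNPACK_FLIP_Y_WEBGL is set.
function flipRowsIfNeeded(pixels: Uint8Array, width: number, height: number,
                          bytesPerPixel: number, flipY: boolean): Uint8Array {
  if (!flipY) return pixels;
  const rowBytes = width * bytesPerPixel;
  const flipped = new Uint8Array(pixels.length);
  for (let y = 0; y < height; y++) {
    // Copy each source row into its mirrored destination row.
    flipped.set(pixels.subarray(y * rowBytes, (y + 1) * rowBytes),
                (height - 1 - y) * rowBytes);
  }
  return flipped;
}
```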

There are a couple of more straightforward approaches I considered to achieve something similar:
1) Just render the camera frame in the native context, and the content in a normal webgl canvas with a transparent background in the webview. The downside is there's no control over exactly when the webgl content appears on screen, so the content and camera are likely to update out of sync and the content won't look correctly anchored to the camera frame.
2) Pass the raw camera data across from native to the webview, texImage2D it in webgl, and do all the rendering in the webview, with no need for compositing. Pretty simple to implement, and it gives the webpage a lot of flexibility to process / render the camera frames as it wants, but it adds quite a bit of overhead. It was feasible on iOS with the WKWebView API, but I found the Android webview APIs for passing binary data from native -> webview had too much overhead (see Chromium Bug 1023334, where I was being somewhat more secretive about my true intention to use this for camera textures...).

So the "serialize and playback webgl" approach was my favourite, but it never went beyond the PoC. Now that I'm investigating officially adding support for web content to our Zapbox VR/MR product, it again looks like the best solution for the lowest overhead and the best control over latency.

So to summarise the stack is:
1. Intercept and serialize webgl to a binary command buffer of some sort. I assume we'll need to write / maintain this bit.
2. Map the webgl command stream to an ES 2/3 command stream. Perhaps there's something in ANGLE/Chromium that can help here?
3. A native ES 2/3 context to render the content. I know ANGLE can help here for platforms without native ES 2/3 backends, and I'm also intending to try the iOS Metal backend.

So before I dive into bringing my PoC implementation of 1. and 2. up to production quality and increasing the API coverage, I thought I'd see if anyone has ideas for leveraging anything in the Chromium webgl stack that might help with those parts?

Thanks again!

Simon

Ken Russell

Feb 22, 2022, 7:05:40 PM2/22/22
to si...@zappar.com, angleproject
Hi Simon,

Some design documents for Chromium's graphics stack - most of which haven't been updated in a long while - are here:

It's a lot to read and digest, and I wouldn't really suggest you do so.

Your results look nice and smooth. I'd like to be able to recommend you just use WebXR, but Apple hasn't shipped it in WebKit yet.

I'm afraid our team is really heavily committed and doesn't have the time to advise on this project, though it sounds interesting. This direction of serializing WebGL calls is going to be fraught with corner cases; for example, all of the texImage2D overloads taking DOM elements like videos will have to be rewritten. This sounds like a long project with no general solutions. If you're only trying to support a few well-defined cases of rendering WebGL content, that seems more tractable.

Chrome, Firefox and Safari are all either shipping a separate GPU process or are in the process of doing so. You could look at any such browser's implementation of this serialization and deserialization, but I don't think it will trivially apply to your use case. Chrome's is in:

ANGLE has the option of creating its EGL contexts with "WebGL compatibility" mode enabled, which subsumes the majority of validation of the WebGL command stream. Still, there is a fair amount of code at the WebGL layer, beyond what ANGLE implements.

Again, having heard your description here, I can't in good faith recommend you pursue this direction. Just using a transparent WebView on top of the camera stream and trying to keep the WebGL rendering fast to reduce latency sounds like a more feasible approach.

-Ken




Simon Taylor

Feb 24, 2022, 9:38:45 AM2/24/22
to Ken Russell, angleproject
Hi Ken,

Thanks for the message, great to get your insight.

I’ve put some responses in-line below, but I completely understand this is non-core stuff so don’t feel any obligation to reply again!

On 23 Feb 2022, at 00:05, Ken Russell <k...@chromium.org> wrote:

> Hi Simon,
>
> Some design documents for Chromium's graphics stack - most of which haven't been updated in a long while - are here:
>
> It's a lot to read and digest, and I wouldn't really suggest you do so.

Thanks, I’ve skimmed a few but as you say there’s a lot there.

I think I had previously read some of that, in particular the old WebGL2 / ANGLE planning doc from 2015:

In very high-level terms, it sounds like the plan was to move validation into ANGLE, along with driver workarounds etc., simplifying the command buffer layer.

"MANGLE is in large part driven by the desire to replace command buffer, to avoid doing similar kinds of work (validation, state tracking, driver workarounds) on two levels in Chromium. Eventually we’d like to treat command buffer as an RPC layer on top of ANGLE, with little in the way of workarounds and translation at that level.”

Is that how things ended up working out?

> Your results look nice and smooth. I'd like to be able to recommend you just use WebXR, but Apple hasn't shipped it in WebKit yet.
>
> I'm afraid our team is really heavily committed and doesn't have the time to advise on this project, though it sounds interesting. This direction of serializing WebGL calls is going to be fraught with corner cases; for example, all of the texImage2D overloads taking DOM elements like videos will have to be rewritten. This sounds like a long project with no general solutions. If you're only trying to support a few well-defined cases of rendering WebGL content, that seems more tractable.

The texImage2D cases are quite easy to get functional with a separate conversion webgl canvas doing texImage2D -> drawElements -> readPixels, then serialising as if an ArrayBuffer had been passed directly. For performance, I agree we’d probably want a specific wrapper for video textures that keeps them on the native side entirely.
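
Roughly like this, assuming a scratch context with a textured fullscreen-quad pass already set up (drawFullscreenQuad below is a placeholder for that):

```ts
// Upload the DOM source into a scratch WebGL context, draw it, read back
// raw RGBA bytes, and serialize those as if the caller had passed an
// ArrayBuffer directly. (Sketch only; shader / quad setup is elided.)
function domSourceToPixels(scratch: WebGLRenderingContext, source: TexImageSource,
                           width: number, height: number,
                           drawFullscreenQuad: () => void): Uint8Array {
  const tex = scratch.createTexture();
  scratch.bindTexture(scratch.TEXTURE_2D, tex);
  // NPOT-safe sampling state for video-sized sources.
  scratch.texParameteri(scratch.TEXTURE_2D, scratch.TEXTURE_MIN_FILTER, scratch.LINEAR);
  scratch.texParameteri(scratch.TEXTURE_2D, scratch.TEXTURE_WRAP_S, scratch.CLAMP_TO_EDGE);
  scratch.texParameteri(scratch.TEXTURE_2D, scratch.TEXTURE_WRAP_T, scratch.CLAMP_TO_EDGE);
  scratch.texImage2D(scratch.TEXTURE_2D, 0, scratch.RGBA, scratch.RGBA,
                     scratch.UNSIGNED_BYTE, source);
  scratch.viewport(0, 0, width, height);
  drawFullscreenQuad();
  const pixels = new Uint8Array(width * height * 4);
  scratch.readPixels(0, 0, width, height, scratch.RGBA, scratch.UNSIGNED_BYTE, pixels);
  scratch.deleteTexture(tex);
  return pixels; // serialize this as the texImage2D payload
}
```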

A bigger problem is functions that need to return results synchronously - readPixels being the biggie (it probably wouldn’t be supported), but the get[*] family and getError are also a bit problematic. The easiest solution seems to be relying on the real underlying webgl context for those; we’d just need to ensure all state-setting calls were also passed through. Passing through all the gl calls except clear / drawArrays / drawElements probably gives a reasonable balance between correctness and performance. For frames that don’t use any get* calls we can probably avoid calling any functions on the underlying context at all.
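
Something along these lines, sketched with runtime prototype enumeration (encode is again an illustrative stand-in, and constants / non-function properties are omitted):

```ts
// Hybrid wrapper: calls are mirrored on the real context so synchronous
// queries (getError, getParameter, ...) can be answered locally, while
// everything except pure queries is also serialized for native replay.
// Draw calls are replay-only, as suggested above.
const PURE_QUERIES = new Set([
  "getError", "getParameter", "getShaderParameter", "getProgramParameter",
]);
const REPLAY_ONLY = new Set(["clear", "drawArrays", "drawElements"]);

function makeHybridContext(gl: WebGLRenderingContext,
                           encode: (name: string, args: unknown[]) => void): WebGLRenderingContext {
  const wrapper: any = {};
  for (const name of Object.getOwnPropertyNames(WebGLRenderingContext.prototype)) {
    if (typeof (gl as any)[name] !== "function") continue;
    wrapper[name] = (...args: unknown[]) => {
      if (!PURE_QUERIES.has(name)) encode(name, args);               // replayed natively
      if (!REPLAY_ONLY.has(name)) return (gl as any)[name](...args); // mirrored locally
    };
  }
  return wrapper as WebGLRenderingContext;
}
```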

The video I shared was more to demonstrate I’d had some success with this approach in the past (that was a couple of years ago). My interest now is in using a similar strategy for our VR (& video-see-through MR) product that targets current Android / iOS phones. I’m definitely looking forward to Apple shipping their WebXR implementation for mobile use-cases, but right now I’m thinking more about the headset & controllers slice of WebXR that won’t be natively supported on the mobile platforms.

I’d love WebXR content to run on Zapbox as a first-class citizen, with the same latency as native apps that embed our SDK, so I think this approach is the only viable way to achieve that - at least on iOS, where we have to use WKWebView. We could ship a full browser on Android, but leveraging the system webview would be my preference there too.

I’m not necessarily worried if we can’t hit full webgl conformance; I imagine content authors may be willing to adjust their content for optimal performance on Zapbox. Obviously, the closer we can get to full conformance the better.

> Chrome, Firefox and Safari are all either shipping a separate GPU process or are in the process of doing so. You could look at any such browser's implementation of this serialization and deserialization, but I don't think it will trivially apply to your use case. Chrome's is in:
>
> ANGLE has the option of creating its EGL contexts with "WebGL compatibility" mode enabled, which subsumes the majority of validation of the WebGL command stream. Still, there is a fair amount of code at the WebGL layer, beyond what ANGLE implements.

Thanks for the hints - I’ll take a look at the Chromium command buffer code; as you say, I expect it will prove non-trivial to make use of it.

When I was first considering this in 2019, Safari didn’t support webgl2 at all, so I was only really thinking about using the Chromium implementation as a reference.

Thinking about it again now, I realise WebKit/Safari might be more useful - currently I believe WebGL still runs in the content process there, with the WebGL calls going directly into ANGLE. So it might be a more straightforward reference for how to implement WebGL 1/2 contexts backed by ANGLE. It also potentially opens the door to re-using that code on the native side. I’d still need to write (or, more likely, code-generate from TypeScript definitions or something) a thin WebGL RPC layer, but that does feel broadly feasible given the success of the PoC.

> Again, having heard your description here, I can't in good faith recommend you pursue this direction. Just using a transparent WebView on top of the camera stream and trying to keep the WebGL rendering fast to reduce latency sounds like a more feasible approach.
>
> -Ken

That’s definitely a reasonable fallback - polyfill the stereo distortion part of the WebXR headset API on the JS side, do all content rendering into the webview, with the camera feed in the background on a low-latency native GLES or Metal view.
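
For reference, the per-eye layout part of that polyfill is straightforward; here’s a sketch (the lens-distortion pass and pose plumbing are elided, and drawScene stands in for the content’s renderer):

```ts
// Hand-rolled stereo render loop: each eye draws into half of the canvas.
// (Sketch only; the distortion pass and per-frame pose updates are elided.)
interface Eye { view: Float32Array; proj: Float32Array; }

function renderStereo(gl: WebGLRenderingContext, eyes: [Eye, Eye],
                      drawScene: (view: Float32Array, proj: Float32Array) => void): void {
  const halfWidth = gl.drawingBufferWidth / 2;
  const height = gl.drawingBufferHeight;
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  eyes.forEach((eye, i) => {
    gl.viewport(i * halfWidth, 0, halfWidth, height); // left then right half
    drawScene(eye.view, eye.proj);
  });
}
```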

Thanks again for the thoughts and advice - you may well prove to be right that it ends up infeasible to achieve a useful level of conformance, but I’m not quite ready to throw in the towel on the serialisation approach just yet!

I’ll post a follow-up if I get anything working…

Simon