New Servo WebGL Architecture Proposal (benchmarks included!)

New Servo WebGL Architecture Proposal (benchmarks included!) Imanol Fernández 7/21/17 1:58 PM
Hello,

I want to share the ideas, code and some benchmarks of the new Servo WebGL
architecture and refactor I'm working on. I'd love to hear some feedback
about the big picture. Smaller things and nits can be handled in the
PR review (I plan to open the PR at the beginning of next week if there aren't
strong opinions about the overall architecture).

First, I'll mention some of the problems of the current Servo/WebRender
WebGL implementation and source code organization:

* Part of the WebGL implementation is included in WebRender. This makes
WebGL contributions very slow. For example, in order to add a new
WebGL method I usually need to push a PR to gleam (and wait until it's merged
and the crate published), then submit another PR to WebRender (and wait for
the review & merge), and then wait again until the new WR is merged into Servo
(with little control over it, because WR can have unrelated in-progress
features that are not ready to merge).

* When adding WebGL related code to WR, the WebGL code is not tested when
it lands into WR. It's only tested when a WR update lands into Servo. When
a WebGL related test fails in a WR update it's usually fixed by reverting
the commit that caused it to avoid delaying the WR update. This makes the
testing process difficult.

* The current implementation creates a new WebGL renderer thread (with its
own ipc-channel) for each canvas created in JavaScript. This can quickly
exhaust the file descriptors available in the system for webpages that
create a lot of canvases (e.g. Shadertoy) or for games that require many
canvases. Creating a canvas instance in JavaScript is also slower than it
should be because of this.

* WebGL commands are currently run in the WR backend thread which could be
bad for UI latency (see https://github.com/servo/webrender/issues/607)

* Currently there are two ipc-channel steps for each WebGL call to hit the
driver (JS ==> WebGL Render Thread ==> WebRender). This adds a lot of
overhead.

* Slow compilation times: Modifying a WebGL command code recompiles
WebRender_traits which causes additional component recompilations in Servo.

* Flickering and synchronization problems with the WR compositor (I created
this issue some time ago: https://github.com/servo/servo/issues/14235)

Goals that I want to achieve in the new WebGL architecture design:

* Make WebGL development cycle easier.
* Minimize the render path overhead for WebGL commands.
* Flexible architecture for multiprocess and inprocess scenarios.
* A special mode for packaging WebGL applications (not enabled by default,
more details later)
* Faster compilation times
* Proper synchronization to avoid flickering issues
* Compatibility with the WebVR render path

Now I'm going into some detail of how I tried to best achieve each stated
goal. You can check the "almost ready for PR" source code here:
https://github.com/MortimerGoro/servo/commit/c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf
(still have to add more docs, do a rebase and clean some tests)

Make WebGL development cycle easier.
-------------------------------------------------------

This is fixed by splitting the WebGL code into its own component and
removing all WebGL code from WebRender. Additional benefits for WebRender
(see https://github.com/servo/webrender/issues/1353):

* Simplify the internals of the resource cache by removing the GL context
management code.
* Remove the WebGL cargo feature, since not all clients use it.

I opted to move the WebGL code into a Servo component. It eases
development and testing, and I don't think there are use cases for
the component outside Servo. Anyway, it's a loosely coupled component, so it
would be easy to move if required, but I don't think that it's worth it.

The connection between a WebGL canvas and WR is implemented via the WR
ExternalImage APIs as Glennw recommended.

When the new architecture lands in Servo, I'll make a separate PR to
remove all the unneeded WebGL management code in WR.

Minimize the render path overhead for WebGL commands
------------------------------------------------------------------------------

The new architecture creates a single thread/process to manage all WebGL
commands from multiple canvas sources. This will reduce the footprint of
creating a new canvas (no dedicated thread + ipc-channel per canvas) and
shorten the render path for each WebGL command by using a single "channel
step".
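
As a rough illustration of the single-renderer idea, here is a minimal sketch using only the Rust standard library. The names `WebGLMsg`, `ContextId` and `run_webgl_thread` are made up for this example and are not taken from the patch; GL calls are replaced by strings recorded per context, and the channel is a plain std mpsc channel rather than an ipc-channel.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical identifier for a canvas context (not the type used in Servo).
type ContextId = u32;

// Hypothetical message type: every command carries the context it targets,
// so one channel and one thread can serve any number of canvases.
enum WebGLMsg {
    CreateContext(ContextId),
    Command(ContextId, String),
    Exit,
}

/// Spawns one renderer thread, feeds it `msgs`, and returns the commands
/// that were "executed" for each context.
fn run_webgl_thread(msgs: Vec<WebGLMsg>) -> HashMap<ContextId, Vec<String>> {
    let (sender, receiver) = mpsc::channel::<WebGLMsg>();

    let renderer = thread::spawn(move || {
        let mut contexts: HashMap<ContextId, Vec<String>> = HashMap::new();
        for msg in receiver {
            match msg {
                WebGLMsg::CreateContext(id) => {
                    contexts.insert(id, Vec::new());
                }
                WebGLMsg::Command(id, cmd) => {
                    // A real renderer would make the GL context current here
                    // and issue the call; this sketch only records it.
                    if let Some(log) = contexts.get_mut(&id) {
                        log.push(cmd);
                    }
                }
                WebGLMsg::Exit => break,
            }
        }
        contexts
    });

    for msg in msgs {
        sender.send(msg).expect("renderer thread hung up");
    }
    // Dropping the sender ends the receive loop even without an Exit message.
    drop(sender);
    renderer.join().expect("renderer thread panicked")
}

fn main() {
    let executed = run_webgl_thread(vec![
        WebGLMsg::CreateContext(1),
        WebGLMsg::CreateContext(2),
        WebGLMsg::Command(1, "clear".to_string()),
        WebGLMsg::Command(2, "drawArrays".to_string()),
        WebGLMsg::Exit,
    ]);
    println!("{:?}", executed);
}
```

Creating a second canvas in this model costs one map entry instead of a thread plus a channel, which is the footprint reduction described above.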

Flexible architecture for multiprocess and inprocess scenarios
----------------------------------------------------------------------------------

One of the things I don't like about the current implementation is that it
uses an IpcSender<WebGLCommand> instance in many parts of the source code as
the entry point to send WebGL commands from the Script Thread to the
renderer. IMO this is not very flexible because we may be interested in
batching WebGLCommands, using shared memory, or other channel approaches to
improve performance.

A WebGL demo can easily send e.g. 200 channel messages per frame (doubled
in VR mode) and we need to achieve steady framerates (e.g. 90 fps on
desktop VR).
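
To make the batching option concrete, here is a small sketch, assuming a hypothetical `BatchingSender` wrapper (not part of the patch) and a two-variant stand-in for `WebGLCommand`. Commands are queued locally and sent as one channel message when the frame is flushed.

```rust
use std::sync::mpsc::{self, Sender};

// Illustrative subset of commands; the real WebGLCommand enum is much larger.
#[derive(Debug, Clone, PartialEq)]
enum WebGLCommand {
    Clear(u32),
    DrawArrays(i32, i32),
}

// Hypothetical wrapper that accumulates commands and ships them in one message.
struct BatchingSender {
    pending: Vec<WebGLCommand>,
    channel: Sender<Vec<WebGLCommand>>,
}

impl BatchingSender {
    fn new(channel: Sender<Vec<WebGLCommand>>) -> Self {
        BatchingSender { pending: Vec::new(), channel }
    }

    /// Queues a command without touching the channel.
    fn send(&mut self, cmd: WebGLCommand) {
        self.pending.push(cmd);
    }

    /// Sends everything queued so far as a single channel message.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            let batch = std::mem::take(&mut self.pending);
            // The result is ignored: a closed channel means the renderer is gone.
            let _ = self.channel.send(batch);
        }
    }
}

/// Queues one clear plus `draws` draw calls for a frame and returns
/// (channel messages sent, commands delivered).
fn frame_stats(draws: usize) -> (usize, usize) {
    let (tx, rx) = mpsc::channel();
    let mut sender = BatchingSender::new(tx);
    sender.send(WebGLCommand::Clear(0x4000)); // GL_COLOR_BUFFER_BIT
    for i in 0..draws {
        sender.send(WebGLCommand::DrawArrays(0, i as i32));
    }
    sender.flush();
    drop(sender); // closes the channel so the iterator below terminates
    let batches: Vec<Vec<WebGLCommand>> = rx.iter().collect();
    (batches.len(), batches.iter().map(Vec::len).sum())
}

fn main() {
    let (messages, commands) = frame_stats(199);
    println!("{} commands delivered in {} channel message(s)", commands, messages);
}
```

Whether this is faster in practice depends on the channel and on how long commands sit in the queue before the flush, so it would need measuring.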

I decided to create custom/opaque types for all WebGL API traits:
WebGLSender, WebGLReceiver, WebGLChan, etc. instead of using fixed
IpcSender<T> types. These types are defined at compilation time with no
runtime overhead (e.g. we can easily do
pub type WebGLSender<T> = IpcChannel<T>).
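
A minimal sketch of the alias approach follows, using std channels so that it runs without extra crates. The module names `unbounded` and `bounded` and the `transport` alias are assumptions made for this example; in a real build the single `use` line could be guarded by a `#[cfg(feature = "...")]` pair, and one module could wrap ipc-channel types instead.

```rust
// Two interchangeable transports exposing the same three names.
mod unbounded {
    use std::sync::mpsc;
    pub type WebGLSender<T> = mpsc::Sender<T>;
    pub type WebGLReceiver<T> = mpsc::Receiver<T>;
    pub fn webgl_channel<T>() -> (WebGLSender<T>, WebGLReceiver<T>) {
        mpsc::channel()
    }
}

#[allow(dead_code)]
mod bounded {
    use std::sync::mpsc;
    pub type WebGLSender<T> = mpsc::SyncSender<T>;
    pub type WebGLReceiver<T> = mpsc::Receiver<T>;
    pub fn webgl_channel<T>() -> (WebGLSender<T>, WebGLReceiver<T>) {
        mpsc::sync_channel(64)
    }
}

// The only line that changes when the transport changes.
use self::unbounded as transport;

// Illustrative stand-in for the real command enum.
#[derive(Debug, PartialEq)]
pub enum WebGLCommand {
    Clear(u32),
}

/// Script-side code names only the alias, never the concrete channel type.
fn queue_clear(sender: &transport::WebGLSender<WebGLCommand>) {
    sender
        .send(WebGLCommand::Clear(0x4000)) // GL_COLOR_BUFFER_BIT
        .expect("renderer hung up");
}

fn roundtrip() -> WebGLCommand {
    let (sender, receiver) = transport::webgl_channel();
    queue_clear(&sender);
    receiver.recv().expect("sender dropped")
}

fn main() {
    println!("{:?}", roundtrip());
}
```

Because the alias resolves at compile time, callers pay nothing for the indirection; switching `unbounded` to `bounded` leaves `queue_clear` untouched.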

Additionally, I tried to make the threading implementation flexible
by using some templating in the WebGLThread struct implementation, which uses
types that can be changed at compile time based on a cargo feature (or
at runtime with a preference, just by adding an enum wrapper). The thing I like
best is that we can easily change the threading and channel models by
modifying only one or two small files, while the rest of the WebGL code stays
untouched. I created 2 WebGL threading/channel models based on this idea
(some interesting performance benchmarks at the end of the post!)

* Multiprocess. This model creates a unique process/thread to handle all
WebGL commands from all canvases from all script threads. It uses
ipc-channels for communication. Servo is not ready yet for a real
multiprocess mode and we'll need to add a lot of platform-dependent code in
WebRender and rust-offscreen-context (e.g. IOSurface/GLXPixmap/etc.). But I
wanted to add this model now so we can start testing or adding some steps
towards a real multiprocess mode.

* Inprocess. This mode lazily creates a WebGL renderer thread for each
Servo ScriptThread/tab and uses pure std mpsc channels. Using mpsc channels
has a big impact on performance because the channel is faster and it
avoids serialization overhead (serialization still happens when the
force-inprocess mode of the servo/ipc-channel crate is enabled). The
threading model could easily be changed to create a single thread for all
ScriptThreads/tabs. The per-tab approach was a bit more difficult to
implement (and to sync with the WR ExternalImage callbacks), but I opted for
different threads per tab because:
-- This way only two threads are involved at any time in sending
WebGL commands from script to the renderer. This allows using an spsc
channel instead of an mpsc channel, which permits a faster synchronization
algorithm. AFAIK a Rust std mpsc channel uses a faster spsc algorithm until
it is cloned. We could add another kind of ad-hoc spsc channel
implementation for the fastest performance (have you used, or do you
recommend, any of them?)
-- Having different threads per tab will not exhaust the CPU because
requestAnimationFrame only runs in the current active tab per spec, so the
other threads will be paused.
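
The runtime alternative mentioned earlier (an enum wrapper selected by a preference) could look roughly like the sketch below. Everything here is hypothetical: the `Serialized` variant only imitates an IPC transport by turning the value into bytes before sending, and the commands are plain `u32` values to keep the example short.

```rust
use std::sync::mpsc;

// Hypothetical wrapper: one type for callers, two transports behind it.
enum WebGLSender {
    /// In-process channel: the value is moved, nothing is serialized.
    Mpsc(mpsc::Sender<u32>),
    /// Stand-in for an IPC channel: the value is serialized to bytes first.
    Serialized(mpsc::Sender<Vec<u8>>),
}

enum WebGLReceiver {
    Mpsc(mpsc::Receiver<u32>),
    Serialized(mpsc::Receiver<Vec<u8>>),
}

impl WebGLSender {
    fn send(&self, cmd: u32) -> Result<(), ()> {
        match self {
            WebGLSender::Mpsc(s) => s.send(cmd).map_err(|_| ()),
            WebGLSender::Serialized(s) => {
                s.send(cmd.to_le_bytes().to_vec()).map_err(|_| ())
            }
        }
    }
}

impl WebGLReceiver {
    fn recv(&self) -> Result<u32, ()> {
        match self {
            WebGLReceiver::Mpsc(r) => r.recv().map_err(|_| ()),
            WebGLReceiver::Serialized(r) => {
                let bytes = r.recv().map_err(|_| ())?;
                // Always 4 bytes in this sketch, as written by `send` above.
                let mut buf = [0u8; 4];
                buf.copy_from_slice(&bytes);
                Ok(u32::from_le_bytes(buf))
            }
        }
    }
}

/// Picks the transport at runtime, e.g. from a preference.
fn webgl_channel(multiprocess: bool) -> (WebGLSender, WebGLReceiver) {
    if multiprocess {
        let (s, r) = mpsc::channel();
        (WebGLSender::Serialized(s), WebGLReceiver::Serialized(r))
    } else {
        let (s, r) = mpsc::channel();
        (WebGLSender::Mpsc(s), WebGLReceiver::Mpsc(r))
    }
}

fn roundtrip(multiprocess: bool, cmd: u32) -> u32 {
    let (sender, receiver) = webgl_channel(multiprocess);
    sender.send(cmd).expect("receiver dropped");
    receiver.recv().expect("sender dropped")
}

fn main() {
    println!("{} {}", roundtrip(false, 7), roundtrip(true, 7));
}
```

Compared with a compile-time alias, this costs one `match` per send but lets a single binary support both models.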

A special mode for packaging WebGL applications
--------------------------------------------------------------------
In addition to the multiprocess/inprocess modes I also added a "packaging
mode" to improve performance in this scenario.

Disclaimer: Before joining Mozilla I was a core developer in Cocoon.io. We
created a custom "browser" from scratch called canvas+ (using standalone V8
and JavaScriptCore VM engines) to run WebGL/Canvas2D games. We were able to
achieve better WebGL/Canvas performance than the Chromium/Safari webviews
available in Android/iOS. Some of the optimizations we did are not allowed
by the spec or security rules in a multiparadigm browser that loads
arbitrary content. There are other engines/companies that opted for this
approach too instead of using multiparadigm webviews (e.g: Impact Engine,
Chukong and cocos2dx-html, etc.).

My idea is to use this special mode (really a cargo feature) in order to
include some optimizations that these kinds of standalone VM engines do. It
will only be enabled for packaging trusted source code (e.g. android apks,
electron, and more). When you package tested and trusted source code you
don't really need a lot of the validations, error checking or security
rules that the spec enforces or a full browser requires. Some examples of
forbidden things that could be allowed in this mode:

* Immediate WebGL calls to a GL context living in the Script thread. I did
some benchmarks with this enabled (sometimes it's faster to run the GL
call than to send it). I'd like to focus on a thread-based render path,
though, in order to parallelize JS code better.
* Rendering a full screen WebGL scene to the main window context (e.g.
bypass all the compositor)
* Disable some validations and error checking enforced by the spec.

Faster compilation times
-----------------------------------------------------------

WebGL code is now split into the canvas_traits and canvas components.
Changing the DOM code or traits is still costly to compile, but now we can
change the WebGL command implementations or do things like adding logs for
each WebGL call (which I tend to use for testing) with much shorter
compilation times (no need to recompile WebRender_traits anymore!)

Proper synchronization to avoid flickering issues
----------------------------------------------------------------

Flickering issues have improved a lot in the three.js demos I have tested.
The synchronization is done using the ExternalImageHandler lock and unlock
calls that Glennw recommended, combined with OpenGL Sync Objects.

There are still some flickering issues when moving the mouse on Linux. I think
that's probably a bug in Servo. Mouse moves seem to trigger extra composites
that aren't synced with the Servo animation loop. IMO this could be fixed
when we finish the WIP Glutin unfork (
https://github.com/servo/servo/pull/17311)

Benchmarks
-----------------------------------------------------------------
Here are the benchmark results for the old WebGL implementation, the three
modes in the new architecture (multiprocess, inprocess and packaging) and
Chrome/Firefox.

I used a Linux desktop for the benchmarks (Ubuntu 16.04, GTX 1060,
Skylake i7 4GHz).

I used the Three.js performance test with a fixed camera, and with 5K and
10K 3D objects: https://threejs.org/examples/webgl_performance.html

Browser                     Average FPS     Average frame time

Servo Old WebGL 5K          26 fps          37.31 ms
Servo Old WebGL 10K         13 fps          71.4 ms
Servo "multiprocess" 5K     43 fps          22.61 ms
Servo "multiprocess" 10K    21 fps          46.27 ms
Servo "inprocess" 5K        56-60 fps       17.13 ms
Servo "inprocess" 10K       28-30 fps       33.5 ms
Servo "packaging" 5K        60 fps          15.83 ms
Servo "packaging" 10K       32 fps          30.92 ms

Firefox 5K                  42-45 fps       16.98 ms
Firefox 10K                 25 fps          34.01 ms

Chrome 5K                   57-60 fps       11.5 ms
Chrome 10K                  42-46 fps       22.5 ms
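
For reading the table, the two columns are related by fps = 1000 / frame time in ms, and relative gains can be compared as a ratio of frame times. The helper names below are made up for this sketch; the input numbers are the ones from the table above.

```rust
/// Frames per second implied by an average frame time in milliseconds.
fn fps_from_frame_time(ms: f64) -> f64 {
    1000.0 / ms
}

/// How many times shorter the new frame time is compared to the old one.
fn speedup(old_ms: f64, new_ms: f64) -> f64 {
    old_ms / new_ms
}

fn main() {
    // Servo old WebGL, 5K objects: 37.31 ms per frame.
    println!("old 5K: {:.1} fps", fps_from_frame_time(37.31));
    // Servo "inprocess", 5K objects: 17.13 ms per frame.
    println!("inprocess 5K: {:.1} fps", fps_from_frame_time(17.13));
    println!("speedup: {:.2}x", speedup(37.31, 17.13));
}
```

With these inputs the old implementation works out to roughly 26.8 fps and the inprocess mode to a bit over twice the old throughput at 5K objects.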


Future Steps
-----------------------------------------------------------------
There are other ideas that I wanted to implement but I'll do that in later
PRs (contributions in these areas are welcome too!)

* WebGL double buffering: It may be a good idea to improve performance and
synchronization (e.g. ping pong WebGL render textures each frame, so WR
uses a texture to composite the main context while the WebGL frame is
rendering the next frame to a different texture). This may add some memory
overhead. We can implement some texture pooling (I think that Firefox does
that)

* Test/benchmark different spsc channels, message batching or shared
memory in order to improve frame times. We want to be the fastest ones in the
benchmarks!

* WebGL 2.0 branch (My idea is to create the layer that reuses some code
from the WebGL 1.0 implementation and open an issue with a list of TODOs for
WebGL 2.0 so contributors can start implementing features step by step).
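
The ping-pong idea from the double-buffering item above can be sketched as follows. `SwapChain` is a hypothetical name, and the `u32` values stand in for GL texture ids; no real GL or WebRender call is made.

```rust
/// Hypothetical pair of render targets for one canvas.
struct SwapChain {
    textures: [u32; 2],
    front_index: usize,
}

impl SwapChain {
    fn new(first: u32, second: u32) -> Self {
        SwapChain { textures: [first, second], front_index: 0 }
    }

    /// Texture the compositor may sample during this frame.
    fn front(&self) -> u32 {
        self.textures[self.front_index]
    }

    /// Texture the WebGL thread renders the next frame into.
    fn back(&self) -> u32 {
        self.textures[1 - self.front_index]
    }

    /// Called when a WebGL frame is finished: the two roles swap.
    fn present(&mut self) {
        self.front_index = 1 - self.front_index;
    }
}

fn main() {
    let mut chain = SwapChain::new(10, 11);
    println!("front={} back={}", chain.front(), chain.back());
    chain.present();
    println!("front={} back={}", chain.front(), chain.back());
}
```

The second texture is the memory overhead mentioned in that item; a pool shared between canvases would be one way to bound it.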

I have some additional questions about Servo that I think may affect the
benchmark performances:

* How does the SpiderMonkey version used in Servo compare to V8?
* Is the cost of a JS to Rust call measured for normal functions and for
functions with many arguments? Is it compared to the bindings implemented in
Firefox/Chrome? When I worked on the standalone VM engine area the
JS/C++ binding code had a lot of impact.

Long post... Thanks for reading!

Looking forward to hearing your feedback about architecture and benchmarks
;)

Cheers

Imanol Fernandez
Re: New Servo WebGL Architecture Proposal (benchmarks included!) Alex Webster 7/21/17 2:50 PM
Unsubscribe me
Re: New Servo WebGL Architecture Proposal (benchmarks included!) Emilio Cobos Álvarez 7/22/17 10:37 AM
Hi Imanol,
Yeah, I'm to blame for this one... This was one of my very first
contributions to Servo and I followed what Canvas2D did (which is
equally broken), and I never got around to cleaning it up.

Can't wait to see it going away!

> * WebGL commands are currently run in the WR backend thread which could be
> bad for UI latency (see https://github.com/servo/webrender/issues/607)
>
> * Currently there are two ipc-channels steps for each WebGL-call to hit the
> driver (JS ==> WebGL Render Thread ==> WebRender). This adds a lot of
> overhead.
>
> * Slow compilation times: Modifying a WebGL command code recompiles
> WebRender_traits which causes additional component recompilations in Servo.
>
> * Flickering and shyncronization problems with the WR compositor (I created
> this issue some time ago: https://github.com/servo/servo/issues/14235)
>
> Goals that I want to achieve in the new WebGL architecture design:
>
> * Make WebGL development cycle easier.
> * Minimize the render path overhead for WebGL commands.
> * Flexible architecture for multiprocess and inprocess scenarios.
> * A special mode for packaging WebGL applications (not enabled by default,
> more details later)
> * Faster compilation times
> * Proper shyncronization to avoid flickering issues
> * Compatibility with the WebVR render path
>
> Now I'm going into some detail of how I tried to best achieve each stated
> goal. You can check the "almost ready for PR" source code here:
> https://github.com/MortimerGoro/servo/commit/c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf
> (still have to add more docs, do a rebase and clean some tests)

(and hopefully split it into a few commits? That would make
reviewing it much easier... reviewing the WebVR landing was honestly
quite a pain, and I'm sure some things slipped through because it was
a single massive commit)

> Make WebGL development cycle easier.
> -------------------------------------------------------
>
> This is fixed by splitting the WebGL code into it's own component and
> removing all WebGL code from WebRender. Additional benefits for WebRender
> (see https://github.com/servo/webrender/issues/1353):
>
> * Simplify the internals of  resource cache by removing the GL context
> management code.
> * Remove the WebGL cargo feature, since not all clients use it.
>
> I opted to move the WebGL code into a Servo component. It eases
> development, testability and I don't think that there are use cases to use
> the component outside Servo. Anyway, it's a low coupled component, so it
> will be easy to move if required but I don't think that it's worth it.
>
> The connection between a WebGL canvas and WR is implemented via the WR
> ExternalImage APIs as Glennw recommended.
>
> When the new Architecture Lands in servo, I'll make a separate PR to
> removes all the unneeded WebGL management code in WR.
>
> Minimize the render path overhead for WebGL commands
> ------------------------------------------------------------------------------
>
> The new architecture creates a single thread/process to manage all WebGL
> commands from multiple canvas sources. This will reduce the footprint for
> creating a new canvas (avoid creating specific thread + ipc-channel) and
> reduces the render-path for each WebGL command by using a single "channel
> step"

Yeah, I agree multiplexing contexts is the way to go.

You still need the Webrender API channel (which in the case of Servo is
another IPC channel), right?

If not, I'd be really curious about how you have managed to get rid of it.

Well, this is hard assuming you need to share textures cross-process,
right? If so, why would we want to do that? In general the content
process shouldn't have access to the GPU.

It'd be nice to have a small explanation about which commands are
executed in which process with your proposed architecture.

I don't love to have less error-checking in general, though I understand
the point... In any case, this sounds like the kind of standalone
feature that would be nice to have in its own PR (or commit)...

Also, is there any use case for this for now? I think if we're going to
introduce a cargo feature for "Servo as electron-like back-end"
officially, we should do so separately, and add tests for this.

Right now the SM version is quite old, though there's an attempt to
update it going on[1].

I could believe that a bunch of the benchmarks are kinda biased because
of this.

> * Is the cost of JS to Rust call measured for normal functions and for
> functions with many arguments? is it compared to bindings implemented in
> Firefox/Chrome? When I worked on the standalone VM engine area the binding
> JS/C++ binding code had a lot of impact.

I remember people opening bugs about the overhead of function calls into
JS, and a bunch of fixes a while ago. That's probably going to change
substantially with the JS engine update though, so you should probably
talk with Nick or Josh about it.

> Long post... Thanks for reading!

Thanks for writing it, this is exciting stuff!

 -- Emilio
Re: New Servo WebGL Architecture Proposal (benchmarks included!) Emilio Cobos Álvarez 7/22/17 2:20 PM


On 07/22/2017 11:14 PM, Imanol Fernández wrote:
> Hi Emilio!
>
>  
>
>     Yeah, I'm to blame for this one... This was one of my very first
>     contributions to Servo and I followed what Canvas2D did (which is
>     equally broken), and I never got around to clean it up.
>     Can't wait to see it going away!
>
>
> It's normal to follow an existing pattern in the code in your first
> contribution. Also things have changed a lot since you started: WR was
> not ready, you didn't have the external image API and more. So it's
> easier to make these kinds of changes now. In any case I only mentioned
> the "bad" things. All the WebGL work you did in the DOM/script
> component, all the validations, error checking, conformance tests is
> great.  The same for the WebGLCommand enum and traits.  All that code is
> still there and WebGL would be impossible without all that work ;)
>
> Yes, Canvas2D is broken too. Any volunteers to fix that? ;)
>  
>
>     (and hopefully split it into a few commits? That way would make
>     reviewing it much easier... reviewing the WebVR landing was honestly
>     quite a pain, and I'm sure some things slipped through because of it was
>     a single massive commit)
>
>
> I know that the PR was big, sorry about that. Ok, I'll try to reduce the
> PR as much as possible. Some of the webgl modes I mentioned are just
> tests to validate ideas and measure performance differences.
>
> Hopefully the review will be a lot easier this time. Github shows a lot
> of changes in the DOM objects but it's a repeated change (remove
> CanvasMsg wrapper and use the new sender method). WebGLCommands are the
> same ones as before. So a big number of lines is already reviewed (just
> moved to a new place) and almost all the modifications in the DOM follow
> the same pattern. The conformance tests will help to guarantee that we
> don't miss anything.
>
>     Yeah, I agree multiplexing contexts is the way to go.
>     You still need the Webrender API channel (which in the case of Servo is
>     another IPC channel), right?
>     If not, I'd be really curious about how have you managed to get rid
>     of it.
>
>
> There is a WebRenderAPI channel, but it's only used when a canvas is
> created or resized in order to send the required webrender_api::ImageKeys
>
> I was talking about channel steps per WebGL command. I considered that
> other channel insignificant for performance.

Oh, ok, yeah, makes sense.

>
>     Well, this is hard assuming you need to share textures cross-process,
>     right? If so, why would we want to do that? In general the content
>     process shouldn't have access to the GPU.
>     It'd be nice to have a small explanation about which commands are
>     executed in which process with your proposed architecture.
>
>
> That's only a fake multiprocess mode. I wanted to validate that we won't
> need to change the WebGL types used in parts of the code (constellation,
> pipelinestate, script thread, dom) in the long term. For example, we
> could make this mode to batch all the WebGLCommands and send them to a
> separate GPU process using shared memory (still using the same types in
> all the code). I also used it to test the overhead difference between
> ipc and mpsc.
>
>
>     I don't love to have less error-checking in general, though I understand
>     the point... In any case, this sounds like the kind of standalone
>     feature that would be nice to have in its own PR (or commit)...
>     Also, is there any use case for this for now? I think if we're going to
>     introduce a cargo feature for "Servo as electron-like back-end"
>     officially, we should do so separately, and add tests for this.
>
>
> Right now it's also another test to validate the types used across all
> the parts of the code and measure performance. This mode is a long term
> idea to squeeze performance.  It doesn't need to land now. I can do it
> in a separate PR if we decide to use it.
>
> We want to talk with the "Synchro" team about all the packaging
> requirements that we need to publish WebGL/WebVR content in the stores
> (Google Play, Steam, App Store ...) ;)
>
>     Right now the SM version is quite old, though there's an attempt to
>     update it going on[1].
>     I could believe that a bunch of the benchmarks are kinda biased because
>     of this.
>
>     I remember people opening bugs about the overhead of function calls into
>     JS, and a bunch of fixes a while ago. That's probably going to change
>     substantially with the JS engine update though, so you should probably
>     talk with Nick or Josh about it.
>
>
> That's great news (in the sense of margin for improvement). I remember
> getting nice performance boosts when updating VM engines in the past ;)

Yeah, also I could believe that there are a bunch of easy wins to be had
re. bindings performance.

Btw, the link that was supposed to be [1] in my previous email was
<https://github.com/servo/mozjs/pull/119>.

Re: New Servo WebGL Architecture Proposal (benchmarks included!) Nicolas Silva 7/23/17 8:45 AM
Hi, this plan looks like a great improvement over the current state of
webgl in servo.

> * WebGL commands are currently run in the WR backend thread which could be
> bad for UI latency (see https://github.com/servo/webrender/issues/607)

Yeah, thanks for getting rid of this!

> The connection between a WebGL canvas and WR is implemented via the WR
> ExternalImage APIs as Glennw recommended.

Great!

> The new architecture creates a single thread/process to manage all WebGL
> commands from multiple canvas sources. This will reduce the footprint for
> creating a new canvas (avoid creating specific thread + ipc-channel) and
> reduces the render-path for each WebGL command by using a single "channel
> step"

Cool, the thread-per-canvas thing is insane.

> we may be interested in
> batching WebGLCommands, using shared memory, or other channel approaches
> to improve performance.

You probably already have, but I suggest having a look at what Chrome is
doing in this area (batching up commands to send to their GPU process
instead of sending each command)


> * Immediate WebGL calls to a GL context living in the Script thread (I did
> some benchmarks with this enabled (sometimes it's faster running the gl
> call than sending it) . I'd like to focus on a thread bases render path
> though in order to parallelize JS code better)

In-script-thread webgl is also good for latency. Currently Chrome's VR
implementation has an extra frame of latency compared to Firefox (which
I believe is due to the gl remoting) and it is very noticeable in VR.


> * WebGL double buffering: It may be a good idea to improve performance and
> synchronization (e.g. ping pong WebGL render textures each frame, so WR
> uses a texture to composite the main context while the WebGL frame is
> rendering the next frame to a different texture). This may add some memory
> overhead. We can implement some texture pooling (I think that Firefox does
> that)

Yes, Firefox does that.


> * How does the SpiderMonkey version used in Servo compare to V8?

SpiderMonkey is doing particularly well on the games type of things
compared to V8 (for various good and bad reasons). There's also the
better support for asmjs.
More generally, I'd rather avoid picking the competitor's tech instead
of ours unless there is a very strong incentive because of the PR
foot-gun that it generates.


Cheers,

Nical
Re: New Servo WebGL Architecture Proposal (benchmarks included!) Imanol Fernández 7/23/17 8:45 AM
> > * WebGL commands are currently run in the WR backend thread which could
> be
> > bad for UI latency (see https://github.com/servo/webrender/issues/607)
> >
> > * Currently there are two ipc-channels steps for each WebGL-call to hit
> the
> > driver (JS ==> WebGL Render Thread ==> WebRender). This adds a lot of
> > overhead.
> >
> > * Slow compilation times: Modifying a WebGL command code recompiles
> > WebRender_traits which causes additional component recompilations in
> Servo.
> >
> > * Flickering and shyncronization problems with the WR compositor (I
> created
> > this issue some time ago: https://github.com/servo/servo/issues/14235)>
> Goals that I want to achieve in the new WebGL architecture design:
> >
> > * Make WebGL development cycle easier.
> > * Minimize the render path overhead for WebGL commands.
> > * Flexible architecture for multiprocess and inprocess scenarios.
> > * A special mode for packaging WebGL applications (not enabled by
> default,
> > more details later)
> > * Faster compilation times
> > * Proper shyncronization to avoid flickering issues
> > * Compatibility with the WebVR render path
> >
> > Now I'm going into some detail of how I tried to best achieve each stated
> > goal. You can check the "almost ready for PR" source code here:
> > https://github.com/MortimerGoro/servo/commit/
> c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf
> > (still have to add more docs, do a rebase and clean some tests)
>
> (and hopefully split it into a few commits? That way would make
> reviewing it much easier... reviewing the WebVR landing was honestly
> quite a pain, and I'm sure some things slipped through because of it was
> a single massive commit)
>
> > Make WebGL development cycle easier.
> > -------------------------------------------------------
> >
> > This is fixed by splitting the WebGL code into it's own component and
> > removing all WebGL code from WebRender. Additional benefits for WebRender
> > (see https://github.com/servo/webrender/issues/1353):
> >
> > * Simplify the internals of  resource cache by removing the GL context
> > management code.
> > * Remove the WebGL cargo feature, since not all clients use it.
> >
> > I opted to move the WebGL code into a Servo component. It eases
> > development, testability and I don't think that there are use cases to
> use
> > the component outside Servo. Anyway, it's a low coupled component, so it
> > will be easy to move if required but I don't think that it's worth it.
> >
> > The connection between a WebGL canvas and WR is implemented via the WR
> > ExternalImage APIs as Glennw recommended.>
> > When the new Architecture Lands in servo, I'll make a separate PR to
> > removes all the unneeded WebGL management code in WR.
> >
> > Minimize the render path overhead for WebGL commands
> > ------------------------------------------------------------
> ------------------
> >
> > The new architecture creates a single thread/process to manage all WebGL
> > commands from multiple canvas sources. This reduces the footprint of
> > creating a new canvas (no dedicated thread + ipc-channel per canvas) and
> > shortens the render path for each WebGL command to a single "channel
> > step".
>
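A minimal sketch of the single-thread idea described above, assuming plain std::sync::mpsc channels instead of ipc-channel and no real GL calls. The names ContextId, Command, Msg and spawn_webgl_thread are hypothetical and are not taken from the linked commit; the point is only that every canvas shares one thread and one sender, and that each command is tagged with the context it targets.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical identifier for one canvas / GL context.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContextId(pub u32);

/// Hypothetical, heavily reduced command set.
#[derive(Debug)]
pub enum Command {
    Clear,
    DrawArrays { count: u32 },
}

/// Messages understood by the single WebGL thread.
pub enum Msg {
    /// Ask the thread to create a context and reply with its id.
    CreateContext(Sender<ContextId>),
    /// Run one command on the given context.
    Command(ContextId, Command),
    Exit,
}

/// Spawns the one thread that owns every context. Returns the sender that
/// all canvases would clone, plus a handle that yields how many commands
/// were executed per context once the thread exits.
pub fn spawn_webgl_thread() -> (Sender<Msg>, JoinHandle<HashMap<ContextId, usize>>) {
    let (tx, rx) = channel::<Msg>();
    let handle = thread::spawn(move || {
        let mut executed: HashMap<ContextId, usize> = HashMap::new();
        let mut next_id = 0u32;
        for msg in rx {
            match msg {
                Msg::CreateContext(reply) => {
                    let id = ContextId(next_id);
                    next_id += 1;
                    executed.insert(id, 0);
                    let _ = reply.send(id);
                }
                Msg::Command(id, _cmd) => {
                    // A real implementation would make the context for `id`
                    // current here and issue the GL call; this sketch only counts.
                    if let Some(n) = executed.get_mut(&id) {
                        *n += 1;
                    }
                }
                Msg::Exit => break,
            }
        }
        executed
    });
    (tx, handle)
}

/// Two canvases sharing the same thread and the same channel.
pub fn demo() -> HashMap<ContextId, usize> {
    let (tx, handle) = spawn_webgl_thread();
    let (reply_tx, reply_rx) = channel();
    tx.send(Msg::CreateContext(reply_tx.clone())).unwrap();
    tx.send(Msg::CreateContext(reply_tx)).unwrap();
    let a = reply_rx.recv().unwrap();
    let b = reply_rx.recv().unwrap();
    tx.send(Msg::Command(a, Command::Clear)).unwrap();
    tx.send(Msg::Command(b, Command::DrawArrays { count: 3 })).unwrap();
    tx.send(Msg::Command(a, Command::DrawArrays { count: 6 })).unwrap();
    tx.send(Msg::Exit).unwrap();
    handle.join().unwrap()
}
```

Creating a second canvas here costs one message rather than a new thread and a new channel, which is the footprint reduction the proposal is after.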
> Yeah, I agree multiplexing contexts is the way to go.
>
> You still need the Webrender API channel (which in the case of Servo is
> another IPC channel), right?
>
> If not, I'd be really curious about how you have managed to get rid of it.
>
> > Flexible architecture for multiprocess and inprocess scenarios
> > ----------------------------------------------------------------------------------
> >
> > One of the things I don't like about the current implementation is that it
> > uses an IpcSender<WebGLCommand> instance as the entry point to send the
> > WebGL commands from the Script Thread to the renderer in many parts of the
> > source code. IMO this is not very flexible because we may be interested in
> > batching WebGLCommands, using shared memory, or other channel approaches to
> > improve performance.
> >
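One way to read the flexibility argument above: if callers depend on a small trait rather than on a concrete sender type, the transport can change (one message per command, batching, shared memory) without touching the call sites. The sketch below is an assumption-laden illustration, not the design in the linked commit: CommandSink, ImmediateSink, BatchingSink and the reduced WebGLCommand enum are hypothetical, and std::sync::mpsc stands in for ipc-channel.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical, heavily reduced stand-in for the real command enum.
#[derive(Debug, Clone, PartialEq)]
pub enum WebGLCommand {
    Clear,
    DrawArrays(u32),
}

/// Hypothetical abstraction that DOM code would hold instead of a
/// concrete channel sender.
pub trait CommandSink {
    fn send(&mut self, cmd: WebGLCommand);
    /// Push out anything still buffered (no-op for unbuffered sinks).
    fn flush(&mut self);
}

/// Sends every command as its own channel message.
pub struct ImmediateSink {
    tx: Sender<Vec<WebGLCommand>>,
}

impl ImmediateSink {
    pub fn new(tx: Sender<Vec<WebGLCommand>>) -> Self {
        ImmediateSink { tx }
    }
}

impl CommandSink for ImmediateSink {
    fn send(&mut self, cmd: WebGLCommand) {
        let _ = self.tx.send(vec![cmd]);
    }
    fn flush(&mut self) {}
}

/// Buffers commands and sends them as one message per batch.
pub struct BatchingSink {
    tx: Sender<Vec<WebGLCommand>>,
    pending: Vec<WebGLCommand>,
    capacity: usize,
}

impl BatchingSink {
    pub fn new(tx: Sender<Vec<WebGLCommand>>, capacity: usize) -> Self {
        BatchingSink { tx, pending: Vec::new(), capacity }
    }
}

impl CommandSink for BatchingSink {
    fn send(&mut self, cmd: WebGLCommand) {
        self.pending.push(cmd);
        if self.pending.len() >= self.capacity {
            self.flush();
        }
    }
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            let batch = std::mem::take(&mut self.pending);
            let _ = self.tx.send(batch);
        }
    }
}

/// Caller code depends only on the trait, not on the transport.
pub fn draw_frame(sink: &mut dyn CommandSink) {
    sink.send(WebGLCommand::Clear);
    sink.send(WebGLCommand::DrawArrays(3));
    sink.send(WebGLCommand::DrawArrays(6));
    sink.flush();
}
```

With the immediate sink, draw_frame produces three channel messages; with the batching sink and a large enough capacity it produces one, and the caller is identical in both cases.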
> > * Immediate WebGL calls to a GL context living in the Script thread. I did
> > some benchmarks with this enabled (sometimes running the GL call is faster
> > than sending it). I'd like to focus on a thread-based render path though,
> > in order to parallelize JS code better.
> > https://github.com/servo/servo/pull/17311
> >
> > Benchmarks
> > -----------------------------------------------------------------
> > Here are the benchmark results for the old WebGL implementation, the three
> > modes in the new architecture (multiprocess, inprocess and packaging) and
> > Chrome/Firefox.
> >
> > I used a Linux desktop for the benchmarks (Ubuntu 16.04, GTX 1060,
> > Skylake i7 4GHz).
> >
> > I used the Three.js performance test with a fixed camera, and with 5K and
> > 10K 3D objects: https://threejs.org/examples/webgl_performance.html
> >
> > Browser                    Average FPS    Average frame time
> >
> > Servo old WebGL 5K         26 fps         37.31 ms
> > Servo old WebGL 10K        13 fps         71.4 ms
> > Servo "multiprocess" 5K    43 fps         22.61 ms
> > Servo "multiprocess" 10K   21 fps         46.27 ms
> > Servo "inprocess" 5K       56-60 fps      17.13 ms
> > Servo "inprocess" 10K      28-30 fps      33.5 ms
> > Servo "packaging" 5K       60 fps         15.83 ms
> > Servo "packaging" 10K      32 fps         30.92 ms
> >
> > Firefox 5K                 42-45 fps      16.98 ms
> > Firefox 10K                25 fps         34.01 ms
> >
> > Chrome 5K                  57-60 fps      11.5 ms
> > Chrome 10K                 42-46 fps      22.5 ms
> >
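To compare the rows above it can help to turn frame times into ratios. The helpers below are a small sketch using only the frame times reported in the table; the function names are made up for illustration. Note that the FPS implied by an average frame time (1000 / ms) need not match the reported average FPS exactly, since the two averages are taken differently.

```rust
/// Frames per second implied by an average frame time in milliseconds.
pub fn fps_from_frame_time_ms(ms: f64) -> f64 {
    1000.0 / ms
}

/// How many times faster `new_ms` is than `baseline_ms`.
pub fn speedup(baseline_ms: f64, new_ms: f64) -> f64 {
    baseline_ms / new_ms
}

/// Average frame times (ms) for the 5K-object runs, copied from the table.
pub const FRAME_TIMES_5K: &[(&str, f64)] = &[
    ("Servo old WebGL", 37.31),
    ("Servo multiprocess", 22.61),
    ("Servo inprocess", 17.13),
    ("Servo packaging", 15.83),
];

/// Speedup of every 5K configuration relative to the old implementation.
pub fn speedups_vs_old_5k() -> Vec<(&'static str, f64)> {
    let baseline = FRAME_TIMES_5K[0].1;
    FRAME_TIMES_5K
        .iter()
        .map(|&(name, ms)| (name, speedup(baseline, ms)))
        .collect()
}
```

For the 5K case this gives roughly 1.65x for "multiprocess" and roughly 2.36x for "packaging" relative to the old implementation, based on frame time alone.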
> >
> > Future Steps
> > -----------------------------------------------------------------
> > There are other ideas that I wanted to implement but I'll do that in later
> > PRs (contributions in these areas are welcome too!)
> >
> > * WebGL double buffering: It may be a good idea to improve performance and
> > synchronization (e.g. ping-pong WebGL render textures each frame, so WR
> > uses one texture to composite the main context while WebGL is rendering
> > the next frame to a different texture). This may add some memory
> > overhead. We can implement some texture pooling (I think that Firefox does
> > that).
> >
> > * Test/benchmark different spsc channels or message batching or shared
> > memory in order to improve frame times. We want to be the fastest ones in
> > the benchmarks!
> >
> > * WebGL 2.0 branch (My idea is to create a layer that reuses some code
> > from the WebGL 1.0 implementation and open an issue with a list of TODOs
> > in WebGL 2.0 so contributors can start implementing features step by step).
> >
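The double-buffering item in the list above can be sketched as a two-texture ping-pong. This is only an illustration of the bookkeeping: TextureId and PingPong are hypothetical names, no GL objects are created, and the GPU synchronization (fences or similar) that a real implementation would need before publishing a frame is deliberately left out.

```rust
/// Hypothetical handle standing in for a GL texture name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TextureId(pub u32);

/// Two render targets: WebGL draws into `back()` while the compositor
/// samples `front()`.
pub struct PingPong {
    textures: [TextureId; 2],
    back: usize,
}

impl PingPong {
    pub fn new(a: TextureId, b: TextureId) -> Self {
        PingPong { textures: [a, b], back: 0 }
    }

    /// Texture the WebGL thread is currently rendering into.
    pub fn back(&self) -> TextureId {
        self.textures[self.back]
    }

    /// Texture the compositor may read from.
    pub fn front(&self) -> TextureId {
        self.textures[1 - self.back]
    }

    /// Called when a frame is finished: the texture just rendered becomes
    /// the front one, and the old front becomes the next render target.
    /// Returns the newly published front texture.
    pub fn swap(&mut self) -> TextureId {
        self.back = 1 - self.back;
        self.front()
    }
}
```

The memory overhead mentioned in the proposal shows up here as the second texture per canvas; a pool shared between canvases would be one way to bound it.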
> > I have some additional questions about Servo that I think may affect the
> > benchmark performances:
> >
> > * How does the SpiderMonkey version used in Servo compare to V8?
>
Re: New Servo WebGL Architecture Proposal (benchmarks included!) Glenn Watson 7/23/17 3:17 PM
Hi,

This all sounds very good from a WR perspective, exciting!

I'm glad to hear the synchronization objects in the external image
callback APIs work well, that's great.

I also agree that the remaining flickering is a Servo bug - we shouldn't
be doing extra composites when not required just because the mouse
moved. We need to fix this at some point, since it causes other issues too.

Look forward to seeing the PR and getting this merged! :)

Cheers