Multiple processes

rocal...@gmail.com

Jan 27, 2009, 9:00:18 PM

We don't want to be the last browser that allows content JS execution
to block the browser UI. Script timeouts prevent catastrophic loss of
control, but UI responsiveness is killed long before they fire. I'm
particularly worried about this problem because if most browsers work
OK with pages with scripts that don't yield enough, then developers
won't yield enough, since it's annoying to code and hurts performance.
So our behaviour will get increasingly worse. (And if you're really
paranoid, it would be very easy for people to intentionally game us
with plausible deniability.)

One "obvious" way to fix this is to support running chrome and content
on separate threads. Once we're doing that, IMHO we may as well be
using multiple processes, making memory "private by default" instead
of "shared by default" for easier programming, and also capturing
other benefits of separation too, such as security and robustness
benefits if we can sandbox the content processes.

There are other ways we could try to fix the UI responsiveness, such
as Opera-style interruption of a running JS context, but IMHO that
would introduce many of the same difficulties without capturing the
benefits of parallelism, security or robustness.

Multiprocess is obviously going to be hard. A major concern was always
going to be how to support our existing extension model, which assumes
chrome JS can synchronously interact with content DOMs. I think we can
support it; there's already been a discussion about how that could
work and we can work out more details in another forum.

One way we could start making progress would be to make a naive
separation where <browser> tabs run slave processes embedding Gecko,
using separate profiles and very limited chrome-content interaction
--- much like IETab. Then we can start proceeding in several
directions simultaneously:
-- supporting deeper chrome access to content, including extensions
-- sharing profile components --- cache, preferences, history, etc.
-- sandboxing
-- optimizing Gecko for increased cross-process resource sharing

I imagine that this work would happen on trunk, guarded by a run-time
mode switch, initially with a simple XULRunner application as the
testbed. Hopefully quite early on we could run reftests and mochitests
in a parallel mode, which would be useful for testing and interesting
in its own right.

Rob

Jim Blandy

Jan 28, 2009, 12:59:18 PM

to rocal...@gmail.com

For what it's worth, there is a middle way between multi-process and
multi-threading.

What makes multi-threaded development so excruciating is not so much
cross-thread memory writes per se; it's that it's hard to predict when
they'll happen, and hard to reproduce a particular order in testing.
Single-threaded code stomps on memory it shouldn't all the time, and
while such bugs are annoying, they're nothing compared to a stomp that
depends on a race condition.

But the characteristic of threads that we need to keep the UI responsive
isn't actually the parallelism. Rather, it's the ability to have
multiple independent C stacks live at the same time. That is, if we can
be eleventeen C++ calls deep in content JavaScript, but then suspend
that to run chrome code in response to a user event, and then when
that's done resume our eleventeen-deep execution, that's good enough.
There's no need to have the content and chrome actually be preemptively
scheduled or run in parallel.

So suppose we use one thread for each tab (not sure about my terminology
here) and one thread for UI code, but switch between them in a
coroutine-like way: when thread A activates thread B, thread A always
becomes blocked until thread B (or someone else) passes the baton back
to it.

The C++ code running in content threads would need to periodically check
whether it should yield to chrome code, but this would be a single function
call that simply returns once the UI has had its turn: there's no need to
unwind the stack and save all the state needed to resume content later.

Mozilla's behavior would still be entirely deterministic, because there
would be no preemptive scheduling or parallel execution on multiple
cores taking place. Only one thread would run at a time, and switches
would be deterministic. All memory writes would happen as predictably
and as reproducibly as they do in a single-threaded program.

This arrangement is not hard to enforce, if you write the primitives for
creating a new thread and handing the baton from one thread to another
properly. Here's an API sketch. If it's not clear how these would
work, I could do a quick-and-dirty implementation to make everything
explicit. (Forgive me for using POSIX threads; it's what I'm familiar
with at the moment. And of course, C++ could type all this more
accurately.)

Of course, this does nothing to match the other advantages of a
separate-process-per-tab architecture. My thought was simply that it
could address the UI responsiveness issue with much less effort.

#include <pthread.h>

/* Create a new coroutine thread running WORKER. The new thread is
   initially blocked. Call coroutine_pass to activate it; WORKER will
   receive the MESSAGE given to coroutine_pass as its argument. */
pthread_t coroutine_create(void (*worker)(void *));

/* Pass control to coroutine thread RECIPIENT, passing MESSAGE.
   This blocks the calling thread; the call returns only when some
   other coroutine calls coroutine_pass, specifying us as the
   RECIPIENT; our return value will be the MESSAGE they pass. */
void *coroutine_pass(pthread_t recipient, void *message);

/* Like coroutine_pass, but terminate the calling thread. */
void *coroutine_exit(pthread_t recipient, void *message);
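One possible quick-and-dirty implementation of the three primitives is sketched below. It assumes a single global mutex and condition variable that record which thread currently holds the "baton". It also assumes that a worker finishes with `coroutine_exit` rather than returning. The demo names (`run_demo`, `content_worker`, `chrome_thread`) are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* The "baton": only the thread named by `current` may run. */
static pthread_mutex_t baton_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t baton_moved = PTHREAD_COND_INITIALIZER;
static pthread_t current;      /* thread holding the baton */
static int have_current = 0;   /* set by the first coroutine_create */
static void *mailbox;          /* MESSAGE in transit to `current` */

struct start { void (*worker)(void *); };

/* Caller holds baton_lock: sleep until we hold the baton, take the message. */
static void *wait_for_baton(void)
{
    while (!pthread_equal(current, pthread_self()))
        pthread_cond_wait(&baton_moved, &baton_lock);
    return mailbox;
}

static void *trampoline(void *arg)
{
    struct start s = *(struct start *)arg;
    free(arg);
    pthread_mutex_lock(&baton_lock);
    void *message = wait_for_baton();  /* new threads start out blocked */
    pthread_mutex_unlock(&baton_lock);
    s.worker(message);
    /* Workers must leave via coroutine_exit; returning would strand the baton. */
    abort();
}

pthread_t coroutine_create(void (*worker)(void *))
{
    pthread_t t;
    struct start *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->worker = worker;
    pthread_mutex_lock(&baton_lock);
    if (!have_current) {  /* the first caller implicitly holds the baton */
        current = pthread_self();
        have_current = 1;
    }
    pthread_mutex_unlock(&baton_lock);
    if (pthread_create(&t, NULL, trampoline, s) != 0)
        abort();
    pthread_detach(t);
    return t;
}

void *coroutine_pass(pthread_t recipient, void *message)
{
    pthread_mutex_lock(&baton_lock);
    mailbox = message;
    current = recipient;
    pthread_cond_broadcast(&baton_moved);
    void *reply = wait_for_baton();  /* blocked until someone passes back */
    pthread_mutex_unlock(&baton_lock);
    return reply;
}

void *coroutine_exit(pthread_t recipient, void *message)
{
    pthread_mutex_lock(&baton_lock);
    mailbox = message;
    current = recipient;
    pthread_cond_broadcast(&baton_moved);
    pthread_mutex_unlock(&baton_lock);
    pthread_exit(NULL);
}

/* Demo: a "content" coroutine doing one slice of work per activation. */
static pthread_t chrome_thread;

static void content_worker(void *message)
{
    int *n = message;
    while (*n < 3) {
        (*n)++;
        n = coroutine_pass(chrome_thread, n);  /* yield to chrome */
    }
    coroutine_exit(chrome_thread, n);
}

int run_demo(void)
{
    int n = 0;
    chrome_thread = pthread_self();
    pthread_t content = coroutine_create(content_worker);
    for (int i = 0; i < 4; i++) {
        coroutine_pass(content, &n);
        printf("chrome resumed, n = %d\n", n);
    }
    return n;
}
```

Calling `run_demo()` prints `n = 1`, `2`, `3`, and then `3` again. The fourth activation is the one in which the worker leaves through `coroutine_exit`. Only one thread ever runs at a time, so every run produces the same interleaving.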

Robert O'Callahan

Feb 4, 2009, 9:43:44 PM

On 31/1/09 11:30 AM, Vladimir Vukicevic wrote:
> Given that we really want to do hardware accelerated rendering so that
> things like SVG, video, etc. stop being dog-slow, we'll have to figure
> out a solution that can accommodate this. I don't see any chance of us
> being able to do, say, native transformed 720p or 1080p video playback
> in the browser without direct hardware accelerated rendering.
>
> One possibility, that I'm not very excited about but might end up being
> what we have to do, is to do essentially what X does -- the process
> forwards rendering commands (GLX-style) to the UI process, which
> executes them, instead of passing rendered bitmaps around. I really,
> really don't want to do that, though.

Having control over both ends of the pipe and being able to upgrade them
simultaneously, at will, would be a big step up over X. Having a hard
dependency on shared memory would also be a big step up.

The only alternative, really, is support for shared surfaces on the
major platforms, and really well-tested drivers on the major platforms.
I'm not sure how practical that is.
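The POSIX sketch below shows one way to pass rendered bitmaps through shared memory instead of forwarding drawing commands. It assumes an anonymous `MAP_SHARED` mapping that is inherited across `fork()`, which is available on Linux and the BSDs, including OS X. The 4x4 buffer, the pixel pattern, and `render_in_child` are hypothetical stand-ins for a content process painting into a surface that the UI process then composites.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { WIDTH = 4, HEIGHT = 4 };

/* Returns the sum of the pixels the child painted, or -1 on error. */
long render_in_child(void)
{
    size_t size = WIDTH * HEIGHT * sizeof(uint32_t);
    /* Shared anonymous mapping: after fork() both processes see the same pages. */
    uint32_t *pixels = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (pixels == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(pixels, size);
        return -1;
    }
    if (pid == 0) {
        /* "Content" process: paint pixel (x, y) with the value y * WIDTH + x. */
        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++)
                pixels[y * WIDTH + x] = (uint32_t)(y * WIDTH + x);
        _exit(0);
    }

    /* "UI" process: wait for the frame, then read the pixels in place. */
    int status;
    long sum = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        sum = 0;
        for (int i = 0; i < WIDTH * HEIGHT; i++)
            sum += pixels[i];
    }
    munmap(pixels, size);
    return sum;
}
```

Here `waitpid` stands in for a real "frame ready" notification. The point is only that the UI process reads the pixels where they are instead of receiving a copy. `render_in_child()` returns 120, the sum of the pixel values 0 through 15.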

Rob

Robert O'Callahan

Feb 4, 2009, 9:49:42 PM

On 29/1/09 6:59 AM, Jim Blandy wrote:
> But the characteristic of threads that we need to keep the UI responsive
> isn't actually the parallelism. Rather, it's the ability to have
> multiple independent C stacks live at the same time. That is, if we can
> be eleventeen C++ calls deep in content JavaScript, but then suspend
> that to run chrome code in response to a user event, and then when
> that's done resume our eleventeen-deep execution, that's good enough.
> There's no need to have the content and chrome actually be preemptively
> scheduled or run in parallel.
>
> So, if we use one thread for each tab (not sure about my terminology
> here), and one thread for UI code, but then switch between them in a
> coroutine-like way that ensures that, when thread A activates thread B,
> thread A always becomes blocked --- until thread B (or someone else)
> passes the baton back to it.

So that would work, but it gets hairy when chrome does something that
needs to reenter content script. For example, the user might do
shift-reload and then we need to fire beforeunload at the content
script, but if there's a script already running for that window, we have
to block chrome, switch back to that script, run it to completion doing
who knows what else while chrome waits, possibly reenter chrome along
the way, ...

Mostly these are problems we'd also have to tackle with multiple
processes, but the point is that even cooperative threading presents a
lot of difficulties. And it doesn't lead to as good a place because it
lacks the other benefits of multiple processes.

Rob

Robert O'Callahan

Feb 4, 2009, 9:53:14 PM

On 30/1/09 5:27 AM, Benjamin Smedberg wrote:
> I'm interested in tackling a basic testbed. I have a few questions on how
> much should be tackled at first:
>
> I think I could do this without modifying gecko at all by writing a plugin.
> Do you think this makes sense? Will a windowed plugin work with tab switching?

I don't know if that really makes sense, since the first thing we'd have
to do next is break through the plugin API to talk to the slaves through
a wider interface.

A windowed plugin would work with tab switching.

> This has the advantage that I don't have to communicate any graphics data
> across the pipe, just control data. All networking would initially be
> handled by the child process, and for the moment I'd just run it without a
> profile. This means HTTPS wouldn't work, but for an initial experiment
> that's ok.

I agree that cross-process HWNDs would be a good first step. This should
be pretty easy to get working on X too.

Rob

John J. Barton

Feb 4, 2009, 10:58:36 PM

Robert O'Callahan wrote:
> On 29/1/09 6:59 AM, Jim Blandy wrote:

... ...

>
> Mostly these are problems we'd also have to tackle with multiple
> processes, but the point is that even cooperative threading presents a
> lot of difficulties. And it doesn't lead to as good a place because it
> lacks the other benefits of multiple processes.

But what about the down sides of multiple processes? Cooperation between
multiple processes has proven to be as difficult as isolation has been
for threading. Assuming one is going to solve these difficulties as
well as anyone ever has, the question really comes down to whether one
wants to create a single thing vulnerable to a weak link or a multi-thing
without a center. The core/chrome/extension strength of mozilla is not,
I think, well served by an architecture of glued processes. Gluing IE
windows together adds value because they don't have anything to lose.

jjb

Question

Feb 5, 2009, 12:03:26 AM

to Vladimir Vukicevic, dev-pl...@lists.mozilla.org

This is actually directly at odds with any plans for hardware accelerated
rendering. We can probably do it as long as we expose the final HWND or the
X Drawable to the separate process and have it draw directly; I don't know
what we'd do on OS X, but we can maybe do some fun things with OpenGL context
sharing. But if you have that HWND, then you're breaking the process
isolation somewhat.

>
> Given that we really want to do hardware accelerated rendering so that
> things like SVG, video, etc. stop being dog-slow, we'll have to figure out a
> solution that can accommodate this. I don't see any chance of us being able
> to do, say, native transformed 720p or 1080p video playback in the browser
> without direct hardware accelerated rendering.
>
> One possibility, that I'm not very excited about but might end up being
> what we have to do, is to do essentially what X does -- the process forwards
> rendering commands (GLX-style) to the UI process, which executes them,
> instead of passing rendered bitmaps around. I really, really don't want to
> do that, though.
>

> - Vlad

Opera's vector graphics library, Vega, almost has a hardware-accelerated
back-end (OpenGL and Direct3D). So I think hardware acceleration, XPConnect
removal, and improving JS speed should be the P1 issues.

dbradley

Feb 5, 2009, 10:04:22 AM

Separating out the UI into its own process would open up some
interesting opportunities. For small devices, the UI process could run
on the device while some server runs the heavy-lifting DOM and other
code.

This seems very similar to what an OS does and would have a lot of the
same issues. I know Windows used to have the UI code all at the system
level, but they had to move it out due to the transitions it incurred.
How chatty is our UI API, and are there ways we could optimize it?

I've witnessed sluggish responsiveness from apps using a similar
architecture when the system is under load, noticeably more than
conventional applications. I'm not sure how these apps communicate with
the UI, but however they do it, they appear to suffer when the system
is under load. Just something to consider.

Logically, if the UI were segregated, you could opt to run it out of
process or in process. Once you have established the communication
interface, whether it runs in another thread, another process, or a
process on some other machine shouldn't make a difference. I think
the big win is enforcing the separation; how it's physically done is
less important.
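The idea that callers shouldn't care where the UI runs can be sketched as a table of function pointers. `struct ui_transport`, both backends, and `set_title` are hypothetical names for illustration, not existing Mozilla interfaces. To keep the example self-contained, the pipe backend reads back from the same process, although its read end could just as well live in another process.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical interface: every UI request goes through send(). */
struct ui_transport {
    int (*send)(struct ui_transport *self, const char *msg);
};

/* Backend 1: UI in the same process; just remember the last message. */
struct direct_transport {
    struct ui_transport base;   /* must be first so the casts below are valid */
    char last[64];
};

static int direct_send(struct ui_transport *self, const char *msg)
{
    struct direct_transport *d = (struct direct_transport *)self;
    strncpy(d->last, msg, sizeof d->last - 1);
    d->last[sizeof d->last - 1] = '\0';
    return 0;
}

/* Backend 2: a pipe whose read end could be in another process. */
struct pipe_transport {
    struct ui_transport base;
    int write_fd;
};

static int pipe_send(struct ui_transport *self, const char *msg)
{
    struct pipe_transport *p = (struct pipe_transport *)self;
    size_t len = strlen(msg) + 1;   /* send the terminating NUL too */
    return write(p->write_fd, msg, len) == (ssize_t)len ? 0 : -1;
}

/* Caller code: has no idea where the UI actually runs. */
static int set_title(struct ui_transport *ui, const char *title)
{
    return ui->send(ui, title);
}

/* Sends the same request through both backends; 1 if both delivered it. */
int demo_transports(void)
{
    struct direct_transport d = { { direct_send }, "" };
    int fds[2];
    if (pipe(fds) != 0)
        return 0;
    struct pipe_transport p = { { pipe_send }, fds[1] };

    int ok = set_title(&d.base, "New Tab") == 0 &&
             set_title(&p.base, "New Tab") == 0;
    char buf[64] = "";
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    close(fds[1]);
    return ok && n > 0 && strcmp(d.last, "New Tab") == 0 &&
           strcmp(buf, "New Tab") == 0;
}
```

`set_title` is identical for both backends. Only the code that constructs the transport decides whether the UI is in-process or behind a pipe.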

The core UI implementation wouldn't need XPCOM or anything, since
communication to it would be over this new interface. I haven't looked
at that code in a long time, and I know a lot of stuff has been
deCOMtaminated, so there might not be that big a gain there. On the back end you'd have
XPCOM interfaces provided so that JS and XPCOM can communicate to the
UI.

The downside is that if you're going to run multiple back-end processes
for stability, you'll have multiple JS engines running, which
is no small thing.

David

Christopher Blizzard

Feb 9, 2009, 3:26:11 PM

to dbradley, dev-pl...@lists.mozilla.org

So let me ask this question: there are lots of ideas floating around
here. What's the forum, timeline and owner to drive the discussion to
the point where we're making decisions or figuring out where we need
more data and/or experiments? Who's going to own and lead here?

Robert O'Callahan

Feb 9, 2009, 3:38:03 PM

I think it can be adequately served by making the slave processes
rendezvous with the master process and using RPC.
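At the lowest level, "rendezvous and RPC" could look like the POSIX sketch below. The example rests on several assumptions. First, the master creates a `socketpair()` before `fork()` so that the slave inherits its end, which stands in for a real rendezvous mechanism. Second, `struct rpc_msg`, `OP_DOUBLE`, and `rpc_demo` form a made-up fixed-size request/reply protocol, not anything from Gecko.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wire format: an opcode and one integer argument. */
struct rpc_msg {
    int32_t op;
    int32_t arg;
};

enum { OP_DOUBLE = 1 };

/* Slave side: serve requests until the master hangs up (read returns 0).
   A real protocol would also loop on short reads and writes. */
static void slave_loop(int fd)
{
    struct rpc_msg m;
    while (read(fd, &m, sizeof m) == (ssize_t)sizeof m) {
        struct rpc_msg reply = { m.op, m.op == OP_DOUBLE ? m.arg * 2 : -1 };
        if (write(fd, &reply, sizeof reply) != (ssize_t)sizeof reply)
            break;
    }
}

/* Master side: a synchronous call writes a request and blocks on the reply. */
static int32_t rpc_call(int fd, int32_t op, int32_t arg)
{
    struct rpc_msg m = { op, arg };
    if (write(fd, &m, sizeof m) != (ssize_t)sizeof m ||
        read(fd, &m, sizeof m) != (ssize_t)sizeof m)
        return -1;
    return m.arg;
}

/* The socketpair created before fork() is the rendezvous point. */
int32_t rpc_demo(int32_t value)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {           /* slave process */
        close(sv[0]);
        slave_loop(sv[1]);
        _exit(0);
    }
    close(sv[1]);             /* master process */
    int32_t result = rpc_call(sv[0], OP_DOUBLE, value);
    close(sv[0]);             /* slave sees EOF and exits */
    waitpid(pid, NULL, 0);
    return result;
}
```

`rpc_demo(21)` returns 42. Forwarding real JS operations would need variable-length messages and error reporting on top of this.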

Rob

John J Barton

Feb 9, 2009, 4:52:47 PM

I think investigating the successes and failures of that approach and
comparing them to the architecture in mozilla would be worthwhile.

Presumably rendezvous really means posting events to slave processes so
they pick them up in normal event queue processing. Presumably RPC means
xpcom. So one effect would be to undo the ongoing shift from
xpcom-based extensions to wrappedJSObject extensions. Or perhaps
extensions would divide into in-process and cross-process thingys
(that's a new term we have to invent now that processes, threads,
components, modules, elements, and nodes are taken).

jjb

Robert O'Callahan

Feb 10, 2009, 7:20:10 AM

On 10/2/09 10:52 AM, John J Barton wrote:
> Presumably rendezvous really means posting events to slave process so
> they pick them up in normal event queue processing. Presumably RPC means
> xpcom. So one effect would be to undo the ongoing shift from xpcom-based
> extensions to wrappedJSObject extensions.

No, the idea would be to transparently forward cross-process JS operations.

> Or perhaps extensions would
> divide in to in-process and cross-process thingys.

That shouldn't be necessary.

I didn't really want to do this now, but let me gesticulate wildly about
how it could work, perhaps making things more clear. Suppose an
extension does something like
var x = myBrowser.contentDocument.body.textContent;

1) myBrowser.contentDocument creates a proxy object representing the
slave process's document.
2) contentDocument.body is invoked on the proxy object.
a) Block the master process while we wait for the document's slave
process to finish whatever it's currently doing (i.e., return to the
event loop, running all its JS to completion)
b) "Lock" the slave process so that for now, it will only service the
master's requests and not run its own JS or other events
c) Forward a call to .body on the slave's document, and create a
proxy in the master for the resulting object
3) body.textContent is invoked on that proxy
a) Forward a call to .textContent on the slave, and pass the
resulting string back to the master
4) When the extension's script has run to completion, then and only then
unlock the slave process.

This locking protocol preserves the "run to completion" semantics of
content and chrome scripts that we have today, but it allows content
script to run concurrently with chrome as long as chrome isn't looking!
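The locking protocol in the steps above can be modeled inside one process with threads, as a rough sketch. Several stand-ins are assumed for illustration. A thread plays the slave process. The integer `text_len` stands in for `document.body.textContent`, and only the slave thread ever touches it. Each forwarded call goes through a single request slot rather than real cross-process RPC. `struct slave`, `slave_main`, `master_lock`, `master_forward`, `master_unlock`, and `protocol_demo` are hypothetical names. The slave grants the lock only at the top of its event loop, which is what preserves run-to-completion.

```c
#include <pthread.h>

struct slave {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int lock_wanted;     /* master is waiting to lock us (step 2a) */
    int locked;          /* serving only master requests (step 2b) */
    int request;         /* a forwarded property read is pending (2c, 3a) */
    int reply;           /* result of that read */
    int pending_events;  /* queued content scripts */
    int quit;
    int text_len;        /* content state; only the slave thread touches it */
};

static void *slave_main(void *arg)
{
    struct slave *s = arg;
    pthread_mutex_lock(&s->mu);
    for (;;) {
        if (s->lock_wanted) {   /* back at the event loop: grant the lock */
            s->lock_wanted = 0;
            s->locked = 1;
            pthread_cond_broadcast(&s->cv);
            while (s->locked) {
                if (s->request) {
                    s->reply = s->text_len;
                    s->request = 0;
                    pthread_cond_broadcast(&s->cv);
                }
                pthread_cond_wait(&s->cv, &s->mu);
            }
        } else if (s->pending_events > 0) {
            s->pending_events--;
            pthread_mutex_unlock(&s->mu);
            s->text_len += 5;   /* one content script, run to completion; */
            s->text_len += 5;   /* between these lines the state is half-done */
            pthread_mutex_lock(&s->mu);
        } else if (s->quit) {
            break;
        } else {
            pthread_cond_wait(&s->cv, &s->mu);
        }
    }
    pthread_mutex_unlock(&s->mu);
    return NULL;
}

static void master_lock(struct slave *s)         /* steps 2a and 2b */
{
    pthread_mutex_lock(&s->mu);
    s->lock_wanted = 1;
    pthread_cond_broadcast(&s->cv);
    while (!s->locked)
        pthread_cond_wait(&s->cv, &s->mu);
    pthread_mutex_unlock(&s->mu);
}

static int master_forward(struct slave *s)       /* steps 2c and 3a */
{
    pthread_mutex_lock(&s->mu);
    s->request = 1;
    pthread_cond_broadcast(&s->cv);
    while (s->request)
        pthread_cond_wait(&s->cv, &s->mu);
    int r = s->reply;
    pthread_mutex_unlock(&s->mu);
    return r;
}

static void master_unlock(struct slave *s)       /* step 4 */
{
    pthread_mutex_lock(&s->mu);
    s->locked = 0;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mu);
}

/* Returns 1 if a chrome "script" saw consistent, never half-updated state. */
int protocol_demo(void)
{
    struct slave s = { .pending_events = 100 };
    pthread_t t;
    pthread_mutex_init(&s.mu, NULL);
    pthread_cond_init(&s.cv, NULL);
    if (pthread_create(&t, NULL, slave_main, &s) != 0)
        return 0;
    master_lock(&s);              /* myBrowser.contentDocument.body */
    int a = master_forward(&s);   /* body.textContent */
    int b = master_forward(&s);   /* the same script reads it again */
    master_unlock(&s);            /* chrome script ran to completion */
    pthread_mutex_lock(&s.mu);    /* shut the slave down after its queue drains */
    s.quit = 1;
    pthread_cond_broadcast(&s.cv);
    pthread_mutex_unlock(&s.mu);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&s.mu);
    pthread_cond_destroy(&s.cv);
    return a == b && a % 10 == 0 && s.text_len == 1000;
}
```

`protocol_demo()` returns 1. Both forwarded reads see the same value, and that value is always a multiple of 10 because the lock is never granted while a content script is halfway through its two updates. Content keeps running whenever chrome isn't holding the lock.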

Now, there are some issues, such as:

-- Step 2a could time out if the slave is hung or compromised. So in
some cases we would have to throw an exception, which could break
extensions that don't handle it. Same goes for any cross-process access.
Too bad, but I think we'd deal.

-- Step 2a could block chrome for a short time, hurting responsiveness.
There are a few ways to mitigate that:
a) when we run a chrome event handler due to the firing of an event
in a slave (content) process, lock the slave and don't actually fire the
event until it's locked. Then chrome will not have to block on that
slave, provided the slave is well-behaved.
b) add a setTimeout variant that takes a contentWindow parameter and
pre-locks the slave before running the handler. This would also be a
convenient way to schedule asynchronous operations on content without
blocking chrome.

-- Managing cross-process object references is tricky. I think in
general we don't want to allow content to hold references to chrome,
which makes it easier. The master process has a set of proxy objects,
and each proxy object roots its associated object in the slave process.

-- You should be able to do something like contentWindow.eval("...") to
construct code and data in the slave and run code there. You won't be
able to trust its results, of course.

-- Doing a lot of cross-process calls could get slow. One could use the
eval trick to reduce the number of context switches, or we could use
automatic techniques to package a chain of operations into a single call.
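The first issue in the list above is that step 2a could time out. A bounded wait built on `pthread_cond_timedwait` handles it in the sketch below. The sketch assumes the master tracks each slave's lock state in a hypothetical `struct slave_lock` guarded by a mutex and condition variable. Turning the error code into an exception thrown at the extension's script is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* clock_gettime under strict C modes */
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical per-slave lock state kept by the master. */
struct slave_lock {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int locked;   /* set once the slave is back at its event loop */
};

/* Step 2a with a deadline: 0 once the slave is locked, otherwise the
   error from pthread_cond_timedwait (ETIMEDOUT for a hung slave). */
int wait_for_slave_lock(struct slave_lock *s, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* the condvar's default clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = 0;
    pthread_mutex_lock(&s->mu);
    while (!s->locked && rc == 0)   /* 0 covers normal and spurious wakeups */
        rc = pthread_cond_timedwait(&s->cv, &s->mu, &deadline);
    int result = s->locked ? 0 : rc;
    pthread_mutex_unlock(&s->mu);
    return result;
}

/* A well-behaved slave: returns to its event loop and accepts the lock. */
static void *good_slave(void *arg)
{
    struct slave_lock *s = arg;
    pthread_mutex_lock(&s->mu);
    s->locked = 1;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mu);
    return NULL;
}

/* Waits on a hung slave (nothing ever locks it) and on a good one. */
void timeout_demo(int *hung_rc, int *good_rc)
{
    struct slave_lock hung = { .locked = 0 }, good = { .locked = 0 };
    pthread_t t;
    pthread_mutex_init(&hung.mu, NULL);
    pthread_cond_init(&hung.cv, NULL);
    pthread_mutex_init(&good.mu, NULL);
    pthread_cond_init(&good.cv, NULL);

    *hung_rc = wait_for_slave_lock(&hung, 50);
    if (pthread_create(&t, NULL, good_slave, &good) == 0) {
        *good_rc = wait_for_slave_lock(&good, 5000);
        pthread_join(t, NULL);
    } else {
        *good_rc = -1;
    }
    pthread_mutex_destroy(&hung.mu);
    pthread_cond_destroy(&hung.cv);
    pthread_mutex_destroy(&good.mu);
    pthread_cond_destroy(&good.cv);
}
```

With the hung slave, `wait_for_slave_lock` gives up with `ETIMEDOUT` after roughly 50 ms. With the well-behaved one it returns 0 as soon as the slave reaches its event loop.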

Rob