Making all Chrome subprocesses use zygote on Linux


Paweł Hajdan, Jr.

Oct 1, 2013, 6:31:41 PM
to chromium-dev
Do you have any advice for making all Chrome subprocesses (like utility process, service, anything else that executes the "chrome" binary) use the zygote on Linux?

This is https://code.google.com/p/chromium/issues/detail?id=22703 and I'd need to fix it before enabling side-by-side packages on Linux. It also affects current Chrome on Linux anyway; see below.

Selected quotes from that 2009 bug:

The more I think about this, the more I'm convinced that this needs to be fixed before 
launch. (John Abd-El-Malek)

We don't want to ever go out to disk when looking for data after
startup, since they can be changed by an update. (Evan Martin)

Have the utility process run out of process on Linux again by
using the /proc/self/exe trick we use for plugins.  Since we don't
need any resources from .pak files, this should be safe. (Tony Chang)

And the above no longer seems to be true - which shouldn't be surprising, since the comment is from 2009 and the correct behaviour is hard to test for in an automated manner.

My conclusion is that the current behaviour is definitely not correct, that it was considered serious enough to possibly block the Linux launch, and that it would be great to finally fix it.

Some specific questions:

1. So should I add a non-sandboxed zygote host for processes that do not run sandboxed?
1b. How can I easily see which of the current Chrome subprocesses are SUID-sandboxed and which are not?

2. Do we have a good way to prevent future breakages caused by people forgetting to launch a Chrome subprocess through zygote?

Paweł

On Tue, Sep 24, 2013 at 2:00 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
I'm working on https://code.google.com/p/chromium/issues/detail?id=295103 , which is related to the Linux SxS (side-by-side) migration, but it also affects current Chrome Linux packages in a more subtle way.

The problem is that ResourceBundle lazily loads resources from disk when requested. The resources might have been removed from under the browser (as is the case with the above bug), but even without SxS it's possible that the package manager has done an upgrade and the files on disk are out of sync with the running version of Chrome.

One easy way to deal with this problem is to say it's inevitable. It's indeed non-trivial to fix: even if we load all the resources at startup (which is itself not easy, since the list of resources to load needs to be complete and stay accurate, and it would likely regress startup performance), there'd still be a small window of time between the chrome binary being loaded into memory and the resource files being loaded from disk, during which the resource files on disk may become out of sync with the running binary.
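
To illustrate the underlying Unix behaviour with a minimal, hypothetical sketch (paths made up, error handling omitted): a descriptor opened before the upgrade pins the old inode, while a lazy path-based open afterwards sees the new, mismatched file.

#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
  // Opened at startup: this fd keeps the original inode alive even if
  // the package manager later unlinks/replaces the path.
  int pinned_fd = open("/opt/google/chrome/resources.pak", O_RDONLY);

  // ... imagine the package manager upgrades Chrome at this point ...

  // A lazy, path-based open (what ResourceBundle effectively does now)
  // yields whatever is on disk *today* - possibly a newer, incompatible
  // .pak than the one the running binary was built with.
  int lazy_fd = open("/opt/google/chrome/resources.pak", O_RDONLY);

  struct stat pinned, lazy;
  fstat(pinned_fd, &pinned);
  fstat(lazy_fd, &lazy);
  // After an upgrade these inode numbers differ: the lazy open is
  // reading the new package's files.
  printf("pinned inode: %ju, lazy inode: %ju\n",
         (uintmax_t)pinned.st_ino, (uintmax_t)lazy.st_ino);
  close(pinned_fd);
  close(lazy_fd);
  return 0;
}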

One solution to this problem would be to bake the resources into the chrome binary. This might also regress startup performance, but the resources should generally still be loaded lazily (paged in on demand), and it would guarantee that things stay in sync.
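
Sketch of one way to bake them in, assuming a build step like "ld -r -b binary -o resources.o resources.pak" (which defines start/end symbols for the embedded blob) - the exact mechanism is an open question:

// resources.pak becomes part of the executable image, so the kernel
// pages it in lazily on first access and it cannot go out of sync
// with the code. Symbol names are what "ld -b binary" generates.
#include <cstddef>

extern "C" const char _binary_resources_pak_start[];
extern "C" const char _binary_resources_pak_end[];

const char* EmbeddedPakData() { return _binary_resources_pak_start; }
size_t EmbeddedPakSize() {
  return _binary_resources_pak_end - _binary_resources_pak_start;
}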

What are your opinions, thoughts and recommendations about this? Please let me know if I should explain the problem better.

Pawel

Antoine Labour

Oct 1, 2013, 7:13:29 PM
to Paweł Hajdan, Jr., chromium-dev
On Tue, Oct 1, 2013 at 3:31 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
Some specific questions:

1. So should I add a non-sandboxed zygote host for processes that do not run sandboxed?

The zygote is what currently sets up the setuid sandbox, so you'd need a different one for processes that don't want the setuid sandbox (GPU process, NPAPI plugin processes, others?).
 
1b. How can I easily see which of the current Chrome subprocesses are SUID-sandboxed and which are not?

They should be 1:1 with the ones that use the zygote. So I suppose, audit callers of BrowserChildProcessHost::Launch?

2. Do we have a good way to prevent future breakages caused by people forgetting to launch a Chrome subprocess through zygote?

Remove the use_zygote bool from BrowserChildProcessHost::Launch.
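
Something along these lines (simplified sketch with made-up types - the real Launch() takes more parameters):

#include <cassert>

struct CommandLine {};

// Interim: keep the flag but assert on it, so any child process
// launched without a zygote fails loudly in debug builds.
void Launch(CommandLine* cmd_line, bool use_zygote) {
  assert(use_zygote && "all child processes must go through a zygote");
  // ... ask the (sandboxed or unsandboxed) zygote to fork ...
}

// End state: no flag at all, so forgetting the zygote becomes a
// compile error for callers instead of a runtime surprise.
void Launch(CommandLine* cmd_line) {
  // Always fork from a zygote; never exec the on-disk chrome binary.
}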

Antoine



Paweł Hajdan, Jr.

Oct 11, 2013, 1:55:53 PM
to Antoine Labour, Julien Tinnes, Chris Evans, chromium-dev
On Tue, Oct 1, 2013 at 4:13 PM, Antoine Labour <pi...@google.com> wrote:
On Tue, Oct 1, 2013 at 3:31 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
1. So should I add a non-sandboxed zygote host for processes that do not run sandboxed?

The zygote is what currently sets up the setuid sandbox, so you'd need a different one for processes that don't want the setuid sandbox (GPU process, NPAPI plugin processes, others?).

Looks like there are multiple. Do we have a reasonable consensus that for now we need another unsandboxed zygote process on Linux?
 
1b. How can I easily see which of the current Chrome subprocesses are SUID-sandboxed and which are not?

They should be 1:1 with the ones that use the zygote. So I suppose, audit callers of BrowserChildProcessHost::Launch?

Indeed, everything that uses the zygote is also sandboxed, and vice versa.
 
2. Do we have a good way to prevent future breakages caused by people forgetting to launch a Chrome subprocess through zygote?

Remove the use_zygote bool from BrowserChildProcessHost::Launch.

I added DCHECK(use_zygote) there and quickly got the following stack trace:

[24424:24449:1010/172820:FATAL:browser_child_process_host_impl.cc(141)] Check failed: use_zygote. 
 [0x7f4f85402298] base::debug::StackTrace::StackTrace()
 [0x7f4f8543f48b] logging::LogMessage::~LogMessage()
 [0x7f4f8869bd78] content::BrowserChildProcessHostImpl::Launch()
 [0x7f4f888ca5b0] content::UtilityProcessHostImpl::StartProcess()
 [0x7f4f888c9ac8] content::UtilityProcessHostImpl::Send()
 [0x7f4f88a31247] content::PluginLoaderPosix::LoadPluginsInternal()

Now I think PluginLoaderPosix is not so trivial to sandbox, since it accesses plugins on disk and tries to dynamically load them from the utility process.

My plan is to add an unsandboxed zygote process and make all child processes use a zygote: some of them sandboxed, some unsandboxed.

I'd like to get feedback about that design change on this list.

Paweł

David Turner

Oct 12, 2013, 2:39:28 AM
to Antoine Labour, chromium-dev, Paweł Hajdan, Jr.


On Oct 2, 2013 1:13 AM, "Antoine Labour" <pi...@google.com> wrote:

>>
>> 1. So should I add a non-sandboxed zygote host for processes that do not run sandboxed?
>
>
> The zygote is what currently sets up the setuid sandbox, so you'd need a different one for processes that don't want the setuid sandbox (GPU process, NPAPI plugin processes, others?).
>  

I don't know too much about the setuid sandbox details, but what would prevent the zygote from setting it up after the fork() and before handing control to the subprocess-specific code?

That's what happens with the Android Zygote (not the Chromium one, which doesn't exist on that platform). This allows for several levels of sandboxing.
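
Sketch of what I mean, with made-up helper names, and assuming the sandbox could be engaged entirely from inside the child without an exec:

#include <sys/types.h>
#include <unistd.h>

enum class SandboxLevel { kNone, kSeccompOnly, kFull };

// Hypothetical hooks - not real Chromium APIs.
void ApplySandboxPolicy(SandboxLevel level);
void RunChildMain();

// One zygote serving differently-sandboxed children: the sandbox
// decision is made per request, *after* fork(), before any
// subprocess-specific code runs.
pid_t SpawnChild(SandboxLevel level) {
  pid_t pid = fork();
  if (pid == 0) {
    if (level != SandboxLevel::kNone)
      ApplySandboxPolicy(level);
    RunChildMain();
    _exit(0);
  }
  return pid;  // The zygote goes back to waiting for requests.
}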

Paweł Hajdan, Jr.

Oct 15, 2013, 2:44:58 PM
to David Turner, Antoine Labour, chromium-dev
On Fri, Oct 11, 2013 at 11:39 PM, David Turner <di...@chromium.org> wrote:

I don't know too much about the setuid sandbox details, but what would prevent the zygote from setting it up after the fork() and before handing control to the subprocess-specific code?

The short answer is: it'd need to exec() the setuid sandbox binary, and that is not guaranteed to work since the sandbox binary on disk may have changed or moved (and in the scenario I'm fixing, it will get moved).

I've noticed there are not many responses to my threads about these zygote changes. I'd really prefer to hear important feedback here, rather than after I've spent time working on a CL. Of course it's all fine if what I'd like to change here is fine with everybody. :)

Paweł

Antoine Labour

Oct 15, 2013, 5:09:10 PM
to Paweł Hajdan, Jr., David Turner, chromium-dev
Not many people understand the zygote infrastructure and trade-offs, and many of those who do have left the project. So tag, you're it.

From my POV, I think using an unsandboxed zygote for plugin/gpu/utility is a good thing, with two potential caveats:
- startup time impact
- memory impact

Do you have any data on this?

Thanks,
Antoine



Justin Schuh

Oct 16, 2013, 1:42:11 PM
to Antoine Labour, Julien Tinnes, Paweł Hajdan, Jr., David Turner, chromium-dev
[+jln]

IIRC all processes forked from a given zygote share the same ASLR layout. So there would appear to be definite security implications to sharing one zygote for every process type. But I would defer to Julien's take on this.

-j


Antoine Labour

Oct 16, 2013, 1:46:51 PM
to Justin Schuh, Julien Tinnes, Paweł Hajdan, Jr., David Turner, chromium-dev
On Wed, Oct 16, 2013 at 10:42 AM, Justin Schuh <jsc...@chromium.org> wrote:
[+jln]

IIRC all processes forked from a given zygote share the same ASLR layout. So there would appear to be definite security implications to sharing one zygote for every process type. But I would defer to Julien's take on this.

Since we only fork and don't exec (by definition), I'd assume that's indeed the case. However, it's already true between the browser and the sandboxed zygote (i.e. renderers, pepper). It would bring the GPU process and NPAPI plugins into that same layout.
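
Easy to demonstrate with a toy program - fork() preserves the address-space layout, and only an execve() re-rolls the ASLR dice:

#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

static void Marker() {}

int main() {
  printf("parent: Marker is at %p\n", (void*)&Marker);
  pid_t pid = fork();
  if (pid == 0) {
    // Same mapping, same address: the forked child inherits the
    // parent's ASLR layout. An execve() would get a fresh one.
    printf("child:  Marker is at %p\n", (void*)&Marker);
    _exit(0);
  }
  waitpid(pid, nullptr, 0);
  return 0;
}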

Antoine

Paweł Hajdan, Jr.

Oct 16, 2013, 2:58:18 PM
to Chris Evans, Antoine Labour, Justin Schuh, Julien Tinnes, David Turner, chromium-dev
On Wed, Oct 16, 2013 at 11:53 AM, Chris Evans <cev...@google.com> wrote:
The fact that the GPU process currently has a different ASLR layout to renderer processes is considered a pretty useful property.

Are we talking about bringing the GPU under the same zygote as renderers or a brand new zygote for process types that do not live under the setuid sandbox?

The latter. As I said in one of my posts above, some code that currently doesn't go through the zygote would be non-trivial to sandbox, e.g. PluginLoaderPosix.

I'd like to address the immediate issue first (processes that don't go through the zygote break upgrading Chrome on Linux) by adding a second, unsandboxed zygote.

After that, anyone would be free to work on moving processes from unsandboxed to the sandboxed zygote.

Note that, whatever concerns you might raise: should we sacrifice correctness (not breaking during upgrade) for an increased security benefit? We've recently made an emergency beta release because of this issue (Chrome broken on upgrade - it would refuse to load any web pages), so this is serious.

Paweł

Scott Hess

Oct 16, 2013, 3:11:24 PM
to Paweł Hajdan, Jr., Chris Evans, Antoine Labour, Justin Schuh, Julien Tinnes, David Turner, chromium-dev
Crazy notion: bounce a request for the existing pinned resource fds off the current zygote (or even a derived renderer).
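
That's plain SCM_RIGHTS descriptor passing over a Unix socket (sketch, error handling omitted) - the receiver ends up with a duplicate fd referring to the same pinned inode, immune to whatever the package manager does to the path:

#include <cstring>
#include <sys/socket.h>

void SendPinnedFd(int sock, int fd_to_send) {
  char byte = 'F';
  iovec iov = {&byte, 1};
  char ctrl[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl;
  msg.msg_controllen = sizeof(ctrl);
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;  // the kernel dup()s the fd to the peer
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
  sendmsg(sock, &msg, 0);
}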

-scott

Chris Evans

Oct 16, 2013, 3:25:28 PM
to Paweł Hajdan, Jr., Antoine Labour, Justin Schuh, Julien Tinnes, David Turner, chromium-dev
On Wed, Oct 16, 2013 at 11:58 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
On Wed, Oct 16, 2013 at 11:53 AM, Chris Evans <cev...@google.com> wrote:
The fact that the GPU process currently has a different ASLR layout to renderer processes is considered a pretty useful property.

Are we talking about bringing the GPU under the same zygote as renderers or a brand new zygote for process types that do not live under the setuid sandbox?

The latter. As I said in one of my posts above, some code that currently doesn't go through the zygote would be non-trivial to sandbox, e.g. PluginLoaderPosix.

I'd like to address the immediate issue first (processes that don't go through the zygote break upgrading Chrome on Linux) by adding a second, unsandboxed zygote.

Well, this is nice, because IMHO it is security positive. Currently the browser and GPU processes share the same ASLR layout and it sounds like they would not if we had a new unsandboxed zygote.


After that, anyone would be free to work on moving processes from unsandboxed to the sandboxed zygote.

Note that, whatever concerns you might raise: should we sacrifice correctness (not breaking during upgrade) for an increased security benefit?

No, we should have both correctness and security.


Cheers
Chris

Antoine Labour

Oct 16, 2013, 3:53:47 PM
to Chris Evans, Paweł Hajdan, Jr., Justin Schuh, Julien Tinnes, David Turner, chromium-dev
On Wed, Oct 16, 2013 at 12:25 PM, Chris Evans <cev...@chromium.org> wrote:
Well, this is nice, because IMHO it is security positive. Currently the browser and GPU processes share the same ASLR layout and it sounds like they would not if we had a new unsandboxed zygote.

I think that's the opposite.
Today we exec the GPU process, but with the unsandboxed zygote, we would only fork.
 


After that, anyone would be free to work on moving processes from unsandboxed to the sandboxed zygote.

Note that, whatever concerns you might raise: should we sacrifice correctness (not breaking during upgrade) for an increased security benefit?

No, we should have both correctness and security.

I think we can get away with forking and execing the unsandboxed zygote right at the start of chrome, and then loading the resources, etc. immediately. This would, I think, satisfy the ASLR requirement: the GPU process would be different from the renderers and the browser, though it would be the same as utility and NPAPI.
The breakage window is very small, similar to the one we have at the start of the browser process (updates could, in theory, happen between the start of the program and when we load the resources). I don't think it's worth agonizing over.
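
In pseudo-main() terms the ordering would be roughly this (all names made up):

// Hypothetical startup sequence - the point is only the ordering.
void CreateSandboxedZygote();    // fork+exec: renderers, pepper, ...
void CreateUnsandboxedZygote();  // fork+exec: utility, NPAPI, gpu?
void LoadResourceBundle();
void StartBrowserThreads();
int RunBrowserMainLoop();

int BrowserMainSketch() {
  // 1. Still single-threaded, nothing loaded yet: exec both zygotes
  //    now, so each gets its own ASLR layout and holds a consistent
  //    view of the installed files.
  CreateSandboxedZygote();
  CreateUnsandboxedZygote();
  // 2. Only then load resources and spawn threads. An update landing
  //    after this point can no longer desynchronize the zygotes.
  LoadResourceBundle();
  StartBrowserThreads();
  return RunBrowserMainLoop();
}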

However, it means we do load/map the resources twice, which may have a perf impact on startup.

If that's not OK, then maybe we should restrict the unsandboxed zygote to processes that actually need resources, i.e. the utility process. AFAIK neither the gpu process nor NPAPI needs them.

Lastly, Scott's suggestion is good too.

Antoine

Julien Tinnes

Oct 16, 2013, 5:51:37 PM
to Antoine Labour, Chris Evans, Paweł Hajdan, Jr., Justin Schuh, David Turner, chromium-dev, Mark Seaborn, a...@chromium.org, Adam Langley, Jorge Lucangeli Obes
(Oops, I had missed this thread!)

Both Mark and I have been pondering adding new "sub-Zygotes".
It's something I wanted to consider this quarter, but I would like to
have a good sense of what the changes related to Mojo should be first.

Some of the current well-known problems are:
- Threads: anything that fork()s cannot be threaded without jumping
through hoops. This creates a tremendous amount of complexity and
makes APIs such as "GetTerminationStatus()" roughly impossible to
implement correctly on Linux (since they need to be blocking). To fix
this I've been adding the "known_dead" flag to a bunch of APIs, which
is very awkward.

This is a difficult problem. CLONE_PARENT could be used to solve it
(the thing that actually does the fork() would be a child), but it
feels very hack-ish. See crbug.com/157458 or crbug.com/274827 for some
examples and explanations.

- The weird "ZygoteForkDelegate" interface that is used by NaCl, when
it should really be its own separate Zygote. Currently a fork request
for NaCl goes to the "normal" Zygote, which routes it to the NaCl
Zygote. Since both have to be single-threaded, this is especially
problematic. Cf. crbug.com/133453

- The Zygote was created partly because Chrome is updated "in-place"
on Linux (and executing a new version of Chrome when starting a new
renderer process wouldn't work). However, this property has been lost,
since there are too many "non-Zygote" process types. Moreover, a lot
of the code that tries to be careful by re-executing /proc/pid/fd/X is
broken, because base/ added a few readlink() calls in some of the
high-level APIs, which breaks the "same-inode" goal. Example:
crbug.com/257149

When designing a new model for a Zygote, we need to keep a few things in mind:

- It can't be "one Zygote". We need a few "model processes" around;
how many is a matter of trade-offs. The more process types one model
process supports, the less useful the Zygote becomes. We want to be
able to "pre-warm" as many things as possible in the Zygote, both for
sandboxing and performance reasons. But pre-warming for all process
types would be wasteful in a number of ways.

- Sandboxing makes a lot of this difficult. Getting rid of the need
for the setuid sandbox (perhaps via unprivileged containers) would go
a long way towards making a nicer Zygote possible.

- We need to re-think APIs around process lifetime in Chrome. Some of
the APIs expect to be able to get the exit status of a process
synchronously, which can be very difficult to support (cf. the
threading issue).

- In particular, the threading issue mentioned earlier is really key.
We should figure out if we're happy to use CLONE_PARENT.
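
For reference, the CLONE_PARENT idea in a nutshell (raw-syscall sketch):

#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

// With CLONE_PARENT the new process becomes a *sibling* of the caller:
// its parent is the caller's parent. If the zygote forked children
// this way, they would be reparented to the browser process, which
// could then waitpid() on them directly instead of proxying
// exit-status requests through the single-threaded zygote.
// Returns 0 in the child and the child's pid in the caller, as with
// fork().
pid_t ForkAsSibling() {
  return syscall(SYS_clone, CLONE_PARENT | SIGCHLD,
                 /*child_stack=*/nullptr, nullptr, nullptr, nullptr);
}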

Chris Evans

Oct 16, 2013, 5:58:59 PM
to Antoine Labour, Paweł Hajdan, Jr., Justin Schuh, Julien Tinnes, David Turner, chromium-dev
On Wed, Oct 16, 2013 at 12:53 PM, Antoine Labour <pi...@google.com> wrote:




I think that's the opposite.
Today we exec the GPU process, but with the unsandboxed zygote, we would only fork.

My bad. You're right. pid 857 is the browser, pid 2058 the GPU process:

chris@behemoth:~/newblink/src$ grep chrome /proc/857/maps | grep r-xp
7f4e7b503000-7f4e80cca000 r-xp 00000000 08:12 1317197                    /opt/google/chrome/chrome

chris@behemoth:~/newblink/src$ grep chrome /proc/2058/maps | grep r-xp
7fde13ec9000-7fde19690000 r-xp 00000000 08:12 1317197                    /opt/google/chrome/chrome

(Having posted these addresses to a public list, I suppose I now have to restart my browser :P )

So yeah, the chrome binary is definitely mapped differently, indicative of an exec. That's good for security. I'm pretty sure we didn't always have this benefit.


Cheers
Chris

Julien Tinnes

Oct 16, 2013, 6:09:12 PM
to Chris Evans, Antoine Labour, Paweł Hajdan, Jr., Justin Schuh, David Turner, chromium-dev
Indeed. All non-Zygote processes are fork()/execve()'d, except for
implementation-level/hidden processes (such as the GPU broker
process). It's necessary anyway, since LaunchProcess() is always
called from a multi-threaded environment, so we need an execve() after
the fork() to clean up all those locks.

Antoine Labour

Oct 16, 2013, 6:26:31 PM
to Julien Tinnes, Chris Evans, Paweł Hajdan, Jr., Justin Schuh, David Turner, chromium-dev, Mark Seaborn, a...@chromium.org, Adam Langley, Jorge Lucangeli Obes
On Wed, Oct 16, 2013 at 2:51 PM, Julien Tinnes <j...@chromium.org> wrote:
(Oops, I had missed this thread!)

Both Mark and I have been pondering adding new "sub-Zygotes".
It's something I wanted to consider this quarter, but I would like to
have a good sense of what the changes related to Mojo should be first.

Some of the current well-known problems are:
- Threads: anything that fork()s cannot be threaded without jumping
through hoops.

Sure, but we create the renderer zygote before we create threads, and we could do the same for the unsandboxed one too.
 
- The Zygote was created partly because Chrome is updated "in-place"
on Linux (and executing a new version of Chrome when starting a new
renderer process wouldn't work). However, this property has been lost,
since there are too many "non-Zygote" process types. Moreover, a lot
of the code that tries to be careful by re-executing /proc/pid/fd/X is
broken, because base/ added a few readlink() calls in some of the
high-level APIs, which breaks the "same-inode" goal. Example:
crbug.com/257149

If we fork the zygote right at startup, I think we can essentially ignore this problem for the same reasons cited upthread.



When designing a new model for a Zygote, we need to keep a few things
in mind:

- It can't be "one Zygote". We need a few "model processes" around;
how many is a matter of trade-offs. The more process types one model
process supports, the less useful the Zygote becomes.

It still has the property that we don't need to exec, which is the fundamental benefit of the zygote.

Antoine

Julien Tinnes

Oct 16, 2013, 7:23:22 PM
to Antoine Labour, Chris Evans, Paweł Hajdan, Jr., Justin Schuh, David Turner, chromium-dev, Mark Seaborn, a...@chromium.org, Adam Langley, Jorge Lucangeli Obes
On Wed, Oct 16, 2013 at 3:26 PM, Antoine Labour <pi...@google.com> wrote:
>
>
>
> On Wed, Oct 16, 2013 at 2:51 PM, Julien Tinnes <j...@chromium.org> wrote:
>>
>> (Oops, I had missed this thread!)
>>
>> Both Mark and I have been pondering adding new "sub-Zygotes".
>> It's something I wanted to consider this quarter, but I would like to
>> have a good sense of what the changes related to Mojo should be first.
>>
>> Some of the current well-known problems are:
>> - Threads: anything that fork()s cannot be threaded without jumping
>> through hoops.
>
> Sure, but we create the renderer zygote before we create threads, and we
> could do the same for the unsandboxed one too.

I've grossly summarized this complicated issue.

The problem is that POSIX lets you find out the exit status of a
process only if you're its parent. If the parent is going to be
whatever calls fork() (which is the case, unless we're talking about
CLONE_PARENT hacks), then something that has to be mono-threaded also
needs to handle multiple requests for process status from the browser.
Worse: wait*id() has to be blocking to be reliable (the kernel is at
liberty not to return the status of a dead process if you use WNOHANG,
and does so especially when the process is still being cleaned up
(closing file descriptors in particular can block on I/O)).
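
Concretely, a sketch of the two wait flavours in question:

#include <sys/types.h>
#include <sys/wait.h>

// Unreliable: right after a child dies the kernel may still be tearing
// it down (closing its fds can block on I/O), so a WNOHANG poll can
// return 0 even for a process that is already gone for good.
bool PollForTermination(pid_t pid, int* status) {
  return waitpid(pid, status, WNOHANG) == pid;
}

// Reliable, but it blocks - which is only acceptable when the caller
// already knows the process is dead (the "known_dead" flag) and can
// afford to wait out the kernel's cleanup.
void WaitKnownDead(pid_t pid, int* status) {
  waitpid(pid, status, 0);
}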

See the paragraph below for links to bugs and discussions if you're
interested. The bottom line is: the current solutions are brittle,
weird and probably have a negative impact on perf in some cases.
CLONE_PARENT looks like the better solution to me, but it's not
portable and hackish.

(Also in practice this is even more complicated, because of the PID
namespace, the fact that we need our own init() implementation, etc.)

>> - The Zygote was created partly because Chrome is updated "in-place"
>> on Linux (and executing a new version of Chrome when starting a new
>> renderer process wouldn't work). However, this property has been
>> lost, since there are too many "non-Zygote" process types. Moreover,
>> a lot of the code that tries to be careful by re-executing
>> /proc/pid/fd/X is broken, because base/ added a few readlink() calls
>> in some of the high-level APIs, which breaks the "same-inode" goal.
>> Example: crbug.com/257149
>
>
> If we fork the zygote right at startup, I think we can essentially ignore
> this problem for the same reasons cited upthread.

I was enumerating some of the known problems in this area. Having more
Zygote processes is indeed a way to solve this issue.
But it's not sufficient: base/ and a few other things need to be fixed
if we want to get rid of the current race condition. The setuid
sandbox makes this surprisingly non-intuitive. See crbug.com/257149.

>> When designing a new model for a Zygote, we need to keep a few
>> things in mind:
>>
>> - It can't be "one Zygote". We need a few "model processes" around;
>> how many is a matter of trade-offs. The more process types one model
>> process supports, the less useful the Zygote becomes.
>
>
> It still has the property that we don't need to exec, which is the
> fundamental benefit of the zygote.

Absolutely. Again, I'm just stating what we need to do / keep in mind.
Figuring out this trade-off between a larger or smaller number of
specialized "model" processes will be important.

Julien

Julien Tinnes

Oct 16, 2013, 9:17:56 PM
to Antoine Labour, Chris Evans, Paweł Hajdan, Jr., Justin Schuh, David Turner, chromium-dev, Mark Seaborn, a...@chromium.org, Adam Langley, Jorge Lucangeli Obes
FYI, Mark and I moved some aspects of this discussion to
https://code.google.com/p/chromium/issues/detail?id=274855