Removing (most parts of) RenderThreadImpl::Shutdown

95 views
Skip to first unread message

Kentaro Hara

unread,
Jul 22, 2016, 5:57:26 AM7/22/16
to platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger
Hi

I'm now thinking about a way to unconditionally shut down the renderer by SIGKILL (i.e., unconditionally use FastShutdownIfPossible()) and thus entirely remove the normal shutdown sequence (i.e., remove RenderThreadImpl::Shutdown). Per the discussion on the blink-dev@ thread, this seems like a right direction to fix various crashes we've been hitting in the shutdown sequence.

Before diving into the implementation details, I'd like to understand the concerns of removing the shutdown sequence. Let me ask a couple of questions:

- Do you know any tests that depend on the shutdown sequence?

- It is interesting that the only case where we currently fail at FastShutdownIfPossible() is when we have workers (It looks like that GetContentClient()->browser()->IsFastShutdownPossible() and SuddenTerminationAllowed() always return true). Do you know why we need the shutdown sequence if we have workers?

- Should we aim at entirely removing RenderThreadImpl::Shutdown()? Or should we aim at removing only blink::shutdown()?

- Do you have any other concern?


--
Kentaro Hara, Tokyo, Japan

Hiroki Nakagawa

unread,
Jul 22, 2016, 6:33:16 AM7/22/16
to Kentaro Hara, platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Kinuko Yasuda
2016-07-22 18:56 GMT+09:00 Kentaro Hara <har...@chromium.org>:
Hi

I'm now thinking about a way to unconditionally shut down the renderer by SIGKILL (i.e., unconditionally use FastShutdownIfPossible()) and thus entirely remove the normal shutdown sequence (i.e., remove RenderThreadImpl::Shutdown). Per the discussion on the blink-dev@ thread, this seems like a right direction to fix various crashes we've been hitting in the shutdown sequence.

Before diving into the implementation details, I'd like to understand the concerns of removing the shutdown sequence. Let me ask a couple of questions:

- Do you know any tests that depend on the shutdown sequence?

- It is interesting that the only case where we currently fail at FastShutdownIfPossible() is when we have workers (It looks like that GetContentClient()->browser()->IsFastShutdownPossible() and SuddenTerminationAllowed() always return true). Do you know why we need the shutdown sequence if we have workers?

This logic was added by this CL not to run the shutdown sequence but to prolong life of a renderer process hosting out-of-process workers (ie. SharedWorker and ServiceWorker) until all workers are gone (+horo@, kinuko@)
 
- Should we aim at entirely removing RenderThreadImpl::Shutdown()? Or should we aim at removing only blink::shutdown()?

- Do you have any other concern?


--
Kentaro Hara, Tokyo, Japan

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CABg10jxu3ZUnrNHStG3fy%2BsuX8%2BeyW2uscmj_gdEJ3_A400DAQ%40mail.gmail.com.

John Abd-El-Malek

unread,
Jul 22, 2016, 11:20:34 AM7/22/16
to Hiroki Nakagawa, Kentaro Hara, platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Kinuko Yasuda
On Fri, Jul 22, 2016 at 3:32 AM, Hiroki Nakagawa <nhi...@chromium.org> wrote:


2016-07-22 18:56 GMT+09:00 Kentaro Hara <har...@chromium.org>:
Hi

I'm now thinking about a way to unconditionally shut down the renderer by SIGKILL (i.e., unconditionally use FastShutdownIfPossible()) and thus entirely remove the normal shutdown sequence (i.e., remove RenderThreadImpl::Shutdown). Per the discussion on the blink-dev@ thread, this seems like a right direction to fix various crashes we've been hitting in the shutdown sequence.

Before diving into the implementation details, I'd like to understand the concerns of removing the shutdown sequence. Let me ask a couple of questions:

- Do you know any tests that depend on the shutdown sequence?

- It is interesting that the only case where we currently fail at FastShutdownIfPossible() is when we have workers (It looks like that GetContentClient()->browser()->IsFastShutdownPossible()

btw this looks like a dead method and we can remove it. It was added for chromeframe.
 
and SuddenTerminationAllowed() always return true). Do you know why we need the shutdown sequence if we have workers?

This logic was added by this CL not to run the shutdown sequence but to prolong life of a renderer process hosting out-of-process workers (ie. SharedWorker and ServiceWorker) until all workers are gone (+horo@, kinuko@)

Once the workers are stopped though, fast shutdown would be possible. i.e. workers keep a process alive just like a tab, but once there aren't workers or pages left, we can do fast shutdown.
 
 
- Should we aim at entirely removing RenderThreadImpl::Shutdown()? Or should we aim at removing only blink::shutdown()?

- Do you have any other concern?


--
Kentaro Hara, Tokyo, Japan

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CABg10jxu3ZUnrNHStG3fy%2BsuX8%2BeyW2uscmj_gdEJ3_A400DAQ%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.

Nick Carter

unread,
Jul 22, 2016, 2:09:06 PM7/22/16
to John Abd-El-Malek, Hiroki Nakagawa, Kentaro Hara, platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Kinuko Yasuda, Charlie Reis, Avi Drissman, Camille Lamy
Overall, I support simplifying things so that shutdown is always browser-initiated.

What happens now -- with the renderer and the browser process both sort of responsible for triggering the death -- is quite a puzzle, and it's not clear that we get all the interactions right. For example, because RenderProcessHostImpl::OnShutdownRequest() doesn't mutate any state in the RPH, it's not clear that there's anything to suppress (pending_views_ || GetActiveViewCount() > 0) transitioning from false to true immediately after we tell the child process to start terminating.

The CPU sent closing/freeing/unregistering objects right before process shutdown is essentially wasteful, since the system needs to be resilient against unclean termination anyway. One thing we can, at least in theory, gain from clean renderer exit is a kind of proof-of-correctness in the program as a whole: inconsistencies (leaks, dangling references) that would cause errors during shutdown could be problems for normal operation too. There is also the matter of flushing any accumulated stats/tracing/histograms, which I'm not sure how good a job we do of presently. But for the most part, the user -- who doesn't care about any of that stuff -- would be best served by a simple SIGKILL.

> SuddenTerminationAllowed() always return true

Is it true that |RenderProcessHostImpl::sudden_termination_allowed_| is always true? SuddenTerminationChanged can be triggered by RenderProcessHostMsg_SuddenTerminationChanged, and it seems like RendererBlinkPlatformImpl::suddenTerminationChanged, which sends that message, has quite a few callers, including when an unload handler is present.

> tests

There's a chrome browsertest, FastShutdown.SlowTermination, that makes sure that document.cookie assignment from inside a page's unload handler make it to the cookie store. We ought to write a similar test for navigator.sendBeacon, which is an important thing to work from within an unload handler.

When looking at code history, I noticed http://crbug.com/461530, a fullscreen bug that appeared during a prior attempt to refactor sudden termination. That scenario may be worth considering.


Kentaro Hara

unread,
Jul 23, 2016, 1:41:12 PM7/23/16
to Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Kinuko Yasuda, Charlie Reis, Avi Drissman, Camille Lamy
Thanks all!

On Fri, Jul 22, 2016 at 8:09 PM, Nick Carter <ni...@chromium.org> wrote:
Overall, I support simplifying things so that shutdown is always browser-initiated.

What happens now -- with the renderer and the browser process both sort of responsible for triggering the death -- is quite a puzzle, and it's not clear that we get all the interactions right. For example, because RenderProcessHostImpl::OnShutdownRequest() doesn't mutate any state in the RPH, it's not clear that there's anything to suppress (pending_views_ || GetActiveViewCount() > 0) transitioning from false to true immediately after we tell the child process to start terminating.

The CPU sent closing/freeing/unregistering objects right before process shutdown is essentially wasteful, since the system needs to be resilient against unclean termination anyway. One thing we can, at least in theory, gain from clean renderer exit is a kind of proof-of-correctness in the program as a whole: inconsistencies (leaks, dangling references) that would cause errors during shutdown could be problems for normal operation too. There is also the matter of flushing any accumulated stats/tracing/histograms, which I'm not sure how good a job we do of presently.

Unfortunately this assumption is already broken :/ Blink and V8 finishes leaving a ton of objects as is.
 
But for the most part, the user -- who doesn't care about any of that stuff -- would be best served by a simple SIGKILL.

> SuddenTerminationAllowed() always return true

Is it true that |RenderProcessHostImpl::sudden_termination_allowed_| is always true? SuddenTerminationChanged can be triggered by RenderProcessHostMsg_SuddenTerminationChanged, and it seems like RendererBlinkPlatformImpl::suddenTerminationChanged, which sends that message, has quite a few callers, including when an unload handler is present.

Ah, good point. I was missing the IPC path.
 
> tests

There's a chrome browsertest, FastShutdown.SlowTermination, that makes sure that document.cookie assignment from inside a page's unload handler make it to the cookie store. We ought to write a similar test for navigator.sendBeacon, which is an important thing to work from within an unload handler.

When looking at code history, I noticed http://crbug.com/461530, a fullscreen bug that appeared during a prior attempt to refactor sudden termination. That scenario may be worth considering.

Thanks, this kind of info is really helpful to understand the situation.

Maybe in the first step, we should aim at removing only blink::shutdown. Once it's done, we can aim at removing more things from RenderThreadImpl::Shutdown.

Kentaro Hara

unread,
Jul 25, 2016, 2:13:28 PM7/25/16
to Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Kinuko Yasuda, Charlie Reis, Avi Drissman, Camille Lamy
WebView is unconditionally killing the renderer by SIGKILL. Why doesn't it cause compatibility problems for unload handlers, out-of-process workers etc?

Kinuko Yasuda

unread,
Jul 26, 2016, 8:55:34 AM7/26/16
to Kentaro Hara, Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Richard Coles, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Charlie Reis, Avi Drissman, Camille Lamy
I'm not very familiar with WebView behavior but there are compatibility problems.  Unload behavior is hugely inconsistent esp. on Android, that's one of the reason we want to kill unload handler completely (c.f. https://crbug.com/154321 tells about it, and Ilya has a nice blog post about this)-- while handling unload event is still somewhat popular on Web.  Shared worker is not supported in WebView, and Service worker support is being requested but I don't think we're officially supporting it yet.

Btw SuddenTerminationAllowed() is also used to keep a renderer around while background storage request is being processed, e.g. data is being transferred from renderer to browser (so that they can be persisted on disk or can be shared with / transferred to another renderer).  I don't know how they work on WebView, and I suppose there're still cases where we could lose such data in other sudden-death cases.

Torne (Richard Coles)

unread,
Jul 26, 2016, 9:20:55 AM7/26/16
to Kinuko Yasuda, Kentaro Hara, Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Charlie Reis, Avi Drissman, Camille Lamy
On Tue, 26 Jul 2016 at 13:55 Kinuko Yasuda <kin...@chromium.org> wrote:
I'm not very familiar with WebView behavior but there are compatibility problems.  Unload behavior is hugely inconsistent esp. on Android, that's one of the reason we want to kill unload handler completely (c.f. https://crbug.com/154321 tells about it, and Ilya has a nice blog post about this)-- while handling unload event is still somewhat popular on Web.  Shared worker is not supported in WebView, and Service worker support is being requested but I don't think we're officially supporting it yet.

Btw SuddenTerminationAllowed() is also used to keep a renderer around while background storage request is being processed, e.g. data is being transferred from renderer to browser (so that they can be persisted on disk or can be shared with / transferred to another renderer).  I don't know how they work on WebView, and I suppose there're still cases where we could lose such data in other sudden-death cases.


On Tue, Jul 26, 2016 at 3:12 AM, Kentaro Hara <har...@chromium.org> wrote:
WebView is unconditionally killing the renderer by SIGKILL. Why doesn't it cause compatibility problems for unload handlers, out-of-process workers etc?

It's a bit more complicated than that. WebView doesn't have separate renderers at all (it runs in single process mode), and as such doesn't ever shut down the renderer at all, abruptly or cleanly.

Chrome on Android has separate renderer processes, but these will be SIGKILLed by the Android OOM killer any time the OS wants to free up memory (in order of least recently used).

Chrome's browser process gets notified by the OS when it's sent to the background and when it's expected to save state to disk, via onPause/onStop/onLowMemory/etc callbacks, and Chrome tries to make sure everything that must be persisted to disk is persisted in these callbacks, but once these callbacks are triggered, the OS is *also* free to SIGKILL the main browser process as well whenever it wishes. (in extreme low memory situations it can be SIGKILLed without these callbacks happening at all but this is rare and should be thought of as a crash scenario really).

In WebView, though, these callbacks are *not received* by WebView, and there's basically no signal at all that WebView can use to know when it must flush state to disk. We rely entirely on chromium having short enough periodic write timers that data loss is uncommon :(

Unload handlers are likely to just never run, in both Chrome and WebView on Android. The compatibility problems you are imagining just... are real problems that aren't solved and probably can't ever be solved.

Kentaro Hara

unread,
Jul 26, 2016, 9:44:39 AM7/26/16
to Torne (Richard Coles), Kinuko Yasuda, Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Charlie Reis, Avi Drissman, Camille Lamy
Thanks Torne and kinuko-san!

I now understand that it would be hard to unconditionally kill the renderer by FastShutdownIfPossible() because there're a couple of things that cannot cope with the fast shutdown (e.g., unload handlers, service workers, background storage).

Then how about this idea?

- Keep supporting unload handlers, service workers, background storage as is.

- Stop shutting down things in RenderThreadmpl::Shutdown(). For example, stop calling blink::shutdown(), ChildThreadImpl::ShutdownDiscardableSharedMemoryManager(), main_message_loop_.reset() etc. In other words, let the renderer shut down without shutting down things gracefully.

I hope this will be enough to fix use-after-free crashes that happen around the shutdown sequence. Also it will reduce the shutdown overhead. Will it work?

Kinuko Yasuda

unread,
Jul 27, 2016, 11:18:12 AM7/27/16
to Kentaro Hara, Torne (Richard Coles), Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Charlie Reis, Avi Drissman, Camille Lamy
Yep, the current FastShutdownIfPossible() does the two things (check & fast shutdown) but I believe it should be possible to keep the former logic for keeping the renderer alive when necessary around while changing the actual shutdown sequence to do fast shutdown / SIGKILL etc.

Kentaro Hara

unread,
Aug 23, 2016, 12:32:58 AM8/23/16
to Kinuko Yasuda, Torne (Richard Coles), Nick Carter, John Abd-El-Malek, Hiroki Nakagawa, platform-architecture-dev, Ojan Vafai, Jochen Eisinger, Tsuyoshi Horo, Charlie Reis, Avi Drissman, Camille Lamy
I landed a CL to stop calling blink::shutdown and now am trying to remove base::RunLoop().RunUntilIdle() from the shutdown sequence (CL). Do you have any concern about it? The base::RunLoop().RunUntilIdle() has been another source of use-after-free bugs because it runs arbitrary tasks after many things have been shut down.

What I don't fully understand is where unload handlers are currently handled. I don't think that the unload handlers are handled in the base::RunLoop().RunUntilIdle() and thus it's safe to remove it, but correct me if I'm wrong :)





Thanks all!

To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.



--
Kentaro Hara, Tokyo, Japan



--
Kentaro Hara, Tokyo, Japan



--
Kentaro Hara, Tokyo, Japan
Reply all
Reply to author
Forward
0 new messages