PSA: Blink's initialization/shutdown sequence is now elegant :D

218 views
Skip to first unread message

Kentaro Hara

unread,
Mar 22, 2016, 7:23:53 AM3/22/16
to blink-dev
Hi

Blink's initialization/shutdown sequence was a kind of spaghetti and a source of various crashes. So I cleaned up the sequence (*) and guaranteed the following things:

- initialize() and shutdown() are balanced; i.e., shutdown() shuts down things in the reverse order in which they are initialized.

- The main thread joins all other threads before shutting down (**).

- blink::initialize/shutdown initializes/shuts down all the blink components (web/, core/, modules/, platform/ and wtf/).

- Platform::initialize/shutdown initializes/shuts down only platform/ and wtf/. You can use Platform::initialize/shutdown in tests that only need platform/ and wtf/.

Thanks!


(*) Thanks Sigbjorn for reviewing a bunch of patches that handle many subtle scenarios.

(**) This doesn't mean that I regressed the performance of the shutdown sequence. It was already guaranteed that the main thread is the last thread to shut down except some edge cases that happen only in testing code. I just changed the shutdown sequence of the edge cases. (Also I don't think that the performance of the shutdown sequence really matters because in common cases Chrome forcibly kills the renderer process rather than shutting down Blink gracefully.)


--
Kentaro Hara, Tokyo, Japan

Dimitri Glazkov

unread,
Mar 22, 2016, 11:33:39 AM3/22/16
to Kentaro Hara, blink-dev
\o/

:DG<

Alexandre Elias

unread,
Mar 22, 2016, 6:47:42 PM3/22/16
to Kentaro Hara, blink-dev
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

On Tue, Mar 22, 2016 at 4:23 AM, Kentaro Hara <har...@chromium.org> wrote:

Kentaro Hara

unread,
Mar 22, 2016, 7:44:23 PM3/22/16
to Alexandre Elias, Ojan Vafai, blink-dev
On Wed, Mar 23, 2016 at 7:47 AM, Alexandre Elias <ael...@chromium.org> wrote:
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

Ojan: I vaguely remember you know when the shutdown sequence is used in production builds...?


 
On Tue, Mar 22, 2016 at 4:23 AM, Kentaro Hara <har...@chromium.org> wrote:
Hi

Blink's initialization/shutdown sequence was a kind of spaghetti and a source of various crashes. So I cleaned up the sequence (*) and guaranteed the following things:

- initialize() and shutdown() are balanced; i.e., shutdown() shuts down things in the reverse order in which they are initialized.

- The main thread joins all other threads before shutting down (**).

- blink::initialize/shutdown initializes/shuts down all the blink components (web/, core/, modules/, platform/ and wtf/).

- Platform::initialize/shutdown initializes/shuts down only platform/ and wtf/. You can use Platform::initialize/shutdown in tests that only need platform/ and wtf/.

Thanks!


(*) Thanks Sigbjorn for reviewing a bunch of patches that handle many subtle scenarios.

(**) This doesn't mean that I regressed the performance of the shutdown sequence. It was already guaranteed that the main thread is the last thread to shut down except some edge cases that happen only in testing code. I just changed the shutdown sequence of the edge cases. (Also I don't think that the performance of the shutdown sequence really matters because in common cases Chrome forcibly kills the renderer process rather than shutting down Blink gracefully.)


--
Kentaro Hara, Tokyo, Japan

Torne (Richard Coles)

unread,
Mar 23, 2016, 7:14:46 AM3/23/16
to Alexandre Elias, Kentaro Hara, blink-dev
On Tue, 22 Mar 2016 at 22:47 Alexandre Elias <ael...@chromium.org> wrote:
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

WebView is not supposed to ever shut down (android apps basically have no "exit" codepath at all, and just get SIGKILL when the framework is done with them).

Though, there is an open bug (http://crbug.com/514141) about WebView managing to hit the webkit shutdown path and then subsequently try to initialise it again in the same process, causing a crash (since this isn't supported). If anyone has any ideas how we could be getting to shut down in the first place I'd like to know, as we've not managed to find a cause for this yet :(

Ojan Vafai

unread,
Mar 24, 2016, 2:15:46 PM3/24/16
to Kentaro Hara, Alexandre Elias, blink-dev
On Tue, Mar 22, 2016 at 4:44 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 23, 2016 at 7:47 AM, Alexandre Elias <ael...@chromium.org> wrote:
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

Ojan: I vaguely remember you know when the shutdown sequence is used in production builds...?

We fast kill processes that have only the tab being closed in that process as long as the tab doesn't have beforeunload/unload handlers. For tabs that have unload handlers, we shutdown that tab more gracefully, but I don't think blink::shutdown is that code. As best I can tell, blink::shutdown is what we call when we we've already run all the unload handlers and decided to terminate the process.

I'm not really sure why we ever need a clean termination of a render process. Maybe someone else knows?

Kentaro Hara

unread,
Mar 24, 2016, 11:47:21 PM3/24/16
to Ojan Vafai, Alexandre Elias, blink-dev
On Fri, Mar 25, 2016 at 3:15 AM, Ojan Vafai <oj...@chromium.org> wrote:


On Tue, Mar 22, 2016 at 4:44 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 23, 2016 at 7:47 AM, Alexandre Elias <ael...@chromium.org> wrote:
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

Ojan: I vaguely remember you know when the shutdown sequence is used in production builds...?

We fast kill processes that have only the tab being closed in that process as long as the tab doesn't have beforeunload/unload handlers. For tabs that have unload handlers, we shutdown that tab more gracefully, but I don't think blink::shutdown is that code. As best I can tell, blink::shutdown is what we call when we we've already run all the unload handlers and decided to terminate the process.

I'm not really sure why we ever need a clean termination of a render process. Maybe someone else knows?

We need to shut down various things when finishing a renderer process (e.g., shut down discardable memory used by the renderer, close gpu channels etc). To shut down those things safely, we need to shut down all Blink threads (which may touch the things). Then we need a clean termination.

Also it is important to have a clean termination for memory leak detectors.


 
On Tue, Mar 22, 2016 at 4:23 AM, Kentaro Hara <har...@chromium.org> wrote:
Hi

Blink's initialization/shutdown sequence was a kind of spaghetti and a source of various crashes. So I cleaned up the sequence (*) and guaranteed the following things:

- initialize() and shutdown() are balanced; i.e., shutdown() shuts down things in the reverse order in which they are initialized.

- The main thread joins all other threads before shutting down (**).

- blink::initialize/shutdown initializes/shuts down all the blink components (web/, core/, modules/, platform/ and wtf/).

- Platform::initialize/shutdown initializes/shuts down only platform/ and wtf/. You can use Platform::initialize/shutdown in tests that only need platform/ and wtf/.

Thanks!


(*) Thanks Sigbjorn for reviewing a bunch of patches that handle many subtle scenarios.

(**) This doesn't mean that I regressed the performance of the shutdown sequence. It was already guaranteed that the main thread is the last thread to shut down except some edge cases that happen only in testing code. I just changed the shutdown sequence of the edge cases. (Also I don't think that the performance of the shutdown sequence really matters because in common cases Chrome forcibly kills the renderer process rather than shutting down Blink gracefully.)


--
Kentaro Hara, Tokyo, Japan




--
Kentaro Hara, Tokyo, Japan

Ojan Vafai

unread,
Mar 25, 2016, 12:08:39 AM3/25/16
to Kentaro Hara, Alexandre Elias, blink-dev
On Thu, Mar 24, 2016 at 8:47 PM Kentaro Hara <har...@chromium.org> wrote:
On Fri, Mar 25, 2016 at 3:15 AM, Ojan Vafai <oj...@chromium.org> wrote:
On Tue, Mar 22, 2016 at 4:44 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 23, 2016 at 7:47 AM, Alexandre Elias <ael...@chromium.org> wrote:
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

Ojan: I vaguely remember you know when the shutdown sequence is used in production builds...?

We fast kill processes that have only the tab being closed in that process as long as the tab doesn't have beforeunload/unload handlers. For tabs that have unload handlers, we shutdown that tab more gracefully, but I don't think blink::shutdown is that code. As best I can tell, blink::shutdown is what we call when we we've already run all the unload handlers and decided to terminate the process.

I'm not really sure why we ever need a clean termination of a render process. Maybe someone else knows?

We need to shut down various things when finishing a renderer process (e.g., shut down discardable memory used by the renderer, close gpu channels etc). To shut down those things safely, we need to shut down all Blink threads (which may touch the things). Then we need a clean termination.

Also it is important to have a clean termination for memory leak detectors.

Takashi Toyoshima

unread,
Mar 28, 2016, 1:41:56 AM3/28/16
to Ojan Vafai, Kentaro Hara, Alexandre Elias, blink-dev
Let me join this thread because I had a question around this area.

On Fri, Mar 25, 2016 at 1:08 PM, Ojan Vafai <oj...@chromium.org> wrote:
On Thu, Mar 24, 2016 at 8:47 PM Kentaro Hara <har...@chromium.org> wrote:
On Fri, Mar 25, 2016 at 3:15 AM, Ojan Vafai <oj...@chromium.org> wrote:
On Tue, Mar 22, 2016 at 4:44 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 23, 2016 at 7:47 AM, Alexandre Elias <ael...@chromium.org> wrote:
Sounds great!

Out of the curiosity, when is the shutdown sequence used?  Is it primarily for Android WebView (our only shipping configuration in single-process mode)?

Ojan: I vaguely remember you know when the shutdown sequence is used in production builds...?

We fast kill processes that have only the tab being closed in that process as long as the tab doesn't have beforeunload/unload handlers. For tabs that have unload handlers, we shutdown that tab more gracefully, but I don't think blink::shutdown is that code. As best I can tell, blink::shutdown is what we call when we we've already run all the unload handlers and decided to terminate the process.

I'm not really sure why we ever need a clean termination of a render process. Maybe someone else knows?

We need to shut down various things when finishing a renderer process (e.g., shut down discardable memory used by the renderer, close gpu channels etc). To shut down those things safely, we need to shut down all Blink threads (which may touch the things). Then we need a clean termination.

Also it is important to have a clean termination for memory leak detectors.

I don't understand how fast process killing works in that case.


There seems to be a special hack to ignore shutdown-only leaks.

But, it may not work perfectly because I saw a curious leak bug in http://crbug.com/521831. There, fast shutdown was running, and blink shutdown() wasn't called. But some blink objects were reported as direct leaks.
I gave up to understand what this hack actually did. So if someone who knows this area can help to triage this leak bug, that's great.
 

 


 
On Tue, Mar 22, 2016 at 4:23 AM, Kentaro Hara <har...@chromium.org> wrote:
Hi

Blink's initialization/shutdown sequence was a kind of spaghetti and a source of various crashes. So I cleaned up the sequence (*) and guaranteed the following things:

- initialize() and shutdown() are balanced; i.e., shutdown() shuts down things in the reverse order in which they are initialized.

- The main thread joins all other threads before shutting down (**).

- blink::initialize/shutdown initializes/shuts down all the blink components (web/, core/, modules/, platform/ and wtf/).

- Platform::initialize/shutdown initializes/shuts down only platform/ and wtf/. You can use Platform::initialize/shutdown in tests that only need platform/ and wtf/.

Thanks!


(*) Thanks Sigbjorn for reviewing a bunch of patches that handle many subtle scenarios.

(**) This doesn't mean that I regressed the performance of the shutdown sequence. It was already guaranteed that the main thread is the last thread to shut down except some edge cases that happen only in testing code. I just changed the shutdown sequence of the edge cases. (Also I don't think that the performance of the shutdown sequence really matters because in common cases Chrome forcibly kills the renderer process rather than shutting down Blink gracefully.)


--
Kentaro Hara, Tokyo, Japan




--
Kentaro Hara, Tokyo, Japan



--
Kentaro Hara, Tokyo, Japan



--
Takashi Toyoshima
Software Engineer, Google

Jochen Eisinger

unread,
Mar 29, 2016, 12:10:54 PM3/29/16
to Takashi Toyoshima, Ojan Vafai, Kentaro Hara, Alexandre Elias, blink-dev

We used to have a clean shutdown mode for testing, but that never really worked, so it was removed. Right now, there's a race between the renderer's shutdown code and the renderer process dying. Note that this might still result in unexpected crashes. For shared workers, we added another code path where the renderer blocks in a sync IPC until it is killed to avoid the shutdown code path all together.

The big problems with shutdown are that we don't fully destruct blink and v8, there's just an implicit contract that you mustn't call into blink anymore after calling shutdown. On the content/ side that repeatedly led to problems where an async task wasn't notified that blink went away and tries to call into it, or some destructor trying to send signals, or deregister something in blink.

So tl;Dr: shutdown == sadness

Kentaro Hara

unread,
Mar 29, 2016, 8:30:08 PM3/29/16
to Jochen Eisinger, Takashi Toyoshima, Ojan Vafai, Alexandre Elias, blink-dev
On Wed, Mar 30, 2016 at 1:10 AM, Jochen Eisinger <joc...@chromium.org> wrote:

We used to have a clean shutdown mode for testing, but that never really worked, so it was removed.


A bunch of *_unit_tests and *_browser_tests are still calling blink::shutdown. 
 

Right now, there's a race between the renderer's shutdown code and the renderer process dying. Note that this might still result in unexpected crashes. For shared workers, we added another code path where the renderer blocks in a sync IPC until it is killed to avoid the shutdown code path all together.

The big problems with shutdown are that we don't fully destruct blink and v8, there's just an implicit contract that you mustn't call into blink anymore after calling shutdown. On the content/ side that repeatedly led to problems where an async task wasn't notified that blink went away and tries to call into it, or some destructor trying to send signals, or deregister something in blink.

So tl;Dr: shutdown == sadness


Yeah, there were a lot of bugs like that. Since blink::shutdown shuts down Oilpan, PartitionAlloc and V8, it's highly likely that any call into Blink after blink::shutdown crashes.

BTW, if that is the case, we'll get back to Ojan's original question -- why do we need a graceful shutdown for renderers in the first place (because the graceful shutdown is causing the above issues)? If we always forcibly kill the renderers, will the issues be gone?

Ojan Vafai

unread,
Mar 30, 2016, 1:45:21 PM3/30/16
to Kentaro Hara, Jochen Eisinger, Takashi Toyoshima, Alexandre Elias, blink-dev
On Tue, Mar 29, 2016 at 5:30 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 30, 2016 at 1:10 AM, Jochen Eisinger <joc...@chromium.org> wrote:

We used to have a clean shutdown mode for testing, but that never really worked, so it was removed.


A bunch of *_unit_tests and *_browser_tests are still calling blink::shutdown. 
 

Right now, there's a race between the renderer's shutdown code and the renderer process dying. Note that this might still result in unexpected crashes. For shared workers, we added another code path where the renderer blocks in a sync IPC until it is killed to avoid the shutdown code path all together.

The big problems with shutdown are that we don't fully destruct blink and v8, there's just an implicit contract that you mustn't call into blink anymore after calling shutdown. On the content/ side that repeatedly led to problems where an async task wasn't notified that blink went away and tries to call into it, or some destructor trying to send signals, or deregister something in blink.

So tl;Dr: shutdown == sadness


Yeah, there were a lot of bugs like that. Since blink::shutdown shuts down Oilpan, PartitionAlloc and V8, it's highly likely that any call into Blink after blink::shutdown crashes.

BTW, if that is the case, we'll get back to Ojan's original question -- why do we need a graceful shutdown for renderers in the first place (because the graceful shutdown is causing the above issues)? If we always forcibly kill the renderers, will the issues be gone?

Seems like figuring out this question and making sense of our shutdown would be a good project for the architecture team. Is this already on the backlog?

Nico Weber

unread,
Mar 30, 2016, 1:59:21 PM3/30/16
to Ojan Vafai, Kentaro Hara, Jochen Eisinger, Takashi Toyoshima, Alexandre Elias, blink-dev
On Wed, Mar 30, 2016 at 1:45 PM, Ojan Vafai <oj...@chromium.org> wrote:
On Tue, Mar 29, 2016 at 5:30 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 30, 2016 at 1:10 AM, Jochen Eisinger <joc...@chromium.org> wrote:

We used to have a clean shutdown mode for testing, but that never really worked, so it was removed.


A bunch of *_unit_tests and *_browser_tests are still calling blink::shutdown. 
 

Right now, there's a race between the renderer's shutdown code and the renderer process dying. Note that this might still result in unexpected crashes. For shared workers, we added another code path where the renderer blocks in a sync IPC until it is killed to avoid the shutdown code path all together.

The big problems with shutdown are that we don't fully destruct blink and v8, there's just an implicit contract that you mustn't call into blink anymore after calling shutdown. On the content/ side that repeatedly led to problems where an async task wasn't notified that blink went away and tries to call into it, or some destructor trying to send signals, or deregister something in blink.

So tl;Dr: shutdown == sadness


Yeah, there were a lot of bugs like that. Since blink::shutdown shuts down Oilpan, PartitionAlloc and V8, it's highly likely that any call into Blink after blink::shutdown crashes.

BTW, if that is the case, we'll get back to Ojan's original question -- why do we need a graceful shutdown for renderers in the first place (because the graceful shutdown is causing the above issues)? If we always forcibly kill the renderers, will the issues be gone?

Seems like figuring out this question and making sense of our shutdown would be a good project for the architecture team. Is this already on the backlog?

I'm not sure. I think Chrome's long-term goal is to just _exit() processes when they quit since that's faster and we need to not corrupt on-disk data structures well even when quitting in the middle of things since that can happen at any time due to crashes. So I think in general we should never call shutdown, so maybe making it work well is time better invested elsewhere.

Ojan Vafai

unread,
Mar 30, 2016, 2:06:15 PM3/30/16
to Nico Weber, Kentaro Hara, Jochen Eisinger, Takashi Toyoshima, Alexandre Elias, blink-dev
On Wed, Mar 30, 2016 at 10:59 AM Nico Weber <tha...@chromium.org> wrote:
On Wed, Mar 30, 2016 at 1:45 PM, Ojan Vafai <oj...@chromium.org> wrote:
On Tue, Mar 29, 2016 at 5:30 PM Kentaro Hara <har...@chromium.org> wrote:
On Wed, Mar 30, 2016 at 1:10 AM, Jochen Eisinger <joc...@chromium.org> wrote:

We used to have a clean shutdown mode for testing, but that never really worked, so it was removed.


A bunch of *_unit_tests and *_browser_tests are still calling blink::shutdown. 
 

Right now, there's a race between the renderer's shutdown code and the renderer process dying. Note that this might still result in unexpected crashes. For shared workers, we added another code path where the renderer blocks in a sync IPC until it is killed to avoid the shutdown code path all together.

The big problems with shutdown are that we don't fully destruct blink and v8, there's just an implicit contract that you mustn't call into blink anymore after calling shutdown. On the content/ side that repeatedly led to problems where an async task wasn't notified that blink went away and tries to call into it, or some destructor trying to send signals, or deregister something in blink.

So tl;Dr: shutdown == sadness


Yeah, there were a lot of bugs like that. Since blink::shutdown shuts down Oilpan, PartitionAlloc and V8, it's highly likely that any call into Blink after blink::shutdown crashes.

BTW, if that is the case, we'll get back to Ojan's original question -- why do we need a graceful shutdown for renderers in the first place (because the graceful shutdown is causing the above issues)? If we always forcibly kill the renderers, will the issues be gone?

Seems like figuring out this question and making sense of our shutdown would be a good project for the architecture team. Is this already on the backlog?

I'm not sure. I think Chrome's long-term goal is to just _exit() processes when they quit since that's faster and we need to not corrupt on-disk data structures well even when quitting in the middle of things since that can happen at any time due to crashes. So I think in general we should never call shutdown, so maybe making it work well is time better invested elsewhere.

That's what I meant. We should have someone investigating doing that. :)

Bruce Dawson

unread,
Mar 30, 2016, 2:20:01 PM3/30/16
to Ojan Vafai, Nico Weber, Kentaro Hara, Jochen Eisinger, Takashi Toyoshima, Alexandre Elias, blink-dev
Terminating processes with _exit (or TerminateProcess or similar 'unclean' exit strategies) can also help with performance. A clean shutdown on a low-memory system can require paging in lots of memory merely to free it. This can cause shutdown to take arbitrarily long (many seconds or even minutes) with no value to the user, who may just want to restart Chrome and resume browsing.

Kentaro Hara

unread,
Mar 30, 2016, 7:42:54 PM3/30/16
to Bruce Dawson, Ojan Vafai, Nico Weber, Jochen Eisinger, Takashi Toyoshima, Alexandre Elias, blink-dev
So, our goal is to make production builds not call shutdown at all. Added it to the backlog of the arch team.

What about tests? Currently the leak detector depends on a clean shutdown somehow, but I think it would be possible to get rid of the assumption.

Ojan Vafai

unread,
Mar 30, 2016, 11:51:47 PM3/30/16
to Kentaro Hara, Bruce Dawson, Nico Weber, Jochen Eisinger, Takashi Toyoshima, Alexandre Elias, blink-dev

Sgtm

Jochen Eisinger

unread,
Mar 31, 2016, 5:19:21 AM3/31/16
to Ojan Vafai, Kentaro Hara, Bruce Dawson, Nico Weber, Takashi Toyoshima, Alexandre Elias, blink-dev
Note that there's a difference between shutting down a renderer and calling blink::shutdown() - the latter was maybe not elegant but basically always worked.

Also not that for the leak detection tools, we basically whitelist all blink and v8 objects anyways, as their shutdown methods don't free them up.

Ojan Vafai

unread,
Mar 31, 2016, 5:19:58 PM3/31/16
to Jochen Eisinger, Kentaro Hara, Bruce Dawson, Nico Weber, Takashi Toyoshima, Alexandre Elias, blink-dev
On Thu, Mar 31, 2016 at 2:19 AM Jochen Eisinger <joc...@chromium.org> wrote:
Note that there's a difference between shutting down a renderer and calling blink::shutdown() - the latter was maybe not elegant but basically always worked.

As in, blink::shutdown is turning down the renderer process but you can have multiple renderers in a process? Or do you mean something else?

Jochen Eisinger

unread,
Mar 31, 2016, 5:30:09 PM3/31/16
to Ojan Vafai, Kentaro Hara, Bruce Dawson, Nico Weber, Takashi Toyoshima, Alexandre Elias, blink-dev
blink::shutdown() only freezes Blink and V8. To stop the renderer, you need to run RenderThreadImpl::Shutdown which is where it becomes hairy.

Shane Stephens

unread,
Mar 31, 2016, 5:52:32 PM3/31/16
to Jochen Eisinger, Ojan Vafai, Kentaro Hara, Bruce Dawson, Nico Weber, Takashi Toyoshima, Alexandre Elias, blink-dev
Fast shutdown is possibly causing inaccurate UMA counts (https://bugs.chromium.org/p/chromium/issues/detail?id=513059).

It would be nice if the arch team took on cleaning up UMA logging as part of the shutdown task.

Cheers,
    -Shane 

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.

Elliott Sprehn

unread,
Mar 31, 2016, 6:24:33 PM3/31/16
to Shane Stephens, Jochen Eisinger, Ojan Vafai, Kentaro Hara, Bruce Dawson, Nico Weber, Takashi Toyoshima, Alexandre Elias, blink-dev
My understanding is that someone was looking at moving UMA to shared memory or something to avoid this. I'm not sure what the progress of that is.

(Yes we'll add shutdown UMA logging to our backlog).
Reply all
Reply to author
Forward
0 new messages