Should we stop supporting clean shutdown in Chrome? [was: How does Chrome shutdown work on Android?]

328 views
Skip to first unread message

Colin Blundell

unread,
Sep 2, 2016, 3:58:38 AM9/2/16
to chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, si...@chromium.org, rob...@chromium.org, fdo...@chromium.org
Hi all,

gab@ has raised an interesting question:

"Given that [Android never attempts clean shutdown] and that Android covers a good chunk of the userbase, should we just drop attempting to gracefully shutdown on all platforms?!

...

Shutdown in Chrome is known to be very complex, problematic, and often too slow; given Android already has to reliably handle being shot at any point, making this the default exit path on all platforms might solve many issues (ref. the egregious shutdown hangs we've been chasing on Windows for years)."

gab@ makes the arguments for. So far the potential arguments that I've seen against are:

1. pinkerton@ notes that iOS goes through clean Chrome shutdown on some instances.
2. siggi@ notes that unclean shutdown is "very much best effort, as it may have to abandon laggy writes, doesn't wind up threads that are in mid-operation etc".

Re: 1, I'm interested to hear more from Mike about when and why iOS goes through clean and unclean shutdown; in particular, does iOS have to deal with system-initiated unclean shutdown, similar to Android?

Re: 2, given that unclean shutdown *is* shutdown on Android, it seems to me that the implication here is that to the extent that unclean shutdown can leave the user in an inconsistent state, we need to work to harden the codebase against that.

What do people think?

Thanks,

Colin

Marshall Greenblatt

unread,
Sep 2, 2016, 4:04:05 AM9/2/16
to Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
 

Thanks,

Colin

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Colin Blundell

unread,
Sep 2, 2016, 4:10:30 AM9/2/16
to Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Ah, right. I think this discussion is a generalization of that one though, right? From scanning that thread, I interpret the conclusions there as:

- Make singletons leaky be default
- It's a bug if unclean shutdown results in inconsistent persistent state or inconsistent state on restart

Does that match your understanding?

I feel like the end of that discussion points to the start of this one :).

Thanks,

Colin

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Ryo Hashimoto

unread,
Sep 2, 2016, 4:13:30 AM9/2/16
to magree...@gmail.com, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
+1 for eliminating clean shutdown support.
There are many Chrome OS crash reports caused by Chrome being slow to shutdown (e.g. crbug.com/474869).
Also, relying on clean shutdown to save user data is fundamentally vulnerable to unexpected events like Chrome crash, OS crash, and device power loss.

Mattias Nissler

unread,
Sep 2, 2016, 4:41:44 AM9/2/16
to Ryo Hashimoto, magree...@gmail.com, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
IIRC, another context the shutdown behavior discussion came up in the past was multi-profile. We need to be able shut down a profile cleanly when the user closes the last window, so that part of the clean shutdown path must be kept functional (at least as long as we support multi-profile, IIRC it was a bit of work to get it functional when multi-profile was first introduced).

Marshall Greenblatt

unread,
Sep 2, 2016, 4:59:43 AM9/2/16
to Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 11:09 AM, Colin Blundell <blun...@chromium.org> wrote:
Ah, right. I think this discussion is a generalization of that one though, right? From scanning that thread, I interpret the conclusions there as:

- Make singletons leaky be default
- It's a bug if unclean shutdown results in inconsistent persistent state or inconsistent state on restart

Does that match your understanding?

Sounds about right. Do we have tests currently that verify state across shutdown (forced or otherwise)?
 
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Peter Kasting

unread,
Sep 2, 2016, 5:26:22 AM9/2/16
to Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 1:09 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Also, relying on clean shutdown to save user data is fundamentally vulnerable to unexpected events like Chrome crash, OS crash, and device power loss.

Not inherently true.

For example, parts of the omnibox code need to track stats condensed from the user's (much larger) complete history.  If those stats are out of date, they can be recomputed on startup.

So the omnibox tries to write out updated stats periodically, especially at shutdown.  If it fails (e.g. crash before then), the cost is not correctness, it's startup perf on the next run.  Startup perf is extremely valuable and we don't want to regress that.  But it's also not clear we want to be writing to disk constantly to ensure our stats are updated on every history event.

I am in general very skeptical of the idea that shutdown can or should be entirely eliminated.  The useful thing, which we'd need to do anyway before doing that, would be to understand what currently happens at shutdown and try to eliminate as much of it as possible.  Then a decision about what is left can be made in an informed fashion.

So perhaps it's just phrasing, but I would have phrased things less provocatively about "how can we find and eliminate unnecessary work during shutdown?"

PK

Ryo Hashimoto

unread,
Sep 2, 2016, 7:03:11 AM9/2/16
to Peter Kasting, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 6:25 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 1:09 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Also, relying on clean shutdown to save user data is fundamentally vulnerable to unexpected events like Chrome crash, OS crash, and device power loss.

Not inherently true.

For example, parts of the omnibox code need to track stats condensed from the user's (much larger) complete history.  If those stats are out of date, they can be recomputed on startup.

So the omnibox tries to write out updated stats periodically, especially at shutdown.  If it fails (e.g. crash before then), the cost is not correctness, it's startup perf on the next run.  Startup perf is extremely valuable and we don't want to regress that.  But it's also not clear we want to be writing to disk constantly to ensure our stats are updated on every history event.
Isn't the said situation already happening on Android? Clean shutdown never happens on that platform as gab@ said.
If it's causing a serious startup perf issue, IMO what should be fixed is the startup code.

I am in general very skeptical of the idea that shutdown can or should be entirely eliminated.  The useful thing, which we'd need to do anyway before doing that, would be to understand what currently happens at shutdown and try to eliminate as much of it as possible.  Then a decision about what is left can be made in an informed fashion.

So perhaps it's just phrasing, but I would have phrased things less provocatively about "how can we find and eliminate unnecessary work during shutdown?"
If we are to choose what to save during shutdown, reconstructable data (incl. the said omnibox stats) would be excluded as losing it is not fatal.
So, categorizing data based on its reconstructivity:
  Data which can be reconstructed -> You shouldn't spend time on it during shutdown.
  Data which cannot be reconstructed -> You shouldn't rely on clean shutdown to persist it as it can be lost on unexpected events or when running on Android.
IMHO this means there is no point in supporting clean shutdown.

PK

Mike Pinkerton

unread,
Sep 2, 2016, 10:27:47 AM9/2/16
to Ryo Hashimoto, Peter Kasting, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
I don't believe my comment about iOS going through clean shutdown was an argument against the suggestion. It was just a statement that we shouldn't assume that no mobile platform handles clean shutdown (and to please test that shutdown CL on iOS!). 

For background, when we started Chrome Mobile (on both iOS and Android), Chrome relied heavily on clean shutdowns to clean up and flush user state. We had to do a lot of work to make it behave in this new mobile world, often shifting the work to when the app was backgrounded. In around iOS6 or so, Apple added a mode to the app lifecycle to account for clean shutdowns so we actually had to do additional work (hooking up the shutdown code which we never planned to include) in order to support this.

Personally I think we should get rid of clean shutdown for all platforms. There are uses for this on desktop as well, for example recent macos versions support something called "Sudden Termination" (*) and it would be great if we could teach desktop to handle this cleanly as well. 


--
Mike Pinkerton
Mac Weenie
pink...@google.com

Scott Violet

unread,
Sep 2, 2016, 10:30:45 AM9/2/16
to Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
gab also pointed out on Windows we go through the unclean shutdown
path when the user logs out/powers-down. In this case we can't go
through the clean shutdown path because of the state windows is in
(our clean shutdown path on Windows relies on getting certain messages
that are not delivered in this state).

To me the argument comes down to we are forced to support the unclean
shutdown path, so why bother also maintaining the clean shutdown path?

-Scott

Nico Weber

unread,
Sep 2, 2016, 10:37:26 AM9/2/16
to Scott Violet, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 10:29 AM, Scott Violet <s...@chromium.org> wrote:
gab also pointed out on Windows we go through the unclean shutdown
path when the user logs out/powers-down. In this case we can't go
through the clean shutdown path because of the state windows is in
(our clean shutdown path on Windows relies on getting certain messages
that are not delivered in this state).

To me the argument comes down to we are forced to support the unclean
shutdown path, so why bother also maintaining the clean shutdown path?

I strongly agree with this :-)
 

  -Scott

Colin Blundell

unread,
Sep 2, 2016, 10:40:06 AM9/2/16
to Mike Pinkerton, Ryo Hashimoto, Peter Kasting, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 4:26 PM Mike Pinkerton <pink...@chromium.org> wrote:
I don't believe my comment about iOS going through clean shutdown was an argument against the suggestion. It was just a statement that we shouldn't assume that no mobile platform handles clean shutdown (and to please test that shutdown CL on iOS!).

Apologies, didn't mean to misrepresent your statement; I just meant that if we *needed* to continue supporting clean shutdown on iOS, then we clearly wouldn't be able to get rid of it completely.

Colin Blundell

unread,
Sep 2, 2016, 10:43:18 AM9/2/16
to Peter Kasting, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
I'm sympathetic to this point of view, but given that every shutdown on Android is an unclean shutdown, doesn't all work done during clean shutdown essentially have to be unnecessary work?
 

PK

Primiano Tucci

unread,
Sep 2, 2016, 10:46:13 AM9/2/16
to Colin Blundell, Mike Pinkerton, Ryo Hashimoto, Peter Kasting, Marshall Greenblatt, chromium-dev, Gabriel Charette, Scott Violet, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Strongly like this proposal for production code.
The only thing that makes me worry is that, very likely, this will mean that more things will become leaky singletons -> more state will leak between tests -> more flaky tests (mostly browser tests).
So, in essence I wonder whether this will turn into a: "let's not maintain a clean shutdown path" (good) but also a  "ah let's now maintain a teardown path just for tests" (bad). Any ideas on how should we deal with tests?

Scott Violet

unread,
Sep 2, 2016, 10:59:22 AM9/2/16
to Primiano Tucci, Colin Blundell, Mike Pinkerton, Ryo Hashimoto, Peter Kasting, Marshall Greenblatt, chromium-dev, Gabriel Charette, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
We don't run multiple browser_tests in the same process, so
browser_tests wouldn't be effected.

-Scott

Adam Rice

unread,
Sep 2, 2016, 11:00:26 AM9/2/16
to prim...@chromium.org, Colin Blundell, Mike Pinkerton, Ryo Hashimoto, Peter Kasting, Marshall Greenblatt, chromium-dev, Gabriel Charette, Scott Violet, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
I've worked on systems where shutdown semantics were not guaranteed, and trying to write tests in that environment was a nightmare. Even if you could get your tests to stop crashing long enough to check them in, they would be flaky or break the next time somebody changed the fragile contract you had managed to make with the subsystems underneath you.

I believe that in order to ship a stable browser to our users, we need to be able to write reliable tests. And in order for that to possible, we need clean, working shutdown paths.

Whether those paths are ever exercised on user's machines is a separate issue. It seems clear that in cross-platform code expecting to be able to run code at shutdown ever at all is unwise.

Colin Blundell

unread,
Sep 2, 2016, 11:06:08 AM9/2/16
to Adam Rice, prim...@chromium.org, Colin Blundell, Mike Pinkerton, Ryo Hashimoto, Peter Kasting, Marshall Greenblatt, chromium-dev, Gabriel Charette, Scott Violet, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 4:59 PM Adam Rice <ri...@chromium.org> wrote:
I've worked on systems where shutdown semantics were not guaranteed, and trying to write tests in that environment was a nightmare. Even if you could get your tests to stop crashing long enough to check them in, they would be flaky or break the next time somebody changed the fragile contract you had managed to make with the subsystems underneath you.

Can you think of concrete examples of how this could manifest in Chromium?

Peter Kasting

unread,
Sep 2, 2016, 3:32:34 PM9/2/16
to Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 4:00 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 6:25 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 1:09 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Also, relying on clean shutdown to save user data is fundamentally vulnerable to unexpected events like Chrome crash, OS crash, and device power loss.

Not inherently true.

For example, parts of the omnibox code need to track stats condensed from the user's (much larger) complete history.  If those stats are out of date, they can be recomputed on startup.

So the omnibox tries to write out updated stats periodically, especially at shutdown.  If it fails (e.g. crash before then), the cost is not correctness, it's startup perf on the next run.  Startup perf is extremely valuable and we don't want to regress that.  But it's also not clear we want to be writing to disk constantly to ensure our stats are updated on every history event.
Isn't the said situation already happening on Android? Clean shutdown never happens on that platform as gab@ said.
If it's causing a serious startup perf issue, IMO what should be fixed is the startup code.

How do you proposing "fixing" the startup code to avoid this perf regression?  The point is that you need to read in a large amount of data and then process it to create the smaller amount of data.  There's no way to do that on startup and still be fast.  So you need to have the smaller data around before startup.

It is probably broken on Android.  I doubt anyone working on this has tried to optimize startup perf on Android.

But that doesn't mean the answer is to also break desktop.

I am in general very skeptical of the idea that shutdown can or should be entirely eliminated.  The useful thing, which we'd need to do anyway before doing that, would be to understand what currently happens at shutdown and try to eliminate as much of it as possible.  Then a decision about what is left can be made in an informed fashion.

So perhaps it's just phrasing, but I would have phrased things less provocatively about "how can we find and eliminate unnecessary work during shutdown?"
If we are to choose what to save during shutdown, reconstructable data (incl. the said omnibox stats) would be excluded as losing it is not fatal.

You're missing my point.  The whole reason I wrote the email was that it's incorrect to claim there are only two buckets, "reconstructable" and "non-reconstructable", for which the correct descriptions are always "no consequence to not handling" and "dataloss if not handled".  There are shutdown actions which fall in between: consequences, but less severe than dataloss.

And let's not get hung up on arguing about this specific case.  I'm writing an example to support a larger thesis: it is better to attack the problem by trying to eliminate things from the clean shutdown path, and then later decide the question of whether that path can disappear entirely, than it is to assume from the outset that the path can disappear entirely, because such assumptions lead to things like the mistaken dichotomy above.

PK

Alexei Svitkine

unread,
Sep 2, 2016, 3:35:43 PM9/2/16
to Peter Kasting, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 3:31 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 4:00 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 6:25 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 1:09 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Also, relying on clean shutdown to save user data is fundamentally vulnerable to unexpected events like Chrome crash, OS crash, and device power loss.

Not inherently true.

For example, parts of the omnibox code need to track stats condensed from the user's (much larger) complete history.  If those stats are out of date, they can be recomputed on startup.

So the omnibox tries to write out updated stats periodically, especially at shutdown.  If it fails (e.g. crash before then), the cost is not correctness, it's startup perf on the next run.  Startup perf is extremely valuable and we don't want to regress that.  But it's also not clear we want to be writing to disk constantly to ensure our stats are updated on every history event.
Isn't the said situation already happening on Android? Clean shutdown never happens on that platform as gab@ said.
If it's causing a serious startup perf issue, IMO what should be fixed is the startup code.

How do you proposing "fixing" the startup code to avoid this perf regression?  The point is that you need to read in a large amount of data and then process it to create the smaller amount of data.  There's no way to do that on startup and still be fast.  So you need to have the smaller data around before startup.

How necessary is it to do this on startup, blocking the paint of the first page? I would imagine this could be deferred to shortly after startup, to avoid regressing actual start up "time to first paint".
 

It is probably broken on Android.  I doubt anyone working on this has tried to optimize startup perf on Android.

But that doesn't mean the answer is to also break desktop.

I am in general very skeptical of the idea that shutdown can or should be entirely eliminated.  The useful thing, which we'd need to do anyway before doing that, would be to understand what currently happens at shutdown and try to eliminate as much of it as possible.  Then a decision about what is left can be made in an informed fashion.

So perhaps it's just phrasing, but I would have phrased things less provocatively about "how can we find and eliminate unnecessary work during shutdown?"
If we are to choose what to save during shutdown, reconstructable data (incl. the said omnibox stats) would be excluded as losing it is not fatal.

You're missing my point.  The whole reason I wrote the email was that it's incorrect to claim there are only two buckets, "reconstructable" and "non-reconstructable", for which the correct descriptions are always "no consequence to not handling" and "dataloss if not handled".  There are shutdown actions which fall in between: consequences, but less severe than dataloss.

And let's not get hung up on arguing about this specific case.  I'm writing an example to support a larger thesis: it is better to attack the problem by trying to eliminate things from the clean shutdown path, and then later decide the question of whether that path can disappear entirely, than it is to assume from the outset that the path can disappear entirely, because such assumptions lead to things like the mistaken dichotomy above.

PK

--

Antoine Labour

unread,
Sep 2, 2016, 3:36:59 PM9/2/16
to Peter Kasting, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Agreed, and I would throw a word of caution - there is a difference between "consistent state" and "consistent state with no data loss". Everyone agrees that we need the former even in the case of unclean shutdown not under our control (e.g. crashes, killed by the system, etc.). But the latter is the difference between (strawman) "all my bookmarks are there" vs "hey Chrome didn't save that last bookmark I added" (e.g. we terminated chrome before we finished all disk operations, including saving that last bookmark).

From the user's point of view, I don't think it's acceptable to not have "consistent state with no data loss" when closing chrome.

Antoine

Peter Kasting

unread,
Sep 2, 2016, 3:49:24 PM9/2/16
to Alexei Svitkine, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 2, 2016 at 12:34 PM, Alexei Svitkine <asvi...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 3:31 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 4:00 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 6:25 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 2, 2016 at 1:09 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Also, relying on clean shutdown to save user data is fundamentally vulnerable to unexpected events like Chrome crash, OS crash, and device power loss.

Not inherently true.

For example, parts of the omnibox code need to track stats condensed from the user's (much larger) complete history.  If those stats are out of date, they can be recomputed on startup.

So the omnibox tries to write out updated stats periodically, especially at shutdown.  If it fails (e.g. crash before then), the cost is not correctness, it's startup perf on the next run.  Startup perf is extremely valuable and we don't want to regress that.  But it's also not clear we want to be writing to disk constantly to ensure our stats are updated on every history event.
Isn't the said situation already happening on Android? Clean shutdown never happens on that platform as gab@ said.
If it's causing a serious startup perf issue, IMO what should be fixed is the startup code.

How do you proposing "fixing" the startup code to avoid this perf regression?  The point is that you need to read in a large amount of data and then process it to create the smaller amount of data.  There's no way to do that on startup and still be fast.  So you need to have the smaller data around before startup.

How necessary is it to do this on startup, blocking the paint of the first page? I would imagine this could be deferred to shortly after startup, to avoid regressing actual start up "time to first paint".

It blocks correct operation of the omnibox.

Startup is not just about the first paint metric, it's about the time to a usable browser.  We have no benchmark for "time until the omnibox is usable", which is why I'm especially keen to not regress things like this.

PK 

Vincent Scheib

unread,
Sep 2, 2016, 4:22:52 PM9/2/16
to Peter Kasting, Alexei Svitkine, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Don't some of our memory tests e.g. LeakSanitizer rely on clean shut downs to detect leaks?

--

Gabriel Charette

unread,
Sep 2, 2016, 4:44:52 PM9/2/16
to Peter Kasting, Alexei Svitkine, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
I think the contention is around the teardown phase (when services are told about imminent shutdown and given one last chance to flush). As a first step, we should probably keep that phase (and aim for it to be a no-op for most services -- I agree with Peter's use case of doing some work on shutdown instead of next startup for performance reasons). But we can probably get rid of the shutdown phase (when things are deleted for realz and all). As for tests, browser tests use a new process per test as highlighted by Scott so not an issue; some unit_tests might need existing Shutdown() calls to become ShutdownForTesting() but that's about it (FWIW, the upcoming base/task_scheduler uses this paradigm -- never deleted in production, only ensures that TaskShutdownBehavior semantics are respected and then lets everything loose).

As for Vincent's mention of LSAN tests, I remember being told that LSAN ignores shutdown leaks? But then I guess I'm not sure how it qualifies a "leak"? Perhaps it ignores everything hanging off of known top-level leaked objects?

Jeffrey Yasskin

unread,
Sep 2, 2016, 4:58:33 PM9/2/16
to g...@chromium.org, chromium-dev, Alexander Potapenko
On Fri, Sep 2, 2016 at 1:43 PM, Gabriel Charette <g...@chromium.org> wrote:
I think the contention is around the teardown phase (when services are told about imminent shutdown and given one last chance to flush). As a first step, we should probably keep that phase (and aim for it to be a no-op for most services -- I agree with Peter's use case of doing some work on shutdown instead of next startup for performance reasons). But we can probably get rid of the shutdown phase (when things are deleted for realz and all). As for tests, browser tests use a new process per test as highlighted by Scott so not an issue; some unit_tests might need existing Shutdown() calls to become ShutdownForTesting() but that's about it (FWIW, the upcoming base/task_scheduler uses this paradigm -- never deleted in production, only ensures that TaskShutdownBehavior semantics are respected and then lets everything loose).

As for Vincent's mention of LSAN tests, I remember being told that LSAN ignores shutdown leaks? But then I guess I'm not sure how it qualifies a "leak"? Perhaps it ignores everything hanging off of known top-level leaked objects?

IIRC, LSAN ignores shutdown leaks by checking for leaks just before shutdown starts. For example: https://cs.chromium.org/chromium/src/chrome/browser/browser_process_impl.cc?q=__lsan_do_leak_check&sq=package:chromium&l=1275&dr=C

Jeffrey

Scott Hess

unread,
Sep 2, 2016, 5:36:27 PM9/2/16
to Antoine Labour, Peter Kasting, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Gabriel Charette, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
> Agreed, and I would throw a word of caution - there is a difference between
> "consistent state" and "consistent state with no data loss". Everyone agrees
> that we need the former even in the case of unclean shutdown not under our
> control (e.g. crashes, killed by the system, etc.). But the latter is the
> difference between (strawman) "all my bookmarks are there" vs "hey Chrome
> didn't save that last bookmark I added" (e.g. we terminated chrome before we
> finished all disk operations, including saving that last bookmark).
>
> From the user's point of view, I don't think it's acceptable to not have
> "consistent state with no data loss" when closing chrome.

Another complication is that you can create systems where the data has
been sent towards the disk, but may not be persistent on the disk at
the point where the writing program shuts down. Chromium's storage
subsystems often/generally don't consider stuff written until they are
persistent on disk.

[A big issue in this area is that we have many storage subsystems with
no coordination at all. So our attempt at sane shutdown is
necessarily pretty coarse-grained.]

-scott

Ryo Hashimoto

unread,
Sep 5, 2016, 6:00:34 AM9/5/16
to Gabriel Charette, Peter Kasting, Alexei Svitkine, Ryo Hashimoto, Marshall Greenblatt, Colin Blundell, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Unless there's clear data about how much it affects the performance, I think Chrome shouldn't spend time for the said data during shutdown. Users can do many things with Chrome and Omnibox even without autocomplete.

At least, we've been discarding these data on Android and Windows for long which perform what gab@ called teardown, but not shutdown.
On Android, FlushPersistentData may get called before Chrome is killed by the OS, but it just saves the local state, preferences, and a number of selected data.
On Windows, SessionEnding gets called in an OS shutdown and it just saves the local state and preferences.
Apparently the said perf regression and the said data loss have been already there for years.
In both cases (Android terminating Chrome and other apps to make room for a new app, Windows terminating all running apps on OS shutdown), we should be very picky when choosing which data to write as the OS will mercilessly kill Chrome after the timeout, and the device is supposed to be already busy doing other things when it happens. 

PK 

Colin Blundell

unread,
Sep 5, 2016, 7:55:14 AM9/5/16
to Ryo Hashimoto, Gabriel Charette, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, Colin Blundell, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Since the default starting position of this email thread was effectively "Let's kill clean shutdown", let me summarize the arguments I've seen on the thread for maintaining clean shutdown:

1. When the user explicitly closes Chrome (as they can on Windows), they expect no data loss; data consistency with data loss is not sufficient to meet their expectations.
2. Without clean shutdown, writing tests can be cumbersome or even infeasible, or alternatively, we might end up effectively having to maintain a clean shutdown path just for tests.
3. Some form of clean shutdown is required for closing a Profile in multiprofile.
4. Some actions taken on clean shutdown (e.g., writing out up-to-date omnibox stats) otherwise have to be taken on the next startup, regressing startup performance.

Of these arguments, (1) strikes me as a reason that we do indeed need to maintain clean shutdown in production: effectively, there is an expectation by users of the browser on Windows that simply doesn't exist on the other platforms in question, and we need to satisfy that expectation. I would be curious to hear the opinions of others who have been supportive of eliminating clean shutdown on this thread.

For completeness, my thoughts on the others:

(2) does not seem like a blocker to me: we already have lots of code that's only for tests, and moving complexity in production to complexity in testing seems like an overall win to me.

(3) It seems like the requirements here could possibly be fulfilled without having the entire clean browser shutdown path that we have today.

(4) seems dangerous to me given that clean shutdown is not available on ChromeOS, Android, or iOS. Given that reality, I would regard solutions that rely on clean shutdown to avoid regressing startup performance as an anti-pattern. 

I am unsure of whether gab@'s proposal to "keep the teardown phase but get rid of the shutdown phase" would preserve (1) while eliminating significant complexity around clean shutdown. gab@, could you elaborate?

After digesting this thread, my position on the purpose and usage of clean shutdown in Chrome is now:

"Clean shutdown exists solely to prevent data loss when the user explicitly closes the app (on platforms where such prevention is technically possible). In particular, if using clean shutdown to optimize the performance of subsequent startup, consider strongly whether there is another way of achieving your goal that would avoid regressing startup on Android, ChromeOS, and iOS (platforms where clean shutdown is not possible)."

The natural followup on what should be done is then what PK has stated: "How can we find and eliminate unnecessary work during shutdown?"

Thoughts?

Thanks,

Colin

Ryo Hashimoto

unread,
Sep 5, 2016, 9:56:43 AM9/5/16
to Colin Blundell, Ryo Hashimoto, Gabriel Charette, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Mon, Sep 5, 2016 at 8:53 PM, Colin Blundell <blun...@chromium.org> wrote:
Since the default starting position of this email thread was effectively "Let's kill clean shutdown", let me summarize the arguments I've seen on the thread for maintaining clean shutdown:

1. When the user explicitly closes Chrome (as they can on Windows), they expect no data loss; data consistency with data loss is not sufficient to meet their expectations.
2. Without clean shutdown, writing tests can be cumbersome or even infeasible, or alternatively, we might end up effectively having to maintain a clean shutdown path just for tests.
3. Some form of clean shutdown is required for closing a Profile in multiprofile.
4. Some actions taken on clean shutdown (e.g., writing out up-to-date omnibox stats) otherwise have to be taken on the next startup, regressing startup performance.

Of these arguments, (1) strikes me as a reason that we do indeed need to maintain clean shutdown in production: effectively, there is an expectation by users of the browser on Windows that simply doesn't exist on the other platforms in question, and we need to satisfy that expectation. I would be curious to hear the opinions of others who have been supportive of eliminating clean shutdown on this thread.
I'd say clean shutdown is partially supported on Windows.
When the user turns off their computer, what happens depends on how they do that:
 1. Close all Chrome windows, wait for a number of seconds, shutdown the OS -> Clean shutdown is performed and completed.
 2. Close all Chrome windows, then immediately shutdown the OS -> Clean shutdown is performed, but Chrome may get killed by the OS before it completes.
 3. Shutdown the OS without closing Chrome windows -> Clean shutdown is not performed.
Have we ever advised users to do #1 whenever they turn off their Windows machines?
I think we haven't, and I think users shouldn't worry about Chrome's internal state when they turn off their computers.
IMHO this means we shouldn't rely on clean shutdown even on Windows (otherwise, we have to educate users to correctly perform clean shutdown by doing #1).

Peter Kasting

unread,
Sep 5, 2016, 11:04:18 AM9/5/16
to Ryo Hashimoto, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, Colin Blundell, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Mon, Sep 5, 2016 at 2:56 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Unless there's clear data about how much it affects the performance, I think Chrome shouldn't spend time for the said data during shutdown. Users can do many things with Chrome and Omnibox even without autocomplete.

I'm going to ask you to trust me that this matters and we can't regress it.  So please stop assuming we can safely regress it.

On Android, FlushPersistentData may get called before Chrome is killed by the OS, but it just saves the local state, preferences, and a number of selected data.

The Android omnibox is a different implementation than Windows and I have literally never tested to see if it even works correctly, let alone tried to optimize its performance, so do not assume that things are or should be fine on Android.  However, I would think FlushPersistentData should be sufficient to write this stuff.
 
On Windows, SessionEnding gets called in an OS shutdown and it just saves the local state and preferences.

I believe most Chrome sessions on Windows do not end with an OS shutdown (or have an OS shutdown while Chrome is still shutting down).  Do you have data showing otherwise?  You seem to assume that either this is common, or (in your other email, if it ever happens we "can't rely on" clean shutdown, which is missing the point.

Apparently the said perf regression and the said data loss have been already there for years.

There is no data loss.

And this whole email ignores the point I've been trying to make, and clarify, repeatedly: I am NOT trying to debate a specific thing happening during shutdown.  I am simply trying to suggest that walking into the situation without even understanding what other parts of the browser are doing and saying "clearly we can kill clean shutdown!" is cavalier and unnecessarily aggressive.  Want to improve shutdown perf?  Fine!  Great goal.  Let's do it in a controlled, planned manner that understands and avoids regressions along the way.

So can we please stop talking about the omnibox, and stop throwing around the simplistic idea that we can just discard shutdown entirely in the near future?  Everybody thinks doing as little as possible during shutdown is is a great idea, so let's move on to finding some people who want to actually work on removing unnecessary shutdown work in said controlled and planned manner.

PK

Peter Kasting

unread,
Sep 5, 2016, 11:16:54 AM9/5/16
to Colin Blundell, Ryo Hashimoto, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Mon, Sep 5, 2016 at 4:53 AM, Colin Blundell <blun...@chromium.org> wrote:
For completeness, my thoughts on the others:

(2) does not seem like a blocker to me: we already have lots of code that's only for tests, and moving complexity in production to complexity in testing seems like an overall win to me.

Good principles, but how practical are they in this case?

A code path maintained only for tests is a huge burden and unlikely to really work long term.  And if the situation is really that lacking clean shutdown would make certain kinds of tests impossible, I'd want to be very sure that's a kind of test that we don't actually need.  Otherwise, complexity in production may well be justified.

So I'd like to understand better the details of this concern before discarding it.
 
(3) It seems like the requirements here could possibly be fulfilled without having the entire clean browser shutdown path that we have today.

Possible, but without investigating it, not certain.

(4) seems dangerous to me given that clean shutdown is not available on ChromeOS, Android, or iOS. Given that reality, I would regard solutions that rely on clean shutdown to avoid regressing startup performance as an anti-pattern.

Solutions that do a better job on all platforms are always welcome to any problem.

That said, something that improves perf on Windows without managing to improve it on CrOS is not necessarily wrong.  Just less than ideal.  I would take a patch that improves perf on some platforms some of the time any day.

And again, the important thing is in the details here.  What data is being written when?  What is the precise effect of not writing it?  What alternate solutions exist?

"Clean shutdown exists solely to prevent data loss when the user explicitly closes the app (on platforms where such prevention is technically possible). In particular, if using clean shutdown to optimize the performance of subsequent startup, consider strongly whether there is another way of achieving your goal that would avoid regressing startup on Android, ChromeOS, and iOS (platforms where clean shutdown is not possible)."

To me this is a statement of hope, not a statement of fact.  If you preface the whole thing with "I want to move towards a world where", then great.  But I think qualifiers like that are important because otherwise we're inclined to discard the realities of today with a hand-wavey "well, that shouldn't be that way", as if that just makes things not that way, and we need not spend time and effort worrying about, understanding, and fixing said things.

Finally, if we are able to make correct shutdown simple and fast enough, is it possible to guarantee it, or at least perform it more often, on more platforms?  Just because we might shutdown uncleanly on Windows doesn't mean we usually do, and similarly, if we can be in a world where we "usually" shutdown cleanly on Android/CrOS, because clean shutdown can be done in 100 ms or whatever, that might be worth trying to do.

PK

Colin Blundell

unread,
Sep 5, 2016, 11:23:32 AM9/5/16
to Peter Kasting, Colin Blundell, Ryo Hashimoto, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org

<snip to just make a quick clarification>



"Clean shutdown exists solely to prevent data loss when the user explicitly closes the app (on platforms where such prevention is technically possible). In particular, if using clean shutdown to optimize the performance of subsequent startup, consider strongly whether there is another way of achieving your goal that would avoid regressing startup on Android, ChromeOS, and iOS (platforms where clean shutdown is not possible)."

To me this is a statement of hope, not a statement of fact.  If you preface the whole thing with "I want to move towards a world where", then great.  But I think qualifiers like that are important because otherwise we're inclined to discard the realities of today with a hand-wavey "well, that shouldn't be that way", as if that just makes things not that way, and we need not spend time and effort worrying about, understanding, and fixing said things.


Yes, I should have been clear that that was my statement on what the purpose and usage of clean shutdown in Chrome should be, not what they are at the current time. Apologies for the confusion.
 
PK

Peter Kasting

unread,
Sep 5, 2016, 11:33:19 AM9/5/16
to Colin Blundell, Ryo Hashimoto, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Cool.  If we can get to a world where shutdown is minimal and merely flushes caches or whatever, and dataloss outside shutdown is minimized by smart policies around periodically doing that anyway, I think that'd be great.

PK 

Ryo Hashimoto

unread,
Sep 6, 2016, 6:30:22 AM9/6/16
to Peter Kasting, Ryo Hashimoto, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, Colin Blundell, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Tue, Sep 6, 2016 at 12:03 AM, Peter Kasting <pkas...@chromium.org> wrote:
On Mon, Sep 5, 2016 at 2:56 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Unless there's clear data about how much it affects the performance, I think Chrome shouldn't spend time for the said data during shutdown. Users can do many things with Chrome and Omnibox even without autocomplete.

I'm going to ask you to trust me that this matters and we can't regress it.  So please stop assuming we can safely regress it.

On Android, FlushPersistentData may get called before Chrome is killed by the OS, but it just saves the local state, preferences, and a number of selected data.

The Android omnibox is a different implementation than Windows and I have literally never tested to see if it even works correctly, let alone tried to optimize its performance, so do not assume that things are or should be fine on Android.  However, I would think FlushPersistentData should be sufficient to write this stuff.
 
On Windows, SessionEnding gets called in an OS shutdown and it just saves the local state and preferences.

I believe most Chrome sessions on Windows do not end with an OS shutdown (or have an OS shutdown while Chrome is still shutting down).  Do you have data showing otherwise?  You seem to assume that either this is common, or (in your other email, if it ever happens we "can't rely on" clean shutdown, which is missing the point.
Why do you think it's not common? Do you think most WIndows users close all Chrome windows and wait for a number of seconds before turning off their computers?

Personally, when I use my Windows machine, I launch Chrome just after logging in and keep the window open until OS shutdown.
I've never bothered to close Chrome windows before turning off my machine, so this means, for me, almost all Chrome shutdown is unclean.
If this legitimate use case results in serious performance regression or anything unacceptable, it's not the user's fault, but Chrome's.
IIUC this means we shouldn't rely on clean shutdown even on Windows.

Do you think it's OK to make the UX painful when the user forgets to close all Chrome windows and wait for Chrome to complete shutdown before turning off their computer?

Apparently the said perf regression and the said data loss have been already there for years.

There is no data loss.

And this whole email ignores the point I've been trying to make, and clarify, repeatedly: I am NOT trying to debate a specific thing happening during shutdown.  I am simply trying to suggest that walking into the situation without even understanding what other parts of the browser are doing and saying "clearly we can kill clean shutdown!" is cavalier and unnecessarily aggressive.  Want to improve shutdown perf?  Fine!  Great goal.  Let's do it in a controlled, planned manner that understands and avoids regressions along the way.
I don't care much about omnibox neither.
What I want to clarify is that we're already performing unclean shutdown on Windows, and we shouldn't make it result in bad UX.
 

So can we please stop talking about the omnibox, and stop throwing around the simplistic idea that we can just discard shutdown entirely in the near future?  Everybody thinks doing as little as possible during shutdown is is a great idea, so let's move on to finding some people who want to actually work on removing unnecessary shutdown work in said controlled and planned manner.
Is "doing as little as possible during shutdown" going to work?

For example, crbug.com/470501 was a bug I fixed where Perferences and Local State had been written to the disk multiple times during shutdown.
Writing Preferences and Local State is very costly because it involves fsync() which can take forever depending on how busy the disk is, so ideally it should be done only once during shutdown.
However, developers tend to add redundant code to make sure the data is saved because it's much easier to add new code than investigating the existing code.

Also, almost all storage classes start saving their buffered data in the destructors, and it's easy to write new code which does the same.
If we are to perform clean shutdown, every time a new class which performs IO in its dtor is added, the shutdown performance inevitably gets regressed.

PK

Torne (Richard Coles)

unread,
Sep 6, 2016, 6:37:52 AM9/6/16
to hash...@chromium.org, Peter Kasting, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, Colin Blundell, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Tue, 6 Sep 2016 at 11:29 Ryo Hashimoto <hash...@chromium.org> wrote:
On Tue, Sep 6, 2016 at 12:03 AM, Peter Kasting <pkas...@chromium.org> wrote:
On Mon, Sep 5, 2016 at 2:56 AM, Ryo Hashimoto <hash...@chromium.org> wrote:
Unless there's clear data about how much it affects the performance, I think Chrome shouldn't spend time for the said data during shutdown. Users can do many things with Chrome and Omnibox even without autocomplete.

I'm going to ask you to trust me that this matters and we can't regress it.  So please stop assuming we can safely regress it.

On Android, FlushPersistentData may get called before Chrome is killed by the OS, but it just saves the local state, preferences, and a number of selected data.

The Android omnibox is a different implementation than Windows and I have literally never tested to see if it even works correctly, let alone tried to optimize its performance, so do not assume that things are or should be fine on Android.  However, I would think FlushPersistentData should be sufficient to write this stuff.
 
On Windows, SessionEnding gets called in an OS shutdown and it just saves the local state and preferences.

I believe most Chrome sessions on Windows do not end with an OS shutdown (or have an OS shutdown while Chrome is still shutting down).  Do you have data showing otherwise?  You seem to assume that either this is common, or (in your other email, if it ever happens we "can't rely on" clean shutdown, which is missing the point.
Why do you think it's not common? Do you think most WIndows users close all Chrome windows and wait for a number of seconds before turning off their computers?

I think people are assuming that many users exit Chrome at times *other than* when shutting down their computer, and so even if they never exit chrome cleanly before shutting their computer down, the majority of Chrome exits for those users still happen at times other than shutdown. Certainly some proportion of users do this; I have no idea what that proportion is, maybe we know from UMA somewhere?
 
Personally, when I use my Windows machine, I launch Chrome just after logging in and keep the window open until OS shutdown.
I've never bothered to close Chrome windows before turning off my machine, so this means, for me, almost all Chrome shutdown is unclean.
If this legitimate use case results in serious performance regression or anything unacceptable, it's not the user's fault, but Chrome's.
IIUC this means we shouldn't rely on clean shutdown even on Windows.

Do you think it's OK to make the UX painful when the user forgets to close all Chrome windows and wait for Chrome to complete shutdown before turning off their computer?

Apparently the said perf regression and the said data loss have been already there for years.

There is no data loss.

And this whole email ignores the point I've been trying to make, and clarify, repeatedly: I am NOT trying to debate a specific thing happening during shutdown.  I am simply trying to suggest that walking into the situation without even understanding what other parts of the browser are doing and saying "clearly we can kill clean shutdown!" is cavalier and unnecessarily aggressive.  Want to improve shutdown perf?  Fine!  Great goal.  Let's do it in a controlled, planned manner that understands and avoids regressions along the way.
I don't care much about omnibox neither.
What I want to clarify is that we're already performing unclean shutdown on Windows, and we shouldn't make it result in bad UX.
 

So can we please stop talking about the omnibox, and stop throwing around the simplistic idea that we can just discard shutdown entirely in the near future?  Everybody thinks doing as little as possible during shutdown is is a great idea, so let's move on to finding some people who want to actually work on removing unnecessary shutdown work in said controlled and planned manner.
Is "doing as little as possible during shutdown" going to work?

For example, crbug.com/470501 was a bug I fixed where Perferences and Local State had been written to the disk multiple times during shutdown.
Writing Preferences and Local State is very costly because it involves fsync() which can take forever depending on how busy the disk is, so ideally it should be done only once during shutdown.
However, developers tend to add redundant code to make sure the data is saved because it's much easier to add new code than investigating the existing code.

Also, almost all storage classes start saving their buffered data in the destructors, and it's easy to write new code which does the same.
If we are to perform clean shutdown, every time a new class which performs IO in its dtor is added, the shutdown performance inevitably gets regressed.

PK

Colin Blundell

unread,
Sep 6, 2016, 7:56:31 AM9/6/16
to Torne (Richard Coles), hash...@chromium.org, Peter Kasting, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, Colin Blundell, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Thanks for this discussion, everyone. I have much more insight into this issue than I did when I started the discussion. I agree with Peter that without digging into the specifics of what's going on during clean shutdown (and how often clean shutdown even occurs relative to all shutdowns), it's impossible to resolve the challenges that have been raised to eliminating clean shutdown - it essentially becomes your guess vs. mine.

I don't have cycles to dig into this at the current time, but I created a doc to summarize this discussion so that we have an easy-to-consume record for the next time this issue comes up in one way or another. Let me know if you feel it is misrepresenting or missing something!

Thanks,

Colin

Gabriel Charette

unread,
Sep 7, 2016, 5:20:08 PM9/7/16
to Colin Blundell, Torne (Richard Coles), hash...@chromium.org, Peter Kasting, Gabriel Charette, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
A couple clarifications (replying here instead of doc to make sure everyone sees) :

A) I'm not sure how profile shutdown came to the discussion but AFAIK we never shutdown profiles (we would like to but naked Profile* are all over the place and if you close all windows of a profile, it's not actually teared down until browser shutdown).

B) The term "unclean shutdown" typically refers to Chrome failing to write the "clean shutdown" bit in Preferences (e.g., crash or hanged process killed). This is when you see the "Chrome didn't shutdown properly : Restore" bubble.
Here I think we should refer to "ungraceful shutdown" where by "graceful" I mean: unregistering things, releasing handles, tearing down threads, etc.
For example: observers of WillDestroyCurrentMessageLoop() doing things like removing file watcher registrations just so it doesn't post task to a null MessageLoop is clearly an instance of cleaning the carpets before burning down the building.

I do think that Peter's use case is valid. Saving state on shutdown *when possible* to save time on the next startup should always be done. This is why I propose a "teardown phase" where notifications are propagated to everything to pronounce their last wish before the process is terminated.
It sounds like we may already have that in FlushPersistentData? Should we make this cross-platform, invoke it, and terminate process after?

The only disadvantage I see is that we might no longer be able to shutdown subsytems at runtime, but AFAIK, we already can't (e.g. profiles).

Colin Blundell

unread,
Sep 8, 2016, 8:09:49 AM9/8/16
to Gabriel Charette, Colin Blundell, Torne (Richard Coles), hash...@chromium.org, Peter Kasting, mnis...@chromium.org, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Hi gab@,

Apologies in advance if I misunderstood something :). Questions inline.

On Wed, Sep 7, 2016 at 11:17 PM Gabriel Charette <g...@chromium.org> wrote:
A couple clarifications (replying here instead of doc to make sure everyone sees) :

A) I'm not sure how profile shutdown came to the discussion but AFAIK we never shutdown profiles (we would like to but naked Profile* are all over the place and if you close all windows of a profile, it's not actually teared down until browser shutdown).

+Mattias Nissler raised this point: "We need to be able shut down a profile cleanly when the user closes the last window, so that part of the clean shutdown path must be kept functional (at least as long as we support multi-profile, IIRC it was a bit of work to get it functional when multi-profile was first introduced)." Mattias, any comments here?

B) The term "unclean shutdown" typically refers to Chrome failing to write the "clean shutdown" bit in Preferences (e.g., crash or hanged process killed). This is when you see the "Chrome didn't shutdown properly : Restore" bubble.
Here I think we should refer to "ungraceful shutdown" where by "graceful" I mean: unregistering things, releasing handles, tearing down threads, etc.
For example: observers of WillDestroyCurrentMessageLoop() doing things like removing file watcher registrations just so it doesn't post task to a null MessageLoop is clearly an instance of cleaning the carpets before burning down the building.


It's unclear to me what you're referring to when you write "here". Are you suggesting that we should move Chrome's clean shutdown process to be ungraceful in general, or are you saying that Android already goes through ungraceful shutdown, or are you referring to something else?
 
I do think that Peter's use case is valid. Saving state on shutdown *when possible* to save time on the next startup should always be done. This is why I propose a "teardown phase" where notifications are propagated to everything to pronounce their last wish before the process is terminated. 
It sounds like we may already have that in FlushPersistentData? Should we make this cross-platform, invoke it, and terminate process after?

(Related to the above questions) Are you suggesting that we should have a unified cross-platform clean "ungraceful shutdown" process that basically just issues fire-and-forget requests to save whatever state it can? The complication I see there is Peter and Antoine's point that they don't want to either limit performance optimizations or admit data loss on Windows just because of constraints on (e.g.) Android. It's unclear to me whether it's possible to maintain those properties without waiting for acks on various subsystems' shutdown processes. WDYT?

Mattias Nissler

unread,
Sep 8, 2016, 8:27:37 AM9/8/16
to Colin Blundell, Gabriel Charette, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Thu, Sep 8, 2016 at 2:08 PM, Colin Blundell <blun...@chromium.org> wrote:
Hi gab@,

Apologies in advance if I misunderstood something :). Questions inline.

On Wed, Sep 7, 2016 at 11:17 PM Gabriel Charette <g...@chromium.org> wrote:
A couple clarifications (replying here instead of doc to make sure everyone sees) :

A) I'm not sure how profile shutdown came to the discussion but AFAIK we never shutdown profiles (we would like to but naked Profile* are all over the place and if you close all windows of a profile, it's not actually teared down until browser shutdown).

+Mattias Nissler raised this point: "We need to be able shut down a profile cleanly when the user closes the last window, so that part of the clean shutdown path must be kept functional (at least as long as we support multi-profile, IIRC it was a bit of work to get it functional when multi-profile was first introduced)." Mattias, any comments here?

I remember that there was quite a bit of cleanup work on Profile stuff (also related to shutdown) when multi-profile was first introduced, hence my reply. That was a long while ago though, and I'm not at all familiar with the current state of things. I had assumed that the cleanup work had proceeded to the point where we can get rid of Profile instances at run time (which IIRC was the goal back then), but Gab is totally right in that this is not true (I've just checked ProfileManager, it never removes a Profile once the Profile is fully initialized). That still leaves the question whether the current status quo is the state we should be in or whether our goal is to eventually be able to tear down Profiles correctly after the user closes them.

Gabriel Charette

unread,
Sep 8, 2016, 9:40:34 AM9/8/16
to Mattias Nissler, Colin Blundell, Gabriel Charette, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Another clarification on something I previously said:

C) I incorrectly swapped my own definition of shutdown vs teardown above... The "shutdown phase" is the synchronous one-pass pronounce-your-last-wish (but don't cleanup anything, beyond perhaps invalidating weak ptrs) phase. The "teardown phase" is the one where things are gracefully unwound (and is the one I think we can get rid of on all platforms).

On Thu, Sep 8, 2016 at 8:26 AM Mattias Nissler <mnis...@chromium.org> wrote:
On Thu, Sep 8, 2016 at 2:08 PM, Colin Blundell <blun...@chromium.org> wrote:
Hi gab@,

Apologies in advance if I misunderstood something :). Questions inline.

On Wed, Sep 7, 2016 at 11:17 PM Gabriel Charette <g...@chromium.org> wrote:
A couple clarifications (replying here instead of doc to make sure everyone sees) :

A) I'm not sure how profile shutdown came to the discussion but AFAIK we never shutdown profiles (we would like to but naked Profile* are all over the place and if you close all windows of a profile, it's not actually teared down until browser shutdown).

+Mattias Nissler raised this point: "We need to be able shut down a profile cleanly when the user closes the last window, so that part of the clean shutdown path must be kept functional (at least as long as we support multi-profile, IIRC it was a bit of work to get it functional when multi-profile was first introduced)." Mattias, any comments here?

I remember that there was quite a bit of cleanup work on Profile stuff (also related to shutdown) when multi-profile was first introduced, hence my reply. That was a long while ago though, and I'm not at all familiar with the current state of things. I had assumed that the cleanup work had proceeded to the point where we can get rid of Profile instances at run time (which IIRC was the goal back then), but Gab is totally right in that this is not true (I've just checked ProfileManager, it never removes a Profile once the Profile is fully initialized). That still leaves the question whether the current status quo is the state we should be in or whether our goal is to eventually be able to tear down Profiles correctly after the user closes them.

I think the team tried it initially but it was crashing too much, so they shipped without support for tearing down Profiles. I agree that it would be preferable if it was possible to tear down an unused Profile at run-time, but I think there were a couple of attempts at this and no one ever got it working. I thus wouldn't want to block better shutdown (which is a way more impactful problem) on this.
 
 

B) The term "unclean shutdown" typically refers to Chrome failing to write the "clean shutdown" bit in Preferences (e.g., crash or hanged process killed). This is when you see the "Chrome didn't shutdown properly : Restore" bubble.
Here I think we should refer to "ungraceful shutdown" where by "graceful" I mean: unregistering things, releasing handles, tearing down threads, etc.
For example: observers of WillDestroyCurrentMessageLoop() doing things like removing file watcher registrations just so it doesn't post task to a null MessageLoop is clearly an instance of cleaning the carpets before burning down the building.


It's unclear to me what you're referring to when you write "here". Are you suggesting that we should move Chrome's clean shutdown process to be ungraceful in general, or are you saying that Android already goes through ungraceful shutdown, or are you referring to something else?

By "here" I mean : in this conversation (i.e., it should be titled "Should we stop supporting graceful shutdown in Chrome" instead of "clean shutdown").
 
 
I do think that Peter's use case is valid. Saving state on shutdown *when possible* to save time on the next startup should always be done. This is why I propose a "teardown phase" where notifications are propagated to everything to pronounce their last wish before the process is terminated. 
It sounds like we may already have that in FlushPersistentData? Should we make this cross-platform, invoke it, and terminate process after?

(Related to the above questions) Are you suggesting that we should have a unified cross-platform clean "ungraceful shutdown" process that basically just issues fire-and-forget requests to save whatever state it can? The complication I see there is Peter and Antoine's point that they don't want to either limit performance optimizations or admit data loss on Windows just because of constraints on (e.g.) Android. It's unclear to me whether it's possible to maintain those properties without waiting for acks on various subsystems' shutdown processes. WDYT?

Yes, except that I don't think it should be fire-and-forget. On platforms that can afford it, it should be a synchronous call (just like the synchronous call to shutdown KeyedServices). That addresses concerns raised IMO.

Mattias Nissler

unread,
Sep 8, 2016, 11:57:48 AM9/8/16
to Gabriel Charette, Colin Blundell, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Thu, Sep 8, 2016 at 3:39 PM, Gabriel Charette <g...@chromium.org> wrote:
Another clarification on something I previously said:

C) I incorrectly swapped my own definition of shutdown vs teardown above... The "shutdown phase" is the synchronous one-pass pronounce-your-last-wish (but don't cleanup anything, beyond perhaps invalidating weak ptrs) phase. The "teardown phase" is the one where things are gracefully unwound (and is the one I think we can get rid of on all platforms).

On Thu, Sep 8, 2016 at 8:26 AM Mattias Nissler <mnis...@chromium.org> wrote:
On Thu, Sep 8, 2016 at 2:08 PM, Colin Blundell <blun...@chromium.org> wrote:
Hi gab@,

Apologies in advance if I misunderstood something :). Questions inline.

On Wed, Sep 7, 2016 at 11:17 PM Gabriel Charette <g...@chromium.org> wrote:
A couple clarifications (replying here instead of doc to make sure everyone sees) :

A) I'm not sure how profile shutdown came to the discussion but AFAIK we never shutdown profiles (we would like to but naked Profile* are all over the place and if you close all windows of a profile, it's not actually teared down until browser shutdown).

+Mattias Nissler raised this point: "We need to be able shut down a profile cleanly when the user closes the last window, so that part of the clean shutdown path must be kept functional (at least as long as we support multi-profile, IIRC it was a bit of work to get it functional when multi-profile was first introduced)." Mattias, any comments here?

I remember that there was quite a bit of cleanup work on Profile stuff (also related to shutdown) when multi-profile was first introduced, hence my reply. That was a long while ago though, and I'm not at all familiar with the current state of things. I had assumed that the cleanup work had proceeded to the point where we can get rid of Profile instances at run time (which IIRC was the goal back then), but Gab is totally right in that this is not true (I've just checked ProfileManager, it never removes a Profile once the Profile is fully initialized). That still leaves the question whether the current status quo is the state we should be in or whether our goal is to eventually be able to tear down Profiles correctly after the user closes them.

I think the team tried it initially but it was crashing too much, so they shipped without support for tearing down Profiles. I agree that it would be preferable if it was possible to tear down an unused Profile at run-time, but I think there were a couple of attempts at this and no one ever got it working. I thus wouldn't want to block better shutdown (which is a way more impactful problem) on this.

Definitely agree. If we're still interested in being able to shut down Profiles gracefully at run-time though, this implies we'd have to continue to maintain clean shutdown code for Profile, which removes quite a bit of the "getting rid of unnecessary code" motivation.

Torne (Richard Coles)

unread,
Sep 8, 2016, 12:00:34 PM9/8/16
to Mattias Nissler, Gabriel Charette, Colin Blundell, Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
It depends how gracefully we really care about shutting them down, I think. We could probably get as far as "the profile's extension background pages and their associated renderers are all gone" (which isn't actually the case right now sadly) without *really* having to actually delete the Profile object.

It's not necessarily worth the effort to get back to the exact amount of resources allocated that were used before the profile was opened; we can probably get a lot of it by just shutting off more easily controllable things.

Colin Blundell

unread,
Sep 9, 2016, 8:45:50 AM9/9/16
to Gabriel Charette, Mattias Nissler, Colin Blundell, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, bruce...@chromium.org, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Thu, Sep 8, 2016 at 3:39 PM Gabriel Charette <g...@chromium.org> wrote:
Another clarification on something I previously said:

C) I incorrectly swapped my own definition of shutdown vs teardown above... The "shutdown phase" is the synchronous one-pass pronounce-your-last-wish (but don't cleanup anything, beyond perhaps invalidating weak ptrs) phase. The "teardown phase" is the one where things are gracefully unwound (and is the one I think we can get rid of on all platforms).

Ah, now it's all clicking into place :). I think that's part of why I was confused earlier.
 

On Thu, Sep 8, 2016 at 8:26 AM Mattias Nissler <mnis...@chromium.org> wrote:
On Thu, Sep 8, 2016 at 2:08 PM, Colin Blundell <blun...@chromium.org> wrote:
Hi gab@,

Apologies in advance if I misunderstood something :). Questions inline.

On Wed, Sep 7, 2016 at 11:17 PM Gabriel Charette <g...@chromium.org> wrote:
A couple clarifications (replying here instead of doc to make sure everyone sees) :

A) I'm not sure how profile shutdown came to the discussion but AFAIK we never shutdown profiles (we would like to but naked Profile* are all over the place and if you close all windows of a profile, it's not actually teared down until browser shutdown).

+Mattias Nissler raised this point: "We need to be able shut down a profile cleanly when the user closes the last window, so that part of the clean shutdown path must be kept functional (at least as long as we support multi-profile, IIRC it was a bit of work to get it functional when multi-profile was first introduced)." Mattias, any comments here?

I remember that there was quite a bit of cleanup work on Profile stuff (also related to shutdown) when multi-profile was first introduced, hence my reply. That was a long while ago though, and I'm not at all familiar with the current state of things. I had assumed that the cleanup work had proceeded to the point where we can get rid of Profile instances at run time (which IIRC was the goal back then), but Gab is totally right in that this is not true (I've just checked ProfileManager, it never removes a Profile once the Profile is fully initialized). That still leaves the question whether the current status quo is the state we should be in or whether our goal is to eventually be able to tear down Profiles correctly after the user closes them.

I think the team tried it initially but it was crashing too much, so they shipped without support for tearing down Profiles. I agree that it would be preferable if it was possible to tear down an unused Profile at run-time, but I think there were a couple of attempts at this and no one ever got it working. I thus wouldn't want to block better shutdown (which is a way more impactful problem) on this.

Updated the doc to take this info into account.
 
 
 
 

B) The term "unclean shutdown" typically refers to Chrome failing to write the "clean shutdown" bit in Preferences (e.g., crash or hanged process killed). This is when you see the "Chrome didn't shutdown properly : Restore" bubble.
Here I think we should refer to "ungraceful shutdown" where by "graceful" I mean: unregistering things, releasing handles, tearing down threads, etc.
For example: observers of WillDestroyCurrentMessageLoop() doing things like removing file watcher registrations just so it doesn't post task to a null MessageLoop is clearly an instance of cleaning the carpets before burning down the building.


It's unclear to me what you're referring to when you write "here". Are you suggesting that we should move Chrome's clean shutdown process to be ungraceful in general, or are you saying that Android already goes through ungraceful shutdown, or are you referring to something else?

By "here" I mean : in this conversation (i.e., it should be titled "Should we stop supporting graceful shutdown in Chrome" instead of "clean shutdown").
 
 
I do think that Peter's use case is valid. Saving state on shutdown *when possible* to save time on the next startup should always be done. This is why I propose a "teardown phase" where notifications are propagated to everything to pronounce their last wish before the process is terminated. 
It sounds like we may already have that in FlushPersistentData? Should we make this cross-platform, invoke it, and terminate process after?

(Related to the above questions) Are you suggesting that we should have a unified cross-platform clean "ungraceful shutdown" process that basically just issues fire-and-forget requests to save whatever state it can? The complication I see there is Peter and Antoine's point that they don't want to either limit performance optimizations or admit data loss on Windows just because of constraints on (e.g.) Android. It's unclear to me whether it's possible to maintain those properties without waiting for acks on various subsystems' shutdown processes. WDYT?

Yes, except that I don't think it should be fire-and-forget. On platforms that can afford it, it should be a synchronous call (just like the synchronous call to shutdown KeyedServices). That addresses concerns raised IMO.

OK, I understand now. You're positing that the graceful teardown process falls under PK's "find unnecessary work done at shutdown and remove it", with supporting evidence being that every shutdown on Android is an ungraceful shutdown.

Am I understanding correctly that the implementation of FlushPersistentData() would be per-platform based on what the system affords us on that platform? i.e., it's not that we'd move Windows to Android's implementation of FlushPersistentData().

Do you have a sense of how much of the problems that you've previously described with shutdown on Windows would be eliminated by getting rid of the teardown phase?

Bruce Dawson

unread,
Sep 9, 2016, 2:12:59 PM9/9/16
to Colin Blundell, Gabriel Charette, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Relevant bug:

crbug.com/490716 - Consider calling TerminateProcess for sandboxed child processes

This came from a suggestion from Microsoft. They noticed a lot of I/O during shutdown of Chrome:
"From Abolade:
We’re looking at I/O interference issues for the Stay Healthy RTM push and had a question about Chrome tab closure.
 
We’re seeing that Chrome calling ExitProcess is triggering a bunch of interfering I/O that penalizes the incoming app launch. It seems on the surface that TerminateProcess would be better for system resource usage."


Later I received a trace from a Googler which showed a particularly severe variant of this issue:
"I received an ETW trace which showed a pathological example of this. 82 Chrome processes on a 12 GB machine that had previously been under additional memory pressure (2 GB was free when shutdown started) such that much data was paged out. Shutting down the browsers took about two minutes."


Details/context in the bug.

Gabriel Charette

unread,
Sep 9, 2016, 2:42:14 PM9/9/16
to Bruce Dawson, Colin Blundell, oj...@chromium.org, g...@chromium.org, Gabriel Charette, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Mike Pinkerton, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
We had a similar discussion with +Ojan Vafai last year on whether the web platform had to honor unload handlers during shutdown. Given it never does on mobile, is it worth honoring them on desktop (during shutdown)?

Indeed having to page-in renderers in low memory scenarios just to run their unload handlers is kind of silly. Especially given that for tab discarding (when we kill background tabs under memory pressure) doesn't invoke unload handlers (so it's already possible for them to not be invoked, even on desktop).

We probably still want to check the one that lets the page tell us if there's pending unsaved data (i.e. the one that generates a user popup when closing the tab), but we could perhaps check it when the tab is backgrounded and save the state in the browser process instead of having to page in the entire renderer just to shut it down much later.

Re. slowness and why it matters: probably related to the above, I remember +Greg Thompson showing me Chrome shutdown in Process Explorer while doing a chrome://restart and processes were slowly going away, one-by-one, taking minutes to shutdown (and thus to restart -- which is where it becomes a UX issue).

Mike Pinkerton

unread,
Sep 9, 2016, 2:55:46 PM9/9/16
to Gabriel Charette, Bruce Dawson, Colin Blundell, oj...@chromium.org, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
I agree this is a particular pain point for our users (paging renderers back in when closing tabs or quitting Chrome). I have watched a long-living Chrome take 5 minutes to Quit on her older, low-ram macbook. Given the points gab makes about mobile already not guaranteeing unload handlers, coupled with tab discarding, it seems silly to penalize our desktop users this way.

Do Apple and Microsoft make any such guarantees when tabs are closed?
--
Mike Pinkerton
Mac Weenie
pink...@google.com

Ojan Vafai

unread,
Sep 9, 2016, 3:38:05 PM9/9/16
to Mike Pinkerton, Gabriel Charette, Bruce Dawson, Colin Blundell, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Peter Kasting, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
Tried to process this whole thread. Sorry if I missed things.

Re: unload handlers.
These are deeply entrenched in the desktop web and somewhat entrenched on mobile. AFAIK, all desktop browsers are pretty consistent in their support for unload events. Unload handlers do fire on mobile, they're just even less reliable than on desktop.

Removing these is going to be take a long time. In particular, we don't yet have a reasonable replacement. We're working on it. But the bug fix needed there is complicated and relatively low priority. So it's making slow progress. The advantage of the replacement (visibilitychange) is that it only fires when you're navigating away from the foreground tab or when the foreground tab is backgrounded. So, we'd only have the same performance problem if the foreground tab is somehow paged out of memory when you try to close it, which seems exceedingly rare to me.

If anyone has interest in helping this project proceed faster, ping me offline and hopefully we can make it happen.

Re: having a teardown/shutdown sequence.
IMO, the benefits of having any teardown sequence are hugely outweighed by:
  1. It's very hard to properly police that you only do the correct set of things in the teardown and that you handle non-graceful shutdown correctly. It's pretty subtle.
  2. On all OSes we need to support the non-graceful shutdown codepath anyways. So if we can make it work as well as the graceful one, we've generally just made a better product.
  3. On many OSes, non-graceful shutdown is extremely common.
  4. As best I can tell, the only argument people buy for the need for optimistic teardown is Peter's use case. That use case does not seem sufficiently valuable to me to justify this. We should find a way to write that information to disk during idle time or something like that. Also, I don't think "please trust me that this large architectural burden is necessary for performance" is acceptable to stall progress without pointing to an UMA or other quantifiable justification. If it's critical for performance it should be trivial to turn the feature off for one day on canary and point to the regression. This question should be easy to answer.
I think where we've landed in Collin's doc is the wrong next steps given the very positive sentiment on this thread to at least minimizing and hopefully entirely killing graceful shutdown. I don't think we need more analysis/investigation.

The next steps should be: start removing code from the shutdown codepath with the ultimate goal of removing that codepath entirely. Each incremental step you take there makes the product a little bit better, so it has value independent of whether we succeed in the end with killing shutdown. It also allows examining each case independently, e.g. we can fix the omnibox code to not use shutdown and thus prove/disprove the performance concern.

Peter Kasting

unread,
Sep 9, 2016, 4:57:47 PM9/9/16
to Ojan Vafai, Mike Pinkerton, Gabriel Charette, Bruce Dawson, Colin Blundell, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 9, 2016 at 12:37 PM, Ojan Vafai <oj...@chromium.org> wrote:
Tried to process this whole thread. Sorry if I missed things.

Re: unload handlers.
These are deeply entrenched in the desktop web and somewhat entrenched on mobile. AFAIK, all desktop browsers are pretty consistent in their support for unload events. Unload handlers do fire on mobile, they're just even less reliable than on desktop.

Removing these is going to be take a long time. In particular, we don't yet have a reasonable replacement. We're working on it. But the bug fix needed there is complicated and relatively low priority. So it's making slow progress. The advantage of the replacement (visibilitychange) is that it only fires when you're navigating away from the foreground tab or when the foreground tab is backgrounded. So, we'd only have the same performance problem if the foreground tab is somehow paged out of memory when you try to close it, which seems exceedingly rare to me.

If anyone has interest in helping this project proceed faster, ping me offline and hopefully we can make it happen.

Perhaps fixing this is higher-priority than trying to address shutdown, since this hits tab closure and not just all-browser-closure?  And tons of people have been complaining about unload handlers resulting in not-fast-killable tabs for years, whereas slow shutdown, while a real problem, has not risen to nearly the same level of noise.

  1. As best I can tell, the only argument people buy for the need for optimistic teardown is Peter's use case. That use case does not seem sufficiently valuable to me to justify this. We should find a way to write that information to disk during idle time or something like that.
We're doing that already.

Also, I don't think "please trust me that this large architectural burden is necessary for performance" is acceptable to stall progress without pointing to an UMA or other quantifiable justification.

That's not what I was saying.  I don't want a large architectural burden imposed and I'm not interested in stalling progress.  I'm interested in not ratholing about a specific example case (instead of discussing the underlying point) and not making statements like "clearly all shutdown is stupid and we can just kill it all" when I don't believe anyone on this thread knows everything shutdown does. The IMO correct next step of "let's start ripping out what we can one thing at a time" avoid both those problems.
 
The next steps should be: start removing code from the shutdown codepath with the ultimate goal of removing that codepath entirely. Each incremental step you take there makes the product a little bit better, so it has value independent of whether we succeed in the end with killing shutdown. It also allows examining each case independently, e.g. we can fix the omnibox code to not use shutdown and thus prove/disprove the performance concern.

I don't understand how that's practically different than what Colin's doc suggests doing.  "Audit to figure out what's actually going on" is a necessary first step to removing code, as you can't remove what you don't understand.  Clearly, if there is code we do understand that we know isn't needed, it should just go away.  And the doc basically states the desired end state, where whatever shutdown we have is as minimal as possible.

If all you are saying is that you think this should be done parallel in pieces, then I agree, and I don't think Colin's doc implied otherwise.

PK

Ojan Vafai

unread,
Sep 9, 2016, 7:42:49 PM9/9/16
to Peter Kasting, Mike Pinkerton, Gabriel Charette, Bruce Dawson, Colin Blundell, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Fri, Sep 9, 2016 at 1:57 PM Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 9, 2016 at 12:37 PM, Ojan Vafai <oj...@chromium.org> wrote:
Tried to process this whole thread. Sorry if I missed things.

Re: unload handlers.
These are deeply entrenched in the desktop web and somewhat entrenched on mobile. AFAIK, all desktop browsers are pretty consistent in their support for unload events. Unload handlers do fire on mobile, they're just even less reliable than on desktop.

Removing these is going to be take a long time. In particular, we don't yet have a reasonable replacement. We're working on it. But the bug fix needed there is complicated and relatively low priority. So it's making slow progress. The advantage of the replacement (visibilitychange) is that it only fires when you're navigating away from the foreground tab or when the foreground tab is backgrounded. So, we'd only have the same performance problem if the foreground tab is somehow paged out of memory when you try to close it, which seems exceedingly rare to me.

If anyone has interest in helping this project proceed faster, ping me offline and hopefully we can make it happen.

Perhaps fixing this is higher-priority than trying to address shutdown, since this hits tab closure and not just all-browser-closure?  And tons of people have been complaining about unload handlers resulting in not-fast-killable tabs for years, whereas slow shutdown, while a real problem, has not risen to nearly the same level of noise.

Even once we fix it, it's going to be a slow, long process to get sites to switch over to it. It won't have the immediate benefit of getting rid of shutdown code. In either case, it's not an either or problem. It's different people that would take on each of these things.
  1. As best I can tell, the only argument people buy for the need for optimistic teardown is Peter's use case. That use case does not seem sufficiently valuable to me to justify this. We should find a way to write that information to disk during idle time or something like that.
We're doing that already.

Also, I don't think "please trust me that this large architectural burden is necessary for performance" is acceptable to stall progress without pointing to an UMA or other quantifiable justification.

That's not what I was saying.  I don't want a large architectural burden imposed and I'm not interested in stalling progress.  I'm interested in not ratholing about a specific example case (instead of discussing the underlying point) and not making statements like "clearly all shutdown is stupid and we can just kill it all" when I don't believe anyone on this thread knows everything shutdown does. The IMO correct next step of "let's start ripping out what we can one thing at a time" avoid both those problems.

I see. That's not what I took away from this thread. It's also not what I took away from the conclusion in Collin's doc. "Audit" does not imply action. We could audit, decide we need some of the shutdown code and not actually make any changes.

But that's fine. Communicating over email is hard. It sounds like we agree.

Ripping out what we can one thing at a time makes sense to me. I didn't imagine a world in which we blindly just deleted all the code. :) Removing individual things from the shutdown codepath is valuable even if we can't remove everything.

I think I'm just optimistic that whatever is left at the end won't provide enough value to justify keeping the shutdown codepath around and that we'll be able to find other solutions that are good enough.

Gabriel Charette

unread,
Sep 12, 2016, 9:42:19 AM9/12/16
to Ojan Vafai, Peter Kasting, Mike Pinkerton, Gabriel Charette, Bruce Dawson, Colin Blundell, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
As a first step in that direction, the upcoming TaskScheduler will default tasks to SKIP_ON_SHUTDOWN instead of the current SequencedWorkerPool default of BLOCK_SHUTDOWN (this will be broadcasted clearly when we officially kick off the transition to the new world -- not quite there yet ;-)).

PS: We'd like to even default to CONTINUE_ON_SHUTDOWN, but haven't made up our mind on whether that's safe. It's at least no worse than requiring that all tasks be resilient to crashes/eager termination. Well I'm lying a bit... on process termination there can't use-after-free whereas CONTINUE_ON_SHUTDOWN means you might still be running after static uninitialization, e.g. when Singletons are gone, getting rid of graceful shutdown and always proceeding with eager termination would solve that :-). We could perhaps start by calling TerminateProcess from AtExitManager's dtor (or probably just attempt to remove AtExitManager and its registrants and replace it by TerminateProcess) and then move termination earlier and earlier as things are stripped?

Colin Blundell

unread,
Sep 12, 2016, 9:42:37 AM9/12/16
to Ojan Vafai, Peter Kasting, Mike Pinkerton, Gabriel Charette, Bruce Dawson, Colin Blundell, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
On Sat, Sep 10, 2016 at 1:42 AM Ojan Vafai <oj...@chromium.org> wrote:
On Fri, Sep 9, 2016 at 1:57 PM Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Sep 9, 2016 at 12:37 PM, Ojan Vafai <oj...@chromium.org> wrote:
Tried to process this whole thread. Sorry if I missed things.

Re: unload handlers.
These are deeply entrenched in the desktop web and somewhat entrenched on mobile. AFAIK, all desktop browsers are pretty consistent in their support for unload events. Unload handlers do fire on mobile, they're just even less reliable than on desktop.

Removing these is going to be take a long time. In particular, we don't yet have a reasonable replacement. We're working on it. But the bug fix needed there is complicated and relatively low priority. So it's making slow progress. The advantage of the replacement (visibilitychange) is that it only fires when you're navigating away from the foreground tab or when the foreground tab is backgrounded. So, we'd only have the same performance problem if the foreground tab is somehow paged out of memory when you try to close it, which seems exceedingly rare to me.

If anyone has interest in helping this project proceed faster, ping me offline and hopefully we can make it happen.

Perhaps fixing this is higher-priority than trying to address shutdown, since this hits tab closure and not just all-browser-closure?  And tons of people have been complaining about unload handlers resulting in not-fast-killable tabs for years, whereas slow shutdown, while a real problem, has not risen to nearly the same level of noise.

Even once we fix it, it's going to be a slow, long process to get sites to switch over to it. It won't have the immediate benefit of getting rid of shutdown code. In either case, it's not an either or problem. It's different people that would take on each of these things.
  1. As best I can tell, the only argument people buy for the need for optimistic teardown is Peter's use case. That use case does not seem sufficiently valuable to me to justify this. We should find a way to write that information to disk during idle time or something like that.
We're doing that already.

Also, I don't think "please trust me that this large architectural burden is necessary for performance" is acceptable to stall progress without pointing to an UMA or other quantifiable justification.

That's not what I was saying.  I don't want a large architectural burden imposed and I'm not interested in stalling progress.  I'm interested in not ratholing about a specific example case (instead of discussing the underlying point) and not making statements like "clearly all shutdown is stupid and we can just kill it all" when I don't believe anyone on this thread knows everything shutdown does. The IMO correct next step of "let's start ripping out what we can one thing at a time" avoid both those problems.

I see. That's not what I took away from this thread. It's also not what I took away from the conclusion in Collin's doc. "Audit" does not imply action. We could audit, decide we need some of the shutdown code and not actually make any changes.

But that's fine. Communicating over email is hard. It sounds like we agree.

Ripping out what we can one thing at a time makes sense to me. I didn't imagine a world in which we blindly just deleted all the code. :) Removing individual things from the shutdown codepath is valuable even if we can't remove everything.

I think I'm just optimistic that whatever is left at the end won't provide enough value to justify keeping the shutdown codepath around and that we'll be able to find other solutions that are good enough.

Yes, we are all in agreement here, but I also understand why the wording at the end of my doc could have come off very sequential (understand all the work -> agree on what the right global endpoint is -> move toward that endpoint). I augmented the text to make it clear that we can and should pipeline removing unnecessary work with auditing.

Gabriel Charette

unread,
Sep 12, 2016, 9:49:04 AM9/12/16
to Gabriel Charette, Ojan Vafai, Peter Kasting, Mike Pinkerton, Bruce Dawson, Colin Blundell, g...@chromium.org, Mattias Nissler, Torne (Richard Coles), Ryo Hashimoto, Alexei Svitkine, Marshall Greenblatt, chromium-dev, Scott Violet, Primiano Tucci, dfalc...@chromium.org, Sigurður Ásgeirsson, Robert Liao, fdo...@chromium.org
I agree with Peter that making it possible to get rid of unload handlers is the highest priority here (other things can still happen in parallel). While the complexity of shutdown isn't there IMO, the slowness of shutdown is (at least the slowest shutdowns I've seen are always when processes slowly tick away one by one).

This makes me think of something. We de-prioritize (both at OS process priority level and internal Chrome scheduling priority) background renderers at run-time, are they re-prioritized on shutdown?! If not we're effectively blocking on a backgrounded process to do work, that's a priority inversion.. :-O!
Reply all
Reply to author
Forward
0 new messages