Stop killing chrome...what did it ever do to you?

David Moore

Aug 31, 2010, 10:49:39 AM
to Chromium OS dev
I've been researching a couple of problems related to shutdown. 
  1. Chrome frequently starts up indicating that it hadn't shut down cleanly
  2. We have a frequent crash that only happens if the X server goes away
I've found a few things that I believe are the cause
  1. A bug in the session_manager that can cause it to send a SIGABRT to chrome (it's only supposed to do that if chrome hasn't exited within 3 seconds of the SIGTERM it sends at shutdown).
  2. chrome attempts to handle the SIGTERM by shutting down cleanly. But it resets its handler to the default once it has received one. That will cause it to exit when it processes a subsequent one.
  3. ui.conf sends up to 10 SIGTERMs to chrome, spaced 0.1 seconds apart, while the session_manager is trying to terminate chrome cleanly. This is because its post-stop script is executed in parallel with the SIGTERM that is sent to the session_manager. I believe this is intended by upstart, but unexpected by us.
  4. That same bug will cause the X server to be killed after those 10 SIGTERMs. If chrome hasn't gone away yet, it will crash.
  5. We can also send up to 10 SIGTERMs to chrome from chromeos_shutdown. I'm not sure yet exactly when these happen wrt ui.conf.
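
To illustrate how causes 2 and 3 interact, here is a minimal Python sketch. It is not code from chrome, ui.conf, or session_manager. The child script is a hypothetical stand-in for chrome that restores the default SIGTERM handler on the first signal and then spends one second on "clean shutdown" work. The helper name `exit_status_after_sigterms`, the one-second delay, and the "ready" handshake are assumptions made for the example.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for chrome: the first SIGTERM restores the default
# handler, then the process spends a second shutting down cleanly.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # reset after the first SIGTERM
    time.sleep(1.0)                                # pretend to flush state to disk
    sys.exit(0)                                    # clean exit

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
time.sleep(30)
"""


def exit_status_after_sigterms(count, spacing=0.1):
    """Send `count` SIGTERMs, `spacing` seconds apart, and return the exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    for _ in range(count):
        if proc.poll() is not None:
            break  # child is already gone
        proc.send_signal(signal.SIGTERM)
        time.sleep(spacing)
    status = proc.wait()
    proc.stdout.close()
    return status
```

With a single SIGTERM the child finishes its shutdown and exits with status 0. With two or more, the second signal arrives while the default handler is in place, so the child is killed mid-shutdown and `subprocess` reports a negative status (the signal number).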
With everyone gunning for chrome, it's no surprise that it rarely starts cleanly. I have some thoughts about what to do, but I'd like some advice.
  1. I'll fix the bug in session_manager to only send the SIGABRT if 3 seconds have gone by without chrome terminating.
  2. I could change the post-stop script in ui.conf to wait until the session_manager is gone or some timeout is reached before doing the rest of its processing. I was surprised upstart didn't provide some general mechanism for doing this, as cleanup of a service should probably come after it dies. Am I misunderstanding how this is supposed to work? Can anyone suggest alternatives?
  3. The reason for all the SIGTERMs from ui.conf and chromeos_shutdown is that we're trying to kill all processes with files open on cryptohome before unmounting it. As this is happening in parallel with other processes being shut down, it seems overly aggressive. ui.conf does it to handle logout, which happens when chrome exits cleanly or in any way during screenlock. Maybe we can differentiate that from shutdown and leave it to chromeos_shutdown in that case. But even then I think processes are going to be killed while they're in the middle of shutdown. Does anyone know for sure exactly when chromeos_shutdown gets executed wrt the other jobs being stopped? I believe there are several other open bugs related to this.
Dave
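
The intended session_manager behavior described in this message (SIGTERM first, SIGABRT only if the child has not exited within 3 seconds) can be sketched as follows. This is an illustrative Python sketch, not the session_manager implementation; the function name `stop_child`, its return values, and the `grace_seconds` parameter are assumptions made for the example.

```python
import signal
import subprocess


def stop_child(proc, grace_seconds=3.0):
    """Ask `proc` to exit with SIGTERM; escalate to SIGABRT only on timeout.

    Returns "exited" if the child went away within the grace period and
    "aborted" if it had to be sent SIGABRT.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        # Only reached when the full grace period elapsed without an exit.
        proc.send_signal(signal.SIGABRT)
        proc.wait()
        return "aborted"
```

The point of the structure is that the SIGABRT sits only on the timeout path, so a child that exits promptly can never receive it.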

Vince Laviano

Aug 31, 2010, 2:15:17 PM
to David Moore, Chromium OS dev
On Tue, Aug 31, 2010 at 7:49 AM, David Moore <dave...@chromium.org> wrote:
  1. I'll fix the bug in session_manager to only send the SIGABRT if 3 seconds have gone by without chrome terminating.
This sounds good.

How about modifying Chrome's SIGTERM handler so that, after it initiates a graceful shutdown, it ignores subsequent SIGTERMs rather than installing the default handler?
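
A minimal sketch of that idea, in Python rather than Chrome's C++ and not taken from Chrome's source: the first SIGTERM starts the graceful shutdown and switches the disposition to "ignore", so repeated SIGTERMs cannot kill the process mid-shutdown. The names `install_sigterm_handler` and `begin_shutdown` are hypothetical.

```python
import signal


def install_sigterm_handler(begin_shutdown):
    """Run `begin_shutdown()` on the first SIGTERM and ignore later ones."""

    def on_term(signum, frame):
        # Ignore further SIGTERMs instead of restoring SIG_DFL, which would
        # let the next SIGTERM terminate the process immediately.
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
        begin_shutdown()

    signal.signal(signal.SIGTERM, on_term)
```

A caller would pass whatever kicks off its own clean-exit path, for example `install_sigterm_handler(request_quit)` where `request_quit` is the application's own function. The trade-off is that a process which hangs during shutdown can then only be stopped with a different signal, such as the SIGABRT that session_manager sends after its timeout.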

David Moore

Aug 31, 2010, 2:19:43 PM
to Vince Laviano, Chromium OS dev
Right address this time.

On Tue, Aug 31, 2010 at 11:18 AM, David Moore <dave...@google.com> wrote:
I've tried that, and it works (in that chrome doesn't terminate early via the subsequent SIGTERMs). But I need to research exactly why it was set up this way in the first place. Also, it feels wrong to have up to 21 SIGTERMs being fired at chrome. And it's possible we're killing other things that we shouldn't be, in a random order.

Dave

Frank Swiderski

Sep 1, 2010, 2:12:04 PM
to David Moore, Vince Laviano, Chromium OS dev
CL headed your way for cryptohome.  There was still legacy code there that terminated processes with open files when Unmount() was called.  While it's unlikely to be the culprit (it would mean someone is calling Unmount before they should be), it just shouldn't be there.

fes

David Moore

Sep 3, 2010, 11:39:48 AM
to Frank Swiderski, Vince Laviano, Chromium OS dev
I found one of the problems: ui.conf wasn't exec'ing session_manager, just running it from a script. This led to the pid of the wrong process being registered as the pid of the ui job (it got the parent process). So it would get term'd, causing session_manager to term, causing chrome to term and start shutting down cleanly. But because upstart had the wrong pid, it fired the post-stop script in ui.conf right away, causing additional terms to be sent to chrome, as it was one of the processes with files open on cryptohome.

I've landed a change to ui.conf to fix this.

Dave
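
The pid mix-up described above follows from ordinary Unix semantics: a shell that runs a command as a child keeps its own pid, while `exec` replaces the shell so the command inherits that pid. The Python sketch below illustrates only that general behavior. It does not reproduce upstart or ui.conf, and the helper name `tracked_and_actual_pid` and the use of `/bin/sh` are assumptions made for the example.

```python
import shlex
import subprocess
import sys

# Shell command that prints the pid of the process actually doing the work.
PRINT_PID = f"{shlex.quote(sys.executable)} -c 'import os; print(os.getpid())'"


def tracked_and_actual_pid(script):
    """Run `script` under /bin/sh.

    Returns (pid of the shell that was launched, pid printed by the worker).
    The first value is what a supervisor that spawned the script would track.
    """
    proc = subprocess.Popen(
        ["/bin/sh", "-c", script], stdout=subprocess.PIPE, text=True
    )
    out, _ = proc.communicate()
    return proc.pid, int(out.strip())


# Without exec: the worker is a child of the shell, so the pids differ.
no_exec = tracked_and_actual_pid(PRINT_PID + "; true")

# With exec: the worker replaces the shell and keeps the tracked pid.
with_exec = tracked_and_actual_pid("exec " + PRINT_PID)
```

In the first case a supervisor watching the tracked pid sees the wrapper shell rather than the real service, which matches the symptom reported here: the job appears stopped as soon as the wrapper dies, even though the service is still shutting down.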