Re: Issue 551116 in chromium: chrome crash during dark resume leaves zombie processes, reparented to init, which makes new chrome instance unusable.


chro...@googlecode.com

Nov 4, 2015, 4:52:56 AM
to chromi...@chromium.org
Updates:
Owner: osh...@chromium.org
Cc: w...@chromium.org mnis...@chromium.org
Labels: -Type-Bug Type-Bug-Security Cr-OS-Systems Cr-OS-Kernel

Comment #1 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Here's an interesting section from the logs:

2015-11-03T11:06:30.944702-08:00 WARNING session_manager[1281]:
[WARNING:browser_job.cc(127)] Aborting child process 1302's process group 3
seconds after sending signal
2015-11-03T11:06:30.944754-08:00 INFO session_manager[1281]:
[INFO:browser_job.cc(111)] Terminating process group: Browser took more
than 3 seconds to exit after signal.
2015-11-03T11:06:30.944794-08:00 INFO session_manager[1281]:
[INFO:system_utils_impl.cc(49)] Sending 6 to -1302 as 1000
2015-11-03T11:06:30.944961-08:00 ERR session_manager[1281]:
[ERROR:session_manager_service.cc(261)] Choosing to end session rather than
restart browser.
2015-11-03T11:06:30.946350-08:00 INFO session_manager[1281]:
[INFO:session_manager_service.cc(455)] SessionManagerService quitting run
loop
2015-11-03T11:06:30.946625-08:00 INFO session_manager[1281]:
[INFO:session_manager_service.cc(187)] SessionManagerService exiting
2015-11-03T11:06:30.954674-08:00 INFO session_manager[1281]:
[INFO:policy_service.cc(188)] Persisted policy to disk.
2015-11-03T11:06:30.959233-08:00 INFO session_manager[1281]:
[INFO:policy_service.cc(188)] Persisted policy to disk.
2015-11-03T11:06:30.960530-08:00 WARNING session_manager[1281]:
[WARNING:session_manager_main.cc(218)] session_manager exiting with code 1
2015-11-03T11:06:30.962822-08:00 WARNING kernel: [ 33.891764] init: ui
main process (1281) terminated with status 1
2015-11-03T11:06:30.975817-08:00 WARNING kernel: [ 33.905122] init:
debugd main process (1301) killed by TERM signal
2015-11-03T11:06:30.977464-08:00 NOTICE ui-respawn[8568]: ui failed with
exit status 1.
2015-11-03T11:06:30.985472-08:00 NOTICE ui-respawn[8579]: Respawning ui.
2015-11-03T11:06:31.003660-08:00 INFO chapsd[1339]: Token at
/home/root/b1bed674d692cd10ee1ab6f50d8b840101a70bee/chaps has been removed
from slot 1
2015-11-03T11:06:31.032941-08:00 INFO chapsd[1339]: Unloaded keys for slot 1
2015-11-03T11:06:31.033784-08:00 INFO kernel: [ 33.962461] tpm_tis
tpm_tis: command 0xba (size 18) returned code 0x0
2015-11-03T11:06:31.089718-08:00 CRIT ui-unkillable[8590]: 1308 S
[chrome]
2015-11-03T11:06:31.089736-08:00 CRIT ui-unkillable[8590]: 6987 Dl
/opt/google/chrome/chrome --type=renderer --enable-logging --log-level=1
--use-gl=egl
--vmodule=screen_locker=1,webui_screen_locker=1,*ui/display/chromeos*=1,*ash/display*=1,*ui/ozone*=1,*zygote*=1,*plugin*=2
--lang=en-US
--force-fieldtrials=*UMA-Population-Restrict/normal/*UMA-Uniformity-Trial-100-Percent/group_01/*UMA-Uniformity-Trial-20-Percent/group_03/*UMA-Uniformity-Trial-50-Percent/group_01/
--enable-crash-reporter=45CA9D16-0282-425A-AC90-FED895401BC9
--user-data-dir=/home/chronos
--homedir=/home/chronos/u-b1bed674d692cd10ee1ab6f50d8b840101a70bee
--extension-process --enable-webrtc-hw-h264-encoding --login-profile=user
--enable-offline-auto-reload --enable-offline-auto-reload-visible-only
--ppapi-flash-args=enable_hw_video_decode=1
--ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so
--ppapi-flash-version=19.0.0.225-r2 --enable-pinch --num-raster-threads=2
--content-image-texture-target=3553,3553,3553,3553,3553,3553,3553,3553,3553,3553,3553,3553,3553
--video-image-texture-targe
2015-11-03T11:06:31.089759-08:00 CRIT ui-unkillable[8590]: t=3553
--channel=1302.7.782101347 --v8-natives-passed-by-fd
--v8-snapshot-passed-by-fd
2015-11-03T11:06:31.089762-08:00 CRIT ui-unkillable[8590]: 6994 Dl
/opt/google/chrome/chrome --type=renderer --enable-logging --log-level=1
--use-gl=egl
--vmodule=screen_locker=1,webui_screen_locker=1,*ui/display/chromeos*=1,*ash/display*=1,*ui/ozone*=1,*zygote*=1,*plugin*=2
--lang=en-US
--force-fieldtrials=*UMA-Population-Restrict/normal/*UMA-Uniformity-Trial-100-Percent/group_01/*UMA-Uniformity-Trial-20-Percent/group_03/*UMA-Uniformity-Trial-50-Percent/group_01/
--enable-crash-reporter=45CA9D16-0282-425A-AC90-FED895401BC9
--user-data-dir=/home/chronos
--homedir=/home/chronos/u-b1bed674d692cd10ee1ab6f50d8b840101a70bee
--extension-process --enable-webrtc-hw-h264-encoding --login-profile=user
--enable-offline-auto-reload --enable-offline-auto-reload-visible-only
--ppapi-flash-args=enable_hw_video_decode=1
--ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so
--ppapi-flash-version=19.0.0.225-r2 --enable-pinch --num-raster-threads=2
--content-image-texture-target=3553,3553,3553,3553,3553,3553,3553,3553,3553,3553,3553,3553,3553
--video-image-texture-targe

[Further lines similar to the last one omitted due to bug tracker size
limit]

These look like renderer processes which we failed to kill as root... that
sounds pretty bad, and a potential security risk. The code that attempts
the kills is here, FWIW:
https://cs.corp.google.com/#chromeos_public/src/platform2/login_manager/init/ui.conf&l=113

The zombies getting reparented to init is a consequence of session_manager
exiting. This is expected after killing the processes failed.

oshima@, if I'm not mistaken you have a repro in dev mode. Can you check
what error the kernel returns when you kill -9 one of these processes? Can
you also check which kernel resources these renderer processes still hold
on to? lsof and the /proc contents for these processes can hopefully
provide hints as to what's keeping them alive.

Also CC'ing the kernel folks, in case "unkillable processes" rings a bell
with them.

I'm afraid I can't do much more without repro steps. Assigning back to
oshima@ to gather additional information per the comments above.


chro...@googlecode.com

Nov 4, 2015, 9:46:27 AM
to chromi...@chromium.org
Updates:
Owner: mnis...@chromium.org
Labels: Needs-Feedback

Comment #2 on issue 551116 by osh...@chromium.org: chrome crash during dark
resume leaves zombie processes, reparented to init, which makes new chrome
instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

There are no exact repro steps, but I will try to repro again and let you
log in to the device so that you can look into it.

chro...@googlecode.com

Nov 4, 2015, 9:51:31 AM
to chromi...@chromium.org

Comment #3 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

If there are no exact steps, can you document whatever you have done to get
a repro in this bug? Also, what device model is most likely to reproduce
this?

chro...@googlecode.com

Nov 4, 2015, 10:17:02 AM
to chromi...@chromium.org

Comment #4 on issue 551116 by osh...@chromium.org: chrome crash during dark
resume leaves zombie processes, reparented to init, which makes new chrome
instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

This can happen after the crash described in crbug.com/537836.

1) Revert the patch in crbug.com/537836 (or install an older image).
2) If you're using a test image, you need to enable dark resume
(crbug.com/537836#c54).

Then:

3) Follow crbug.com/537836#c46.

Not all devices support dark resume. I've been using Samus.

chro...@googlecode.com

Nov 5, 2015, 7:56:21 AM
to chromi...@chromium.org

Comment #5 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Here's some good news and some bad news: the good news is that I managed to
repro on 7520.17.0 (Official Build) dev-channel samus in dev mode. The bad
news is that after opening the lid, all I get is a black screen that the
device doesn't recover from. In particular, I haven't managed to get a
shell to take a closer look.

chro...@googlecode.com

Nov 6, 2015, 6:21:02 AM
to chromi...@chromium.org

Comment #7 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Recording some notes regarding reproduction to permanent memory:

1. When the screen goes black, that is actually because powerd decides to
shut down after a failed exit from dark resume. The tell-tale message from
the powerd log:

[1106/114021:INFO:daemon.cc(1554)] Shutting down, reason:
exit-dark-resume-failed

To prevent this from happening, I've successfully wrapped
/usr/bin/powerd_setuid_helper so that it only calls the actual
powerd_setuid_helper when it's not passed --action=shut_down.
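The wrapper logic boils down to something like the sketch below. The function
name and the echoed markers are mine, purely for illustration; the real
wrapper execs the original helper binary (moved aside) in the "forward" case.

```shell
# Hypothetical wrapper logic for /usr/bin/powerd_setuid_helper: swallow
# shutdown requests, forward everything else to the real helper.
powerd_helper_filter() {
  for arg in "$@"; do
    if [ "$arg" = "--action=shut_down" ]; then
      echo "swallowed"     # real wrapper: exit 0 without calling the helper
      return 0
    fi
  done
  echo "forwarded"         # real wrapper: exec the original helper with "$@"
}
```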

2. Once powerd has been tricked into not shutting down, the system is still
inaccessible (black screen, no network). To verify this is indeed the case,
one can short-press the power button: if the device is powered down, this
will boot it; if it's still on, nothing should happen.

3. In order to get access to the system, one can disable the dark resume
logic for the USB controllers. This has enabled me to regain SSH access
after a failed dark resume exit:

  find /sys -name dark_resume_active | grep usb | while read name; do
    echo -n 'disabled' > "$name"
  done

chro...@googlecode.com

Nov 6, 2015, 6:51:03 AM
to chromi...@chromium.org

Comment #8 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Here is some data for one of the processes that got reparented and is now
unkillable:

root@localhost 28543 # cat wchan
__refrigerator

root@localhost 28543 # cat status
Name: chrome
State: D (disk sleep)
Tgid: 28543
Ngid: 0
Pid: 28543
PPid: 27592
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 64
Groups: 18 27 208 220 222 240 403 1000 1001
VmPeak: 1100904 kB
VmSize: 1085972 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 159736 kB
VmRSS: 137452 kB
VmData: 801736 kB
VmStk: 136 kB
VmExe: 95136 kB
VmLib: 46952 kB
VmPTE: 836 kB
VmSwap: 0 kB
Threads: 12
SigQ: 51/127676
SigPnd: 0000000000000100
ShdPnd: 0000000000004120
SigBlk: 0000000000000000
SigIgn: 0000000000001002
SigCgt: 00000001c0014eed
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 2
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 217
nonvoluntary_ctxt_switches: 1830
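As an aside, the SigPnd/ShdPnd masks above decode straightforwardly: bit N-1
set means signal N is pending. A small sketch (function name mine):

```shell
# Decode a /proc/<pid>/status pending-signal mask into signal numbers.
decode_pending() {
  mask=$(( $1 ))
  sig=1
  while [ "$sig" -le 64 ]; do
    if [ $(( (mask >> (sig - 1)) & 1 )) -ne 0 ]; then
      echo "$sig"
    fi
    sig=$(( sig + 1 ))
  done
}
decode_pending 0x4120   # ShdPnd above: 6 (SIGABRT), 9 (SIGKILL), 15 (SIGTERM)
decode_pending 0x100    # SigPnd above: 9 (SIGKILL)
```

So SIGKILL is sitting in the queue, consistent with the signal being
delivered but never acted upon while the task is frozen.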

root@localhost 28543 # cat stat
28543 (chrome) D 27592 27586 27586 0 -1 4284480 57058 0 0 0 97 13 0 0 20 0
12 0 301655 1112035328 34363 18446744073709551615 139920078184448
139920175602032 140723187844384 140723187842464 139920063808846 256 0 4098
1073827565 18446744073709551615 0 0 17 2 0 0 0 0 0 139920175609536
139920181329904 139920212705280 140723187845685 140723187846084
140723187846084 140723187847134 0

root@localhost 28543 # strace kill -9 28543
execve("/bin/kill", ["kill", "-9", "28543"], [/* 14 vars */]) = 0
brk(0) = 0x7fd050b81000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fd04f8a5000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25681, ...}) = 0
mmap(NULL, 25681, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd04f89e000
close(3) = 0
open("/lib64/libprocps.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=67960, ...}) = 0
mmap(NULL, 145856, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7fd04f87a000
mmap(0x7fd04f889000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0xe000) = 0x7fd04f889000
mmap(0x7fd04f88c000, 72128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_ANONYMOUS, -1, 0) = 0x7fd04f88c000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\0\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1791720, ...}) = 0
mmap(NULL, 3900568, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7fd04f2cc000
mprotect(0x7fd04f47a000, 2097152, PROT_NONE) = 0
mmap(0x7fd04f67a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0x1ae000) = 0x7fd04f67a000
mmap(0x7fd04f680000, 17560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_ANONYMOUS, -1, 0) = 0x7fd04f680000
close(3) = 0
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=14440, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fd04f879000
mmap(NULL, 2109584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7fd04f0c8000
mprotect(0x7fd04f0cb000, 2093056, PROT_NONE) = 0
mmap(0x7fd04f2ca000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0x2000) = 0x7fd04f2ca000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fd04f878000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fd04f877000
arch_prctl(ARCH_SET_FS, 0x7fd04f878700) = 0
mprotect(0x7fd04f67a000, 16384, PROT_READ) = 0
mprotect(0x7fd04f2ca000, 4096, PROT_READ) = 0
mprotect(0x7fd04f889000, 8192, PROT_READ) = 0
mprotect(0x7fd04f8ae000, 4096, PROT_READ) = 0
mprotect(0x7fd04f8a6000, 4096, PROT_READ) = 0
munmap(0x7fd04f89e000, 25681) = 0
uname({sys="Linux", node="localhost", ...}) = 0
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-3\n", 8192) = 4
close(3) = 0
getpid() = 32501
kill(28543, SIGKILL) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++


So the kill syscall succeeds, but the process remains. It's interesting
that wchan is __refrigerator, which indicates that the process has been
frozen for suspend. This is consistent with the process's flags field in
the stat file: it is 4284480, i.e. 0x416040, so the PF_FROZEN (0x00010000)
bit is set. This seems a plausible reason for the process to be unkillable.
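The arithmetic, for the record (the PF_FROZEN value is an assumption taken
from this kernel's include/linux/sched.h; check your tree):

```shell
flags=4284480                  # field 9 of the stat output above
printf '0x%x\n' "$flags"       # prints 0x416040
PF_FROZEN=$(( 0x00010000 ))    # assumption: per include/linux/sched.h
[ $(( flags & PF_FROZEN )) -ne 0 ] && echo "PF_FROZEN is set"
```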

The prize question becomes how the process ended up in this state, in
particular why it didn't get unfrozen.

chro...@googlecode.com

Nov 6, 2015, 7:10:04 AM
to chromi...@chromium.org

Comment #9 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Attaching the list of apparently stuck chrome processes, and the relevant
syslog and power_manager log excerpts, as well as a /var/log tarball.

Attachments:
stuck_chrome 31.9 KB
syslog 151 KB
powerd_log 11.1 KB
debug_logs.tgz 860 KB

chro...@googlecode.com

Nov 6, 2015, 7:17:05 AM
to chromi...@chromium.org

Comment #10 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Here's some info for the chrome process that's the parent of all the stuck
renderers:

root@localhost 27592 # cat wchan
do_wait

root@localhost 27592 # cat status
Name: chrome
State: S (sleeping)
Tgid: 27592
Ngid: 0
Pid: 27592
PPid: 1
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 0
Groups: 18 27 208 220 222 240 403 1000 1001
Threads: 1
SigQ: 51/127676
SigPnd: 0000000000000000
ShdPnd: 0000000000000120
SigBlk: 0000000000000000
SigIgn: 0000000000011002
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 3513
nonvoluntary_ctxt_switches: 3

root@localhost 27592 # cat stat
27592 (chrome) S 1 27586 27586 0 -1 4210948 7246 447686 0 0 2 14 1752 168
20 0 1 0 247090 0 0 18446744073709551615 0 0 0 0 0 0 0 69634 0
18446744071998535971 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0

root@localhost 27592 # strace kill -9 27592
execve("/bin/kill", ["kill", "-9", "27592"], [/* 14 vars */]) = 0
brk(0) = 0x7f0d002aa000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7f0cfe8f5000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25681, ...}) = 0
mmap(NULL, 25681, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0cfe8ee000
close(3) = 0
open("/lib64/libprocps.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=67960, ...}) = 0
mmap(NULL, 145856, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7f0cfe8ca000
mmap(0x7f0cfe8d9000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0xe000) = 0x7f0cfe8d9000
mmap(0x7f0cfe8dc000, 72128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_ANONYMOUS, -1, 0) = 0x7f0cfe8dc000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\0\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1791720, ...}) = 0
mmap(NULL, 3900568, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7f0cfe31c000
mprotect(0x7f0cfe4ca000, 2097152, PROT_NONE) = 0
mmap(0x7f0cfe6ca000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0x1ae000) = 0x7f0cfe6ca000
mmap(0x7f0cfe6d0000, 17560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_ANONYMOUS, -1, 0) = 0x7f0cfe6d0000
close(3) = 0
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=14440, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7f0cfe8c9000
mmap(NULL, 2109584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7f0cfe118000
mprotect(0x7f0cfe11b000, 2093056, PROT_NONE) = 0
mmap(0x7f0cfe31a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0x2000) = 0x7f0cfe31a000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7f0cfe8c8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7f0cfe8c7000
arch_prctl(ARCH_SET_FS, 0x7f0cfe8c8700) = 0
mprotect(0x7f0cfe6ca000, 16384, PROT_READ) = 0
mprotect(0x7f0cfe31a000, 4096, PROT_READ) = 0
mprotect(0x7f0cfe8d9000, 8192, PROT_READ) = 0
mprotect(0x7f0cfe8fe000, 4096, PROT_READ) = 0
mprotect(0x7f0cfe8f6000, 4096, PROT_READ) = 0
munmap(0x7f0cfe8ee000, 25681) = 0
uname({sys="Linux", node="localhost", ...}) = 0
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-3\n", 8192) = 4
close(3) = 0
getpid() = 648
kill(27592, SIGKILL) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++


Note that this process is not in the frozen state, but it is still
unkillable (perhaps because it has PF_FROZEN child processes?). Notably,
it's also not in the zombie state.

This process does show up in the kernel log, however:

2015-11-06T11:40:21.189758+01:00 NOTICE kernel: [ 2970.982640] Freezing
user space processes ...
2015-11-06T11:40:21.189765+01:00 ERR kernel: [ 2990.998954] Freezing of
tasks failed after 20.000 seconds (1 tasks refusing to freeze, wq_busy=0):
2015-11-06T11:40:21.189770+01:00 INFO kernel: [ 2990.999000]
chrome S ffff88040ba90d60 0 27592 1 0x00000002
2015-11-06T11:40:21.189775+01:00 NOTICE kernel: [ 2990.999023]
ffff880468949d60 0000000000000046 ffff880468949fd8 ffff88040ba908a0
2015-11-06T11:40:21.189780+01:00 NOTICE kernel: [ 2990.999046]
0000000000012240 ffffffff9aa11500 ffff88040ba908a0 ffff880468949df0
2015-11-06T11:40:21.189786+01:00 NOTICE kernel: [ 2990.999068]
ffff88040ba908a0 ffff88040ba908a0 ffff88040ba908a0 ffff88040ba90890
2015-11-06T11:40:21.189790+01:00 NOTICE kernel: [ 2990.999091] Call Trace:
2015-11-06T11:40:21.189797+01:00 NOTICE kernel: [ 2990.999113]
[<ffffffff9a59cbb3>] schedule+0x6e/0x70
2015-11-06T11:40:21.189803+01:00 NOTICE kernel: [ 2990.999130]
[<ffffffff9a03f923>] do_wait+0x1c5/0x26b
2015-11-06T11:40:21.189808+01:00 NOTICE kernel: [ 2990.999145]
[<ffffffff9a03fa69>] SYSC_wait4+0xa0/0xcf
2015-11-06T11:40:21.189814+01:00 NOTICE kernel: [ 2990.999160]
[<ffffffff9a03dfee>] ? kill_orphaned_pgrp+0xc1/0xc1
2015-11-06T11:40:21.189819+01:00 NOTICE kernel: [ 2990.999177]
[<ffffffff9a03fdb1>] SyS_wait4+0xe/0x10
2015-11-06T11:40:21.189825+01:00 NOTICE kernel: [ 2990.999192]
[<ffffffff9a0a5299>] zap_pid_ns_processes+0xf1/0x15f
2015-11-06T11:40:21.189830+01:00 NOTICE kernel: [ 2990.999208]
[<ffffffff9a03e9a3>] do_exit+0x4c6/0x929
2015-11-06T11:40:21.189836+01:00 NOTICE kernel: [ 2990.999224]
[<ffffffff9a049d59>] ? do_sigaltstack+0x43/0x17a
2015-11-06T11:40:21.189841+01:00 NOTICE kernel: [ 2990.999240]
[<ffffffff9a03fb7a>] do_group_exit+0x42/0xb0
2015-11-06T11:40:21.189846+01:00 NOTICE kernel: [ 2990.999255]
[<ffffffff9a03fbfc>] SyS_exit_group+0x14/0x14
2015-11-06T11:40:21.189852+01:00 NOTICE kernel: [ 2990.999271]
[<ffffffff9a5a0292>] system_call_fastpath+0x16/0x1b
2015-11-06T11:40:21.189855+01:00 NOTICE kernel: [ 2990.999329]
2015-11-06T11:40:21.194661+01:00 NOTICE kernel: [ 2990.999335] Restarting
tasks ... done.

chro...@googlecode.com

Nov 6, 2015, 7:20:06 AM
to chromi...@chromium.org
Updates:
Owner: ejcar...@chromium.org
Labels: -Needs-Feedback

Comment #11 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

I think at this point there's enough evidence of strange things happening
in the kernel regarding freezing/unfreezing processes. Passing this on to
ejcaruso@. I'll keep the samus device I've used for debugging in its
current state for today, in case there is interest in grabbing further
information from it.

chro...@googlecode.com

Nov 6, 2015, 2:18:47 PM
to chromi...@chromium.org

Comment #12 on issue 551116 by ejcar...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

I think the fix for this is to thaw renderers if we need to kill them.
Unless you send SIGKILL, they won't run as long as they are frozen, so they
won't execute signal handlers.
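A sketch of that approach, assuming the cgroup v1 freezer interface (the
function name and parameterized path are mine; on the device the state file
would live under /sys/fs/cgroup/freezer/):

```shell
# Thaw a frozen renderer cgroup before killing: write THAWED to its
# freezer.state so queued signals can actually be acted upon and the
# processes reaped.
thaw_cgroup() {
  state="$1"   # e.g. /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state
  if [ -e "$state" ]; then
    echo THAWED > "$state"
  fi
}
```

The cleanup code would call something like this before sending SIGKILL to
the browser's process group.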

chro...@googlecode.com

Nov 9, 2015, 3:24:04 AM
to chromi...@chromium.org

Comment #13 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Note that we are actually sending SIGKILL:
https://chromium.googlesource.com/chromiumos/platform2/+/master/login_manager/init/ui.conf#112
(line 112).

So frozen renderers apparently survive even SIGKILL. Is that a separate bug
that needs fixing?

chro...@googlecode.com

Nov 9, 2015, 4:08:06 PM
to chromi...@chromium.org

Comment #18 on issue 551116 by ejcar...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

That seems way more disruptive to the user than trying to restart the UI,
though. If we do want to do that, the unkillable process check is still
around.

chro...@googlecode.com

Nov 9, 2015, 4:28:07 PM
to chromi...@chromium.org

Comment #19 on issue 551116 by osh...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

The problem is that it made chrome unusable; see crbug.com/548474.

chro...@googlecode.com

Nov 10, 2015, 5:01:59 AM
to chromi...@chromium.org

Comment #20 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Re comments #16-#19: From what I gather, the symptoms described in issue
548474 seem like what I'd expect to happen when Chrome ends up waking with
frozen renderer processes. That's a different issue from this one, though,
which is about renderer processes staying around after Chrome gets
terminated (via a crash).

Bottom line: I don't see how rebooting upon finding frozen renderers would
help solve issue 548474.

chro...@googlecode.com

Nov 10, 2015, 5:34:15 AM
to chromi...@chromium.org

Comment #21 on issue 551116 by osh...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

IIRC, there were more than just renderer processes in this state. UI
restart didn't solve this, as it doesn't clean up these processes, but
rebooting does.

chro...@googlecode.com

Nov 10, 2015, 5:49:16 AM
to chromi...@chromium.org

Comment #22 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

I haven't observed any other processes in the frozen state (except for the
parents of frozen renderers, and I've verified the proposed change cleans
these up as well).

Note that the proposed change isn't merely restarting the UI; it also fixes
the cleanup code to deal with frozen renderers correctly.

As mentioned above, the situation you saw in the original bug suggests that
the renderers didn't get thawed on resume, which is a different problem
from the one we're dealing with here.

chro...@googlecode.com

Nov 11, 2015, 6:46:09 PM
to chromi...@chromium.org

Comment #23 on issue 551116 by bugd...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116#c23

The following revision refers to this bug:

https://chromium.googlesource.com/chromiumos/platform2/+/404bc3f9bb3c4066355a702328076cb2022085a9

commit 404bc3f9bb3c4066355a702328076cb2022085a9
Author: Eric Caruso <ejca...@chromium.org>
Date: Fri Nov 06 19:44:04 2015

login: thaw renderers before killing processes

Chrome renderers are frozen in some circumstances, such as during
dark resume. If Chrome crashes during this time, the renderers will
be left in the freezer, and we won't be able to kill them when we
try to restart the UI. This leaves renderers orphaned.

BUG=chromium:551116
TEST=run as root:
# echo FROZEN > \
/sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state
# ps aux | grep renderer
# restart ui
# ps aux | grep renderer
and make sure the first set of renderers is not hanging around.

Change-Id: I04ec026472b85d4bcd841e6bf9b645996156bf2a
Reviewed-on: https://chromium-review.googlesource.com/311309
Commit-Ready: Eric Caruso <ejca...@chromium.org>
Tested-by: Mattias Nissler <mnis...@chromium.org>
Reviewed-by: Mattias Nissler <mnis...@chromium.org>

[modify]
http://crrev.com/404bc3f9bb3c4066355a702328076cb2022085a9/login_manager/init/ui.conf

chro...@googlecode.com

Nov 12, 2015, 2:36:31 PM
to chromi...@chromium.org

Comment #27 on issue 551116 by osh...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

#26, I see, thank you for the clarification. It's much better if
logout/login fixes the issue anyway.

Just out of curiosity: I believe it is the session manager that detects the
crash and restarts chrome. Isn't it possible for the session manager to do
the same thing before restarting chrome?

chro...@googlecode.com

Nov 13, 2015, 4:34:22 AM
to chromi...@chromium.org

Comment #28 on issue 551116 by mnis...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Re comment #27: The shutdown / ui restart code paths are already shared.
However, I just realized that there might be a code path where the browser
crashes *before* it has thawed its renderers, and session_manager decides
not to restart the entire ui (returning to the login screen), but to just
restart Chrome. Frozen Chrome renderers would then remain until
session_manager exits the session and the ui job restarts. I have uploaded
a CL that demonstrates this: https://codereview.chromium.org/1437393002.
Note that with this CL applied, I get exactly the behavior described in
issue 548474: Chrome takes a long time to come back up after the crash at
resume, and page loads spin forever but don't render anything.

I'll file a separate bug for that scenario, though, as the one covered in
this bug (i.e. frozen renderers that are reparented to init and don't get
cleaned up) has been fixed by ejcaruso's CL. I've verified that even with
my crash-before-thaw CL applied, the frozen renderers get cleaned up on
stopping the ui job.

chro...@googlecode.com

Mar 8, 2016, 12:58:20 AM
to chromi...@chromium.org

Comment #31 on issue 551116 by osh...@chromium.org: chrome crash during
dark resume leaves zombie processes, reparented to init, which makes new
chrome instance unusable.
https://code.google.com/p/chromium/issues/detail?id=551116

Issue chrome-os-partner:49676 has been merged into this issue.