Hi all,
I'm trying to debug a hanging google-chrome-stable-52.0.2743.116-1.x86_64 instance on CentOS 7.
The Chrome instance is started by the JS karma-runner, which runs some tests and then attempts to stop the instance by sending it a SIGTERM, to which it does not respond.
The ps auxf output at this point looks like this:
root 27695 0.0 0.0 82552 284 ? Ss Aug10 0:00 /usr/sbin/sshd -D
root 106123 0.0 0.1 138708 4300 ? Ss Sep11 0:00 \_ sshd: jenkins1 [priv]
jenkins1 106127 0.0 0.0 139452 2140 ? S Sep11 0:09 | \_ sshd: jenkins1@notty
jenkins1 106907 0.0 0.0 115264 516 ? Ss Sep11 0:00 | \_ bash -c cd "/home/jenkins1" && java -Xmx384m -XX:MaxPermSize=164m -jar slave.jar
jenkins1 107098 0.5 2.8 2481572 104012 ? Sl Sep11 10:08 | \_ java -Xmx384m -XX:MaxPermSize=164m -jar slave.jar
jenkins1 83295 8.1 6.2 3631416 223300 ? Sl 14:04 0:07 | \_ /home/jenkins1/tools/hudson.model.JDK/JDK_8u45/jdk1.8.0_45/bin/java -XX:MaxPermSize=164m -classpath /home/jenkins1/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.3/apache-maven-3.3.3/boot/plexus-classworlds-2.5.2.jar -Dclassworlds.conf=/home/jenkins1/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.3/apache-maven-3.3.3/bin/m2.conf -Dmaven.home=/home/jenkins1/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.3/apache-maven-3.3.3 -Dmaven.multiModuleProjectDirectory=/home/jenkins1/workspace/app org.codehaus.plexus.classworlds.launcher.Launcher clean deploy -U -P ci-build
jenkins1 83349 3.2 3.2 1034540 115424 ? Sl 14:04 0:02 | \_ gulp
jenkins1 83360 2.1 2.6 752468 95112 ? Sl 14:04 0:01 | \_ node /home/jenkins1/workspace/app/gulp-karma/lib/background.js {"configFile":"/home/jenkins1/workspace/app/karma.conf.js","singleRun":true,"autoWatch":false,"files":[... test files...]}
jenkins1 83369 0.7 1.9 673740 70228 ? Sl 14:04 0:00 | \_ /opt/google/chrome/chrome --user-data-dir=/tmp/karma-62258560 --no-default-browser-check --no-first-run --disable-default-apps --disable-popup-blocking --disable-translate --enable-logging --v=1 http://localhost:8123/?id=62258560
jenkins1 83374 0.0 0.0 107916 368 ? S 14:04 0:00 | \_ cat
jenkins1 83375 0.0 0.0 107916 624 ? S 14:04 0:00 | \_ cat
jenkins1 83377 0.0 0.0 6464 396 ? S 14:04 0:00 | \_ /opt/google/chrome/chrome-sandbox /opt/google/chrome/chrome --type=zygote --user-data-dir=/tmp/karma-62258560
jenkins1 83381 0.0 0.8 479484 29672 ? S 14:04 0:00 | | \_ /opt/google/chrome/chrome --type=zygote --user-data-dir=/tmp/karma-62258560
jenkins1 83386 0.0 0.0 6460 392 ? S 14:04 0:00 | | \_ /opt/google/chrome/chrome-sandbox /opt/google/chrome/nacl_helper
jenkins1 83387 0.0 0.1 167620 5036 ? S 14:04 0:00 | | | \_ /opt/google/chrome/nacl_helper
jenkins1 83392 0.0 0.2 479484 8520 ? S 14:04 0:00 | | \_ /opt/google/chrome/chrome --type=zygote --user-data-dir=/tmp/karma-62258560
jenkins1 83443 0.4 0.3 649724 13808 ? S 14:04 0:00 | \_ /opt/google/chrome/chrome --user-data-dir=/tmp/karma-62258560 --no-default-browser-check --no-first-run --disable-default-apps --disable-popup-blocking --disable-translate --enable-logging --v=1 http://localhost:8123/?id=62258560
Sending a kill signal to any process in the Chrome tree rooted at 83369, other than 83443, does not stop the instance.
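In case anyone wants to repeat the signal test on their own box, here is a minimal sketch of how the PIDs under a given process could be collected. It assumes two-column "pid ppid" text such as `ps -eo pid=,ppid=` prints; the helper names `parse_ps` and `descendants` are my own and are not part of karma or Chrome.

```python
from collections import defaultdict


def parse_ps(text):
    """Parse two-column 'pid ppid' text into a list of (pid, ppid) tuples."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def descendants(pairs, root):
    """Return the sorted PIDs of every process below `root` (root excluded)."""
    children = defaultdict(list)
    for pid, ppid in pairs:
        children[ppid].append(pid)
    found, stack = [], [root]
    while stack:
        # Visit each child of the current process and queue it for its own children.
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return sorted(found)
```

Each PID in the returned list can then be signalled one at a time (for example with `os.kill(pid, signal.SIGTERM)`) to see which of them actually brings the tree down.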
Running strace -p 83443 shows the following:
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 6616088}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 9736216}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 2015608}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 16258970}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 12841268}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 8227232}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 2492878}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 894992}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 8013670}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 2006804}) = -1 ETIMEDOUT (Connection timed out)
... repeated ...
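To get a feel for how long each wait is, I put together a rough sketch that pulls the timeout out of lines like the ones above. It assumes the `{0, 6616088}` argument is a timespec printed as `{seconds, nanoseconds}`; the name `wait_times_ms` is just something I made up for this.

```python
import re

# Matches the timeout argument of a FUTEX_WAIT_PRIVATE call as strace prints it above.
FUTEX_TIMEOUT = re.compile(r"FUTEX_WAIT_PRIVATE, \d+, \{(\d+), (\d+)\}")


def wait_times_ms(strace_text):
    """Return the requested timeout of each matching futex call, in milliseconds."""
    times = []
    for match in FUTEX_TIMEOUT.finditer(strace_text):
        sec, nsec = int(match.group(1)), int(match.group(2))
        times.append(sec * 1000 + nsec / 1e6)
    return times


# Three of the lines from the strace excerpt above.
sample = """\
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 6616088}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 16258970}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 894992}) = -1 ETIMEDOUT (Connection timed out)
"""
times = wait_times_ms(sample)
print("longest requested wait (ms):", max(times))
```

Every wait in the excerpt asks for less than 17 ms and then times out, which to me looks more like a thread sleeping briefly and retrying a lock than one blocked indefinitely, but I may be misreading it.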
And running gdb -p 83443 shows the following:
<...snip...>
Loaded symbols for /lib64/libnssckbi.so
Reading symbols from /lib64/libtasn1.so.6...Reading symbols from /lib64/libtasn1.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libtasn1.so.6
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 cmpq $-4095, %rax /* Check %rax for error. */
Missing separate debuginfos, use: debuginfo-install google-chrome-stable-52.0.2743.116-1.x86_64
(gdb) bt
Python Exception <type 'exceptions.RuntimeError'> Cannot locate object file for block.:
#0 0x00007f53b7ed0999 in syscall
#1 0x00007f53bf33ddaf in base::internal::SpinLockDelay(int volatile*, int, int) ()
#2 0x00000000008256e6 in ()
#3 0x00007f53c4714870 in tcmalloc::Static::pageheap_lock_ ()
#4 0x0000000000000002 in ()
#5 0x0025449a33b319b2 in ()
#6 0x0000000000003d55 in ()
#7 0x00007f53c4707150 in AtomicOps_Internalx86CPUFeatures ()
#8 0x00007f53bf33dc4f in SpinLock::SlowLock() ()
#9 0x00007f53c4714870 in tcmalloc::Static::pageheap_lock_ ()
#10 0x0000000000000009 in ()
#11 0x0000000200000000 in ()
#12 0x0000000000009000 in ()
#13 0x00003c3477963f40 in ()
#14 0x0000000000000044 in ()
#15 0x00007f53bf34effb in ()
#16 0x00007f53c47152e0 in tcmalloc::Static::central_cache_ ()
#17 0x00003c3477ffb350 in ()
#18 0x00000003fa9df9a6 in ()
Python Exception <type 'exceptions.OverflowError'> long too big to convert:
#19 0xffffffffffffffff in ()
#20 0x00003c3477ffb350 in ()
#21 0x00007f53bf340368 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) ()
#22 0x00007f53b8199440 in _IO_strn_jumps () at /lib64/libc.so.6
#23 0x0000000000000000 in ()
(gdb)
At this point I'm stuck, as I can't find debug symbols for Chrome on Linux to decode the backtrace any further. The only issue I can find in the Chromium tracker that appears somewhat related to what I'm seeing is
https://codereview.chromium.org/1715903002/ but I'm not sure of its status.
Can anyone help with debugging this further or point me in the right direction?
Thanks.