CentOS 7/Chrome hang with possible tcmalloc deadlock

Gary

Sep 14, 2016, 1:06:50 PM
to Chromium-dev
Hi all,

I'm trying to debug a hanging google-chrome-stable-52.0.2743.116-1.x86_64 instance on CentOS 7.

The Chrome instance is started by the JS karma-runner, which runs some tests and then attempts to stop the instance by sending it a SIGTERM, to which it does not respond.

The ps auxf output at this point looks like this:

root      27695  0.0  0.0  82552   284 ?        Ss   Aug10   0:00 /usr/sbin/sshd -D
root     106123  0.0  0.1 138708  4300 ?        Ss   Sep11   0:00  \_ sshd: jenkins1 [priv]
jenkins1 106127  0.0  0.0 139452  2140 ?        S    Sep11   0:09  |   \_ sshd: jenkins1@notty
jenkins1 106907  0.0  0.0 115264   516 ?        Ss   Sep11   0:00  |       \_ bash -c cd "/home/jenkins1" && java  -Xmx384m -XX:MaxPermSize=164m -jar slave.jar
jenkins1 107098  0.5  2.8 2481572 104012 ?      Sl   Sep11  10:08  |           \_ java -Xmx384m -XX:MaxPermSize=164m -jar slave.jar
jenkins1  83295  8.1  6.2 3631416 223300 ?      Sl   14:04   0:07  |               \_ /home/jenkins1/tools/hudson.model.JDK/JDK_8u45/jdk1.8.0_45/bin/java -XX:MaxPermSize=164m -classpath /home/jenkins1/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.3/apache-maven-3.3.3/boot/plexus-classworlds-2.5.2.jar -Dclassworlds.conf=/home/jenkins1/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.3/apache-maven-3.3.3/bin/m2.conf -Dmaven.home=/home/jenkins1/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.3/apache-maven-3.3.3 -Dmaven.multiModuleProjectDirectory=/home/jenkins1/workspace/app org.codehaus.plexus.classworlds.launcher.Launcher clean deploy -U -P ci-build
jenkins1  83349  3.2  3.2 1034540 115424 ?      Sl   14:04   0:02  |                   \_ gulp
jenkins1  83360  2.1  2.6 752468 95112 ?        Sl   14:04   0:01  |                       \_ node /home/jenkins1/workspace/app/gulp-karma/lib/background.js {"configFile":"/home/jenkins1/workspace/app/karma.conf.js","singleRun":true,"autoWatch":false,"files":[... test files...]}
jenkins1  83369  0.7  1.9 673740 70228 ?        Sl   14:04   0:00  |                           \_ /opt/google/chrome/chrome --user-data-dir=/tmp/karma-62258560 --no-default-browser-check --no-first-run --disable-default-apps --disable-popup-blocking --disable-translate --enable-logging --v=1 http://localhost:8123/?id=62258560
jenkins1  83374  0.0  0.0 107916   368 ?        S    14:04   0:00  |                               \_ cat
jenkins1  83375  0.0  0.0 107916   624 ?        S    14:04   0:00  |                               \_ cat
jenkins1  83377  0.0  0.0   6464   396 ?        S    14:04   0:00  |                               \_ /opt/google/chrome/chrome-sandbox /opt/google/chrome/chrome --type=zygote --user-data-dir=/tmp/karma-62258560
jenkins1  83381  0.0  0.8 479484 29672 ?        S    14:04   0:00  |                               |   \_ /opt/google/chrome/chrome --type=zygote --user-data-dir=/tmp/karma-62258560
jenkins1  83386  0.0  0.0   6460   392 ?        S    14:04   0:00  |                               |       \_ /opt/google/chrome/chrome-sandbox /opt/google/chrome/nacl_helper
jenkins1  83387  0.0  0.1 167620  5036 ?        S    14:04   0:00  |                               |       |   \_ /opt/google/chrome/nacl_helper
jenkins1  83392  0.0  0.2 479484  8520 ?        S    14:04   0:00  |                               |       \_ /opt/google/chrome/chrome --type=zygote --user-data-dir=/tmp/karma-62258560
jenkins1  83443  0.4  0.3 649724 13808 ?        S    14:04   0:00  |                               \_ /opt/google/chrome/chrome --user-data-dir=/tmp/karma-62258560 --no-default-browser-check --no-first-run --disable-default-apps --disable-popup-blocking --disable-translate --enable-logging --v=1 http://localhost:8123/?id=62258560

Sending a kill to any process in the "83369 chrome tree" other than 83443 does not stop the instance.
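
For a harness that has to clean up after a browser that ignores SIGTERM, one common workaround is to escalate to SIGKILL after a grace period. The following is a minimal Python sketch of that pattern, not what karma-runner actually does; `stop_process` and `grace_seconds` are hypothetical names chosen for illustration, and the only library calls used are standard `subprocess.Popen` methods.

```python
import subprocess


def stop_process(proc, grace_seconds=5.0):
    """Ask a process to exit with SIGTERM, then force it with SIGKILL.

    `proc` is expected to behave like subprocess.Popen. Returns "terminated"
    if SIGTERM was enough, or "killed" if escalation was needed.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX, which cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

This only signals the one process it is given. Child processes, such as the zygote and sandbox helpers in the tree above, may need their own handling.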

Running strace -p 83443 shows the following:

futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 6616088}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 9736216}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 2015608}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 16258970}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 12841268}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 8227232}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 2492878}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 894992}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 8013670}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 2006804}) = -1 ETIMEDOUT (Connection timed out)
... repeated ...
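
Every call in this trace waits on the same address and times out after a few milliseconds, which looks like a thread backing off and retrying on a single lock. A small Python sketch for summarizing such a trace follows. It assumes the older strace output format shown above, where the timeout is printed as `{seconds, nanoseconds}`; `summarize_futex_waits` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# futex(0x7f186d436870, FUTEX_WAIT_PRIVATE, 2, {0, 6616088}) = -1 ETIMEDOUT (Connection timed out)
FUTEX_RE = re.compile(
    r"futex\((0x[0-9a-f]+), (\w+), \d+, \{(\d+), (\d+)\}\) = (-?\d+)(?: (\w+))?"
)


def summarize_futex_waits(lines):
    """Count futex calls per (address, op, errno) and find the longest timeout in ns."""
    counts = Counter()
    max_timeout_ns = 0
    for line in lines:
        match = FUTEX_RE.search(line)
        if not match:
            continue  # skip anything that is not a futex call with a timeout
        addr, op, sec, nsec, _ret, errno_name = match.groups()
        counts[(addr, op, errno_name)] += 1
        timeout_ns = int(sec) * 1_000_000_000 + int(nsec)
        max_timeout_ns = max(max_timeout_ns, timeout_ns)
    return counts, max_timeout_ns
```

A single key in the returned counter means the thread is blocked on one futex word rather than cycling through several.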

And running gdb -p 83443 shows the following:

<...snip...>
Loaded symbols for /lib64/libnssckbi.so
Reading symbols from /lib64/libtasn1.so.6...Reading symbols from /lib64/libtasn1.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libtasn1.so.6
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38              cmpq $-4095, %rax       /* Check %rax for error.  */
Missing separate debuginfos, use: debuginfo-install google-chrome-stable-52.0.2743.116-1.x86_64
(gdb) bt
Python Exception <type 'exceptions.RuntimeError'> Cannot locate object file for block.:
#0  0x00007f53b7ed0999 in syscall
#1  0x00007f53bf33ddaf in base::internal::SpinLockDelay(int volatile*, int, int) ()
#2  0x00000000008256e6 in  ()
#3  0x00007f53c4714870 in tcmalloc::Static::pageheap_lock_ ()
#4  0x0000000000000002 in  ()
#5  0x0025449a33b319b2 in  ()
#6  0x0000000000003d55 in  ()
#7  0x00007f53c4707150 in AtomicOps_Internalx86CPUFeatures ()
#8  0x00007f53bf33dc4f in SpinLock::SlowLock() ()
#9  0x00007f53c4714870 in tcmalloc::Static::pageheap_lock_ ()
#10 0x0000000000000009 in  ()
#11 0x0000000200000000 in  ()
#12 0x0000000000009000 in  ()
#13 0x00003c3477963f40 in  ()
#14 0x0000000000000044 in  ()
#15 0x00007f53bf34effb in  ()
#16 0x00007f53c47152e0 in tcmalloc::Static::central_cache_ ()
#17 0x00003c3477ffb350 in  ()
#18 0x00000003fa9df9a6 in  ()
Python Exception <type 'exceptions.OverflowError'> long too big to convert:
#19 0xffffffffffffffff in  ()
#20 0x00003c3477ffb350 in  ()
#21 0x00007f53bf340368 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) ()
#22 0x00007f53b8199440 in _IO_strn_jumps () at /lib64/libc.so.6
#23 0x0000000000000000 in  ()
(gdb)
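
Without debug symbols, most entries in this backtrace are probably raw stack values rather than real frames, so it can help to look only at the entries gdb could name. The Python sketch below does that, and also compares the low 12 bits of two addresses. The futex address from the strace output (0x7f186d436870) and the `pageheap_lock_` address here (0x00007f53c4714870) differ in their upper bits, so the two captures presumably come from different process instances, but both end in 0x870. That is consistent with both being the same symbol under address space randomization, assuming 4 KiB pages; it is a hint, not proof. `named_frames` and `page_offset` are hypothetical helper names.

```python
import re

# Matches gdb frames of the form
# "#3  0x00007f53c4714870 in tcmalloc::Static::pageheap_lock_ ()".
# Entries with no symbol ("in  ()") or no trailing parentheses are skipped.
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+) in\s+([^\s(#][^(#\n]*?)\s*\(")


def named_frames(backtrace_text):
    """Return (frame_number, address, symbol) for entries that gdb could name."""
    frames = []
    for num, addr, symbol in FRAME_RE.findall(backtrace_text):
        frames.append((int(num), int(addr, 16), symbol))
    return frames


def page_offset(address, page_size=4096):
    """Return the offset of an address within its page (unchanged by ASLR)."""
    return address % page_size
```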

I've only skimmed the tcmalloc code here (https://github.com/gperftools/gperftools/blob/master/src/static_vars.h#L90), but it appears that the thread is attempting to acquire pageheap_lock_ twice, which would likely lead to a deadlock.
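
To illustrate the suspected mechanism: a lock that is not reentrant can never be taken a second time by the thread that already holds it. The Python sketch below shows this with `threading.Lock`, using a timeout so the demonstration returns instead of hanging. It is only an analogy, resting on the assumption that tcmalloc's SpinLock is non-reentrant. Whether the Chrome thread really acquires the lock twice cannot be confirmed from an unsymbolized backtrace, where the repeated `pageheap_lock_` entries may just be stack values. `second_acquire_succeeds` is a hypothetical helper name.

```python
import threading


def second_acquire_succeeds(lock, timeout=0.1):
    """Take `lock`, then try to take it again from the same thread.

    Returns True if the second acquire succeeded within `timeout` seconds.
    """
    lock.acquire()
    # With a non-reentrant lock, a blocking acquire here would wait forever,
    # so a timeout is used to observe the failure instead.
    got_it_again = lock.acquire(timeout=timeout)
    if got_it_again:
        lock.release()
    lock.release()
    return got_it_again


if __name__ == "__main__":
    print("Lock: ", second_acquire_succeeds(threading.Lock()))
    print("RLock:", second_acquire_succeeds(threading.RLock()))
```

The plain `Lock` case fails to reacquire, while the reentrant `RLock` succeeds, which is the difference that would matter if a code path re-entered the allocator while holding the lock.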

I also managed to capture the shutdown tracing information as described at http://peter.sh/experiments/chromium-command-line-switches/#trace-shutdown, but it does not reveal anything obvious.

At this point I'm stuck, as I can't find debug symbols for Chrome on Linux to further decode the backtrace. The only issue I can find in the Chromium tracker that appears somewhat related to what I'm seeing is https://codereview.chromium.org/1715903002/, but I'm not sure of its status.

Can anyone help with debugging this further or point me in the right direction?

Thanks.

Gary

Sep 15, 2016, 11:59:31 AM
to Chromium-dev
To re-iterate the final part of my question, can anyone help with debugging this further or point me in the right direction?

Or should I just open a bug?

Thanks.

Robert Kroeger

Sep 15, 2016, 4:14:42 PM
to gary.du...@gmail.com, Chromium-dev


On Thursday, September 15, 2016, Gary <gary.du...@gmail.com> wrote:
To re-iterate the final part of my question, can anyone help with debugging this further or point me in the right direction?

Or should I just open a bug?

Opening a bug seems worthwhile: if nothing else, it is a prerequisite to submitting a fix, and it makes a good reference.
Have you built Chromium in debug mode from source? A debug build should have symbols.

Rob.
 



Gary

Sep 15, 2016, 6:21:28 PM
to Chromium-dev
Hi Rob,

Thanks for your reply. I'll gather as much info as I can and submit a bug for this.

As for debug symbols, I'm not using a version of Chromium I built myself. I'm using the Google Chrome RPM (http://dl.google.com/linux/chrome/rpm/stable/x86_64).

One last question: any idea how I can tell which version of tcmalloc Chrome is using?

Thanks again.