Debugging deadlock/futex issues in pthread library.

Shashank

Dec 29, 2011, 1:54:07 PM
to android-porting
We are facing some random issues with futex while porting ICS on our
devices.

Issue description:
---------------------

We are seeing random futex deadlock issues on ICS with different
processes on Android.

One example: we run a Perl script that installs and uninstalls
multiple APKs on the device using adb. For each install, the adb daemon
(adbd) on the device forks a child process to handle the new install
request. In the failure case, this new adbd child process was sleeping
in the futex queue, so the adb request on the host kept waiting for a
reply. The failure happens at random.

In another case, dexopt got stuck at random during boot while inside a
malloc call. The stack trace again showed it sleeping in the futex
queue (malloc -> dlmalloc -> futex).


Questions:
--------------

1) Has anyone seen such an issue?
2) How do folks at Google generally debug pthread/futex code in
bionic's libc?

Thanks!
--Shashank

Glenn Kasten

Dec 29, 2011, 5:25:37 PM
to android-porting
1. I'm not aware of any.

2. A few things to try:
- can you reproduce the bug with the same software running on another
(released) device with a different chipset?
This would help show whether the bug is likely portable or specific
to this device/chipset.
- try a different kernel version
- try disabling all but 1 cpu; if the problem goes away it may be
related to SMP / missing barrier
- worst case, write a low-level stress test just for this one issue
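
For the last suggestion, a minimal sketch of such a stress test could look like the following. It assumes the suspected pattern is a fork() racing with malloc() in other threads. The name stress_fork_malloc, the 16-thread cap, and the 1 ms polling loop are illustrative choices, not taken from any Android test suite.

```c
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_int keep_running;  /* workers loop while this is nonzero */

/* Worker thread: keeps the heap lock busy with small allocations. */
static void *churn_heap(void *arg)
{
    (void)arg;
    while (atomic_load(&keep_running)) {
        volatile char *p = malloc(64);
        if (p)
            p[0] = 1;  /* volatile store keeps the allocation from being optimized out */
        free((void *)p);
    }
    return NULL;
}

/*
 * Forks `iterations` times while `nthreads` workers hammer malloc/free.
 * Each child touches the heap and exits. Returns the number of children
 * that did not exit within timeout_ms, or -1 if the setup failed.
 */
int stress_fork_malloc(int nthreads, int iterations, int timeout_ms)
{
    pthread_t tids[16];
    int started = 0;
    int hung = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;

    atomic_store(&keep_running, 1);
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, churn_heap, NULL) != 0)
            break;

    for (int i = 0; started == nthreads && i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0)
            continue;  /* fork failed, try the next iteration */
        if (pid == 0) {
            /* Child: deliberately allocate, which is the pattern under test. */
            void *p = malloc(64);
            free(p);
            _exit(0);
        }

        int status;
        int waited_ms = 0;
        while (waitpid(pid, &status, WNOHANG) == 0) {
            if (waited_ms >= timeout_ms) {
                hung++;                    /* child is presumably stuck */
                kill(pid, SIGKILL);
                waitpid(pid, &status, 0);  /* reap it */
                break;
            }
            struct timespec one_ms = { 0, 1000000 };
            nanosleep(&one_ms, NULL);
            waited_ms++;
        }
    }

    atomic_store(&keep_running, 0);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? hung : -1;
}
```

Calling something like stress_fork_malloc(4, 1000, 2000) from a small main() and building with -pthread would report how many children appeared stuck. A nonzero result would point toward the fork/heap-lock interaction rather than the kernel.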

Gabriel Beddingfield

Dec 29, 2011, 5:47:17 PM
to android-porting


On Dec 29, 12:54 pm, Shashank <shankmit...@gmail.com> wrote:
> One of the example is,  we run a perl script that install/uninstall
> multiple apks on device using adb. As install process adb daemon
> (adbd) on the device forks another child process to handle this new
> install request. In the failure case I saw that this new adbd child
> process was sleeping in the futex queue, because of which adb request
> on host was just waiting for reply back. This scenario is very random.
[snip]

This sounds like bionic's fork() bug. There's currently a fix in the
master branch of bionic:

commit 177ba8cb42ed6d232e7c8bcad5e6ee21fc51a0e8
Author: Rabin Vincent <rabin....@stericsson.com>
Date: Fri Apr 8 08:50:48 2011 +0200

Prevent deadlock when using fork

When forking of a new process in bionic, it is critical that it
does not allocate any memory according to the comment in
java_lang_ProcessManager.c:
"Note: We cannot malloc() or free() after this point!
A no-longer-running thread may be holding on to the heap lock, and
an attempt to malloc() or free() would result in deadlock."
However, as fork is using standard lib calls when tracing it a bit,
they might allocate memory, and thus causing the deadlock.
This is a rewrite so that the function cpuacct_add, that fork calls,
will use system calls instead of standard lib calls.

Signed-off-by: christian bejram <christia...@stericsson.com>

Change-Id: Iff22ea6b424ce9f9bf0ac8e9c76593f689e0cc86
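
To illustrate the idea in that commit message (this is not the actual bionic patch), here is a rough sketch of writing a pid to a file using only open/write/close and a stack buffer, so that nothing on the path can take the heap lock. The helper names format_uint and write_pid_raw are hypothetical, and so is any path you pass in.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Format an unsigned integer as decimal into buf without stdio or malloc.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t format_uint(char *buf, size_t cap, unsigned long v)
{
    char tmp[24];  /* enough for a 64-bit value (20 digits) */
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0 && n < sizeof tmp);

    if (n > cap)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];  /* digits were produced in reverse */
    return n;
}

/*
 * Write "<pid>\n" to path using raw system call wrappers only.
 * Returns 0 on success, -1 on failure.
 */
int write_pid_raw(const char *path, pid_t pid)
{
    char buf[32];
    size_t len = format_uint(buf, sizeof buf - 1, (unsigned long)pid);

    if (len == 0)
        return -1;
    buf[len++] = '\n';

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, buf, len);
    close(fd);
    return written == (ssize_t)len ? 0 : -1;
}
```

The contrast is with something like fopen() plus fprintf(), which may allocate a stdio buffer and therefore can block on the heap lock in a freshly forked child.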

-gabriel

Arun Joseph

Jan 3, 2012, 3:56:17 AM
to android-porting
Hi Shashank,

I have seen a similar rare deadlock with bionic pthreads on a
BeagleBone-like platform.
I am not sure whether it is the same issue you are seeing. In my case
the issue was architecture specific.

Details of the issue
----------------------------
1) Android boots up and the System Server starts
2) The WindowManager service (parent) creates a vibrator thread (child)
3) The parent thread sees that the child thread was created successfully
4) The parent thread waits for the child thread to start executing and
signal the parent
5) The parent waits forever

This boot time hang was occurring once in 20 boots.

Solution
------------
The real cause of the problem was an incorrect implementation of
sched_clock in the clock source initialization code (arch specific).

sched_clock() is a "weak" function defined in
kernel/time/clock_source.c.
The weak implementation is based on the value of jiffies, so it is
less accurate.
sched_clock() provides the time to the various timekeeping APIs inside
the kernel.
The default sched_clock() can be overridden by an
architecture-specific one.

The issue was resolved by a correct implementation of sched_clock in
the clock source initialization code.
Another observation: printk timestamps were incorrect before this fix.
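
To give a feel for the accuracy gap between a jiffies-based clock and a hardware-counter-based one, here is a small user-space arithmetic sketch. It is not kernel code. The tick rate of 100 Hz and the 24 MHz counter used in the note below are assumptions for illustration, not values from my platform.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Jiffies-style clock: time only advances once per tick, so the
 * resolution is NSEC_PER_SEC / hz nanoseconds.
 */
uint64_t jiffies_to_ns(uint64_t jiffies, unsigned hz)
{
    return jiffies * (NSEC_PER_SEC / hz);
}

/*
 * Counter-style clock: convert a free-running hardware counter value
 * to nanoseconds. Whole seconds and the remainder are handled
 * separately to avoid overflow; assumes rate_hz is below about 18 GHz.
 */
uint64_t counter_to_ns(uint64_t cycles, uint64_t rate_hz)
{
    uint64_t secs = cycles / rate_hz;
    uint64_t rem = cycles % rate_hz;

    return secs * NSEC_PER_SEC + rem * NSEC_PER_SEC / rate_hz;
}
```

With hz = 100, one jiffy is 10,000,000 ns, while one cycle of a 24 MHz counter is about 41 ns, so scheduler timestamps derived from jiffies are far coarser.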

Regards,
Arun

Ilya Kulakov

Feb 26, 2015, 7:34:46 AM
to android...@googlegroups.com
Hi Arun,

Is there a patch anywhere for that fix?

On Tuesday, January 3, 2012 at 3:56:17 PM UTC+7, Arun Joseph wrote: