possible bug sighting - exit hang

Hui Huang

Nov 15, 2001, 2:20:11 PM
Greetings!

While investigating a JDK bug, I noticed a problem in glibc/linuxthreads.
Basically, a process may hang on exit if a thread is doing memory
allocation at the same time. Here is the test code:

--------------------------------------------------------
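#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <unistd.h>   /* getpid */
#include <pthread.h>

void * start_routine(void * arg)
{
  printf("child thread pid=%d\n", (int) getpid());
  /* Allocate forever so the thread is likely inside malloc at exit. */
  while(1) malloc(100);
  return NULL;
}

int main(void)
{
  pthread_t tid;
  printf("main thread pid=%d\n", (int) getpid());
  pthread_create(&tid, NULL, start_routine, NULL);
  /* Returning here exits the process while the child is still running. */
  return 0;
}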
--------------------------------------------------------

The problematic code is in pthread.c, pthread_onexit_process
(glibc-2.2.4):

793 if (self == __pthread_main_thread)
794 {
795 waitpid(__pthread_manager_thread.p_pid, NULL, __WCLONE);
796 free (__pthread_manager_thread_bos);
797 __pthread_manager_thread_bos = __pthread_manager_thread_tos = NULL;
798 }

Line 796 needs to grab the malloc lock, and it may hang if one of the
threads was killed while still holding that lock. The test code works
fine with line 796 commented out. I don't see why you need to free the
manager thread stack: the process is going away, so there is no memory
leak to worry about.
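
The mechanism can be shown in miniature without touching glibc internals. This is only a rough sketch: a plain default mutex stands in for the malloc lock, and the names owner_exits_locked and trylock_after_owner_exit are made up for illustration. A thread takes the lock and terminates without releasing it, then another thread tries to take it.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a thread that is killed while it
   holds the malloc lock: it locks and never unlocks. */
static void *owner_exits_locked(void *arg)
{
    pthread_mutex_t *lock = arg;
    pthread_mutex_lock(lock);
    return NULL;                /* terminates with the lock held */
}

/* Returns what pthread_mutex_trylock() reports once the owner is
   gone, or -1 if the setup itself failed. */
int trylock_after_owner_exit(void)
{
    pthread_mutex_t lock;
    pthread_t tid;

    if (pthread_mutex_init(&lock, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, owner_exits_locked, &lock) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    /* trylock is used so this demonstration cannot hang itself;
       a blocking pthread_mutex_lock() here would wait forever. */
    return pthread_mutex_trylock(&lock);
}
```

With a non-robust mutex I would expect this to report EBUSY, since nothing releases the lock when its owner goes away. The free() at line 796 is in the position of the blocking variant.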

regards,
-hui

Wolfram Gloger

Nov 19, 2001, 10:40:34 AM
Hello,

> While investigating a JDK bug, I noticed a problem in
> glibc/linuxthreads.

Having reread pthreads documentation, I think you're right, this is a
problem with asynchronous thread termination in LinuxThreads in the
special case of 'falling off the end of main()', _however_...

> Basically, a process may hang on exit if a thread is doing memory
> allocation at the same time. Here is the test code:
>
> --------------------------------------------------------
> #include <stdio.h>
> #include <pthread.h>
>
> void * start_routine(void * arg)
> {
> printf("child thread pid=%d\n", getpid());
> while(1) malloc(100);
> return NULL;
> }
>
> int main(void)
> {
> pthread_t tid;
> printf("main thread pid=%d\n", getpid());
> pthread_create(&tid, NULL, start_routine, NULL);
> return 0;
> }

.. if this is really the stripped-down version of the Java code I'd
suggest that you consider either creating the thread detached or
joining with it. Falling off the end of main lets all threads that
are still running just 'evaporate' at some unspecified time with
absolutely no control over their resources.

I believe this is _very_ rarely what you want.
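
For the test program, the joining variant could look roughly like the following. It is a sketch under assumptions, not a drop-in fix: the stop_requested flag and the function run_worker_and_join are invented for illustration, and the worker frees what it allocates so that it can run for a while.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stop flag; the original test loops forever. */
static atomic_int stop_requested;

static void *start_routine(void *arg)
{
    long *allocations = arg;
    while (!atomic_load(&stop_requested)) {
        void *p = malloc(100);
        free(p);
        ++*allocations;
    }
    return NULL;
}

/* Starts the worker, asks it to stop, and joins with it, so no
   thread is still inside malloc when the caller goes on to exit.
   Returns 0 on success, -1 on failure. */
int run_worker_and_join(long *allocations)
{
    pthread_t tid;

    *allocations = 0;
    atomic_store(&stop_requested, 0);
    if (pthread_create(&tid, NULL, start_routine, allocations) != 0)
        return -1;

    /* ... the main thread's own work would go here ... */

    atomic_store(&stop_requested, 1);
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return 0;
}
```

Calling run_worker_and_join() as the last step of main() and returning afterwards means the worker has finished before the exit path runs.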

> The problematic code is in pthread.c, pthread_onexit_process
> (glibc-2.2.4):

..


> Line 796 needs to grab malloc lock and it may hang if one of the threads
> is killed when it's still holding the lock. The test code works fine
> with line 796 commented out. I don't see why you need to free the
> manager thread stack, the process is gone, there is no memory leak to
> worry about.

I disagree, this should be fixed in another way. There are other
locks such as file locks that are potentially affected in this rare
case. I think we should perform the equivalent of
pthread_atfork_prepare() in the main thread when falling off the end of
main(). Then all present and future libc locks would be kept and the
remaining threads could be terminated safely. I'll prepare a patch if
no-one else beats me to it.
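
To illustrate the pattern I mean, here is what it looks like with the public pthread_atfork() interface and a single application-level lock. This is a minimal sketch, not the libc-internal code: resource_lock, the three handlers and fork_with_lock_handlers are hypothetical names. The prepare handler takes the lock before the disruptive event, so no other thread can be caught holding it.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock standing in for a libc lock. */
static pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void)   { pthread_mutex_lock(&resource_lock); }
static void in_parent(void) { pthread_mutex_unlock(&resource_lock); }
static void in_child(void)  { pthread_mutex_unlock(&resource_lock); }

/* Forks with the handlers installed and returns the child's exit
   status: 0 if the child could take the lock, -1 on error.
   The registration check is not itself thread safe. */
int fork_with_lock_handlers(void)
{
    static int registered;
    pid_t pid;
    int status;

    if (!registered) {
        if (pthread_atfork(prepare, in_parent, in_child) != 0)
            return -1;
        registered = 1;
    }

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the forking thread held the lock across fork()
           and released it, so it is known to be free here. */
        _exit(pthread_mutex_trylock(&resource_lock) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit case would apply the same ordering: acquire the locks first, then terminate the other threads.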

Regards,
Wolfram.

Hui Huang

Nov 20, 2001, 3:31:04 PM
Wolfram-

> ... if this is really the stripped-down version of the Java code I'd
> suggest that you consider either creating the thread detached or
> joining with it.

Thanks for the suggestion. No, it's not a stripped-down version of VM
code. In fact, it is a native thread created by a user program that is
allocating memory when the VM shuts down. We control the shutdown path
for all threads that are known to the VM, but can't do anything for
native threads.

> Falling off the end of main lets all threads that
> are still running just 'evaporate' at some unspecified time with
> absolutely no control over their resources.

I absolutely agree here.

>
> I believe this is _very_ rarely what you want.
>
> > The problematic code is in pthread.c, pthread_onexit_process
> > (glibc-2.2.4):

> ...


> > Line 796 needs to grab malloc lock and it may hang if one of the threads
> > is killed when it's still holding the lock. The test code works fine
> > with line 796 commented out. I don't see why you need to free the
> > manager thread stack, the process is gone, there is no memory leak to
> > worry about.
>
> I disagree, this should be fixed in another way. There are other
> locks such as file locks that are potentially affected in this rare
> case. I think we should perform the equivalent of
> pthread_atfork_prepare() in the main thread when falling off the end of
> main(). Then all present and future libc locks would be kept and the
> remaining threads could be terminated safely. I'll prepare a patch if
> no-one else beats me to it.

Good to know there is a way to prevent that. You might also want to look
at the handling of exit(). That could cause a hang too. Probably you should
grab the libc locks before sending REQ_PROCESS_EXIT to the manager thread.
Just some random thoughts.

regards,
-hui

Wolfram Gloger

Nov 22, 2001, 4:08:45 PM
Ok, I take it back -- partially. Changing the 'atfork' handlers so
one can distinguish system- and user handlers seems too complicated.
(I have produced such a patch but it is very long and introduces a new
global function __pthread_atfork_system() for use in malloc.) I think
the following is preferable. With this,
pthread_kill_other_threads_np() still precludes any later use of
free(), but I don't care too much about that case...

With this patch, I can run Hui Huang's sample without hangs.

2001-11-22 Wolfram Gloger <w...@malloc.de>

* pthread.c (pthread_onexit_process): Don't call free
after threads have been asynchronously terminated.

* manager.c (pthread_handle_exit): Surround cancellation
of threads with __flockfilelist()/__funlockfilelist().

--- linuxthreads/pthread.c.orig Thu Aug 16 03:50:05 2001
+++ linuxthreads/pthread.c Thu Nov 22 17:07:14 2001
@@ -793,7 +793,9 @@
if (self == __pthread_main_thread)
{
waitpid(__pthread_manager_thread.p_pid, NULL, __WCLONE);
- free (__pthread_manager_thread_bos);
+ /* Since all threads have been asynchronously terminated
+ (possibly holding locks), free cannot be used any more. */
+ /*free (__pthread_manager_thread_bos);*/
__pthread_manager_thread_bos = __pthread_manager_thread_tos = NULL;
}
}

--- linuxthreads/manager.c.orig Mon Jul 23 19:54:13 2001
+++ linuxthreads/manager.c Thu Nov 22 17:40:16 2001
@@ -883,6 +902,12 @@
pthread_descr th;
__pthread_exit_requested = 1;
__pthread_exit_code = exitcode;
+ /* A forced asynchronous cancellation follows. Make sure we won't
+ get stuck later in the main thread with a system lock being held
+ by one of the cancelled threads. Ideally one would use the same
+ code as in pthread_atfork(), but we can't distinguish system and
+ user handlers there. */
+ __flockfilelist();
/* Send the CANCEL signal to all running threads, including the main
thread, but excluding the thread from which the exit request originated
(that thread must complete the exit, e.g. calling atexit functions
@@ -899,6 +924,7 @@
th = th->p_nextlive) {
waitpid(th->p_pid, NULL, __WCLONE);
}
+ __fresetlockfiles();
restart(issuing_thread);
_exit(0);
}

Ulrich Drepper

Nov 28, 2001, 5:27:59 PM
Wolfram Gloger <wm...@dent.med.uni-muenchen.de> writes:

> 2001-11-22 Wolfram Gloger <w...@malloc.de>
>
> * pthread.c (pthread_onexit_process): Don't call free
> after threads have been asynchronously terminated.
>
> * manager.c (pthread_handle_exit): Surround cancellation
> of threads with __flockfilelist()/__funlockfilelist().

This patch looks harmless enough and indeed can fix some problems.
I've applied it. Thanks,

--
---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------
