
Problem with Suspend & Resume Thread Example


Raymond Lau

Aug 16, 1999, 3:00:00 AM
to David.B...@compaq.com
Mr. Butenhof has provided a cleaner version of susp.c, in which
functions for suspending and resuming a thread are defined.
Please see the reference below.

I compiled his updated example and it ran fine, but I find
his solution troublesome. If anyone can clarify the following
points for me, I would appreciate it.

1) suspend_init_routine() is run by pthread_once(), which is
called inside thd_suspend() and thd_continue(). In his example,
thd_suspend() and thd_continue() are called by the main thread.
But since the worker threads have already been created by then,
I am not sure whether their signal masks would be updated
accordingly.

2) Can a thread call thd_suspend() in order to suspend itself?
If it can, and it is the first one to call thd_suspend() or
thd_continue(), will the signal masks of the other threads
be updated accordingly?

3) It seems to me that after a specific thread is suspended and
resumed, suspending it again would not work.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>Reference>>>>>>>>>>>>>>
I wrote David Butenhof this morning about just that program, and here is
his reply:

- David Eger

-------------------------------------------

Date: Fri, 13 Aug 1999 09:14:36 -0400
From: Dave Butenhof <David.B...@compaq.com>
To: David Thomas Eger <eg...@cc.gatech.edu>
Subject: Re: Thread suspension example


David Thomas Eger wrote:

> I was wondering about your program "susp.c" in "Programming with POSIX
> Threads", specifically the suspend_signal_handler. How do we know that
> the signal handler couldn't be interrupted between:
>
> sentinel = 1;
>
> and
>
> sigsuspend (&signal_set);
>
> That is, what if the timeslice ends after sentinel = 1; and the program
> continues to a point where it tells the thread to resume, only the thread
> isn't waiting for the signal yet, because it hasn't executed sigsuspend()
> yet? If we only send resume once, we're sort of stuck, aren't we?

Yes, but it really doesn't have anything to do with timeslicing; for
example, on an SMP, both threads could be running simultaneously.

I've known for a long time that the susp.c program had bugs. (As I said in
the footnote, it wasn't written by me. Unfortunately, since I chose to
publish it, I can't exactly say that absolves me of responsibility for the
errors. Oh well. ;-) )

Anyway, one particularly major error is that the signal handler for SIGUSR1
is set up incorrectly. The race can be resolved by setting sa_mask to block
both SIGUSR1 and SIGUSR2 until the sigsuspend call unblocks SIGUSR2. In
practice, this doesn't happen easily or often. The susp.c program was being
developed as a sample to show our Java team how to do Java's suspend/resume
(especially for garbage collection). It was intended to be just a quick
sample to get them started. I was just finishing the book. I wasn't very
happy with the completely artificial and useless pthread_kill example
program I'd written, and I knew many people would be interested in
asynchronous suspend/resume (however bad an idea that may be!), so I decided
to switch to susp.c. It "seemed to work" in initial testing, and I didn't
have the time to try to completely rewrite it. (Though there were several
aspects I'd have liked to change, including using semaphores for a more
reliable suspend synchronization mechanism.) Of course, the more something
like that is used, the more the seams start splitting open, and it wasn't
long before Java's attempts to actually use the mechanism started showing
errors.

On and off, I've been working on a bunch of corrections for the next
printing. (And I'd like to find time to start on a second edition, but
that's another story.) I've been fixing some of the examples, and I did a
substantial amount of work on susp.c. What I have now is a fixed but only
partly "cleaned up" version to fix the worst of the problems without too
much change. I'll include the new code as an attachment.

By the way, if you've got a version of the book earlier than the 3rd
printing, you might want to take a look at the Errata list. I really need to
find time to figure out who to talk to about posting it on the awl web site,
but, right now, you can find it at
http://members.aol.com/drbutenhof/Errata.html.


/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation David.B...@compaq.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/


/*
* susp.c
*
* Demonstrate an implementation of thread suspend and resume (similar
* to the Solaris thr_suspend/thr_continue functions) using portable
* POSIX functions.
*
* Note 1: Use of suspend and resume requires extreme care. You can
* easily deadlock your application by suspending a thread that holds
* some resource -- for example, a thread calling printf to print a
* message may have libc mutexes locked, and no other thread will be
* able to return from printf until the suspended thread is resumed.
*
* Note 2: This program is called "susp" rather than "suspend" to
* avoid confusion (by your shell) with the suspend command.
*
* Note 3: This simple program will fail if any thread terminates
* during the test. The value of ITERATIONS must be adjusted to a
* value sufficiently large that the main thread can complete its two
* suspend/continue loops before any thread_routine threads terminate.
* */
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <semaphore.h>
#include "errors.h"

#define THREAD_COUNT 20
#define ITERATIONS 80000

typedef struct {
int inuse; /* 1 if in use, else 0 */
pthread_t id; /* Thread ID */
} Victim_t;

sem_t sem;
int thread_count = THREAD_COUNT;
int iterations = ITERATIONS;
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_once_t once = PTHREAD_ONCE_INIT;
Victim_t *array = NULL;
int bottom = 0;

/*
* Handle SIGUSR1 in the target thread, to suspend it until receiving
* SIGUSR2 (resume). Note that this is run with both SIGUSR1 and
* SIGUSR2 blocked. Having SIGUSR2 blocked prevents a resume before we
* can finish the suspend protocol.
*/
void
suspend_signal_handler (int sig)
{
sigset_t signal_set;

/*
* Block all signals except SIGUSR2 while suspended.
*/
sigfillset (&signal_set);
sigdelset (&signal_set, SIGUSR2);
sem_post (&sem);
sigsuspend (&signal_set);

/*
* Once here, the resume signal handler has run to completion.
*/
return;
}

/*
* Handle SIGUSR2 in the target thread, to resume it. Note that
* the signal handler does nothing. It exists only because we need
* to cause sigsuspend() to return.
*/
void
resume_signal_handler (int sig)
{
return;
}

/*
* Dynamically initialize the "suspend package" when first used
* (called by pthread_once).
*/
void
suspend_init_routine (void)
{
int status;
struct sigaction sigusr1, sigusr2;

/*
* Initialize a semaphore, to be used by the signal handler to
* confirm suspension. We only need one, because suspend & resume
* are fully serialized by a mutex. The initial value is 0, so that
* sem_wait in thd_suspend blocks until the handler posts.
*/
status = sem_init (&sem, 0, 0);
if (status == -1)
errno_abort ("Initializing semaphore");

/*
* Allocate the suspended threads array. This array is used to
* guarantee idempotency.
*/
bottom = 10;
array = (Victim_t*) calloc (bottom, sizeof (Victim_t));
if (array == NULL)
errno_abort ("Allocating suspended threads array");

/*
* Install the signal handlers for suspend/resume.
*
* We add SIGUSR2 to the sa_mask field for the SIGUSR1 handler. That
* avoids a race if one thread suspends the target while another
* resumes that same target. (The SIGUSR2 signal cannot be delivered
* before the target thread calls sigsuspend.)
*/
sigusr1.sa_flags = 0;
sigusr1.sa_handler = suspend_signal_handler;
sigemptyset (&sigusr1.sa_mask);
sigaddset (&sigusr1.sa_mask, SIGUSR2);

sigusr2.sa_flags = 0;
sigusr2.sa_handler = resume_signal_handler;
sigemptyset (&sigusr2.sa_mask);

status = sigaction (SIGUSR1, &sigusr1, NULL);
if (status == -1)
errno_abort ("Installing suspend handler");

status = sigaction (SIGUSR2, &sigusr2, NULL);
if (status == -1)
errno_abort ("Installing resume handler");

return;
}

/*
* Suspend a thread by sending it a signal (SIGUSR1), which will
* block the thread until another signal (SIGUSR2) arrives.
*
* Multiple calls to thd_suspend for a single thread have no
* additional effect on the thread -- a single thd_continue
* call will cause it to resume execution.
*/
int
thd_suspend (pthread_t target_thread)
{
int status;
int i;

/*
* The first call to thd_suspend will initialize the
* package.
*/
status = pthread_once (&once, suspend_init_routine);
if (status != 0)
return status;

/*
* Serialize access to suspend, makes life easier
*/
status = pthread_mutex_lock (&mut);
if (status != 0)
return status;

/*
* Threads that are suspended are added to the array; a request to
* suspend a thread already listed in the array is ignored.
*/
for (i = 0; i < bottom; i++) {
if (array[i].inuse
&& pthread_equal (array[i].id, target_thread)) {
status = pthread_mutex_unlock (&mut);
return status;
}
}

/*
* Ok, we really need to suspend this thread. So, let's find the
* location in the array that we'll use. If we run off the end,
* realloc the array for more space. We allocate 2 new array slots;
* we'll use one, and leave the other for the next time.
*/
i = 0;
while (i < bottom && array[i].inuse)
i++;

if (i >= bottom) {
Victim_t *grown;

/* Keep the old array (and bottom) intact if realloc fails */
grown = (Victim_t*)realloc (
array, ((bottom + 2) * sizeof (Victim_t)));
if (grown == NULL) {
pthread_mutex_unlock (&mut);
return ENOMEM;
}
array = grown;
bottom += 2;
array[bottom-1].inuse = 0; /* Clear new last entry */
}

/*
* Record the target in the array, then send it the suspend signal.
* The semaphore used to synchronize with the signal handler was
* created once, in suspend_init_routine. If the signal cannot be
* sent, release the array slot again so that the thread is not
* recorded as suspended.
*/
array[i].id = target_thread;
array[i].inuse = 1;

status = pthread_kill (target_thread, SIGUSR1);
if (status != 0) {
array[i].inuse = 0;
pthread_mutex_unlock (&mut);
return status;
}

/*
* Wait for the victim to acknowledge suspension.
*/
while ((status = sem_wait (&sem)) != 0) {
if (errno != EINTR) {
pthread_mutex_unlock (&mut);
return errno;
}
}

status = pthread_mutex_unlock (&mut);
return status;
}

/*
* Resume a suspended thread by sending it SIGUSR2 to break
* it out of the sigsuspend() in which it's waiting. If the
* target thread isn't suspended, return with success.
*/
int
thd_continue (pthread_t target_thread)
{
int status;
int i = 0;

/*
* The first call to thd_suspend will initialize the package.
*/
status = pthread_once (&once, suspend_init_routine);
if (status != 0)
return status;

/*
* Serialize access to suspend, makes life easier.
*/
status = pthread_mutex_lock (&mut);
if (status != 0)
return status;

/*
* Make sure the thread is in the suspend array. If not, it hasn't
* been suspended (or it has already been resumed) and we can just
* carry on. The loop skips every slot that is NOT an in-use entry
* for the target, so it stops at the target's slot if there is one.
*/
while (i < bottom
&& !(array[i].inuse
&& pthread_equal (array[i].id, target_thread)))
i++;

if (i >= bottom) {
pthread_mutex_unlock (&mut);
return 0;
}

/*
* Signal the thread to continue, and remove the thread from
* the suspended array.
*/
status = pthread_kill (target_thread, SIGUSR2);
if (status != 0) {
pthread_mutex_unlock (&mut);
return status;
}

array[i].inuse = 0; /* Clear array element */
status = pthread_mutex_unlock (&mut);
return status;
}

static void *
thread_routine (void *arg)
{
int number = (int)(long)arg; /* Thread index passed by main */
int i;
char buffer[128];

for (i = 1; i <= iterations; i++) {
/*
* Every time each thread does 2000 iterations, print a progress
* report.
*/
if (i % 2000 == 0) {
sprintf (
buffer, "Thread %02d: %d\n",
number, i);
write (1, buffer, strlen (buffer));
}

sched_yield ();
}

return (void *)0;
}

int
main (int argc, char *argv[])
{
pthread_t threads[THREAD_COUNT];
pthread_attr_t detach;
int status;
void *result;
int i;

status = pthread_attr_init (&detach);
if (status != 0)
err_abort (status, "Init attributes object");
status = pthread_attr_setdetachstate (
&detach, PTHREAD_CREATE_DETACHED);
if (status != 0)
err_abort (status, "Set create-detached");

for (i = 0; i< THREAD_COUNT; i++) {
status = pthread_create (
&threads[i], &detach, thread_routine, (void *)(long)i);
if (status != 0)
err_abort (status, "Create thread");
}

sleep (1);

for (i = 0; i < THREAD_COUNT/2; i++) {
printf ("Suspending thread %d.\n", i);
status = thd_suspend (threads[i]);
if (status != 0)
err_abort (status, "Suspend thread");
}

printf ("Sleeping ...\n");
sleep (1);

for (i = 0; i < THREAD_COUNT/2; i++) {
printf ("Continuing thread %d.\n", i);
status = thd_continue (threads[i]);
if (status != 0)
err_abort (status, "Continue thread");
}

for (i = THREAD_COUNT/2; i < THREAD_COUNT; i++) {
printf ("Suspending thread %d.\n", i);
status = thd_suspend (threads[i]);
if (status != 0)
err_abort (status, "Suspend thread");
}

printf ("Sleeping ...\n");
sleep (1);

for (i = THREAD_COUNT/2; i < THREAD_COUNT; i++) {
printf ("Continuing thread %d.\n", i);
status = thd_continue (threads[i]);
if (status != 0)
err_abort (status, "Continue thread");
}

/*
* Request that each thread terminate. We don't bother waiting for
* them; just trust that they will terminate in "reasonable time".
* When the last thread exits, the process will exit.
*/
for (i = 0; i < THREAD_COUNT; i++)
pthread_cancel (threads[i]);

pthread_exit (NULL); /* Let threads finish */
}
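
The listing includes "errors.h" from the book's example sources, which is not reproduced in this thread. To build susp.c on its own, a minimal stand-in along the following lines should be enough. The macro bodies are an assumption based on how err_abort and errno_abort are called in the listing, not the book's actual file. The system headers are pulled in here because the listing uses printf, calloc, strlen, write and sleep without including them itself.

```c
/* errors.h: minimal stand-in (an assumption, not the book's file) */
#ifndef ERRORS_H
#define ERRORS_H

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Report an error code returned directly by a pthread call, then abort. */
#define err_abort(code, text) do { \
    fprintf(stderr, "%s at \"%s\":%d: %s\n", \
            (text), __FILE__, __LINE__, strerror(code)); \
    abort(); \
} while (0)

/* Report the current errno value, then abort. */
#define errno_abort(text) do { \
    fprintf(stderr, "%s at \"%s\":%d: %s\n", \
            (text), __FILE__, __LINE__, strerror(errno)); \
    abort(); \
} while (0)

#endif /* ERRORS_H */
```

With this file saved next to susp.c, something like "cc -pthread susp.c" should build the example on a typical POSIX system.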


Dave Butenhof

Aug 16, 1999, 3:00:00 AM
to Raymond Lau
Raymond Lau wrote:

> Mr. Butenhof has provided a cleaner version of susp.c, in which
> functions for suspending and resuming a thread are defined.
> Please see the reference below.
>
> I compiled his updated example and it ran fine, but I find
> his solution troublesome. If anyone can clarify the following
> points for me, I would appreciate it.
>
> 1) suspend_init_routine() is run by pthread_once(), which is
> called inside thd_suspend() and thd_continue(). In his example,
> thd_suspend() and thd_continue() are called by the main thread.
> But since the worker threads have already been created by then,
> I am not sure whether their signal masks would be updated
> accordingly.

The pthread_once() routine doesn't change any thread's signal mask. It
defines signal actions for two signals (SIGUSR1 as "suspend" and SIGUSR2 as
"resume"). The fixed code (unlike the original) assigns a non-default
signal mask to the signal action. That means, when the SIGUSR1 signal is
delivered to the thread, SIGUSR2 and SIGUSR1 will both be masked. (The
default is to mask only the signal being delivered, SIGUSR1, which resulted
in the race where the SIGUSR2 might be sent after the suspend handler had
"acknowledged" the suspend by setting the volatile variable but before it
called sigsuspend.)

Any SIGUSR1 delivered prior to the init routine wouldn't have the proper
mask. But then, it would have to be a SIGUSR1 from some other source, since
the signal action is set at the same time.

> 2) Can a thread call thd_suspend() in order to suspend itself?
> If it can, and it is the first one to call thd_suspend() or
> thd_continue(), will the signal masks of the other threads
> be updated accordingly?

Again, this doesn't matter. There's no attempt to modify the signal mask of
any thread. (Except indirectly through the normal kernel mechanism for
signal delivery.)

> 3) It seems to me that after a specific thread is suspended and
> resumed, suspending it again would not work.

Why? If you're worried about the signal mask, don't. Returning from the
signal action routine will cause the kernel to restore the thread's
previous signal mask. (Unblocking both SIGUSR1 and SIGUSR2.)

As I said, this is a "work in progress", from one brief session when I
found some time to fix the worst of the known problems. I hadn't really
intended the current code to go any further than my working directory.
Since this issue came up, I decided I might as well -- I believe it's "no
worse than" the original, and it does fix several serious problems. If you
do see any real errors in the new code, of course, please let me know.

Kimera Warid

Mar 11, 2023, 7:10:59 AM