Windows Suspend/ResumeThread API equivalent for Linux or POSIX?

Chris Thomasson

May 10, 2005, 9:27:36 PM

Is there a way I can achieve similar functionality on Linux or POSIX based
systems?

David Schwartz

May 10, 2005, 10:31:23 PM

"Chris Thomasson" <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote in
message news:WZidna_OzsH...@comcast.com...

> Is there a way I can achieve similar functionality on Linux or POSIX based
> systems?

You can do so any way you like with the cooperation of the thread you
would like to suspend or resume. Without the cooperation of that thread, it
simply cannot be done safely. There would be essentially nothing you could
do between the suspend and the resume.

Usually when people ask this question, they are barking up the wrong
tree. Why do you think you need to suspend a thread?

DS
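
A minimal sketch of what suspension "with the cooperation of the thread" can look like, assuming a worker that checks in at points where it holds no other locks. The names `worker_ctl`, `safe_point`, `suspend_worker`, `resume_worker` and `run_demo` are hypothetical; POSIX defines no such calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* All names here are hypothetical; POSIX has no suspend/resume call. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool suspend_requested;   /* written by the controller */
    bool parked;              /* true while the worker waits at a safe point */
    bool stop;                /* asks the worker to exit */
    unsigned long work_done;  /* stand-in for real work */
} worker_ctl;

/* The worker calls this only where it holds no other locks. */
static void safe_point(worker_ctl *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->suspend_requested && !c->stop) {
        c->parked = true;
        pthread_cond_broadcast(&c->changed);      /* tell the controller */
        pthread_cond_wait(&c->changed, &c->lock); /* sleep until resumed */
    }
    c->parked = false;
    pthread_mutex_unlock(&c->lock);
}

static void *worker(void *arg)
{
    worker_ctl *c = arg;
    bool stop = false;
    while (!stop) {
        safe_point(c);
        pthread_mutex_lock(&c->lock);
        c->work_done++;
        stop = c->stop;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Returns once the worker is parked at a safe point. */
static void suspend_worker(worker_ctl *c)
{
    pthread_mutex_lock(&c->lock);
    c->suspend_requested = true;
    while (!c->parked)
        pthread_cond_wait(&c->changed, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static void resume_worker(worker_ctl *c)
{
    pthread_mutex_lock(&c->lock);
    c->suspend_requested = false;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

static unsigned long read_work(worker_ctl *c)
{
    pthread_mutex_lock(&c->lock);
    unsigned long n = c->work_done;
    pthread_mutex_unlock(&c->lock);
    return n;
}

/* Returns 0 if the counter did not move while the worker was suspended. */
static int run_demo(void)
{
    worker_ctl c = { .suspend_requested = false };
    pthread_t t;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.changed, NULL);
    if (pthread_create(&t, NULL, worker, &c) != 0)
        return -1;

    suspend_worker(&c);
    unsigned long before = read_work(&c);
    for (int i = 0; i < 1000; i++)
        sched_yield();                 /* give the worker a chance to misbehave */
    unsigned long after = read_work(&c);
    resume_worker(&c);

    pthread_mutex_lock(&c.lock);
    c.stop = true;
    pthread_cond_broadcast(&c.changed);
    pthread_mutex_unlock(&c.lock);
    pthread_join(t, NULL);
    pthread_cond_destroy(&c.changed);
    pthread_mutex_destroy(&c.lock);
    return before == after ? 0 : 1;
}
```

Calling `run_demo()` from `main` exercises one suspend/resume cycle. Because the worker only parks inside `safe_point`, the controller can be sure no library lock (the 'malloc' mutex, for instance) is held while the worker is stopped, provided `safe_point` is never called with such a lock held.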

Chris Thomasson

May 11, 2005, 2:09:19 AM

> Why do you think you need to suspend a thread?

In order to call the Windows GetThreadContext Thread API, you need to use
the Resume/SuspendThread API. I am reading thread context and comparing
against deferred pointers in a modified SMR polling algorithm; You can
eliminate the hazard pointer reload and compare by doing this.

I was wondering if Linux and/or POSIX has a way to get thread context and if
it was similar to the way windows does it. Something like:

pthread_suspend_np( pthread_t, ucontext_t* );
pthread_resume_np( pthread_t );

Jomu

May 11, 2005, 3:29:33 AM

Been there, done that... In my case it was because the GC needs to stop
all threads for a moment to finish a GC cycle. A bit of cooperation is a
must: I used pthread_kill to send SIGRTMIN+7 to the threads I needed
to suspend, waited until a designated semaphore signalled that all were
suspended, and used the same signal later to resume. One per-thread signal
handler, a few semaphores, and that is it. It's even as portable as
Boehm's GC :), as he used the same pthread_ calls inside his signal
handler. It is used to suspend/resume and to get the thread context in
place for the next phase (registers/stack scan for pointers).
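
A rough sketch of that scheme for a single worker thread, assuming Linux-style real-time signals. It is not taken from the source snapshot; `on_suspend_signal`, `parked_sem`, `wait_mask` and `run_demo` are hypothetical names, and a real collector would need per-thread state instead of the one global `parked` flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

static int      suspend_sig;   /* SIGRTMIN + 7, as in the post */
static sigset_t wait_mask;     /* everything blocked except suspend_sig */
static sem_t    parked_sem;    /* posted by the handler once it is parked */
static volatile sig_atomic_t parked; /* only touched by the worker thread */
static atomic_ulong counter;   /* stand-in for real work */
static atomic_bool  quit;

static void on_suspend_signal(int sig)
{
    int saved_errno = errno;
    (void)sig;
    if (parked) {              /* nested delivery while parked: resume */
        parked = 0;
    } else {
        parked = 1;
        sem_post(&parked_sem); /* async-signal-safe acknowledgement */
        while (parked)
            sigsuspend(&wait_mask); /* sleep until the signal comes again */
    }
    errno = saved_errno;
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&quit))
        atomic_fetch_add(&counter, 1);
    return NULL;
}

/* Returns 0 if the counter did not move while the worker was suspended. */
static int run_demo(void)
{
    struct sigaction sa;
    pthread_t t;

    suspend_sig = SIGRTMIN + 7;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, suspend_sig);
    sem_init(&parked_sem, 0, 0);
    parked = 0;
    atomic_store(&quit, false);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_suspend_signal;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(suspend_sig, &sa, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    pthread_kill(t, suspend_sig);                 /* request suspension */
    while (sem_wait(&parked_sem) == -1 && errno == EINTR)
        ;                                         /* wait for the acknowledgement */
    unsigned long before = atomic_load(&counter);
    for (int i = 0; i < 1000; i++)
        sched_yield();
    unsigned long after = atomic_load(&counter);
    pthread_kill(t, suspend_sig);                 /* same signal resumes */

    atomic_store(&quit, true);
    pthread_join(t, NULL);
    sem_destroy(&parked_sem);
    return before == after ? 0 : 1;
}
```

The signal stays blocked while the handler runs, so a resume request that arrives early is left pending and is only delivered inside `sigsuspend`, which avoids a lost wakeup. The thread can still be stopped while it holds a lock, so whatever runs between suspend and resume must not take locks the stopped thread might own.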

This is also verrrrry performant. I have it running on a DC hub (built
as a testbed for native threads in Modula-3) with hundreds (up to 1500
observed) of threads running, and the load average on the system is under 1%
99.9% of the time. (N-4)/3 are TCP connections, so IO and heap thrashing are
constantly happening :).

There is source snapshot at http://home.rstel.net/~dragisha/gc-pthread

dd

Sherif Elgohari

May 11, 2005, 9:04:48 AM

What about pthread_suspend and pthread_continue ?

Casper H.S. Dik

May 11, 2005, 1:52:27 PM

"Sherif Elgohari" <sherif....@gmail.com> writes:

>What about pthread_suspend and pthread_continue ?

Other than that they do not and should not exist?

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

David Schwartz

May 11, 2005, 3:10:09 PM

"Chris Thomasson" <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote in
message news:DaednQpM4cn...@comcast.com...

>> Why do you think you need to suspend a thread?

> In order to call the Windows GetThreadContext Thread API, you need to use
> the Resume/SuspendThread API. I am reading thread context and comparing
> against deferred pointers in a modified SMR polling algorithm; You can
> eliminate the hazard pointer reload and compare by doing this.

Sounds like doing an awful lot of work to save a small amount of work.
Suspending a thread without its cooperation is an incredibly expensive thing
to do.

> I was wondering if Linux and/or POSIX has a way to get thread context and
> if it was similar to the way windows does it. Something like:
>
> pthread_suspend_np( pthread_t, ucontext_t* );
> pthread_resume_np( pthread_t );

No. What could you do while the thread was suspended? How can you be
sure you don't trip over a lock the suspended thread holds?

DS

Message has been deleted

David Schwartz

May 11, 2005, 7:08:44 PM

"Oliver S." <Foll...@gmx.net> wrote in message news:42828d3a@news-fe-01...

>> Usually when people ask this question, they are barking up the
>> wrong tree. Why do you think you need to suspend a thread?

> Imagine you have an application which performs a computationally
> intensive task in the background with a worker thread (e.g. a
> rendering application). And this app would provide a pause button to
> suspend these calculations to free computing power for a while.

> Without an API similar to SuspendThread / ResumeThread you would have
> to poll a volatile flag which is shared between the worker threads and
> the supervising thread. That costs performance (especially when
> multiple worker threads share this flag on an SMP system and need to
> MESI-broadcast for a newer version of the cache line containing that
> flag to all CPUs), and when you need a fine granularity of the poll
> interval, you're likely to insert that polling code in a lot of
> functions which aren't related to this threading issue.

Umm, no, it would stay shared until it was modified.

> Don't you think an API
> like SuspendThread / ResumeThread is much smarter in that case ?

No, I don't. In fact, it's a grossly ugly premature optimization. And if
you suspend the thread while it holds the 'malloc' mutex, you're screwed.

> And as suspending and resuming a thread can be considered as a longer
> interval of suspension through preemption, such an API could be
> provided by an operating system without a significant cost in code
> size or growth in complexity of the OS.

It is nothing like pre-emption, because with pre-emption, the thread
will ultimately be rescheduled with no action from user code, so deadlock is
not possible.

DS

Message has been deleted

David Schwartz

May 11, 2005, 8:27:34 PM

"Oliver S." <Foll...@gmx.net> wrote in message news:42829b2d@news-fe-01...

>> No, I don't. In fact, it's a grossly ugly premature optimization.

> That's your personal view, but _for_this_special_case_ this
> "optimization" is ok under some circumstances. Do you really think
> that polluting the code otherwise unrelated to threading issues to
> check a flag is less ugly ?

Yes, by far. Because otherwise every single thing the thread does or
could do must be made suspension-safe and deadlock free.

>> And if you suspend the thread while it holds the 'malloc' mutex,
>> you're screwed.

> I've written a rendering core of a PostScript interpreter, and this
> rendering core can be distributed among a number of processors in an
> SMP system. And the worker threads of this core share a highly
> optimized common memory allocator for every device object. And if I
> suspend all worker threads attached to every device object and using
> the same memory-pool object, this wouldn't cause any side effects;
> and there isn't any magic in taking care of this issue when writing
> code for worker threads. Of course there could be cases where you
> aren't able to design the worker threads according to that rule
> because you're forced to use a library that baffles that approach.

And that's the problem. You're looking for a generic, portable mechanism
that works with every pthreads implementation. This puts serious
restrictions on what that implementation can do (for example, it can't
asynchronously grab a thread and allocate memory unless it contains special
code to make this suspension-safe). And why do you want this serious burden
of code? To support one particular optimization that is usually a
pessimization anyway.

If you really want this, and really think it's worth it, go ahead and
code it. But you won't get help from anyone else unless you can convince
them that the benefits outweigh the costs.

>>> And as suspending and resuming a thread can be considered as a
>>> longer interval of suspension through preemption, such an API
>>> could be provided by an operating-system without a significant
>>> cost in code-size or growth in complexity of the OS.
>
>> It is nothing like pre-emption, because with pre-emption, the
>> thread will ultimately be rescheduled with no action from user
>> code, so deadlock is not possible.

> Of course this is possible, but you misunderstood me. I simply wanted
> to say that the OS code for SuspendThread / ResumeThread uses the same
> code for suspending and resuming the thread which the scheduler uses.

That would be a disaster, and a recipe for deadlock. If the suspending
thread ever touched a resource that the suspended thread holds, there would
be a deadlock. You would have to use a suspending routine that ensured that
such suspension-unsafe resources were not held (and what, retry the
suspension later if they are?). Using the regular scheduler suspension
mechanism would be suicidal. The normal scheduler suspension mechanism need
not concern itself with deadlock because the suspended thread will resume
without waiting for any resource another thread might hold.

DS

Chris Thomasson

May 11, 2005, 9:12:04 PM

>> In order to call the Windows GetThreadContext Thread API, you need to use
>> the Resume/SuspendThread API. I am reading thread context and comparing
>> against deferred pointers in a modified SMR polling algorithm; You can
>> eliminate the hazard pointer reload and compare by doing this.
>
> Sounds like doing an awful lot of work to save a small amount of work.

:)

> Suspending a thread without its cooperation is an incredibly expensive
> thing to do.

Yes. Luckily, the algorithm assumes that. The polling algorithm in question
only runs about every five seconds, or when the total number of deferred
objects hits a certain level; it's usually set to about a half-million nodes.
This is sufficient for many "read-mostly" data structures. Using thread
context for an algorithm that had a moderate and somewhat persistent number
of writes would be way too expensive. Anyway, I am not really advocating
scanning thread context to implement SMR. I want to gather a number of
different somewhat portable methods together so I can show how my new
algorithm differs from each one.

>> I was wondering if Linux and/or POSIX has a way to get thread context and
>> if it was similar to the way windows does it. Something like:
>>
>> pthread_suspend_np( pthread_t, ucontext_t* );
>> pthread_resume_np( pthread_t );
>
> No.

Ok. How about any Linux specific function to grab a copy of a thread's
context? I am not a Linux Kernel guru.

> What could you do while the thread was suspended?

> How can you be sure you don't trip over a lock the suspended thread holds?

Yes, this can be so dangerous...

Microsoft recommends that the thread that calls SuspendThread API not hold
and/or access "any" sync objects:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/suspendthread.asp

If you can't guarantee this with 100% certainty, then using SuspendThread is
like playing with a lit stick of dynamite.

:)

Chris Thomasson

May 11, 2005, 9:25:38 PM

> I don't really understand what Chris is doing here because I'm not
> familiar with these "hazard pointers" (although I have already heard
> the term and I know that it has to do with lock-free algorithms);

Here is a paper describing the original SMR algorithm:

http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf

> but I implemented a lock-free stack some time ago on Win32/x86.

Yes, I believe it used SEH to handle the problems with freeing its nodes?

> And I vaguely suppose that this suspension is done in case of a
> collision between two threads which isn't impossible, but usually
> very unlikely (at least with my stack).

For this tweaked version of SMR I need to suspend the threads in order to
get a snapshot of their current context, at least on Windows.

> So isn't it possible that
> this suspension statistically needs a very small amount of
> computation time because it is done very rarely ?

Yes. About every five seconds for read-mostly collections.

Message has been deleted

Patrick TJ McPhee

May 11, 2005, 11:17:19 PM

In article <42828d3a@news-fe-01>, Oliver S. <Foll...@gmx.net> wrote:

% Imagine you have an application which performs a computationally
% intensive task in the background with a worker thread (e.g. a
% rendering application). And this app would provide a pause button to
% suspend these calculations to free computing power for a while.

This can be handled better by setting the background thread's priority.
--

Patrick TJ McPhee
North York Canada
pt...@interlog.com

David Schwartz

May 11, 2005, 11:28:40 PM

"Chris Thomasson" <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote in
message news:pJSdnWBv0ti...@comcast.com...

> Microsoft recommends that the thread that calls SuspendThread API not hold
> and/or access "any" sync objects:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/suspendthread.asp
>
> If you can't guarantee this with 100% certainty, then using SuspendThread
> is like playing with a lit stick of dynamite.

You never can. You never know when a library might commandeer your
thread and go near a sync object. In order to support Suspend/Resume
threads, you must make sure your entire design, from the ground up, is
suspension safe. This is a (potentially) huge cost that pthreads decided not
to require every implementation to take.

DS

David Schwartz

May 11, 2005, 11:37:00 PM

"Chris Thomasson" <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote in
message news:IsSdnUOMFMv...@comcast.com...

> Here is a paper describing the original SMR algorithm:
>
> http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf

I can tell you from experience with similar algorithms that these
algorithms do not work well in user-space. To get the performance boost, you
need more control over your environment and the low-level scheduling
behavior than is possible to get in that environment. (Though they may get
you real benefits in kernel space.)

I can also tell you from experience that it is often the case that when
you think you need this type of algorithm, what you really need to do is
rearchitect or make other subtle changes. Avoiding contention in the first
place is better than algorithms that make contention less expensive.

Of course, I'm not familiar with this exact algorithm or your exact
application. So you may have the 2% case where the usual guidelines break.
Are you *sure* portable algorithms with mutexes won't get you sufficient
performance? (Or is this a research or proof of concept kind of thing?)

I would also warn you against making design changes based upon
benchmarks on unrealistic workloads. Many lock-free algorithms, for example,
perform better than locking algorithms on unrealistic workloads. On
realistic workloads, the descheduling of a conflicting thread allows both
threads to run with no further conflicts. This is much more efficient than
cache ping-ponging as both threads operate on the same data even with no
locking cost.

Descheduling a thread that's conflicting with another thread is *good*
and running it later when it won't conflict with anything is *great*.
Running both threads slowly with a lock-free algorithm is bad because work
that must be done slowly is done instead of work that can be done quickly.
(Of course, you don't see this on unrealistic workloads because there's
nothing else to do.)

Think about it. Your intuitions may be wrong.

DS

Chris Thomasson

May 12, 2005, 2:03:49 AM

>> If you can't guarantee this with 100% certainty, then using SuspendThread
>> is like playing with a lit stick of dynamite.
>
> You never can. You never know when a library might commandeer your
> thread and go near a sync object.

I have total control over the research application. It has never had any
problems with some other library hijacking its threads. In fact, what
exactly are you getting at here?

Chris Thomasson

May 12, 2005, 2:48:48 AM

>> Here is a paper describing the original SMR algorithm:
>>
>> http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf
>
> I can tell you from experience with similar algorithms that these
> algorithms do not work well in user-space. To get the performance boost,
> you need more control over your environment and the low-level scheduling
> behavior than is possible to get in that environment. (Though they may get
> you real benefits in kernel space.)

Actually, both SMR and RCU can work well in user-space. Here are two working
full-blown examples:

http://appcore.home.comcast.net/
( User-Space SMR )

http://atomic-ptr-plus.sourceforge.net/
( User-Space RCU. Joe Seigh has a nice lock-free read test setup. )

> I can also tell you from experience that it is often the case that
> when you think you need this type of algorithm, what you really need to
> do is rearchitect or make other subtle changes. Avoiding contention in the
> first place is better than algorithms that make contention less expensive.

Well, these algorithms can be used to provide scalable solutions to the
reader/writer problem. In particular, the iteration of read-mostly
data-structures. I can't seem to design a lock-based solution that can get
equivalent read-side performance numbers for this special case. If you can
present a lock-based solution for iterating read-mostly data structures that
can beat these types of algorithms I will gladly post it to my site. My site
is about mixing lock-free with lock-based, so a high-speed lock-based
solution to the reader-writer problem would be greatly welcome.

> Of course, I'm not familiar with this exact algorithm or your exact
> application. So you may have the 2% case where the usual guidelines break.
> Are you *sure* portable algorithms with mutexes won't get you sufficient
> performance? (Or is this a research or proof of concept kind of thing?)

This is pure research. I also provide a user-space demo library as a proof
of concept.

> I would also warn you against making design changes based upon
> benchmarks on unrealistic workloads. Many lock-free algorithms, for
> example, perform better than locking algorithms on unrealistic workloads.
> On realistic workloads, the descheduling of a conflicting thread allows
> both threads to run with no further conflicts. This is much more efficient
> than cache ping-ponging as both threads operate on the same data even with
> no locking cost.

Contended locks can suffer from similar problems, in certain situations...

Take a global read-write lock protecting a mostly-read data-structure. Under
heavy read contention, forward progress can be stalled waiting for the
updates to the internal lock state to be shuttled around the CPUs' caches.
If the size of the read-side critical section is moderate, the cost of
moving lock updates from one CPU's cache to another can rival that of the
critical section itself; this can prevent readers from truly executing in
parallel.

> Descheduling a thread that's conflicting with another thread is *good*
> and running it later when it won't conflict with anything is *great*.
> Running both threads slowly with a lock-free algorithm is bad because work
> that must be done slowly is done instead of work that can be done quickly.
> (Of course, you don't see this on unrealistic workloads because there's
> nothing else to do.)

Lock-free is not good for everything; An efficient marriage between
lock-free and lock-based can be better than using one or the other. Take a
look at this:

http://groups.google.ca/group/comp.programming.threads/msg/7e7834ca10f2613a?hl=en
( read the HP paper describing the performance differences of lock-free with
lock-based, and how a mix of both worlds can help. )

Chris Thomasson

May 12, 2005, 2:58:16 AM

>> Yes, I believe it used SEH to handle the problems with freeing
>> its nodes?
>

> Yes, you remember right. And it's really exciting to see how this
> simple trick perfectly circumvents the deallocation-problem. This
> originates in one of my bathtub-ideas *G*.

:)

> It's a smart idea, but I think David is right when he objects
> that this is very costly. And beyond that, it's absolutely
> unportable, even between different Unices (if they'd support the
> thread-inspection APIs), because on every different architecture you
> have to inspect different registers for the current instruction
> pointer and state registers.

Yeah. I am only tinkering around with this because I need to build a list of
different ways to implement SMR in user-space. I need this list so I can
compare and contrast each method against an ( unpublished ) algorithm I
created.

> I don't know whether you understood me correctly: I wanted to ask
> you if this thread suspension is only done in case of a collision
> between two threads being in a race condition at a certain point in
> time, and in most cases this collision doesn't happen, so that this
> thread suspension adds only very slightly to the *average* effort to
> access that shared resource.

No. I need access to thread context only in the polling phase of this
particular modified SMR algorithm.

Chris Thomasson

May 12, 2005, 3:11:59 AM

> If you can present a lock-based solution for iterating read-mostly data
> structures that can beat these types of algorithms I will gladly post it
> to my site.

I forgot to mention that an acceptable lock-based solution should not allow
a read to block a concurrent write, or vice versa...

> My site is about mixing lock-free with lock-based, so a high-speed
> lock-based solution to the reader-writer problem would be greatly welcome.

Here are some performance numbers that Joe's RCU implementation can provide:

http://groups.google.ca/group/comp.programming.threads/msg/f8a63d1d5fa6f0ee?hl=en

http://groups.google.ca/group/comp.programming.threads/msg/7cb61e1c4116f1ca?hl=en

That is simply awesome per-thread read throughput.

Casper H.S. Dik

May 12, 2005, 5:47:22 AM

"Chris Thomasson" <_no_damn_spam_cristom@_no_damn_comcast.net_spam> writes:

When you suspend a thread which uses any type of sync object, the
thread may be holding it while it is suspended.

In the particular case of a garbage collector and a shared memory
pool, a thread may be holding a lock to the memory pool and it
may be in an inconsistent state.

You should also keep in mind that when you start with garbage
collection, non-portability rears its ugly head in many ways:
you will need to know not only about the stack but also about
register save areas used by the kernel which may not be easily
available.

David Schwartz

May 12, 2005, 6:37:54 AM

"Chris Thomasson" <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote in
message news:K9udnfCOipw...@comcast.com...

I'm saying that had the POSIX committee included an API to suspend a
thread, they would have had to add significant restrictions on what any
implementation could do. POSIX only does this when the benefits are
significant. Here, the benefits are very, very questionable.

DS

Joe Seigh

May 12, 2005, 6:04:16 AM

On Wed, 11 May 2005 18:25:38 -0700, Chris Thomasson <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote:

> For this tweaked version of SMR I need to suspend the threads in order to
> get a snapshot of their current context, at least on Windows.
>

There's nothing on unix that's portable. On Solaris there's a /proc based
function to stop a thread but it wasn't guaranteed to take effect in any
specific amount of time so it wasn't very useful.

You'd have to use unix signals but if you use that then you might as well
have the thread do the context examination so it doesn't have to spend too
much time suspended.

SMR hazard pointers were designed so you don't have to look at context for
the thread's registers. Regular Boehm style GC needs to do this. If you
can guarantee that GC tracked local references are never copied to shared
memory, ie. the heap, then you can do GC by stopping one thread at a time
rather than the whole world.
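
A hedged sketch of the "have the thread do the context examination" idea: the signalled thread conservatively scans its own stack for one address and reports the result, then carries on without waiting for a resume. It assumes a downward-growing stack and no alternate signal stack, and it reads raw stack words the way a conservative collector does, which is not strictly conforming C. `on_scan_signal`, `scan_target`, `scan_hi` and `run_demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uintptr_t scan_target;  /* address being asked about */
static uintptr_t scan_hi;      /* upper end of the worker's scanned stack */
static volatile sig_atomic_t scan_hit;
static sem_t ready_sem, done_sem;
static atomic_bool quit;

static void on_scan_signal(int sig, siginfo_t *info, void *uctx)
{
    int saved_errno = errno;
    volatile uintptr_t marker = 0; /* its address approximates the stack pointer */
    int hit = 0;
    /* On Linux the interrupted register state behind uctx normally sits on
       this same stack, so the scan passes over it too (an assumption). */
    (void)sig; (void)info; (void)uctx;

    for (uintptr_t p = (uintptr_t)&marker;
         p + sizeof(uintptr_t) <= scan_hi; p += sizeof(uintptr_t))
        if (*(const volatile uintptr_t *)p == scan_target)
            hit = 1;               /* conservative: may be a false positive */

    scan_hit = hit;
    sem_post(&done_sem);           /* report and return; no parking */
    errno = saved_errno;
}

static void *worker(void *arg)
{
    void *volatile held = arg;     /* local reference, forced onto the stack */
    scan_hi = (uintptr_t)(&held + 1);
    sem_post(&ready_sem);
    while (!atomic_load(&quit))
        sched_yield();
    (void)held;
    return NULL;
}

/* Returns 1 if the worker found the reference on its own stack. */
static int run_demo(void)
{
    static int node;               /* pretend this is a deferred object */
    struct sigaction sa;
    pthread_t t;
    int sig = SIGRTMIN + 7;

    scan_target = (uintptr_t)&node;
    atomic_store(&quit, false);
    sem_init(&ready_sem, 0, 0);
    sem_init(&done_sem, 0, 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_scan_signal;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, &node) != 0)
        return -1;

    while (sem_wait(&ready_sem) == -1 && errno == EINTR)
        ;                          /* worker has published scan_hi */
    pthread_kill(t, sig);          /* ask the worker to scan itself */
    while (sem_wait(&done_sem) == -1 && errno == EINTR)
        ;
    int hit = scan_hit;

    atomic_store(&quit, true);
    pthread_join(t, NULL);
    sem_destroy(&ready_sem);
    sem_destroy(&done_sem);
    return hit;
}
```

A false positive here only delays reclamation of a deferred object, so it errs on the safe side. The polling thread never has to hold the worker stopped, which is the point of doing the examination inside the handler.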

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.