how do threads work?

Andersen

Nov 18, 2005, 6:02:01 PM

How can a thread library in user land (not kernel) actually do context
switching? How does it actually preempt? I cannot imagine how that can
be done in userland.

Grant Edwards

Nov 18, 2005, 6:23:00 PM

On 2005-11-18, Andersen <anders...@hotmail.com> wrote:

> How can a thread library in user land (not kernel) actually do context
> switching.

In the general case, it can't unless it's all done
cooperatively.

> How does it actually preempt?

It doesn't.

> I cannot imagine how that can be done in userland.

It isn't.

Under Linux, threads are just processes that share a memory
space. All of the context switching is done in the kernel.

--
Grant Edwards grante Yow! TAILFINS!!...click...
at
visi.com

Pascal Bourguignon

Nov 18, 2005, 7:49:00 PM

Andersen <anders...@hotmail.com> writes:

This is rather easy.

When you want to context switch, you save all the registers on the
current stack, and store the stack pointer into the current thread
structure, then you change the current thread, you fetch the new stack
pointer from the new current thread structure, and you restore the
registers. Context switch done.
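
A minimal sketch of the switch described above, assuming a system that still ships the <ucontext.h> routines (getcontext, makecontext, swapcontext). They save and restore the registers and the stack pointer, so no assembler is needed here. The names demo_switch, worker and trace are made up for illustration; a real thread library would more likely hand-code this step.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, worker_ctx;  /* saved register sets */
static int trace[4];                     /* order in which the steps ran */
static int trace_len;

static void worker(void)
{
    trace[trace_len++] = 1;
    swapcontext(&worker_ctx, &main_ctx); /* save worker state, resume caller */
    trace[trace_len++] = 3;
    /* falling off the end resumes uc_link, i.e. main_ctx */
}

/* Runs worker() in two slices interleaved with the caller.
 * Returns the recorded order packed into one decimal number. */
int demo_switch(void)
{
    char *stack = malloc(STACK_SIZE);    /* private stack for the worker */
    assert(stack != NULL);
    trace_len = 0;

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = STACK_SIZE;
    worker_ctx.uc_link = &main_ctx;      /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx); /* run worker until it yields */
    trace[trace_len++] = 2;
    swapcontext(&main_ctx, &worker_ctx); /* resume worker until it returns */
    trace[trace_len++] = 4;

    free(stack);
    return trace[0] * 1000 + trace[1] * 100 + trace[2] * 10 + trace[3];
}
```

Calling demo_switch() returns 1234: the worker and its caller take turns, each resuming exactly where it left off. This is purely cooperative; nothing here preempts anything.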

Now the problem is to do it preemptively. Since on Unix normally the
only asynchronous events a process can receive are signals, we should
use them. But if we switch the context in the middle of a signal
handling function, I'm not sure all systems will appreciate it: it's
better to return from the signal handler. So what you can do is to
use SIGALRM, and in the signal handler, you pop the signal stack
frame and save it temporarily, do the context switching, push the
saved signal stack frame, and return from the signal handler. You
need to know how to determine the size of signal stack frame.

If you have more specific knowledge of the current system, you may be
able to use more optimized mechanisms. For example, if the system you
have can deliver the signal on a specific stack, instead of using the
current stack pointer, you can avoid the copying of the signal stack
frames.

--
__Pascal Bourguignon__ http://www.informatimago.com/
You're always typing.
Well, let's see you ignore my
sitting on your hands.

Andersen

Nov 19, 2005, 7:54:09 AM

Pascal Bourguignon wrote:

> When you want to context switch, you save all the registers on the
> current stack, and store the stack pointer into the current thread
> structure, then you change the current thread, you fetch the new stack
> pointer from the new current thread structure, and you restore the
> registers. Context switch done.

Understand.

> Now the problem is to do it preemptively. Since on Unix normally the
> only asynchronous events a process can receive are signals, we should
> use them. But if we switch the context in the middle of a signal
> handling function, I'm not sure all systems will appreciate it: it's
> better to return from the signal handler. So what you can do is to
> use SIGALRM, and in the signal handler, you pop the signal stack
> frame and save it temporarily, do the context switching, push the
> saved signal stack frame, and return from the signal handler. You
> need to know how to determine the size of signal stack frame.

Do not get it. You mean the actual preemption is done by using signals?
How is that implemented on a machine such as IA32? I mean how can a
user-level thread library preempt a thread?

Stack frame = registers and return address saved on the stack before a
function call?

David Schwartz

Nov 19, 2005, 8:35:03 AM

"Andersen" <anders...@hotmail.com> wrote in message
news:437e5d72$0$27956$892e...@authen.yellow.readfreenews.net...

> How can a thread library in user land (not kernel) actually do context
> switching. How does it actually preempt? I cannot imagine how that can be
> done in userland.

How can you read a file in userland? You ask the kernel to do it for
you.

DS

Andersen

Nov 19, 2005, 9:34:11 AM

David Schwartz wrote:
>
> How can you read a file in userland? You ask the kernel to do it for
> you.
>

Right. That is why you never hear of file access being
implemented in a user-level library.

With threads however, there are kernel-level threads and user-level
threads (or library implementations etc). I am curious how the latter
does the preemption.

Pascal Bourguignon

Nov 19, 2005, 10:05:10 AM

Andersen <anders...@hotmail.com> writes:

> Pascal Bourguignon wrote:
>
>> When you want to context switch, you save all the registers on the
>> current stack, and store the stack pointer into the current thread
>> structure, then you change the current thread, you fetch the new stack
>> pointer from the new current thread structure, and you restore the
>> registers. Context switch done.
>
> Understand.
>
>> Now the problem is to do it preemptively. Since on Unix normally the
>> only asynchronous events a process can receive are signals, we should
>> use them. But if we switch the context in the middle of a signal
>> handling function, I'm not sure all systems will appreciate it: it's
>> better to return from the signal handler. So what you can do is to
>> use SIGALRM, and in the signal handler, you pop the signal stack
>> frame and save it temporarily, do the context switching, push the
>> saved signal stack frame, and return from the signal handler. You
>> need to know how to determine the size of signal stack frame.
>
> Do not get it. You mean the actual preemption is done by using
> signals?

Yes.

> How is that implemented on a machine such as IA32? I mean how
> can a user-level thread library preempt a thread?

With a timer signal: SIGALRM, SIGVTALRM, etc. See setitimer(2).
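
A minimal sketch of the timer-signal idea, assuming POSIX sigaction(2) and setitimer(2) with ITIMER_REAL. The names on_tick and spin_until_ticks are hypothetical, and the handler only counts ticks where a real library would choose the next thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;

static void on_tick(int signo)
{
    (void)signo;
    ticks++;   /* a real library would pick the next thread here */
}

/* Spins in a loop that never calls into the "library" until `wanted`
 * timer signals have arrived. Returns the number of ticks seen. */
int spin_until_ticks(int wanted)
{
    struct sigaction sa, old_sa;
    struct itimerval every_10ms, stop;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, &old_sa);
    assert(rc == 0);

    memset(&every_10ms, 0, sizeof every_10ms);
    every_10ms.it_interval.tv_usec = 10000;   /* repeat every 10 ms */
    every_10ms.it_value.tv_usec = 10000;      /* first expiry in 10 ms */
    ticks = 0;
    rc = setitimer(ITIMER_REAL, &every_10ms, NULL);
    assert(rc == 0);
    (void)rc;

    while (ticks < wanted)
        ;   /* busy loop: stands in for a long computation */

    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL);      /* disarm before restoring */
    sigaction(SIGALRM, &old_sa, NULL);
    return ticks;
}
```

spin_until_ticks(3) returns once at least three SIGALRM deliveries have interrupted the loop, even though the loop itself never yields.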

You can also use sigaltstack, that was what I was thinking about, to
have the signal delivered on a distinct stack so you don't have to
bother with the signal stack frame.
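
A small sketch of the sigaltstack(2) approach, assuming POSIX sigaltstack and the SA_ONSTACK flag. The 64 KiB size and the names handler_uses_alt_stack and alt_base are arbitrary choices for illustration. The handler only checks that one of its local variables lives inside the alternate stack.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)

static char *alt_base;
static volatile sig_atomic_t ran_on_alt_stack;

static void handler(int signo)
{
    char local;                         /* lives on whatever stack we run on */
    uintptr_t here = (uintptr_t)&local;
    uintptr_t base = (uintptr_t)alt_base;
    (void)signo;
    ran_on_alt_stack = (here >= base && here < base + ALT_STACK_SIZE);
}

/* Returns 1 if the SIGUSR1 handler executed on the alternate stack. */
int handler_uses_alt_stack(void)
{
    stack_t ss, old_ss;
    struct sigaction sa, old_sa;
    int rc;

    alt_base = malloc(ALT_STACK_SIZE);
    assert(alt_base != NULL);

    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    rc = sigaltstack(&ss, &old_ss);
    assert(rc == 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sa.sa_flags = SA_ONSTACK;           /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, &old_sa);
    assert(rc == 0);
    (void)rc;

    ran_on_alt_stack = 0;
    raise(SIGUSR1);                     /* handler runs before raise() returns */

    sigaction(SIGUSR1, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);         /* restore the previous setting */
    free(alt_base);
    return ran_on_alt_stack;
}
```

Because the signal frame is built on the alternate stack, the interrupted thread's own stack is left untouched at the point of interruption.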

When you do a thread context switch, mind updating the stack limit:
setrlimit(RLIMIT_STACK, &rlim) to match that of the thread stack.

> Stack frame = registers and return address saved on the stack before a
> function call?

Yes. Namely: struct sigframe.

The best source of information will be: /usr/src/linux
Check: /usr/src/linux/arch/$ARCH/kernel/signal.c
and: /usr/src/linux/include/asm-$ARCH/sigcontext.h

You can fetch the stack pointer of the interrupted thread in the
sigcontext field of the sigframe. (Or refer to rt_sigframe if you use
a real-time signal). Exchanging the sigcontext records of the threads
to switch, in the sigframe, should be all you need to do to do the
thread context switch.

--
__Pascal Bourguignon__ http://www.informatimago.com/

Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay

Enrique Perez-Terron

Nov 19, 2005, 2:29:37 PM

From the kernel's perspective there are no threads and no preemption in
the latter case. There is a single process that decides within its own
means to spend time on this or that.

A process cannot consider switching tasks unless the flow of instructions
brings it to a function that does such deliberations. While in the middle
of computing pi to one million decimals, the flow of instructions does
not get near any such function. To solve this, the process can ask the
kernel for a little help, in the form of regular timer signals.

Signals do force an interruption of the flow of instructions. Once in
the signal handler, the threading library takes the steps needed to
"preempt" the computation of pi in favor of something else.

One of the technical hurdles that such a threading library must overcome,
is to arrange for each "thread" to have a separate stack. It must be possible
for one thread to unwind its stack even after another thread has spun
deeply into some recursion of its own.

A problem with user-space threads is if one thread does a blocking read,
there is no way the thread library can switch to another thread without
aborting or completing that read. The threading library needs to replace
all blocking system calls with its own wrappers, which call non-blocking
equivalents, and take suitable control when that equivalent fails to
deliver immediately.
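
A sketch of what such a wrapper might look like for read(2), assuming POSIX fcntl(2) and O_NONBLOCK. The names thread_read, yield_fn, fake_other_thread and demo_thread_read are hypothetical; the yield hook stands in for the library's "switch to another thread" routine.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical hook: a real library would switch to another thread here. */
typedef void (*yield_fn)(void *arg);

/* read() replacement that never blocks the whole process: it yields
 * while no data is ready. Leaves O_NONBLOCK set on fd. */
ssize_t thread_read(int fd, void *buf, size_t len, yield_fn yield, void *arg)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                   /* data or end of file */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                  /* real error */
        yield(arg);                     /* nothing yet: let others run */
    }
}

struct producer { int write_fd; int yields; };

/* Plays the role of "another thread": writes data on the third yield. */
static void fake_other_thread(void *arg)
{
    struct producer *p = arg;
    if (++p->yields == 3) {
        ssize_t w = write(p->write_fd, "hi", 2);
        (void)w;
    }
}

/* Returns how many times the reader yielded before data arrived,
 * or -1 if the data read back is not what was written. */
int demo_thread_read(void)
{
    int fds[2];
    char buf[8];
    struct producer p;
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;

    p.write_fd = fds[1];
    p.yields = 0;
    ssize_t n = thread_read(fds[0], buf, sizeof buf, fake_other_thread, &p);
    close(fds[0]);
    close(fds[1]);
    return (n == 2 && buf[0] == 'h' && buf[1] == 'i') ? p.yields : -1;
}
```

demo_thread_read() returns 3: the reader yields three times on an empty pipe and picks up the data on its fourth attempt. Every blocking call needs its own wrapper of this kind.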

However, if a program uses a new kind of device, executes some ioctl's that
the threading lib authors were not aware of, or that did not even exist at
the time, then the threading lib will not have a wrapper for that.

-Enrique

phil-new...@ipal.net

Nov 19, 2005, 3:55:20 PM

On Fri, 18 Nov 2005 23:23:00 -0000 Grant Edwards <gra...@visi.com> wrote:
| On 2005-11-18, Andersen <anders...@hotmail.com> wrote:
|
|> How can a thread library in user land (not kernel) actually do context
|> switching.
|
| In the general case, it can't unless it's all done
| cooperatively.
|
|> How does it actually preempt?
|
| It doesn't.
|
|> I cannot imagine how that can be done in userland.
|
| It isn't.
|
| Under Linux, threads are just processes that share a memory
| space. All of the context switching is done in the kernel.

All of the memory space is shared, including stack space, which means
each thread really has to be operating in a different address range
for the stack.

Personally, I'd rather share controlled portions of the address space
between tasks, and have things like the stack be unique.

I'm also curious why it is that strace works fine on multiple process
applications, but gets hung up on multiple thread applications.

--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------

phil-new...@ipal.net

Nov 19, 2005, 4:02:18 PM

On Sat, 19 Nov 2005 20:29:37 +0100 Enrique Perez-Terron <en...@online.no> wrote:

| A problem with user-space threads is if one thread does a blocking read,
| there is no way the thread library can switch to another thread without
| aborting or completing that read. The threading library needs to replace
| all blocking system calls with its own wrappers, which call non-blocking
| equivalents, and take suitable control when that equivalent fails to
| deliver immediately.

This is just a deficiency in the design of the system. A better design
would allow a context state to recognize that a system call is in progress
and any attempt to switch back to that context resumes the blocking that
was in effect, without even so much as returning incomplete from that call
in that context. Then allow signals to interrupt any blocking function,
and the return from that signal, in that context, would have no impact on
the call. A signal handler that decided "enough is enough" should also
have the ability to reset a context so the blocked call does return in an
incomplete state, or just abort the process without ever getting back to
that context.

Andersen

Nov 19, 2005, 5:03:31 PM

Enrique Perez-Terron wrote:

> From the kernel's perspective there are no threads and no preemption in
> the latter case. There is a single process that decides within its own
> means to spend time on this or that.

Right.

> A process cannot consider switching tasks unless the flow of instructions
> brings it to a function that does such deliberations. While in the middle
> of computing pi to one million decimals, the flow of instructions does
> not get near any such function. To solve this, the process can ask the
> kernel for a little help, in the form of regular timer signals.

Right. Though I am curious how that would be implemented on an IA32 arch
(Interrupts?).

> Signals do force an interruption of the flow of instructions. Once in
> the signal handler, the threading library takes the steps needed to
> "preempt" the computation of pi in favor of something else.
>
> One of the technical hurdles that such a threading library must overcome,
> is to arrange for each "thread" to have a separate stack. It must be
> possible
> for one thread to unwind its stack even after another thread has spun
> deeply into some recursion of its own.

Would it not be quite simple to have multiple stacks, save their stack
pointers on the heap of the thrd lib, and when context switching make
sure that the right stack pointer is used, pointing to the right place?

> A problem with user-space threads is if one thread does a blocking read,
> there is no way the thread library can switch to another thread without
> aborting or completing that read. The threading library needs to replace
> all blocking system calls with its own wrappers, which call non-blocking
> equivalents, and take suitable control when that equivalent fails to
> deliver immediately.

Why is this a problem if you are using signals to preempt? I mean why
replace blocking calls. Why not just use the signal, interrupt the
blocking operation and context switch? Is it problematic to signal in
the middle of a blocking call?

> However, if a program uses a new kind of device, executes some ioctl's that
> the threading lib authors were not aware of, or that did not even exist at
> the time, then the threading lib will not have a wrapper for that.

Again, why would it not work as the signal would interrupt this
unexpected blocking operation?

Thanks a lot for your answers. This is exactly what I was looking for!

Enrique Perez-Terron

Nov 19, 2005, 7:15:36 PM

On Sat, 19 Nov 2005 23:03:31 +0100, Andersen <anders...@hotmail.com> wrote:

>
> Enrique Perez-Terron wrote:
[...]

>> A process cannot consider switching tasks unless the flow of instructions
>> brings it to a function that does such deliberations. While in the middle
>> of computing pi to one million decimals, the flow of instructions does
>> not get near any such function. To solve this, the process can ask the
>> kernel for a little help, in the form of regular timer signals.
>
> Right. Though I am curious how that would be implemented on an IA32 arch
> (Interrupts?).

In what way is this different for an IA32 arch processor? What other kind
of processors would you compare with?

At the level where the processor per se is visible, you have external chips
deliver interrupts to the processor, and all such interrupts invoke interrupt
handlers that run in kernel mode. All processors I know about invoke
interrupt handlers in whatever protection mode or ring level, or whatever
it's called, that is used by the kernel, and implies hardware access etc.

Remember, on a uniprocessor, when the kernel starts a process, the processor
starts executing instructions from that process' code, and the processor
by itself has no other means of being intelligent about what it is doing
than by executing instructions that are intelligently composed.

So, while the processor is executing the process' code, it has no other
intelligence than that which is implicit in that process. The kernel, at
that point in time is not a live being that can decide and act, it is just
a bunch of instructions lying idle in the memory, about like the letters in
a book in a shelf. Those instructions are powerless to look at the watch
every thousandth of a second, as long as those instructions are not
executed.

This is where the interrupt comes in. The processor is built that way, it
will not fetch and run the next instruction of the running program, it
will instead fetch and run instructions from an interrupt handler whose
address is found through a hardwired procedure. In this way, the kernel
regains control over the computer every so often, thanks to the regular
interrupts from a timer circuit.

In order for a user-level library like a user-level threads library to
get notification at times when the thread (and therefore, the process) is
doing the proverbial computation of pi decimals, i.e., when the flow of
instructions does not naturally lead to functions in the thread library
(which could poll the os to learn how long the pi decimals have been
computing) -- you need something coming in from outside the process itself,
and that is the signal, which the kernel can deliver, prompted by a
hardware interrupt from an external (or internal, for that matter) circuit.

If you now ask, how does the kernel deliver signals, then we are no longer
talking about threads, but about the kernel and its processes (or tasks,
or kernel-level threads) in general. Whenever the kernel is running some
of its code, and is about to resume the active process, it checks the state
of the signal bits in the process (er, task) structure, and if one is set,
it invokes the process signal handler instead of the normal resumption
point - after arranging whatever messy details are required so the normal
resumption point can be resumed later, after the signal handler returns.

>> Signals do force an interruption of the flow of instructions. Once in
>> the signal handler, the threading library takes the steps needed to
>> "preempt" the computation of pi in favor of something else.
>>
>> One of the technical hurdles that such a threading library must overcome,
>> is to arrange for each "thread" to have a separate stack. It must be
>> possible
>> for one thread to unwind its stack even after another thread has spun
>> deeply into some recursion of its own.
>
> Would it not be quite simple to have multiple stacks, save their stack
> pointers on the heap of the thrd lib, and when context switching make
> sure that the right stack pointer is used, pointing to the right place?

That is approximately what thread libraries do, as far as I know. There
is a list of thread structures, each containing the data needed to resume
that thread, just like the kernel maintains similar data for the processes.

Each thread has a private stack region, and a resumption address on the stack.
The value of the stack pointer register for the thread is in one of three
places: in the library's thread structure, in the kernel's process structure,
or in the stack pointer register, when the thread is actually executing.
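
A sketch of such a thread table, assuming the <ucontext.h> routines are available and using them in place of hand-written register save/restore. The struct uthread layout, the round-robin loop and all names are illustrative only, not taken from any particular library.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define NTHREADS 2
#define STACK_SIZE (64 * 1024)

struct uthread {
    ucontext_t ctx;   /* saved registers, including the stack pointer */
    char *stack;      /* private stack region */
    int done;
};

static struct uthread threads[NTHREADS];
static ucontext_t sched_ctx;   /* the scheduler's own saved state */
static int current;            /* index of the running thread */
static char log_buf[16];
static int log_len;

static void yield_to_scheduler(void)
{
    swapcontext(&threads[current].ctx, &sched_ctx);
}

static void body(int id)
{
    for (int i = 0; i < 2; i++) {
        log_buf[log_len++] = (char)('A' + id);
        yield_to_scheduler();
    }
    threads[id].done = 1;      /* returning resumes uc_link (scheduler) */
}

/* Runs all threads round-robin until each has finished.
 * Returns the order in which the threads logged their letters. */
const char *run_round_robin(void)
{
    int remaining;

    memset(log_buf, 0, sizeof log_buf);
    log_len = 0;
    for (int i = 0; i < NTHREADS; i++) {
        threads[i].stack = malloc(STACK_SIZE);
        assert(threads[i].stack != NULL);
        threads[i].done = 0;
        getcontext(&threads[i].ctx);
        threads[i].ctx.uc_stack.ss_sp = threads[i].stack;
        threads[i].ctx.uc_stack.ss_size = STACK_SIZE;
        threads[i].ctx.uc_link = &sched_ctx;
        makecontext(&threads[i].ctx, (void (*)(void))body, 1, i);
    }

    do {
        remaining = 0;
        for (current = 0; current < NTHREADS; current++) {
            if (threads[current].done)
                continue;
            swapcontext(&sched_ctx, &threads[current].ctx);
            if (!threads[current].done)
                remaining++;
        }
    } while (remaining > 0);

    for (int i = 0; i < NTHREADS; i++)
        free(threads[i].stack);
    return log_buf;
}
```

run_round_robin() returns "ABAB": each thread keeps its own stack and resumes where it yielded, whatever the other thread did in between.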

>> A problem with user-space threads is if one thread does a blocking read,
>> there is no way the thread library can switch to another thread without
>> aborting or completing that read. The threading library needs to replace
>> all blocking system calls with its own wrappers, which call non-blocking
>> equivalents, and take suitable control when that equivalent fails to
>> deliver immediately.
>
> Why is this a problem if you are using signals to preempt? I mean why
> replace blocking calls. Why not just use the signal, interrupt the
> blocking operation and context switch? Is it problematic to signal in
> the middle of a blocking call?

Because the blocking is at kernel level, and the kernel does "know" or care
that the process is multiple threads. The kernel simply blocks the process,
including the threading library.

When a signal arrives, the kernel must decide to deliver the signal, or not.
If it does, it must decide to abort the system call (returning with a status
of EINTR) or restart the system call. The latter means the control will go
back to the kernel after the signal handler finishes, and then back to the
program when the system call finishes.
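
A small sketch of the "abort with EINTR" case, assuming POSIX sigaction(2) and setitimer(2). The handler is installed without SA_RESTART, so a read(2) blocked on an empty pipe is interrupted rather than restarted. The function name errno_of_interrupted_read is made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int signo)
{
    (void)signo;   /* nothing to do: arriving is enough to interrupt read() */
}

/* Blocks in read() on an empty pipe and lets a timer signal interrupt it.
 * Returns the errno that read() reported, or 0 if read() did not fail. */
int errno_of_interrupted_read(void)
{
    int fds[2];
    char byte;
    struct sigaction sa, old_sa;
    struct itimerval every_50ms, stop;
    int rc = pipe(fds);
    assert(rc == 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sa.sa_flags = 0;                     /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, &old_sa);
    assert(rc == 0);
    (void)rc;

    /* Repeating timer, so a tick that fires before read() starts
     * cannot leave the read blocked forever. */
    memset(&every_50ms, 0, sizeof every_50ms);
    every_50ms.it_interval.tv_usec = 50000;
    every_50ms.it_value.tv_usec = 50000;
    setitimer(ITIMER_REAL, &every_50ms, NULL);

    ssize_t n = read(fds[0], &byte, 1);  /* nobody ever writes to the pipe */
    int err = (n == -1) ? errno : 0;

    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL); /* disarm before restoring */
    sigaction(SIGALRM, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

errno_of_interrupted_read() returns EINTR. With SA_RESTART set in sa_flags, the same read would instead be resumed after the handler returns and the caller would never see the interruption.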

In any case this is not dependent on the process being multithreaded.

Perhaps the signal handling routine can switch context and run another
thread while the kernel thinks you are still running a signal handler.

Now consider the following: a thread issues a system call of the restartable
variety, and a signal is delivered while this system call is incomplete.
The threading library receives the signal in its signal handler. The threading
library does not even know that the currently active thread is blocked
in a system call. It has no way to know that it should switch context, and
not switch back, until that system call has returned.

Further, assume that the thread library switches context, and the new context
issues a second restartable system call. Just assume that both calls are
read calls that will transfer data from some external network connection
to a disk buffer. (I don't know what calls are actually restartable,
anybody can correct me here.) Now the kernel must handle two return
addresses for the process, one for each system call. It must ensure that
each return is associated with the correct number of bytes transferred or
any other status information. Now you want the two threads to be resumed
with the correct status, no matter what system call finishes first.
I guess you see there is a can of worms here.

>> However, if a program uses a new kind of device, executes some ioctl's that
>> the threading lib authors were not aware of, or that did not even exist at
>> the time, then the threading lib will not have a wrapper for that.
>
> Again, why would it not work as the signal would interrupt this
> unexpected blocking operation?

I guess this is really the same question as above.

> Thanks a lot for your answers. This is exactly what I was looking for!

Just a final word: I'm not God. I'm not even a kernel programmer. I could
be mistaken. It's just my two cents.

-Enrique

Enrique Perez-Terron

Nov 19, 2005, 7:32:49 PM

On Sun, 20 Nov 2005 01:15:36 +0100, Enrique Perez-Terron <en...@online.no> wrote:

> Because the blocking is at kernel level, and the kernel does "know" or care

Should be: does not "know" or care..

> that the process is multiple threads. The kernel simply blocks the process,
> including the threading library.
>
> When a signal arrives, the kernel must decide to deliver the signal, or not.
> If it does, it must decide to abort the system call (returning with a status
> of EINTR) or restart the system call.

Thinking about it, I believe that the restart is really accomplished in the
libc. The kernel just interrupts the system call. Anyone who knows can
correct me here.

But then the libc does know what system calls were issued, they were issued
through libc. Libc also knows when signal handlers are invoked, because they
are installed through libc. If other code has its own system call interface
bypassing libc, then that code must handle any restart, if so desired.

-Enrique

Grant Edwards

Nov 19, 2005, 10:05:34 PM

On 2005-11-19, phil-new...@ipal.net <phil-new...@ipal.net> wrote:

>| Under Linux, threads are just processes that share a memory
>| space. All of the context switching is done in the kernel.
>
> All of the memory space is shared, including stack space,
> which means each thread really has to be operating in a
> different address range for the stack.

Correct.

> Personally, I'd rather share controlled portions of the
> address space between tasks, and have things like the stack be
> unique.

That would break a few (probably seldom-used) things, but it
would be safer.

> I'm also curious why it is that strace works fine on multiple
> process applications, but gets hung up on multiple thread
> applications.

Don't know.

--
Grant Edwards grante Yow! As a FAD follower,
at my BEVERAGE choices are
visi.com rich and fulfilling!

Kasper Dupont

Nov 20, 2005, 5:35:48 PM

Pascal Bourguignon wrote:
>
> Now the problem is to do it preemptively. Since on Unix normally the
> only asynchronous events a process can receive are signals, we should
> use them.

Yes, signals and interrupts are similar enough to
both be used for this purpose. And you will have
the same kinds of race conditions to worry about.

> But if we switch the context in the middle of a signal
> handling function, I'm not sure all systems will appreciate it: it's
> better to return from the signal handler.

Actually this part is easy. You just have to make
correct use of masks to block signals. AFAIR by
default the signal being handled is blocked while its
handler is executing (and sa_mask in sigaction can
block others), so there will be no risk of preemption
taking place while the timer signal is being handled.

Of course you must worry about being preempted
while you are just about to give up the CPU
voluntarily. For that reason the scheduling
function needs to block signals as well. Of
course it must save the mask before that. The
state will end up on the stack and become part
of the saved thread context.
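
A sketch of that save-block-restore pattern, assuming POSIX sigprocmask(2) and that SIGALRM is not already blocked on entry. The function name deferred_delivery_demo is hypothetical, and raise() stands in for a timer tick that arrives while the scheduler is busy.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t preempt_requests;

static void on_alarm(int signo)
{
    (void)signo;
    preempt_requests++;
}

/* Blocks SIGALRM around a "scheduler" section, sends the signal while it
 * is blocked, then restores the saved mask.
 * Returns (handler runs seen inside the section) * 10
 *       + (handler runs seen after the mask was restored). */
int deferred_delivery_demo(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, saved;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, &old_sa);
    assert(rc == 0);
    (void)rc;
    preempt_requests = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &saved);  /* enter: save the old mask */

    raise(SIGALRM);                          /* stays pending while blocked */
    int inside = preempt_requests;

    sigprocmask(SIG_SETMASK, &saved, NULL);  /* leave: pending signal fires */
    int after = preempt_requests;

    sigaction(SIGALRM, &old_sa, NULL);
    return inside * 10 + after;
}
```

deferred_delivery_demo() returns 1: the handler does not run inside the protected section and runs exactly once when the saved mask is restored.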

And with a few extra levels of function calls all
you have to change is a single frame pointer
somewhere on the stack. And returning through those
functions again, everything will automagically work
correctly.

You have to know exactly what the stack frames
look like to make it work.

Actually if you do preemptive scheduling within a
SIGALRM handler, the easiest way to give up the CPU
voluntarily may be simply to raise SIGALRM.

> So what you can do is to
> use SIGALRM, and in the signal handler, you pop the signal stack
> frame and save it temporarily, do the context switching, push the
> saved signal stack frame, and return from the signal handler. You
> need to know how to determine the size of signal stack frame.

You don't need to push and pop stack frames manually.
In fact the signal handling will do all the hard work
with saving and restoring the state, so the trickiest
part really is what the initial stack layout of a
thread needs to be.

I have an old example demonstrating how this can be
done in all C code, there is not one bit of assembler
code in there:

http://www.daimi.au.dk/~kasperd/scheduler.c

However, this example does make some assumptions about
the stack layout, which do not hold anymore. I'm
not 100% sure, but I think some older gcc is required
to make it work.

Besides, doing this in user mode is not the optimal
way. A good kernel mode thread switching (such as the
one in Linux 2.6) is way better. A combined kernel
and user mode implementation would allow for the best
performance as you can avoid the user/kernel switches
when a thread gives up the CPU voluntarily. But
getting the best from a user mode and a kernel mode
implementation would be very complicated.

--
Kasper Dupont
Note to self: Don't try to allocate
256000 pages with GFP_KERNEL on x86.

Kasper Dupont

Nov 20, 2005, 5:52:11 PM

Andersen wrote:
>
> I mean how can a
> user-level thread library preempt a thread?

It tells the kernel to deliver signals, for example
once every 100ms. It is not that much different from
how the kernel tells the hardware to deliver interrupts
whenever some I/O is ready and once every ms.

>
> Stack frame = registers and return address saved on the stack before a
> function call?

Something like that, but the details are a little more
complicated. You don't save all registers in every
stack frame, so there needs to be a convention about
who is responsible for saving which registers.

One convention would be that all registers are call
clobbered, meaning that the callee is allowed to change
every register and the caller is responsible for saving
those it needs preserved across the call.

Another convention would be that no registers are call
clobbered, meaning that the caller can rely on the
values to be the same after the call as they were
before, and the callee is responsible for saving the state
of all those registers it intends to change.

Of course one can also have a convention with some
registers being clobbered by a call and others preserved.

Knowing which convention is being used is obviously an
advantage when writing the context switching code.

There are other things in a stack frame. For example all
local variables are in the stack frame. This is relevant
because you might also need to save stuff like signal
masks in local variables to preserve them until the next
time that thread is scheduled. You also need to know
that during the switching you will temporarily have an
inconsistency between the stack and the registers, which
means local variables may not work as expected.

Typically the stack frames also contain references to
each other. On entry to a function, it may push the base
pointer onto the stack and copy the stack pointer to the
base pointer. Assuming this copy is used to restore the
stack pointer, you may actually switch stacks by changing
the base pointer rather than the stack pointer. Or even
by changing a base pointer saved on the stack and then
returning from the scheduler.

Kasper Dupont

Nov 20, 2005, 6:16:46 PM

Andersen wrote:

>
> Enrique Perez-Terron wrote:
>
> > A process cannot consider switching tasks unless the flow of instructions
> > brings it to a function that does such deliberations. While in the middle
> > of computing pi to one million decimals, the flow of instructions does
> > not get near any such function. To solve this, the process can ask the
> > kernel for a little help, in the form of regular timer signals.
>
> Right. Though I am curious how that would be implemented on an IA32 arch
> (Interrupts?).

No it is not interrupts, it is signals. There are quite some
similarities but they are at different abstraction levels.
Interrupts are implemented by the hardware such that the
kernel can make use of them. Signals are implemented by the
kernel such that applications can make use of them.

What the kernel does to deliver a signal involves modifying
the user mode stack. It creates a new stack frame saving all
registers and signal masks. It also arranges for a special
system call to be called on return from the handler. This
return from signal system call will then restore user mode
registers from the saved values.

The kernel doesn't care if you switch stacks, causing such
signal handlers to return in an order completely unrelated
to the one in which they were invoked. It will just restore what it
is told to restore (it is user state we are dealing with, so
the kernel has nothing to worry about).

>
> Would it not be quite simple to have multiple stacks, save their stack
> pointers on the heap of the thrd lib, and when context switching make
> sure that the right stack pointer is used, pointing to the right place?

Yes, that is exactly how switching works. The most tricky
part often proves to be how to create an initial stack such
that switching to a newly created thread works.

>
> Why is this a problem if you are using signals to preempt? I mean why
> replace blocking calls. Why not just use the signal, interrupt the
> blocking operation and context switch? Is it problematic to signal in
> the middle of a blocking call?

First of all, interrupting the blocking call is not good
enough. You want the CPU to be used for something as soon
as one thread blocks. Say you have an application that has
one CPU bound thread and one I/O bound thread. The I/O bound
thread blocks in a system call, and the CPU becomes idle.
The CPU bound thread is still waiting for the CPU even
though it is idle; only later, when a signal arrives, will
the thread be scheduled. But that is not the only problem,
because once the CPU bound thread has got the CPU, obviously
the process is no longer blocked waiting for I/O, so when
whatever hardware it was waiting for has completed, the
process will not notice. Only later, when the thread switching
happens again, will the I/O bound thread restart the blocked
system call.

The result will be that both the CPU bound and the I/O bound
thread execute at half the speed of what could have been
achieved. A pure user mode thread implementation solving this
is only possible if the kernel offers some async I/O interface
which can be used instead of blocking calls. Then a signal
from the kernel will ensure the library can switch threads
once the I/O bound thread can unblock. This is similar to the
advantages of hardware sending interrupts.

But such a user mode thread implementation is going to be
complicated, and still doesn't take advantage of multi CPU
systems. So one might as well move the entire threading into
the kernel. The kernel already has most of the threading
infrastructure, so it doesn't get significantly more
complicated.

>
> > However, if a program uses a new kind of device, executes some ioctl's that
> > the threading lib authors were not aware of, or that did not even exist at
> > the time, then the threading lib will not have a wrapper for that.
>
> Again, why would it not work as the signal would interrupt this
> unexpected blocking operation?

As I explained above, a signal interrupting a blocking
system call does not give you good threading performance.
Besides, not all blocking calls are interruptible.

Kasper Dupont

Nov 20, 2005, 6:19:13 PM

phil-new...@ipal.net wrote:
>
> On Sat, 19 Nov 2005 20:29:37 +0100 Enrique Perez-Terron <en...@online.no> wrote:
>
> | A problem with user-space threads is if one thread does a blocking read,
> | there is no way the thread library can switch to another thread without
> | aborting or completing that read. The threading library needs to replace
> | all blocking system calls with its own wrappers, which call non-blocking
> | equivalents, and take suitable control when that equivalent fails to
> | deliver immediately.
>
> This is just a deficiency in the design of the system.

Yes. And I'd say the deficiency in this case is the
attempt to do threading in user mode.

> A better design
> would allow a context state to recognize that a system call is in progress
> and any attempt to switch back to that context resumes the blocking that
> was in effect, without even so much as returning incomplete from that call
> in that context. Then allow signals to interrupt any blocking function,
> and the return from that signal, in that context, would have no impact on
> the call. A signal handler that decided "enough is enough" should also
> have the ability to reset a context so the blocked call does return in an
> incompleted state, or just abort the process without ever getting back to
> that context.

Isn't this just a complicated way to say that the
threading should be implemented by the kernel?

Nix

Nov 21, 2005, 6:48:32 AM

On 19 Nov 2005, phil-new...@ipal.net wondered:

> I'm also curious why it is that strace works fine on multiple process
> applications, but gets hung up on multiple thread applications.

Because you have an old strace, probably. There have been threading
fixes in strace right up until 4.5.4: I'm running 4.5.9 here, and
it works fine with multithreaded apps (on i686 and sparc64 both).

--
`Y'know, London's nice at this time of year. If you like your cities
freezing cold and full of surly gits.' --- David Damerell

Andersen

Nov 21, 2005, 4:29:28 PM

to Kasper Dupont

Kasper Dupont wrote:

> systems. So one might as well move the entire threading into
> the kernel. The kernel already has most of the threading
> infrastructure, so it doesn't get significantly more
> complicated.

My whole question about threads is because I want to understand the
performance penalties associated with kernel threading.

Could you please explain the steps taken by the kernel to make the
context switch, in particular from a performance penalty point of view.
Why is this so slow (having a few thousand threads can be slow)?

Michel Talon

Nov 21, 2005, 5:23:46 PM

The difference between the scenarios is not as big as you may think.
Here I am showing results obtained on a FreeBSD machine, because it
ships with several threading libraries as standard:
-one pure userland threading library, libc_r
-one N:M library, libpthread, using the concept of KSE
-one 1:1 kernel-managed library similar to Linux, called libthr
This is on a single-processor 1 GHz Athlon.
asmodee% time ./aqueue_c_r -n 1000000
pusher started
poper started
./aqueue_c_r -n 1000000 2,08s user 0,00s system 98% cpu 2,102 total
asmodee% time ./aqueue_kse -n 1000000
pusher started
poper started
./aqueue_kse -n 1000000 1,95s user 0,01s system 96% cpu 2,027 total
asmodee% (export LIBPTHREAD_SYSTEM_SCOPE=1;time aqueue_kse -n 1000000)
pusher started
poper started
aqueue_kse -n 1000000 1,86s user 1,93s system 99% cpu 3,828 total
asmodee% time ./aqueue_thr -n 1000000
pusher started
poper started
./aqueue_thr -n 1000000 0,54s user 0,00s system 90% cpu 0,598 total

The benchmarking program is here:
http://www.lpthe.jussieu.fr/~talon/aqueue.c
It appeared on a FreeBSD mailing list. You can play with it on Linux.
It has been compiled this way:
cc -O2 -static aqueue.c -D_THREAD_SAFE -lc_r -o aqueue_c_r
cc -O2 -static aqueue.c -D_THREAD_SAFE -lthr -o aqueue_thr
cc -O2 -static aqueue.c -D_THREAD_SAFE -lpthread -o aqueue_kse

You can see in this particular example that the simple 1:1 library performs
better, while the others are not much different from each other. It would be
interesting to run the same test on a dual-processor machine, for example.

--

Michel TALON

David Schwartz

Nov 22, 2005, 1:12:08 AM

"Andersen" <anders...@hotmail.com> wrote in message

news:437fa13f$0$30590$892e...@authen.yellow.readfreenews.net...

> Right. Though I am curious how that would be implemented on a IA32 arch
> (Interrupts?).

We're in user-space, not kernel space. You don't need interrupts.

> Again, why would it not work as the signal would interrupt this unexpected
> blocking operation?

Yes, but then what would happen? Suppose the function was supposed to
block until something happens. The code might fail if the function returns
when that thing hasn't happened yet.

DS

David Schwartz

Nov 22, 2005, 1:14:53 AM

"Andersen" <anders...@hotmail.com> wrote in message

news:43823C38...@hotmail.com...
> Kasper Dupont wrote:

>> systems. So one might as well move the entire threading into
>> the kernel. The kernel already has most of the threading
>> infrastructure, so it doesn't get significantly more
>> complicated.

> My whole question about threads is because I want to understand the
> performance penalties associated with kernel threading.

There aren't any.

> Could you please explain the steps taken by the kernel to make the context
> switch, in particular from a performance penalty point of view. Why is
> this so slow (having a few thousand threads can be slow)?

If you are doing so many context switches that their performance is
affecting your application, you have no idea how to write proper threaded
programs. Context switches should be rare, only needed when there is no work
to do, a thread uses up its whole timeslice, or you hit a condition that
isn't expected to impact performance.
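If you want to check how often your program is actually being
switched, something along these lines may help. It is only a
sketch: it assumes a system (Linux and the BSDs, for instance)
whose getrusage() fills in the ru_nvcsw and ru_nivcsw fields, and
the struct and function names are made up for the example.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Context switches charged to the calling process so far.
 * voluntary:   the process gave up the CPU itself (e.g. it blocked);
 * involuntary: the scheduler took the CPU away (e.g. timeslice used up). */
struct ctx_switches {
    long voluntary;
    long involuntary;
};

/* Fills *out from getrusage(); returns 0 on success, -1 on error. */
int get_ctx_switches(struct ctx_switches *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;
    out->voluntary = ru.ru_nvcsw;
    out->involuntary = ru.ru_nivcsw;
    return 0;
}
```

Sampling these counters before and after a piece of work, and
comparing the difference with the amount of work done, gives a
rough idea of whether context switches are frequent enough to
matter.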

DS

phil-new...@ipal.net

Nov 22, 2005, 10:24:28 AM

On Mon, 21 Nov 2005 11:48:32 +0000 Nix <nix-ra...@esperi.org.uk> wrote:

| On 19 Nov 2005, phil-new...@ipal.net wondered:
|> I'm also curious why it is that strace works fine on multiple process
|> applications, but gets hung up on multiple thread applications.
|
| Because you have an old strace, probably. There have been threading
| fixes in strace right up until 4.5.4: I'm running 4.5.9 here, and
| it works fine with multithreaded apps (on i686 and sparc64 both).

That strace has to be "fixed" to handle threads still indicates that
threads are something more than just processes with shared memory.

phil-new...@ipal.net

Nov 22, 2005, 10:23:07 AM

On Sun, 20 Nov 2005 03:05:34 -0000 Grant Edwards <gra...@visi.com> wrote:

|> Personally, I'd rather share controlled portions of the
|> address space between tasks, and have things like the stack be
|> unique.
|
| That would break a few (probably seldom-used) things, but it
| would be safer.

It would break sharing data obtained with alloca(), obviously. That should
be a no-no in threads (or even better, wrapped to do an abort).

But I was really saying I would rather start with processes and then set up
specific memory to be shared by some appropriate method.

phil-new...@ipal.net

Nov 22, 2005, 10:27:05 AM

Not exactly. It's something that would affect non-threaded processes as
well. But it would make things a bit cleaner overall, IMHO. I can't
say that it wouldn't break something that now depends on this behaviour
that I think should never have been there to begin with.

Kasper Dupont

Nov 23, 2005, 9:32:35 AM

phil-new...@ipal.net wrote:
>
> Personally, I'd rather share controlled portions of the address space
> between tasks, and have things like the stack be unique.

You'd lose much of the performance benefit you'd otherwise
have gotten from using threads. Unless everything is shared
you'd need to do a TLB flush when switching and consequently
take a performance hit.

And sometimes there are good reasons to share data located
on a stack. You haven't noticed how kernel code makes use of
that all the time?

Nix

Nov 30, 2005, 8:23:11 AM

On 22 Nov 2005, phil-new...@ipal.net stipulated:

> On Mon, 21 Nov 2005 11:48:32 +0000 Nix <nix-ra...@esperi.org.uk> wrote:
>
>| On 19 Nov 2005, phil-new...@ipal.net wondered:
>|> I'm also curious why it is that strace works fine on multiple process
>|> applications, but gets hung up on multiple thread applications.
>|
>| Because you have an old strace, probably. There have been threading
>| fixes in strace right up until 4.5.4: I'm running 4.5.9 here, and
>| it works fine with multithreaded apps (on i686 and sparc64 both).
>
> That strace has to be "fixed" to handle threads still indicates that
> threads are something more than just processes with shared memory.

Well, strace supports tracing multiple threads at once, or only one
thread, or all new threads... that all needs support and the rules for
all of that changed repeatedly while NPTL was congealing.