
Threads and Signals


Sarat Babu Kamisetty

Feb 1, 2002, 1:09:57 AM
Hi,

I have a very simple program (pasted below) where a main thread creates two
other threads and sleeps for a second. These two threads install a signal
handler for the same signal, SIGURG, and just block by calling select(). Now
the main thread sends a signal to each of those threads using thr_kill(). For
some reason, only one thread gets the signal, and then the program core dumps.
My code, a sample run, and a gdb analysis are below. It looks like the kernel
is accessing a NULL pointer when delivering the second signal. I think that
after delivering the first signal the kernel resets the signal handler pointer
to NULL, so delivering the second signal crashes. I was wondering if anyone
has an explanation for this behavior? Both of the following approaches avoid
the crash:

1) Changing default_signal_handler() to call signal() again to re-install the
signal handler, AND putting a sleep(1) between the two calls to thr_kill() in
main().
2) Having each thread install a signal handler for a DIFFERENT signal.

Coming to the actual problem: I want to create multiple threads, each of which
installs a signal handler for the SAME signal. All of them call select() at
one point or another, either to wait for socket data or for signals from other
threads. When some active thread (say A) wants to signal the other threads
(B, C, D), it calls thr_kill() with each of their thread IDs in a loop.
According to my observations above, it looks like only one of B/C/D will get
the signal and the rest will stay blocked. How do I wake up all those threads?
The Unix interface seems strange: you can only install a signal handler for
the whole process for a given signal (not per thread), yet you can send a
signal to an individual thread ID. Can anyone help me out here? Since I have a
large number of threads I cannot go with approach 2). Approach 1) does not
seem acceptable because it would introduce many delays. I also have to use
select() (not something like condition variables or mutexes), since I have to
watch for socket data as well as signals.

Thanks,
Sarat

************* Code ***************
#include <thread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>

void default_signal_handler(int sig) {
    printf("tid = %d in signal handler\n", (int)thr_self());
}

void *func(void *arg) {
    printf("thread tid = %d created\n", (int)thr_self());
    signal(SIGURG, default_signal_handler);  /* handler installed with signal() */
    select(0, NULL, NULL, NULL, NULL);       /* block until interrupted */
    while (1);
    return NULL;
}

int main() {
    thread_t tid1, tid2;

    thr_create(NULL, 0, func, NULL, THR_DETACHED, &tid1);
    thr_create(NULL, 0, func, NULL, THR_DETACHED, &tid2);

    sleep(1);                                /* let both threads reach select() */

    thr_kill(tid1, SIGURG);
    thr_kill(tid2, SIGURG);

    while (1);
}

Sample Output:

>gcc -g thr.c -lpthread
>a.out
thread tid = 4 created
thread tid = 5 created
tid = 4 in signal handler
Segmentation fault (core dumped)

GDB Analysis:

>gdb a.out
(gdb) core core
Core was generated by `a.out'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1...done.
Reading symbols from /usr/lib/libthread.so.1...done.
#0 0x0 in ?? ()
(gdb) bt
#0 0x0 in ?? ()
#1 0xef656658 in __sighndlr ()
#2 <signal handler called>
#3 0xef6b743c in poll ()
#4 0x11ad8 in func ()


David Butenhof

Feb 1, 2002, 6:17:52 AM
Sarat Babu Kamisetty wrote:

> I have a very simple program (pasted below) where a main thread creates
> two other threads and sleeps for a second. These two threads install a
> signal handler for the same signal, SIGURG, and just block by calling
> select(). Now the main thread sends a signal to each of those threads
> using thr_kill(). For some reason, only one thread gets the signal, and
> then the program core dumps. My code, a sample run, and a gdb analysis
> are below. It looks like the kernel is accessing a NULL pointer when
> delivering the second signal. I think that after delivering the first
> signal the kernel resets the signal handler pointer to NULL, so delivering
> the second signal crashes. I was wondering if anyone has an explanation
> for this behavior? Both of the following approaches avoid the crash:
>
> 1) Changing default_signal_handler() to call signal() again to re-install
> the signal handler, AND putting a sleep(1) between the two calls to
> thr_kill() in main().

Right. Because handlers installed by signal() go away when the signal is
delivered. That also means your second thr_kill(), for the other thread,
won't find a handler (which is why you need the sleep() in addition to
reloading the handler). Use sigaction() instead, which won't reset the
signal action to SIG_DFL before delivering it (unless you tell it to,
which, presumably, you wouldn't. ;-) )
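
A minimal sketch of what that might look like (illustrative code, not from
the original program; error checking omitted):

#include <signal.h>

static void default_signal_handler(int sig)
{
    /* nothing to do; returning is enough to interrupt select() with EINTR */
    (void)sig;
}

static int install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = default_signal_handler;
    sigemptyset(&sa.sa_mask);   /* block no extra signals while the handler runs */
    sa.sa_flags = 0;            /* in particular, no SA_RESETHAND: the handler
                                   stays installed after each delivery */
    return sigaction(SIGURG, &sa, NULL);
}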

> 2) Having each thread install a signal handler for a DIFFERENT signal.

Yes, because a signal handler is a PROCESS resource. There's only one for
the address space, not one for each thread. When two threads install
separate handlers for the same signal, that's no different from a
traditional single threaded program calling signal() (or sigaction()) twice
in a row for the same signal number. "There can be only one."

Two threads CAN wait in sigwait() for the same signal, and then do whatever
they want (independently of each other) when they receive the signal. You
can thr_kill() or pthread_kill() whichever thread you want; but of course
they need to be waiting. You could also POLL for signals, by running with
the signal blocked, and calling sigtimedwait() when you want to check. Of
course that won't help if you want to block in select().
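
As a rough sketch of the sigwait() approach (hypothetical names; in a real
program the signal would normally be blocked in main() before the threads
are created):

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static void *waiter(void *arg)
{
    sigset_t set;
    int sig;

    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    pthread_sigmask(SIG_BLOCK, &set, NULL);   /* must be blocked to sigwait() on it */

    for (;;) {
        if (sigwait(&set, &sig) == 0)
            printf("got SIGURG (%d)\n", sig);
        /* ...do whatever the wakeup means for this thread... */
    }
    return NULL;
}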

Either work out a solution that uses a single process-wide signal handler,
or work out something entirely different to wake your select(), such as
adding a special fd that's not normally used for anything... and simply
write to it when you want your select() to awaken.
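
A rough sketch of that extra-fd idea (the "self-pipe" pattern; pipe_fds,
wake_up, and wait_for_data are illustrative names, and pipe(pipe_fds) is
assumed to have been called once during setup):

#include <unistd.h>
#include <sys/select.h>

static int pipe_fds[2];   /* [0] read end watched by select(), [1] write end */

/* Called by the "active" thread instead of thr_kill()/pthread_kill(). */
static void wake_up(void)
{
    (void)write(pipe_fds[1], "x", 1);
}

/* Each waiting thread watches both its socket and the pipe's read end. */
static int wait_for_data(int sockfd)
{
    fd_set readfds;
    int maxfd = (sockfd > pipe_fds[0]) ? sockfd : pipe_fds[0];

    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds);
    FD_SET(pipe_fds[0], &readfds);

    /* returns when socket data arrives or wake_up() is called */
    return select(maxfd + 1, &readfds, NULL, NULL, NULL);
}

On return the waiter checks FD_ISSET(pipe_fds[0], &readfds) and drains the
pipe; unlike a signal, a byte sitting in the pipe cannot be "lost" if it
arrives before the waiter reaches select().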

For the purpose you describe, it wouldn't be hard to use a single handler.
In fact, there's no reason it needs to do anything at all except to return,
which would cause your select() calls to return with EINTR. You'd need to
be sure that each thread blocked that signal except around calls to
select(), though, or else you'd risk interrupting some other syscall.

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Alexander Terekhov

Feb 1, 2002, 9:05:30 AM
David Butenhof <David.B...@compaq.com> wrote in message news:<QWu68.315$am1....@news.cpqcorp.net>...

Sorry, I am surely missing something because I fail
to understand how to prevent/avoid a "race condition"
with respect to select and NOOP signal handler...

How can one ensure that, by the time a thread
processes this NOOP signal, it is indeed in the
state where "select shall return [EINTR]"?

What prevents it (i.e. the signal processing) from
happening too early or too late with respect to
select->EINTR, and therefore having no effect
whatsoever?

I also have a question about threads and *pselect*,
which has a "const sigset_t *restrict sigmask"
argument:

"31190 If sigmask is not a null pointer, then the pselect()
function shall replace the signal mask of the
31191 process by the set of signals pointed to by sigmask
^^^^^^^ ?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?
before examining the descriptors, and shall
31192 restore the signal mask of the process before
^^^^^^^ ?!?!?!?!?!?!
returning."

Thanks!

regards,
alexander.

David Butenhof

Feb 1, 2002, 10:21:39 AM
Alexander Terekhov wrote:

> David Butenhof <David.B...@compaq.com> wrote in message

>>

>> For the purpose you describe, it wouldn't be hard to use a single
>> handler. In fact, there's no reason it needs to do anything at all except
>> to return, which would cause your select() calls to return with EINTR.
>> You'd need to be sure that each thread blocked that signal except around
>> calls to select(), though, or else you'd risk interrupting some other
>> syscall.
>
> Sorry, I am surely missing something because I fail
> to understand how to prevent/avoid a "race condition"
> with respect to select and NOOP signal handler...
>
> How can one ensure that, by the time a thread
> processes this NOOP signal, it is indeed in the
> state where "select shall return [EINTR]"?
>
> What prevents it (i.e. the signal processing) from
> happening too early or too late with respect to
> select->EINTR, and therefore having no effect
> whatsoever?

And your point is? I never said trying to do this with signals was a good
idea. I merely gave you some pointers if you really want to try. Telling
people not to try to do what they want to do usually doesn't work very
well; though helping them to realize the complications sometimes will.

Your sample code, of course, has the same race, even if it worked as you'd
like. A signal could arrive between your signal() call and the select()
call, you know.

Perhaps you should explain a little more about what you intend to
accomplish? Are you expecting that your thr_kill() will somehow kill the
thread rather than merely breaking out of a select() with EINTR? (It won't;
nor is there anything more you can do in a signal handler, since functions
like thr_exit/pthread_exit aren't allowed at signal level.) The sample just
goes into a compute bound loop on return from select(), without even
testing the status, so I can't infer what you'd expect "real code" to do at
that point.

> I also have a question about threads and *pselect*
> which does have "const sigset_t *restrict sigmask"
> argument:
>
> "31190 If sigmask is not a null pointer, then the pselect()
> function shall replace the signal mask of the
> 31191 process by the set of signals pointed to by sigmask
> ^^^^^^^ ?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?
> before examining the descriptors, and shall
> 31192 restore the signal mask of the process before
> ^^^^^^^ ?!?!?!?!?!?!
> returning."

Sounds like pselect() would solve your race by allowing you to enable
SIGURG for the thread only within the pselect() call.
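
A sketch of how that might be used (hypothetical code; the noop handler and
wait_interruptibly are my names, and error checking is omitted):

#include <pthread.h>
#include <signal.h>
#include <sys/select.h>

static void noop_handler(int sig) { (void)sig; }

static int wait_interruptibly(int sockfd)
{
    struct sigaction sa;
    sigset_t blocked, during_pselect;
    fd_set readfds;

    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGURG, &sa, NULL);

    /* keep SIGURG blocked in this thread except inside pselect() itself */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGURG);
    pthread_sigmask(SIG_BLOCK, &blocked, &during_pselect);
    sigdelset(&during_pselect, SIGURG);

    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds);

    /* a SIGURG that arrived while it was blocked is delivered here,
       atomically, so the EINTR wakeup cannot be missed */
    return pselect(sockfd + 1, &readfds, NULL, NULL, NULL, &during_pselect);
}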

The fact that the man page says "process" is certain to be an error in the
man page. On any OS with anything approaching real thread support
(including Solaris) it's virtually impossible to affect other threads'
state as the text implies. There is no such thing as a process signal mask.

Of course, pselect(), while interesting, is a completely nonstandard and
nonportable function. If that's important to you. (Though the UI thread
interface you're using isn't very portable either.)

Alexander Terekhov

Feb 1, 2002, 2:05:46 PM
David Butenhof <David.B...@compaq.com> wrote in message news:<pvy68.331$am1....@news.cpqcorp.net>...
> Alexander Terekhov wrote:
^^^^^^^^^^^^^^^^^^
(OP in this thread, who is using UI thread interfaces, etc is
Sarat Babu Kamisetty; NOT me! ;-)

>
> > David Butenhof <David.B...@compaq.com> wrote in message
>
> >>
> >> For the purpose you describe, it wouldn't be hard to use a single
> >> handler. In fact, there's no reason it needs to do anything at all except
> >> to return, which would cause your select() calls to return with EINTR.
> >> You'd need to be sure that each thread blocked that signal except around
> >> calls to select(), though, or else you'd risk interrupting some other
> >> syscall.
> >
> > Sorry, I am surely missing something because I fail
> > to understand how to prevent/avoid a "race condition"
> > with respect to select and NOOP signal handler...
> >
> > How can one ensure that, by the time a thread
> > processes this NOOP signal, it is indeed in the
> > state where "select shall return [EINTR]"?
> >
> > What prevents it (i.e. the signal processing) from
> > happening too early or too late with respect to
> > select->EINTR, and therefore having no effect
> > whatsoever?
>
> And your point is? I never said trying to do this with signals was a good
> idea. I merely gave you some pointers if you really want to try. Telling
> people not to try to do what they want to do usually doesn't work very
> well; though helping them to realize the complications sometimes will.

*My* point is that perhaps *Sarat Babu Kamisetty* should take a look
at thread *cancellation* or an extra/special fd, already suggested by
you!

The quotes were taken from my copy of the Final
Draft 7 (IEEE Std 1003.1-2001)! Also, how about
this:

http://www.opengroup.org/onlinepubs/007904975/functions/pselect.html

> On any OS with anything approaching real thread support
> (including Solaris) it's virtually impossible to affect other threads'
> state as the text implies. There is no such thing as a process signal mask.
>
> Of course, pselect(), while interesting, is a completely nonstandard and
> nonportable function. If that's important to you. (Though the UI thread
> interface you're using isn't very portable either.)

"nonstandard and nonportable" given the link above? Or you
just mean that there are not so many SUS/Version *3* systems
currently available out there? ;-)

regards,
alexander.

David Butenhof

Feb 4, 2002, 8:55:34 AM
Alexander Terekhov wrote:

> David Butenhof <David.B...@compaq.com> wrote in message
> news:<pvy68.331$am1....@news.cpqcorp.net>...
>> Alexander Terekhov wrote:
> ^^^^^^^^^^^^^^^^^^
> (OP in this thread, who is using UI thread interfaces, etc is
> Sarat Babu Kamisetty; NOT me! ;-)

Correction noted, Alexander. Sometimes this business of quoting multiple
levels of reply can get confusing. It's too easy to start replying to
various paragraphs without noting from which level of quote it originated.

So, sorry. ;-)

>> > David Butenhof <David.B...@compaq.com> wrote in message
>>
>> >> For the purpose you describe, it wouldn't be hard to use a single
>> >> handler. In fact, there's no reason it needs to do anything at all
>> >> except to return, which would cause your select() calls to return with
>> >> EINTR. You'd need to be sure that each thread blocked that signal
>> >> except around calls to select(), though, or else you'd risk
>> >> interrupting some other syscall.
>> >
>> > Sorry, I am surely missing something because I fail
>> > to understand how to prevent/avoid a "race condition"
>> > with respect to select and NOOP signal handler...
>> >
>> > How can one ensure that, by the time a thread
>> > processes this NOOP signal, it is indeed in the
>> > state where "select shall return [EINTR]"?
>> >
>> > What prevents it (i.e. the signal processing) from
>> > happening too early or too late with respect to
>> > select->EINTR, and therefore having no effect
>> > whatsoever?
>>
>> And your point is? I never said trying to do this with signals was a good
>> idea. I merely gave you some pointers if you really want to try. Telling
>> people not to try to do what they want to do usually doesn't work very
>> well; though helping them to realize the complications sometimes will.
>
> *My* point is that perhaps *Sarat Babu Kamisetty* should take a look
> at thread *cancellation* or an extra/special fd, already suggested by
> you!

I decided not to diverge into cancellation, but, yeah, you're right;
cancellation MAY be a viable alternative, depending on the actual intent.
Since you didn't actually MENTION the word "cancel" anywhere in the reply I
quoted, I'm not quite sure how anyone was to have inferred that noble
intent, though.
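
For what it's worth, a rough sketch of the cancellation approach
(hypothetical code; select() is a cancellation point, and note that
cancellation terminates the target thread rather than merely waking it):

#include <pthread.h>
#include <sys/select.h>

static void cleanup(void *arg)
{
    /* runs if the thread is canceled while blocked in select() */
    (void)arg;
}

static void *worker(void *arg)
{
    int sockfd = *(int *)arg;
    fd_set readfds;

    pthread_cleanup_push(cleanup, NULL);
    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(sockfd, &readfds);
        /* select() is a cancellation point: a pthread_cancel() aimed at
           this thread wakes it here and runs the cleanup handler above */
        select(sockfd + 1, &readfds, NULL, NULL, NULL);
        /* ...handle socket data... */
    }
    pthread_cleanup_pop(0);
    return NULL;
}

/* elsewhere: pthread_cancel(worker_tid); pthread_join(worker_tid, NULL); */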

(And, again, I slid into replying to you as if you had posted the original
question, and I apologize for the lapse. ;-) )

Irrelevant note: as with POSIX, be careful about the subtle but critical
distinctions between "final draft" and "standard".

However, in this case, two things are evident:

1) This newfangled pselect() went into the specification entirely without
my notice. Interesting. The manpage format of the specification makes it
easier to overlook, as it's combined with select(). In reviewing such a
massive stack of documentation it's easy to miss something new combined
with something tediously familiar.

2) You've uncovered a bug in the specification that should be repaired in
the corrigenda. There is no such thing as a "process signal mask", and even
if it existed pselect() would have no business altering it. It needs to be
fixed to specify the THREAD signal mask.

>> On any OS with anything approaching real thread support
>> (including Solaris) it's virtually impossible to affect other threads'
>> state as the text implies. There is no such thing as a process signal
>> mask.
>>
>> Of course, pselect(), while interesting, is a completely nonstandard and
>> nonportable function. If that's important to you. (Though the UI thread
>> interface you're using isn't very portable either.)
>
> "nonstandard and nonportable" given the link above? Or you
> just mean that there are not so many SUS/Version *3* systems
> currently available out there? ;-)

OK, so you've got me there. It's "standard". Fine.

However; even though you're reading from the document designated as UNIX
2001, POSIX 1003.1-2001, and soon ISO/IEC 9945-1:2001, which is about as
"standard" as you can get, don't let that lead you to make any unfounded
assumptions about portability. The new functions in this standard aren't
likely to be widely available for quite some time.

Alexander Terekhov

Feb 5, 2002, 4:56:01 AM
David Butenhof <David.B...@compaq.com> wrote in message news:<Uww78.387$am1....@news.cpqcorp.net>...
[...]

> > *My* point is that perhaps *Sarat Babu Kamisetty* should take a look
> > at thread *cancellation* or an extra/special fd, already suggested by
> > you!
>
> I decided not to diverge into cancellation, but, yeah, you're right;
> cancellation MAY be a viable alternative, depending on the actual intent.
> Since you didn't actually MENTION the word "cancel" anywhere in the reply I
> quoted, I'm not quite sure how anyone was to have inferred that noble
> intent, though.

Yeah, I should have been more specific about
cancellation, sorry. That is partly because I ran
into this race condition / interrupting-select()
issue in the context of a thread cancellation
discussion[1] not long ago, and simply forgot
(didn't realize) that it might not be clear to others.

[...]


> However, in this case, two things are evident:
>
> 1) This newfangled pselect() went into the specification entirely without
> my notice. Interesting. The manpage format of the specification makes it
> easier to overlook, as it's combined with select(). In reviewing such a
> massive stack of documentation it's easy to miss something new combined
> with something tediously familiar.
>
> 2) You've uncovered a bug in the specification that should be repaired in
> the corrigenda. There is no such thing as a "process signal mask", and even
> if it existed pselect() would have no business altering it. It needs to be
> fixed to specify the THREAD signal mask.

I've just sent you the list referring to more "process
signal mask"/"signal mask of the process"/etc appearances
in the POSIX.1-2001/SUSv3 standard PDFs[2] as "Reply to
Sender Only".

BTW, it would be nice to "commit" the following
pthread_once-"in"/pthread_cond_signal-
pthread_cond_broadcast-"out" correction as well:

http://groups.google.com/groups?as_umsgid=c29b5e33.0202010305.5c12381d%40posting.google.com
(unless, of course, I've got something wrong here ;-)

regards,
alexander.

[1] http://groups.google.com/groups?as_umsgid=3C2CC94A.503FB8E2%40web.de

[2] http://www.opengroup.org/publications/mem-online/c950/c950.pdf
http://www.opengroup.org/publications/mem-online/c951/c951.pdf
http://www.opengroup.org/publications/mem-online/c952/c952.pdf
http://www.opengroup.org/publications/mem-online/c953/c953.pdf
http://www.opengroup.org/publications/mem-online/c610/c610.pdf

Registration and free membership to get access:

http://www.opengroup.org/austin

David Butenhof

Feb 6, 2002, 7:56:00 AM
Alexander Terekhov wrote:

> I've just sent you the list referring to more "process
> signal mask"/"signal mask of the process"/etc appearances
> in the POSIX.1-2001/SUSv3 standard PDFs[2] as "Reply to
> Sender Only".

That's nice, but I have no control over the specification, really. I'm just
one of the people who talks about it a lot. ;-)

Of course, I can enter an Aardvark (formal SUS problem report) pointing out
the problem. But, by the same token, so can you. I would encourage anyone
who notices a problem like this to report it. Don't worry too much about
whether you're "absolutely provably right". Others, often including me,
will get involved in discussing the problem reports if there's any question
at all, and a decision will be made.

> BTW, it would be nice to "commit" the following
> pthread_once-"in"/pthread_cond_signal-
> pthread_cond_broadcast-"out" correction as well:

Yes, it's quite true that applications can have no useful dependencies on
signal or broadcast for data coherency. The defined application data
involved is the PREDICATE, and other shared data protected by the
associated mutex. The condition signal/broadcast really affects only the
condition variable (and its queue of waiting threads), which is internal
data. In a way, the inclusion of those function is more a matter of
pragmatics; the thread library WILL require data coherency support within
those functions, or they cannot operate correctly. Therefore, the question
isn't really whether an application requires knowledge of that coherency;
but really whether the knowledge should be hidden from it. We listed them
because we figured the application developer might as well know the simple
and inevitable truth. ;-)
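
In code terms, the portable pattern relies only on the mutex and the
predicate for visibility; a minimal sketch (illustrative names, error
checking omitted):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int data_ready = 0;           /* the predicate */
static int shared_data;              /* protected by the same mutex */

void producer(int value)
{
    pthread_mutex_lock(&lock);
    shared_data = value;             /* visibility comes from the mutex... */
    data_ready = 1;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cond);      /* ...not from the signal itself */
}

int consumer(void)
{
    int value;

    pthread_mutex_lock(&lock);
    while (!data_ready)              /* always re-test the predicate */
        pthread_cond_wait(&cond, &lock);
    value = shared_data;
    pthread_mutex_unlock(&lock);
    return value;
}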

However, pthread_once() should certainly join them. It IS, and must be,
sufficient merely to call pthread_once() for a given init routine in each
thread, without requiring the application to resort to additional
synchronization. Or else what would be the point of the silly function? I
find it hard to get too excited about this, though. I had trouble agreeing
with the requirement in draft 5 for static mutex initialization; until
someone pointed out that it made pthread_once() irrelevant. I liked that;
and ever since I've been telling people to avoid pthread_once() unless they
really want to use it and it does EXACTLY what they want. (Which I think is
rarely the case.)
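
That is, a statically initialized mutex gives the same once-only effect
without pthread_once() at all; a minimal sketch (illustrative names):

#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized = 0;

void ensure_initialized(void)
{
    pthread_mutex_lock(&init_lock);
    if (!initialized) {
        /* ...one-time initialization goes here... */
        initialized = 1;
    }
    pthread_mutex_unlock(&init_lock);
}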

So, if I don't see anyone else point out the "process signal mask" bug
soon, I'll write it up. I see no point in removing signal and broadcast
from the memory list, and I'm not going to champion the cause. I'm not at
all sure I'd support it, because I could imagine arcane applications that
might be broken by a change -- if I actually thought any implementation
could take advantage of the change by making the operations FAIL to
synchronize memory. As for pthread_once()? Yeah, it ought to be added. If I
do click over to the problem report page, I'd probably write that up, too.

Alexander Terekhov

Feb 6, 2002, 11:31:58 AM

David Butenhof wrote:
[...]

> Yes, it's quite true that applications can have no useful dependencies on
> signal or broadcast for data coherency. The defined application data
> involved is the PREDICATE, and other shared data protected by the
> associated mutex. The condition signal/broadcast really affects only the
> condition variable (and its queue of waiting threads), which is internal
> data. In a way, the inclusion of those functions is more a matter of
> pragmatics; the thread library WILL require data coherency support within
> those functions, or they cannot operate correctly. Therefore, the question
> isn't really whether an application requires knowledge of that coherency;
> but really whether the knowledge should be hidden from it. We listed them
> because we figured the application developer might as well know the simple
> and inevitable truth. ;-)

Well, I was thinking (rather *uneducated* guess) of
something along the lines of IA32's "MTRRs"[1] -- so
that some of the condition's internal structure/data
(e.g. queue) could be allocated in a more restricted/
*in-order* memory, which (while providing *internal*
data coherency/visibility) would have no effect
whatsoever on the rest of application (regular)
data/memory-visibility...

Or am I way off the mark with such ideas?

regards,
alexander.

[1] "The IA-32 Intel Architecture Developer s Manual
Volume 3: System Programming Guide" says:

"The MTRRs were introduced in the P6 family processors to
define the cache characteristics for specified areas of
physical memory. The following are two examples of how
memory types set up with MTRRs can be used to strengthen
or weaken memory ordering for the Pentium 4, Intel
Xeon, and P6 family processors:

- The uncached (UC) memory type forces a strong-ordering
model on memory accesses. Here, all reads and writes to
the UC memory region appear on the bus and out-of-order
or speculative accesses are not performed. This memory
type can be applied to an address range dedicated to
memory mapped I/O devices to force strong memory ordering.

- For areas of memory where weak ordering is acceptable,
the write back (WB) memory type can be chosen. Here,
reads can be performed speculatively and writes can be
buffered and combined. For this type of memory, cache
locking is performed on atomic (locked) operations that
do not split across cache lines, which helps to reduce
the performance penalty associated with the use of the
typical synchronization instructions, such as XCHG,
that lock the bus during the entire read-modify-write
operation. With the WB memory type, the XCHG instruction
locks the cache instead of the bus if the memory access
is contained within a cache line."

David Butenhof

Feb 7, 2002, 6:32:05 AM
Alexander Terekhov wrote:

>
> David Butenhof wrote:
> [...]
>> Yes, it's quite true that applications can have no useful dependencies on
>> signal or broadcast for data coherency. The defined application data
>> involved is the PREDICATE, and other shared data protected by the
>> associated mutex. The condition signal/broadcast really affects only the
>> condition variable (and its queue of waiting threads), which is internal
>> data. In a way, the inclusion of those functions is more a matter of
>> pragmatics; the thread library WILL require data coherency support within
>> those functions, or they cannot operate correctly. Therefore, the
>> question isn't really whether an application requires knowledge of that
>> coherency; but really whether the knowledge should be hidden from it. We
>> listed them because we figured the application developer might as well
>> know the simple and inevitable truth. ;-)
>
> Well, I was thinking (rather *uneducated* guess) of
> something along the lines of IA32's "MTRRs"[1] -- so
> that some of the condition's internal structure/data
> (e.g. queue) could be allocated in a more restricted/
> *in-order* memory, which (while providing *internal*
> data coherency/visibility) would have no effect
> whatsoever on the rest of application (regular)
> data/memory-visibility...
>
> Or am I way off the mark with such ideas?

Interesting question. (Just to avoid misunderstanding: what follows isn't
intended to be a definitive architectural answer, but rather some thought
and discussion of the ideas.)

In theory, it'd be possible to exploit quirks like that. Perhaps even for
queued wakeup -- though there's generally more communication required than
simply queue insert/remove. (You would, after all, have to SCHEDULE the
dequeued waiter.)

First, keep in mind that IA32 is a strongly ordered architecture. Yes,
there's speculative memory operations "behind the scenes" to try to improve
throughput... but while the operations can "float" away from the
instruction, they still complete in order; so it's transparent to the
application code.

The "strong order" mode prohibits speculation and caching, both of which
will slow down code execution that depends on such regions of memory. As
the quote says, this can be great for memory-mapped I/O devices, where
speculative memory access can trigger nasty behavior, and caching probably
isn't useful even when it's not wrong. The other mode allows out of order
writes, something that IA32 doesn't usually allow and that the instruction
set really can't support except when order really doesn't matter.
Operations within such a region might be "out of order" with respect to
other data, but that doesn't affect the operations in "normal memory";
they'll always still be consistent with each other.

If there were some such capability for a machine that's normally out of
order, like Alpha or IA64, one might consider using a strongly ordered
memory range for wakeup queues. However, that's meaningful, as I said up
front, only if no other communication is required. That's true in many
client/server queue transactions, but probably NOT for scheduling and
blocking threads. Besides, it's not clear from the quote at what
granularity these options work. At the level of a cache line? Maybe that
wouldn't be too bad. At the level of a page? Either you add a new set of
malloc-like operations to manage those "special pages", or you waste a lot
of space for each queue header.

In theory, there's no difference between theory and practice. In practice,
there's no similarity...

Alexander Terekhov

Feb 7, 2002, 8:30:00 AM

David Butenhof wrote:
[...]

> First, keep in mind that IA32 is a strongly ordered architecture. Yes,
> there's speculative memory operations "behind the scenes" to try to improve
> throughput... but while the operations can "float" away from the
> instruction, they still complete in order; so it's transparent to the
> application code.
>
> The "strong order" mode prohibits speculation and caching, both of which
> will slow down code execution that depends on such regions of memory. As
> the quote says, this can be great for memory-mapped I/O devices, where
> speculative memory access can trigger nasty behavior, and caching probably
> isn't useful even when it's not wrong. The other mode allows out of order
> writes, something that IA32 doesn't usually allow and that the instruction
> set really can't support except when order really doesn't matter.
> Operations within such a region might be "out of order" with respect to
> other data, but that doesn't affect the operations in "normal memory";
> they'll always still be consistent with each other.

Here are a few more perhaps relevant quotes:

"7.2. MEMORY ORDERING

The term memory ordering refers to the order in which
the processor issues reads (loads) and writes (stores)
through the system bus to system memory. The IA-32
architecture supports several memory ordering models
depending on the implementation of the architecture. For
example, the Intel386 processor enforces program ordering
(generally referred to as strong ordering), where reads
and writes are issued on the system bus in the order they
occur in the instruction stream under all circumstances.

To allow optimizing of instruction execution, the IA-32
architecture allows departures from strong-ordering model
called processor ordering in Pentium 4, Intel Xeon, and P6
family processors. These processor-ordering variations
allow performance enhancing operations such as allowing
reads to go ahead of buffered writes. The goal of any of
these variations is to increase instruction execution
speeds, while maintaining memory coherency, even in
multiple-processor systems.
...
Software intended to operate correctly in processor-ordered
processors (such as the Pentium 4, Intel Xeon, and P6 family
processors) should not depend on the relatively strong ordering
of the Pentium or Intel486 processors. Instead, it should insure
that accesses to shared variables that are intended to control
concurrent execution among processors are explicitly required
to obey program ordering through the use of appropriate locking
or serializing operations (see Section 7.2.4., "Strengthening
or Weakening the Memory Ordering Model").
...
The processor-ordering model described in this section is
virtually identical to that used by the Pentium and Intel486
processors. The only enhancements in the Pentium 4, Intel
Xeon, and P6 family processors are:

- Added support for speculative reads.

- Store-buffer forwarding, when a read passes a write
to the same memory location.

- Out of order store from long string store and string move
operations (see Section 7.2.3., "Out-of-Order Stores For
String Operations in Pentium 4, Intel Xeon, and P6 Family
Processors", below).
...
The SFENCE, LFENCE, and MFENCE instructions provide a
performance-efficient way of insuring load and store memory
ordering between routines that produce weakly-ordered results
and routines that consume that data. The functions of these
instructions are as follows:

- SFENCE-Serializes all store (write) operations that
occurred prior to the SFENCE instruction in the program
instruction stream, but does not affect load operations.

- LFENCE-Serializes all load (read) operations that occurred
prior to the LFENCE instruction in the program instruction
stream, but does not affect store operations.

- MFENCE-Serializes all store and load operations that
occurred prior to the MFENCE instruction in the program
instruction stream."

And finally:

http://groups.google.com/groups?as_umsgid=3C3C9B63.DFDC9920%40web.de
(I just wonder whether Microsoft engineers noticed this bit,
and what Microsoft management thinks, given the rather high (IMHO)
level of "need-for-sync ignorance" in the Microsoft/win32 developer
camp ;-))

"Despite the fact that Pentium 4, Intel Xeon, and P6 family
processors support processor ordering, Intel does not guarantee
that future processors will support this model. To make software
portable to future processors, it is recommended that operating
systems provide critical region and resource control constructs
and API's (application program interfaces) based on I/O, locking,
and/or serializing instructions be used to synchronize access to
shared areas of memory in multiple-processor systems."



> If there were some such capability for a machine that's normally out of
> order, like Alpha or IA64, one might consider using a strongly ordered
> memory range for wakeup queues. However, that's meaningful, as I said up
> front, only if no other communication is required. That's true in many
> client/server queue transactions, but probably NOT for scheduling and
> blocking threads. Besides, it's not clear from the quote at what
> granularity these options work. At the level of a cache line? Maybe that
> wouldn't be too bad. At the level of a page? Either you add a new set of
> malloc-like operations to manage those "special pages", or you waste a lot
> of space for each queue header.

The granularity of these options is at the level of a page
or even multiple pages, depending on the range -- "100000H:
8 variable ranges (from 4 KBytes to maximum size of physical
memory)".

"The MTRR mechanism allows up to 96 memory ranges to be
defined in physical memory, and it defines a set of model-
specific registers (MSRs) for specifying the type of memory
that is contained in each range. "

"Figure 9-3. Mapping Physical Memory With MTRRs:"
(non-graphical representation):

"...
C0000H: 256 KBytes; 64 fixed ranges ( 4 KBytes each)
80000H: 256 KBytes; 16 fixed ranges (16 KBytes each)
00000H: 512 KBytes; 8 fixed ranges (64-KBytes each)"

> In theory, there's no difference between theory and practice. In practice,
> there's no similarity...

Does this also apply to a truly-weakly-memory-ordered IA32
multiprocessor? ;-)

regards,
alexander.
