Solaris posix threads and ucontext

Panagiotis E. Hadjidoukas

Jun 30, 2003, 7:25:32 PM

Hello all,

I have a rather advanced question that refers
to Solaris 8. I have noticed that if a machine context
(ucontext) is saved by posix thread A and then restored
by posix thread B then pthread_self on thread B returns
the identifier of A! Both Posix Threads are system scope.
Does anybody know if this happens because the self-identification
mechanism is based on something (probably register) that is
altered by the ucontext operations (getcontext, swapcontext)?

On other operating systems (Linux, IRIX, AIX, OSF1) it works
fine! I haven't tested on Solaris 9.

Panagiotis E. Hadjidoukas

PS. By the way, _lwp_self() works correctly, i.e. returns
different identifiers which is what we should also expect
from pthread_self().

Casper H.S. Dik

Jul 1, 2003, 12:34:36 PM

"Panagiotis E. Hadjidoukas" <p...@hpclab.ceid.upatras.gr> writes:

>I have a rather advanced question that refers
>to Solaris 8. I have noticed that if a machine context
>(ucontext) is saved by posix thread A and then restored
>by posix thread B then pthread_self on thread B returns
>the identifier of A! Both Posix Threads are system scope.
>Does anybody know if this happens because the self-identification
>mechanism is based on something (probably register) that is
>altered by the ucontext operations (getcontext, swapcontext)?

And your program continues to run? You've just jumped to the
stack of another thread; where is that other thread executing now?

>On other operating systems (Linux, IRIX, AIX, OSF1) it works
>fine! I haven't tested on Solaris 9.

On Solaris, the library data structure describing the thread
is part of the thread's stack and is referenced through a global
register; by restoring the register state of another thread in
your thread, the thread will find the wrong library thread structure
at the end of its global state pointer.

>Panagiotis E. Hadjidoukas

>PS. By the way, _lwp_self() works correctly, i.e. returns
>different identifiers which is what we should also expect
>from pthread_self().

I'm not sure what you're trying to achieve; but what you are doing
looks way beyond supported usage of setcontext/getcontext as you
appear to make multiple threads share the same stack.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Panagiotis E. Hadjidoukas

Jul 1, 2003, 2:34:29 PM

"Casper H.S. Dik" <Caspe...@Sun.COM> wrote in message
news:3f017fdc$0$49117$e4fe...@news.xs4all.nl...

> "Panagiotis E. Hadjidoukas" <p...@hpclab.ceid.upatras.gr> writes:
>
>
> And your program continues to run? You've just jumped to the
> stack of another thread; where is that other thread executing now?
>

First of all, I would like to thank you for your response.
My program continues to run because the other thread
sleeps or executes on a different stack. With makecontext etc.
you define a complete execution environment (registers, stack(!)).
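
To illustrate that point, here is a minimal single-thread sketch of a context
that runs on its own private stack. It is not the code of the library under
discussion; the names task, run_demo and STACK_SIZE, and the stack size
itself, are assumptions made for this example.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed size, chosen for illustration */

static ucontext_t main_ctx, task_ctx;
static int steps;                /* counts how often the task body advanced */

/* Body of the user-level thread; it runs entirely on its private stack. */
static void task(void)
{
    steps++;
    swapcontext(&task_ctx, &main_ctx);   /* yield back to the caller */
    steps++;
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Returns the number of steps the task executed, or -1 on allocation failure. */
int run_demo(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    steps = 0;
    getcontext(&task_ctx);                 /* start from a valid context */
    task_ctx.uc_stack.ss_sp = stack;       /* private stack for the task */
    task_ctx.uc_stack.ss_size = STACK_SIZE;
    task_ctx.uc_link = &main_ctx;          /* where to go when task() returns */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);     /* first run, up to the yield */
    swapcontext(&main_ctx, &task_ctx);     /* resume until task() returns */

    free(stack);
    return steps;
}
```

Calling run_demo() from main() should return 2: one step before the yield and
one after the resume. Everything here stays inside one OS thread, which is the
case the replies in this thread treat as safe.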

> On Solaris, the library data structure describing the thread
> is part of the thread's stack and is referenced through a global
> register; by restoring the register state of another thread in
> your thread, the thread will find the wrong library thread structure
> at the end of its global state pointer.
>

Older versions of glibc (i386) on Linux used to have a similar mechanism
for self-identification (pthread_self) that was based on stack. This
was considered a bug and finally changed.

>
> I'm not sure what you're trying to achieve; but what you are doing
> looks way beyond supported usage of setcontext/getcontext as you
> appear to make multiple threads share the same stack.

I use contexts in order to implement my own two-level thread model
on top of POSIX Threads. I assure you that two threads never share the
same stack. My code runs correctly on other operating systems
(Linux, AIX, IRIX, Tru64, Windows...) and even on Solaris 8 when my
"virtual processors" have been created with _lwp_create().

Does anyone know if the Single Unix Specification prohibits mixing
Posix threads and ucontext operations?

Panagiotis E. Hadjidoukas

===================================================
Panagiotis E. Hadjidoukas
Ph.D Student
High Performance Information Systems Laboratory
Computer Enginnering & Informatics Department
University of Patras, Greece

http://www.hpclab.ceid.upatras.gr/pgroup/members/peh

David Butenhof

Jul 1, 2003, 7:03:55 PM

Panagiotis E. Hadjidoukas wrote:

> Does anyone know if the Single Unix Specification prohibits mixing
> Posix threads and ucontext operations?

It doesn't; but it doesn't precisely define what it means, either. You're
standing on shaky ground. In general, the best advice is to keep a set of
related ucontext coroutines within a single OS thread context. Just to
begin with, note that they share the same domain as Win32 "fibers" in
excluding some important aspects of "thread" context. For example,
thread-specific data is not included, nor a thread's errno. Many
implementations include additional thread context (such as security
persona information) that you cannot easily (much less portably) access or
switch yourself. You run many risks in attempting this.
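
A small sketch of the point about thread-specific data: a ucontext coroutine
gets its own registers and stack, but it still sees the thread-specific data
of whichever OS thread happens to run it. The names coroutine,
coroutine_shares_tsd and STACK_SIZE are invented for this example, and it only
covers the single-thread case.

```c
#include <pthread.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed size for illustration */

static pthread_key_t key;
static ucontext_t main_ctx, co_ctx;
static void *seen_in_coroutine;

/* Runs on its own stack, yet reads the hosting OS thread's TSD. */
static void coroutine(void)
{
    seen_in_coroutine = pthread_getspecific(key);
}

/* Returns 1 if the coroutine saw the same TSD value as its host thread. */
int coroutine_shares_tsd(void)
{
    static int marker;               /* its address serves as the TSD value */
    char *stack = malloc(STACK_SIZE);
    int same;

    if (stack == NULL)
        return -1;
    if (pthread_key_create(&key, NULL) != 0) {
        free(stack);
        return -1;
    }
    pthread_setspecific(key, &marker);

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;      /* come back here when coroutine() ends */
    makecontext(&co_ctx, coroutine, 0);
    swapcontext(&main_ctx, &co_ctx);

    same = (seen_in_coroutine == pthread_getspecific(key));
    pthread_key_delete(key);
    free(stack);
    return same;
}
```

The function should return 1. If the same context were later resumed on a
different OS thread, pthread_getspecific would return that other thread's
value, which is one reason to keep related coroutines inside a single thread.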

Furthermore, the ucontext operations, due to unresolvable typing issues, are
now being marked officially OBSOLETE, and may be removed from a future
version of the standard.

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Casper H.S. Dik

Jul 2, 2003, 10:45:28 AM

"Panagiotis E. Hadjidoukas" <p...@hpclab.ceid.upatras.gr> writes:

>Older versions of glibc (i386) on Linux used to have a similar mechanism
>for self-identification (pthread_self) that was based on stack. This
>was considered a bug and finally changed.

I don't think it's a bug; at most an implementation artefact; the
SPARC implementation uses one of the global registers to point to
the complete thread state; i.e., as far as the thread library
is concerned, setcontext() does change the executing thread.

>I use contexts in order to implement my own two-level thread model
>on top of POSIX Threads. I assure you that two threads never share the
>same stack. My code runs correctly on other operating systems
>(Linux, AIX, IRIX, Tru64, Windows...) and even on Solaris 8 when my
>"virtual processors" have been created with _lwp_create().

Is there any particular reason why you implement a second thread
layer? Is there some performance/resource improvement? Is that
the same on all systems?

I'd strongly suggest using worker threads and explicit task
descriptors, and avoiding setcontext.

Note that you can't really be sure that the other LWP is sleeping;
it may catch a signal or flush its register windows to your stack.

Panagiotis E. Hadjidoukas

Jul 2, 2003, 1:19:04 PM

"David Butenhof" <David.B...@hp.com> wrote in message
news:3f01...@usenet01.boi.hp.com...

>
> > Does anyone know if the Single Unix Specification prohibits mixing
> > Posix threads and ucontext operations?
>

>

> Furthermore, the ucontext operations, due to unresolvable typing issues,
> are now being marked officially OBSOLETE, and may be removed from a future
> version of the standard.
>

As far as I have read, they are being marked OBSOLETE just because
"makecontext() cannot be expressed correctly in ISO C". I believe that
there are cases where makecontext can be very useful, even if the function
that is passed to it does not return any value and does not take any
arguments. I think that such a function does not cause the unresolvable
typing issues of makecontext.

Moreover, if ucontext operations become OBSOLETE then what will happen to
sigaction? I remind you that if the SA_SIGINFO flag is set, then the third
argument of the signal handler function is a pointer to a ucontext_t
structure, which will be useless (correct me if I am wrong).

Panagiotis

Panagiotis E. Hadjidoukas

Jul 2, 2003, 1:55:30 PM

"Casper H.S. Dik" <Caspe...@Sun.COM> wrote in message

news:3f02b7c8$0$49106$e4fe...@news.xs4all.nl...

> "Panagiotis E. Hadjidoukas" <p...@hpclab.ceid.upatras.gr> writes:
>

>
> Is there any particular reason why you implement a second thread
> layer? Is there some performance/resource improvement? Is that
> the same on all systems?
>

Because of the advantages of a two-level thread model (user-level
threads on top of kernel-level threads, called virtual processors).
There are several implementations of two-level thread libraries,
which have been used by various projects, e.g. web browsers (Mozilla),
OpenMP implementations (Omni, NanosCompiler).

> I'd strongly suggest to use worker threads and explicit task
> descriptors, avoid setcontext.
>

I do not disagree with you. However, an application that exhibits
multilevel (nested) parallelism, e.g. a multithreaded implementation
of recursive fibonacci, requires a new context (thread) for every level
of parallelism. With task descriptors you cannot do that.
Posix threads are (usually) expensive to create (cpu and memory)
and you cannot control directly their execution.

> Note that you can't really be sure that the other LWP is sleeping;
> it may catch a signal or flush its register windows to your stack.
>

The other LWP has already switched to a different stack, executing
the scheduling loop of the library or another user-level thread.

Panagiotis

Casper H.S. Dik

Jul 2, 2003, 4:09:57 PM

"Panagiotis E. Hadjidoukas" <p...@hpclab.ceid.upatras.gr> writes:

>Because of the advantages of a two-level thread model (user-level
>threads on top of kernel-level threads, called virtual processors).
>There are several implementations of two-level thread libraries,
>which have been used by various projects, e.g. web browsers (Mozilla),
>OpenMP implementations (Omni, NanosCompiler).

Which advantages would these be again? A long time ago Sun
decided to implement a two-level threading model. While the
original argument might have been sound, given the hardware of that
era (we're talking, I think, about 10-15 years ago), many problems
were the result of that choice: the implementation becomes much
more complicated (bugs) and some things are just hell to get
right (Scheduling, priorities, signal handling). Then someone
reimplemented the thread library interface as a 1:1 library and
it was found that the only applications that ran slower were the
class of "broken applications" (which could be fixed).

The main reasons for secondary thread implementations, I believe,
are the fact that until recently, pthreads weren't available everywhere;
and even now, you might want to have an abstraction for whatever Win32
uses and whatever Unix uses.

>I do not disagree with you. However, an application that exhibits
>multilevel (nested) parallelism, e.g. a multithreaded implementation
>of recursive fibonacci, requires a new context (thread) for every level
>of parallelism. With task descriptors you cannot do that.

You'd create a new "work description" and push that on the
"big pile of work" for the worker threads to find.

>Posix threads are (usually) expensive to create (cpu and memory)
>and you cannot control directly their execution.

So you should not create/destroy them but rather employ a pool of worker
threads.
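
One possible reading of this advice, sketched for the recursive Fibonacci case
discussed in this thread: each task descriptor records its parent and how many
children are still outstanding, so no worker ever blocks in a join. This is
only an illustration under assumptions of my own; struct task, pool_fib,
NWORKERS and the helper names do not come from any poster's code, and the
whole computation is done under one lock because the per-task work is trivial.

```c
#include <pthread.h>
#include <stdlib.h>

#define NWORKERS 4               /* assumed pool size */

struct task {
    int n;                       /* compute fib(n) */
    int pending;                 /* children that have not finished yet */
    int sum;                     /* results delivered by children so far */
    struct task *parent;         /* NULL for the root task */
    struct task *next;           /* link in the pile of runnable work */
};

static struct task *pile;        /* LIFO "big pile of work" */
static int done, answer;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;

static void push(struct task *t) { t->next = pile; pile = t; }

static struct task *new_task(int n, struct task *parent)
{
    struct task *t = calloc(1, sizeof *t);
    if (t == NULL)
        abort();
    t->n = n;
    t->parent = parent;
    return t;
}

/* Deliver a finished value upward. Called with lock held. */
static void complete(struct task *t, int value)
{
    while (t != NULL) {
        struct task *p = t->parent;
        free(t);
        if (p == NULL) {                 /* the root has finished */
            answer = value;
            done = 1;
            pthread_cond_broadcast(&wake);
            return;
        }
        p->sum += value;
        if (--p->pending > 0)
            return;                      /* a sibling is still outstanding */
        value = p->sum;                  /* parent is complete: go one level up */
        t = p;
    }
}

static void *worker(void *unused)
{
    (void) unused;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (!done && pile == NULL)
            pthread_cond_wait(&wake, &lock);
        if (done)
            break;
        struct task *t = pile;
        pile = t->next;
        if (t->n < 2) {
            complete(t, t->n);           /* leaf: fib(0) = 0, fib(1) = 1 */
        } else {
            t->pending = 2;              /* parent waits as data, not as a thread */
            push(new_task(t->n - 1, t));
            push(new_task(t->n - 2, t));
            pthread_cond_broadcast(&wake);
        }
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int pool_fib(int n)
{
    pthread_t tid[NWORKERS];
    int i;

    done = 0;
    pile = NULL;
    push(new_task(n, NULL));
    for (i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return answer;
}
```

pool_fib(9) should return 34 with a fixed number of threads. With real work
per task, the lock would be released while a task body runs; that is left out
here to keep the sketch short.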

Casper

David Butenhof

Jul 2, 2003, 5:20:18 PM

Panagiotis E. Hadjidoukas wrote:

>> Furthermore, the ucontext operations, due to unresolvable typing issues,
>> are now being marked officially OBSOLETE, and may be removed from a
>> future version of the standard.
>
> As far as I have read, they are being marked OBSOLETE just because
> "makecontext() cannot be expressed correctly in ISO C".

Yes, that's the main reason. It doesn't really make any difference WHY it's
being done, though. And, for what it's worth, I had nothing to do with this
decision or even the discussion preceding the decision -- though I did
follow it.

> I believe that there are cases that makecontext can be very useful, even
> if the function that is passed to it does not return any value
> and does not take any arguments. I think that such a function does not
> cause the unresolvable typing issues of makecontext.

That may be, but there was no interest in defining a new interface that
wouldn't have the typing problems. I don't intend to comment on whether I
think this is right or wrong, necessary or not. I was merely reporting the
future direction of the standard as it relates to this topic.

> Moreover, if ucontext operations become OBSOLETE then what will happen to
> sigaction? I remind you that if the SA_SIGINFO flag is set, then the third
> argument of the signal handler function is a pointer to a ucontext_t
> structure, which will be useless (correct me if I am wrong).

It wouldn't be "useless", though you obviously couldn't pass it as an
argument to functions that don't exist. In POSIX, for example, the 3rd
argument to a signal action function is undefined by the standard; but is
nevertheless defined and even widely used on various implementations.

The standard doesn't actually allow the use of setcontext() inside a signal
handler anyway, since setcontext isn't async-signal safe. Thus, (and
despite the implications of the setcontext man page) it's not at all clear
(strictly in terms of the portable standard) that removal of the context
functions would make the signal action argument any less useful (or perhaps
we should say "more useless") than it is now. ;-)

Panagiotis E. Hadjidoukas

Jul 3, 2003, 1:54:59 PM

"David Butenhof" <David.B...@hp.com> wrote in message

news:3f03...@usenet01.boi.hp.com...
> Panagiotis E. Hadjidoukas wrote:
>

> It wouldn't be "useless", though you obviously couldn't pass it as an
> argument to functions that don't exist. In POSIX, for example, the 3rd
> argument to a signal action function is undefined by the standard; but is
> nevertheless defined and even widely used on various implementations.
>
> The standard doesn't actually allow the use of setcontext() inside a
> signal handler anyway, since setcontext isn't async-signal safe. Thus, (and
> despite the implications of the setcontext man page) it's not at all clear
> (strictly in terms of the portable standard) that removal of the context
> functions would make the signal action argument any less useful (or
> perhaps we should say "more useless") than it is now. ;-)
>

setcontext isn't async-signal safe, but you can use it if you block the
signals before calling setcontext and unblock them in the new (restored)
context. Even without ucontext, you can switch to another context with
siglongjmp.

So, my final conclusion is that the new specification is going to make
obsolete (or is it just defined obsolescent?) something (a mechanism) that
is supported by most operating systems.

Panagiotis E. Hadjidoukas

Jul 3, 2003, 2:29:12 PM

"Casper H.S. Dik" <Caspe...@Sun.COM> wrote in message

news:3f0303d5$0$49104$e4fe...@news.xs4all.nl...

> >I do not disagree with you. However, an application that exhibits
> >multilevel (nested) parallelism, e.g. a multithreaded implementation
> >of recursive fibonacci, requires a new context (thread) for every level
> >of parallelism. With task descriptors you cannot do that.
>
> You'd create a new "work description" and push that on the
> "big pile of work" for the worker threads to be found.
>
> >Posix threads are (usually) expensive to create (cpu and memory)
> >and you cannot control directly their execution.
>
> So you should not create/destroy them but rather employ a pool of worker
> threads.
>

1) Look at this example:
/*********************************************************************/
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

void *func(void *arg)
{
    /* the depth travels inside the pointer value; intptr_t keeps the cast safe */
    int depth = (int) (intptr_t) arg;
    pthread_t t1, t2;

    if (depth > 2) {
        /* exploit nested parallelism */

        /* create two tasks */
        pthread_create(&t1, NULL, func, (void *) (intptr_t) (depth-1));
        pthread_create(&t2, NULL, func, (void *) (intptr_t) (depth-2));

        /* wait for the results */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
    }

    return NULL;
}

int main(void)
{
    func((void *) (intptr_t) 50);
    return 0;
}
/*********************************************************************/

A Posix Thread MUST block waiting for the new (two) Posix Threads to return.
If you implement this with task-queues then a worker thread will block and
consequently the tasks must be executed by new worker threads. So, in this
case, task queues are useless and user-level threads are necessary.
Moreover, such algorithms are very efficient and have been supported by
the Cilk language.

2) Once again, I agree with you that everything can be done using the
Posix Threads API. However, I think the API lacks some useful features
like better user control over the thread stacks, explicit switching to
another Posix Thread [e.g. pthread_switch_to(pthread_t )], binding of
system scope threads on cpus, etc. Even a pthread_suspend/resume
can be useful (see Garbage Collection, ...).
My feeling is that the Posix standard should not target only high-end
programmers but also library developers that need more advanced features,
even if the latter are considered "dangerous" for others.

Patrick TJ McPhee

Jul 3, 2003, 4:01:19 PM

In article <be1ejq$2cf9$1...@ulysses.noc.ntua.gr>,
Panagiotis E. Hadjidoukas <p...@hpclab.ceid.upatras.gr> wrote:

[...]

% 1) Look at this example:
% /***************************************************************************
% ******/
% #include <pthread.h>
% #include <stdio.h>
%
% void *func(void *arg)
% {
% int depth = (int) arg;
% pthread_t t1, t2;
%
% if (depth > 2) {
% /* exploit nested parallelism */
%
% /* create two tasks */
% pthread_create(&t1, NULL, func, (void *) (depth-1));
% pthread_create(&t2, NULL, func, (void *) (depth-2));
%
% /* wait for the results */
% pthread_join(t1, NULL);
% pthread_join(t2, NULL);
% }
%
% return NULL;
% }
%
% int main(void)
% {
% func((void *) 50);
% }

Granted that func could be changed to actually do something, as it
stands, it does nothing.

% A Posix Thread MUST block waiting for the new (two) Posix Threads to return.

No, it _can_ block waiting for the two new threads to return, or it could
exit, or it could go off and do something useful. The choices are
limited only by the human imagination.

% If you implement this with task-queues then a worker thread will block and
% consequently the tasks must be executed by new worker threads.

It's really hard to say in the absence of an example that actually
does something. I can implement the above with one thread:

main() { }

Now, if you're saying that a work queue can't be used to implement
an algorithm like this:

until there's no data:
divide work
start tasks to do work
wait for work to be done
combine results

you're simply wrong. You can just as well put two tasks on the work
queue and wait for them to be completed as start two threads.

% So, in this
% case, tasks queues are useless and user-level threads are necessary.

Now, it's hard to see how user-level threads are either necessary or
useful. One of the fundamental features of user-level threads on
most systems is that you can't get true concurrency. Whenever one
user-level thread is running, all the others aren't. Even if you
cobble together a two-level scheduler to get around this problem,
you're ultimately just creating a more-complicated version of
a work queue.

--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Panagiotis E. Hadjidoukas

Jul 3, 2003, 4:35:02 PM

OK, I provide you a better (complete) example.
This application is a recursive version of Fibonacci -
fib(0)=0, fib(1)=1, fib(n) = fib(n-1)+fib(n-2).
The current thread creates two more threads and
BLOCKs waiting for the results.
I claim that it is impossible to execute this program
with task queues and worker threads without creating
a new Posix Thread for each task ("fib" function).

/************************************************************/
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

void *fib(void *arg)
{
    int n = (int) (intptr_t) arg;
    pthread_t t1, t2;
    void *res1, *res2;      /* pthread_join stores a void *, not an int */
    int res;

    /* printing pthread_t as an integer is not portable; fine for a demo */
    printf("[%d] Hello from %lu\n", n, (unsigned long) pthread_self());
    if (n >= 2) {
        pthread_create(&t1, NULL, fib, (void *) (intptr_t) (n-1));
        pthread_create(&t2, NULL, fib, (void *) (intptr_t) (n-2));
        pthread_join(t1, &res1);
        pthread_join(t2, &res2);
        res = (int) (intptr_t) res1 + (int) (intptr_t) res2;
    }
    else
        res = n;

    return (void *) (intptr_t) res;
}

int main(void)
{
    void *res;
    pthread_t first;

    pthread_create(&first, NULL, fib, (void *) (intptr_t) 9);
    pthread_join(first, &res);

    printf("res = %d\n", (int) (intptr_t) res);
    return 0;
}
/************************************************************/

Thanks...,

Panagiotis


Patrick TJ McPhee

Jul 6, 2003, 5:08:21 PM

In article <be1lvo$2pjc$1...@ulysses.noc.ntua.gr>,

Panagiotis E. Hadjidoukas <p...@hpclab.ceid.upatras.gr> wrote:

% This application is a recursive version of Fibonacci -
% fib(0)=0, fib(1)=1, fib(n) = fib(n-1)+fib(n-2).
% The current thread creates two more threads and
% BLOCKs waiting for the results.
% I claim that is impossible to execute this program
% with task queues and worker threads without creating
% a new Posix Thread for each task ("fib" function).

Let's start with what's sensible. It's ridiculous to code this function
either the way you've done it or using a work queue. It might make
sense to divide the work at the top, but only if you know the
two subsidiary calls will execute concurrently:

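#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

/* plain sequential recursion */
int Fib(int n)
{
    if (n >= 2) {
        return Fib(n-1) + Fib(n-2);
    }
    else
        return n;
}

/* thread entry point: argument and result travel inside the pointer value */
void *fib(void *arg)
{
    return (void *) (intptr_t) Fib((int) (intptr_t) arg);
}

int main(void)
{
    void *res1, *res2;
    pthread_t first, second;

    /* split only at the top: fib(9) = fib(8) + fib(7) */
    pthread_create(&first, NULL, fib, (void *) (intptr_t) 8);
    pthread_create(&second, NULL, fib, (void *) (intptr_t) 7);
    pthread_join(first, &res1);
    pthread_join(second, &res2);

    printf("res = %d\n", (int) (intptr_t) res1 + (int) (intptr_t) res2);
    return 0;
}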

I say this is clearer and more efficient than your version, and I
expect the same will be true for any recursive function, however
many CPUs you have and however expensive the Fib() function is
relative to thread creation.

Solving the problem with task queues is certainly possible, and probably
more efficient than your version, although still ridiculous, and
arguably less clear. The basic problem is that there's no natural end to
the amount of work, and no efficient way to divide it up in advance.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define NOTHREADS 5

/* one queue entry: "compute fib(n)"; N links to the next entry */
struct task_queue {
    int n;
    struct task_queue * N;
};

static struct task_queue * wq;      /* head of the work queue */

/* res: global sum; qes: entries queued or being processed */
static int res, qes, threads_done;

static pthread_mutex_t wqm = PTHREAD_MUTEX_INITIALIZER,
                       rm = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wqc = PTHREAD_COND_INITIALIZER,
                      rc = PTHREAD_COND_INITIALIZER;

void *fib(void *arg)
{
    struct task_queue * qe, * nqe;
    int myres = 0;

    (void) arg;

    pthread_mutex_lock(&wqm);
    while (qes) {
        /* wait until there is work or everything is finished */
        while (qes && !wq) {
            pthread_cond_wait(&wqc, &wqm);
        }

        if (!qes)
            break;

        qe = wq;
        wq = qe->N;
        pthread_mutex_unlock(&wqm);

        printf("[%d] Hello from %lu\n", qe->n, (unsigned long) pthread_self());

        if (qe->n >= 2) {
            /* split into fib(n-1) and fib(n-2), reusing qe for the latter */
            nqe = malloc(sizeof(*nqe));
            nqe->N = qe;
            nqe->n = qe->n - 1;
            qe->n -= 2;

            pthread_mutex_lock(&wqm);
            qes += 2;
            qe->N = wq;
            wq = nqe;
            pthread_mutex_unlock(&wqm);
            pthread_cond_broadcast(&wqc);
        }
        else {
            /* leaf: fib(1) adds 1, fib(0) adds 0; free the entry either way */
            myres += qe->n;
            free(qe);
        }

        pthread_mutex_lock(&wqm);
        qes --;
        if (!qes)
            pthread_cond_broadcast(&wqc);
    }

    pthread_mutex_unlock(&wqm);

    /* fold this thread's partial sum into the global result */
    pthread_mutex_lock(&rm);
    res += myres;
    threads_done++;
    pthread_mutex_unlock(&rm);
    pthread_cond_broadcast(&rc);

    return NULL;
}

int main(void)
{
    pthread_t first;
    pthread_attr_t attr;
    int i;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    wq = malloc(sizeof(*wq));
    wq->N = NULL;
    wq->n = 9;
    qes = 1;

    for (i = 0; i < NOTHREADS; i++) {
        pthread_create(&first, &attr, fib, NULL);
    }

    pthread_mutex_lock(&rm);
    while (threads_done < NOTHREADS) {
        pthread_cond_wait(&rc, &rm);
    }

    printf("res = %d\n", res);

    pthread_mutex_unlock(&rm);

    return 0;
}
David Butenhof

Jul 7, 2003, 11:06:00 AM

Panagiotis E. Hadjidoukas wrote:

> So, my final conclusion is that the new specification is going to make
> obsolete (or is it just defined obsolescent?) something (a mechanism) that
> is supported by most operating systems.

And nobody's going to require that those systems (whether "most" or a few)
drop any support they have, either. It merely means that code using these
functions can't be "strictly conforming" and won't be guaranteed portable
between implementations of POSIX or SUS. Arguably the fact that the current
specification depends on and requires C99 and yet is incompatible with C99
rules means that code using makecontext already isn't strictly conforming
or portable.

If you're really bothered by these functions going away, complaining in a
newsgroup isn't going to make any difference. Get involved, and come up
with a viable alternative proposal.

Panagiotis E. Hadjidoukas

Jul 7, 2003, 12:57:32 PM

"David Butenhof" <David.B...@hp.com> wrote

> Panagiotis E. Hadjidoukas wrote:
>
> If you're really bothered by these functions going away, complaining in a
> newsgroup isn't going to make any difference. Get involved, and come up
> with a viable alternative proposal.
>
> --

Actually, my intention was to understand the situation and not to complain.
From my point of view, it is nice to have developed something that is
both efficient and portable. Our discussion really helped me very much.
I am grateful to you.

Finally, as I mentioned in another message, I think (maybe I am wrong)
that the POSIX Threads API lacks some useful features, like better user
control over the thread stacks, explicit switching to another Posix Thread
[e.g. pthread_switch_to(pthread_t )], binding of system scope threads on
cpus, etc.

My feeling is that the Posix Threads API should not target only high-end
programmers but also library developers that need more advanced features,
even if the latter are considered "dangerous" for others.

Panagiotis


Panagiotis E. Hadjidoukas

Jul 7, 2003, 1:14:30 PM

"Patrick TJ McPhee" <pt...@interlog.com> wrote

>
> % This application is a recursive version of Fibonacci -
> % fib(0)=0, fib(1)=1, fib(n) = fib(n-1)+fib(n-2).
> % The current thread creates two more threads and
> % BLOCKs waiting for the results.
> % I claim that is impossible to execute this program
> % with task queues and worker threads without creating
> % a new Posix Thread for each task ("fib" function).
>
> Let's start with sensible. It's ridiculous to code this function
> either the way you've done it or using a work queue. It might make
> sense to divide the work at the top, but only if you know the
> two subsidiary will execute concurrently:
>

[snip]

>
> I say this is clearer and more efficient than your version, and I
> expect the same will be true for any recursive function, however
> many CPUs you have and however expensive the Fib() function is
> relative to thread creation.
>

You should not focus on that specific application. In general, you do
not know how much each function (Fib) costs and this method
will result in load imbalance (considering an SMP machine).

>
> Solving the problem with task queues is certainly possible, and probably
> more efficient than your version, although still ridiculous, and
> arguably less clear. The basic problem is that there's no natural end to
> the amount of work, and no efficient way to divide it up in advance.
>

[snip]

You have transformed the nested parallelism (joins) into global reductions
(res is a global variable). This is a very clever implementation, which is,
however, application specific. If you are a library developer (e.g. of an
OpenMP runtime library) it is not easy to do such things.

Panagiotis


David Butenhof

Jul 8, 2003, 1:03:52 PM

Panagiotis E. Hadjidoukas wrote:

> Finally, as I mentioned in an another message, I think (maybe I am wrong)
> that the POSIX Threads API lacks some useful features, like better user
> control over the thread stacks, explicit switching to another Posix Thread
> [e.g. pthread_switch_to(pthread_t )], binding of system scope threads on
> cpus, etc.
> My feeling is that the Posix Threads API should not target only high-end
> programmers but also library developers that need more advanced features,
> even if the latter are considered "dangerous" for others.

I really don't feel like getting into discussing these issues, because it's
like talking religion, politics, or text editors. The "objective" "right"
answer depends entirely on the viewpoint of the person who's talking. No
matter how much you might fervently believe these functions are good and
even essential, there are at least as many people with equally compelling
arguments that they're useless and destructive.

Again, if these issues are important to you, get involved with the standard
committee.

I have no idea what you'd intend to do with something like a
"pthread_switch_to" function, but the only practical use is as part of a
low-level toolkit to build your own thread scheduler. There's nothing
inherently wrong with providing such a toolkit, (and many systems do), but
it's not "POSIX threads".

While functions like suspend and resume are widely implemented, they're
hacks that are merely "necessary evils", not desirable or beneficial
features. (Suspending all threads for a GC destroys concurrency, cache
footprint, and so forth; and simply cannot be argued to be "good" for
anyone.) Binding, too, is widely implemented, but is also a hack to get
around inappropriate scheduler algorithms. Worse, it's most frequently used
inappropriately in ways that actually harm typical application
responsiveness (by refusing to utilize available resources in favor of
other resources that are currently busy). Nevertheless, it's probably
unrealistic to expect that any scheduler will ever be perfect for all
workloads, and binding is widely used and does have a few nearly legitimate
purposes (though there ought to be better alternatives; as with priority
scheduling, sometimes simple and widely understood is better than perfect).
The reason binding isn't in the standard is simply that we didn't manage to
reach a reasonable consensus on the definitions and interfaces. POSIX
applies to a wide range of hardware, from uniprocessors to large NUMA
systems.

Patrick TJ McPhee

Jul 8, 2003, 4:38:12 PM

In article <bebrnn$1juv$1...@ulysses.noc.ntua.gr>,

Panagiotis E. Hadjidoukas <p...@hpclab.ceid.upatras.gr> wrote:

[...]

% however, application specific. If you are a library developer (e.g. of an
% OpenMP runtime library) it is not easy to do such things.

In general, though, you can always stick a function pointer and a data
pointer on the work queue, and stick the results on a result queue. If
I were developing a threaded development system, I would want the
developer to give hints as to where concurrency would make him or her
happy, and have the scheduler do its thing based on whatever it knows
about the system environment.

I'm not sure a portable API could give enough information to do a good
job of it, but I could always have a portable fallback that sticks function
and data pointers on the work queue and puts results on a result queue.
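
A minimal sketch of that portable fallback, under assumptions of my own: the
queue is a fixed array filled before the workers start, and the names
struct work, work_q, result_q, run_queue, square, NWORKERS and NTASKS are
invented for illustration. A real library would use a growable queue and a
condition variable so that tasks can be added while workers run.

```c
#include <pthread.h>

#define NWORKERS 3               /* assumed pool size */
#define NTASKS   8               /* assumed number of queued tasks */

typedef long (*task_fn)(void *); /* a task is a function pointer ... */

struct work {
    task_fn fn;
    void *data;                  /* ... plus a data pointer */
};

static struct work work_q[NTASKS];   /* work queue */
static long result_q[NTASKS];        /* result queue, one slot per task */
static int next_task;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;

/* Example task body: square the long that data points to. */
static long square(void *data)
{
    long v = *(long *) data;
    return v * v;
}

static void *worker(void *unused)
{
    (void) unused;
    for (;;) {
        int i;

        /* claim the next unprocessed entry, if any */
        pthread_mutex_lock(&q_lock);
        i = next_task < NTASKS ? next_task++ : -1;
        pthread_mutex_unlock(&q_lock);
        if (i < 0)
            return NULL;

        /* run the task outside the lock; each slot has a single writer */
        result_q[i] = work_q[i].fn(work_q[i].data);
    }
}

/* Queues NTASKS squaring tasks, runs them on the pool, sums the results. */
long run_queue(void)
{
    static long inputs[NTASKS];
    pthread_t tid[NWORKERS];
    long total = 0;
    int i;

    next_task = 0;
    for (i = 0; i < NTASKS; i++) {
        inputs[i] = i + 1;
        work_q[i].fn = square;
        work_q[i].data = &inputs[i];
    }
    for (i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);   /* join makes the results visible */
    for (i = 0; i < NTASKS; i++)
        total += result_q[i];
    return total;
}
```

run_queue() should return 204, the sum of the squares of 1 through 8. The
queue itself knows nothing about what the tasks compute, which is the property
a library developer would need.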