pthread_func_1() {
    while (condition_not_met) {
        pthread_yield();
    }
}

pthread_func_2() {
    while (there is busy work) {
        doing some busy work;
        wait on some condition var;
    }
    condition_not_met = 0;
}
--
VH
AFAIK, the following is an entirely legitimate and POSIX conforming
implementation of pthread_yield:
int pthread_yield() { return 0; }
That is, it can be implemented as a no-op. It would be contrary to the
"spirit" of pthread_yield, but no conforming portable program could
tell the difference. It depends on the internals of the scheduler, and
that is something which POSIX did not standardize.
Unless you're doing some fun low-level stuff like MMIO (stuff which is
by definition not portable), then use of pthread_yield indicates
likely bad design, a bug in the design, or just shoddy code.
Finally, note that as written, your pthread_func_1 function has a race
condition: it reads an object that is written by another thread without
proper synchronization. Race conditions lead to formally undefined
behavior in POSIX, and also in the upcoming C and C++ standards.
Undefined behavior is bad. Even on a non-broken implementation, any
result is allowed, from nothing to the proverbial hard drive format,
though some results, like random seg faults, are more likely than
others.
Thus, if you want to remove the race condition, you need a mutex to
guard the shared state (or you need to go down to memory barriers,
std::atomic, or such - let's forget about that for now), and with a
mutex the use of a condition variable in place of yield follows
readily to solve this problem.
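For instance, a minimal sketch of that shape might look like the
following (the mutex, condition variable, and function names are purely
illustrative, not taken from your code):
___________________________________________________________
#include <pthread.h>

static int condition_not_met = 1;
static pthread_mutex_t flag_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flag_cond = PTHREAD_COND_INITIALIZER;

/* Instead of spinning on the flag and calling pthread_yield(),
   sleep on a condition variable until the flag is cleared. */
void *waiter(void *arg)
{
    pthread_mutex_lock(&flag_lock);
    while (condition_not_met)
        pthread_cond_wait(&flag_cond, &flag_lock);
    pthread_mutex_unlock(&flag_lock);
    return NULL;
}

void *worker(void *arg)
{
    /* ... do the busy work from the original example ... */

    pthread_mutex_lock(&flag_lock);
    condition_not_met = 0;        /* write the shared flag under the mutex */
    pthread_cond_signal(&flag_cond);
    pthread_mutex_unlock(&flag_lock);
    return NULL;
}
___________________________________________________________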
> Unless you're doing some fun low-level stuff like MMIO (stuff which is
> by definition not portable), then use of pthread_yield indicates
> likely bad design, a bug in the design, or just shoddy code.
This may be your own statistical experience, having seen more bad code
than good code. But I think it's dangerous to generalize from your own
experiences. The most common use of pthread_yield I've seen looks like
this:
1) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
2) pthread_yield.
3) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
4) Oh well, do it the long way. Stop.
5) Do it the fast way.
This is common when code could benefit from acquiring a lock but
cannot wait for the lock. The reasons the thread cannot wait could
either be that it could delay the thread too long or that it could
cause a deadlock in very rare cases. Obviously, the code must be able
to perform its function whether it can get the lock or not. And there
would be no point unless the code path with the lock (say, performing
some simple operation) is significantly better than the code path
without the lock (say, dispatching another thread that can wait for
the lock to do the operation).
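In code, the pattern above might look roughly like this (the function
and mutex names are placeholders, not from any real codebase, and
sched_yield() stands in for pthread_yield(), since that is the portable
spelling):
___________________________________________________________
#include <pthread.h>
#include <sched.h>

extern pthread_mutex_t opt_lock;   /* guards the "fast way" resource */

void do_it_the_fast_way(void);     /* only valid while holding opt_lock */
void do_it_the_long_way(void);     /* always works, but slower */

void attempt(void)
{
    if (pthread_mutex_trylock(&opt_lock) != 0) {      /* step 1 */
        sched_yield();                                 /* step 2 */
        if (pthread_mutex_trylock(&opt_lock) != 0) {   /* step 3 */
            do_it_the_long_way();                      /* step 4 */
            return;
        }
    }
    do_it_the_fast_way();                              /* step 5 */
    pthread_mutex_unlock(&opt_lock);
}
___________________________________________________________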
DS
I will acknowledge my relative inexperience in this area, but man that
looks hacky. It's almost like a poor man's spin lock. Why not just
"trust" that the implemention of mutex isn't stupid and is a spin lock
of some kind, and/or you write your own spin lock? Dunno. Just
pontificating.
> > 1) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
> > 2) pthread_yield.
> > 3) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
> > 4) Oh well, do it the long way. Stop.
> > 5) Do it the fast way.
> >
> > This is common when code could benefit from acquiring a lock but
> > cannot wait for the lock. The reasons the thread cannot wait could
> > either be that it could delay the thread too long or that it could
> > cause a deadlock in very rare cases. Obviously, the code must be able
> > to perform its function whether it can get the lock or not. And there
> > would be no point unless the code path with the lock (say, performing
> > some simple operation) is significantly better than the code path
> > without the lock (say, dispatching another thread that can wait for
> > the lock to do the operation).
>
> I will acknowledge my relative inexperience in this area, but man that
> looks hacky. It's almost like a poor man's spin lock. Why not just
> "trust" that the implemention of mutex isn't stupid and is a spin lock
> of some kind, and/or you write your own spin lock? Dunno. Just
> pontificating.
I think that `pthread_yield()' can be "useful" when you need to acquire
multiple locks on dynamically created objects. As for
`pthread_mutex_trylock()', well I have always thought that something like
the following locking scheme could end up being fairly useful in certain
scenarios:
___________________________________________________________
void a_worker_thread()
{
# define LIMIT 16
    for (;;)
    {
        local_work_cache l_workc;
        unsigned limit = LIMIT;
        for (;;)
        {
            work* w = g_queue.try_pop();
            if (! w)
            {
                /* Queue is empty.  If we already gathered some work,
                   take the lock for real and go process the batch;
                   otherwise block until more work arrives. */
                if (limit != LIMIT)
                {
                    pthread_mutex_lock(...);
                    break;
                }
                w = g_queue.wait_pop();
            }
            l_workc.push(w);
            /* Opportunistically try for the lock after each item. */
            if (! pthread_mutex_trylock(...))
            {
                break;
            }
            else if (! (--limit))
            {
                /* Hit the batching threshold; now wait for the lock. */
                pthread_mutex_lock(...);
                break;
            }
        }
        /* The mutex is held here, however we got it. */
        l_workc.process();
        pthread_mutex_unlock(...);
    }
}
___________________________________________________________
If you cannot acquire the mutex, then why not try to gather more work? Once
you hit a threshold of work and/or acquire the mutex, you process
everything in a batch. The act of consuming more work, or doing something
else that is useful just might act as a kind of "backoff" mechanism. Instead
of spinning, you are doing some actual work... Of course that simple example
assumes the mutex must be held during the "processing phase". Humm... Think
about something like:
___________________________________________________________
void a_worker_thread()
{
# define LIMIT 16
    for (;;)
    {
        local_work_cache l_workc;
        unsigned limit = LIMIT;
        for (;;)
        {
            work* w = g_queue.try_pop();
            if (! w)
            {
                if (limit != LIMIT)
                {
                    pthread_mutex_lock(...);
                    break;
                }
                w = g_queue.wait_pop();
            }
            l_workc.push(w);
            if (! pthread_mutex_trylock(...))
            {
                break;
            }
            else if (! (--limit))
            {
                pthread_mutex_lock(...);
                break;
            }
        }
        l_workc.locked_process();
        pthread_mutex_unlock(...);
        l_workc.unlocked_process();
    }
}
___________________________________________________________
You might be able to separate the work that needs to be protected by the
mutex from the work that can be executed locally; humm...
> I will acknowledge my relative inexperience in this area, but man that
> looks hacky. It's almost like a poor man's spin lock. Why not just
> "trust" that the implemention of mutex isn't stupid and is a spin lock
> of some kind, and/or you write your own spin lock? Dunno. Just
> pontificating.
Two reasons:
1) A spin lock will block this thread until the mutex can be acquired.
Since this thread doesn't need the mutex to make forward progress,
this would be a serious pessimization.
2) What if this thread holds the lock that the thread that holds the
spin lock is blocked on? If we try to acquire a spin lock, we might
spin forever.
Remember, the condition is that we cannot afford to wait for a mutex,
but we can make forward progress 'better' with the mutex than without
it. We could just call pthread_mutex_trylock and give up if we don't
get it, but we'd like to try just a bit harder than that.
DS
> AFAIK, the following is an entirely legitimate and POSIX conforming
> implementation of pthread_yield:
> int pthread_yield() { return 0; }
There is no function called pthread_yield() in POSIX. I assume you
mean sched_yield().
> That is, it can be implemented as a no-op. It would be contrary to the
> "spirit" of pthread_yield, but no conforming portable program could
> tell the difference.
I think sched_yield() can be a no-op on implementations that don't
support the TPS (_POSIX_THREAD_PRIORITY_SCHEDULING) option. If the
TPS option is supported then POSIX has requirements on what
sched_yield() does as part of the description of the SCHED_FIFO policy:
9. When a running thread issues the sched_yield() function, the
thread becomes the tail of the thread list for its priority.
(It also applies to SCHED_RR and SCHED_SPORADIC, as they are described
by reference to SCHED_FIFO.)
It should be possible to write a program that uses some SCHED_FIFO
threads to detect whether sched_yield() "works".
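A rough sketch of such a detector might look like the following (this
is only illustrative, not from the standard: it assumes a Linux-like
system where SCHED_FIFO requires privileges and where sched_setaffinity()
can pin both threads to one CPU, so that the allocation domain they
compete in effectively has size one; all names are made up):
___________________________________________________________
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_barrier_t start;    /* let both threads exist before racing */
static volatile int last_runner;   /* written by whichever thread is running */

static void *fifo_thread(void *arg)
{
    int me = (int)(long)arg;
    pthread_barrier_wait(&start);
    for (int i = 0; i < 5; i++) {
        last_runner = me;
        sched_yield();             /* per rule 9, we should move to the tail
                                      of our priority list here */
        printf("thread %d: after sched_yield() the last runner was %d\n",
               me, last_runner);   /* stays == me if sched_yield is a no-op */
    }
    return NULL;
}

int main(void)
{
    /* Pin the whole process to CPU 0 so the two threads really compete. */
    cpu_set_t one_cpu;
    CPU_ZERO(&one_cpu);
    CPU_SET(0, &one_cpu);
    sched_setaffinity(0, sizeof one_cpu, &one_cpu);

    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = 10 };
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);

    pthread_barrier_init(&start, NULL, 2);

    pthread_t t1, t2;
    if (pthread_create(&t1, &attr, fifo_thread, (void *)1L) != 0 ||
        pthread_create(&t2, &attr, fifo_thread, (void *)2L) != 0) {
        fprintf(stderr, "need privileges for SCHED_FIFO\n");
        return 1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
___________________________________________________________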
> It depends on the internals of the scheduler, and
> that is something which POSIX did not standardize.
POSIX doesn't standardise internals, but it does standardise how
threads with the above mentioned scheduling policies are scheduled.
--
Geoff Clare <net...@gclare.org.uk>
That makes sense for a uniprocessor.
But nowadays even pocket gadgets are multiprocessors.
> POSIX doesn't standardise internals, but it does standardise how
> threads with the above mentioned scheduling policies are scheduled.
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
"For application threads with scheduling allocation domains of size
greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and
SCHED_SPORADIC shall be used in an implementation-defined manner."
Translation: no rules whatsoever.
regards,
alexander.
Good point. However, earlier in that section the standard says:
"It should be noted that the presence of multiple processors does not
automatically indicate a scheduling allocation domain size greater
than one. Conforming implementations on multi-processors may map all
or any subset of the CPUs to one or multiple scheduling allocation
domains"
> Translation: no rules whatsoever.
The way the domain-size-one rules are applied to larger domains is
"implementation-defined", which means implementations are required
to document how they are applied. Not quite the same as "no rules
whatsoever".
--
Geoff Clare <net...@gclare.org.uk>
That just says that the presence of multiple processors doesn't
guarantee parallel execution of threads. I'm talking about the case
where parallel execution of threads is granted.
>
> > Translation: no rules whatsoever.
>
> The way the domain-size-one rules are applied to larger domains is
> "implementation-defined", which means implementations are required
> to document how they are applied. Not quite the same as "no rules
> whatsoever".
In praxis "implementation-defined" means undocumented to begin with.
Anyway since "implementation-defined" can change from version to version
by even the same vendor it does mean "no rules whatsoever" in disguise.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html#tag_01_05_02
"implementation-defined
Describes a value or behavior that is not defined by POSIX.1-2008 but is
selected by an implementor. The value or behavior may vary among
implementations that conform to POSIX.1-2008. An application should not
rely on the existence of the value or behavior. An application that
relies on such a value or behavior cannot be assured to be portable
across conforming implementations."
regards,
alexander.
> Geoff Clare wrote:
>>
>> Alexander Terekhov wrote:
>>
>> >
>> > Geoff Clare wrote:
>> > [...]
>> >> 9. When a running thread issues the sched_yield() function, the
>> >> thread becomes the tail of the thread list for its priority.
>> >
>> > That makes sense for a uniprocessor.
>> >
>> > But nowadays even pocket gadgets are multiprocessors.
>> >
>> >> POSIX doesn't standardise internals, but it does standardise how
>> >> threads with the above mentioned scheduling policies are scheduled.
>> >
>> > http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
>> >
>> > "For application threads with scheduling allocation domains of size
>> > greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and
>> > SCHED_SPORADIC shall be used in an implementation-defined manner."
>> >
>> > Translation: no rules whatsoever.
>>
>> The way the domain-size-one rules are applied to larger domains is
>> "implementation-defined", which means implementations are required
>> to document how they are applied. Not quite the same as "no rules
>> whatsoever".
>
> In praxis "implementation-defined" means undocumented to begin with.
Huh? XBD7 section 2.1.2:
A conformance document with the following information shall be
available for an implementation claiming conformance to POSIX.1-2008.
[...]
The conformance document shall describe the behavior of the
implementation for all implementation-defined features defined in
POSIX.1-2008. This requirement shall be met by listing these
features and providing either a specific reference to the system
documentation or providing full syntax and semantics of these
features.
> Anyway since "implementation-defined" can change from version to version
> by even the same vendor it does mean "no rules whatsoever" in disguise.
No, it means each implementor chooses their own rules, but they must
tell you what those rules are. It is then up to application writers
to decide whether to rely on those rules. For applications targeted
at a small set of specific implementations, there might be some
advantage in doing so.
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html#tag_01_05_02
>
> "implementation-defined
>
> Describes a value or behavior that is not defined by POSIX.1-2008 but is
> selected by an implementor. The value or behavior may vary among
> implementations that conform to POSIX.1-2008. An application should not
> rely on the existence of the value or behavior. An application that
> relies on such a value or behavior cannot be assured to be portable
> across conforming implementations."
This uses "should", which means it is a recommendation. The point
is that for maximum portability, application writers should not rely
on any implementation-defined behaviour. The application writer has
to decide whether portability is more important than some advantage
that might be gained by sacrificing a little portability.
--
Geoff Clare <net...@gclare.org.uk>
Well, at
http://www.opengroup.org/csq/public/search.mhtml?t=XY1&sort=bycomponent
I can't find anything regarding rules for scheduling allocation domains
of size greater than one. Do you have a link?
regards,
alexander.
The Open Group's CSQs don't cover everything the POSIX Conformance
Document (PCD) has to cover.
Originally I think the idea was that the PCD would be supplied with
the system's developer documentation. Of course, these days it
would make sense for them to be provided online, but it's up to
each implementor how they want to provide it. It does seem that
they are not easy to find online. I was able to find one for
HP-UX, but it was an old one from before pthreads were added to the
standard. One that I did find with the relevant information is for
QNX:
www.qnx.com/developers/docs/6.5.0/topic/com.qnx.doc.neutrino_prog/posix_conformance.html
According to that document, this is what QNX does:
For threads with scheduling allocation domains of size greater
than one, the rules defined for SCHED_FIFO, SCHED_RR, and
SCHED_SPORADIC are followed such that if the thread becomes the
head of its thread list, the thread may become the runnable thread
on any processor in its scheduling allocation domain if it has a
higher priority than the running thread on one of those processors.
This may in turn result in the preempted thread's becoming the
running thread on a different processor if that thread also has a
scheduling allocation domain of size greater than one and its
priority is higher than the running thread on one of the other
processors in its scheduling allocation domain.
If a thread's scheduling allocation domain has a size greater than
one, a runnable thread selects the processor on which it becomes
the runnable thread as follows:
1. For each processor in the thread's scheduling allocation domain,
find the processor whose running thread has the lowest priority
that is less than this thread.
2. If the search identifies only one processor, select that
processor and make the thread the running thread on that processor.
3. If the search identifies more than one processor, select the
processor on which the thread previously ran before becoming a
blocked thread.
4. If the thread hadn't run on any of the selected processors
before becoming a blocked thread, select the first processor
that was found in the search.
--
Geoff Clare <net...@gclare.org.uk>
Thanks.
The thing is that (priorities aside) in the case of T threads running on
P processors with P = T (all threads can run in parallel, priorities are
irrelevant), sched_yield() is supposed to be a no-op, do you agree?
regards,
alexander.
> Geoff Clare wrote:
>>
>> Alexander Terekhov wrote:
>>
>> > Geoff Clare wrote:
[...]
Yes, from the point of view of which threads run on which processors
and when. At a more detailed level, it wouldn't be quite the same as
a "pure" no-op, as there would be some processor time used by the O/S
to check that there isn't another thread that it should run instead.
--
Geoff Clare <net...@gclare.org.uk>
How is "too long" measured against the delay that pthread_yield() could incur?
No, it really does mean defined by the implementation. What it also often means in practice is that people who expect all documents in the world to be available free of charge on the World Wide Web, and who aren't prepared to do things like buy books, are stymied. "implementation defined" does not mean "available on the WWW at no cost".
Take the HP-UX POSIX Conformance Document, for example. Here's what HP says to those wanting to get hold of a copy, in the "manuals" entry in section 5 of the on-line manual:
> Ordering Information
>
> For information about how to order any of these manuals, as well as other HP
> computer and calculator manuals and supplies, call Parts Direct Ordering
> toll-free in the United States at 1-800-227-8164, or contact your nearest HP
> Sales and Support office outside the U.S.A. for information about ordering
> these manuals or versions of these manuals in other languages.
>
> [...]
> B2355-90049 | POSIX Conformance Document
Unless there's some ordering defined for a set of processors, that's not a very meaningful constraint, since it lacks a concrete definition of "first" that can be relied upon. ("first" by processor number? "first" by processor interrupt arbitration priority? "first" in order of address in kernel space of some per-processor data structure? "first" in some linked list that was scanned? "first" in processor bootstrap order? All are conceivable implementation strategies/side-effects.) Is there such an ordering given in what you read?
I gave the URL for the document; you could have looked yourself :-)
Anyway, I have searched it for all occurrences of "processor" and
didn't find anything relevant.
--
Geoff Clare <net...@gclare.org.uk>
Waiting for another thread to complete a task that may be virtually
unbounded is too long. Waiting for another scheduler timeslot is not.
The point is not that one wait is some number of milliseconds larger
than the other. The point is that it's a fundamentally different type
of wait. Threads can be forced to wait for a new scheduler timeslice
at any time anyway. If that's a problem, you're design is hopeless no
matter what you do (unless you use a non-standard scheduler or real-
time extensions). But having a thread wait for another thread to do
something is totally under your control.
DS