
pthread question


p6 c6d6

May 9, 2011, 12:28:59 AM
Can pthread_func_1() eat up 100% CPU?
I have the following scenario where pthread_func_1()
is waiting on some global variable to be set,
while pthread_func_2() sets the global
variable once it finishes its busy work.
pthread_func_2() might wait on a condition variable while
it is doing the work. In this scenario I've observed that
my process is taking 100% CPU, so I forced a manual core dump.
On analyzing the core I observed that the pthread_func_1()
thread is in the running state and pthread_func_2() is in cond_wait
state. I was wondering why pthread_func_1() is taking 100%
CPU even though there is a pthread_yield(). I am guessing that
it is responsible for the 100% CPU, as it is the only thread in
the running state when I forced the core dump.

pthread_func_1() {
    while (condition_not_met) {
        pthread_yield();
    }
}

pthread_func_2() {
    while (there is busy work) {
        doing some busy work;
        wait on some condition var;
    }
    condition_not_met = 0;
}

Vaclav Haisman

May 9, 2011, 1:11:42 AM
p6 c6d6 wrote, On 9.5.2011 6:28:
> Can pthread_func_1() eat up 100% CPU?
> I have the following scenario where pthread_func_1()
> is waiting on some global variable to be set,
> while pthread_func_2() sets the global
> variable once it finishes its busy work.
> pthread_func_2() might wait on a condition variable while
> it is doing the work. In this scenario I've observed that
> my process is taking 100% CPU, so I forced a manual core dump.
> On analyzing the core I observed that the pthread_func_1()
> thread is in the running state and pthread_func_2() is in cond_wait
> state. I was wondering why pthread_func_1() is taking 100%
> CPU even though there is a pthread_yield(). I am guessing that
> it is responsible for the 100% CPU, as it is the only thread in
> the running state when I forced the core dump.
>
> pthread_func_1() {
> while (condition_not_met) {
> pthread_yield();
Even with pthread_yield() it is busy waiting. You are correct about why,
despite the yield, it runs all the time. Change it to use a condition
variable, as pthread_func_2() does, to avoid that.

> }
> }
>
> pthread_func_2() {
> while (there is busy work) {
> doing some busy work;
> wait on some condition var;
> }
> condition_not_met = 0;
> }

--
VH

Joshua Maurice

May 9, 2011, 2:49:16 PM

AFAIK, the following is an entirely legitimate and POSIX-conforming
implementation of pthread_yield:
int pthread_yield() { return 0; }
That is, it can be implemented as a no-op. It would be contrary to the
"spirit" of pthread_yield, but no conforming portable program could
tell the difference. It depends on the internals of the scheduler, and
that is something which POSIX did not standardize.

Unless you're doing some fun low-level stuff like MMIO (stuff which is
by definition not portable), then use of pthread_yield indicates
likely bad design, a bug in the design, or just shoddy code.

Finally, note that as written, your pthread_func_1 function has a race
condition: its thread reads an object that is written by another thread
without proper synchronization. Race conditions
lead to formally undefined behavior in POSIX, and also in the upcoming
C and C++ standards. Undefined behavior is bad. On a non-broken
implementation, any result is allowed, from nothing to the proverbial
hard drive format, though some results, like random seg faults, are
more likely than others.

Thus, if you want to remove the race condition, you need a mutex to
guard the shared state (or you need to go down to memory barriers,
std::atomic, or such - let's forget about that for now), and with a
mutex the use of a condition variable in place of yield follows
readily to solve this problem.

David Schwartz

May 9, 2011, 3:24:53 PM
On May 9, 11:49 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:

> Unless you're doing some fun low-level stuff like MMIO (stuff which is
> by definition not portable), then use of pthread_yield indicates
> likely bad design, a bug in the design, or just shoddy code.

This may be your own statistical experience, having seen more bad code
than good code. But I think it's dangerous to generalize from your own
experiences. The most common use of pthread_yield I've seen looks like
this:

1) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
2) pthread_yield.
3) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
4) Oh well, do it the long way. Stop.
5) Do it the fast way.

This is common when code could benefit from acquiring a lock but
cannot wait for the lock. The reasons the thread cannot wait could
either be that it could delay the thread too long or that it could
cause a deadlock in very rare cases. Obviously, the code must be able
to perform its function whether it can get the lock or not. And there
would be no point unless the code path with the lock (say, performing
some simple operation) is significantly better than the code path
without the lock (say, dispatching another thread that can wait for
the lock to do the operation).

DS

Joshua Maurice

May 9, 2011, 5:22:35 PM

I will acknowledge my relative inexperience in this area, but man that
looks hacky. It's almost like a poor man's spin lock. Why not just
"trust" that the implemention of mutex isn't stupid and is a spin lock
of some kind, and/or you write your own spin lock? Dunno. Just
pontificating.

Chris M. Thomasson

May 9, 2011, 6:33:49 PM
"Joshua Maurice" <joshua...@gmail.com> wrote in message
news:695a8aff-464a-478d...@k3g2000prl.googlegroups.com...

On May 9, 12:24 pm, David Schwartz <dav...@webmaster.com> wrote:
> > On May 9, 11:49 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
[...]

> > 1) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
> > 2) pthread_yield.
> > 3) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
> > 4) Oh well, do it the long way. Stop.
> > 5) Do it the fast way.
> >
> > This is common when code could benefit from acquiring a lock but
> > cannot wait for the lock. The reasons the thread cannot wait could
> > either be that it could delay the thread too long or that it could
> > cause a deadlock in very rare cases. Obviously, the code must be able
> > to perform its function whether it can get the lock or not. And there
> > would be no point unless the code path with the lock (say, performing
> > some simple operation) is significantly better than the code path
> > without the lock (say, dispatching another thread that can wait for
> > the lock to do the operation).
>

> I will acknowledge my relative inexperience in this area, but man that
> looks hacky. It's almost like a poor man's spin lock. Why not just
> "trust" that the implemention of mutex isn't stupid and is a spin lock
> of some kind, and/or you write your own spin lock? Dunno. Just
> pontificating.

I think that `pthread_yield()' can be "useful" when you need to acquire
multiple locks on dynamically created objects. As for
`pthread_mutex_trylock()', well I have always thought that something like
the following locking scheme could end up being fairly useful in certain
scenarios:
___________________________________________________________
void a_worker_thread()
{
#define LIMIT 16

    for (;;)
    {
        local_work_cache l_workc;
        unsigned limit = LIMIT;

        for (;;)
        {
            work* w = g_queue.try_pop();

            if (! w)
            {
                if (limit != LIMIT)
                {
                    pthread_mutex_lock(...);
                    break;
                }

                w = g_queue.wait_pop();
            }

            l_workc.push(w);

            if (! pthread_mutex_trylock(...))
            {
                break;
            }
            else if (! (--limit))
            {
                pthread_mutex_lock(...);
                break;
            }
        }

        l_workc.process();

        pthread_mutex_unlock(...);
    }
}
___________________________________________________________

If you cannot acquire the mutex, then why not try to gather more work? Once
you hit a threshold of work and/or acquire the mutex, you process
everything in a batch. The act of consuming more work, or doing something
else that is useful just might act as a kind of "backoff" mechanism. Instead
of spinning, you are doing some actual work... Of course that simple example
assumes the mutex must be held during the "processing phase". Humm... Think
about something like:
___________________________________________________________
void a_worker_thread()
{
#define LIMIT 16

    for (;;)
    {
        local_work_cache l_workc;
        unsigned limit = LIMIT;

        for (;;)
        {
            work* w = g_queue.try_pop();

            if (! w)
            {
                if (limit != LIMIT)
                {
                    pthread_mutex_lock(...);
                    break;
                }

                w = g_queue.wait_pop();
            }

            l_workc.push(w);

            if (! pthread_mutex_trylock(...))
            {
                break;
            }
            else if (! (--limit))
            {
                pthread_mutex_lock(...);
                break;
            }
        }

        l_workc.locked_process();

        pthread_mutex_unlock(...);

        l_workc.unlocked_process();
    }
}
___________________________________________________________

You might be able to separate the work that needs to be protected by the
mutex from the work that can be executed locally; humm...


David Schwartz

May 10, 2011, 4:22:42 AM
On May 9, 2:22 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:

> I will acknowledge my relative inexperience in this area, but man that
> looks hacky. It's almost like a poor man's spin lock. Why not just
> "trust" that the implemention of mutex isn't stupid and is a spin lock
> of some kind, and/or you write your own spin lock? Dunno. Just
> pontificating.

Two reasons:

1) A spin lock will block this thread until the mutex can be acquired.
Since this thread doesn't need the mutex to make forward progress,
this would be a serious pessimization.

2) What if this thread holds a lock that the thread holding the spin
lock is blocked on? If we try to acquire the spin lock, we might
spin forever.

Remember, the condition is that we cannot afford to wait for a mutex,
but we can make forward progress 'better' with the mutex than without
it. We could just call pthread_mutex_trylock and give up if we don't
get it, but we'd like to try just a bit harder than that.

DS

Geoff Clare

May 10, 2011, 8:47:38 AM
Joshua Maurice wrote:

> AFAIK, the following is an entirely legitimate and POSIX-conforming
> implementation of pthread_yield:
> int pthread_yield() { return 0; }

There is no function called pthread_yield() in POSIX. I assume you
mean sched_yield().

> That is, it can be implemented as a no-op. It would be contrary to the
> "spirit" of pthread_yield, but no conforming portable program could
> tell the difference.

I think sched_yield() can be a no-op on implementations that don't
support the TPS (_POSIX_THREAD_PRIORITY_SCHEDULING) option. If the
TPS option is supported then POSIX has requirements on what
sched_yield() does as part of the description of the SCHED_FIFO policy:

9. When a running thread issues the sched_yield() function, the
thread becomes the tail of the thread list for its priority.

(It also applies to SCHED_RR and SCHED_SPORADIC, as they are described
by reference to SCHED_FIFO.)

It should be possible to write a program that uses some SCHED_FIFO
threads to detect whether sched_yield() "works".

> It depends on the internals of the scheduler, and
> that is something which POSIX did not standardize.

POSIX doesn't standardise internals, but it does standardise how
threads with the above mentioned scheduling policies are scheduled.

--
Geoff Clare <net...@gclare.org.uk>

Alexander Terekhov

May 10, 2011, 11:53:27 AM

Geoff Clare wrote:
[...]

> 9. When a running thread issues the sched_yield() function, the
> thread becomes the tail of the thread list for its priority.

That makes sense for a uniprocessor.

But nowadays even pocket gadgets are multiprocessors.

> POSIX doesn't standardise internals, but it does standardise how
> threads with the above mentioned scheduling policies are scheduled.

http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html

"For application threads with scheduling allocation domains of size
greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and
SCHED_SPORADIC shall be used in an implementation-defined manner."

Translation: no rules whatsoever.

regards,
alexander.

Geoff Clare

May 11, 2011, 8:46:26 AM
Alexander Terekhov wrote:

Good point. However, earlier in that section the standard says:

"It should be noted that the presence of multiple processors does not
automatically indicate a scheduling allocation domain size greater
than one. Conforming implementations on multi-processors may map all
or any subset of the CPUs to one or multiple scheduling allocation
domains"

> Translation: no rules whatsoever.

The way the domain-size-one rules are applied to larger domains is
"implementation-defined", which means implementations are required
to document how they are applied. Not quite the same as "no rules
whatsoever".

--
Geoff Clare <net...@gclare.org.uk>

Alexander Terekhov

May 11, 2011, 10:19:07 AM

Geoff Clare wrote:
>
> Alexander Terekhov wrote:
>
> >
> > Geoff Clare wrote:
> > [...]
> >> 9. When a running thread issues the sched_yield() function, the
> >> thread becomes the tail of the thread list for its priority.
> >
> > That makes sense for a uniprocessor.
> >
> > But nowadays even pocket gadgets are multiprocessors.
> >
> >> POSIX doesn't standardise internals, but it does standardise how
> >> threads with the above mentioned scheduling policies are scheduled.
> >
> > http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
> >
> > "For application threads with scheduling allocation domains of size
> > greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and
> > SCHED_SPORADIC shall be used in an implementation-defined manner."
>
> Good point. However, earlier in that section the standard says:
>
> "It should be noted that the presence of multiple processors does not
> automatically indicate a scheduling allocation domain size greater
> than one. Conforming implementations on multi-processors may map all
> or any subset of the CPUs to one or multiple scheduling allocation
> domains"

That just says that the presence of multiple processors doesn't
guarantee parallel execution of threads. I'm talking about the case
where parallel execution of threads is granted.

>
> > Translation: no rules whatsoever.
>
> The way the domain-size-one rules are applied to larger domains is
> "implementation-defined", which means implementations are required
> to document how they are applied. Not quite the same as "no rules
> whatsoever".

In praxis "implementation-defined" means undocumented to begin with.
Anyway, since "implementation-defined" can change from version to version
even by the same vendor, it does mean "no rules whatsoever" in disguise.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html#tag_01_05_02

"implementation-defined

Describes a value or behavior that is not defined by POSIX.1-2008 but is
selected by an implementor. The value or behavior may vary among
implementations that conform to POSIX.1-2008. An application should not
rely on the existence of the value or behavior. An application that
relies on such a value or behavior cannot be assured to be portable
across conforming implementations."

regards,
alexander.

Geoff Clare

May 12, 2011, 8:26:13 AM
Alexander Terekhov wrote:

> Geoff Clare wrote:
>>
>> Alexander Terekhov wrote:
>>
>> >
>> > Geoff Clare wrote:
>> > [...]
>> >> 9. When a running thread issues the sched_yield() function, the
>> >> thread becomes the tail of the thread list for its priority.
>> >
>> > That makes sense for a uniprocessor.
>> >
>> > But nowadays even pocket gadgets are multiprocessors.
>> >
>> >> POSIX doesn't standardise internals, but it does standardise how
>> >> threads with the above mentioned scheduling policies are scheduled.
>> >
>> > http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
>> >
>> > "For application threads with scheduling allocation domains of size
>> > greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and
>> > SCHED_SPORADIC shall be used in an implementation-defined manner."
>> >

>> > Translation: no rules whatsoever.
>>
>> The way the domain-size-one rules are applied to larger domains is
>> "implementation-defined", which means implementations are required
>> to document how they are applied. Not quite the same as "no rules
>> whatsoever".
>
> In praxis "implementation-defined" means undocumented to begin with.

Huh? XBD7 section 2.1.2:

A conformance document with the following information shall be
available for an implementation claiming conformance to POSIX.1-2008.
[...]
The conformance document shall describe the behavior of the
implementation for all implementation-defined features defined in
POSIX.1-2008. This requirement shall be met by listing these
features and providing either a specific reference to the system
documentation or providing full syntax and semantics of these
features.

> Anyway since "implementation-defined" can change from version to version
> by even the same vendor it does mean "no rules whatsoever" in disguise.

No, it means each implementor chooses their own rules, but they must
tell you what those rules are. It is then up to application writers
to decide whether to rely on those rules. For applications targeted
at a small set of specific implementations, there might be some
advantage in doing so.

> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html#tag_01_05_02
>
> "implementation-defined
>
> Describes a value or behavior that is not defined by POSIX.1-2008 but is
> selected by an implementor. The value or behavior may vary among
> implementations that conform to POSIX.1-2008. An application should not
> rely on the existence of the value or behavior. An application that
> relies on such a value or behavior cannot be assured to be portable
> across conforming implementations."

This uses "should", which means it is a recommendation. The point
is that for maximum portability, application writers should not rely
on any implementation-defined behaviour. The application writer has
to decide whether portability is more important than some advantage
that might be gained by sacrificing a little portability.

--
Geoff Clare <net...@gclare.org.uk>

Alexander Terekhov

May 12, 2011, 9:28:07 AM

Geoff Clare wrote:
[...]

> >> > "For application threads with scheduling allocation domains of size
> >> > greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and
> >> > SCHED_SPORADIC shall be used in an implementation-defined manner."
> >> >
> >> > Translation: no rules whatsoever.
> >>
> >> The way the domain-size-one rules are applied to larger domains is
> >> "implementation-defined", which means implementations are required
> >> to document how they are applied. Not quite the same as "no rules
> >> whatsoever".
> >
> > In praxis "implementation-defined" means undocumented to begin with.
>
> Huh? XBD7 section 2.1.2:
>
> A conformance document with the following information shall be
> available for an implementation claiming conformance to POSIX.1-2008.
> [...]
> The conformance document shall describe the behavior of the
> implementation for all implementation-defined features defined in
> POSIX.1-2008. This requirement shall be met by listing these
> features and providing either a specific reference to the system
> documentation or providing full syntax and semantics of these
> features.
>
> > Anyway since "implementation-defined" can change from version to version
> > by even the same vendor it does mean "no rules whatsoever" in disguise.
>
> No, it means each implementor chooses their own rules, but they must
> tell you what those rules are. . . .

Well, at

http://www.opengroup.org/csq/public/search.mhtml?t=XY1&sort=bycomponent

I can't find anything regarding rules for scheduling allocation domains
of size greater than one. Do you have a link?

regards,
alexander.

Geoff Clare

May 13, 2011, 9:00:22 AM
Alexander Terekhov wrote:

The Open Group's CSQs don't cover everything the POSIX Conformance
Document (PCD) has to cover.

Originally I think the idea was that the PCD would be supplied with
the system's developer documentation. Of course, these days it
would make sense for them to be provided online, but it's up to
each implementor how they want to provide it. It does seem that
they are not easy to find online. I was able to find one for
HP-UX, but it was an old one from before pthreads were added to the
standard. One that I did find with the relevant information is for
QNX:

www.qnx.com/developers/docs/6.5.0/topic/com.qnx.doc.neutrino_prog/posix_conformance.html

According to that document, this is what QNX does:

For threads with scheduling allocation domains of size greater
than one, the rules defined for SCHED_FIFO, SCHED_RR, and
SCHED_SPORADIC are followed such that if the thread becomes the
head of its thread list, the thread may become the runnable thread
on any processor in its scheduling allocation domain if it has a
higher priority than the running thread on one of those processors.

This may in turn result in the preempted thread's becoming the
running thread on a different processor if that thread also has a
scheduling allocation domain of size greater than one and its
priority is higher than the running thread on one of the other
processors in its scheduling allocation domain.

If a thread's scheduling allocation domain has a size greater than
one, a runnable thread selects the processor on which it becomes
the runnable thread as follows:

1. For each processor in the thread's scheduling allocation domain,
find the processor whose running thread has the lowest priority
that is less than this thread.

2. If the search identifies only one processor, select that
processor and make the thread the running thread on that processor.

3. If the search identifies more than one processor, select the
processor on which the thread previously ran before becoming a
blocked thread.

4. If the thread hadn't run on any of the selected processors
before becoming a blocked thread, select the first processor
that was found in the search.

--
Geoff Clare <net...@gclare.org.uk>

Alexander Terekhov

May 13, 2011, 4:59:21 PM

Thanks.

The thing is that (priorities aside) in the case of T threads running on
P processors with P = T (all threads can run in parallel, priorities are
irrelevant), sched_yield() is supposed to be a no-op, do you agree?

regards,
alexander.

Geoff Clare

May 16, 2011, 8:28:21 AM
Alexander Terekhov wrote:

> Geoff Clare wrote:
>>
>> Alexander Terekhov wrote:
>>
>> > Geoff Clare wrote:
[...]

Yes, from the point of view of which threads run on which processors
and when. At a more detailed level, it wouldn't be quite the same as
a "pure" no-op, as there would be some processor time used by the O/S
to check that there isn't another thread that it should run instead.

--
Geoff Clare <net...@gclare.org.uk>

J de Boyne Pollard

May 17, 2011, 4:57:09 AM
> The most common use of pthread_yield I've seen looks like this:
>
> 1) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
> 2) pthread_yield.
> 3) Call pthread_mutex_trylock. If we got the lock, skip to step 5.
> 4) Oh well, do it the long way. Stop.
> 5) Do it the fast way.
>
> This is common when code could benefit from acquiring a lock but
> cannot wait for the lock. The reasons the thread cannot wait could
> either be that it could delay the thread too long or that it could
> cause a deadlock in very rare cases. [...]

How is "too long" measured against the delay that pthread_yield() could incur?

J de Boyne Pollard

May 17, 2011, 5:10:38 AM
> In praxis "implementation-defined" means undocumented to begin with.

No, it really does mean defined by the implementation. What it also often means in practice is that people who expect all documents in the world to be available free of charge on the World Wide Web, and who aren't prepared to do things like buy books, are stymied. "implementation defined" does not mean "available on the WWW at no cost".

Take the HP/UX POSIX Conformance Document, for example. Here's what HP says to those wanting to get hold of a copy, in the "manuals" entry in section 5 of the on-line manual:

> Ordering Information
>
> For information about how to order any of these manuals, as well as other HP
> computer and calculator manuals and supplies, call Parts Direct Ordering
> toll-free in the United States at 1-800-227-8164, or contact your nearest HP
> Sales and Support office outside the U.S.A. for information about ordering
> these manuals or versions of these manuals in other languages.
>
> [...]
> B2355-90049 | POSIX Conformance Document

J de Boyne Pollard

May 17, 2011, 5:21:19 AM
> 4. If the thread hadn't run on any of the selected processors
> before becoming a blocked thread, select the first processor
> that was found in the search.

Unless there's some ordering defined for a set of processors, that's not a very meaningful constraint, since it lacks a concrete definition of "first" that can be relied upon. ("first" by processor number? "first" by processor interrupt arbitration priority? "first" in order of address in kernel space of some per-processor data structure? "first" in some linked list that was scanned? "first" in processor bootstrap order? All are conceivable implementation strategies/side-effects.) Is there such an ordering given in what you read?

Geoff Clare

May 17, 2011, 8:22:10 AM

I gave the URL for the document; you could have looked yourself :-)

Anyway, I have searched it for all occurrences of "processor" and
didn't find anything relevant.

--
Geoff Clare <net...@gclare.org.uk>

David Schwartz

May 18, 2011, 2:53:02 PM
On May 17, 1:57 am, J de Boyne Pollard <j.deboynepoll...@tesco.net>
wrote:

Waiting for another thread to complete a task that may be virtually
unbounded is too long. Waiting for another scheduler timeslot is not.

The point is not that one wait is some number of milliseconds larger
than the other. The point is that it's a fundamentally different type
of wait. Threads can be forced to wait for a new scheduler timeslice
at any time anyway. If that's a problem, you're design is hopeless no
matter what you do (unless you use a non-standard scheduler or real-
time extensions). But having a thread wait for another thread to do
something is totally under your control.

DS
