
Threads, conditions, sockets, selects...


malc
Aug 10, 2006, 9:36:45 PM

Hello,

Not long ago I wrote an application consisting of two processes
communicating with each other via simple socket-based IPC: one acts as
a data producer and the other as the consumer. One might call them
server and client respectively. The server is purely I/O bound; the
client, on the other hand, also does some data processing. Everything
worked fine up until the point where I ran the pair on another machine,
where the client just wasn't getting all the data it supposedly had no
problem consuming. After some head-scratching and a few kernel
recompiles I nailed the difference down to the change of HZ value: at
100 HZ everything works, at 250 things break.

Maybe this is all very Linux-specific; on the other hand, maybe I'm
just doing things incorrectly. The reduced test-case looks something
like this:

======================================================================
server part consists of 3 threads

SELECT THREAD:
  watches the client socket for data (using select)
  whenever select returns, reads the data (one char)
  adds client + data to the main thread's queue and notifies the main thread

SLEEP THREAD:
  sleeps for 4 milliseconds
  increments the main thread's sequence number and notifies the main thread

MAIN THREAD:
  waits for any notification

  goes through the queue
  if data on the client socket is 'U' {
      sets seen_U variable
      removes this item from the queue
  }

  if seen_U variable is set {
      selects first item from the queue with data == 'L'
      if item's sequence number is less than main's {
          if item's sequence number != main sequence number + 1
              complains

          sends 'L' to the client
          updates the client's sequence number to match that of main
          removes this item from the queue
          clears seen_U
      }
  }

======================================================================
client part is one simple thread

loop:
  sends 'L' to the server
  blocking reads from the server (the value must be 'L')
  busy loops
  sends 'U' to the server
  sleeps for a while

Full source code of the example and some additional information can be
found at http://www.boblycat.org/~malc/thtestc.c
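
In case the pseudocode is too terse, here is a rough standalone sketch
of the notify/wait pattern the server threads use (pthreads assumed;
the names are made up and the select thread is omitted, so this is not
the actual code from thtestc.c; "queued" stands in for the select
thread's queue and "seq" for the sleep thread's sequence number):

#include <pthread.h>
#include <unistd.h>

/* one mutex/condvar pair guards both counters, as in the test-case */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int queued;
static int seq;

static void *sleep_thread (void *arg)
{
    (void) arg;
    for (;;) {
        usleep (4000);                  /* 4 milliseconds */
        pthread_mutex_lock (&lock);
        seq++;
        pthread_cond_signal (&cond);    /* notify the main thread */
        pthread_mutex_unlock (&lock);
    }
    return NULL;
}

static void *main_thread (void *arg)
{
    int last_seq = 0;
    (void) arg;
    for (;;) {
        pthread_mutex_lock (&lock);
        /* re-check the predicates in a loop so spurious wakeups are
           harmless */
        while (queued == 0 && seq == last_seq)
            pthread_cond_wait (&cond, &lock);
        if (queued)
            queued--;                   /* ... process one queue item ... */
        if (seq != last_seq)
            last_seq = seq;             /* ... handle the timer tick ... */
        pthread_mutex_unlock (&lock);
    }
    return NULL;
}

int main (void)
{
    pthread_t a, b;
    pthread_create (&a, NULL, sleep_thread, NULL);
    pthread_create (&b, NULL, main_thread, NULL);
    pause ();                           /* the threads run forever */
    return 0;
}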

Any hints/feedback will be greatly appreciated.

--
mailto:ma...@pulsesoft.com

gbb
Aug 11, 2006, 2:41:09 AM

In your select_notifier and sleep_notifier functions you are unlocking
the mutex before signaling the condition. You need to signal the
condition before unlocking the associated mutex; otherwise you end up
with a race condition.

chris

malc
Aug 11, 2006, 9:12:06 AM

gbb <geeb...@nubcake.net> writes:

> In your select_notifier and sleep_notifier functions you are unlocking
> the mutex before signaling the condition. You need to signal the
> condition before unlocking the associated mutex; otherwise you end up
> with a race condition.
>

In this particular case there's no race condition to be triggered by
the order of unlock/signal. And in any case this does not change the
behavior in any way. Also, I think that in the presence of guaranteed
spurious wakeups the point is kind of moot anyway.

--
mailto:ma...@pulsesoft.com

David Schwartz
Aug 11, 2006, 3:54:27 PM

gbb wrote:
> In your select_notifier and sleep_notifier functions you are unlocking
> the mutex before signaling the condition. You need to signal the
> condition before unlocking the associated mutex; otherwise you end up
> with a race condition.

Nope. There is only one way unlocking before signalling can cause a
race condition, and it has never been observed in the wild. In fact,
it's damn near impossible to code up intentionally.

DS

malc
Aug 11, 2006, 9:08:29 PM

"David Schwartz" <dav...@webmaster.com> writes:

I didn't know that even the possibility of a race existed. Can you
please elaborate?

FWIW I have updated the test-case to:
1. use the signal-then-unlock sequence (as mentioned before, this does
not change a thing)

2. optionally build without any trace of select (improves the situation
(why?), but is unacceptable in the original program)

3. use a CPU-speed-independent busy loop

4. optionally build with ad-hoc IPC (well, more of an ITC);
this one does make a world of difference on 250 HZ kernels

Things I noticed: top shows that the fastest version (ad-hoc IPC) uses
2-3% more CPU than the original. I have seen (at least once) that even
the ad-hoc version starts missing data sometimes, as if some sort of
drift influences things.

http://www.boblycat.org/~malc/thtestc.c
http://www.boblycat.org/~malc/thtestc.c.orig - original version

Again, hints are most welcome, or a suggestion of any other
newsgroup/ML where I can take this.

--
mailto:ma...@pulsesoft.com

Maxim Yegorushkin
Aug 14, 2006, 4:01:10 AM

gbb wrote:
> In your select_notifier and sleep_notifier functions you are unlocking
> the mutex before signaling the condition. You need to signal the
> condition before unlocking the associated mutex; otherwise you end up
> with a race condition.

Sorry, you are mistaken.

http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_signal.html
<q>
The pthread_cond_broadcast() or pthread_cond_signal() functions may be
called by a thread whether or not it currently owns the mutex that
threads calling pthread_cond_wait() or pthread_cond_timedwait() have
associated with the condition variable during their waits; however, if
predictable scheduling behavior is required, then that mutex shall be
locked by the thread calling pthread_cond_broadcast() or
pthread_cond_signal().
</q>

David R. Butenhof in Programming with POSIX Threads writes that
unlocking the mutex before signaling the condition may actually reduce
the number of context switches on some systems.
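
To make the two orderings concrete, the difference is just this (a
minimal sketch; the flag and function names are invented for
illustration, this is not the code from thtestc.c):

#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cnd = PTHREAD_COND_INITIALIZER;
static int predicate;

/* signal while still holding the mutex: the ordering POSIX recommends
   "if predictable scheduling behavior is required" */
void notify_locked (void)
{
    pthread_mutex_lock (&mtx);
    predicate = 1;
    pthread_cond_signal (&cnd);
    pthread_mutex_unlock (&mtx);
}

/* unlock first, then signal: also legal per the text quoted above, and
   it may avoid waking a thread only to have it block again on the
   still-held mutex */
void notify_unlocked (void)
{
    pthread_mutex_lock (&mtx);
    predicate = 1;
    pthread_mutex_unlock (&mtx);
    pthread_cond_signal (&cnd);
}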

malc
Aug 14, 2006, 6:11:52 PM

I believe I have found a way to make the scenario work without the
slowdown; however, I'm at a loss as to why adding unnecessary socket I/O
makes it run as expected, and whether this behavior is limited to Linux
or not. The speed boost FAST gives is something I have a really hard
time understanding, and I doubt it's just because Linux loves symmetry.

To rehash (the redundant I/O is surrounded by #ifdef FAST):

void periodic_thread (server)
{
    every 0.04 seconds {
        lock (server);
        server.seq++;
        cond_signal (server);
        unlock (server);
    }
}

void ipc_thread (server)
{
    forever {
        wait_for_data_on (server.client.sock);
        timeval tv, char c = recv (server.client.sock);
        lock (server);
        enqueue_tail (server.queue, {tv, c});
        cond_signal (server);
        unlock (server);
    }
}

void server_thread (server)
{
    seen_U = true;
    forever {
        cond_wait (server);

        foreach (item in server.queue) {
            timeval tv, char c = item;
            if (c == 'U') {
                seen_U = true;
                dequeue (server.queue, item);
            }
#ifdef FAST
            send (server.client.sock, 'U');
#endif
        }

        if (seen_U) {
            foreach (item in server.queue) {
                timeval tv, char c = item;
                if (c == 'L') {
                    send (server.client.sock, 'L');
                    server.client.seq = server.seq;
                    dequeue (item);
                    break;
                }
            }
        }
    }
}

void client_thread (server_sock)
{
    forever {
        send (server_sock, 'L');
        timeval tv, char c = recv (server_sock);
        assert (c == 'L');
        busy_loop_for (0.009);
        send (server_sock, 'U');
#ifdef FAST
        timeval tv, char c = recv (server_sock);
        assert (c == 'U');
#endif
        sleep (0.02);
    }
}

Real (buildable) test-case with a bit more information is available at:
http://www.boblycat.org/~malc/thtestc.c

Original test-case with some additional information can be found at:
http://www.boblycat.org/~malc/thtestc.c.orig

--
mailto:ma...@pulsesoft.com

David Schwartz
Aug 14, 2006, 11:25:59 PM

malc wrote:

> > Nope. There is only one way unlocking before signalling can cause a
> > race condition, and it has never been observed in the wild. In fact,
> > it's damn near impossible to code up intentionally.

> I didn't know that even the possibility of a race existed. Can you
> please elaborate?

Certainly.

The race scenario goes as follows:

1) One thread (call it A) is blocked on the condition variable, waiting
for the predicate to change. Assume A is the only thread blocked on the
condition variable.

2) Another thread (call it B) changes the predicate, but unlocks the
mutex before signalling.

3) Another thread, (call it C), notices the changed predicate and
sleeps on the condition variable. (It can do this because B unlocked
the mutex after it changed the predicate.)

4) B finally gets to signal. Unfortunately, the signal wakes C instead
of A.

5) C notices the predicate is the same as what it was when it decided
to wait and goes back to sleep.

Oops, A missed its wake up.

This is impossible if B signals while holding the mutex. The race
occurs only when C blocks on the condition variable after B changes the
predicate but before B signals the condition variable.

Note that this has never been detected in real live code, and it is darn
difficult to even code up an example of. It requires that the same
condition variable be used to signal at least two different conditions,
depending upon the previous value of the predicate.

Note that this race cannot occur with pthread_cond_broadcast, only with
pthread_cond_signal, because the problem is the wakeup going to the
"wrong" thread, which is impossible with a wakeup that goes to all
threads.

DS

Maxim Yegorushkin
Aug 15, 2006, 7:15:57 AM

David Schwartz wrote:
> malc wrote:
>
> > > Nope. There is only one way unlocking before signalling can cause a
> > > race condition, and it has never been observed in the wild. In fact,
> > > it's damn near impossible to code up intentionally.
>
> > I didn't know that even the possibility of a race existed. Can you
> > please elaborate?
>
> Certainly.
>
> The race scenario goes as follows:
>
> 1) One thread (call it A) is blocked on the condition variable, waiting
> for the predicate to change. Assume A is the only thread blocked on the
> condition variable.
>
> 2) Another thread (call it B) changes the predicate, but unlocks the
> mutex before signalling.
>
> 3) Another thread, (call it C), notices the changed predicate and
> sleeps on the condition variable. (It can do this because B unlocked
> the mutex after it changed the predicate.)
>
> 4) B finally gets to signal. Unfortunately, the signal wakes C instead
> of A.
>
> 5) C notices the predicate is the same as what it was when it decided
> to wait and goes back to sleep.
>
> Oops, A missed its wake up.

It appears to me that the predicate in this scenario is not appropriate
for use with condition variables.

As spurious wakeups are possible, the predicate should convey enough
information to tell whether the event being waited for has actually
happened. IOW, the predicate should work if, instead of
pthread_cond_wait(), one does busy waiting on the predicate.

In step 3 of the above scenario, thread C should check the predicate,
notice that the event it is about to wait for has already happened, and
then proceed without using pthread_cond_wait.

Am I missing something?

David Schwartz
Aug 15, 2006, 4:17:04 PM

Maxim Yegorushkin wrote:

> > The race scenario goes as follows:
> >
> > 1) One thread (call it A) is blocked on the condition variable, waiting
> > for the predicate to change. Assume A is the only thread blocked on the
> > condition variable.
> >
> > 2) Another thread (call it B) changes the predicate, but unlocks the
> > mutex before signalling.
> >
> > 3) Another thread, (call it C), notices the changed predicate and
> > sleeps on the condition variable. (It can do this because B unlocked
> > the mutex after it changed the predicate.)
> >
> > 4) B finally gets to signal. Unfortunately, the signal wakes C instead
> > of A.
> >
> > 5) C notices the predicate is the same as what it was when it decided
> > to wait and goes back to sleep.
> >
> > Oops, A missed its wake up.

> It appears to me that the predicate in this scenario is not appropriate
> for use with condition variables.

Huh?

> As spurious wakeups are possible, the predicate should convey enough
> information to tell whether the event being waited for has actually
> happened. IOW, the predicate should work if, instead of
> pthread_cond_wait(), one does busy waiting on the predicate.

Right, and it does in this case.

> In step 3 of the above scenario, thread C should check the predicate,
> notice that the event it is about to wait for has already happened, and
> then proceed without using pthread_cond_wait.

The event thread C is waiting for has not happened yet. That's why it
blocks.

Imagine, for example, that thread C is waiting for a queue to be empty
and thread B put something on the queue.

DS

David Schwartz
Aug 15, 2006, 9:11:29 PM

malc wrote:
> I believe I have found a way to make the scenario work without the
> slowdown; however, I'm at a loss as to why adding unnecessary socket I/O
> makes it run as expected, and whether this behavior is limited to Linux
> or not. The speed boost FAST gives is something I have a really hard
> time understanding, and I doubt it's just because Linux loves symmetry.

Are you using TCP? It sounds like your sender is dribbling data rather
than sending it in reasonable chunks. Having the other side acknowledge
every chunk hides this problem because the TCP acknowledges piggy-back
with the application-level acknowledges.

If you can confirm that you are using TCP and the delays are on the
order of 200 milliseconds, I can give you some better suggestions.

DS

malc
Aug 15, 2006, 10:16:31 PM

"David Schwartz" <dav...@webmaster.com> writes:

First of all, thanks for answering this; I was beginning to lose all
hope of ever understanding what the hell is going on under the hood.

> malc wrote:
>> I believe I have found a way to make the scenario work without the
>> slowdown; however, I'm at a loss as to why adding unnecessary socket I/O
>> makes it run as expected, and whether this behavior is limited to Linux
>> or not. The speed boost FAST gives is something I have a really hard
>> time understanding, and I doubt it's just because Linux loves symmetry.
>
> Are you using TCP? It sounds like your sender is dribbling data rather
> than sending it in reasonable chunks. Having the other side acknowledge
> every chunk hides this problem because the TCP acknowledges piggy-back
> with the application-level acknowledges.

Full source is available at: http://www.boblycat.org/~malc/thtestc.c
Yes, I do use TCP. It's dribbling (the original version just sends 1
char), but this is a reasonable chunk from the application's perspective.

I'd like to note that I'm by no means an expert in networking and can't
see the relation between TCP ACKs and the problem at hand.

>
> If you can confirm that you are using TCP and the delays are on the
> order of 200 milliseconds, I can give you some better suggestions.

The source link above gives some timing information.

<quote>
legend:
1 - 2.4.30 (100HZ) Athlon(tm) Processor 1050.052MHz
2 - 2.6.17.6 (250HZ) Athlon(tm)64 X2 Dual Core Processor 3800+ 2000.000 MHz

time elapsed between client sending an 'L' to the server and the server
actually seeing it (after select/recv)

original
1 - 0.010
2 - 0.016

speedy-racer
1 - 0.000030 - 0.000070
2 - 0.000011
</quote>

0.016 (250HZ Linux kernel) is on the order of 200 milliseconds.

--
mailto:ma...@pulsesoft.com

David Schwartz
Aug 15, 2006, 11:20:57 PM

malc wrote:

> Full source is available at: http://www.boblycat.org/~malc/thtestc.c
> Yes, I do use TCP. It's dribbling (the original version just sends 1
> char), but this is a reasonable chunk from the application's perspective.

Right, but it's not reasonable from the TCP stack's perspective. The
problem is that the TCP stack has no way to tell when it should send a
packet, so it guesses.

> I'd like to note that I'm by no means an expert in networking and can't
> see the relation between TCP ACKs and the problem at hand.

You write one byte. The network stack has no way to know that that's
not the last byte you're going to write, so it sends it immediately.
You send another byte. Now the network stack has no way to know you're
not going to send another, so it delays the send until the previous
byte is acknowledged by the other end.

If the other end replies, the acknowledge will piggy-back on the reply.
When the sender gets the acknowledge, it will send the next chunk.

If the other end does not reply, it will not acknowledge the first byte
immediately. So the sender will not send the next chunk immediately.

> 0.016 (250HZ Linux kernel) is on the order of 200 milliseconds.

Yep. You need to stop calling 'send' when there's going to be more data
almost immediately. Only call 'send' when you have at least 2KB of
data, you know there will not be any more data until the other side
sends some, or you know you will need quite a bit of computation before
more will be ready.

One ugly fix is to write your own 'send' function that adds the data to
a buffer rather than sending it directly. Have your 'recv' function
flush the buffer, flush the buffer if it exceeds 2Kb, and flush it
every 20 milliseconds or so from a timer. (Or, better, call a flush
function after you're done generating data and going to block on
network data.)
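
A rough sketch of such a buffered 'send' wrapper might look like this
(the 2 KB threshold, the names and the error handling are illustrative
only, not part of the original program):

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SENDBUF_MAX 2048

struct sendbuf {
    int fd;                    /* connected TCP socket */
    size_t used;
    char data[SENDBUF_MAX];
};

/* push whatever has accumulated out to the socket */
static int sendbuf_flush (struct sendbuf *b)
{
    size_t off = 0;
    while (off < b->used) {
        ssize_t n = send (b->fd, b->data + off, b->used - off, 0);
        if (n < 0)
            return -1;
        off += (size_t) n;
    }
    b->used = 0;
    return 0;
}

/* queue small writes instead of sending them one by one; flush when
   the buffer fills up (assumes len <= SENDBUF_MAX) */
static int sendbuf_write (struct sendbuf *b, const void *p, size_t len)
{
    if (b->used + len > sizeof b->data && sendbuf_flush (b) < 0)
        return -1;
    memcpy (b->data + b->used, p, len);
    b->used += len;
    return 0;
}

The caller would invoke sendbuf_flush() right before blocking in recv,
or from a periodic timer, as described above.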

DS

malc
Aug 16, 2006, 12:00:31 AM

"David Schwartz" <dav...@webmaster.com> writes:

> malc wrote:
>
>> Full source is available at: http://www.boblycat.org/~malc/thtestc.c
>> Yes, I do use TCP. It's dribbling (the original version just sends 1
>> char), but this is a reasonable chunk from the application's perspective.
>
> Right, but it's not reasonable from the TCP stack's perspective. The
> problem is that the TCP stack has no way to tell when it should send a
> packet, so it guesses.
>
>> I'd like to note that I'm by no means an expert in networking and can't
>> see the relation between TCP ACKs and the problem at hand.
>
> You write one byte. The network stack has no way to know that that's
> not the last byte you're going to write, so it sends it immediately.
> You send another byte. Now the network stack has no way to know you're
> not going to send another, so it delays the send until the previous
> byte is acknowledged by the other end.
>
> If the other end replies, the acknowledge will piggy-back on the reply.
> When the sender gets the acknowledge, it will send the next chunk.
>
> If the other end does not reply, it will not acknowledge the first byte
> immediately. So the sender will not send the next chunk immediately.
>

Aha. Thank you for the thorough explanation.

>> 0.016 (250HZ Linux kernel) is on the order of 200 milliseconds.
>
> Yep. You need to stop calling 'send' when there's going to be more data
> almost immediately. Only call 'send' when you have at least 2KB of
> data, you know there will not be any more data until the other side
> sends some, or you know you will need quite a bit of computation before
> more will be ready.

There will NEVER be data available almost immediately. Data arrives
every N milliseconds; furthermore, the data size is huge, so the server
merely notifies the clients that there is new data available, and the
client fetches it via other means.

> One ugly fix is to write your own 'send' function that adds the data to
> a buffer rather than sending it directly. Have your 'recv' function
> flush the buffer, flush the buffer if it exceeds 2Kb, and flush it
> every 20 milliseconds or so from a timer. (Or, better, call a flush
> function after you're done generating data and going to block on
> network data.)

Thanks for the suggestion, but this will not work in this particular
case. Here the server only sends a data description and the client
merely informs the server when it's done using it.

Sockets were chosen for a few reasons: brevity, multiplexing and
portability. The real data is not available across the network anyway.
I guess some other form of IPC is more appropriate here, but this,
sadly, will take the "cheap" portability away.

Once again - thank you, this has been very educational.

--
mailto:ma...@pulsesoft.com

Maxim Yegorushkin
Aug 16, 2006, 3:14:16 AM

I may be wrong, but I understood that in (2) thread B changed the
predicate while holding the mutex. Then in (3) thread C acquired the
mutex and noticed that the predicate had been changed. What I did not
understand is why thread C would then go and sleep on the condition
variable.

> Imagine, for example, that thread C is waiting for a queue to be empty
> and thread B put something on the queue.

To make it less confusing, let's consider the following queue code,
which does pthread_cond_signal with the mutex released. Thread A is
blocked in pop(); thread B in push() has added something to the queue,
released the mutex, but has not called pthread_cond_signal yet. Thread
C enters pop(), finds that the queue is not empty, gets the new element
(thus leaving the queue empty), and exits pop() without waiting on the
condition. Thread B signals, thread A is awakened, finds the queue
empty, and blocks on the condition again. Is there something wrong?

template<class T>
class atomic_queue : private boost::noncopyable
{
private:
    typedef std::list<T> items_type;

public:
    struct cancelled : std::logic_error
    {
        cancelled() : std::logic_error("cancelled") {}
    };

public:
    atomic_queue() : cancel_(false) {}

public:
    void cancel()
    {
        ++cancel_;
        cnd_.notify_all();
    }

public:
    void push(T const& t)
    {
        // this avoids an allocation inside the critical section below
        items_type tmp(&t, &t + 1);
        {
            boost::mutex::scoped_lock l(mtx_);
            items_.splice(items_.end(), tmp);
        }
        cnd_.notify_one();
    }

    // this function provides only basic exception safety if T's copy
    // ctor can throw, or strong exception safety if T's copy ctor is nothrow
    T pop()
    {
        // this avoids copying T inside the critical section below
        items_type tmp;
        {
            boost::mutex::scoped_lock l(mtx_);
            while(!cancel_ && items_.empty())
                cnd_.wait(l);
            if(cancel_)
                throw cancelled();
            tmp.splice(tmp.end(), items_, items_.begin());
        }
        return tmp.front();
    }

private:
    boost::mutex mtx_;
    boost::condition cnd_;
    boost::detail::atomic_count cancel_;
    items_type items_;
};

malc
Aug 16, 2006, 3:06:28 AM

"David Schwartz" <dav...@webmaster.com> writes:

> malc wrote:
>
>> Full source is available at: http://www.boblycat.org/~malc/thtestc.c
>> Yes, I do use TCP. It's dribbling (the original version just sends 1
>> char), but this is a reasonable chunk from the application's perspective.
>
> Right, but it's not reasonable from the TCP stack's perspective. The
> problem is that the TCP stack has no way to tell when it should send a
> packet, so it guesses.
>
>> I'd like to note that I'm by no means an expert in networking and can't
>> see the relation between TCP ACKs and the problem at hand.
>
> You write one byte. The network stack has no way to know that that's
> not the last byte you're going to write, so it sends it immediately.
> You send another byte. Now the network stack has no way to know you're
> not going to send another, so it delays the send until the previous
> byte is acknowledged by the other end.
>
> If the other end replies, the acknowledge will piggy-back on the reply.
> When the sender gets the acknowledge, it will send the next chunk.
>
> If the other end does not reply, it will not acknowledge the first byte
> immediately. So the sender will not send the next chunk immediately.
>
>> 0.016 (250HZ Linux kernel) is on the order of 200 milliseconds.

<snip>

I forgot that two questions remained a mystery to me:
1. Do all OSes (or TCP implementations) behave more or less this way?
2. Why is the 250 HZ kernel being hit much harder than the 100 HZ one?

While both are more of a curiosity thing, the second is actually puzzling.

--
mailto:ma...@pulsesoft.com

David Schwartz
Aug 16, 2006, 4:20:12 AM

Maxim Yegorushkin wrote:

> I may be wrong, but I understood that in (2) thread B changed the
> predicate while holding the mutex. Then in (3) thread C acquired the
> mutex and noticed that the predicate had been changed. What I did not
> understand is why thread C would then go and sleep on the condition
> variable.

Because the predicate has changed to a value that is not the one
that C is waiting for.

> > Imagine, for example, that thread C is waiting for a queue to be empty
> > and thread B put something on the queue.
>
> To make it less confusing, let's consider the following queue code,
> which does pthread_cond_signal with the mutex released. Thread A is
> blocked in pop(); thread B in push() has added something to the queue,
> released the mutex, but has not called pthread_cond_signal yet. Thread
> C enters pop(), finds that the queue is not empty, gets the new element
> (thus leaving the queue empty), and exits pop() without waiting on the
> condition. Thread B signals, thread A is awakened, finds the queue
> empty, and blocks on the condition again. Is there something wrong?

No, because in this case the condition variable is only used to signal
a single predicate. Let me try a more explicit example.

Suppose you have a predicate that can have two values, 1 and 2. There
is one thread that sleeps on the predicate if it's one, and changes it
to two. There is another thread that sleeps on the predicate if it's
two, and changes it to one. Each thread signals the condition variable
after it changes the predicate. No other thread ever blocks on the
condition variable.

So thread 1 looks like this:

mutex_lock();
while(predicate==1) cond_wait();
predicate=2;
mutex_unlock();

and 2 looks like this:

mutex_lock();
while(predicate==2) cond_wait();
predicate=1;
mutex_unlock();

Now, suppose threads 1 and 2 ping pong the predicate back and forth a
bit, then a third thread comes by, with code like this:

mutex_lock();
if(predicate==1)
    predicate=2;
cond_signal();
mutex_unlock();

Assuming no other threads touch anything, this code can never deadlock.
The 'cond_signal()' can only wake the thread that sleeps if the
predicate is 1.

But if we change that to:


mutex_lock();
if(predicate==1)
    predicate=2;
mutex_unlock();
cond_signal();

Then *both* threads 1 and 2 can be sleeping on the condition variable
when we signal it. The thread that sleeps when the predicate is 1 could
have been sleeping before we locked the mutex and the thread that
sleeps when the predicate is 2 could have gone to sleep after we
unlocked the mutex. So the 'cond_signal' can wake the wrong thread.

However, it won't actually happen in this example; it's too simple to
illustrate the problem. Do you see why? This is why the race has never
been seen in the wild.

However, the fact remains that signalling a condition variable after
you release the mutex could result in waking a thread that could only
have gone to sleep after seeing the change you made to the predicate.
So it is a semantic change.

DS

Maxim Yegorushkin
Aug 16, 2006, 4:42:37 AM

I see now. Thank you. In this case there is a race if the mutex is
unlocked before signaling the condition. Luckily, it may only happen
when one does something fancy and uses one condition with more than one
predicate at the same time.

David Schwartz
Aug 16, 2006, 7:30:33 AM

malc wrote:

> There will NEVER be data available almost immediately. Data arrives
> every N milliseconds; furthermore, the data size is huge, so the server
> merely notifies the clients that there is new data available, and the
> client fetches it via other means.

Then you should be fine. If you try to send a single byte when no data
has been sent for a while, the byte will be sent immediately.

I have a feeling if you trace your program, you will find that it is
calling 'send' twice, each time for a single byte, with less than 200ms
between.

DS

malc
Aug 16, 2006, 3:35:23 PM

"David Schwartz" <dav...@webmaster.com> writes:

No need to trace; it's fairly evident from both the pseudo and real
code: it sends an 'L' and then (0.009 seconds later) a 'U', and the
cycle is repeated after 0.02 seconds.

--
mailto:ma...@pulsesoft.com

David Schwartz
Aug 17, 2006, 8:18:54 PM

malc wrote:

> "David Schwartz" <dav...@webmaster.com> writes:

> >> There will NEVER be data available almost immediately. Data arrives
> >> every N milliseconds; furthermore, the data size is huge, so the server
> >> merely notifies the clients that there is new data available, and the
> >> client fetches it via other means.
> >
> > Then you should be fine. If you try to send a single byte when no data
> > has been sent for a while, the byte will be sent immediately.
> >
> > I have a feeling if you trace your program, you will find that it is
> > calling 'send' twice, each time for a single byte, with less than 200ms
> > between.

> No need to trace; it's fairly evident from both the pseudo and real
> code: it sends an 'L' and then (0.009 seconds later) a 'U', and the
> cycle is repeated after 0.02 seconds.

Right, stop doing that. Don't send an 'L' if you know you are going to
need to send a 'U' 9 milliseconds later. Otherwise, make the other side
acknowledge the 'L'.

It is easy enough to write code to delay the 'L' for, say, 20
milliseconds. If no new data is received by then, send the 'L'.
Otherwise, send them as a chunk. Alternatively, consider using a
protocol that is designed for low-latency.

You are forcing TCP into its worst case behavior.

DS

David Schwartz
Aug 17, 2006, 8:40:02 PM

malc wrote:

> I forgot that two questions remained a mystery to me:

> 1. Do all OSes (or TCP implementations) behave more or less this way?

Yes. All modern ones that you are likely to run into, as far as I know.

> 2. Why is the 250 HZ kernel being hit much harder than the 100 HZ one?

The 250 HZ kernel is doing a more accurate job of implementing the
specification. The 100 HZ kernel is erring in your favor, I think.
200 ms is what you should be seeing.

DS

malc
Aug 18, 2006, 2:16:43 AM

"David Schwartz" <dav...@webmaster.com> writes:

> malc wrote:
>
>> "David Schwartz" <dav...@webmaster.com> writes:
>
>> >> There will NEVER be data available almost immediately. Data arrives
>> >> every N milliseconds; furthermore, the data size is huge, so the server
>> >> merely notifies the clients that there is new data available, and the
>> >> client fetches it via other means.
>> >
>> > Then you should be fine. If you try to send a single byte when no data
>> > has been sent for a while, the byte will be sent immediately.
>> >
>> > I have a feeling if you trace your program, you will find that it is
>> > calling 'send' twice, each time for a single byte, with less than 200ms
>> > between.
>
>> No need to trace; it's fairly evident from both the pseudo and real
>> code: it sends an 'L' and then (0.009 seconds later) a 'U', and the
>> cycle is repeated after 0.02 seconds.
>
> Right, stop doing that. Don't send an 'L' if you know you are going to
> need to send a 'U' 9 milliseconds later. Otherwise, make the other side
> acknowledge the 'L'.

'L' is always acked; 'U' is sent when the work is done. One way of
thinking about it is:

the client sends 'L' and blocks, meaning that upon availability of new
data the server will reply with a description of this data and mark it
as Locked.

the client does some heavy-duty CPU-burning work on the data

the client sends a 'U' (i.e. "I'm done, please Unlock the data")

>
> It is easy enough to write code to delay the 'L' for, say, 20
> milliseconds. If no new data is received by then, send the 'L'.
> Otherwise, send them as a chunk. Alternatively, consider using a
> protocol that is designed for low-latency.

Once again, sockets are used chiefly because they're an easy way to do IPC.

> You are forcing TCP into its worst case behavior.

I see.

--
mailto:ma...@pulsesoft.com

malc
Aug 18, 2006, 2:18:32 AM

"David Schwartz" <dav...@webmaster.com> writes:

So there's some spec somewhere that describes how the stack
implementation must adapt to the send/receive patterns?

Thanks for clarifications.

--
mailto:ma...@pulsesoft.com

David Schwartz
Aug 19, 2006, 12:29:38 AM

malc wrote:

> So there's some spec somewhere that describes how the stack
> implementation must adapt to the send/receive patterns?
>
> Thanks for clarifications.

Yeah, the TCP specification. Google for 'TCP' and 'Nagle' and/or
'delayed ack'.

If you are using this strictly over a LAN or on a single machine,
disabling Nagle with a 'setsockopt' may help.
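
Concretely, that is the TCP_NODELAY option; a minimal sketch (error
checking omitted, the function name is made up):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* disable the Nagle algorithm on a connected TCP socket so small
   writes go out immediately instead of waiting for the previous
   segment to be acknowledged */
int disable_nagle (int sock)
{
    int one = 1;
    return setsockopt (sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}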

DS

malc
Aug 19, 2006, 4:18:14 AM

"David Schwartz" <dav...@webmaster.com> writes:

Thank you very much.

--
mailto:ma...@pulsesoft.com

Maxim Yegorushkin
Aug 19, 2006, 10:02:18 AM

David Schwartz wrote:

[]

> The race scenario goes as follows:
>
> 1) One thread (call it A) is blocked on the condition variable, waiting
> for the predicate to change. Assume A is the only thread blocked on the
> condition variable.
>
> 2) Another thread (call it B) changes the predicate, but unlocks the
> mutex before signalling.
>
> 3) Another thread, (call it C), notices the changed predicate and
> sleeps on the condition variable. (It can do this because B unlocked
> the mutex after it changed the predicate.)
>
> 4) B finally gets to signal. Unfortunately, the signal wakes C instead
> of A.
>
> 5) C notices the predicate is the same as what it was when it decided
> to wait and goes back to sleep.
>
> Oops, A missed its wake up.
>
> This is impossible if B signals while holding the mutex. The race
> occurs only when C blocks on the condition variable after B changes the
> predicate but before B signals the condition variable.

After thinking for a while, I wonder if holding the mutex while
signaling changes anything.

1) Suppose both A and C are blocked on the same condition waiting for
different predicates, say a and c correspondingly.

2) B modifies predicate a and signals while the mutex is locked. It is
unspecified which thread should be woken up; here, say it is C.

3) Thread C is woken up, finds that predicate c is unchanged, and
blocks again.

4) The signal has been eaten by thread C; thread A is not unblocked,
although the predicate a it cares about was changed.

I wonder, can one ever use signal instead of broadcast when the
condition is being used with more than one predicate?

If only broadcast can be used, is it necessary to hold the mutex while
broadcasting?

David Schwartz
Aug 19, 2006, 9:56:31 PM

Maxim Yegorushkin wrote:
> David Schwartz wrote:
>
> []
>
> > The race scenario goes as follows:
> >
> > 1) One thread (call it A) is blocked on the condition variable, waiting
> > for the predicate to change. Assume A is the only thread blocked on the
> > condition variable.
> >
> > 2) Another thread (call it B) changes the predicate, but unlocks the
> > mutex before signalling.
> >
> > 3) Another thread, (call it C), notices the changed predicate and
> > sleeps on the condition variable. (It can do this because B unlocked
> > the mutex after it changed the predicate.)
> >
> > 4) B finally gets to signal. Unfortunately, the signal wakes C instead
> > of A.
> >
> > 5) C notices the predicate is the same as what it was when it decided
> > to wait and goes back to sleep.
> >
> > Oops, A missed its wake up.
> >
> > This is impossible if B signals while holding the mutex. The race
> > occurs only when C blocks on the condition variable after B changes the
> > predicate but before B signals the condition variable.
>
> After thinking for a while, I wonder if holding the mutex while
> signaling changes anything.

It does.

> 1) Suppose both A and C are blocked on the same condition waiting for
> different predicates, say a and c correspondingly.

> 2) B modifies predicate a and signals while the mutex is locked. It is
> unspecified which thread should be woken up; here, say it is C.

Correct, so you cannot do this if a thread could be sleeping on the
condition variable that would not handle the predicate.

> 3) Thread C is woken up, finds that predicate c is unchanged, and
> blocks again.
>
> 4) The signal has been eaten by thread C; thread A is not unblocked,
> although the predicate a it cares about was changed.

This is a different example. In my example, so long as you signal with
the mutex locked, there can never be two threads blocked on the
condition variable.

This is a different example wherein there is a problem either way.

> I wonder, can one ever use signal instead of broadcast when the
> condition is being used with more than one predicate?

Absolutely. However, if threads could be blocked on either predicate at
the same time, you must either use pthread_cond_broadcast or re-signal
the condition variable if the wrong thread wakes up.

> If only broadcast can be used, is it necessary to hold the mutex while
> broadcasting?

I know of no possible race with pthread_cond_broadcast, regardless of
whether you hold the mutex or not. The race condition I mentioned
(which is the only one known) occurs when the "wrong" thread gets
woken, a thread that could only block *after* the mutex is released. A
broadcast cannot wake the wrong thread, so no race condition should be
possible.

DS
