
Multithreaded server


leg-0

Sep 24, 1999
I have written a server application for W9x. At the moment this thing is
single-threaded, but I need it to be multithreaded so that it can
serve more than one client at the same time. But I don't know HOW. When
I try to connect to this server while one client is using it, I get
no response from the server until the other client finishes its job. When I
try to make another socket listen on the same port as the first one,
it also won't work; I can't even do that, because I get an error
message.

1) Is there a way that I can make one socket handle more than one client
at the same time? If yes, how is this done?
2) Is it possible to make more than one socket listen on the same
port? If yes, how?
3) What is the most reasonable way to solve this problem?

thanx.

--
Ahti Legonkov
(le...@ut.ee)


boris

Sep 25, 1999

leg-0 <ah...@regio.ee> wrote in message
news:37EBCD3A...@regio.ee...

> I have written a server application for W9x. At the moment this thing is
> single-threaded, but I need it to be multithreaded so that it can
> serve more than one client at the same time. But I don't know HOW. When
Doesn't Windows have anything like select()?
(Yes, I know this has nothing to do with the question, but I am just interested.)

> I try to connect to this server while one client is using it, I get
> no response from the server until the other client finishes its job. When I
> try to make another socket listen on the same port as the first one,
> it also won't work; I can't even do that, because I get an error
> message.
>

> [...]


> 3) What is the most reasonable way to solve this problem?

The most reasonable way, in my opinion, is: the server listens for new
clients, and whenever a new client connects, the server creates a thread to
handle that client. After creating the thread, which now communicates
with the client, the server goes back to listening for new clients.
This should work if the clients don't need to communicate with each other.
Otherwise it gets a little more complicated, because of mutexes and so on.
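For illustration, a minimal sketch of this accept-and-spawn pattern. It uses
portable BSD sockets and modern C++ std::thread rather than the Winsock/Win32
calls a W9x server would really use, and the port number and the echo-style
handle_client are invented placeholders:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// Placeholder per-client work: echo bytes back until the client closes.
static void handle_client(int client_fd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(client_fd, buf, sizeof buf)) > 0)
        write(client_fd, buf, n);
    close(client_fd);
}

int main()
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);            // assumed port
    bind(listener, (sockaddr*)&addr, sizeof addr);
    listen(listener, SOMAXCONN);            // one listening socket is enough

    for (;;) {
        int client = accept(listener, nullptr, nullptr);
        if (client < 0) continue;
        // One thread per accepted client; the listener goes straight
        // back to accept() for the next one.
        std::thread(handle_client, client).detach();
    }
}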

Boris


David Schwartz

Sep 27, 1999
leg-0 wrote:

> 1) Is there a way that I can make one socket handle more than one client
> at the same time? If yes, how is this done?

One socket handle for a socket that is 'listen'ing can be used in
'accept' as many times as you wish. Each time a client connects,
you will get back a new socket handle that relates to only that one
connection.

> 2) Is it possible to make more than one socket listen on the same
> port? If yes, how?

Just keep calling 'accept'. Each time it returns with success, you will
get a new socket handle that refers to a new connection to that same
port.

> 3) What is the most reasonable way to solve this problem?

What was the problem? How to handle multiple connections in a single
program? Use overlapped I/O or asynchronous I/O events.
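A hedged sketch of the single-process alternative: one thread multiplexing the
listener and every accepted connection with select() (Winsock provides select()
too; overlapped I/O would look different). The port and the echo behaviour are
again invented:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <cstddef>
#include <vector>

int main()
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);                    // assumed port
    bind(listener, (sockaddr*)&addr, sizeof addr);
    listen(listener, SOMAXCONN);

    std::vector<int> clients;
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(listener, &readable);
        int maxfd = listener;
        for (int fd : clients) {
            FD_SET(fd, &readable);
            maxfd = std::max(maxfd, fd);
        }
        if (select(maxfd + 1, &readable, nullptr, nullptr, nullptr) <= 0)
            continue;

        if (FD_ISSET(listener, &readable))          // new client: accept it,
            clients.push_back(accept(listener, nullptr, nullptr));

        for (size_t i = 0; i < clients.size(); ) {  // ...and poll the rest
            int fd = clients[i];
            if (!FD_ISSET(fd, &readable)) { ++i; continue; }
            char buf[512];
            ssize_t n = read(fd, buf, sizeof buf);
            if (n <= 0) {                           // closed or errored
                close(fd);
                clients.erase(clients.begin() + i);
            } else {
                write(fd, buf, n);                  // echo, for illustration
                ++i;
            }
        }
    }
}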

If you use threads, please don't use one thread for each connection (or
worse, two threads per connection, one for each direction). That is just
ugly design.

DS

RickB

Oct 2, 1999
Why is it an ugly design? In OOP terms it could be quite elegant, if
programmed correctly. I wonder how many proxy servers or load balancing
solutions DON'T use a thread for each connection.

RickB

David Schwartz <dav...@webmaster.com> wrote in message
news:37EF1EC3...@webmaster.com...

David Schwartz

Oct 2, 1999

RickB wrote:
>
> Why is it an ugly design? [One thread (or two threads) per connection]

1) It wastes resources (threads and thread stacks).

2) It's extremely inefficient (context switch required every time you
change which client you're dealing with).

3) It leaves no good way to deal with the inability to create a thread,
other than dropping the client.

4) It's no easier than (in fact, it's generally barely different from)
making blocking socket calls.

5) It's more difficult to deal with connection set up and tear down.

6) It can become difficult to share all those threads, especially if
the server has any significant amount of shared state.

> In OOP terms it could be quite elegant, if
> programmed correctly.

There is seldom any significant programmatic or structural
difference. If you imagine each of your many threads in a loop something
like:

void Connection::Process(void)
{
    while (!Shutdown())
    {
        if (read(fd, buffer, sizeof(buffer)) > 0)   // blocking read on this connection's socket
            ProcessReadData();
    }
}

All it changes to is some other thread in a poll/select loop that
either:

1) Calls Connection::Process when a file descriptor is ready for I/O,
or

2) Queues a job to a job pool that ultimately results in
Connection::Process being called.

It's going to get pretty hard to sustain an argument that this somehow
violates OOP principles. This can easily be as structured as you want.
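To make option 2 concrete, here is an illustrative job-pool sketch; the class
and all its names are invented for the example, not taken from any real
library. The poll/select thread would call pool.queue([conn]{ conn->Process(); })
whenever a descriptor becomes ready:

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative job pool: any worker thread can run any job.
class JobPool {
public:
    explicit JobPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    void queue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }
    std::size_t pending_jobs() {            // how much work is waiting
        std::lock_guard<std::mutex> lk(m_);
        return jobs_.size();
    }
private:
    void run() {                            // worker loop (never exits;
        for (;;) {                          // fine for a sketch)
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !jobs_.empty(); });
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                          // e.g. conn->Process()
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
};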

> I wonder how many proxy servers or load balancing
> solutions DON'T use a thread for each connection.

Any that aim to handle thousands of connections.

DS

Patrick TJ McPhee

Oct 2, 1999
In article <7t50es$a42$1...@winter.news.rcn.net>,
RickB <rbr...@oaktreepeak.com> wrote:

[about thread-per-connection design]

% Why is it an ugly design? In OOP terms it could be quite elegant, if
% programmed correctly. I wonder how many proxy servers or load balancing
% solutions DON'T use a thread for each connection.

I don't know if I would use the word `ugly', but thread-per-connection
designs have scalability problems and they use more resources than are
necessary, which will ultimately lead to poor performance and failure.

The principal feature of threaded architectures is that resources, such
as connections to outside processes, are shared among all the threads,
meaning that any thread could be doing something with any connection.
You can take advantage of this in quite a few ways, but the obvious ones
are that you can have a small number of threads handling a large number
of connections, and that you can have more than one thread handling a
particularly busy connection.

The advantage of having fewer threads than connections is that you
reduce memory requirements and the number of scheduling entities. This
will tend to improve performance, and it will increase the total number
of connections you can handle. It gives you the flexibility to tune the
application to the machine it's running on, too. If you have a fairly
muscular machine with hundreds of CPUs and gigs of main memory, you
might want to kick off a few more threads than you would on a single-CPU
PC with 32 megs of memory. On the bigger machine, you might have more
threads than connections under normal loads, allowing some connections
to have requests processed in parallel.

Obviously, if all the connections are busy 100% of the time, having
fewer threads than connections isn't going to give you anything, but if
that's the case, you're probably better off not using a threaded
architecture, since all it will be doing is introducing synchronisation
contention in things like malloc and taking away the memory protection
offered by separate process address spaces.
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

RickB

Oct 3, 1999
You make some valid points. It's just that your statements are so generally
sweeping. Surely you would acknowledge that specific implementations might
require different approaches. The reason for my post was that the guy that
asked the question didn't state anything about the requirements of the
application he was trying to create. So it's possible that my responses (or
his question) could be based on a different set of assumptions.

1) Resources: NT or Unix? Quite a different set of resource requirements
for threading. Win32 threads are quite cheap, although it's true you still
have a stack for each. So with 1000 threads, I have a 32K stack per
thread (I'm being generous). That's 32 meg. For a server app that's
handling a thousand simultaneous connections, and where performance is the
primary concern, big deal.

2) Again, NT or Unix. That's a pretty wide brush to paint all OSes with. If
the program is written for NT, thread switching is cheap. For each Unix,
the requirements will be different.

3) I don't even understand your point here. If the main thread is sitting
on an accept, when a connection is made, it creates a client object which
handles the connection and creates its own threads, then returns to the
accept. The other clients (connections) that are already established just
keep on chugging.

4) They're not mutually exclusive.

5) Not for me.

6) Are we talking about structured programming or OOP? What happened to
encapsulation? You have created a connection object that allows its
members to be passed to C calls. Why bother using C++ at all? You're simply
using a class as a structure. How about a design that has only 3 public
members: a pure virtual method ProcessData, which gets called only after
data is read, a public method called SendData, and a third called
TerminateConnection? Then any derived class can handle those the way it
wants, and the socket implementation is hidden. Of course, you could have
informational methods as well (AreWeConnected, GetClientIP, GetServerIP,
etc.).
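As a sketch of the interface being described here, assuming C++: the three
method names are RickB's, while the hidden member and the signatures are
guesses for illustration:

#include <cstddef>
#include <unistd.h>

class Connection {
public:
    virtual ~Connection() { TerminateConnection(); }

    // Derived classes implement this; the framework calls it only
    // after data has already been read from the hidden socket.
    virtual void ProcessData(const char* data, size_t len) = 0;

    void SendData(const char* data, size_t len) { write(fd_, data, len); }
    void TerminateConnection() { if (fd_ >= 0) { close(fd_); fd_ = -1; } }

    // Informational methods (AreWeConnected, GetClientIP, ...) would go here.

protected:
    explicit Connection(int fd) : fd_(fd) {}

private:
    int fd_;    // the socket handle stays an implementation detail
};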

If you have a single select with, say 1000 clients reading and writing, and
you handle them in a single thread, you would wait on the select and then
handle the reads and writes sequentially. You think this would give better
(equal) performance?

It's interesting that you don't think that any high volume proxies or load
balancing solutions rely on 100s or 1000s of threads. I work for a very
large software company and have evaluated a few of these types of programs.
In fact, I haven't seen any that don't. But what's even more interesting is
that every implementation that I have seen in Java relies heavily on
threading. Personally, for an enterprise app, I would be afraid to write it
in Java and depend on a machine-dependent implementation of threading in the
VM that I had no control over. I haven't seen those perform particularly
well.

BTW, in the case of the load balancing solution, all the connections will be
busy all the time, passing data back and forth, so you will have to actively
service them all as quickly as possible.

Rick

David Schwartz <dav...@webmaster.com> wrote in message

news:37F655B2...@webmaster.com...



David Schwartz

Oct 3, 1999

RickB wrote:
>
> You make some valid points. It's just that your statements are so generally
> sweeping.

That's because we don't have a specific example to deal with. So I'm
going to stick with the basics.

> Surely you would acknowledge that specific implementations might
> require different approaches.

I know of no implementation that requires one thread per connection.
Simply because it is so trivial to convert one architecture to the
other, and the benefits are so clear.

> The reason for my post was that the guy that
> asked the question didn't state anything about the requirements of the
> application he was trying to create. So it's possible that my responses (or
> his question) could be based on a different set of assumptions.
>
> 1) Resources: NT or Unix? Quite a different set of resource requirements
> for threading. Win32 threads are quite cheap, although it's true you still
> have a stack for each. So with 1000 threads, I have a 32K stack per
> thread (I'm being generous). That's 32 meg. For a server app that's
> handling a thousand simultaneous connections, and where performance is the
> primary concern, big deal.

If performance is the primary concern, why would you want so many extra
context switches? And why would you want to perform so many more
blocking system calls -- blocking calls are so much more expensive than
non-blocking calls.

> 2) Again, NT or Unix. That's a pretty wide brush to paint all OSes with. If
> the program is written for NT, thread switching is cheap. For each Unix,
> the requirements will be different.

Thread switching is always more expensive than not thread switching!

> 3) I don't even understand your point here. If the main thread is sitting
> on an accept, when a connection is made, it creates a client object which
> handles the connection and creates its own threads, then returns to the
> accept. The other clients (connections) that are already established just
> keep on chugging.

And what if creating a thread fails?

> 4) They're not mutually exclusive.

Huh?

> 5) Not for me.

Probably because you are imagining the work you have to do for this as
if there were no way to avoid it. Have you ever written an application
that used a thread pool to manage network I/O?

> 6) Are we talking about structured programming or OOP? What happened to
> encapsulation? You have created a connection object that allows its
> members to be passed to C calls. Why bother using C++ at all? You're simply
> using a class as a structure. How about a design that has only 3 public
> members: a pure virtual method ProcessData, which gets called only after
> data is read, a public method called SendData, and a third called
> TerminateConnection? Then any derived class can handle those the way it
> wants, and the socket implementation is hidden. Of course, you could have
> informational methods as well (AreWeConnected, GetClientIP, GetServerIP,
> etc.).

I give up. I think you're just being stubborn here. As I said, you can
make the implementation as neat as you like. There are plenty of clean
class libraries that already do this.

> If you have a single select with, say 1000 clients reading and writing, and
> you handle them in a single thread, you would wait on the select and then
> handle the reads and writes sequentially. You think this would give better
> (equal) performance?

Who said I would do that? And, believe it or not, even that probably
would, because non-blocking system calls are so cheap that it's almost
impossible to make them the limiting factor if the data that is going to
be sent is already computed.

> It's interesting that you don't think that any high volume proxies or load
> balancing solutions rely on 100s or 1000s of threads. I work for a very
> large software company and have evaluated a few of these types of programs.
> In fact, I haven't seen any that don't. But what's even more interesting is
> that every implementation that I have seen in Java relies heavily on
> threading. Personally, for an enterprise app, I would be afraid to write it
> in Java and depend on a machine-dependent implementation of threading in the
> VM that I had no control over. I haven't seen those perform particularly
> well.

I agree.

> BTW, in the case of the load balancing solution, all the connections will be
> busy all the time, passing data back and forth, so you will have to actively
> service them all as quickly as possible.

Then the last thing you want is the operating system having to do
thousands of senseless context switches a second! Not to mention, that
causes you to lose the ability to control the fairness of the servicing,
since the OS can wake/schedule threads any damn way it pleases.

DS

RickB

Oct 3, 1999
David,

This is going in the wrong direction. I'm not trying to be religious.
You've made some valid points, I've enjoyed the exchange, and I've learned a
bit.

Yes, I've written network I/O apps with thread pools, for a different type
of implementation. Yes, I am quite capable of being stubborn. But if you
say "I know of no implementation that requires one thread per connection",
then you are implying "and therefore, no implementations with this
requirement can exist". My only point is, as you have pointed out, we don't
know the specific design requirements. All I'm saying is that it's
difficult to make sweeping statements about what would be the "correct"
architecture without those details.

I've seen some of your other posts, and you are quite knowledgeable. I'm
looking forward to another constructive thread (pun intended).

Regards,
Rick

David Schwartz <dav...@webmaster.com> wrote in message

news:37F785A1...@webmaster.com...

David Schwartz

Oct 3, 1999
RickB wrote:
>
> David,
>
> This is going in the wrong direction. I'm not trying to be religious.
> You've made some valid points, I've enjoyed the exchange, and I've learned a
> bit.
>
> Yes, I've written network I/O apps with thread pools, for a different type
> of implementation. Yes, I am quite capable of being stubborn. But if you
> say "I know of no implementation that requires one thread per connection",
> then you are implying "and therefore, no implementations with this
> requirement can exist". My only point is, as you have pointed out, we don't
> know the specific design requirements. All I'm saying is that it's
> difficult to make sweeping statements about what would be the "correct"
> architecture without those details.

I have talked to numerous other experts in the field, and we have been
unable to come up with a single hypothetical set of requirements where
this would be the best solution. The closest we have come to such a
requirement is a web server that needs to get most of its files over
slow WAN links to remote NFS servers. And even then, there are much
better ways, they're just much more complex and possibly not worth the
effort.

> I've seen some of your other posts, and you are quite knowledgeable. I'm
> looking forward to another constructive thread (pun intended).

*laugh*

I have been getting a bit religious, and as I said, it's mostly because
of my background. I'm often employed as a consultant to solve software
performance problems, it's what I specialize in, so naturally I tend to
overstress performance.

However, failure to choose optimum algorithms and architectures is the
best way to get yourself into a performance jam. I will tell you from
experience, choosing better architectures almost always leads to easier
implementation, and it always leads to a superior end product that can
do more work with fewer resources.

I also left out one other disadvantage to a 'one thread per client'
architecture, though it was implied in my comment about worrying about
scheduling. I'm an advocate of 'any thread can do any job' type
architectures, that is, where threads are as symmetric as possible. 'One
thread per client' is a type of 'special threads do special jobs'
architecture, and it suffers from one massive drawback for that
architecture, at least, on most platforms.

Most platforms provide you with synchronization primitives that are
fast and efficient but 'unfair'. For example, POSIX mutexes provide no
guarantee of fairness. This is because fairness is expensive. Threads
are inherently asynchronous and the more synchronization you force, the
more of a penalty you pay.

To see this in a simple example, on a single CPU, you will be most
efficient if you let each thread run as long as possible. However, this
will be very 'unfair' and lead to bursty response on a 'one thread per
connection' model. So you can't do this.

If you have a 'one thread per connection' architecture, you have to be
seriously worried about fairness. It's perfectly legal, for example, for
an architecture to yield a mutex to the 'lowest numbered' thread waiting
for it. This would cause, in a one-thread-per-connection architecture,
there to be 'fast' clients and 'slow' clients, and it would be up to the
luck of the draw. In other words, the vagaries of the system scheduler
and synchronization primitive implementation (and even socket hash
implementation) can have _massive_ effects on how your application
distributes resources to clients.

You can deal with this two ways -- ignore it and hope for the best, or
try to implement your own mechanisms to ensure fairness. Ignoring it
could lead your application to stall some connections on some
architectures and can lead to serious DoS attacks. Trying to code around
it and ensure fairness can lead to complexity that can lead to failure.

In addition, you will also need to be very careful about DoS attacks.
Usually people who implement one-thread-per-connection also
create/destroy threads on each new connection. This means that
connection bursts can lead to thread create/destroy cycles. This not
only kills the CPU but can lead to incredible bursts of memory
consumption and fragmentation. See other threads in this NG for some
stories of those kinds of problems.

Even if you cache your threads (which is probably as much work as using
a real thread pool correctly and ditching the one-thread-per-connection
architecture), you can still be trivially forced to create large numbers
of threads. The extra work you have to do to deal with DoS attacks can
easily swamp any 'savings' from using a poor architecture.

There are other issues as well. If your application has more than a
very small amount of shared state or shared resources, you will have one
thread per connection contending for them. This will dramatically
increase the cost of synchronization and often leads to performance that
falls off worse than linearly with the number of active clients.

I could go on for days, but I'll just sum up what I've learned from
years of experience: One thread per connection _may_ be easier to
implement for some very simple servers with almost no shared state;
however, it never performs better than thread pool type architectures
and is often, in reality, more difficult to implement. One thread per
connection designs never perform well and suck much more CPU and memory
than other architectures.

And you know what, it's easier if you learn the right way first --
fewer bad habits to break.

David Schwartz

Mark Bell

Oct 3, 1999
In article <37F655B2...@webmaster.com>, dav...@webmaster.com
says...

>
> RickB wrote:
> >
> > Why is it an ugly design? [One thread (or two threads) per connection]
>
> 1) It wastes resources (threads and thread stacks).

Minor?

> 2) It's extremely inefficient (context switch required every time you
> change which client you're dealing with).

Minor? When "threads" were heavyweight processes this was so?

> 3) It leaves no good way to deal with the inability to create a thread,
> other than dropping the client.

(no comment)

> 4) It's no easier than (in fact, it's generally barely different from)
> making blocking socket calls.
> 5) It's more difficult to deal with connection set up and tear down.

What if it's a local server rather than remote (i.e. you would be using
other IPC than sockets)?

> 6) It can become difficult to share all those threads, especially if
> the server has any significant amount of shared state.

(no comment)

> > I wonder how many proxy servers or load balancing
> > solutions DON'T use a thread for each connection.
>
> Any that aim to handle thousands of connections.

[simultaneous] connections. How would you deal with this if the IPC
wasn't sockets?

Mark

Baz

Oct 4, 1999
: that every implementation that I have seen in Java relies heavily on
: threading. Personally, for an enterprise app, I would be afraid to write it
: in Java and depend on a machine-dependent implementation of threading in the
: VM that I had no control over. I haven't seen those perform particularly
: well.

I'm half-way through writing my own multithreaded server which sits over socket
connections, and a few months ago I asked here about the best way to proceed. And everyone said
DO NOT USE ONE-THREAD-PER-CONNECTION. I can understand the scalability problems, and so I
used a thread-pool implementation. However, I have also taken to modelling/prototyping my
classes in Java before actually implementing them in Delphi (on NT) - I like the Java
syntax and I find I can concentrate on my design rather than the Win32 API(!) - but as far
as I can see, it's impossible to implement anything other than a 1-thread-per-connection
socket server in Java - at least without relying on exceptions to signal timeouts, which
will bring its own costs. Am I right about this? If I am, then surely that really limits
Java's power on the server?

Baz

PS: I've never actually used Java in anger; I simply use it as a thinking tool. However,
I would like to write a serious piece of software using it, but the more I see, the more
doubts I have about it.

Stefan Seefeld

Oct 4, 1999
RickB wrote:
>
> Why is it an ugly design? In OOP terms it could be quite elegant, if
> programmed correctly. I wonder how many proxy servers or load balancing

> solutions DON'T use a thread for each connection.

I think the basic point is that OO and threading are orthogonal concepts.
Don't couple them. You may use whatever threading strategy you like (thread
per request, thread per connection, thread pool), but keep it separated from
the rest of your code. Then you simply have a central entry point from which
you can let a thread take up a task, be it the execution of a single object's
method or whatever.

Stefan

_______________________________________________________

Stefan Seefeld
Departement de Physique
Universite de Montreal
email: seef...@magellan.umontreal.ca

_______________________________________________________

...I still have a suitcase in Berlin...

Bil Lewis

Oct 4, 1999
Baz,

> but as far
> as I can see, it's impossible to implement anything other than a 1-thread-per-connection
> socket server in Java - at least without relying on exceptions to signal timeouts, which
> will bring its own costs. Am I right about this? If I am, then surely that really limits
> Java's power on the server?
>


True, Java does not have a select() call, and this is a problem for
some people. Indeed a number of companies simply make JNI calls
down to C to do the select() for them.

-Bil
--
================
B...@LambdaCS.com

http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301

Phone/FAX: (650) 328-8952

David Schwartz

Oct 4, 1999

Mark Bell wrote:

> [simultaneous] connections. How would you deal with this if the IPC
> wasn't sockets?

System V IPC is obsolete. Try not to use it if possible. See Stevens
for reasons why this is so.

If you are _stuck_ with System V IPC, you may have no choice but to
create lots of threads.

DS

David Schwartz

Oct 4, 1999
Bil Lewis wrote:
>
> Baz,
>
> > but as far
> > as I can see, it's impossible to implement anything other than a 1-thread-per-connection
> > socket server in Java - at least without relying on exceptions to signal timeouts, which
> > will bring its own costs. Am I right about this? If I am, then surely that really limits
> > Java's power on the server?
> >
>
> True, Java does not have a select() call, and this is a problem for
> some people. Indeed a number of companies simply make JNI calls
> down to C to do the select() for them.
>
> -Bil

Java, as yet, has no good way to deal with high-performance I/O. So you
really can't use Java for high-performance I/O applications.

In my opinion, as of right now, you can't use Java for high-performance
anything applications. So we'd be talking about how to make the best of
what you've got, as opposed to how to really do things right.

DS

Raghu Angadi

Oct 4, 1999
> 1) Resources: NT or Unix? Quite a different set of resource requirements
> for threading. Win32 threads are quite cheap, although it's true you still
> have a stack for each. So with 1000 threads, I have a 32K stack per
> thread (I'm being generous). That's 32 meg. For a server app that's
> handling a thousand simultaneous connections, and where performance is the
> primary concern, big deal.
>
> 2) Again, NT or Unix. That's a pretty wide brush to paint all OSes with. If
> the program is written for NT, thread switching is cheap. For each Unix,
> the requirements will be different.

Wow! Threads take fewer resources and have faster context switches in NT
(compared to the Unixes?). Then where does NT lose on performance (which it
seems to)?

David Schwartz

Oct 5, 1999

Raghu Angadi wrote:
>
> > 1) Resources: NT or Unix? Quite a different set of resource requirements
> > for threading. Win32 threads are quite cheap, although it's true you still
> > have a stack for each. So with 1000 threads, I have a 32K stack per
> > thread (I'm being generous). That's 32 meg. For a server app that's
> > handling a thousand simultaneous connections, and where performance is the
> > primary concern, big deal.
> >
> > 2) Again, NT or Unix. That's a pretty wide brush to paint all OSes with. If
> > the program is written for NT, thread switching is cheap. For each Unix,
> > the requirements will be different.
>
> Wow! Threads take fewer resources and have faster context switches in NT
> (compared to the Unixes?). Then where does NT lose on performance (which it
> seems to)?

Um, hello, on every serious performance comparison I've ever seen, NT
blows most UNIXes away on comparable hardware. I've never seen a UNIX
box handle 12,000 concurrent TCP connections the way NT does.

DS

Igor Khasilev

Oct 5, 1999
David Schwartz <dav...@webmaster.com> wrote:

> I know of no implementation that requires one thread per connection.
> Simply because it is so trivial to convert one architecture to the
> other, and the benefits are so clear.

These architectures differ in behaviour, so we must use the architecture
which is better for us. Here is a situation where one thread per client will
be better, IMHO: a wide range of connection request rates + some finite amount
of service time for each connection + low delay in service. If you have a
finite number of workers, you always risk getting large response times.

> If performance is the primary concern, why would you want so many extra
> context switches? And why would you want to perform so many more
> blocking system calls -- blocking calls are so much more expensive than
> non-blocking calls.

Well, if you use non-blocking read/write, then you have to use poll (or select),
which can also block.

> > 2) Again, NT or Unix. That's a pretty wide brush to paint all OSes with. If
> > the program is written for NT, thread switching is cheap. For each Unix,
> > the requirements will be different.

> Thread switching is always more expensive than not thread switching!

Synchronization on the job queue also costs something...

> > 3) I don't even understand your point here. If the main thread is sitting
> > on an accept, when a connection is made, it creates a client object which
> > handles the connection and creates its own threads, then returns to the
> > accept. The other clients (connections) that are already established just
> > keep on chugging.

> And what if creating a thread fails?


Yes, this is really a problem which a thread pool solves easily.

> Probably because you are imagining the work you have to do for this as
> if there were no way to avoid it. Have you ever written an application
> that used a thread pool to manage network I/O?

I was forced to rewrite my server application (an http proxy) to support both
paradigms. It was *really* easy. But the only reason for the rewrite was a bug in
linuxthreads (a crash of the whole program if the thread library failed
to create a new thread). Now the program automatically uses workers under Linux,
and uses thread-per-connection on other OSes (actually it can be configured by
command-line switches). There was no visible performance gain (yes, I
didn't do any special measurements, I just compared overall performance).


--
Igor Khasilev |
PACO Links, ig...@paco.net |

RickB

Oct 5, 1999
I concur. We tried to use Java for a "Web loading" type of app. We even
got it to perform reasonably well at a moderate rate. It would run
anywhere from a few hours to a few days and then hang, sometimes after
millions of transactions. The VM would just lock up.

We ended up rewriting it in C++.

Rick

David Schwartz <dav...@webmaster.com> wrote in message

news:37F90DFA...@webmaster.com...

David Schwartz

Oct 5, 1999
Igor Khasilev wrote:

>
> David Schwartz <dav...@webmaster.com> wrote:
>
> > I know of no implementation that requires one thread per connection.
> > Simply because it is so trivial to convert one architecture to the
> > other, and the benefits are so clear.
>
> These architectures differ in behaviour, so we must use the architecture
> which is better for us. Here is a situation where one thread per client will
> be better, IMHO: a wide range of connection request rates + some finite amount
> of service time for each connection + low delay in service. If you have a
> finite number of workers, you always risk getting large response times.

*sigh* No.

The rate at which work can be done is determined by the CPU speed, hard
disk speed, network bandwidth, and so on. More threads do not get more
work done.

> > If performance is the primary concern, why would you want so many extra
> > context switches? And why would you want to perform so many more
> > blocking system calls -- blocking calls are so much more expensive than
> > non-blocking calls.
>

> Well, if you use non-blocking read/write, then you have to use poll (or select),
> which can also block.

Right, and that means one blocking system call for thousands of sockets
instead of thousands of blocking system calls.

> > > 2) Again, NT or Unix. That's a pretty wide brush to paint all OSes with. If
> > > the program is written for NT, thread switching is cheap. For each Unix,
> > > the requirements will be different.
>
> > Thread switching is always more expensive than not thread switching!
>

> Synchronization on the job queue also costs something...

Synchronization cost is roughly proportional to the number of threads
being synchronized. I promise you, 10,000 threads will block more than
10. A well-designed job queue will let one thread run unmolested until
it blocks in I/O.

> > Probably because you are imagining the work you have to do for this as
> > if there were no way to avoid it. Have you ever written an application
> > that used a thread pool to manage network I/O?
>

> I was forced to rewrite my server application (an http proxy) to support both
> paradigms. It was *really* easy. But the only reason for the rewrite was a bug in
> linuxthreads (a crash of the whole program if the thread library failed
> to create a new thread). Now the program automatically uses workers under Linux,
> and uses thread-per-connection on other OSes (actually it can be configured by
> command-line switches). There was no visible performance gain (yes, I
> didn't do any special measurements, I just compared overall performance).

How many connections did you test with? What advantage do you imagine
one thread per connection got you?

DS

David Schwartz

Oct 7, 1999
Jonathan Perret wrote:
> Yes, in theory they are orthogonal.
> But thread pool architectures force you to split request processing in
> bits small enough that a few simultaneous big requests don't stall the
> whole server.
>
> While a pool architecture will always win in throughput, a
> thread per connection architecture is more convenient to code for
> when you expect to handle long, CPU-intensive requests.
>
> Take the fictitious example of a "prime number server", that returns to
> a client the smallest prime bigger than the given integer.
> Obviously if you're using a thread pool you can't have a thread do
> a computation that may take minutes. So you're going to want to split
> the computation to ensure reasonable availability.
> How would you do that?

Nonsense. A "prime number server" will be faster for everyone if it has
as many active threads as there are processors. Purely CPU-bound cases
are the worst for thread-per-connection.

> Even harder, imagine that a computation involves recursive functions.
> This is a case where splitting work is hard, and even if you manage to
> do it the overhead may very well be more than context switching.

Nonsense again. For the same reason.

> In these cases a thread-per-connection model is better.
> Yes, at the end of the day such a server will have computed fewer prime
> numbers than a pooling server. But it will always have been available.

Define available. What good is a server that can "start" the work
faster if it takes longer to finish it?

> So, converting a thread-per-connection server into a thread-pooling one
> is not always as easy as it seems...

That's bull. And you are doing coders who read this list a disservice
by stating it.

DS

Mark Hamstra

Oct 7, 1999
"Jonathan Perret" <jpe...@cybercable.fr> writes:

> I admit that my example is slightly contrived. I am merely trying to
> point out that there are exceptions to the rule that "pooling is better".


The contrivance in your example is the assertion that a thread-per-connection
server is uniquely qualified to satisfy the example. You have essentially
created an example where there are at least two classes of requests: those
that will take a long time to compute and those that are quick and easy.
While the claim that a thread-per-connection model will allow quick and easy
requests to still be served even when many long and hard responses are being
computed may be true, a pooled server with separate pools to handle long and
hard vs. quick and easy tasks will still outperform it while remaining
responsive to new quick and easy requests.

In other words, the existence of multiple classes of service request is
orthogonal to the issue of thread-per-connection vs. thread pool. As long as
the task set can be adequately divided into service request classes, each with
its own pool, there is no unique advantage to the thread-per-connection model,
and most of the disadvantages of such a thread-per-connection model remain
valid concerns. Certainly if there is no way to classify service requests in
some a priori fashion (and thus no reasonable way to map requests to an
appropriate pool), then the thread-per-connection model may be best -- but
that would indeed be an exceptional task profile.
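Sketching that routing in code, reusing the illustrative JobPool from earlier
in the thread; the Request type, its cost estimate, and the threshold are all
invented for the example:

// Assumes the illustrative JobPool class sketched earlier in the thread.

struct Request {
    long predicted_cost;                 // invented a-priori cost estimate
};

const long kHeavyThreshold = 1000000;    // invented cutoff

void serve(const Request&) { /* application-specific work would go here */ }

// Route each request to the pool matching its class, so quick-and-easy
// requests never queue behind long-and-hard ones.
void dispatch(JobPool& quick_pool, JobPool& heavy_pool, const Request& req)
{
    if (req.predicted_cost > kHeavyThreshold)
        heavy_pool.queue([req] { serve(req); });
    else
        quick_pool.queue([req] { serve(req); });
}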

--
Mark Hamstra
Bentley Systems, Inc.

David Schwartz

Oct 7, 1999

Jonathan Perret wrote:

> > Nonsense. A "prime number server" will be faster for everyone if it has
> >as many active threads as there are processors. Purely CPU-bound cases
> >are the worst for thread-per-connection.
>

> I wouldn't be so categorical. You are thinking in terms of throughput
> only, ignoring latency. I should have specified that this server
> would be used for immediate processing (as in, the user wants
> his number as soon as possible), not for batch processing.
> See below.

However you measure it, a prime number server will give the answers the
fastest, under given load, if it has as many threads as there are
processors.

> >> In these cases a thread-per-connection model is better.


> >> Yes, at the end of the day such a server will have computed fewer prime
> >> numbers than a pooling server. But it will always have been available.
> >
> > Define available. What good is a server that can "start" the work
> >faster if it takes longer to finish it?
>

> Available roughly means that I will not have to wait for previous work
> to finish before my request is started. And yes, it makes sense to
> start work earlier at the expense of executing it slightly slower.

This is getting ludicrous. Look, a 2 CPU machine with 100MIPS
processors is capable of 200MIPS, period. No matter how many threads you
run, that's the most you are going to get. If each request takes 50
million instructions to complete, you will max out at 4 requests per
second, period. The amount of work that has to be done is only weakly
related to the number of threads, since it is false that more threads
can do more work.

It won't be "slightly slower", it'll be WAY slower. If you have two
active threads per processor, instead of one, each thread will work half
as fast. So, if the context switching were free, just as much work would
get done. Of course, the context switching is not free, so the extra
threads just slow everything down.

> It seems quite reasonable to have a CPU-bound server process requests
> in a time proportional to the number of outstanding requests, but
> starting them as soon as they are received.
> This means one can have heavy requests chugging along while small
> requests are served almost as fast as with a non-loaded pooling server,
> if you take network latency into account.

This is also unrelated to the number of threads. This is a generic
design issue that is independent of the threading architecture.

> An n-CPU pooling server will stop responding as soon as n heavy requests
> are being processed. To paraphrase, what good is a server that can
> run the work faster if it takes longer to start it?

The advantage is that you have fewer context switches, so less of your
finite CPU resources are wasted on overhead.

> I admit that my example is slightly contrived. I am merely trying to
> point out that there are exceptions to the rule that "pooling is better".

Your example is actually an excellent example of where pooling works
_best_. Purely CPU-bound loads are the best candidates for pooling,
since more threads never lets you get any work done any faster. At least
with I/O bound threads, you can argue that there's an advantage to
initiating an I/O as soon as possible.

> >> So, converting a thread-per-connection server into a thread-pooling one
> >> is not always as easy as it seems...
> >
> > That's bull. And you are doing coders who read this list a disservice
> >by stating it.
>

> I would like this discussion to remain polite, if you don't mind.

Then you will have to make sense.

> If you are worried about a hypothetical disservice, I will state
> for the record that whatever I happen to post here are my own opinions,
> and that I strongly advise against anyone trusting them without
> further personal thought and experimentation (though if anyone
> insists, I have something to sell you). Apologies for the bad
> English, too.

*laugh* No problem, I'll get more jobs fixing the programs written
using deficient architectures.

DS

David Schwartz

Oct 7, 1999

Mark Hamstra wrote:

> In other words, the existence of multiple classes of service request is
> orthogonal to the issue of thread-per-connection vs. thread pool. As long as
> the task set can be adequately divided into service request classes, each with
> its own pool, there is no unique advantage to the thread-per-connection model,
> and most of the disadvantages of such a thread-per-connection model remain
> valid concerns. Certainly if there is no way to classify service requests in
> some a priori fashion (and thus no reasonable way to map requests to an
> appropriate pool), then the thread-per-connection model may be best -- but
> that would indeed be an exceptional task profile.

In my opinion and experience, even those exceptional cases are _still_
best served by thread pools. However, the number of threads in the pool
should be allowed to grow to, potentially, equal or even exceed the
number of clients.

The reasons for this are several:

1) A thread pool model won't use a thread per connection when it isn't
needed. So if you're getting a whole bunch of simple requests, you won't
be wasting threads.

2) A thread pool model will still require fewer context switches,
because a thread-per-connection model will always need a context switch
when a job completes, and a thread pool model will not.

3) A thread pool model will make it easier to deal with failure to
create a thread.

4) A thread pool model makes it simpler to reuse threads.
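One hedged way to read points 1 and 3 in code is a pool that grows only when
work queues up and shrugs off thread-creation failure. The three pool members
used here (pending_jobs, worker_count, try_add_worker) are assumed extensions
of the earlier JobPool sketch, not real API:

#include <cstddef>

const std::size_t kMaxWorkers = 64;     // invented upper bound

template <class Pool>                   // any pool with the assumed members
void maybe_grow(Pool& pool)
{
    // Grow only when jobs are actually waiting behind busy workers (point 1).
    if (pool.pending_jobs() > pool.worker_count() &&
        pool.worker_count() < kMaxWorkers)
    {
        // Point 3: if thread creation fails, nothing is dropped; the
        // existing workers simply drain the queue more slowly.
        (void)pool.try_add_worker();
    }
}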

DS



Jonathan Perret

Oct 8, 1999

>> Why is it an ugly design? In OOP terms it could be quite elegant, if
>> programmed correctly. I wonder how many proxy servers or load balancing
>> solutions DON'T use a thread for each connection.
>
>I think the basic point is that OO and threading are orthogonal concepts.
>Don't couple them. You may use whatever threading strategy you like (thread
>per request, thread per connection, thread pool), but keep it separated from
>the rest of your code. Then you simply have a central entry point from which
>you can let a thread take up a task, be it the execution of a single object's
>method or whatever.
>

Yes, in theory they are orthogonal.
But thread pool architectures force you to split request processing in
bits small enough that a few simultaneous big requests don't stall the
whole server.

While a pool architecture will always win in throughput, a
thread per connection architecture is more convenient to code for
when you expect to handle long, CPU-intensive requests.

Take the fictitious example of a "prime number server", that returns to
a client the smallest prime bigger than the given integer.
Obviously if you're using a thread pool you can't have a thread do
a computation that may take minutes. So you're going to want to split
the computation to ensure reasonable availability.
How would you do that?

Even harder, imagine that a computation involves recursive functions.
This is a case where splitting work is hard, and even if you manage to
do it the overhead may very well be more than context switching.

In these cases a thread-per-connection model is better.
Yes, at the end of the day such a server will have computed fewer prime
numbers than a pooling server. But it will always have been available.

So, converting a thread-per-connection server into a thread-pooling one
is not always as easy as it seems...

Cheers,
--Jonathan


Jonathan Perret

Oct 8, 1999
>> Take the fictitious example of a "prime number server", that returns to
>> a client the smallest prime bigger than the given integer.
>> Obviously if you're using a thread pool you can't have a thread do
>> a computation that may take minutes. So you're going to want to split
>> the computation to ensure reasonable availability.
>> How would you do that?
>
> Nonsense. A "prime number server" will be faster for everyone if it has
>as many active threads as there are processors. Purely CPU-bound cases
>are the worst for thread-per-connection.

I wouldn't be so categorical. You are thinking in terms of throughput
only, ignoring latency. I should have specified that this server
would be used for immediate processing (as in, the user wants
his number as soon as possible), not for batch processing.
See below.

>> In these cases a thread-per-connection model is better.
>> Yes, at the end of the day such a server will have computed fewer prime
>> numbers than a pooling server. But it will always have been available.
>

> Define available. What good is a server that can "start" the work
>faster if it takes longer to finish it?

Available roughly means that I will not have to wait for previous work
to finish before my request is started. And yes, it makes sense to
start work earlier at the expense of executing it slightly slower.

It seems quite reasonable to have a CPU-bound server process requests
in a time proportional to the number of outstanding requests, but
starting them as soon as they are received.
This means one can have heavy requests chugging along while small
requests are served almost as fast as with a non-loaded pooling server,
if you take network latency into account.

An n-CPU pooling server will stop responding as soon as n heavy requests
are being processed. To paraphrase, what good is a server that can
run the work faster if it takes longer to start it?

I admit that my example is slightly contrived. I am merely trying to
point out that there are exceptions to the rule that "pooling is better".

>> So, converting a thread-per-connection server into a thread-pooling one
>> is not always as easy as it seems...
>
> That's bull. And you are doing coders who read this list a disservice
>by stating it.


I would like this discussion to remain polite, if you don't mind.

If you are worried about a hypothetical disservice, I will state
for the record that whatever I happen to post here are my own opinions,
and that I strongly advise against anyone trusting them without
further personal thought and experimentation (though if anyone
insists, I have something to sell you). Apologies for the bad
English, too.

Cheers,
--Jonathan


Patrick TJ McPhee

Oct 8, 1999
In article <7tjd68$529$1...@oceanite.cybercable.fr>,
Jonathan Perret <jpe...@cybercable.fr> wrote:

% Take the fictitious example of a "prime number server", that returns to
% a client the smallest prime bigger than the given integer.
% Obviously if you're using a thread pool you can't have a thread do
% a computation that may take minutes. So you're going to want to split

Of course you can. What you can't have is _all_ the threads in the pool
doing a computation that may take minutes. If you get that situation,
you have to either deal with it (increase the number of threads in the
pool) or live with the consequences (poorer response time when many
connections are throwing out big numbers). In a case like this, where
you can predict whether it will take a while to serve a request, you
can keep a pool of threads reserved specifically for handling short
requests.

Stefan Seefeld

Oct 8, 1999

Finally !!

I think this is the whole matter. While David insists that it's worthless to
increase the number of threads since the overall work isn't finished sooner
(quite the contrary!), the point of introducing more threads is really
to increase the responsiveness of smaller tasks.

I think the prime number example isn't that good for showing the point.
But if you think of a GUI which sometimes triggers rather lengthy
commands, you may want to let those commands be executed in a separate thread.
Then it makes sense to keep a worker pool waiting for all kinds of commands
while one thread is *always* ready to keep the GUI in sync with user input.

Igor Khasilev

Oct 8, 1999
Stefan Seefeld <seef...@magellan.umontreal.ca> wrote:

> Finally !!

> I think this is the whole matter. While David insists that it's worthless to
> increase the number of threads since the overall work isn't finished sooner
> (quite the contrary!), the point of introducing more threads is really
> to increase the responsiveness of smaller tasks.

> I think the prime number example isn't that good for showing the point.
> But if you think of a GUI which sometimes triggers rather lengthy
> commands, you may want to let those commands be executed in a separate thread.

This is what I tried to prove: the number of threads can depend on the length of
execution, acceptable delay, and request rate. But, alas, this doesn't fit
into David's rules.

David Schwartz

Oct 8, 1999

Stefan Seefeld wrote:

> I think this is the whole matter. While David insists that it's worthless to
> increase the number of threads since the overall work isn't finished sooner
> (quite the contrary!), the point of introducing more threads is really
> to increase the responsiveness of smaller tasks.

But that's not the right way to increase the responsiveness of smaller
tasks. That's a way to decrease the efficiency of the entire server. If
you want to increase the responsiveness of smaller tasks, there are tons
of good ways to do it. The best way is to reschedule a task if the job
queue isn't empty and a particular task is taking too long. An
application can do this without a context switch; the operating system
cannot.

> I think the prime number example isn't that good for showing the point.
> But if you think of a GUI which sometimes triggers rather lengthy
> commands, you may want to let those commands be executed in a separate thread.

> Then it makes sense to keep a worker pool waiting for all kinds of commands
> while one thread is *always* ready to keep the GUI in sync with user input.

Of course. I have said many times that dedicating one thread to a
'special' task or task class is entirely acceptable. This is far from a
'thread per connection' or 'thread per client' model.

DS

David Schwartz

Oct 8, 1999

> This is what I tried to prove: the number of threads can depend on the length of
> execution, acceptable delay, and request rate. But, alas, this doesn't fit
> into David's rules.

It doesn't because if you're purely CPU bound, it will always be better
to break the big tasks into smaller chunks than to create more threads.
For CPU bound tasks, you can never do better than to have one thread per
processor. So again, creating more threads is simply the wrong way to
solve the problem.

DS

Igor Khasilev

Oct 9, 1999
David Schwartz <dav...@webmaster.com> wrote:

> > This is what I tried to prove: the number of threads can depend on the length of
> > execution, acceptable delay, and request rate. But, alas, this doesn't fit
> > into David's rules.

> It doesn't because if you're purely CPU bound, it will always be better

I'm not purely CPU bound.

> to break the big tasks into smaller chunks than to create more threads.
> For CPU bound tasks, you can never do better than to have one thread per

Again: my tasks are not CPU-bound.

> processor. So again, creating more threads is simply the wrong way to
> solve the problem.

David Schwartz

Oct 9, 1999
Igor Khasilev wrote:
>
> David Schwartz <dav...@webmaster.com> wrote:
>
> > > This what I tried to proof: number of threads can depend on the length of
> > > execution, acceptable delay and request rate. But, alas this doesn't fit
> > > into David rules.
>
> > It doesn't because if you're purely CPU bound, it will always be better
>
> I'm not purely CPU bound.

Right. In pretty much every unrealistic example, a pool of threads will
win. The hard part is in the real world when it becomes non-trivial to
adjust the pool size appropriately. In general, for example, you will
not even know the number of processors present. And even if you did,
there's no guarantee that they are allocated exclusively for your use.

> > to break the big tasks into smaller chunks than to create more threads.
> > For CPU bound tasks, you can never do better than to have one thread per
>
> Again: my tasks are not CPU-bound.

Right. In general, your tasks will be some mix. You'll have number
crunching you'll need to do, you'll need to wait for network data to
arrive, and you'll probably need to do some mucking with the local
filesystem. In most cases, you have to design a server to deal with an
unpredictable mix of load, and you aren't shooting for perfection.

DS

Jonathan Perret

Oct 9, 1999
> But that's not the right way to increase the responsiveness of smaller
>tasks. That's a way to decrease the efficiency of the entire server. If
>you want to increase the responsiveness of smaller tasks, there are tons
>of good ways to do it. The best way is to reschedule a task if the job
>queue isn't empty and a particular task is taking too long. An
>application can do this without a context switch; the operating system
>cannot.


Could you please elaborate on this ? By 'reschedule' do you mean interrupt
a task to let another one run ? How then is an application supposed to do
that without the 'moral equivalent' of a context switch ?

Cheers,
--Jonathan


David Schwartz

unread,
Oct 9, 1999, 3:00:00 AM10/9/99
to

You break a task into subtasks. When you complete a subtask, you check
the job queue. If it's "not too full" you continue processing the
current task. If it's "too full" you requeue the remainder of the job at
the end of the job queue.

While this is usually theoretically more efficient than letting the
threads library or the operating system do the same thing (because you
have more information than it does, a job queue/dequeue can be cheaper
than a context switch, and your job prioritization is not up to the
vagaries of the implementation), it's seldom worth the effort.
Generally, you just let the thread pool grow and deal with the extra
context switches.
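
To make the mechanism concrete, here is a minimal C sketch of such a worker,
assuming a hypothetical, internally synchronized queue API (queue_get,
queue_put, queue_depth) and jobs that carry their own resume state -- none of
these names come from any real library:

    #include <stddef.h>

    /* Hypothetical job: run() does one subtask and returns nonzero
       while work remains; the job keeps its own resume state. */
    struct job {
        int  (*run)(struct job *j);
        void  *state;
    };

    extern void        queue_put(struct job *j);
    extern struct job *queue_get(void);       /* blocks when empty */
    extern size_t      queue_depth(void);

    #define TOO_FULL 8  /* tuning knob: when to yield to waiting jobs */

    void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            struct job *j = queue_get();
            while (j->run(j)) {                  /* one subtask done  */
                if (queue_depth() >= TOO_FULL) { /* others waiting?   */
                    queue_put(j);                /* requeue remainder */
                    break;                       /* at end of queue   */
                }
            }
        }
        return NULL;
    }

The "too full" check is one call per subtask, which is the sense in which a
requeue can be cheaper than a context switch.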

A thread pool model can degenerate into a thread per connection model
simply by allowing the number of threads to grow. You'll still gain all
the benefits of a thread pool model in the cases where those benefits
are possible.

DS

Igor Khasilev

unread,
Oct 10, 1999, 3:00:00 AM10/10/99
to
David Schwartz <dav...@webmaster.com> wrote:

> A thread pool model can degenerate into a thread per connection model
> simply by allowing the number of threads to grow. You'll still gain all

But it seems to me that you reject the possibility of growing the pool size:
you have five rules, which limit the number of threads to a small constant.
So now you agree that a thread pool can sometimes grow? If you answer 'yes',
then you agree that there is some reason for this growth. And if you agree
again, then I will say that this reason can depend on acceptable latency
(through request rate and service length).

Jonathan Perret

unread,
Oct 10, 1999, 3:00:00 AM10/10/99
to
> You break a task into subtasks. When you complete a subtask, you check
>the job queue. If it's "not too full" you continue processing the
>current task. If it's "too full" you requeue the remainder of the job at
>the end of the job queue.

> While this is usually theoretically more efficient than letting the
>threads library or the operating system do the same thing (because you
>have more information than it does, a job queue/dequeue can be cheaper
>than a context switch, and your job prioritization is not up to the
>vagaries of the implementation), it's seldom worth the effort.
>Generally, you just let the thread pool grow and deal with the extra
>context switches.

Will you agree that breaking a task into subtasks is rarely an easy thing
to do, and often hard to get right ?
What you suggest amounts to polling.
If the operating system you are using is sad enough to have context
switching be less efficient than polling, well, all right.
The same goes for "the vagaries of the implementation". Proper
implementations enforce thread priorities.

If you use polling, the design of the program does become more complicated,
and this is something that definitely needs to be factored in. You're
basically ignoring the benefits of multithreading for design.


> A thread pool model can degenerate into a thread per connection model
>simply by allowing the number of threads to grow. You'll still gain all

>the benefits of a thread pool model in the cases where those benefits
>are possible.


This sounds reasonable enough. I tend to agree, but there are still
a few issues like e.g. the added complexity of a pooling model, and
the (admittedly slight) cost of keeping a pool of threads when the
server is idle.

Cheers,
--Jonathan


David Schwartz

unread,
Oct 10, 1999, 3:00:00 AM10/10/99
to

Igor Khasilev wrote:
>
> David Schwartz <dav...@webmaster.com> wrote:
>
> > A thread pool model can degenerate into a thread per connection model
> > simply by allowing the number of threads to grow. You'll still gain all
>
> But it seems to me that you reject the possibility of growing the pool size:
> you have five rules, which limit the number of threads to a small constant.
> So now you agree that a thread pool can sometimes grow? If you answer 'yes',
> then you agree that there is some reason for this growth. And if you agree
> again, then I will say that this reason can depend on acceptable latency
> (through request rate and service length).

You are putting the cart before the horse.

There might be many reasons to grow the thread pool. For one thing, you
generally don't know how many processors are available on the machine or
what percentage of them is available for your use, so you may need to
adjust the thread pool to deal with this. You also won't know how long
local disk I/O will take relative to CPU speed, so the number of threads
you need to pend blocking local filesystem reads may change.

There may not be enough memory available to use as many threads as you
might otherwise like to. And you may encounter failing allocations that
cause you to temporarily reduce the thread pool -- then later, you may
try to raise it back to where you wanted it in the first place.

But this has nothing to do with acceptable latency, request rate,
service length, or anything like that. The number of threads required
has more to do with physical hardware issues, resource availability
issues, and the operating system's I/O layer.

DS

David Schwartz

unread,
Oct 10, 1999, 3:00:00 AM10/10/99
to

Jonathan Perret wrote:

> Will you agree that breaking a task into subtasks is rarely an easy thing
> to do, and often hard to get right ?

No, it's generally nearly trivial. It's frequently automatic. Every time
you call a blocking function, you are implicitly breaking a task into
subtasks. (How do you think user-space threading libraries work?)

> What you suggest amounts to polling.

No, I'm talking about CPU bound tasks using round-robin to slice the
available CPU resources. It's the same thing the operating system would
have to do.

> If the operating system you are using is sad enough to have context
> switching be less efficient than polling, well, all right.

The OS has to do the same thing -- put down one task and pick up
another. The difference is, if you use dedicated threads, you will
_always_ need a context switch. And if you don't, you won't.

> The same goes for "the vagaries of the implementation". Proper
> implementations enforce thread priorities.

How do you use thread priorities when you have one thread per client?
You would want the priorities equal. And, in that case, you are not
guaranteed much of anything. You can easily find one client starved
at the expense of another.

And if you do use thread priorities, you have to deal with all the
disasters that this leads to, such as priority inversion. Not to
mention, thread priorities are simply inherently expensive.

The easiest way to avoid this whole mess is to employ threads
symmetrically. If you have something important to do, give priority to
the _job_, and treat all your threads the same, since they are.

> If you use polling, the design of the program does become more complicated,
> and this is something that definitely needs to be factored in. You're
> basically ignoring the benefits of multithreading for design.

This is a conclusion based upon a false assumption combined with a
misuse of the term "polling".

> > A thread pool model can degenerate into a thread per connection model
> >simply by allowing the number of threads to grow. You'll still gain all

> >the benefits of a thread pool model in the cases where those benefits
> >are possible.
>
> This sounds reasonable enough. I tend to agree, but there are still
> a few issues like e.g. the added complexity of a pooling model, and
> the (admittedly slight) cost of keeping a pool of threads when the
> server is idle.

First of all, I've converted programs that use dedicated threads into
thread pools in less than a day. These are programs totaling tens of
thousands of lines of code. So this "extra complexity" is absurd. I hear
this alleged all the time, but I don't believe it.

Second, you don't need to keep the pool of threads when the server is
idle. The beauty of a thread pool model is that you have total control
over the number of threads. You can make the number whatever you find
best. You can keep spares around in case you need them, or you can prune
aggressively. You can even adapt to the different hardware you might
find yourself on, or even adapt in the course of a run to changing
resource availability.

It's in a thread per connection model that you have no control over the
number of threads. You must have one for each connection, and if the
resources aren't there to do that, well, you're screwed.

DS

Bil Lewis

unread,
Oct 11, 1999, 3:00:00 AM10/11/99
to
> But it seems to me that you reject the possibility of growing the pool size:
> you have five rules, which limit the number of threads to a small constant.
> So now you agree that a thread pool can sometimes grow? If you answer 'yes',
> then you agree that there is some reason for this growth. And if you agree
> again, then I will say that this reason can depend on acceptable latency
> (through request rate and service length).

For any specific workload, you can find the optimal number of threads to
run on your machine. Most likely this number is (C * N_CPUS). I have not yet
seen any examples where changing the number of threads as the load changes
helps. (I would be very interested in any solid numbers anyone has related to
this.)

So, evaluate your machine on server startup, create the optimal number of threads,
and forget about 'em. Too many people waste too much time trying to get fancy
with this and lose track of what they're really doing.
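
For what it's worth, that startup-time sizing might look like this with POSIX
threads. _SC_NPROCESSORS_ONLN is a common but not universal sysconf query,
worker is a pool loop of the kind sketched earlier in the thread, and the
right C is workload-specific -- found by measuring, as Bil says:

    #include <pthread.h>
    #include <unistd.h>

    extern void *worker(void *arg);  /* pool loop, defined elsewhere */

    #define C 2  /* threads per CPU; tune for your workload */

    int start_pool(void)
    {
        long i, ncpus;

        ncpus = sysconf(_SC_NPROCESSORS_ONLN);  /* not on every Unix */
        if (ncpus < 1)
            ncpus = 1;

        for (i = 0; i < C * ncpus; i++) {
            pthread_t tid;
            if (pthread_create(&tid, NULL, worker, NULL) != 0)
                return -1;        /* fewer threads than planned */
            pthread_detach(tid);  /* pool threads live for the whole run */
        }
        return 0;
    }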

IMHO

-Bil
--
================
B...@LambdaCS.com

http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301

Phone/FAX: (650) 328-8952

Stefan Seefeld

unread,
Oct 12, 1999, 3:00:00 AM10/12/99
to
David Schwartz wrote:
>
> Stefan Seefeld wrote:
>
> > I think this is the whole matter. While David insists that it's worthless to
> > increase the number of threads since the overall work isn't finished sooner
> > (quite on the contrary !), the deal of introducing more threads is really
> > to increase the responsiveness of smaller tasks.

>
> But that's not the right way to increase the responsiveness of smaller
> tasks. That's a way to decrease the efficiency of the entire server. If
> you want to increase the responsiveness of smaller tasks, there are tons
> of good ways to do it. The best way is to reschedule a task if the job
> queue isn't empty and a particular task is taking too long. An
> application can do this without a context switch; the operating system
> cannot.

Right. But don't forget that ease of development is a goal too !
For example, in a GUI system project I'm involved in, we use the abstract
notion of a 'Command' the user is supposed to derive from. He then implements
the 'execute' method, which gets called from whatever GUI Controller element
it is connected to. It is not acceptable for GUI developers to have to care
about internals like concurrency strategies, splitting up a Command into
subcommands, etc. What we do instead is provide a special async command which
runs a wrapped command within a dedicated thread (better: which pushes the
wrapped command onto a thread pool's task queue).
While I fully agree that the most efficient way (if you define efficiency as
purely compute-bound) is to do what you propose, ease of use of an API also
contributes to efficiency, if properly defined (else I'd prefer to rewrite my
OS every time to fit the tasks I want done best).
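
As an illustration only (the project's real code isn't shown here), the shape
of that wrapper in plain C -- with the derived 'execute' modeled as a function
pointer and a hypothetical queue_put_cmd() as the pool interface -- might be:

    /* Abstract command: concrete commands 'derive' by embedding
       this struct first and filling in execute. */
    struct command {
        void (*execute)(struct command *self);
    };

    extern void queue_put_cmd(struct command *cmd);  /* pool interface */

    /* Async wrapper: its execute() returns at once after pushing the
       wrapped command onto the pool's task queue, so the GUI thread
       is never tied up by a lengthy command. */
    struct async_command {
        struct command  base;
        struct command *wrapped;
    };

    void async_execute(struct command *self)
    {
        struct async_command *ac = (struct async_command *)self;
        queue_put_cmd(ac->wrapped);
    }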

> > I think the prime number example isn't that good to show the point.
> > But if you think of a GUI which sometimes will trigger rather lengthy
> > commands, you may want to let those commands be executed in a separate thread.
> > Then it makes sense to keep a worker pool waiting for all kinds of commands
> > while one thread is *always* ready to keep the GUI in sync with user input.
>
> Of course. I have said many times that dedicating one thread to a
> 'special' task or task class is entirely acceptable. This is far from a
> 'thread per connection' or 'thread per client' model.

Indeed. I just tried to point out that you were discussing different things.

Stefan Seefeld

unread,
Oct 14, 1999, 3:00:00 AM10/14/99
to
David Schwartz wrote:
>
> Jonathan Perret wrote:
>
> > Will you agree that breaking a task into subtasks is rarely an easy thing
> > to do, and often hard to get right ?
>
> No, it's generally nearly trivial. It's frequently automatic. Every time
> you call a blocking function, you are implicitly breaking a task into
> subtasks. (How do you think user-space threading libraries work?)

Fine. But it's definitely quite a bit of work to encapsulate the state of your
overall task so that once you are called back from your polling thread you
still remember where to continue. There are quite a number of interesting
papers by Douglas Schmidt about different concurrency strategies and how to
use them in conjunction with I/O, like 'asynchronous completion token',
'half-sync/half-async', etc.
Anyway, it's possible, I just wouldn't call it 'trivial'.

> > If you use polling, the design of the program does become more complicated,
> > and this is something that definitely needs to be factored in. You're
> > basically ignoring the benefits of multithreading for design.
>
> This is a conclusion based upon a false assumption combined with a
> misuse of the term "polling".

Yes, I guess we are not talking about the same thing. Could you please elaborate ?

Regards, Stefan

Jonathan Perret

unread,
Oct 14, 1999, 3:00:00 AM10/14/99
to
Stefan Seefeld wrote :

>David Schwartz wrote:
>>
>> Jonathan Perret wrote:
>>
>> > Will you agree that breaking a task into subtasks is rarely an easy thing
>> > to do, and often hard to get right ?
>>
>> No, it's generally nearly trivial. It's frequently automatic. Every time
>> you call a blocking function, you are implicitly breaking a task into
>> subtasks. (How do you think user-space threading libraries work?)
>
>Fine. But it's definitely quite a bit of work to encapsulate the state of your
>overall task so that once you are called back from your polling thread you
>still remember where to continue. There are quite a number of interesting
>papers by Douglas Schmidt about different concurrency strategies and how to
>use them in conjunction with I/O, like 'asynchronous completion token',
>'half-sync/half-async', etc.
>Anyway, it's possible, I just wouldn't call it 'trivial'.

Thank you. Just what I wanted to say, only better expressed ;)

>> > If you use polling, the design of the program does become more complicated,
>> > and this is something that definitely needs to be factored in. You're
>> > basically ignoring the benefits of multithreading for design.
>>
>> This is a conclusion based upon a false assumption combined with a
>> misuse of the term "polling".
>
>Yes, I guess we are not talking about the same thing. Could you please
>elaborate ?


I guess it's David you're asking to elaborate, but I will also since I am
the one who supposedly made a false assumption and misused the term
"polling".

David, I use the term polling in this case because unless I missed something
fundamental, 'breaking a task into subtasks' means inserting 'rescheduling
points' at one or more places in its code. At these points the thread
checks whether it has something better to do than the current task,
and if so it suspends the current task and goes on with other work.
In effect, the thread will be repeatedly checking the job queue, hence
the word "polling", in the sense of "repeatedly querying a state".

I don't think inserting rescheduling points in code can be described as
trivial. Most of the time you will either 1) do it too frequently and
hurt performance, or 2) not do it frequently enough and hurt latency.
There might be a "sweet spot" where the strategy yields better
performance than kernel scheduling, but it is difficult to find - even
more so when you can't predict how much time is going to elapse between
two rescheduling points (because it depends on the data).

And this does not even take into account the design problems that were
exemplified in Stefan's earlier post.

And if you are using signals to do user-level scheduling, you are likely
to suffer as much overhead as with preemptive multitasking.
Your experience may be different, but last I checked, the performance of
user-space thread libraries was not, to say the least, brilliant.

In fact, you seem to have so little trust in the OS that you are willing
to reimplement in user-space what the kernel is supposed to do.
You obviously have much experience in the field so I guess you have
valid reasons to think that way.

In my own experience though, relying on the kernel for scheduling has
worked satisfactorily and allowed me to concentrate on the higher-level
design of the program instead of the nitty-gritty of multithreading.

I am fully aware that some performance is being "wasted" that way.
Then again, if I *really* wanted to extract the last ounce of
performance from a system I'd program it entirely in assembly language.
For me, it's just another one of those tradeoffs.

And like the others, I see it becoming increasingly frequent in the
near future.

Cheers,
--Jonathan


David Schwartz

unread,
Oct 14, 1999, 3:00:00 AM10/14/99
to

Jonathan Perret wrote:

> I guess it's David you're asking to elaborate, but I will also since I am
> the one who supposedly made a false assumption and misused the term
> "polling".
>
> David, I use the term polling in this case because unless I missed something
> fundamental, 'breaking a task into subtasks' means inserting 'rescheduling
> points' at one or more places in its code. At these points the thread
> checks whether it has something better to do than the current task,
> and if so it suspends the current task and goes on with other work.
> In effect, the thread will be repeatedly checking the job queue, hence
> the word "polling", in the sense of "repeatedly querying a state".

Yes, for the rare degenerate case of a mix of tasks where some of them
are CPU intensive and it's more important to start a new task than
finish a current one. This is a rare case, and it's usually the only
case where this is desirable.

Even then, it may not be necessary. As I've said, a thread pool can
always degenerate into a thread-per-client if you happen to hit the
situation where you really do need a lot of threads to avoid this kind
of rescheduling and the rescheduling is actually difficult to do.

The point is, this is so far off the beaten track that it's basically
irrelevant.

> I don't think inserting rescheduling points in code can be described as
> trivial. Most of the time you will either 1) do it too frequently and
> hurt performance, or 2) not do it frequently enough and hurt latency.
> There might be a "sweet spot" where the strategy yields better
> performance than kernel scheduling, but it is difficult to find - even
> more so when you can't predict how much time is going to elapse between
> two rescheduling points (because it depends on the data).

You've actually got it entirely in reverse. The application coder knows
where the sweet spots are. The kernel (or a user-space threads library)
has no clue. It could put the yield right after you locked a critical
mutex -- how does it know?

> And this does not even take into account the design problems that were
> exemplified in Stefan's earlier post.
>
> And if you are using signals to do user-level scheduling, you are likely
> to suffer as much overhead as with preemptive multitasking.

I agree. I've never found this approach necessary.

> Your experience may be different, but last I checked, the performance of
> user-space thread libraries was not, to say the least, brilliant.

I agree. That's why I don't like them. The application programmer is
the only one who knows where it makes sense to yield.

> In fact, you seem to have so little trust in the OS that you are willing
> to reimplement in user-space what the kernel is supposed to do.

Of course! My rule of thumb is that _ANYTHING_ that can be done by the
application should be. Any time you change levels, you incur penalties.
And the kernel may not know what you want.

> You obviously have much experience in the field so I guess you have
> valid reasons to think that way.

It just seems to work a lot better that way.

> In my own experience though, relying on the kernel for scheduling has
> worked satisfactorily and allowed me to concentrate on the higher-level
> design of the program instead of the nitty-gritty of multithreading.

I did the nitty-gritty once. Now I have a library that does the
nitty-gritty for me. Now I get the maximum possible performance without
even having to think about how many threads I'm using or whatnot.

> I am fully aware that some performance is being "wasted" that way.
> Then again, if I *really* wanted to extract the last ounce of
> performance from a system I'd program it entirely in assembly language.
> For me, it's just another one of those tradeoffs.

The one tradeoff you should spend the most time on is the choice of
implementation architectures. I have never found that you gain anything
by dedicating threads. (With the obvious exceptions like a single
network I/O thread, a clock manager thread, a timer firing thread, a UI
thread, etcetera.)

> And like the others, I see it becoming increasingly frequent in the
> near future.

Sooner or later, you'll have to start writing programs that can scale.
Otherwise, your employers will be hiring me to fix your code. (Unless
you aren't writing programs where performance and resource consumption
are super important. In which case, yes, trade off performance for ease
of implementation.)

I'll tell you a little secret though. You know what performance is most
important for? The cases where you don't need it. Because then you can
take that performance and trade it off for something you really do need.

Real world example, names omitted to protect the guilty:

A company hired me to add a feature to their software. It was a feature
that was inherently CPU intensive, and they felt that no matter how much
the feature was optimized, it would still bog their software down. My
job was to find some way to implement this feature without sacrificing
performance.

I realized that this feature, no matter how cleverly I implemented it,
would still cause CPU consumption to exceed what they could spare. So I
profiled their existing code.

I found two silly hotspots. One was a malloc/copy/free cycle for a
static string that they could have just used the original pointer for.
The other was a list traversal algorithm that could have used a map to
avoid the traversal. I fixed those two things, and performance went up
about 35% overall.
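
(The actual code can't be reproduced here, but the first fix was of roughly
this shape; every name below is invented for illustration:)

    #include <stdlib.h>
    #include <string.h>

    extern void write_log(const char *msg);  /* does not keep the pointer */

    /* Before: a needless malloc/copy/free on every call. */
    void log_event_slow(const char *static_msg)
    {
        char *copy = malloc(strlen(static_msg) + 1);
        if (copy == NULL)
            return;
        strcpy(copy, static_msg);
        write_log(copy);
        free(copy);
    }

    /* After: the string outlives the call anyway, so pass it through. */
    void log_event_fast(const char *static_msg)
    {
        write_log(static_msg);
    }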

That gave me more than enough spare CPU to implement the feature they
wanted. And now their software was faster and more responsive even if
that feature wasn't used.

Moral: Even if you don't need performance, don't trade it off too
quickly. You may be able to trade it off for something better down the
line.

DS

David Schwartz

unread,
Oct 14, 1999, 3:00:00 AM10/14/99
to

Jonathan Perret wrote:

> Two remarks :
> 1) Ever programmed in a real-time environment ? "This" becomes so much closer
> to the beaten track...

It really depends what you mean by real-time. For hard real time, none
of the techniques being discussed are really appropriate. I'm talking
about the 95% case where people are writing multithreaded servers.

> 2) With the concessions you made about degenerating a pool model into a thread-
> per-client model, would you mind if one considered that instead, a thread-per-
> client model can easily degenerate into a pool model, for example if no more
> threads can be created ? Isn't that (more or less) equivalent (assuming decent
> thread creation times, of course) ?

It's generally not possible to do that. If you have a thread-per-client
model, you generally have no way for a thread to 'switch' clients.
That's pretty much what defines that model.

> > You've actually got it entirely in reverse. The application coder knows
> >where the sweet spots are. The kernel (or a user-space threads library)
> >has no clue. It could put the yield right after you locked a critical
> >mutex -- how does it know?
>

> How often would you check the queue in, say, a Perl interpreter ?
> After every command ? Every 10 commands ?

That depends upon a lot of factors. Since checking the queue is nearly
free, you can do it after every command. (I'd basically just check a
flag that was set by the 'put job on queue' function.)
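
A sketch of that flag, with invented names throughout (volatile was the idiom
of the day; a modern version would use a proper atomic):

    #include <pthread.h>

    struct job;
    struct task;
    extern void enqueue(struct job *j);
    extern int  has_next_command(struct task *t);
    extern void run_one_command(struct task *t);
    extern void requeue_remainder(struct task *t);

    static volatile int    jobs_waiting = 0;
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

    void put_job_on_queue(struct job *j)
    {
        pthread_mutex_lock(&qlock);
        enqueue(j);
        jobs_waiting = 1;  /* cheap hint; a worker clears it when
                              the queue drains (not shown) */
        pthread_mutex_unlock(&qlock);
    }

    void run_task(struct task *t)
    {
        while (has_next_command(t)) {
            run_one_command(t);        /* one command = one subtask  */
            if (jobs_waiting) {        /* nearly free: a single load */
                requeue_remainder(t);  /* yield to the waiting job   */
                return;
            }
        }
    }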

I'm trying to imagine the scenario, though. You have a multithreaded
perl interpreter -- what does it do? Run multiple perl commands
simultaneously -- why?

If we're assuming you're CPU limited, you can do no better than one
thread per CPU. Starting requests earlier only to take longer to get the
results is no help. And if we're not assuming you're CPU limited, then
the moment we finish using the CPU is when I'd check.

> How would you split a prime number (oops, again) generator ?

I wouldn't. I'd finish generating the number. It does me no good to try
to generate 30 primes in parallel on a 4-processor machine. _Why_ would
you split a prime number generator? Under any architecture?

> Apart from trivially split tasks I can't think of a case where this
> doesn't involve some head-scratching.

Fortunately, splitting tasks is generally pointless. If you're CPU
bound, splitting tasks just wastes CPU with 'pick up put down' that gets
nothing done any faster. If you're not CPU bound, you are already split
at whatever you are waiting for.

> This isn't the worst of my problems with that approach, though.
> My biggest gripe is that I clearly remember doing that and scratching my head.
> That was back in Win16 and DOS days, when the only way to have a background
> task was to implement it as a resumable, hopefully fast, routine.
> Back then taking longer than expected in such a task meant stalling the whole
> system.
> Nowadays, I just spawn a low-priority thread to check on the config files, a
> high-priority thread to feed the audio DACs, and I write half as much code.
> I am not going back.

Fine. Special threads for truly special tasks is fine.

> >> In fact, you seem to have so little trust in the OS that you are willing
> >> to reimplement in user-space what the kernel is supposed to do.
> >
> > Of course! My rule of thumb is that _ANYTHING_ that can be done by the
> >application should be. Any time you change levels, you incur penalties.
> >And the kernel may not know what you want.
>

> Well you have what one could call a vertical approach to software engineering.
> If it works for you, fine. I'd rather take the penalties and not worry about
> the implementation. The industry seems to be moving this way, too.
> Not that I'm particularly proud to be (what's an idiomatic expression for that
> - "bleating with the sheep" ?), mind you.

The problem is, too many implementations of too many things are too
badly broken.

> >> In my own experience though, relying on the kernel for scheduling has
> >> worked satisfactorily and allowed me to concentrate on the higher-level
> >> design of the program instead of the nitty-gritty of multithreading.
> >
> > I did the nitty-gritty once. Now I have a library that does the
> >nitty-gritty for me. Now I get the maximum possible performance without
> >even having to think about how many threads I'm using or whatnot.
>

> So you admit to some encapsulation. Do you think an OS could/should provide
> such services as your library offers ?

No. That would defeat the point. The whole point is that at the
application level I can do things right. If it were part of the kernel,
I'd lose that.

The one thing I'd like the kernel to do that it doesn't do is a sort of
'wait for anything' function, like NT has. That way I could put all my
threads in one big loop.
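
On NT that is WaitForMultipleObjects; a rough, illustrative Win32 loop that
waits on both a job-queue semaphore and a socket event in one call (error
handling omitted) might look like:

    #include <winsock2.h>
    #include <windows.h>

    void worker_wait_loop(HANDLE job_sem, SOCKET s)
    {
        /* A WSAEVENT is a HANDLE, so it can sit in the same wait. */
        WSAEVENT net_ev = WSACreateEvent();
        WSAEventSelect(s, net_ev, FD_READ | FD_CLOSE);

        for (;;) {
            HANDLE handles[2];
            DWORD  which;

            handles[0] = job_sem;  /* released once per queued job */
            handles[1] = net_ev;   /* set when the socket is ready */

            which = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
            if (which == WAIT_OBJECT_0) {
                /* a job is available: dequeue and run it */
            } else if (which == WAIT_OBJECT_0 + 1) {
                WSAResetEvent(net_ev);
                /* service the readable (or closed) socket */
            }
        }
    }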

> >> I am fully aware that some performance is being "wasted" that way.
> >> Then again, if I *really* wanted to extract the last ounce of
> >> performance from a system I'd program it entirely in assembly language.
> >> For me, it's just another one of those tradeoffs.
> >
> > The one tradeoff you should spend the most time on is the choice of
> >implementation architectures. I have never found that you gain anything
> >by dedicating threads. (With the obvious exceptions like a single
> >network I/O thread, a clock manager thread, a timer firing thread, a UI
> >thread, etcetera.)
>

> Funny. As a matter of fact this is what I mostly use threads for.
> Glad to see you admit *these* exceptions, because they are the rule for me.

Then you don't have the generic 'server'. That's really what I'm
talking about -- implementing multithreaded servers. Special purpose
tasks that don't have a 'bunch of clients' may wind up being almost all
'special purpose' code.

> > Sooner or later, you'll have to start writing programs that can scale.
> >Otherwise, your employers will be hiring me to fix your code. (Unless
> >you aren't writing programs where performance and resource consumption
> >are super important. In which case, yes, trade off performance for ease
> >of implementation.)
>

> Indeed, I'm not in the business of writing programs that can scale.
> Unless you call opening 10 UI windows simultaneously "scaling",
> of course...
> I hope this does not make me unwelcome on this ng :)

*laugh* Of course not. Everybody has to face a different set of
problems. And I guess I do sometimes forget to qualify my statements or
state my assumptions. I'm generally assuming a sort of 'client-server
model' where we are talking about the design of a multithreaded server.

> > Real world example, names omitted to protect the guilty:

> [snip]
>
> Thank you for the entertaining anecdote ;)
> Point well taken. I guess it all comes down to the many application domains
> where multithreading is used. What makes sense for an enterprise-scale server
> does not necessarily hold true for a snappy GUI application.

Agreed. Threads are great for GUI applications, but they are used very
differently. Similarly, threads in a client application can be very
handy, but again for somewhat different reasons. If you don't have some
number of 'things to do' that can increase somehow, there may not be
much use for a thread pool.

> Hey, I know how (and when) to use a profiler though ;-)

That's the important thing. Figure out the right things to optimize and
optimize them. Pick good architectures, good models, don't do anything
stupid, and then don't _EVER_ forget to profile.

DS

Jonathan Perret

unread,
Oct 15, 1999, 3:00:00 AM10/15/99
to
dav...@webmaster.com (David Schwartz) wrote :

> Yes, for the rare degenerate case of a mix of tasks where some of them
>are CPU intensive and it's more important to start a new task than
>finish a current one. This is a rare case, and it's usually the only
>case where this is desirable.
>
> Even then, it may not be necessary. As I've said, a thread pool can
>always degenerate into a thread-per-client if you happen to hit the
>situation where you really do need a lot of threads to avoid this kind
>of rescheduling and the rescheduling is actually difficult to do.
>
> The point is, this is so far off the beaten track that it's basically
>irrelevant.

Two remarks :

1) Ever programmed in a real-time environment ? "This" becomes so much closer
to the beaten track...

2) With the concessions you made about degenerating a pool model into a thread-
per-client model, would you mind if one considered that instead, a thread-per-
client model can easily degenerate into a pool model, for example if no more
threads can be created ? Isn't that (more or less) equivalent (assuming decent
thread creation times, of course) ?

>> I don't think inserting rescheduling points in code can be described as
>> trivial. Most of the time you will either 1) do it too frequently and
>> hurt performance, or 2) not do it frequently enough and hurt latency.
>> There might be a "sweet spot" where the strategy yields better
>> performance than kernel scheduling, but it is difficult to find - even
>> more so when you can't predict how much time is going to elapse between
>> two rescheduling points (because it depends on the data).
>
> You've actually got it entirely in reverse. The application coder knows
>where the sweet spots are. The kernel (or a user-space threads library)
>has no clue. It could put the yield right after you locked a critical
>mutex -- how does it know?

How often would you check the queue in, say, a Perl interpreter ?
After every command ? Every 10 commands ?

How would you split a prime number (oops, again) generator ?

Apart from trivially split tasks I can't think of a case where this
doesn't involve some head-scratching.

This isn't the worst of my problems with that approach, though.
My biggest gripe is that I clearly remember doing that and scratching my head.
That was back in Win16 and DOS days, when the only way to have a background
task was to implement it as a resumable, hopefully fast, routine.
Back then taking longer than expected in such a task meant stalling the whole
system.
Nowadays, I just spawn a low-priority thread to check on the config files, a
high-priority thread to feed the audio DACs, and I write half as much code.
I am not going back.

>> In fact, you seem to have so little trust in the OS that you are willing
>> to reimplement in user-space what the kernel is supposed to do.
>
> Of course! My rule of thumb is that _ANYTHING_ that can be done by the
>application should be. Any time you change levels, you incur penalties.
>And the kernel may not know what you want.

Well you have what one could call a vertical approach to software engineering.
If it works for you, fine. I'd rather take the penalties and not worry about
the implementation. The industry seems to be moving this way, too.
Not that I'm particularly proud to be (what's an idiomatic expression for that
- "bleating with the sheep" ?), mind you.

>> In my own experience though, relying on the kernel for scheduling has
>> worked satisfactorily and allowed me to concentrate on the higher-level
>> design of the program instead of the nitty-gritty of multithreading.
>
> I did the nitty-gritty once. Now I have a library that does the
>nitty-gritty for me. Now I get the maximum possible performance without
>even having to think about how many threads I'm using or whatnot.

So you admit to some encapsulation. Do you think an OS could/should provide
such services as your library offers ?

>> I am fully aware that some performance is being "wasted" that way.
>> Then again, if I *really* wanted to extract the last ounce of
>> performance from a system I'd program it entirely in assembly language.
>> For me, it's just another one of those tradeoffs.
>
> The one tradeoff you should spend the most time on is the choice of
>implementation architectures. I have never found that you gain anything
>by dedicating threads. (With the obvious exceptions like a single
>network I/O thread, a clock manager thread, a timer firing thread, a UI
>thread, etcetera.)

Funny. As a matter of fact this is what I mostly use threads for.
Glad to see you admit *these* exceptions, because they are the rule for me.

> Sooner or later, you'll have to start writing programs that can scale.
>Otherwise, your employers will be hiring me to fix your code. (Unless
>you aren't writing programs where performance and resource consumption
>are super important. In which case, yes, trade off performance for ease
>of implementation.)

Indeed, I'm not in the business of writing programs that can scale.
Unless you call opening 10 UI windows simultaneously "scaling",
of course...
I hope this does not make me unwelcome on this ng :)

> Real world example, names omitted to protect the guilty:
[snip]

Thank you for the entertaining anecdote ;)
Point well taken. I guess it all comes down to the many application domains
where multithreading is used. What makes sense for an enterprise-scale server
does not necessarily hold true for a snappy GUI application.

Hey, I know how (and when) to use a profiler though ;-)

Cheers,
--Jonathan
