
Multithreading underlies new development paradigm

Paul Morrison

Jan 30, 2000
As Bil Lewis says, multithreading is going to become more and more
important in the IT world. While most of the discussion threads I have
been reading talk about its importance from the point of view of
performance and resource utilization, my work over the last 30 years
strongly indicates that it is one of the basic keys to improving the
application development process, application maintainability and reuse.
The key paradigm change required is a switch from thinking procedurally
to thinking of the application in terms of data streams and their
transforms, where the transforms are implemented as asynchronous
threads.

Rather than go through all the arguments here, I would like to point
interested readers at my home page,
http://www.e-smith.org/people/paul/fbp. I have also written a book
called "Flow-Based Programming: A New Approach to Application
Development", which came out in 1994. It can be obtained via
Amazon.com.

I would love to make contact with people interested in this area,
whether in business or academe.

Paul Morrison

Greg Lindahl

Jan 30, 2000
Paul Morrison <paul.m...@home.com> writes:

> While most of the discussion threads I have
> been reading talk about its importance from the point of view of
> performance and resource utilization, my work over the last 30 years
> strongly indicates that it is one of the basic keys to improving the
> application development process, application maintainability and reuse.

And then there are the people who have discovered that it wastes
money, time, and is generally a big disaster. So, like most software
topics, the answer is unclear.

> The key paradigm change required is a switch from thinking procedurally
> to thinking of the application in terms of data streams and their
> transforms, where the transforms are implemented as asynchronous
> threads.

Sounds like you have confused macro dataflow with multithreading. They
are separate concepts.

-- g

Alireza Moshtaghi

Jan 30, 2000
Hi

I am relatively new to the area of multithreading. Indeed, I have started
my masters last semester. My area of interest is testing multithreaded
applications, especially in shared-memory environments like pthreads; but I'm
still not sure what topic to choose. Maybe you can help me find my way
around.

thanx
ali

Paul Morrison wrote:

> As Bil Lewis says, multithreading is going to become more and more
> important in the IT world. While most of the discussion threads I have
> been reading talk about its importance from the point of view of
> performance and resource utilization, my work over the last 30 years
> strongly indicates that it is one of the basic keys to improving the
> application development process, application maintainability and reuse.
> The key paradigm change required is a switch from thinking procedurally
> to thinking of the application in terms of data streams and their
> transforms, where the transforms are implemented as asynchronous
> threads.

Bo Sanden

Jan 31, 2000
Ali,

You will find a lot of material on practical multi-threading at
artemis.coloradotech.edu/~bsanden. It's primarily about the design of
multi-threaded apps, with numerous examples. Perhaps it can give you an idea.

Bo Sanden
Colorado Tech.

Bo Sanden

Jan 31, 2000
> And then there are the people who have discovered that it wastes
> money, time, and is generally a big disaster. So, like most software
> topics, the answer is unclear.

I'd say, if threading turns out to be a disaster, it wasn't done right, or it
was used for no good reason. Nothing messes up a program as much as adding a
few threads here and there as you go. I have worked with concurrent apps for
many years, successfully, I like to think. With threads, careful up-front
design pays off, if you will excuse the software-engineering motherhood.

Bo Sanden
Colorado Tech
artemis.coloradotech.edu/~bsanden


Patrick TJ McPhee

Feb 1, 2000
In article <38952283...@eb.uah.edu>,
Alireza Moshtaghi <ali...@eb.uah.edu> wrote:

% I am relatively new to the area of multithreading. Indeed, I have started
% my masters last semester. My area of interest is testing multithreaded
% applications, especially in shared-memory environments like pthreads; but I'm
% still not sure what topic to choose. Maybe you can help me find my way
% around.

The important thing to remember in threading, as in life, is to think
globally, but act locally.
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Roger A. Faulkner

Feb 1, 2000

This is a bit off-topic, but I feel impelled to reply.

Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
concurrency by creating the UN*X process model, a simple model
with the advantage that (unless you rely on external stimuli
like signals and such) execution is reproducible from run to run,
that is to say, it is debuggable. Given the same input, it will reach
the same state, even if that state is a broken state. You can
run the program over and over until you figure out the problem.

They knew that computers, with hardware interrupts and everything
else going on, were cursed with the problem of concurrency. They
created an OS that dealt with that concurrency and that shielded
user-level programs from that same concurrency.

Now years later we say, "Thanks a lot, but we want concurrency
back. We are big boys and girls now and we can deal with it."

I hate dealing with concurrency. This even though I work on the
operating system with all its locking issues and work on libthread
in Solaris as well. The work is delicate to say the least, and
requires enormous concentration on the details of whatever locking
strategy is employed. Anyone who wants to do this sort of work just
because it is neat and whizzy is crazy.

My advice to anyone considering multithreaded programming is, don't.
Find a simpler way to accomplish your task and enjoy life. If the
demands of the task require multithreading, then meditate a lot and
prepare yourself for a difficult task. Do not deceive yourself by
thinking it will be easy.

You will be faced with mysterious race-condition bugs that only
occur when someone runs your program on a machine under heavy load.
You will tear your hair. You will scream. You will resort to closing
bug reports as "cannot reproduce". If you feel strongly about providing
a high quality product, you will have a nervous breakdown. If you don't
care at all, your company will fail and you will be out of a job.

These are the reasons I hate multithreading.
I'm also good at it.
It's a love/hate relationship.

Roger Faulkner
roger.f...@sun.com

Zhihui Zhang

Feb 1, 2000, to Roger A. Faulkner
On 1 Feb 2000, Roger A. Faulkner wrote:

Your reply is wonderful. No pain, no gain. If you want to use threads,
prepare for hard work plus risk.

-Zhihui

--
--------------------------------------------------
| Zhihui Zhang, http://cs.binghamton.edu/~zzhang |
| Dept. of Computer Science, SUNY at Binghamton |
--------------------------------------------------


Bill Todd

Feb 1, 2000
It's a bit alarming to think that you concentrate on details only when
writing multi-threaded code - it would make me reluctant to use anything else
you created, I think.

Perhaps it's because I started coding in an asynchronous environment before
the concept of intra-process threads was popular (for all I know, before it
was invented), but I really don't find them as scary as you suggest. Nor
explicit asynchrony, for that matter. Guess it's mostly a question of what
you've become used to, not unlike other non-trivial software constructs.

That said, the rest of your advice is sound: concurrency (of whatever kind)
*does* introduce significant complexity of a type that is absent in strictly
serial execution, it takes a while to learn how to handle it right, and
using it without cause is not a good idea (for the reasons you suggest).

But I'd much rather have the choice when it's appropriate - which may be why
I've always considered Unix a toy (though it is starting to offer more such
choices than it used to). Just consider the number of applications over
time that had to use a zillion processes to handle, at significantly lower
efficiency, the job that a single multi-threaded process could have managed
and you'll understand why threads came to Unix, albeit belatedly.

- bill

Roger A. Faulkner <r...@sunraf.Sun.COM> wrote in message
news:87698j$ma4$1...@engnews2.Eng.Sun.COM...

Zalman Stern

Feb 1, 2000
In comp.arch Roger A. Faulkner <r...@sunraf.sun.com> wrote:
: Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
: concurrency by creating the UN*X process model, a simple model
: with the advantage that (unless you rely on external stimuli
: like signals and such) execution is reproducible from run to run,
: that is to say, it is debuggable. Given the same input, it will reach
: the same state, even if that state is a broken state. You can
: run the program over and over until you figure out the problem.

Ever seen an intermittent pipe deadlock? Race conditions on file creation
between two processes? Stupid bugs to be sure and usually easier to debug
than multithreading bugs, but they do exist.

Yes, non-concurrent programs are a hell of a lot easier to write and debug
than concurrent ones. And if you can avoid concurrency, all the better. But
if you have a problem that naturally requires concurrency, or can gain a
lot of performance by using it, then you are much better off using good
tools like independent execution entities with mutual exclusion and
synchronization primitives. It is wonderful that UNIX made this need rare,
but it's a shame that when the need came up, traditional UNIX fell flat far
too often. (And the pat UNIX geek answer of "Don't do that and don't worry
about the performance" is just galling to someone trying to get their job
done.)

Another viewpoint is that the world has changed somewhat and there are just
too many high latency things in the world to be able to write user
responsive programs without threading. My experience interacting with
programs on UNIX suggests that there's much truth in this. (E.g. waiting for
a DNS server timeout. Though Joe Keane sent me mail a while back about how
you should be able to handle that in a select loop. And he's probably
right. We could argue for years whether making everything a file descriptor
is easier than threading though :-))
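
(For reference, the select-loop version of a bounded wait looks roughly
like the following -- a sketch only, assuming a connected socket descriptor,
with error handling abbreviated:)

    /* Wait up to 5 seconds for fd to become readable instead of
       blocking indefinitely in read(). */
    #include <sys/select.h>
    #include <sys/time.h>

    int wait_readable(int fd)
    {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = 5;                  /* our timeout, not the server's */
        tv.tv_usec = 0;
        /* returns >0 if readable, 0 on timeout, <0 on error */
        return select(fd + 1, &rfds, NULL, NULL, &tv);
    }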

-Z-

Zalman Stern

Feb 1, 2000
In comp.arch Roger A. Faulkner <r...@sunraf.sun.com> wrote:
: These are the reasons I hate multithreading.
: I'm also good at it.
: It's a love/hate relationship.

One more thing: my experience is that you rarely debug a multithreaded
program to correctness. You either get it right by design, or rewrite it
until it is correct by design. I've seen cases where the rewriting step
amounted to removing concurrency because the programmer couldn't get it to
work otherwise.

I've also noticed this in the many times I've dealt with people learning
to program with threads. If it has a race condition, they start trying to
figure out what the program is doing by executing it a whole bunch with
minor perturbations, etc. Down that path lies madness. If at all possible,
concentrate on the code and its logic. Make sure you are using good
primitives for the task. (E.g. I love the DEC SRC/Modula 3/pthreads style
atomic unlock and wait on a condition as it eliminates sleep wakeup races
with only a very small amount of programming discipline. And few bugs piss
me off more than having a program which is in a perfectly correct state
except that it's asleep. There's no "handle" to figure out what went
wrong...)
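
(For anyone who hasn't seen that primitive: a minimal sketch of the pattern
in pthreads. The names and the data_ready flag are illustrative, not from
any particular program.)

    #include <pthread.h>

    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
    static int data_ready = 0;

    void consumer(void)
    {
        pthread_mutex_lock(&lock);
        /* pthread_cond_wait() releases the mutex and sleeps atomically,
           so a signal sent between the test and the sleep cannot be lost. */
        while (!data_ready)             /* re-test: wakeups can be spurious */
            pthread_cond_wait(&ready, &lock);
        /* ... consume, holding the lock ... */
        data_ready = 0;
        pthread_mutex_unlock(&lock);
    }

    void producer(void)
    {
        pthread_mutex_lock(&lock);
        data_ready = 1;
        pthread_cond_signal(&ready);    /* signal while holding the lock */
        pthread_mutex_unlock(&lock);
    }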

(As an aside, part of the problem with traditional UNIX kernels is they
were not written with internal threading in mind. They had a very simple
concurrency model and as demands forced using more complex ones, all hell
broke loose. A thousand hacks bloomed. So I'm not surprised a UNIX kernel
programmer has had bad experiences with threading. Been there, done that.)

Time-stamped event logging, a la John Carman's suggestion a while back, is
a big win too. This implies having a good central place to put the logging,
but if you're dealing with a concurrent system, you probably want to
identify the "central place" (or very few places) very early on...

And this is just talking about correctness. Many a threaded program works
fine but runs slower than what it's trying to replace. And traditional
profiling techniques don't work very well at all.

-Z-

Phil McRevis

Feb 1, 2000
[Please do not mail me a copy of your followup]

"Gerald Hilderink" <g.h.hi...@el.utwente.nl> spake the secret code
<87840u$f6g$1...@dinkel.civ.utwente.nl> thusly:

>[C.A.R. Hoare's Communicating Sequential Processes discussion]

This makes me wonder -- does anyone have a CSP implementation built on
pthreads?
--
http://www.xmission.com/~legalize Legalize Adulthood!
lega...@xmission.com
``Ain't it funny that they all fire the pistol, at the wrong end of the
race?'' --PDBT <URL: http://www.eden.com/~thewho>

Gerald Hilderink

Feb 2, 2000
It sounds to me that those who want to avoid concurrency because it adds
more trouble (complexity) than necessary are using the wrong paradigm.
Concurrency is natural and should be applied when required. Believe me,
concurrency reduces complexity and may increase the performance of your
application. However, a side effect of concurrency is the danger of
deadlock, livelock and starvation. By choosing the CSP (Communicating
Sequential Processes) paradigm you learn to reason about the concurrent
behavior of processes and to avoid deadlock, livelock and starvation both at
design time and at implementation time. CSP is a theoretical foundation for
describing and verifying the behavior of concurrent systems. For example, at
the University of Kent (UK) and at the University of Twente (NL), CSP
packages are being developed for Java. CSP for Java hides all dangerous
low-level multithreading aspects from the programmer by providing abstract
constructs that simplify reasoning about the behavior of the program. These
high-level constructs put, for example, debugging, priority inversion,
synchronization, communication and interrupt handling in a totally different
perspective that is simpler to understand and to implement, because it is
well-defined. The performance of the software may even become better.
Furthermore, people do not need to learn the theory of CSP to program
reliable real-time and concurrent software.

See http://www.rt.el.utwente.nl/javapp

Gerald Hilderink


Bruce Hoult

Feb 2, 2000
In article <877om8$97q$5...@nntp4.atl.mindspring.net>, Zalman Stern
<zal...@netcom8.netcom.com> wrote:

> One more thing: my experience is that you rarely debug a multithreaded
> program to correctness. You either get it right by design, or rewrite it
> until it is correct by design. I've seen case where the rewriting step
> amounted to removing concurrency because the programmer couldn't get it to
> work otherwise.

I agree.

For the majority of uses of threading, I find that co-operative threads
(aka coroutines, as in Modula-2) are more than adequate for what is needed
and a whole heck of a lot more reproducible and debuggable than preemptive
threads. And subject to a lot fewer race conditions.
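
(To make that concrete: on systems with the ucontext primitives, a pair of
cooperative threads can be sketched as below. Illustrative only -- not
Modula-2, and the names are mine. Because the only switch points are the
explicit swapcontext() calls, the interleaving is fixed and every run is
reproducible.)

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;

    static void coroutine(void)
    {
        int i;
        for (i = 0; i < 3; i++) {
            printf("coroutine step %d\n", i);
            swapcontext(&co_ctx, &main_ctx);    /* explicit yield */
        }
    }

    int main(void)
    {
        static char stack[64 * 1024];
        int i;

        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = stack;
        co_ctx.uc_stack.ss_size = sizeof stack;
        co_ctx.uc_link = &main_ctx;             /* where to go when it returns */
        makecontext(&co_ctx, coroutine, 0);

        for (i = 0; i < 3; i++)
            swapcontext(&main_ctx, &co_ctx);    /* resume the coroutine */
        return 0;
    }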

-- Bruce

Bill Todd

Feb 2, 2000
Seems like any non-preemptive mechanism would have a tough time taking
advantage (in a single process) of multiple processors, though (which would
certainly be true of Modula-2 coroutines if they are defined the same way
coroutines were back in the '70s). I'll admit that the need for this may be
somewhat rare, but if you *do* need it (like, for a database server
implementation) the only other option (save for the zillion-process
approach) is to move the whole damn thing into the kernel - and if you think
application-level multi-threading is messy...

- bill

Bruce Hoult <bruce...@pobox.com> wrote in message
news:brucehoult-02...@bruce.bgh...

Zalman Stern

Feb 2, 2000
In comp.arch Bruce Hoult <bruce...@pobox.com> wrote:
: For the majority of uses of threading, I find that co-operative threads
: (aka coroutines, as in Modula-2) are more than adequate for what is needed
: and a whole heck of a lot more reproducible and debuggable than preemptive
: threads. And subject to a lot fewer race conditions.

Yes, it is easier in some sense. And it may even be adequate for many
programs, but I don't think there's enough of a win to rely on
coroutines. Most non-preemptive environments force you to bang your head
against another problem: where do you put explicit preemptions, or which
low-level routines do you allow to preempt internally? This is especially
important if you're relying on preemptions not happening in sequences of code
that are not otherwise mutually exclusive. (An example of this is the
traditional UNIX kernel design. Large swaths of code require that there be
no preemption. This code was screwed bigtime when symmetric multiprocessing
rolled around.)

Using nonpreemption guarantees to avoid explicit mutual exclusion is a
really bad practice and should be avoided. At which point, a well-written
program will run fine with preemptive threading, though a program that is
correct under preemptive multithreading may deadlock due to missing
preemptions in a coroutine environment.

(An analogy for the style issue would be indirecting a Macintosh handle and
using the interior pointer across a number of statements without locking
the handle. People do this, and it tends to cause bugs when someone adds a
memory-moving operation to a routine that is called in one of the
statements that is covered in "the shadow" of the handle use. The same
thing can happen with thread preemption. A Windows-style design
where you have to lock the handle to get the pointer is much more robust
and allows preemption at any point as well.)

-Z-


Jan Vorbrueggen

Feb 2, 2000
Zalman Stern <zal...@netcom8.netcom.com> writes:

> One more thing: my experience is that you rarely debug a multithreaded
> program to correctness. You either get it right by design, or rewrite it
> until it is correct by design.

Agreed.

> (As an aside, part of the problem with traditional UNIX kernels is they
> were not written with internal threading in mind. They had a very simple
> concurrency model and as demands forced using more complex ones, all hell
> broke loose. [...])

Indeed. Having been raised on VMS, in which the kernel has always been
multithreaded, and having seen how (fairly) easily first asymmetric and then
symmetric multiprocessing was implemented for it, I almost fell off my
chair when I heard about Unix's "big kernel lock" approach, and of course
suffered through Solaris's gyrations as they tried to convert all those
hacks into functional code.

Jan

Bruce Hoult

Feb 2, 2000
In article <878kq2$4ep$1...@nntp3.atl.mindspring.net>, Zalman Stern
<zal...@netcom12.netcom.com> wrote:

> Using nonpreemption guarantees to avoid explicit mutual exclusion is a
> really bad practice and should be avoided. At which point, a well written
> program will run fine with preemptive threading

Yes, I agree. But if you do introduce the occasional bug, it will be a
lot easier to find under nonpreemption because it will be more repeatable.


> An analogy for the style issue would be indirecting a Macintosh handle and
> using the interior pointer across a number of statements without locking
> the handle. People do this, and it tends to cause bugs when someone adds a
> memory-moving operation to a routine that is called in one of the
> statements that is covered in "the shadow" of the handle use. The same
> thing can happen with thread preemption. A Windows-style design
> where you have to lock the handle to get the pointer is much more robust
> and allows preemption at any point as well.

That's not an analogy -- it's *exactly* the same issue. The Mac memory
manager is a mutator that runs as a coroutine to the main application
thread. The principle is a pretty general one: you can't depend on a
section of code having a particular property unless you know that each
primitive operation also has that property. I've always thought that
vendors of Mac compilers should have (conservatively) maintained
information on whether or not every function "moves memory". We know the
answer for toolbox routines (in the absense of trap patching) because IM
tells us, and the compiler can easily propogate this information to
callers. Exactly the same thing could be done for the property "possibly
calls yield()". It would even be cool if the compiler had a general
framework that allowed the programmer to define new properties :-)

The really conservative version of the principle is: always assume that
any function you call is evil.

Fortunately, for the question of caching Mac master pointers, an efficient
and yet conservative solution is very easy: *always* write the handle
dereference explicitly, i.e. as **p or p[0]->foo. An optimising compiler
will cache the master pointer automatically during CSE, but because of the
C aliasing rules will forget it as soon as you call any function.

That's not a general solution to all failures of preemption, unfortunately :-(
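
(In code, the conservative habit looks like this -- an illustrative fragment,
not real Toolbox source; MightMoveMemory() is a stand-in for any call that
may compact the heap:)

    typedef struct Rec { int foo; } Rec, *RecPtr, **RecHandle;
    extern void MightMoveMemory(void);

    void risky(RecHandle h)
    {
        RecPtr p = *h;          /* cached master pointer...          */
        MightMoveMemory();      /* ...now possibly stale             */
        p->foo = 1;             /* BUG: may write through a dangling
                                   pointer if the block moved        */
    }

    void safe(RecHandle h)
    {
        MightMoveMemory();
        (*h)->foo = 1;          /* re-dereference each time; CSE may still
                                   cache it, but the aliasing rules force the
                                   compiler to forget the cache across calls */
    }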

-- Bruce

Kostas Kostiadis

Feb 2, 2000

"Roger A. Faulkner" <r...@sunraf.Sun.COM> wrote in message
news:87698j$ma4$1...@engnews2.Eng.Sun.COM...

[snip]

> My advice to anyone considering multithreaded programming is, don't.
> Find a simpler way to accomplish your task and enjoy life. If the
> demands of the task require multithreading, then meditate a lot and
> prepare yourself for a difficult task. Do not deceive yourself by
> thinking it will be easy.
>

That is great advice... In fact, we can take this one step further
and also advise anyone considering single-threaded programming to
NOT do it!!! There are lots of problems there too.

> You will be faced with mysterious race-condition bugs that only
> occur when someone runs your program on a machine under heavy load.
> You will tear your hair. You will scream. You will resort to closing
> bug reports as "cannot reproduce". If you feel strongly about providing
> a high quality product, you will have a nervous breakdown. If you don't
> care at all, your company will fail and you will be out of a job.
>

All the above sounds like really crap design to me. It is just that crap
design is a lot easier to implement in a single-threaded program.
Also, crap implementations (based on crap design) are a lot easier
to debug in a single-threaded program.

So your advice would in fact be: If you can't design OR implement,
write single-threaded software. To that I agree 100%...
(However I think in that case you will be out of a job soon enough
anyway, or you never know....might end up working for MegaSoft :) )

> These are the reasons I hate multithreading.
> I'm also good at it.

You don't say....

K.


Terje Mathisen

Feb 2, 2000
Bruce Hoult wrote:
>
> In article <878kq2$4ep$1...@nntp3.atl.mindspring.net>, Zalman Stern
> <zal...@netcom12.netcom.com> wrote:
[snip]

> > thing can happen with thread preemption. A Windows-style design
> > where you have to lock the handle to get the pointer is much more robust
> > and allows preemption at any point as well.
>
> That's not an analogy -- it's *exactly* the same issue. The Mac memory
> manager is a mutator that runs as a coroutine to the main application
> thread. The principle is a pretty general one: you can't depend on a
> section of code having a particular property unless you know that each
> primitive operation also has that property. I've always thought that
> vendors of Mac compilers should have (conservatively) maintained
> information on whether or not every function "moves memory". We know the
> answer for toolbox routines (in the absence of trap patching) because IM
> tells us, and the compiler can easily propagate this information to
> callers. Exactly the same thing could be done for the property "possibly
> calls yield()". It would even be cool if the compiler had a general
> framework that allowed the programmer to define new properties :-)

Yes, PLEASE!

I have been looking for a clean way (or just any kind of way) to
implement this kind of meta-api:

An API which defines and propagates all the kinds of limitations and
dependencies of a piece of code. If I use my calendar algorithm as
an example: some way to tell all users of that code that integers have to
act as if using two's complement arithmetic, and must support at least
26-bit numbers.

Similarly, that the year-month-day to day-number algorithm assumes both
year and month to be valid, positive numbers, but that the day value
could have any value.

I believe Ada tried/tries to come closer to this, with explicit
specification of minimum ranges, and Pascal has a similar mechanism with
user-defined types.

In C, most everything ends up as a plain old number. :-(
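
(The closest cheap trick I know of in C is a compile-time assertion: it
checks the assumption at the point of use rather than propagating it, but
at least a violation breaks the build. A sketch; the macro name is mine.)

    #include <limits.h>

    /* A negative array size is a compile-time error, so the typedef
       fails to compile exactly when cond is false. */
    #define COMPILE_ASSERT(cond, name) \
        typedef char assert_##name[(cond) ? 1 : -1]

    COMPILE_ASSERT(INT_MAX >= 33554431, int_has_26_bits);  /* 2^25 - 1 */
    COMPILE_ASSERT((-1 & 3) == 3, twos_complement);        /* -1 is all ones */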

Terje

--
- <Terje.M...@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"

Kaz Kylheku

Feb 2, 2000
On Wed, 2 Feb 2000 00:31:20 -0500, Bill Todd <bill...@foo.mv.com> wrote:
>Seems like any non-preemptive mechanism would have a tough time taking
>advantage (in a single process) of multiple processors, though (which would

Cooperative threads only give you the one single advantage of threading,
and that is the simplification of design of a program which does things
that are nicely expressible using multiple threads of control.

But not only do you lose the ability to support multiple processing elements,
you also lose the ability to prioritize tasks. Computation is effectively
serialized, unless broken by explicit yields. The software cannot, e.g.,
respond to an asynchronous input and switch to a higher priority thread.

Cooperative threading is okay for batch processing, and graphical interfaces.

Also, traditional UNIX kernels are cooperatively threaded. When a process
enters the kernel, preemption is turned off until that process suspends
or leaves the kernel. That is, only user mode execution can be preempted.

Multiprocessing can be introduced into this kind of kernel. You end up with a
kind of hybrid threading which resembles cooperative threading but runs on
multiple processors. The Linux kernel is like this. The differences between
this and preemptive threading are obvious. For example, when you acquire a
critical region lock, it is a *processor* that owns it, rather than a
*process*. Your control flow is fixed on one processor; it's not possible for
the execution of kernel code to be pre-empted and moved to another processor;
only an explicit call which yields control to the scheduler can do this.

Andy Glew

Feb 2, 2000
Roger A. Faulkner <r...@sunraf.Sun.COM> wrote in message news:87698j$ma4$1...@engnews2.Eng.Sun.COM...

> Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
> concurrency by creating the UN*X process model, a simple model
> with the advantage that (unless you rely on external stimuli
> like signals and such) execution is reproducible from run to run,
> that is to say, it is debuggable. Given the same input, it will reach
> the same state, even if that state is a broken state. You can
> run the program over and over until you figure out the problem.

(1) UNIX also gave us pipes: <data filter1 | filter2 | filter3 >result,
which are a very special form of concurrency that is also
reproducible. Pipes are also a lot easier to program than
the non-piped code that accomplishes the same composition
of filters.
The original poster who started this thread was advocating
something like "data streams". For suitably configured stream
networks, this can also lead to reproducible programs.
Aside: non-reproducibility creeps in with select().

(2) While UNIX's simple model eliminated many concurrency
problems, UNIX also exposed many more filesystem based
concurrency problems by not providing any semblance of
file locking.
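
(POSIX did eventually standardize advisory record locking. For reference,
the fcntl() idiom -- a sketch with minimal error handling, and advisory
only: every cooperating process must take the same lock:)

    #include <fcntl.h>
    #include <unistd.h>

    /* Take an advisory write lock on the whole file, blocking until
       it is granted. */
    int lock_whole_file(int fd)
    {
        struct flock fl;

        fl.l_type   = F_WRLCK;     /* exclusive write lock */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;           /* 0 means "through end of file" */
        /* F_SETLKW blocks until granted; F_SETLK would fail instead */
        return fcntl(fd, F_SETLKW, &fl);
    }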

Kaz Kylheku

Feb 2, 2000
On Wed, 2 Feb 2000 10:23:44 -0600, Andy Glew <gl...@cs.wisc.edu> wrote:
>Roger A. Faulkner <r...@sunraf.Sun.COM> wrote in message news:87698j$ma4$1...@engnews2.Eng.Sun.COM...
>
>> Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
>> concurrency by creating the UN*X process model, a simple model
>> with the advantage that (unless you rely on external stimuli
>> like signals and such) execution is reproducible from run to run,
>> that is to say, it is debuggable. Given the same input, it will reach
>> the same state, even if that state is a broken state. You can
>> run the program over and over until you figure out the problem.
>
>(1) UNIX also gave us pipes: <data filter1 | filter2 | filter3 >result,
>which are a very special form of concurrency that is also
>reproducible. Pipes are also a lot easier to program than
>the non-piped code that accomplishes the same composition
>of filters.

The big reason for that is that the processes do not explicitly share memory,
and do not call into each other.

A similar level of ``debuggability'' can be achieved in threaded programs if
you confine threads to compartments within the code and have them communicate
by queuing messages.

Problem with such designs is that even though you may achieve decent
throughput, the latency of communicating across these compartments may be
unacceptable. That is to say, from the point of view of any single unit of data
being processed, a context switch is required to cross from one compartment to
another; the throughput is maintained because a lot of other data items may be
moved in the same context switch.

Joe Seigh

Feb 2, 2000
Kaz Kylheku wrote:
snip...

> Cooperative threading is okay for batch processing, and graphical interfaces.

So when the gui won't refresh for a long while, because the programmer
decided to do the database query on the event-handler thread, they can just
say it's not that they don't have a clue about multi-threaded programming,
but that they're doing cooperative threading. There's a *lot* of cooperative
threading going on in the gui programming world, apparently.

Joe Seigh

Maynard Handley

Feb 2, 2000
In article <3898062b.0@seralph9>, "Kostas Kostiadis" <kko...@essex.ac.uk> wrote:

>"Roger A. Faulkner" <r...@sunraf.Sun.COM> wrote in message
>news:87698j$ma4$1...@engnews2.Eng.Sun.COM...
>

I guess in the UK .ac world one gets to work with really smart programmers who
(a) know what they are doing and
(b) can simply walk away from a project that turns into a swamp.

In the commercial world one has to work with other programmers of limited
ability, and one does not have the option of walking away as a disaster
unfolds. The advice to avoid threads where one possibly can makes perfect
sense---it's no different from any other situation where doing things the
"cool way" will indeed make the code run 30% faster---and will also take
an additional year to code and debug, by which point the extra speed is a
moot point.
Kostas has obviously been lucky enough to work only on well-designed
projects with smart people. I've had to work on projects using threads in
a somewhat ad hoc fashion (as sometimes happens in the real world: the
1.0 version is well designed, but mission creep in version 2.0 and up
complicates matters), and it's not as much fun as some people might think.

Maynard

Zalman Stern

Feb 2, 2000
In comp.arch Andy Glew <gl...@cs.wisc.edu> wrote:
: Aside: non-reproducibility creeps in with select().

Actually, you can get race conditions simply by having an incorrectly
written program that both writes to a pipeline and reads from the end of
it. (Due to finite buffering.) In order to handle that case you either need
select, non-blocking I/O, asynch event handlers, or threads.
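
(The classic shape of that bug, for the record -- a sketch; the fork/exec
plumbing that sets up the two pipe descriptors is omitted:)

    #include <unistd.h>

    /* Feed a child filter all the input, then read the result back.
       If the child fills its output pipe before we finish writing,
       both ends block in write() forever: a deadlock that appears
       only when the data outgrows the pipe buffers. */
    void deadlock_prone(int to_child, int from_child,
                        const char *buf, size_t n)
    {
        write(to_child, buf, n);   /* may block once the pipe is full and
                                      the child is itself blocked writing
                                      output nobody is reading yet */
        close(to_child);
        /* ... only now do we start reading from from_child ... */
    }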

-Z-

Bill Todd

Feb 2, 2000
Using multi-threading or explicit asynchrony to make a program run 30%
faster is certainly questionable in the commercial world, for the reasons
you suggest. Using it to make a program run 300% or 3000% faster (which can
be the case - unless you use multiple cooperating processes instead, which
is hardly simpler), however, can be quite defensible indeed - though if it
takes you a year longer, you should not do it, because that's a clear
indication that you don't know what you're doing (though we all have to
learn somehow, so perhaps you can chalk it up to experience and then rewrite
it). And your management should be aware that such code must not
subsequently be handed over to trained monkeys to maintain and enhance:
there is indeed an ongoing cost to it, though with some discipline it need
not be excessive, and it can be considered offset by its development of new
people with competence in the area.

Maintaining someone else's poorly-organized code, or even just code that has
become crufty over years of revisions, is seldom fun, and concurrent
execution is only one aspect that's a problem (though if the concurrency is
not fairly cleanly compartmentalized such that its correctness can be
verified, it makes debugging hell).

- bill

Maynard Handley <hand...@ricochet.net> wrote in message
news:handleym-020...@handma3.apple.com...

Greg Alexander

Feb 2, 2000
In article <87698j$ma4$1...@engnews2.Eng.Sun.COM>, Roger A. Faulkner wrote:
>In article <3895D483...@acm.org>, Bo Sanden <bsa...@acm.org> wrote:
>>> And then there are the people who have discovered that it wastes
>>> money, time, and is generally a big disaster. So, like most software
>>> topics, the answer is unclear.
>>
>>I'd say, if threading turns out to be a disaster, it wasn't done right, or it
>>was used for no good reason. Nothing messes up a program as much as adding a
>>few threads here and there as you go. I have worked with concurrent apps for
>>many years, successfully, I like to think. With threads, careful up-front
>>design pays off, if you will excuse the software-engineering motherhood.
>
>This is a bit off-topic, but I feel impelled to reply.
>
>Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
>concurrency by creating the UN*X process model, a simple model
>with the advantage that (unless you rely on external stimuli
>like signals and such) execution is reproducible from run to run,
>that is to say, it is debuggable. Given the same input, it will reach
>the same state, even if that state is a broken state. You can
>run the program over and over until you figure out the problem.
>
>They knew that computers, with hardware interrupts and everything
>else going on, were cursed with the problem of concurrency. They
>created an OS that dealt with that concurrency and that shielded
>user-level programs from that same concurrency.
>
>Now years later we say, "Thanks a lot, but we want concurrency
>back. We are big boys and girls now and we can deal with it."
>
>I hate dealing with concurrency. This even though I work on the
>operating system with all its locking issues and work on libthread
>in Solaris as well. The work is delicate to say the least, and
>requires enormous concentration on the details of whatever locking
>strategy is employed. Anyone who wants to do this sort of work just
>because it is neat and whizzy is crazy.
>
>My advice to anyone considering multithreaded programming is, don't.
>Find a simpler way to accomplish your task and enjoy life. If the
>demands of the task require multithreading, then meditate a lot and
>prepare yourself for a difficult task. Do not deceive yourself by
>thinking it will be easy.
>
>You will be faced with mysterious race-condition bugs that only
>occur when someone runs your program on a machine under heavy load.
>You will tear your hair. You will scream. You will resort to closing
>bug reports as "cannot reproduce". If you feel strongly about providing
>a high quality product, you will have a nervous breakdown. If you don't
>care at all, your company will fail and you will be out of a job.
>
>These are the reasons I hate multithreading.
>I'm also good at it.
>It's a love/hate relationship.

I agree with what you have to say, but I suspect that in the future the
balance may be altered somewhat through automated verification systems. I
can provide you some citations if you want. There are tools in use and
development to explore all possible schedules (with some neat hacks to
vastly reduce the number of actually explored schedules) and check for
assertion violations to debug threaded code, and programs to check for
violations of locking disciplines, etc. A lot of bugs are going to be in
the design and will not be caught by these tools, but the tools sure do
help catch a lot of things that you would never have found through more
traditional debugging methods.

Kaz Kylheku

Feb 2, 2000

... or additional processes.

Usenet Poster Boy

Feb 2, 2000

r...@sunraf.Sun.COM (Roger A. Faulkner) writes:

> In article <3895D483...@acm.org>, Bo Sanden <bsa...@acm.org> wrote:
> >> And then there are the people who have discovered that it wastes
> >> money, time, and is generally a big disaster. So, like most software
> >> topics, the answer is unclear.
> >
> >I'd say, if threading turns out to be a disaster, it wasn't done
> >right, or it was used for no good reason. Nothing messes up a
> >program as much as adding a few threads here and there as you go. I
> >have worked with concurrent apps for many years, successfully, I
> >like to think. With threads, careful up-front design pays off, if
> >you will excuse the software-engineering motherhood.
>
> This is a bit off-topic, but I feel impelled to reply.
>
> Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
> concurrency by creating the UN*X process model, a simple model
> with the advantage that (unless you rely on external stimuli
> like signals and such) execution is reproducible from run to run,
> that is to say, it is debuggable. Given the same input, it will reach
> the same state, even if that state is a broken state. You can
> run the program over and over until you figure out the problem.
>
> They knew that computers, with hardware interrupts and everything
> else going on, were cursed with the problem of concurrency. They
> created an OS that dealt with that concurrency and that shielded
> user-level programs from that same concurrency.
>

Um, er, DMR et al invented some pretty powerful and useful OS
abstractions for Unix, but the process model wasn't one of them. And
although the original research editions didn't export a lot of
concurrency issues to users, there were still plenty of opportunities
to be bitten by concurrency problems, as anyone who ever edited the
password file concurrently with someone else can attest.

The single-thread-of-control per execution-context model, which
predates unix by a good 15 years, does not eliminate concurrency, it
merely makes it a nonissue within the thread of a single
execution-context.

> Now years later we say, "Thanks a lot, but we want concurrency
> back. We are big boys and girls now and we can deal with it."
>

I don't know about you, but I was writing concurrent applications when
I first encountered Unix in '79, and I'm still writing concurrent
applications 20 years later.

> I hate dealing with concurrency. This even though I work on the
> operating system with all its locking issues and work on libthread
> in Solaris as well. The work is delicate to say the least, and
> requires enormous concentration on the details of whatever locking
> strategy is employed. Anyone who wants to do this sort of work just
> because it is neat and whizzy is crazy.
>

On this we agree completely.

> My advice to anyone considering multithreaded programming is, don't.
> Find a simpler way to accomplish your task and enjoy life. If the
> demands of the task require multithreading, then meditate a lot and
> prepare yourself for a difficult task. Do not deceive yourself by
> thinking it will be easy.
>

It's not an all-or-nothing decision, and it can't always be avoided.
If you have a data file that needs to be accessed concurrently, you
have a concurrent application, and no amount of wishing will make it
not be one.

The answer isn't to stick your head in the sand, but rather to adopt
reasonable strategies that keep the concurrency issues well controlled
and well confined.

> You will be faced with mysterious race-condition bugs that only
> occur when someone runs your program on a machine under heavy load.
> You will tear your hair. You will scream. You will resort to
> closing bug reports as "cannot reproduce". If you feel strongly
> about providing a high quality product, you will have a nervous
> breakdown. If you don't care at all, your company will fail and you
> will be out of a job.
>

This can be true. Especially if you can't control the programming
discipline you have to work under.

> These are the reasons I hate multithreading.
> I'm also good at it.
> It's a love/hate relationship.
>

Concurrent programming is hard. If you do it with the raw primitives,
very few people know how to do it correctly. (Actually, I think
Leslie Lamport may be the only one. :) But we've got some techniques
that are well understood and let you use the concurrency sometimes.

Marty

Kostas Kostiadis

Feb 3, 2000

"Maynard Handley" <hand...@ricochet.net> wrote in message
news:handleym-020...@handma3.apple.com...
> In article <3898062b.0@seralph9>, "Kostas Kostiadis" <kko...@essex.ac.uk> wrote:
>
> I guess in the UK .ac world one gets to work with really smart programmers who
> (a) know what they are doing and
> (b) can simply walk away from a project that turns into a swamp.
>

(a) is definitely not the case...
(b) is no different than quitting your job...

> In the commercial world one has to work with other programmers of limited
> ability, and one does not have the option of walking away as a disaster
> unfolds. The advice to avoid threads where one possibly can makes perfect

It really depends who you work for, and does NOT apply to the entire
commercial world. If a company is "clever" enough to hire cheap, crap
programmers, then this company should be aware of the consequences.
A "disaster" does not just happen... Software projects that fail
(e.g. the London ambulance service) have a billion factors going wrong.
There are numerous studies as to why software projects fail. I've
never seen a single one that blames it on using multi-threaded code...
For a brief statistical analysis, try:
"The Computer Bulletin", BCS, January 2000 issue, pp. 24-26.
The article is titled "IT Projects: sink or swim".

> sense---it's no different from any other situation where doing things the
> "cool way" will indeed make the code run 30% faster---and will also take
> an additional year to code and debug, by which point the extra speed is a
> moot point.

There is NOTHING cool about using multiple threads. Believe me.
And if it takes you an additional year, you are definitely doing something
wrong. By thinking "concurrently", it is normally EASIER to design your
application.

> Kostas has obviously been lucky enough to work only on well-designed
> projects with smart people. I've had to work on projects using threads in
> a some-what ad hoc fashion (as sometimes happens in the real world, the
> 1.0 version is well designed but mission creep in version 2.0 and up
> complicate matters), and it's not as much fun as some people might think.
>

Kostas had to design his projects, which of course was disastrous at
first, but then you learn as you go along. How can you learn and advance if
you don't even try? Also, Kostas had to write clients for a server that he
was not responsible for developing, with requirements that kept
changing on a regular basis. Not to mention that Kostas is not any smarter
than you. In addition to that, please quit this "real world" crap, because it
doesn't make a good excuse. You have to deliver a software
package, by a given deadline, and you have a fixed amount of resources
at your disposal. So do I!!!

Cheers,
Kostas.


Gerald Hilderink

Feb 3, 2000

"Kaz Kylheku" <k...@ashi.footprints.net> wrote in message
news:slrn89gp8...@ashi.FootPrints.net...

> On Wed, 2 Feb 2000 10:23:44 -0600, Andy Glew <gl...@cs.wisc.edu> wrote:
> >Roger A. Faulkner <r...@sunraf.Sun.COM> wrote in message
> >news:87698j$ma4$1...@engnews2.Eng.Sun.COM...
> >
> >> Years ago, Ken and Dennis (Thompson and Ritchie) saved us from
> >> concurrency by creating the UN*X process model, a simple model
> >> with the advantage that (unless you rely on external stimuli
> >> like signals and such) execution is reproducible from run to run,
> >> that is to say, it is debuggable. Given the same input, it will reach
> >> the same state, even if that state is a broken state. You can
> >> run the program over and over until you figure out the problem.
> >
> >(1) UNIX also gave us pipes: <data filter1 | filter2 | filter3 >result,
> >which are a very special form of concurrency that is also
> >reproducible. Pipes are also a lot easier to program than
> >the non-piped code that accomplishes the same composition
> >of filters.
>
> The big reason for that is that the processes do not explicitly share
> memory, and do not call into each other.
>
> A similar level of ``debuggability'' can be achieved in threaded programs
> if you confine threads to compartments within the code and have them
> communicate by queuing messages.
>
> Problem with such designs is that even though you may achieve decent
> throughput, the latency of communicating across these compartments may be
> unacceptable. That is to say, from the point of view of any single unit
> of data being processed, a context switch is required to cross from one
> compartment to another; the throughput is maintained because a lot of
> other data items may be moved in the same context switch.

Nowadays, input/output streams are flexible, standardized and are typically
operating-system resources for many purposes, such as standard input,
standard output, disk access, pipes, filters, etc. Because they are
implemented for universal use, they can be unacceptably slow.
Fortunately, there exists a far more powerful communication concept:
channels.
Processes encapsulate their own process space and do not know the names of
other processes. Therefore, a process can never alter the state of another
process directly. This is a safe and secure concept. Processes can only
communicate by channels. Channels are closely related to streams, but
channels are different from streams in that they express communication
events, which is the basis for the theoretical foundation of CSP and other
theories. The benefit of these events is that we can reason about the
behaviour of software in terms of these events.
The channel connects processes and takes care of the data transfer (via
memory or via devices). The channel is one-way, unbuffered and provides a
rendezvous synchronization mechanism; in other words, a message can only be
sent if both the writer and reader are ready. If one of the processes is
ready and the other is not, then the first gets blocked until the second is
ready. When both processes commit to communication, the event occurs. It's
simple, and it may sound like a slow concept, but that is not the case. The
overhead is the same as for properly synchronizing shared data. Because this
synchronization (for example, using semaphores/monitors) is encapsulated
within the channel, it can be highly optimized. Channels are fast because
they operate directly on the scheduler and are directly responsible for
non-preemptive and preemptive scheduling. As another form of optimization,
on the good old transputer, for example, channels are implemented in
silicon: its read and write methods are simple instructions. Channels are
also found in several programming languages, such as occam, Handel-C and, I
believe, Ada, as language primitives. With channels, writing concurrent
programs becomes easy, believe me.

Channels pass objects, such as data primitives one at a time or a pool of
memory (e.g. a byte array) in DMA fashion. Buffering of data can be done at
the side of the writer and the buffer sent at once, or a buffering
mechanism that gathers the data can be implemented inside the channel.
Still, we can reason about our software in terms of communication events.
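
(To see how little is underneath, here is a minimal rendezvous channel
sketched on pthreads -- illustrative only, not the Twente implementation.
Initialize the mutex and condition with the usual PTHREAD_*_INITIALIZER
macros and full = 0.)

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        void *slot;
        int   full;       /* a writer has deposited a value */
    } channel_t;

    void chan_send(channel_t *c, void *msg)
    {
        pthread_mutex_lock(&c->lock);
        while (c->full)                     /* wait out a previous rendezvous */
            pthread_cond_wait(&c->cond, &c->lock);
        c->slot = msg;
        c->full = 1;
        pthread_cond_broadcast(&c->cond);   /* wake the reader */
        while (c->full)                     /* block until the reader takes it */
            pthread_cond_wait(&c->cond, &c->lock);
        pthread_mutex_unlock(&c->lock);
    }

    void *chan_recv(channel_t *c)
    {
        void *msg;
        pthread_mutex_lock(&c->lock);
        while (!c->full)                    /* block until a writer arrives */
            pthread_cond_wait(&c->cond, &c->lock);
        msg = c->slot;
        c->full = 0;
        pthread_cond_broadcast(&c->cond);   /* release the blocked writer */
        pthread_mutex_unlock(&c->lock);
        return msg;
    }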

Gerald Hilderink

Al Kossow

Feb 3, 2000
In article <87bn4c$236$1...@dinkel.civ.utwente.nl>, "Gerald Hilderink"
<g.h.hi...@el.utwente.nl> wrote:

> Channels are closely related to streams, but
> channels are different from streams in that they express communication
> events, which is the basis for the theoretical foundation of CSP and other
> theories.

And the papers/implementations for these 'channels' are where?

pd wilson

Feb 3, 2000

I would be interested in the citations you mention....

Thanks

Greg Alexander wrote:
>
>[trim]
>
> I agree with what you have to say, but I suspect that in the future the
> balance may be altered somewhat through automated verification systems. I
> can provide you some citations if you want. There are tools in use and
> development to explore all possible schedules (with some neat hacks to
> vastly reduce the number of actually explored schedules) and check for
> assertion violations to debug threaded code, and programs to check for
> violations of locking disciplines, etc. A lot of bugs are going to be in
> the design and will not be caught by these tools, but the tools sure do
> help catch a lot of things that you would never have found through more
> traditional debugging methods.

Remove the FNORD to email!

Gerald Hilderink

Feb 3, 2000

Al Kossow <a...@spies.com> wrote in message
news:aek-030200...@haxrus.apple.com...

Information about channel implementations for Java can be found at
http://www.rt.el.utwente.nl/javapp
http://www.cs.ukc.ac.uk/projects/ofa/jcsp/
At the University of Twente in the Netherlands (first link), C and C++
versions are being developed as well. The C version runs on a naked
TMS320F240 DSP processor, and the C++ version is used by students at the
control laboratory, on PCs under DOS, for building control applications. All
versions, including the Java version, are still beta stuff. Only the Java
version is public, for testing by students and researchers of other
universities/companies around the world.
At the University of Kent in the UK (second link) they have Java channels
supporting proper event handling of AWT processes. They have really nice
stuff for writing concurrent applets in Java. They describe some nice
examples of concurrent programming in Java in their class documentation.
With little effort, these channels and the other fundamental constructs
provided by these packages can be rewritten for C++ and pthreads. I would
like to see someone do this. It is a wonderful paradigm, in which you realize
that it is possible to use threads without directly programming with threads.
The difficult synchronization constructs (i.e., monitors, mutexes, semaphores,
signals, locks/unlocks, etc.) that pthreads offers are replaced by simple
and reliable patterns of channel communication. It is amazing that one can
think in terms of abstract primitives such as processes and channels while,
underneath, pthreads is doing the job. I am involved with the
stuff at Twente, where it is used primarily for programming high-quality
mechatronic/control software.


Andy Glew

Feb 3, 2000
A few comments about J. Paul Morrison's paper
"Flow-Based Programming":

(0) I think I have read this before - maybe the conference
version, or perhaps I encountered the web page
somewhere in my wanderings. Anyway, I liked
it then, and I like it now.

(1) "Flow-Based Programming" (FBP) is an example
of what I mean when I say that multithreaded programming
can be *easier* than singlethreaded programming.
Connecting FBP's coroutines by streams is considerably
easier than figuring out how to write a single loop
that emulates the coroutines on a step-by-step basis.

Such a single loop *can* be written - continuations
are one way of doing so; also, the paper "The Packet Filter" (ref?)
describes a generic approach to this transformation -
but it is just easier to compose coroutines and streams
(or UNIX processes and pipes, or database operators
and streams).

(2) FBP's coroutines and streams are similar to UNIX
processes and pipes.

Key difference #1: FBP's streams carry structured data,
rather than the raw data of UNIX pipes. For single-producer,
single-consumer streams, I'm not sure that this matters so
much: structure can always be imposed on the raw pipe
data. Indeed, Morrison talks about how some FBP applications
have needed to add "brackets" to the streams, which amounts
to the same problem, imposing more structure.
For connections with multiple readers or writers, atomicity
is a concern with pipes; moreover, structure may avoid a
serial bottleneck as packet structure is parsed.

Key difference #2: FBP apparently has tools that make
designing networks of streams easier. UNIX's tooling in this
regard, the shell's <in filter1 | filter2 >out syntax, is primitive
and one-dimensional. Named pipes help, but do not
solve the problem.

Some of what I say wrt UNIX pipes also applies to UNIX sockets.

(3) The FBP approach is rather similar to the database approach
espoused by folks such as DeWitt at Wisconsin: operators
connected by queues or streams, possibly spread across
multiprocessors and multicomputers. Of course, greater or
lesser amounts of buffering may be required for some operators,
such as JOIN; sorting a stream in an order reverse to that
in which it initially arrives similarly requires large buffering,
although sometimes transformations can be made on the
streams network to accomplish this.

(4) Another similarity: data management languages like SPSS
and SAS can be considered to be compilers that take
simple operations, each of which amounts to
a pass over the data:

    FOR EACH RECORD IN FILE
        add field delta1 to record, where delta1 = thisline.data1 - lastline.data1
    FOR EACH RECORD IN FILE
        add field delta2 to record, where delta2 = thisline.data2 - lastline.data2
    FOR EACH RECORD IN FILE SORTED BY delta1
        add field cum2 to record, where cum2 = cum2 + thisline.data2

and transform them into a minimal number of passes:

    FOR EACH RECORD IN FILE
        add field delta1 to record, where delta1 = thisline.data1 - lastline.data1
        add field delta2 to record, where delta2 = thisline.data2 - lastline.data2
    SORT BY delta1 INTO streamS
    FOR EACH RECORD IN streamS
        add field cum2 to record, where cum2 = cum2 + thisline.data2

Again, the key points are:
a) it is easiest to express the operations in terms of full passes over
the data, sorted or filtered in some order, i.e. in terms of streams
b) the compiler can automatically figure out how to optimize such passes
It's a pity that such facilities aren't available in C or C++ - anyone want
to write an expression-template library for this?

Further, although it is possible to transform user requests from this form
to a classic single von Neumann thread, if the hardware affords multiple
threads, why not use them directly?

(5) Similarity to CSP: maybe, but I think that CSP is a more general
mechanism, one that allows much greater damage to be done.
I think that the special cases supported by FBP and similar
techniques may be enough, or at least enough for the industry
to gradually acculturate itself to the full power of CSP.

(6) I still do not hold great hopes for compiler parallelization.

I have much greater hopes that explicit, programmer-specified parallelism,
especially for special cases like FBP, filter composition, and database operators,
will be able to use multithreaded hardware.

(7) Implications for computer architecture:

Parallel programming involves two "hard problems": dividing the
program into parallel pieces, and load balancing.

My attitude about parallelization is that it should follow the most natural
way for the programmer to express the problem. Load balancing should
not be a programmer concern. For the business problems that FBP
targets, the division into streams and coroutines is "obvious", and the granularity
is naturally coarse; it is only for embarrassingly parallel scientific matrix codes
that load balancing (merging ostensibly parallel computations into the same
thread) really matters.

Bottom line: the programmer should not have to worry about load balancing.
The system should do it transparently.

For multithreaded computer architecture, this means that the hardware
should not have hard limits such as 2 or 4 or 64 threads, with a huge
degradation in performance when the hardware-supported thread limit is
exceeded. Jack Dennis makes a good point: modular software development
requires that modules not be affected by each other's resource utilization:
if one module requires 3 threads, and the other 2, and there are only 4
hardware threads, performance should not fall off a cliff.

Developing schemes to support such smooth adjustment to varying thread
workloads is the most important challenge facing thread based computer
architecture. Either the programming style must dynamically adjust to the
number of hardware threads available, or efficient schemes for context switching
virtual processors must be developed. For example, given suitably large buffers
between FBP style coroutines, in batch processing it is pointless to timeshare:
a coroutine should run until it has filled up its output buffers, and then surrender
to the consumer of its output. This resembles cooperative multitasking, but the
disadvantages of cooperative multitasking would be remedied if a preemptive
multitasking scheduler were aware of the stream structure connecting threads.
Again, probably in a more sophisticated manner than UNIX's naive "block on
pipe output buffer full, block on input buffer empty, schedule preemptively in between."


Zalman Stern

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
In comp.arch Bruce Hoult <bruce...@pobox.com> wrote:
: Yes, I agree. But if you happen to make the occasional bug it will be a
: lot easier to find under nonpreemption because it will be more repeatable.

On the flipside some bugs will not show up as frequently and may be missed
entirely in QA.

-Z-

Kaz Kylheku

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
On Thu, 3 Feb 2000 12:05:33 +0100, Gerald Hilderink
<g.h.hi...@el.utwente.nl> wrote:
>
>The channel connects processes and takes care of the data transfer (via
>memory or via devices). The channel is one-way, unbuffered and provides a
>rendezvous synchronization mechanism; in other words, a message can only be
>sent if both the writer and reader are ready.

That is even worse than a context switch. The delivery time of a message now
not only has a context switch factored in but a potential scheduling delay.
What's worse, each discrete message is treated this way, so throughput
is affected.

You should use queues to decouple threads from each other. That way,
scheduling delays only happen when a queue drains, or when it becomes full.
And you can transfer more than one message per context switch.
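To make that concrete, a minimal sketch of such a decoupling queue, assuming
Java monitors (the class is illustrative, not from any library); threads
block, and hence context switch, only at the drain-empty and fill-full edges:

    class BoundedQueue<T> {
        private final Object[] buf;
        private int head, tail, count;
        BoundedQueue(int capacity) { buf = new Object[capacity]; }
        synchronized void put(T x) throws InterruptedException {
            while (count == buf.length) wait();     // block only when full
            buf[tail] = x; tail = (tail + 1) % buf.length;
            if (count++ == 0) notifyAll();          // wake readers of an empty queue
        }
        @SuppressWarnings("unchecked")
        synchronized T take() throws InterruptedException {
            while (count == 0) wait();              // block only when empty
            T x = (T) buf[head]; head = (head + 1) % buf.length;
            if (count-- == buf.length) notifyAll(); // wake writers of a full queue
            return x;
        }
    }

Between those edges, many messages flow per context switch.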

I think that these kinds of compartmentalized approaches are a copout from
doing real multitasking.

>If one of the processes is
>ready and the other is not, then the first gets blocked until the second is
>ready.

Right; in other words, you are killing concurrency. The other thread is doing
useful work, and you block the sending thread until the other one finishes,
instead of doing other useful work.

An asynchronous notification model would be much superior. Instead of a
rendezvous, have your thread do an asynchronous send. Then provide some
interface for collecting replies---with the option to block, if necessary. The
programmer can use this to implement a naive rendezvous or something more
sophisticated as needed based on the requirements for a given problem.
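A sketch of that reply-collection interface, assuming Java (Reply is a
hypothetical name): an asynchronous send returns a one-shot mailbox that the
caller can poll or block on later:

    class Reply<T> {
        private T value;
        private boolean done;
        synchronized void complete(T v) { value = v; done = true; notifyAll(); }
        synchronized boolean isReady() { return done; }      // poll, never blocks
        synchronized T await() throws InterruptedException { // block only if needed
            while (!done) wait();
            return value;
        }
    }

A naive rendezvous is then just send(msg).await(), but the caller is free to
do other useful work before collecting.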

>When both processes commit in communication then the event occurs. It's
>simple, but it may sound like a slow concept, which is not the case. The
>overheads are the same as for properly synchronizing shared data. Because this

I do not believe it. Achieving mutually exclusive access to shared data is
vastly different. This is a case of not only ensuring mutually exclusive
access but of delaying the execution of threads until another thread
reaches a given point in its execution. Mutual exclusion and synchronization
are very different beasts in practice, even if they may resemble each other on
paper.

It's one thing to do some update on shared data, and quite another thing
to also wait for another thread to acknowledge that update.

Zalman Stern

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
In comp.arch Terje Mathisen <Terje.M...@hda.hydro.com> wrote:
: I have been looking for a clean way (or just any kind of way) to
: implement this kind of meta-api:
: An API which defines and propagates all the kinds of limitations and
: dependencies of a piece of code, i.e. if I use my calendar algorithm as
: an example, some way to tell all uses of that code that integers have to
: act as if using two's complement arithmetic, and must support at least
: 26 bit numbers.

These seem to be static constraints. For the case Bruce mentioned, you need
to tag functions with a property (can move memory - CMM) and then tell the
compiler how to combine properties (if a function calls another CMM function
then that function must also be CMM). At that point a large number of
really serious Macintosh bugs would have become static compilation errors
through use of a more rigorous programming style.

This is into the realm of intentional or perhaps metaprogramming. Projects
like Open C++ (which is a C++ compiler with a standard extension interface)
allow you to do this.

Alternatively you can try to automatically derive properties from the
program text and some programmer specified ones. This gets into the realm
of ML type inference and formal proof systems.

: Similarly, that the year-month-day to day number algorithm assumes both
: year and month to be valid, positive numbers, but that the day value
: could have any value.

Assertions or range specified types. Modula 3 has 'em IIRC.

-Z-

Kaz Kylheku

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
On Thu, 3 Feb 2000 13:24:34 -0600, Andy Glew <gl...@cs.wisc.edu> wrote:
>A few comments about J. Paul Morrison's paper
>"Flow-Based Programming":
>
>(0) I think I have read this before - maybe the conference
> version, or perhaps I encountered the web page
> somewhere in my wanderings. Anyway, I liked
> it then, and I like it now.
>
>(1) "Flow-Based Programming" (FBP) is an example
> of what I mean when I say that multithreaded programming
> can be *easier* than singlethreaded programming.
> Connecting FBP's coroutines by streams is considerably
> easier than figuring out how to write a single loop
> that emulates the coroutines in a step by step basis.

FBP is only one of many approaches to concurrent programming. It's no panacea.
I did maintenance work on a communication protocol stack whose various layers
were distinct elements in a pipe, each with its own thread. Communication
between the layers was done with queues.

I, and everyone else who has worked on maintaining the same software, consider
it brain-damaged. Why should a packet go through six context switches on its
way down to the operating system?

There were other problems, too. The queues in this software were not properly
bounded. So a bottleneck at some level could give rise to unbounded growth.
The software could have been fixed to block, but that would create other
problems.

Perhaps the biggest issue was that because these layers communicated using
packets instead of strongly typed function calls, programmers resorted to
increasing the repertoire of integral packet type fields to extend
the functionality of the software. The software acquired undocumented
dependencies among the layers based on the exchange of certain types of
packets. When one programmer tried to replace some layers, he had to
emulate all of the quirks of the old layer.

In other words, the weak typing and vague interfacing brought about by the
packet model led to some serious maintenance headaches.

I think that if you are going to go with this kind of model, you need a
programming language which has some interface abstraction over message
exchanges. Or construct one somehow; don't just exchange objects with type
fields and have switch statements all over the place.
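One hedged sketch of such an abstraction, assuming Java (all names
invented): instead of integral type fields and switch statements, each
message dispatches itself through an interface, so a layer that fails to
handle a new message kind fails to compile rather than falling through a
switch:

    interface LayerHandler {
        void onData(byte[] payload);
        void onConnect(String peer);
        void onDisconnect(int reason);
    }

    interface Message {
        void dispatch(LayerHandler h);   // double dispatch, no type field
    }

    class DataMsg implements Message {
        private final byte[] payload;
        DataMsg(byte[] p) { payload = p; }
        public void dispatch(LayerHandler h) { h.onData(payload); }
    }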

Benjamin Gamsa

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
In article <87cjbu$jhu$1...@dinkel.civ.utwente.nl>,

I'm curious how one would implement, for example, a simple concurrent
database system, with actual parallelism in the transactions.


Gerald Hilderink

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to

Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
news:2000Feb3.1...@jarvis.cs.toronto.edu...

The client process sends a query object over a channel (chan1) to the
database process and then receives the result (e.g. a table object
(snapshot)) from the database over another channel (chan2), directed to the
client. For multiple clients there are several scenarios; for example,
multiple clients can send queries simultaneously by writing query objects to
the shared channel chan1 and receive the results by reading channel
chan2[cid]. Of course, the channel will let the highest-priority process go
first, before lower-priority processes. In this case, each query object
stores the identity (cid) of its client process, which is used by a
multiplexer process that delivers the result back to the right client
process. The client does not necessarily have to wait for the result and may
continue in the meantime while the database is executing the query. The
client process will receive the result by alting on channel chan2[cid]. The
Alternative construct is listening for the client process to serve
chan2[cid] when it becomes ready. There is no busy polling involved. The
entire system is event-driven.
This design pattern can be implemented as described above, like implementing
a data-flow (object) model. You may have noticed that I did not need to
express difficult thread synchronization constructs. Channels and the
Alternative construct take care of the synchronization and scheduling
parts. Once you know the fundamental semantics of channels and the
Alternative construct (see online documentation), you can reason about the
design and implementation with your colleagues and friends.
At the moment, I do not have an implementation of this example ready, but if
you are interested I can make one in Java. Please, let me know.
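A minimal sketch of the pattern, assuming Java threads and a tiny home-grown
rendezvous channel; Channel, Query and Database are illustrative names, not
a real CSP-for-Java API, and the Alternative construct is omitted for
brevity:

    class Channel<T> {
        private T slot;
        private boolean full;
        synchronized void write(T x) throws InterruptedException {
            while (full) wait();            // one pending message at a time
            slot = x; full = true;
            notifyAll();                    // offer the message
            while (full) wait();            // rendezvous: wait until it is read
        }
        synchronized T read() throws InterruptedException {
            while (!full) wait();
            T x = slot; slot = null; full = false;
            notifyAll();                    // release the writer
            return x;
        }
    }

    class Query {
        final int cid; final String text;   // cid identifies the client
        Query(int cid, String text) { this.cid = cid; this.text = text; }
    }

    class Database implements Runnable {
        private final Channel<Query> requests;    // the shared chan1
        private final Channel<String>[] replies;  // chan2[cid], one per client
        Database(Channel<Query> req, Channel<String>[] rep) {
            requests = req; replies = rep;
        }
        public void run() {
            try {
                while (true) {
                    Query q = requests.read();               // one query at a time
                    String result = "result of " + q.text;   // stand-in for real work
                    replies[q.cid].write(result);            // route back by client id
                }
            } catch (InterruptedException e) { /* shut down */ }
        }
    }

A client with identity cid just does requests.write(new Query(cid, ...))
and, whenever it finally wants the answer, replies[cid].read().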

Gerald.


George Herbert

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
Kaz Kylheku <k...@ashi.footprints.net> wrote:

>George Herbert <gher...@crl.crl.com> wrote:
>>Zalman Stern <zal...@netcom17.netcom.com> wrote:
>>>In comp.arch Andy Glew <gl...@cs.wisc.edu> wrote:
>>>: Aside: non-reproducibility creeps in with select().
>>>
>>>Actually, you can get race conditions simply by having an incorrectly
>>>written program that both writes to a pipeline and reads from the end of
>>>it. [...]
>>
>>Out of curiosity, what would you be doing which would have a single
>>process reading from the same pipeline it's writing to, except for
>>a pipe testing benchmark?
>
>This is called a coprocess.

There have been about three incorrect and one correct reply to this
(the correct one in email 8-). Those that got it wrong, please re-read...
I was asking about what conditions one process would open a pipe, write
to that pipe, and then read from that same pipe.

Forking a separate process, connected via pipes, and writing to it
and then reading back the results, is a very different mechanism,
and has nothing to do with why one might open a pipe and both write
to it and read from it within the same process.

The one comment I received which was correct indicated that this
has been known to happen (and cause problems, in earlier versions of X)
with cut and paste within Xterms (and presumably, slightly more
generically with other X applications) where you are selecting and
then pasting back into the same xterm. I had been under the impression
that X used sockets for that traffic but appear to sit corrected.


-george william herbert
gher...@crl.com


Zalman Stern

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
In comp.arch Chris Morgan <c...@mihalis.net> wrote:
: Oh I don't know. You can normally get timeout versions of the locking
: operations which do something if they lock for too long - return
: false, raise an exception, that kind of thing.

You aren't thinking of a sleep/wakeup race. Take the following code:

Producer:
P0: lock queue
P1: add element to queue
P2: if queue was empty, wakeup consumer
P3: unlock queue

Consumer:
C0: lock queue
C1: get element setting to NULL if queue empty
C2: unlock queue
C3: if element is NULL, sleep

start with an empty queue and consider the following execution sequence:
C0
C1
C2
P0
P1
P2
P3
C3

The program is left with the queue unlocked containing one element and no
processing happening. At no time was there inordinately long lock
acquisition latency. This is a sleep/wakeup race. The SRC/Modula-3/pthreads
model solves this problem by providing an atomic unlock-and-sleep
primitive.
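For comparison, the same consumer written against such an atomic
unlock-and-sleep primitive, here sketched with Java's wait/notify (the
Mailbox class is illustrative): the window between C2 and C3 disappears
because the re-test and the sleep happen under one lock:

    class Mailbox<T> {
        private final java.util.LinkedList<T> q = new java.util.LinkedList<T>();
        synchronized void put(T x) {   // P0..P3: add element and wake consumer
            q.add(x);
            notify();                  // delivered under the same lock: cannot be lost
        }
        synchronized T get() throws InterruptedException {
            while (q.isEmpty())        // re-test after every wakeup
                wait();                // atomically unlocks, sleeps, relocks
            return q.remove();
        }
    }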

-Z-

David B. Chorlian

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
With regard to the posting of Gerald Hilderink excerpted below I
would make the following observation:
The concurrency model enforced by the rendezvous method is far too
restrictive for useful programming. There are at least two situations
in which concurrency is useful, one in which there is genuine
independence between two or more instruction or data streams, and
another in which the latency of sequential operations can be masked by
concurrent operations on different parts of a data stream. A
rendezvous mechanism may be fine for the first situation, but is
cumbersome for the second. Consider reading something large from disk
and then doing some data dependent processing. What is wanted here is
asynchronous concurrency, not the serialization imposed by the
rendezvous.

In <87bn4c$236$1...@dinkel.civ.utwente.nl> "Gerald Hilderink" <g.h.hi...@el.utwente.nl> writes:

>Nowadays, input/output streams are flexible, standardized and are typically
>operating system resources for many purposes, such as standard input,
>standard output, disk access, pipes, filters, etc. All together they are
>implemented for universal use and can therefore be unacceptably slow.
>Fortunately, there exists a far more powerful communication concept by means
>of channels.
>Processes encapsulate their own process space and do not know the name of
>other processes. Therefore, a process can never alter the state of another
>process directly. This is a safe and secure concept. Processes can only
>communicate by channels. Channels are closely related to streams, but
>channels are different from streams in that they express communication
>events, which is a basis for the theoretical foundation of CSP and other
>theories. The benefit of these events is that we can reason about the
>behaviour of software in terms of these events.

>The channel connects processes and takes care of the data transfer (via
>memory or via devices). The channel is one-way, unbuffered and provides a
>rendezvous synchronization mechanism; in other words, a message can only be
>sent if both the writer and reader are ready. If one of the processes is
>ready and the other is not, then the first gets blocked until the second is
>ready. When both processes commit in communication then the event occurs.
>It's simple, but it may sound like a slow concept, which is not the case.
>The overheads are the same as for properly synchronizing shared data.
>Because this synchronization (for example using semaphores/monitors) is
>encapsulated within the channel it can be highly optimized. Channels are
>fast because they directly operate on the scheduler and are directly
>responsible for non-preemptive and preemptive scheduling. Another form of
>optimization is, for example on the good old transputer, that channels are
>implemented in silicon: the read and write methods are simple instructions.
>Also channels are found in several programming languages, such as occam,
>Handel-C and I believe Ada, as language primitives. With channels, writing
>concurrent programs becomes easy, believe me.

>Channels pass objects, such as data primitives one at a time, or a pool of
>memory (e.g. byte array) in DMA fashion. Buffering of data can be done at
>the side of the writer and the buffer can be sent at once, or a buffer
>mechanism can be implemented inside a channel that will gather the data.
>Still, we can reason about our software in terms of communication events.

>Gerald Hilderink


--
David B. Chorlian
Neurodynamics Lab SUNY/HSCB
chor...@spot.cns.hscbklyn.edu
dav...@panix.com

Steve Newman

unread,
Feb 3, 2000, 3:00:00 AM2/3/00
to
In article <87cugp$opc$1...@dinkel.civ.utwente.nl>, "Gerald Hilderink"
<g.h.hi...@el.utwente.nl> wrote:

> Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
> news:2000Feb3.1...@jarvis.cs.toronto.edu...
>

I don't understand how any of the alternatives you describe would allow
the database to work on two transactions in parallel. Clearly the clients
can be working in parallel with one another and in parallel with the
database, but it sounds like the database itself can only be working on
one request at a time. Is it possible for the core database code to be
actively working on two requests in parallel using this approach?
(E.g. simultaneously computing two different joins for two different
clients.)

-- Steve Newman

Greg Alexander

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to

I think that this is actually a case of them not having a clue about
multi-threaded programming. The weakness of the chosen approach does not
excuse the choices.

Greg Alexander

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In article <3899B586...@collins.rockwell.FNORD.com>, pd wilson wrote:
>
>I would be interested in the citations you mention....

What I have on my desk here is at:
http://www.bell-labs.com/people/god/public_psfiles/popl97.ps
however, all of
http://www.bell-labs.com/people/god/
is related (it's mostly about VeriSoft and related topics -- apparently
unrelated to www.verisoft.com).

I also have an article from the ACM Transactions on Computer Systems Vol
15 No 4 November 1997, Pages 391-411 by Stefan Savage et al, called
"Eraser: A Dynamic Data Race Detector for Multithreaded Programs"
describing a program which checks for violations of mutex locking
discipline. I was just given these papers as part of a college class I am
taking, so I am sure there are many more similar papers/projects in
existence. You might want to check the references at the end of
those...one of the more famous ones in this field, I guess, is:

Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed
system. Commun. ACM 21, 7 (July), 558-565.

Though I haven't personally seen it. I am by no means an expert. :)
I don't know how widely these tools are used throughout the field. I
think that currently most of them are in the stage of "Oh, I just wrote
this neat program that checks these things, and I ran it on these popular
publicly available programs and it found these bugs. You're welcome."
But for all I know there may be standard programs/approaches that practically
everyone doing serious work on important concurrent systems knows of and
uses. :)

Zalman Stern

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In comp.arch George Herbert <gher...@crl3.crl.com> wrote:
: There have been about three incorrect and one correct reply to this
: (the correct one in email 8-).

Here is the text of your question:
[


Out of curiosity, what would you be doing which would have a single
process reading from the same pipeline it's writing to, except for
a pipe testing benchmark?

]

A pipeline is a series of commands connected by pipes. It is not the same
as a single pipe. (I suppose you could argue that an empty pipeline is a
valid case and is a single pipe. But that there is a degenerate case
doesn't change my point.) I do not believe this is a controversial point of
terminology. Most UNIX programmers I know understand a pipe to be the two
file descriptors coming back from pipe() and the kernel data structures
in between, while a pipeline may have a bunch of processes in the middle.

Even though there are two or more pipes involved in a pipeline, one can get
deadlocks if a single process is trying to write to one end and read from
the other.
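An illustrative Java sketch of that self-deadlock (the class name and
command are assumptions): the program feeds one end of a pipeline and reads
the other end itself. cat echoes input as it arrives, so once its output
pipe fills, cat blocks writing and stops reading, and the parent can then
block forever in write():

    import java.io.IOException;
    import java.io.OutputStream;

    public class PipelineDeadlock {
        public static void main(String[] args) throws IOException {
            Process child = Runtime.getRuntime().exec("cat");
            OutputStream toChild = child.getOutputStream();
            byte[] line = "a line of test input\n".getBytes();
            for (int i = 0; i < 1000000; i++)
                toChild.write(line);   // wedges once both pipe buffers fill
            toChild.close();
            // never reached: the read that would have drained the pipeline
            child.getInputStream().read();
        }
    }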

-Z-

Jan Vorbrueggen

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
k...@ashi.footprints.net (Kaz Kylheku) writes:

> >The channel connects processes and takes care of the data transfer (via
> >memory of via devices). The channel is one-way, unbuffered and provides a
> >rendezvous synchronization mechanism; in other words, a message can only be
> >sent if both the writer and reader are ready.
>

> That is even worse than a context switch. The delivery time of a message now
> not only has a context switch factored in but a potential scheduling delay.

That's because you're used to thinking that scheduling and context switching
are expensive. They need not be. Have a look at how the prototype of this
approach, the transputer using occam, implements it.

[much snipped]

Have you read Tony Hoare's paper on CSP, or anything that followed on from it?

Jan

Jan Vorbrueggen

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
"Andy Glew" <gl...@cs.wisc.edu> writes:

> For multithreaded computer architecture, this means that the hardware
> should not have hard limits such as 2 or 4 or 64 threads, with
> a huge degradation in performance when the hardware supported thread
> limit is exceeded. Jack Dennis makes a good point: modular software
> development requires that modules not be affected by each other's
> resource utilization: if one module requires 3 threads, and the other 2,
> and there are only 4 hardware threads, performance should not fall
> off a cliff.

Precisely. I would like to add another aspect: the programmer should be able
to express as much parallelism as possible, or as is easily extractable for
him; the intermediate software (compiler, code restructurer, ...) is
responsible for "destroying" any parallelism that cannot be advantageously
used on a given implementation. The split-brain thinking required to, on the
one hand, extract as much parallel performance as possible while at the same
time making serialization decisions because parts execute with better
performance if expressed sequentially (a very brittle process, incidentally)
is annoying, to say the least.

Jan

Jan Vorbrueggen

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
Zalman Stern <zal...@netcom13.netcom.com> writes:

> You aren't thinking of a sleep/wakeup race. Take the following code:

[elided]


> The program is left with the queue unlocked containing one element and no
> processing happening.

Isn't this because the wakeup isn't "sticky"? C3 sleeps, although P2, the
consumer wakeup, was executed prior to it. If P2 were implemented in VMS's
$WAKE, and C3 as $HIBER, you would not have a problem, methinks.

Jan

Jan Vorbrueggen

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
dav...@panix.com (David B. Chorlian) writes:

> The concurrency model enforced by the rendezvous method is far too
> restrictive for useful programming. There are at least two situations
> in which concurrency is useful, one in which there is genuine
> independence between two or more instruction or data streams, and
> another in which the latency of sequential operations can be masked by
> concurrent operations on different parts of a data stream. A
> rendezvous mechanism may be fine for the first situation, but is
> cumbersome for the second. Consider reading something large from disk
> and then doing some data dependent processing. What is wanted here is
> asynchronous concurrency, not the serialization imposed by the rendezvous.

Have you ever actually used the model described by Gerhard? Then you would
know that your observation isn't true.

Jan

Zalman Stern

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In comp.arch Jan Vorbrueggen <j...@mailhost.neuroinformatik.ruhr-uni-bochum.de> wrote:
: Isn't this because the wakeup isn't "sticky"?

Making them sticky more or less results in semaphores. Which of course work
too. I like the way the SRC/Modula 3 mechanism blends mutual exclusion and
events quite cleanly.

-Z-


Gerald Hilderink

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to

Steve Newman <sne...@acm.no.spam.org> wrote in message
news:snewman-0302...@cm20816640181.cableco-op.com...

The scenario I described was based on multiple clients and one database in
parallel. The database serves one request at a time, but in the meantime
the clients can continue. The multiplexer process functions as a buffering
process that always gathers the requests of the clients.
Yes. In another scenario, the core database can be designed out of parallel
processes that serve multiple queries at the same time. You can get clear
design patterns using processes and channels.

Gerald Hilderink


Gerald Hilderink

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to

Kaz Kylheku <k...@ashi.footprints.net> wrote in message
news:slrn89jmo...@ashi.FootPrints.net...

> On Thu, 3 Feb 2000 12:05:33 +0100, Gerald Hilderink
> <g.h.hi...@el.utwente.nl> wrote:
> >
> >The channel connects processes and takes care of the data transfer (via
> >memory or via devices). The channel is one-way, unbuffered and provides a
> >rendezvous synchronization mechanism; in other words, a message can only
> >be sent if both the writer and reader are ready.
>
> That is even worse than a context switch. The delivery time of a message
> now not only has a context switch factored in but a potential scheduling
> delay. What's worse, each discrete message is treated this way, so
> throughput is affected.

Context switching is very natural in concurrent software. Channels decide
when to context switch. Every context switch introduces overhead, but not as
much as you may think, and we should context switch only when necessary.
When a process cannot deliver a message it blocks and waits until the
message can be accepted -- waiting processes do not consume time. Otherwise
we need to poll, and polling also consumes time. With CSP we can also poll
for the readiness of channels (we do this by using the Alternative
construct, which is a fundamental concept) and there is no busy-polling
involved. You may agree with me that busy-polling is even worse than
context switching.

>
> You should use queues to decouple threads from each other. That way,
> scheduling delays only happen when a queue drains, or when it becomes
> full. And you can transfer more than one message per context switch.
>
> I think that these kinds of compartmentalized approaches are a copout from
> doing real multitasking.

In Ada/occam/Handel-C/Limbo/CSP you use channels to decouple threads
(processes) from each other. Queues are only buffers to which one can add
and remove objects. Channels behave in the same way as queues with buffer
size 0. With CSP for Java, for example, channels can be extended with a
queue of size >= 0 (channel = new Channel(new Buffer(size))). The queue is
freed from difficult synchronization constructs, because the channel already
takes care of that. The channel decides to schedule on the full and empty
states of the buffer. A channel with a queue of size 0 is a rendezvous
channel. Channels can hold a message such as a single byte, a pointer to
memory or an entire object (e.g. a document). Each process can buffer its
data in the message. When the message is complete (filled) it can be sent
over the channel.
Most queuing techniques you desire are possible with channels.

How large must a buffer be to obtain optimal performance? I know that some
applications with channels of buffer size 1 become faster than with
rendezvous channels, because this saves some context switches, but the
application will not run faster with buffers larger than 1. In this case,
larger buffers only consume memory and have no effect on the performance.
The optimal size of the buffer can be determined by reasoning about the
behavior of the data-flow in the program, starting from buffer size 0 (i.e.
rendezvous channels).

Using queues can be hazardous and one should reason about the behavior of
the program when using queues. A program that is deadlock-free,
livelock-free and starvation-free with rendezvous channels is also
deadlock-free, livelock-free and starvation-free with queued channels. Once
a queue reaches the full or empty state it may have nasty consequences
(potential deadlock/livelock) when this behavior was overlooked during
design and testing. A program with a process that is surrounded by queues
that will never reach the full or empty state (such as override buffers) can
starve the other processes. You need a scheduling point in that process to
build in fairness yourself, for example by enabling a timeslicer. So you
must be careful when relying on buffers. Reasoning with channels is safe
and sound.

>
> >If one of the processes is
> >ready and the other is not, then the first gets blocked until the second
> >is ready.
>
> Right; in other words, you are killing concurrency. The other thread is
> doing useful work, and you block the sending thread until the other one
> finishes, instead of doing other useful work.

Every thread (process) has useful work to do. It is killing concurrency to
let a process run all the time until one of the buffers finally becomes
drained or full; all other processes have to wait until this happens. So,
some fairness must be built in, for example by using time-slicing.
Time-slicing is a solution for non-real-time applications. The overheads of
time-slicing can be enormous when compared to the overheads of using
channels. It is the total design of processes and channels that determines
the overall performance of your application, not the channel alone. As
mentioned before, processes waiting for communication do not consume time.
When a process can engage in a communication event it wakes up and competes
again with the rest of the (non-waiting) processes of the same priority. The
program is event-driven. With channels no time-slicing is necessary. Also,
with CSP a perfect mix of non-preemptive and preemptive scheduling can be
achieved using the Parallel and PriParallel constructs, without the need
for time-slicing.

>
> An asynchronous notification model would be much superior. Instead of a
> rendezvous, have your thread do an asynchronous send. Then provide some
> interface for collecting replies---with the option to block, if necessary.
> The programmer can use this to implement a naive rendezvous or something
> more sophisticated as needed based on the requirements for a given problem.

Buffers and channels are indirectly related to an asynchronous notification
model. In CCS, ACP and CSP this is very cleverly described. It is the
Parallel construct and the Alternative construct that are responsible for
the asynchronous behavior. In CSP the Alternative construct provides
asynchronous behavior based on the non-deterministic choice of channel
communication. The semantics of a buffer are specified by the Alternative
construct:

  BUFF = (if (state != full) (left?x -> BUFF)) [] (if (state != empty) (right!a -> BUFF))

The [] symbol denotes the Alternative construct (external choice). The left
branch may be selected when its condition is true and when it can engage in
event left?x. The right branch may be selected when its condition is true
and when it can engage in event right!a. The Parallel construct is like
this: ((left!y -> P) || BUFF || (right?b -> Q)). These descriptions can be
used to prove the presence or absence of deadlocks. So, the fundamental
Parallel and Alternative constructs are superior, and buffers are composed
from these constructs. These constructs can be implemented in many ways.

>
> >When both processes commit in communication then the event occurs. It's
> >simple, but it may sound like a slow concept, which is not the case. The
> >overheads are the same as for properly synchronizing shared data. Because
> >this
>
> I do not believe it. Achieving mutually exclusive access to shared data is
> vastly different. This is a case of not only ensuring mutually exclusive
> access but of delaying the execution of threads until another thread
> reaches a given point in its execution. Mutual exclusion and synchronization
> are very different beasts in practice, even if they may resemble each other
> on paper.

Yes, mutually exclusive access to shared data is vastly different, but
mutual exclusion is insufficient when this shared data can become valid or
invalid. Some condition has to guard the shared data. What you get is a
monitor construct. A channel is implemented by a monitor construct.
Fortunately, this monitor construct can be highly optimized because the
synchronization pattern of the channel is fixed and hidden from the
programmer. Channels are high-level synchronization constructs and monitors
are low-level synchronization constructs. Reasoning about concurrent
behavior with channels is vastly simpler than reasoning with monitors.

You see that channels are not as inefficient as you may think and that the
ideas behind the channel concept are well formed.

Gerald Hilderink


Benjamin Gamsa

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In article <87egho$m4j$1...@dinkel.civ.utwente.nl>,

Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
>
>Steve Newman <sne...@acm.no.spam.org> wrote in message
>news:snewman-0302...@cm20816640181.cableco-op.com...
>> In article <87cugp$opc$1...@dinkel.civ.utwente.nl>, "Gerald Hilderink"
>> <g.h.hi...@el.utwente.nl> wrote:
>>
>> > Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
>> > news:2000Feb3.1...@jarvis.cs.toronto.edu...
>> >
>> > > I'm curious how one would implement, for example, a simple concurrent
>> > > database system, with actual parallelism in the transactions.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>The scenario I described was based on multiple clients and one database in
>parallel. The database serves one request at the time, but in the mean while
>the clients can continue. The multiplexer process functions as a buffering
>process that always gathers the requests of the clients.
>Yes. In another scenario, the core database can be designed out of parallel
>processes that serve multiple queries at the same time. You can get clear
>design patterns using processes and channels.

But that's the point. I wanted to see multiple transactions being
served in parallel. The fact that clients don't necessarily have to wait
is really a side point. Sure, the result involves no explicit synchronization,
but you've effectively single-threaded the access to the shared data
structure and removed all concurrency and parallelism.

Michael Loftus

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
Kostas Kostiadis wrote:
>
> "Maynard Handley" <hand...@ricochet.net> wrote in message
> >
> > I guess in the UK .ac world one gets to work with really smart programmers
> who
> > (a) know what they are doing and
> > (b) can simply walk away from a project that turns into a swamp.
> >
>
> (a) is definitely not the case...

Yeah, we know. But thanks for the affirmation.

> (b) is no different than quiting your job...

Interviewer: So, Kostas, explain how you handled that difficult
issue at your last job.
Kostas : Quite easily, thank you. I just quit.

[snip]
> If a company is clever enough
[snip]

Bwa ha ha ha ha ha!

[snip]
> and you have a fixed amount of resources at your disposal.
> So do I !!!

Wrong. Resources tend to get cut or disappear. All a part of
that "real-world crap".

--

Michael Loftus
(my real address has no 'x')
Ford Research Labs

*** I am not an official spokesperson for Ford Motor Co. ***

Gerald Hilderink

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to

Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
news:2000Feb4.0...@jarvis.cs.toronto.edu...

There is single-threaded access to the shared data structure, but why would
this remove all concurrency and parallelism? There is one process serving the
shared data to all processes that are interested in the data. Multiple
transactions can be served when each client has separate channels to the
database. The database contains many concurrent processes (threads)
executing parts of that transaction. Maybe I miss your point?


Bill Todd

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to

Jan Vorbrueggen <j...@mailhost.neuroinformatik.ruhr-uni-bochum.de> wrote in
message news:y43dr9j...@mailhost.neuroinformatik.ruhr-uni-bochum.de...

NT has a similar (though less elegant) mechanism, in that resuming a
presumed-suspended thread returns a value which, if zero, indicates that it
was not, in fact, suspended; so in the event of a (presumably rare) race
you can just loop (*outside* the lock region and with a sleep between
retries) until the thread suspends, at which point you resume it.

- bill

Bill Todd

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to

Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
news:2000Feb4.0...@jarvis.cs.toronto.edu...

...

> But that's the point. I wanted to see multiple transactions being
> served in parallel. The fact that clients don't necessarily have to wait
> is really a side point. Sure, the result involves no explicit
> synchronization, but you've effectively single-threaded the access to the
> shared data structure and removed all concurrency and parallelism.

Agreed. And, with apologies from one who has not followed this conversation
up to this point and may be confused, if the 'processes' you're talking
about are Unix-style processes (i.e., executing in independent address
spaces) rather than logical sequential processes executing as threads in the
same address space, the context-switching overhead in the database component
(if it uses multiple Unix-style processes) will be unacceptable in many
applications.

- bill

Bill Todd

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
message news:y49011j...@mailhost.neuroinformatik.ruhr-uni-bochum.de...

> "Andy Glew" <gl...@cs.wisc.edu> writes:
>
> > For multithreaded computer architecture, this means that the
hardwarr
> > should either not have hard limits such as 2 or 4 or 64 threads,
with
> > a huge degradation in performance when the hardware supported thread
> > limit is exceeded. Jack Dennis makes a good point: modular software
> > development requires that modules not be affected by each other's
> > resource utilization: if one module requires 3 threads, and the
other 2,
> > and there are only 4 hardware threads, performance should not fall
> > off a cliff.

Hmmm. My server seems to have missed Andy's and Jack's posts, so apologies
if this comment is out of context.

When I've done kernel multi-threading on NT, I've tried to arrange a number
of threads equal to the number of likely-available processors (a number less
than the total number physically present), and then within each thread
execute asynchronously, such that each thread moves on to the next available
work-queue item whenever it has to wait (for I/O, a lock, or whatever; on
completion the waiting work simply queues up for restart on the work queue),
so as to minimize the required thread-level context switching (and
inter-thread synchronization caused by an excessive multiprogramming level,
for that matter).
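A rough sketch of that arrangement, assuming Java (all names hypothetical):
a fixed set of worker threads pulls items off a shared work queue, and an
item that has to wait arranges to be re-queued on completion rather than
sleeping in its thread:

    interface WorkItem {
        void step(WorkQueue q);   // run until the next wait point
    }

    class WorkQueue {
        private final java.util.LinkedList<WorkItem> items =
            new java.util.LinkedList<WorkItem>();
        synchronized void add(WorkItem w) { items.add(w); notify(); }
        synchronized WorkItem next() throws InterruptedException {
            while (items.isEmpty()) wait();
            return items.remove();
        }
        void startWorkers(int nProcessors) {   // about one per likely processor
            for (int i = 0; i < nProcessors; i++)
                new Thread(() -> {
                    try { while (true) next().step(this); }
                    catch (InterruptedException e) { /* worker shut down */ }
                }).start();
        }
    }

An item that starts an async I/O has the completion callback call
q.add(nextStage) instead of blocking inside step().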

This probably constitutes a regression from more elegant mechanisms that
attempt to mask the complexities of multi-threading (let alone asynchrony)
from the poor programmer, and I'd happily adopt such mechanisms if they can
offer comparable execution efficiency.

- bill

Steve Newman

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In article <87f5sk$6nd$1...@dinkel.civ.utwente.nl>, "Gerald Hilderink"
<g.h.hi...@el.utwente.nl> wrote:

> Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
> news:2000Feb4.0...@jarvis.cs.toronto.edu...
>

> > In article <87egho$m4j$1...@dinkel.civ.utwente.nl>,
> > Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
> >
> > > The scenario I described was based on multiple clients and one database
> > > in parallel. The database serves one request at the time, but in the
> > > mean while the clients can continue. The multiplexer process functions
> > > as a buffering process that always gathers the requests of the clients.
> > > Yes. In another scenario, the core database can be designed out of
> > > parallel processes that serve multiple queries at the same time. You
> > > can get clear design patterns using processes and channels.
> >
> > But that's the point. I wanted to see multiple transactions being
> > served in parallel. The fact that clients don't necessarily have to wait
> > is really a side point. Sure, the result involves no explicit
> > synchronization, but you've effectively single-threaded the access to the
> > shared data structure and removed all concurrency and parallelism.
>
> There is single-threaded access to the shared data structure, but why would
> this remove all concurrency and parallelism? There is one process serving
> the shared data to all processes that are interested in the data. Multiple
> transactions can be served when each client has separate channels to the
> database. The database contains many concurrent processes (threads)
> executing parts of that transaction. Maybe I miss your point?

Without presuming to speak for Benjamin, here's one way of looking at it:

1. A really easy way to write a database system would be as a subroutine
library that operates on the database data structures, with no attention
paid to concurrency, locking, etc. A client could compile against this
library and make calls to it. Very easy and efficient, but absolutely
no parallelism can occur: you can only have one client, the client can't
do work while the database is running, the database can't do work while the
client is running.

2. By placing the client and the database in separate processes or threads,
and having them communicate via channels, you get the ability to have
multiple clients. You also get some parallelism: multiple clients can be
doing work at the same time (internal work, not database calls), and the
database can be doing something at the same time as the clients are doing
things. And you get all this without having to worry about locks, semaphores,
and so forth -- groovy. However, you don't yet have any parallelism *within
the database* -- calls to the database are serialized. (As I understand it.)

I ignore the issue of isolating transactions; isolation complicates the
database code, but doesn't necessarily impact the fundamental nature of the
parallel execution.

3. Many commercial database packages can do work on behalf of two or more
clients in parallel. For example, they can chew on two queries simultaneously
on a 2-CPU system, or they have multiple threads internally so that if one
query blocks on a disk access, another query can slide in and start getting
work done. It's not clear to me, and apparently to Benjamin, how this would
be accomplished using channels. It seems like you need data structures that
are shared across threads, which implies the use of explicit locks or some
similar nastiness.

So: (2) is better than (1), but not as good as (3), in terms of available
parallelism. It's clear how to do (2) with channels, granted. It's not
clear (to me, at least) how to do (3) with channels. Can (3) be done with
channels? If so, can you sketch the technique?

-- Steve Newman

Benjamin Gamsa

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In article <87f5sk$6nd$1...@dinkel.civ.utwente.nl>,
Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
>
>There is single-threaded access to the shared data structure, but why would
>this remove all concurreny and parallelism? There is one process serving the
>shared data to all processes that are interrested in the data. Multiple
>transactions can be served when each client has separate channels to the
>database. The database contains many concurrent processes (threads)
>executing parts of that transaction. Maybe I miss your point?
>

Maybe I miss yours. The idea behind a (traditional) parallel database
implementation is that different threads can access the same database
at the same time (on different processors in a multiprocessor machine)
by locking just those parts of the database that they need to access.

My understanding of your description is that the clients can run in
parallel and submit requests to a single database process that processes
each request in sequence and then returns the result to the corresponding
client. This sounds to me like a sequential database server implementation.


Greg Alexander

unread,
Feb 4, 2000, 3:00:00 AM2/4/00
to
In article <87f5sk$6nd$1...@dinkel.civ.utwente.nl>, Gerald Hilderink wrote:
>
>Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
>news:2000Feb4.0...@jarvis.cs.toronto.edu...
>> In article <87egho$m4j$1...@dinkel.civ.utwente.nl>,
>> Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
>> >
>> >Steve Newman <sne...@acm.no.spam.org> wrote in message
>> >news:snewman-0302...@cm20816640181.cableco-op.com...
>> >> In article <87cugp$opc$1...@dinkel.civ.utwente.nl>, "Gerald Hilderink"
>> >> <g.h.hi...@el.utwente.nl> wrote:
>> >>
>> >> > Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
>> >> > news:2000Feb3.1...@jarvis.cs.toronto.edu...
>> >> >
>> >> > > I'm curious how one would implement, for example, a simple concurrent
>> >> > > database system, with actual parallelism in the transactions.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >The scenario I described was based on multiple clients and one database in
>> >parallel. The database serves one request at the time, but in the mean while
>> >the clients can continue. The multiplexer process functions as a buffering
>> >process that always gathers the requests of the clients.
>> >Yes. In another scenario, the core database can be designed out of parallel
>> >processes that serve multiple queries at the same time. You can get clear
>> >design patterns using processes and channels.
>>
>> But that's the point. I wanted to see multiple transactions being
>> served in parallel. The fact that clients don't necessarily have to wait
>> is really a side point. Sure, the result involves no explicit synchronization,
>> but you've effectively single-threaded the access to the shared data
>> structure and removed all concurrency and parallelism.
>
>There is single-threaded access to the shared data structure, but why would
>this remove all concurrency and parallelism? There is one process serving the
>shared data to all processes that are interested in the data. Multiple
>transactions can be served when each client has separate channels to the
>database. The database contains many concurrent processes (threads)
>executing parts of that transaction. Maybe I miss your point?

I believe the assertion is that the low-level database is inherently
concurrent, not just the things using information from it.

Gerald Hilderink

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to

Bill Todd <bill...@foo.mv.com> wrote in message
news:87f9fk$224$1...@pyrite.mv.net...

>
> Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
> news:2000Feb4.0...@jarvis.cs.toronto.edu...
>
> ...

>
> > But that's the point. I wanted to see multiple transactions being
> > served in parallel. The fact that clients don't necessarily have to wait
> > is really a side point. Sure, the result involves no explicit
> > synchronization, but you've effectively single-threaded the access to
> > the shared data structure and removed all concurrency and parallelism.
>
> Agreed. And, with apologies from one who has not followed this conversation
> up to this point and may be confused, if the 'processes' you're talking
> about are Unix-style processes (i.e., executing in independent address
> spaces) rather than logical sequential processes executing as threads in
> the same address space, the context-switching overhead in the database
> component (if it uses multiple Unix-style processes) will be unacceptable
> in many applications.
>
> - bill
>
>

In this context, processes are active objects, i.e., objects that
encapsulate their own single thread of control and their own state.
Processes may not change the states of other processes directly, and the
threads within processes may not cross each other except in special
synchronization constructs, such as channels, where they get synchronized.
Processes can influence the state of other processes through channel
communication (by passing messages) when both the sender and receiver are
committed to the communication (event). So yes, they execute in their own
independent address space (as part of the program's address space). This
description also seems to fit Unix-style processes, but in our context,
processes belong to one concurrent program that has control of its own
processes. There are other differences, such as: a process is an abstract
entity that can be expressed within a programming language, and a process
can be described by simpler processes (refinement).

Processes and channels are abstract things because they encapsulate detailed
low-level multithreading, and they are very useful for designing and
reasoning about your software. A very nice thing is that the low-level
principles of threads and thread synchronization are still there, but are
nicely divided and hidden inside fundamental design patterns. Thus processes
run at the lightning speed of threads. Some people call them lightweight
processes, where Unix-style processes are heavyweight processes. Also, some
people use the term 'thread' to refer to lightweight threads and 'process'
to refer to heavyweight threads, whereby lightweight threads provide faster
inter-task communication via a shared "global" space, but offer weaker
encapsulation. In our context, the latter also applies to processes with
channels, hence the name lightweight process. Maybe it clarifies when I say:
"A channel can be seen as a name for a design pattern (or wrapper) of a
'synchronized shared variable'". It is a wrong assumption that processes
with channels are a slow concept.

It's actually a shift in paradigm. My experience is that concurrency based
on thread programming breaks with the OO paradigm, whereas the CSP model
(processes, composition constructs and channels) offers a 'glue logic'
between threads and objects.
For example, I can easily model concurrency in UML with processes (active
objects) and channels (shared passive objects), but I have lots of trouble
with the use of threads and thread synchronization constructs (e.g.,
semaphores and monitors). As an aside, you may agree that reverse
engineering of a concurrent program (in UML) is much easier with processes
and channels than with threads and thread synchronization constructs.

We need threads, but we also need another concurrency paradigm -- a paradigm
that is based on processes and channels.

Gerald.


Gerald Hilderink

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to

Steve Newman <sne...@acm.no.spam.org> wrote in message
news:snewman-0402...@cm20816640181.cableco-op.com...

> In article <87f5sk$6nd$1...@dinkel.civ.utwente.nl>, "Gerald Hilderink"
> <g.h.hi...@el.utwente.nl> wrote:
>
> > Benjamin Gamsa <b...@eecg.toronto.edu> wrote in message
> > news:2000Feb4.0...@jarvis.cs.toronto.edu...
> >
> > > In article <87egho$m4j$1...@dinkel.civ.utwente.nl>,
> > > Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
> > >
> > > > The scenario I described was based on multiple clients and one database
> > > > in parallel. The database serves one request at the time, but in the
> > > > mean while the clients can continue. The multiplexer process functions
> > > > as a buffering process that always gathers the requests of the clients.
> > > > Yes. In another scenario, the core database can be designed out of
> > > > parallel processes that serve multiple queries at the same time. You
> > > > can get clear design patterns using processes and channels.
> > >
> > > But that's the point. I wanted to see multiple transactions being
> > > served in parallel. The fact that clients don't necessarily have to wait
> > > is really a side point. Sure, the result involves no explicit
> > > synchronization, but you've effectively single-threaded the access to
> > > the shared data structure and removed all concurrency and parallelism.
> >
> > There is single-threaded access to the shared data structure, but why
> > would this remove all concurrency and parallelism? There is one process
> > serving the shared data to all processes that are interested in the data.
> > Multiple transactions can be served when each client has separate channels
> > to the database. The database contains many concurrent processes (threads)
> > executing parts of that transaction. Maybe I miss your point?
>

Yes, I believe it is possible with channels. Sketching the technique depends
on some factors.

Every process can be a sender or receiver, or both a sender and receiver at
the same time (performing input and output in parallel). Channels copy
messages from sender to receiver. Copying can be done by value (by sending
the data) or by reference (by sending a pointer). In the first case data
is protected and can be used exclusively by processes without
synchronization, and the latter case allows sharing data. However, shared
data must explicitly be synchronized to guarantee validity when two or more
processes can access the data simultaneously. I prefer the first case,
because it is safer and cleaner. The latter case can be much faster when
lots of data is sent at once (on a shared memory system), but one must be
careful. However, you can use it safely when the sender releases its
reference to the shared object after it has been passed to the receiver: the
receiver will then be the only process referring to the object.

When multiple queries are processed in parallel then nothing is alarming as
long as they use distinct records from the database. But when records are
shared and used by other query processes then some memory coherency protocol
must be applied. For example, if one process updates a record that was also
used by another process then this process must be notified that the data was
updated. This can be difficult stuff.
I can imagine that each query process must claim records while they are in
use by the process and must release them when the query process is ready.
A query process that wants to use an already claimed record must wait until
it becomes available. A special process could take care of the claiming and
releasing of the data so that the content of the database is always
guaranteed to be consistent. In this case, copying by reference could be
used efficiently. So, the records do not explicitly have to be
synchronised by some nasty thread synchronization construct.
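A compressed sketch of such a claim table, assuming Java (names invented):
in the design above it would live inside that special claiming process and
be reached over channels, which serializes access just as the monitor
object does here:

    class Claims {
        private final java.util.Set<Integer> claimed =
            new java.util.HashSet<Integer>();
        synchronized void claim(int recordId) throws InterruptedException {
            while (claimed.contains(recordId))
                wait();                  // record in use: wait for its release
            claimed.add(recordId);
        }
        synchronized void release(int recordId) {
            claimed.remove(recordId);
            notifyAll();                 // wake query processes waiting to claim
        }
    }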

A very useful technique is to sketch a data-flow model of your database
engine. Specify the tasks as processes (bubbles) and specify the
transactions as data-flows (arrows) between the processes. You start at a
highly abstract level and you can model each process further by a sub
data-flow model (refinement). Remember that in a data-flow model no methods
are invoked on processes by other processes; only messages are passed
between processes. You must choose between copying records and sharing
records for each process. If you succeed in designing such a data-flow
model, one that also incorporates a memory coherency protocol, then you can
implement your design with processes (represented by bubbles) and channels
(represented by arrows).
The result will be a natural design of data-flow model(s) containing a few
or many concurrent processes with channels. This design will be your
concurrent implementation skeleton of processes, and the next step is to
design/implement the 'sequential' processes.

Gerald.

Kaz Kylheku

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
On Mon, 7 Feb 2000 01:48:06 +0100, Gerald Hilderink
<g.h.hi...@el.utwente.nl> wrote:
>In this context, processes are active objects, i.e., objects that
>encapsulate their own single thread of control and their own state.
>Processes may not change states of other processes directly and the threads
>within processes may not cross each other except in special synchronization
>constructs, such as in channels where they get synchronized. Processes can
>have influence on the state of other processes through channel communication
>(by passing messages) when both the sender and receiver are committed on
>communication (event). So yes they execute in their independent address
>space (as part of the program's address space).

And of course, you are going to insist that switching address spaces
as part of the context switch is cheap too!

Maybe on some specially tuned academic OS; but what if you are writing
for mainstream operating systems?

>clarifies when I say: "A channel can be seen as a name for a design pattern
>(or wrapper) of a 'synchronized shared variable'". It is a wrong assumption
>that processes with channels is a slow concept.

I think that it's an excellent assumption if you are targeting certain types
of platforms that are used a great deal in the real world.

I would put very large subsystems into their own processes; that is to say,
chunking into processes is at a very high level. Entire services or middleware
components might go into their own process. But not tightly cooperating
subsystems.

Continuing with my communication protocol example, I wouldn't put the layers
into separate processes with their own address spaces! But an entire
piece of middleware built around the protocol might be a candidate.

Processes *are* nice for fault isolation and debugging in general, there
is no doubt.

>It's actually a shift in paradigm. My experience is that concurrency, based
>on thread programming, breaks with the OO paradigm, whereas the CSP model

Individual mileage varies. I find that concurrency based on thread programming
enhances the OO paradigm. Objects collaborate as usual, except using
multiple threads.

>(processes, composition constructs and channels) offers a 'glue logic'
>between threads and objects.

>For example, I can easily model concurrency in UML with processes (active
>objects) and channels (shared passive objects), but I have lots of trouble
>with the use of threads and thread synchronization constructs (e.g.,
>semaphores and monitors).

I tend to hide synchronization and mutual exclusion constructs in class
implementations. So the design ends up being focused on patterns of
collaboration through abstract interfaces.

That some object has an internal mutex is an implementation detail to me; the
external description of that object says that it is thread safe, and follows
certain rules, etc.
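
A small sketch of what I mean, in C with POSIX threads (the account object
is invented purely for illustration): the mutex is a private member of the
implementation, and the header would only promise that the operations are
safe to call from any thread.

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;   /* implementation detail, not in the interface */
    long balance;
} account_t;

void account_init(account_t *a, long initial) {
    pthread_mutex_init(&a->lock, NULL);
    a->balance = initial;
}

/* Callers never see the mutex; the external contract is simply that
   deposit() and balance() are thread safe. */
void account_deposit(account_t *a, long amount) {
    pthread_mutex_lock(&a->lock);
    a->balance += amount;
    pthread_mutex_unlock(&a->lock);
}

long account_balance(account_t *a) {
    pthread_mutex_lock(&a->lock);
    long b = a->balance;
    pthread_mutex_unlock(&a->lock);
    return b;
}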

Here is a typical interaction; one object asks another to do something, and the
other object balks because it is not ready. Later, the object that was acted
upon spontaneously invokes a method on the actor to indicate that it is ready.

>Aside, you may agree that reverse engineering of a
>concurrent program (in UML) is much easier with processes and channels than
>with threads and thread synchronization constructs.

I'd say that you are going beyond reverse engineering if you take a program
that isn't divided into processes with separate address spaces and produce a
design for that program in terms of such processes; by then you may have
already synthesized an essentially new design.

Bruce Hoult

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
In article <slrn89snn...@ashi.FootPrints.net>,
k...@ashi.footprints.net wrote:

> And of course, you are going to insist that switching address spaces
> as part of the context switch is cheap too!
>
> Maybe on some specially tuned academic OS; but what if you are writing
> for mainstream operating systems?

There is no reason that it has to be expensive. On the processor family
I'm most familiar with (PowerPC), all you have to do is change the 16
segment registers (possibly fewer) and maybe the BAT registers (though
they probably point the same place all the time). In particular, you
don't need to touch the page tables or flush the caches.

If there was some reason that you really need ultra-fast switches, I can't
see any reason that you couldn't make an architecture with just a single
"process ID" register that selected the address space -- but if you needed
that then you'd also want to have it select the appropriate bank of
user-mode registers at the same time. As it is, switching address spaces
is already about as fast as saving and restoring registers.

-- Bruce

Bill Todd

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to

Kaz Kylheku <k...@ashi.footprints.net> wrote in message
news:slrn89snn...@ashi.FootPrints.net...

> On Mon, 7 Feb 2000 01:48:06 +0100, Gerald Hilderink
> <g.h.hi...@el.utwente.nl> wrote:
> >In this context, processes are active objects, i.e., objects that
> >encapsulate their own single thread of control and their own state.
> >Processes may not change states of other processes directly and the threads
> >within processes may not cross each other except in special synchronization
> >constructs, such as in channels where they get synchronized. Processes can
> >have influence on the state of other processes through channel communication
> >(by passing messages) when both the sender and receiver are committed on
> >communication (event). So yes they execute in their independent address
> >space (as part of the program's address space).
>
> And of course, you are going to insist that switching address spaces
> as part of the context switch is cheap too!

I could easily be wrong, but my interpretation of the above was that the
'independent address spaces' were independent by virtue of convention (call
it etiquette - no object messes directly with any data but its own internal
state) rather than by virtue of executing in different logical address
spaces - i.e., that they were in fact threads within a single process (at
least if you wanted them to be).

- bill


Kaz Kylheku

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
On Mon, 07 Feb 2000 19:41:34 +1300, Bruce Hoult <bruce...@pobox.com> wrote:
>In article <slrn89snn...@ashi.FootPrints.net>,
>k...@ashi.footprints.net wrote:
>
>> And of course, you are going to insist that switching address spaces
>> as part of the context switch is cheap too!
>>
>> Maybe on some specially tuned academic OS; but what if you are writing
>> for mainstream operating systems?
>
>There is no reason that it has to be expensive. On the processor family
>I'm most familiar with (PowerPC), all you have to do is change the 16
>segment registers (possibly fewer) and maybe the BAT registers (though
>they probably point the same place all the time). In particular, you
>don't need to touch the page tables or flush the caches.

That's because data caches on modern architectures tend to work over physical
memory. So they are not affected by the change of virtual address space.

However, what about the virtual address *translation* cache? That must surely be
flushed. Unless your hardware has a jukebox of these, so it can swap them in
and out. (But if your design is sufficiently ``thread happy'', that resource
will run out on you too. ;)

On architectures that don't have this, the processor has to, for each memory
access for which no translation is in the cache, walk the hierarchical page
table data structures. On some architectures, operating-system supplied handler
code is executed to handle translation cache misses.

>If there was some reason that you really need ultra-fast switches, I can't

Among such reasons would be the use of a programming paradigm that requires
them, like excessive use of rendezvous. :)

You can circumvent the need for ultra fast switches by dispatching an action
directly within the thread that wants it done.

Well designed programs tend to minimize voluntary context switches by
making threads run until their quantum expires, as much as possible.

>see any reason that you couldn't make an architecture with just a single
>"process ID" register that selected the address space -- but if you needed
>that then you'd also want to have it select the appropriate bank of
>user-mode registers at the same time. As it is, switching address spaces
>is already about as fast as saving and restoring registers.

Even if that is the case, these are still not as fast as a procedure call.
Of course, a procedure call is not without risks; it may take too long.
Certain threads may not tolerate lengthy, or very variable calls into another
subsystem, because, say, they must provide a predictable response time to some
event, or achieve decent throughput in terms of responses in a given period.
In that case, I'd probably eat the context switch to win predictability;
that is, dispatch another thread so that the response thread can short circuit
back quickly to processing another event. But I wouldn't use rendezvous.
Rather, I would queue work for the other thread so that in one context switch,
a lot of processing would be transferred over to that thread, which could then
keep busy for a good portion of its timeslice. Some operating systems use
threads this way for, e.g., delayed driver processing that cannot take place in
an interrupt service routine.
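
For illustration, a rough sketch of that queueing pattern in C with POSIX
threads (all names are mine): the producer enqueues items cheaply and
signals at most once per batch, and the worker drains the whole queue each
time it wakes, so a single context switch transfers a batch of work.

#include <pthread.h>
#include <stdlib.h>

typedef struct work {
    struct work *next;
    void (*run)(void *);
    void *arg;
} work_t;

static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qnonempty = PTHREAD_COND_INITIALIZER;
static work_t *qhead, *qtail;

/* Producer side: queue an item without waiting for it to be processed. */
void work_queue(void (*run)(void *), void *arg) {
    work_t *w = malloc(sizeof *w);
    w->next = NULL; w->run = run; w->arg = arg;
    pthread_mutex_lock(&qlock);
    if (qtail) qtail->next = w; else qhead = w;
    qtail = w;
    pthread_cond_signal(&qnonempty);
    pthread_mutex_unlock(&qlock);
}

/* Worker thread: grab everything at once, then run it outside the lock. */
void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (!qhead)
            pthread_cond_wait(&qnonempty, &qlock);
        work_t *batch = qhead;
        qhead = qtail = NULL;
        pthread_mutex_unlock(&qlock);
        while (batch) {                /* stays busy for a whole batch */
            work_t *w = batch;
            batch = batch->next;
            w->run(w->arg);
            free(w);
        }
    }
    return NULL;
}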

Anyway, I'd tend to stay away from designs that depend on architecture-specific
rationalizations like ``address space switches are cheap on my system''.
If you take into account a broad variety of architectures, context switching
is, in practice, several orders of magnitude more expensive than a direct
subroutine call. In turn, context switching between address spaces is probably
another order of magnitude slower than context switching among threads, as a
conservative rule of thumb, due to the need to reload the aforementioned
address translation caches.

Jan Vorbrueggen

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
k...@ashi.footprints.net (Kaz Kylheku) writes:

> However, what about the virtual address *translation* cache? That must surely
> be flushed.

Ever heard about TLBs with address space IDs? Scratch that "surely". And at
the rate the transistor budget is growing, the architects will have to eat
another pipeline stage for TLB lookup in order to make them larger.

> > As it is, switching address spaces
> >is already about as fast as saving and restoring registers.
>
> Even if that is the case, these are still not as fast as a procedure call.

That's irrelevant. The correct comparison is a thread-switch.

> Certain threads may not tolerate lengthy, or very variable calls into another
> subsystem, because, say, they must provide a predictable response time to
> some event,

I'd say such a situation is a design error, for the first part of your
sentence. The second is a valid concern, but given that "all programming
is an exercise in caching", you are hard put to do hard real-time scheduling
on modern processors anyway.

> In turn, context switching between address spaces is probably
> another order of magnitude slower than context switching among threads, as a
> conservative rule of thumb, due to the need to reload the aforementioned
> address translation caches.

See above. It's just that nobody seems to bother to improve the
context-switching code in commercial OSes.

Jan

Terje Mathisen

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
Zalman Stern wrote:
>
> In comp.arch Terje Mathisen <Terje.M...@hda.hydro.com> wrote:
> : I have been looking for a clean way (or just any kind of way) to
> : implement this kind of meta-api:
>
> : An API which defines and propagates all the kinds of limitations and
> : dependencies of a piece of code, i.e. if I use my calendar algorithm as
> : an example, some way to tell all uses of that code that integers have to
> : act as if using two's complement arithmetic, and must support at least
> : 26 bit numbers.
>
> These seem to be static constraints. For the case Bruce mentioned, you need
> to tag functions with a property (can move memory - CMM) and then tell the
> compiler how to combine properties (if a function calls another CMM function
> then that function must also be CMM). At that point a large number of
> really serious Macintosh bugs would have become static compilation errors
> through use of a more rigorous programming style.

Thanks, that's a much better description of the problem I'd like to
solve!

> This is into the realm of intentional or perhaps metaprogramming. Projects

Meta-programming: Definitely.

How to convey information which is logically outside the expressible
syntax of the language being used.

Very soon, this can hit Mr Gödel's limits. :-(

> like Open C++ (which is a C++ compiler with a standard extension interface)
> allow you to do this.

Thanks, I'll have to take a look at that!

Terje

--
- <Terje.M...@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"

Gerald Hilderink

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
Bill Todd <bill...@foo.mv.com> wrote in message
news:87lrgs$rmr$1...@pyrite.mv.net...

>
> Kaz Kylheku <k...@ashi.footprints.net> wrote in message
> news:slrn89snn...@ashi.FootPrints.net...
> > On Mon, 7 Feb 2000 01:48:06 +0100, Gerald Hilderink
> > <g.h.hi...@el.utwente.nl> wrote:
> > >In this context, processes are active objects, i.e., objects that
> > >encapsulate their own single thread of control and their own state.
> > >Processes may not change states of other processes directly and the threads
> > >within processes may not cross each other except in special synchronization
> > >constructs, such as in channels where they get synchronized. Processes can
> > >have influence on the state of other processes through channel communication
> > >(by passing messages) when both the sender and receiver are committed on
> > >communication (event). So yes they execute in their independent address
> > >space (as part of the program's address space).
> >
> > And of course, you are going to insist that switching address spaces
> > as part of the context switch is cheap too!
>
> I could easily be wrong, but my interpretation of the above was that the
> 'independent address spaces' were independent by virtue of convention (call
> it etiquette - no object messes directly with any data but its own internal
> state) rather than by virtue of executing in different logical address
> state) rather than by virtue of executing in different logical address
> spaces - i.e., that they were in fact threads within a single process (at
> least if you wanted them to be).
>
> - bill
>

Yes Bill, this is correct.

In parallel programming languages such as Ada and occam, processes are
types or interfaces that follow certain rules. These rules demand that
each process type have its own independent logical workspace. The
compiler checks these rules and generates errors when they are violated.
The compiler then generates code for the processes using one address
space.

Kaz Kylheku <k...@ashi.footprints.net> wrote in message
news:slrn89snn...@ashi.FootPrints.net...
> On Mon, 7 Feb 2000 01:48:06 +0100, Gerald Hilderink
> <g.h.hi...@el.utwente.nl> wrote:
> >In this context, processes are active objects, i.e., objects that
> >encapsulate their own single thread of control and their own state.
> >Processes may not change states of other processes directly and the threads
> >within processes may not cross each other except in special synchronization
> >constructs, such as in channels where they get synchronized. Processes can
> >have influence on the state of other processes through channel communication
> >(by passing messages) when both the sender and receiver are committed on
> >communication (event). So yes they execute in their independent address
> >space (as part of the program's address space).
>

> And of course, you are going to insist that switching address spaces
> as part of the context switch is cheap too!

On most processors, for example on an x86 in protected mode (32-bit flat
memory), a context save, a stack-pointer switch and a context restore are
enough to switch from one process (thread) to the next. Switching segment
registers may also do the job.
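
On POSIX systems you can see how little state a user-level switch involves
with the ucontext primitives; a toy sketch (the API is obsolescent, and on
some systems swapcontext() does enter the kernel to adjust the signal
mask, so take this as illustration rather than a performance claim):

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, coro_ctx;
static char coro_stack[64 * 1024];

static void coro(void) {
    printf("in coroutine\n");
    swapcontext(&coro_ctx, &main_ctx);  /* save our state, restore main's */
    printf("back in coroutine\n");
}

int main(void) {
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &main_ctx;       /* where to go when coro() returns */
    makecontext(&coro_ctx, coro, 0);

    swapcontext(&main_ctx, &coro_ctx);  /* context save + restore, nothing more */
    printf("in main\n");
    swapcontext(&main_ctx, &coro_ctx);
    return 0;
}

The whole switch is a register save, a stack switch and a register
restore, which is the point being made above.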

>
> Maybe on some specially tuned academic OS; but what if you are writing
> for mainstream operating systems?

As far as I know, all 'real-time' operating systems provide fast context
switching.

>
> >clarifies when I say: "A channel can be seen as a name for a design pattern
> >(or wrapper) of a 'synchronized shared variable'". It is a wrong assumption
> >that processes with channels is a slow concept.
>
> I think that it's an excellent assumption if you are targeting certain types
> of platforms that are used a great deal in the real world.
>
> I would put very large subsystems into their own processes; that is to say,
> chunking into processes is at a very high level. Entire services or middleware
> components might go into their own process. But not tightly cooperating
> subsystems.
>

Why don't you consider each subsystem or service to be a separate process?
To me, a subsystem is likely to be a process and a process is a subsystem.


> I tend to hide synchronization and mutual exclusion constructs in class
> implementations. So the design ends up being focused on patterns of
> collaboration through abstract interfaces.

The Process, Channel, ChannelInput and ChannelOutput interfaces are
abstract interfaces. Their semantics are derived from the theory of CSP.

There is also a so-called CallChannel interface that extends the simple
Channel interface to allow method names other than read() and write() for
channels. CallChannels are channels with rendezvous behaviour, but their
interface contains the public methods of the servicing process. Client
processes may only invoke methods of the server process via the
CallChannel. When a client process invokes a method() on the CallChannel
and the server process invokes accept() (both committing to the event),
the CallChannel invokes method() on the server process (a call-back).
Plain channels are a bit low-level and can be used for low-level
communication, for example with hardware. Every time the servicing process
is ready to accept a new message (and is thus in a safe state) from a
client process, it invokes the accept() method. Processes with channels
(including CallChannels) are ALWAYS thread-safe.
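
To show just the idea, here is a stripped-down sketch in C with POSIX
threads; the names are mine, it assumes a single client and a single
server for brevity, and it only mimics the accept()/call-back behaviour
rather than reproducing the real CallChannel interface:

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int has_req, has_rep;
    int req, rep;
} call_channel_t;

void call_channel_init(call_channel_t *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->has_req = c->has_rep = 0;
}

/* Client side: blocks until the server has accepted and run the method. */
int call(call_channel_t *c, int arg) {
    pthread_mutex_lock(&c->lock);
    c->req = arg;
    c->has_req = 1;
    pthread_cond_signal(&c->cv);
    while (!c->has_rep)                 /* rendezvous: wait for the reply */
        pthread_cond_wait(&c->cv, &c->lock);
    int r = c->rep;
    c->has_rep = 0;
    pthread_mutex_unlock(&c->lock);
    return r;
}

/* Server side: called whenever the server is in a safe state; method is
   the "public method" invoked on the server's behalf (the call-back). */
void accept_call(call_channel_t *c, int (*method)(int)) {
    pthread_mutex_lock(&c->lock);
    while (!c->has_req)                 /* wait until a client commits */
        pthread_cond_wait(&c->cv, &c->lock);
    c->rep = method(c->req);
    c->has_req = 0;
    c->has_rep = 1;
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->lock);
}

Because the server only runs method() from inside accept_call(), it is in
a safe state at every call-back, which is what makes the construct
thread-safe by design.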

>
> That some object has an internal mutex is an implementation detail to me; the
> external description of that object says that it is thread safe, and follows
> certain rules, etc.

Exactly. If an object does not say that it is thread-safe then you must
check this first and/or wrap some synchronization around it. Processes with
channels are ALWAYS thread-safe.

>
> >Aside, you may agree that reverse engineering of a
> >concurrent program (in UML) is much easier with processes and channels than
> >with threads and thread synchronization constructs.
>
> I'd say that you are going beyond reverse engineering if you take a program
> that isn't divided into processes with separate address spaces and produce a
> design for that program in terms of such processes; by then you may have
> already synthesized an essentially new design.

Yes. Indeed, the program must have a notion of processes and channels,
otherwise a new design will be the result. But a program WITH a notion of
processes and channels is easy to reverse-engineer into small sub-models
for each process. A large concurrent program that is based on threads and
thread synchronization will mostly result in one large, complex model.
Splitting such a model into simpler sub-models is difficult (if not
impossible) for automatic algorithms, because sufficient information is
missing. Process and channel interfaces add information to the program
that can be used by an automatic algorithm. The semantics of processes and
channels can be used to generate dynamic models, and a nice side-effect is
that the statecharts do not explode (each process is described by its own
statechart).

Gerald.

Joe Seigh

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
A point was raised earlier about the ability of CSP channels to handle concurrent
processing. Channels work by sending all of the data in toto. If you want to
have concurrent processing of data, you have to break the data into multiple
parts using multiple channels. As soon as you do that then the coordination
problem becomes equivalent in complexity to that of using normal threading
paradigms. Maybe some problems decompose into a solution nicely under your
paradigm but the exception doesn't prove the rule.

Joe Seigh

Ira D. Baxter

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
If you are interested in metaprogramming, you should look
at the Design Maintenance System vision, which suggests in theory exactly
how to harness such properties.

Baxter, I. 1992. Design Maintenance Systems, Communications of the ACM
35(4), April, 1992, Association for Computing Machinery, New York. (Not
available online).

We are building a system to do this. A first version exists; you can learn
about it at: http://www.semdesigns.com/Products/DMS/DMSToolkit.html
It has quite a ways to go before it can easily express these properties in
practice.

-- IDB

>> In comp.arch Terje Mathisen <Terje.M...@hda.hydro.com> wrote:
>> : I have been looking for a clean way (or just any kind of way) to
>> : implement this kind of meta-api:
>>
>> : An API which defines and propagates all the kinds of limitations and
>> : dependencies of a piece of code, i.e. if I use my calendar algorithm as
>> : an example, some way to tell all uses of that code that integers have to
>> : act as if using two's complement arithmetic, and must support at least
>> : 26 bit numbers.

>> This is into the realm of intentional or perhaps metaprogramming.
Projects
>
>Meta-programming: Definitely.

pd wilson

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to
Thanks Greg! It's all news to me ;-)

Greg Alexander wrote:
>
> In article <3899B586...@collins.rockwell.FNORD.com>, pd wilson wrote:
> >
> >I would be interested in the citations you mention....
>
> What I have on my desk here is at:
> http://www.bell-labs.com/people/god/public_psfiles/popl97.ps
> however, all of
> http://www.bell-labs.com/people/god/
> is related (it's mostly about VeriSoft and related topics -- apparently
> unrelated to www.verisoft.com).
>
> I also have an article from the ACM Transactions on Computer Systems Vol
> 15 No 4 November 1997, Pages 391-411 by Stefan Savage et al, called
> "Eraser: A Dynamic Data Race Detector for Multithreaded Programs"
> describing a program which checks for violations of mutex locking
> discipline. I was just given these papers as part of a college class I am
> taking, so I am sure there are many more similar papers/projects in
> existence. You might want to check the references at the end of
> those...one of the more famous ones in this field, I guess, is:
>
> Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed
> system. Commun. ACM 21, 7 (July), 558-565.
>
> Though I haven't personally seen it. I am by no means an expert. :)
> I don't know how widely these tools are used throughout the field. I
> think that currently most of them are in the stage of "Oh, I just wrote
> this neat program that checks these things, and I ran it on these popular
> publicly available programs and it found these bugs. You're welcome."
> But for all I know there may be standard programs/approaches that practically
> everyone doing serious work on important concurrent systems knows of and
> uses. :)

Kostas Kostiadis

unread,
Feb 7, 2000, 3:00:00 AM2/7/00
to

"Michael Loftus" <mlof...@xford.com> wrote in message
news:389B1330...@xford.com...

[snip]


> > > (b) can simply walk away from a project that turns into a swamp.

[snip]

>
> > (b) is no different than quiting your job...
>
> Interviewer: So, Kostas, explain how you handled that difficult
> issue at your last job.
> Kostas : Quite easily, thank you. I just quit.

"is no different" as I phrased it, implies comparison.
I was just making a point that IT IS NO DIFFERENT in academia
to "walk away" comparing to the industry.
If you are the "quiter" kind of person, it doesn't matter who
you work for (i.e. industry or academia).

Although your example is quite funny (NOT), you missed the
point.

>
> [snip]
> > and you have a fixed amount of resources at your disposal.
> > So do I !!!
>
> Wrong. Resources tend to get cut or disappear. All a part of
> that "real-world crap".
>

Apart from very rare and extreme situations, if your resources
disappear just like that, this is again a sign of crap design/management
of the overall project (which was my initial point 2 messages ago).

I really wanted to talk about the original thread, titled
"Multithreading underlies new development paradigm". I will NOT
continue this industry vs. academia discussion with you because
it is way off topic and a waste of my time (and probably yours).

If you have anything else to say, please reply to me personally
rather than polluting this group, and I will do the same.

> --
>
> Michael Loftus
> (my real address has no 'x')
> Ford Research Labs
>
> *** I am not an official spokesperson for Ford Motor Co. ***

I'd put this last line in capitals if I were you. Although the "***"
do emphasize, I would really make sure :)

Cheers,
K.


Charles Bryant

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to
In article <slrn89sr...@ashi.FootPrints.net>,

Kaz Kylheku <k...@ashi.footprints.net> wrote:
>Well designed programs tend to minimize voluntary context switches by
>making threads run until their quantum expires, as much as possible.

Well designed programs never let their quantum expire, because they
complete their work before then and wait on I/O to get more work or
deliver results. Unless a program is compute-bound, or real-time with
bounded work to do in each time unit, if it experiences quantum expiry
frequently it will exhibit poor response time to work requests.

>Anyway, I'd tend to stay away from designs that depend on architecture-specific
>rationalizations like ``address space switches are cheap on my system''.
>If you take into account a broad variety of architectures, context switching
>is, in practice, several orders of magnitude more expensive than a direct
>subroutine call. In turn, context switching between address spaces is probably
>another order of magnitude slower than context switching among threads, as a
>conservative rule of thumb, due to the need to reload the aforementioned
>address translation caches.

While that's a good idea up to a point, it also exhibits "designing
for inefficiency". This is the principle that, because a particular
way of doing something is (or is perceived to be) inefficient, it is
avoided, which leads to there being no reason to make it more
efficient. So, in this example, if you're afraid of thread context
switch times, and therefore you minimise the number of context
switches, not only is there little incentive to speed up context
switches, but there may be an incentive to do unnecessary work at
each context switch on the grounds that they rarely happen and it's a
convenient place to do the work.

In fact, for many typical processors a thread context switch should
be about the same cost as a function call (unless a user/kernel mode
switch is required).

--
Eppur si muove


Benjamin Gamsa

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to
In article <2000-02-1...@chch.demon.co.uk>,

Charles Bryant <n163067...@chch.demon.co.uk> wrote:
>
>In fact, for many typical processors a thread context switch should
>be about the same cost as a function call (unless a user/kernel mode
>switch is required).
>

I guess it depends on your definition of typical processor and typical
function call.

pla...@nospam-cs.wisc.edu

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to
In comp.arch Zalman Stern <zal...@netcom3.netcom.com> wrote:
> : An API which defines and propagates all the kinds of limitations and
> : dependencies of a piece of code, i.e. if I use my calendar algorithm as
> : an example, some way to tell all uses of that code that integers have to
> : act as if using two's complement arithmetic, and must support at least
> : 26 bit numbers.
> [...]

> This is into the realm of intentional or perhaps metaprogramming. Projects
> like Open C++ (which is a C++ compiler with a standard extension interface)
> allow you to do this.


Not sure if this is completely relevant but Dawson Engler
(now at Stanford) is doing some work on "interface compilation"
where assertions about behavior of library code/data can drive
a compiler.

http://www.cs.stanford.edu/~engler

Manoj


Bill Todd

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to

Charles Bryant <n163067...@chch.demon.co.uk> wrote in message
news:2000-02-1...@chch.demon.co.uk...

> In article <slrn89sr...@ashi.FootPrints.net>,
> Kaz Kylheku <k...@ashi.footprints.net> wrote:
> >Well designed programs tend to minimize voluntary context switches by
> >making threads run until their quantum expires, as much as possible.
>
> Well designed programs never let their quantum expire because they
> have completed their work before that and wait on I/O getting more
> work or delivering results. Unless a program is compute bound, or
> real-time with bounded work to do in each time unit, if it
> experiences quantum expiry frequently it will exhibit a poor response
> time to work requests.

But programs designed for optimal performance have the thread issue the I/O
(or other potentially lengthy request) asynchronously, to be completed as a
new work item, and pick up another work item rather than suspend. The
thread thus runs out its quantum (as long as additional work items are
available to process) without the consequence of poor response time. This
also allows the number of threads to be matched to the number of available
processors, rather than running a much higher number (to allow for the fact
that most of them are waiting for something most of the time) and thus
incurring potentially much greater synchronization overhead during the
inevitable resource-allocation traffic jams at that multi-programming level.
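
A skeletal version of that structure in C, assuming POSIX poll() (the item
type and callback are invented for illustration, and n is assumed to be at
most MAXITEMS): the thread never suspends on an individual request; it
sleeps only when no work item is ready at all.

#include <poll.h>

#define MAXITEMS 64

/* One work item per outstanding I/O request. */
struct item {
    int fd;
    void (*on_ready)(struct item *);   /* completes the item as new work */
};

void event_loop(struct item items[], int n) {
    struct pollfd pfds[MAXITEMS];
    for (;;) {
        for (int i = 0; i < n; i++) {
            pfds[i].fd = items[i].fd;
            pfds[i].events = POLLIN | POLLOUT;
        }
        if (poll(pfds, n, -1) < 0)     /* block only when nothing is ready */
            return;
        for (int i = 0; i < n; i++)
            if (pfds[i].revents)
                items[i].on_ready(&items[i]);
    }
}

Run one such loop per processor and the thread count matches the processor
count, which is the multi-programming level argued for above.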

- bill


M Sweger

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to
Bill Todd (bill...@foo.mv.com) wrote:

: Charles Bryant <n163067...@chch.demon.co.uk> wrote in message

: - bill


So which threading/channel algorithm does the multithreaded
computer at

http://www.tera.com

use? And is it more efficient at the hardware level than in software?


--
Mike,
mik...@whiterose.net


Harry George

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to
Wouldn't Modula-3's INTERFACE mechanism be applicable here? E.g.,
with:

TYPE CalendarInteger = BITS 26 FOR [-33550000..+33550000];

--
Harry George
hgg...@seanet.com

Zalman Stern

unread,
Feb 12, 2000, 3:00:00 AM2/12/00
to
In comp.arch Harry George <hgg...@seanet.com> wrote:
: Wouldn't Modula-3's INTERFACE mechanism be applicable here? E.g.,
: with:

: TYPE CalendarInteger = BITS 26 FOR [-33550000..+33550000];

There is no promise in the language spec that a 26-bit packed type (BITS
FOR) is supported.

More importantly, the interesting constraint here, namely the subrange
restriction, only works because it is supported primitively by the
language. (As it is in many other languages.) Modula 3 has no way to
annotate interfaces with more complex constraints, such as the "can move
memory" case I discussed before.

That said, Modula 3 is a far better language than most.

-Z-

Zalman Stern

unread,
Feb 13, 2000, 3:00:00 AM2/13/00
to
In comp.arch Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
: On most processors, for example on an x86 in protected mode (32-bit flat
: memory), a context save, a stack-pointer switch and a context restore are
: enough to switch from one process (thread) to the next. Switching segment
: registers may also do the job.

If you are switching address spaces, you must count the cost of TLB misses
that usually occur right after the switch.

-Z-

Terje Mathisen

unread,
Feb 14, 2000, 3:00:00 AM2/14/00
to
pla...@nospam-cs.wisc.edu wrote:
> Not sure if this is completely relevant but Dawson Engler
> (now at Stanford) is doing some work on "interface compilation"
> where assertions about behavior of library code/data can drive
> a compiler.
>
> http://www.cs.stanford.edu/~engler

I've taken a look at his web page, quite interesting!

Thanks for the link.

Bruce Hoult

unread,
Feb 15, 2000, 3:00:00 AM2/15/00
to
In article <8876lf$vgq$1...@nntp5.atl.mindspring.net>, Zalman Stern
<zal...@netcom15.netcom.com> wrote:

> In comp.arch Gerald Hilderink <g.h.hi...@el.utwente.nl> wrote:
> : On most processors, for example on an x86 in protected mode (32-bit flat
> : memory), a context save, a stack-pointer switch and a context restore are
> : enough to switch from one process (thread) to the next. Switching segment
> : registers may also do the job.
>

> If you are switching address spaces, you must count the cost of TLB misses
> that usually occur right after the switch.

I don't see why a task switch that executes a given number of instructions
to perform a given algorithm should be any worse with regard to TLB misses
than a function call that executes the same number of instructions to
perform the same algorithm.

Perhaps you are assuming that task switches in general do more work than
function calls, and that therefore the cache and TLB get more completely
trashed by task switches, but surely that is precisely the thinking that
this thread is trying to get rid of? (and the kind of thinking that leads
to slow task switches)

-- Bruce

Anne & Lynn Wheeler

unread,
Feb 15, 2000, 3:00:00 AM2/15/00
to

task switch can involve a different address space table ... in which
case a processor may use completely different TLB entries ... even if
they resolve to the same physical locations. Furthermore, TLBs may only
support a limited depth of address space tables (i.e. each TLB entry
is tagged with an address space index ... and there are only a limited
number of active address space indexes). If the address space index
table is full then a table entry has to be scavenged, which also
involves first flushing all the TLB entries tagged/associated with that
table entry.

It is possible to have virtual page thrashing (constantly replacing
virtual pages in real memory). It is also possible to have cache
thrashing and TLB thrashing. And it is possible to have address space
index table thrashing ... which not only affects the entry in the
address space index table ... but also results in a global TLB purge for
all entries associated/tagged with that table entry.

--
Anne & Lynn Wheeler | ly...@adcomsys.net, ly...@garlic.com
http://www.garlic.com/~lynn/ http://www.adcomsys.net/lynn/

Bruce Hoult

unread,
Feb 15, 2000, 3:00:00 AM2/15/00
to
In article <uu2jbm...@mail.adcomsys.net>, Anne & Lynn Wheeler
<ly...@adcomsys.net> wrote:

> task switch can involve different address space table

I would have said that a task switch "usually" (or at least "very often")
involves switching to a different address space.


>... in which
> case a processor may use completely different TLB entries ... even if
> they resolve to the same physical locations. Futhermore, TLBs may only
> support a limited depth of address space tables (i.e. each TLB entry
> is tagged with an address space index ... and there are only a limited
> number of active address space indexes). If the address space index
> table is full then a table entry has to be scavanged which also
> involves first flushing all the TLB entries tagged/associated with that
> table entry.

All that might be true, on certain machines, at certain times. Or it
might never be true on certain machines. For example, there might be no
such thing at the hardware as an "address space table". I have in front
of me MPC750UM/AD, the "MPC750 RISC Microprocessor User's Manual" and I
see no signs of any such thing as an address space index in either the
segment registers (or BATs), the TLB, or the cache.

On a task switch, you need to load 16 32-bit segment register with values
appropriate to the incoming task. These control, for each 256 MB address
range, the mapping to a 52-bit virtual memory space, plus the access
protections for that address range for that task. You can't run out of
address space (or process) indexes because there aren't any. If you want
to share TLB entries between tasks then that is easily arranged. You
don't need to flush either the TLB or the cache on a task switch.


> It is possible to have virtual page thrashing (constantly replacing
> virtual pages in real memory). It is also possible to have cache
> thrashing and TLB thrashing. And it is possible to have address space
> index table thrashing ... which not only effects the entry in the
> address space index table ... but also results in a global TLB purge for
> all entries associated/tagged with that table entry.

All of those things are possible. They happen when the size of the
computation being performed by the machine as a whole exceeds some limit.
But task switching as such has *no* effect on VM paging, cache thrashing
or TLB thrashing. When you're writing a program and deciding how to
partition it, the choice of using functions or tasks is neutral with
regard to these things. A task switch will take longer than a function
call -- you need to save all the user mode registers instead of just some,
and you need a trip to supervisor mode and back, and you need to change
about as many registers again as the user mode ones. But that's a pretty
small constant multiple of the work needed to make a function call.

I'm sure other machines have more expensive task switches and suffer from
the problems you list, but are those things really *necessary* consequences
of task switching?

-- Bruce

Peter Ludemann

unread,
Feb 15, 2000, 3:00:00 AM2/15/00
to
Bruce Hoult wrote:
>
> In article <uu2jbm...@mail.adcomsys.net>, Anne & Lynn Wheeler
> <ly...@adcomsys.net> wrote:
>
> > task switch can involve different address space table
>
> I would have said that a task switch "usually" (or at least "very often")
> involves switching to a different address space.

Not on AS/400, where everything runs in one address space and pointer
creation is a privileged operation (pointer creation does access
validation). Nor on Nortel's DMS-100 (telephone switch) circa 1980, where
again everything was in the same address space. [For both machines, task
switching was very cheap; the main cost being cache misses due to
non-locality.] I'm sure there are other examples of machines and operating
systems that don't switch address spaces on a task switch (especially if
you count multi-threading as a kind of multi-tasking).

- peter

Anne & Lynn Wheeler

unread,
Feb 15, 2000, 3:00:00 AM2/15/00
to

bruce...@pobox.com (Bruce Hoult) writes:

> All that might be true, on certain machines, at certain times. Or it
> might never be true on certain machines. For example, there might be no
> such thing at the hardware as an "address space table". I have in front
> of me MPC750UM/AD, the "MPC750 RISC Microprocessor User's Manual" and I
> see no signs of any such thing as an address space index in either the
> segment registers (or BATs), the TLB, or the cache.
>
> On a task switch, you need to load 16 32-bit segment register with values
> appropriate to the incoming task. These control, for each 256 MB address
> range, the mapping to a 52-bit virtual memory space, plus the access
> protections for that address range for that task. You can't run out of
> address space (or process) indexes because there aren't any. If you want
> to share TLB entries between tasks then that is easily arranged. You
> don't need to flush either the TLB or the cache on a task switch.

yep, typically if there is virtual memory there is some sort of TLB as
high speed cache of virtual->real mapping, either hardware loaded
(which is more likely with page tables) or software managed (more
likely with inverted tables). In either case the TLB entries have some
sort of address space affinity. In the case of inverted tables w/801,
address space affinity is at the segment register level ... and the
"tag/association" is the value loaded into the segment register (40-28
= 12 bit value in the case of ROMP & 52-28=24 bit value in case of
RIOS/power)

The sharing issue that I had with the design back in the mid '70s was
that it had so few concurrently sharable objects (i.e. registers). The
original design point had the segment register instructions inline,
without having to cross a protection domain (i.e. as easy as base
registers). Moving to more open software systems required that any
segment register chatter be moved into a protection domain with kernel
calls. In the original design point of the architecture it was
straightforward to have an application with dozens of (almost)
concurrently shared objects. A partial solution was to go to composite
shared objects (within a single 256mbyte object) ... but then it
became an administrative packaging issue with other limitations.


misc. references

http://www.garlic.com/~lynn/98.html#25
http://www.garlic.com/~lynn/98.html#26
http://www.garlic.com/~lynn/98.html#27

misc. other 801/romp/rios/power references:

http://www.garlic.com/~lynn/94.html#47
http://www.garlic.com/~lynn/95.html#5
http://www.garlic.com/~lynn/95.html#9
http://www.garlic.com/~lynn/95.html#11
http://www.garlic.com/~lynn/96.html#15
http://www.garlic.com/~lynn/97.html#5
http://www.garlic.com/~lynn/99.html#64
http://www.garlic.com/~lynn/99.html#67
http://www.garlic.com/~lynn/99.html#129

Benjamin Gamsa

unread,
Feb 15, 2000, 3:00:00 AM2/15/00
to
In article <brucehoult-15...@bruce.bgh>,

Bruce Hoult <bruce...@pobox.com> wrote:
>
>On a task switch, you need to load 16 32-bit segment register with values
>appropriate to the incoming task. These control, for each 256 MB address
>range, the mapping to a 52-bit virtual memory space, plus the access
>protections for that address range for that task. You can't run out of
>address space (or process) indexes because there aren't any. If you want
>to share TLB entries between tasks then that is easily arranged. You
>don't need to flush either the TLB or the cache on a task switch.

Of course, the segment registers essentially act as address space
indices, since they separate common virtual addresses into distinct
entries in the TLB. They are also more than just that; e.g., you
can arrange to share segments (such as for shared libraries).

As for the extra costs of cross-address-space vs. in-address-space
calls, you have to factor in the separate stacks, occupying more TLB
slots, and the fact that functions in separate address spaces can't be
packed into a single page, unless they're in a shared library. Of
course there are also secondary costs, such as the fact that the
compiler can't inline the function.

Bruce Hoult

unread,
Feb 16, 2000, 3:00:00 AM2/16/00
to
In article <38A91202...@earthlink.net>, Peter Ludemann
<plud...@earthlink.net> wrote:

Of course. The one I'm typing this on (MacOS), for one (it does, however,
have to copy around a bunch of low memory globals. ick). Plus versions
of Windows prior to Win95. The dreaded Qantel I wrote about a few weeks
ago.

-- Bruce

stephane...@my-deja.com

unread,
Feb 22, 2000, 3:00:00 AM2/22/00
to
In article <handleym-020...@handma3.apple.com>,
hand...@ricochet.net (Maynard Handley) wrote:
> In article <3898062b.0@seralph9>, "Kostas Kostiadis"
> <kko...@essex.ac.uk> wrote:
>
> >"Roger A. Faulkner" <r...@sunraf.Sun.COM> wrote in message
> >news:87698j$ma4$1...@engnews2.Eng.Sun.COM...
> >
> >[snip]
> >
> >> My advice to anyone considering multithreaded programming is, don't.
> >> Find a simpler way to accomplish your task and enjoy life. If the
> >> demands of the task require multithreading, then meditate a lot and
> >> prepare yourself for a difficult task. Do not deceive yourself by
> >> thinking it will be easy.
> >>
> >
> >That is great advice...In fact we can take this one step further
> >and also advise anyone considering single-threaded programming to
> >NOT do it !!! There are lots of problems there too.
> >
> >> You will be faced with mysterious race condition bugs that only
> >> occur when someone runs your program on a machine under heavy load.
> >> You will tear your hair. You will scream. You will resort to closing
> >> bug reports as "cannot reproduce". If you feel strongly about providing
> >> a high quality product, you will have a nervous breakdown. If you don't
> >> care at all, your company will fail and you will be out of a job.
> >>
> >
> >All the above sounds like really crap design to me. It is just that crap
> >design is a lot easier to implement in a single-threaded program.
> >Also crap implementations (based on crap design) are a lot easier
> >to debug in a single-threaded program.
> >
> >So your advice would in fact be: If you can't design OR implement,
> >write single-threaded software. To that I agree 100%...
> >(However I think in that case you will be out of a job soon enough
> >anyway, or you never know....might end up working for MegaSoft :) )
> >
> >> These are the reasons I hate multithreading.
> >> I'm also good at it.
> >
> >You don't say....
>
> I guess in the UK .ac world one gets to work with really smart programmers who
> (a) know what they are doing and
> (b) can simply walk away from a project that turns into a swamp.
>
> In the commercial world one has to work with other programmers of limited
> ability, and one does not have the option of walking away as a disaster
> unfolds. The advice to avoid threads where one possibly can makes perfect
> sense---it's no different from any other situation where doing things the
> "cool way" will indeed make the code run 30% faster---and will also take
> an additional year to code and debug, by which point the extra speed is a
> moot point.
> Kostas has obviously been lucky enough to work only on well-designed
> projects with smart people. I've had to work on projects using threads in
> a somewhat ad hoc fashion (as sometimes happens in the real world, the
> 1.0 version is well designed but mission creep in version 2.0 and up
> complicates matters), and it's not as much fun as some people might think.
>
> Maynard

I must agree with Maynard. Developers who have only ever done development
tend to forget that there is another side to the coin... maintenance! When
doing maintenance on a product that has had many releases, a product that
your company did not create, and the client decides it wants performance
at a low cost, you cannot do things like a developer. You have to approach
it with a maintenance mindset that a pure developer cannot understand.
Transforming a poorly designed product from single-threaded to
multi-threaded, at a low cost at that, is very painful.

Maybe Kostas is very good (even a genius) in his domain, and as Maynard
mentioned, may have had the luck to work with other geniuses. But geniuses
are rare enough that if Kostas finds himself within a group of regular
people, my only guess is that he will suffer from a superiority complex.
The real world, THE REAL ONE, is composed of all kinds of people, the
majority being just average. So average people make average software. And
for average people, multi-threading can be quite complex to understand.

Nonetheless, I agree with those who say that a good multi-threaded design
makes a product a lot less complex. But for that you have to understand
multi-threading very well. And always remember that transforming existing
software to multi-threading is more difficult than developing it from
scratch.

Oh, and by the way, before saying that it's our job to make sure that a
client understands it is dangerous to save money when transforming (or
developing) a multi-threaded product, you must understand that many
clients don't care about spending money in the next financial year. All
they have in mind is this year's budget. (My apologies to those who don't
think this way, but I know you are few.) They only want to save money NOW,
no matter the cost later.

Sent via Deja.com http://www.deja.com/
Before you buy.

Zalman Stern

unread,
Feb 23, 2000, 3:00:00 AM2/23/00
to
In comp.arch Bruce Hoult <bruce...@pobox.com> wrote:
: In article <8876lf$vgq$1...@nntp5.atl.mindspring.net>, Zalman Stern
: <zal...@netcom15.netcom.com> wrote:
:> If you are switching address spaces, you must count the cost of TLB misses

:> that usually occur right after the switch.

: I don't see why a task switch that executes a given number of instructions
: to perform a given algorithm should be any worse wth regard to TLB misses
: than a function call that executes the same number of instructions to
: perform the same algorithm.

Function calls do not switch address spaces. If you are context switching
to a thread of control which uses the same address space and the hardware
uses the same resources to map that space, then you are not switching
address spaces. (E.g. using the same segment ID on PowerPC or region ID on
IA64.) If the hardware has large enough caches associated with address
space mappings to handle your task mix, then you will see relatively little
slow down. Though that situation is uncommon in practice and fragile with
respect to using more tasks or changing the context switch behavior or even
memory access patterns. (It is also fragile with respect to hardware as
there is pressure to reduce the size of the innermost TLB structures.)

It may still be worth using address spaces for abstraction and
compartmentalization purposes, but the costs should be measured accurately.

You might be able to get someone to buy you lunch, but there still is no
such thing as a free one.

-Z-
