"The Problem with Threads" - IEEE's Computer magazine article


kimo

Aug 17, 2006, 11:21:29 PM
See the link for tables that didn't make it into the text below.


http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl&

May 2006

Cover Feature
The Problem with Threads
Edward A. Lee
University of California, Berkeley
For concurrent programming to become mainstream, we must discard
threads as a programming model. Nondeterminism should be judiciously
and carefully introduced where needed, and it should be explicit in
programs.

Concurrent programming is difficult,1 yet many technologists predict
the end of Moore's law will be answered with increasingly parallel
computer architectures—multicore or chip multiprocessors (CMPs).2 If
we hope to achieve continued performance gains, programs must be able
to exploit this parallelism.

Automatic exploitation of parallelism in sequential programs, through
either computer architecture techniques such as dynamic dispatch or
automatic parallelization of sequential programs,3 offers one possible
technical solution. However, many researchers agree that these
automatic techniques have been pushed to their limits and can exploit
only modest parallelism. Thus, programs themselves must become more
concurrent.

Understanding why concurrent programming is so difficult can help us
solve the problem. The physical world is highly concurrent, and our
very survival depends on our ability to reason about concurrent
physical dynamics. This reasoning doesn't extend to concurrent programs
because we have chosen abstractions that do not even vaguely resemble
the physical world's concurrency. We have become so used to these
computational abstractions that we have forgotten they are not
immutable. The difficulty of concurrent programming is a consequence of
these abstractions, and if we can let go of them, the problem will be
fixable.
THREADS

In general-purpose software engineering practice, we have reached a
point where one approach to concurrent programming dominates all
others—namely, threads, sequential processes that share memory. They
represent a key concurrency model supported by modern computers,
programming languages, and operating systems. Many general-purpose
parallel architectures in use today—such as symmetric
multiprocessors—are direct hardware realizations of the thread
abstraction.

Some applications can use threads very effectively—for example,
so-called embarrassingly parallel applications that essentially spawn
multiple independent processes such as build tools (PVM gmake) or Web
servers. Given these applications' independence, programming is
relatively easy and the abstraction being used is more like processes
than threads. Where such applications do share data, they do so through
database abstractions, which manage concurrency through such mechanisms
as transactions. However, client-side applications are not so simple.

Threads are not the only possibility for concurrent programming. In
scientific computing, where performance requirements have long demanded
concurrent programming, data-parallel language extensions and
message-passing libraries—such as PVM, MPI, and OpenMP—dominate
over threads for concurrent programming. Computer architectures
intended for scientific computing often differ significantly from
so-called general-purpose architectures. They commonly support vectors
and streams in hardware, for example. However, even in this domain,
concurrent programs remain tedious to write. C and Fortran dominate,
despite a long history of much better data-parallel languages.


Nondeterminism and Threads

From a fundamental perspective, threads are seriously flawed as a
computation model. To wit,

Let N = {0, 1, 2, ...} represent the natural numbers and B = {0, 1} be
the set of binary digits. Let B* be the set of all finite sequences of
bits, and Bω = (N → B) be the set of all infinite sequences of bits,
each of which is a function that maps N into B. Further, let B** = B* ∪
Bω, which we will use to represent the state of a computing machine,
its potentially infinite inputs, and its potentially infinite outputs.
Finally, let Q denote the set of all partial functions with domain and
codomain B**. Partial functions are functions that may or may not be
defined on each element of their domain.

An imperative machine (A, c) is a finite set A ⊂ Q of atomic actions
and a control function c : B** → N. The set A represents the atomic
actions, typically instructions, of the machine; the function c
represents how instructions are sequenced. We assume that A contains
one halt instruction h ∈ A with the property that

 ∀ b ∈ B**, h(b) = b

That is, the halt instruction leaves the state unchanged.

A sequential program of length m ∈ N is a function

p : N → A, where

∀ n ≥ m, p(n) = h

That is, a sequential program is a finite sequence of instructions
tailed by an infinite sequence of halt instructions. The set of all
sequential programs, which we denote P, is a countably infinite set.

An execution of this program is a thread. It begins with an initial b0
∈ B**, which represents the initial state of the machine and the
potentially infinite input, and for all n ∈ N,

bn+1 = p(c(bn))(bn) (1)

Here, c(bn) provides the index into the program p for the next
instruction p(c(bn)). That instruction is applied to the state bn to
get the next state bn+1. If for any n ∈ N, c(bn) ≥ m, then p(c(bn)) =
h and the program halts in state bn, and the state henceforth never
changes. If a program p halts for all initial states b0 ∈ B**, then p
defines a total function in Q. If it halts for only some b0 ∈ B**,
then it defines a partial function in Q.
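The execution rule of Equation 1 can be made concrete. The sketch below (my illustration, not from the article; the state B** is collapsed to a single int and the control function c to a simple program counter, a drastic simplification) runs a three-instruction program tailed by halt instructions:

```java
import java.util.function.IntUnaryOperator;

public class ImperativeMachine {
    public static void main(String[] args) {
        // Atomic actions from A: each maps a state b to a new state.
        IntUnaryOperator incr = b -> b + 1;   // illustrative action
        IntUnaryOperator dbl  = b -> b * 2;   // illustrative action
        IntUnaryOperator halt = b -> b;       // h(b) = b leaves the state unchanged

        // A sequential program p of length m = 3, tailed by halt.
        IntUnaryOperator[] p = { incr, dbl, incr, halt };

        int b = 5;                            // initial state b0
        for (int n = 0; n < p.length; n++) {
            // b_{n+1} = p(c(b_n))(b_n), with c reduced to the counter n here
            b = p[n].applyAsInt(b);
        }
        System.out.println(b);                // (5 + 1) * 2 + 1 = 13
    }
}
```

Because both the actions and their sequencing are deterministic, every run from b0 = 5 produces the same final state.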


We now get to the core appeal that sequential programs have. Given a
program and an initial state, the sequence given by Equation 1 is
defined. If the sequence halts, then the function computed by the
program is defined. Any two programs p and p' can be compared and be
found equivalent if they compute the same partial function. That is,
they are equivalent if they halt for the same initial states, and for
such initial states their final state is the same. Such a theory of
equivalence is essential for any useful formalism. In this classical
theory, programs that do not halt are all equivalent; this creates
serious problems when applying the theory of computation to embedded
software, where useful programs do not halt.

We lose these essential and appealing properties of programs when
multiple threads are composed. Consider two multithreaded programs, p1
and p2, that execute concurrently. In this case, we replace Equation 1
with the following

bn+1 = pi(c(bn))(bn), where i ∈ {1, 2} (2)

At each step n, either program can provide the next atomic action.
Consider now whether we have a useful theory of equivalence. That is,
given a pair of multithreaded programs (p1, p2) and another pair (p'1,
p'2), when are these two pairs equivalent? A reasonable extension of
the basic theory defines them to be equivalent if all interleavings
halt for the same initial state and yield the same final state. The
enormous number of possible interleavings makes it extremely difficult
to reason about such equivalence except in trivial cases where, for
example, the state B** is partitioned so that the two programs are
unaffected by each other’s partition.
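The interleaving problem of Equation 2 is easy to exhibit. This sketch (my own illustration) enumerates every interleaving of two tiny "programs" acting on a shared variable x, where one program performs x = x + 1 as two atomic actions (read, then write) and the other doubles x in one action, and collects the distinct final states:

```java
import java.util.*;
import java.util.function.Consumer;

public class Interleavings {
    // Explore every interleaving of the remaining actions of p1 and p2,
    // recording the final value of state[0] (the shared variable x).
    static void run(List<Consumer<int[]>> p1, List<Consumer<int[]>> p2,
                    int[] state, Set<Integer> finals) {
        if (p1.isEmpty() && p2.isEmpty()) { finals.add(state[0]); return; }
        if (!p1.isEmpty()) {
            int[] s = state.clone();
            p1.get(0).accept(s);                       // p1 supplies the next action
            run(p1.subList(1, p1.size()), p2, s, finals);
        }
        if (!p2.isEmpty()) {
            int[] s = state.clone();
            p2.get(0).accept(s);                       // p2 supplies the next action
            run(p1, p2.subList(1, p2.size()), s, finals);
        }
    }

    public static void main(String[] args) {
        // state[0] = shared x, state[1] = p1's local temporary.
        List<Consumer<int[]>> p1 = List.of(
            s -> s[1] = s[0],        // temp = x
            s -> s[0] = s[1] + 1);   // x = temp + 1
        List<Consumer<int[]>> p2 = List.of(
            s -> s[0] = s[0] * 2);   // x = x * 2, atomically
        Set<Integer> finals = new TreeSet<>();
        run(p1, p2, new int[] {1, 0}, finals);
        System.out.println(finals);  // distinct final values of x from x = 1
    }
}
```

Three atomic actions already yield three interleavings and three different final states; real programs have astronomically more.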

Even worse, given two programs p and p' that are equivalent when
executed according to Equation 1, if they execute in a multithreaded
environment we can no longer conclude they are equivalent. We must know
about all other threads that might execute—something that may not
itself be well-defined—and we would have to analyze all possible
interleavings. We conclude that no useful theory of equivalence can be
applied to threads.

The core abstraction of computation given by Equation 1, on which all
widely used programming languages are built, emphasizes deterministic
composition of deterministic components. Both the actions and their
sequential composition are deterministic. Sequential execution is,
semantically, function composition—a neat, simple model where
deterministic components compose into deterministic results.

In distributed computing, threads are often not a practical abstraction
because creating the illusion of shared memory often costs too much.
Even so, we have gone to considerable lengths to create distributed
computing mechanisms that emulate multithreaded programming. CORBA and
.NET, for example, are rooted in distributed object-oriented
techniques, where software components interact with proxies that behave
as if they were local objects with shared memory. Object orientation's
data abstraction limits the extent to which the illusion of shared
memory must be preserved, so such techniques prove reasonably cost
effective. They make distributed programming look much like
multithreaded programming.

Yet this argument is not a resurrection of the old shared-memory versus
message-passing debate. Message passing can be made as nondeterministic
and difficult to understand as threads. Conversely, shared memory can
be used in deterministic and understandable ways—using data-parallel
languages, for example. The argument here is against the use of
nondeterministic mechanisms to achieve deterministic aims.

Embedded computing also exploits concurrency models other than threads.
Programmable DSP architectures are often VLIW machines. Video signal
processors often combine SIMD with VLIW and stream processing. Network
processors provide explicit hardware support for streaming data.
However, despite considerable innovative research, in practice,
programming models for these domains remain primitive. Designers write
low-level assembly code that exploits specific hardware features,
combining this code with C code only where performance is noncritical.

For many embedded applications, reliability and predictability are far
more important than expressiveness or performance. We can argue that
this should be true in general-purpose computing, but that's a side
argument. I contend that achieving reliability and predictability using
threads is essentially impossible for many applications.
THREADS AS COMPUTATION

From a fundamental perspective, threads are seriously flawed as a
computation model because they are wildly nondeterministic, as the
"Nondeterminism and Threads" sidebar describes. The programmer's job is
to prune away that nondeterminism. We have developed tools to assist in
the pruning: Semaphores, monitors, and more modern overlays on threads
offer the programmer ever more effective pruning. But pruning a wild
mass of brambles rarely yields a satisfactory hedge.

To offer another analogy, a folk definition of insanity is to do the
same thing over and over again and expect the results to be different.
By this definition, we in fact require that programmers of
multithreaded systems be insane. Were they sane, they could not
understand their programs.

Moreover, implementing a multithreaded computation model is difficult.
Witness, for example, the subtleties with the Java memory model, where
even astonishingly trivial programs produce considerable debate about
their possible behaviors.4

We must and can build concurrent computation models that are far more
deterministic, and we must judiciously and carefully introduce
nondeterminism where needed. Nondeterminism should be explicitly added
to programs, and only where needed, as it is in sequential programming.
Threads take the opposite approach. They make programs absurdly
nondeterministic and rely on programming style to constrain that
nondeterminism to achieve deterministic aims.
HOW BAD IS IT IN PRACTICE?

In practice, many programmers today write multithreaded programs that
work. This appears to be a contradiction, but programmers can employ
tools that prune away much of the nondeterminism.

Object-oriented programming, for example, limits the visibility that
certain portions of a program have into portions of the state. This
effectively partitions the state space into disjoint sections. Where
programs do operate on shared portions of this state space, semaphores,
mutual-exclusion locks, and monitors (objects with mutually exclusive
methods) provide mechanisms that programs can use to prune away more of
the nondeterminism. But in practice, these techniques yield
understandable programs only for very simple interactions.

public class ValueHolder {
    private List listeners = new LinkedList();
    private int value;

    public interface Listener {
        public void valueChanged(int newValue);
    }

    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    public void setValue(int newValue) {
        value = newValue;
        Iterator i = listeners.iterator();
        while (i.hasNext()) {
            ((Listener) i.next()).valueChanged(newValue);
        }
    }
}

Figure 1. A Java implementation of the observer pattern, valid for one
thread.

Consider the observer pattern,5 a very simple and widely used design
pattern. Figure 1 shows a Java implementation valid for a single
thread. This shows two methods from a class where an invocation of the
setValue() method triggers notification of the new value by calling the
valueChanged() method of any objects that have been registered by a
call to addListener().

The code in Figure 1 is not thread safe, however. That is, if multiple
threads can call setValue() or addListener(), the listeners list could
be modified while the iterator is iterating through the list,
triggering an exception that will likely terminate the program.
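The failure mode is easy to reproduce even without threads. In this stand-in (my illustration), the list is structurally modified while an iterator is live, raising the same ConcurrentModificationException that a racing addListener() call can trigger:

```java
import java.util.*;

public class ConcurrentModDemo {
    public static void main(String[] args) {
        List<String> listeners = new LinkedList<>(List.of("a", "b"));
        try {
            for (Iterator<String> i = listeners.iterator(); i.hasNext(); ) {
                i.next();
                // Structural modification while the iterator is live —
                // the single-threaded stand-in for a concurrent addListener().
                listeners.add("c");
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

The iterator detects the modification on its next access and fails fast; in an unlucky interleaving of real threads, the failure may not be detected at all.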

The simplest solution adds the Java keyword synchronized to each of the
setValue() and addListener() method definitions. The synchronized
keyword in Java implements mutual exclusion, turning instances of this
ValueHolder class into monitors and preventing any two threads from
being in these methods simultaneously. When the program calls a
synchronized method, the calling thread attempts to acquire an
exclusive lock on the object. If any other thread holds that lock, the
calling thread stalls until the lock releases.

However, this solution is unwise because it can lead to deadlock. In
particular, suppose we have an instance a of ValueHolder and an
instance b of another class that implements the Listener interface.
That other class can do anything in its valueChanged() method,
including acquiring a lock on another monitor. If it stalls in
acquiring that lock, it will continue to hold the lock on this
ValueHolder object while stalled. Meanwhile, whatever thread holds the
lock it is trying to acquire might call addListener() on a. Both
threads are now blocked with no hope of becoming unblocked. This sort
of potential deadlock lurks in many programs that use monitors.
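The lock-ordering cycle described above can be forced deterministically. In this sketch (my illustration; the latch guarantees each thread holds its first monitor before requesting its second), the JVM's own deadlock detector confirms the result:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    static final Object lockA = new Object();  // stands in for the ValueHolder monitor
    static final Object lockB = new Object();  // the listener's other monitor
    static final CountDownLatch bothHold = new CountDownLatch(2);

    static void grab(Object first, Object second) {
        new Thread(() -> {
            synchronized (first) {
                bothHold.countDown();
                try { bothHold.await(); }      // wait until both threads hold a lock
                catch (InterruptedException e) { return; }
                synchronized (second) { }      // never acquired: the other thread holds it
            }
        }).start();
    }

    public static void main(String[] args) throws Exception {
        grab(lockA, lockB);   // like setValue() calling out to a listener
        grab(lockB, lockA);   // like the other thread calling addListener() on a
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] dead = null;
        while (dead == null) {                 // poll until the JVM reports the cycle
            Thread.sleep(50);
            dead = mx.findDeadlockedThreads();
        }
        System.out.println("deadlocked threads: " + dead.length);
        System.exit(0);       // the two blocked threads will never finish on their own
    }
}
```

In real code the two acquisitions are separated by method boundaries and libraries, which is precisely why the cycle goes unnoticed.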

public class ValueHolder {
    private List listeners = new LinkedList();
    private int value;

    public interface Listener {
        public void valueChanged(int newValue);
    }

    public synchronized void addListener(Listener listener) {
        listeners.add(listener);
    }

    public void setValue(int newValue) {
        List copyOfListeners;
        synchronized (this) {
            value = newValue;
            copyOfListeners = new LinkedList(listeners);
        }
        Iterator i = copyOfListeners.iterator();
        while (i.hasNext()) {
            ((Listener) i.next()).valueChanged(newValue);
        }
    }
}

Figure 2. A Java implementation of the observer pattern that attempts
to be thread safe.

Already, this rather simple design pattern is proving difficult to
implement correctly. Consider the improved implementation shown in
Figure 2. While holding a lock, the setValue() method copies the
listeners list. Since the addListener() method is synchronized, this
avoids the concurrent modification exception that might occur with the
code in Figure 1. Further, it calls valueChanged() outside of the
synchronized block to avoid deadlock.

This code is still not correct, however. Suppose two threads call
setValue(). One will set the value last, leaving that value in the
object. But listeners might be notified of value changes in the
opposite order and will conclude that the final value of the
ValueHolder object is the wrong one.

This pattern can be made to work robustly in Java. Yet even this very
simple and commonly used design pattern has required some rather
intricate thinking about possible interleavings.
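One way to make the pattern robust (a sketch of one possible fix, not the article's solution): make the value update and the enqueueing of its notification a single atomic step, then deliver notifications from one worker thread, so listeners always observe updates in order and run without holding the holder's lock:

```java
import java.util.*;
import java.util.concurrent.*;

public class ValueHolder {
    public interface Listener { void valueChanged(int newValue); }

    private final List<Listener> listeners = new ArrayList<>();
    private int value;
    // A single-threaded executor delivers notifications one at a time,
    // in submission order, off the caller's thread.
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();

    public synchronized void addListener(Listener listener) {
        listeners.add(listener);
    }

    public synchronized void setValue(int newValue) {
        value = newValue;
        final List<Listener> copy = new ArrayList<>(listeners);
        // Submitting inside the synchronized block fixes the notification
        // order to match the order of updates; the listener itself runs
        // later, with no lock held, so it cannot deadlock on `this`.
        notifier.submit(() -> copy.forEach(l -> l.valueChanged(newValue)));
    }

    public void shutdown() throws InterruptedException {
        notifier.shutdown();
        notifier.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ValueHolder h = new ValueHolder();
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        h.addListener(seen::add);
        for (int v = 1; v <= 3; v++) h.setValue(v);
        h.shutdown();
        System.out.println(seen);  // notifications arrive in update order
    }
}
```

Note how much machinery this takes, and how many properties (ordering, no lock held during callbacks, lifecycle of the worker) must be argued informally.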

I speculate that most multithreaded programs have such bugs. I
speculate further that the bugs have not proved to be major handicaps
only because today's architectures and operating systems deliver modest
parallelism.

The cost of context switching is high, so only a tiny percentage of the
possible interleavings of thread instructions ever occur in practice. I
conjecture that most multithreaded general-purpose applications are so
full of concurrency bugs that—as multicore architectures become
commonplace—these bugs will begin to show up as system failures.

This paints a bleak scenario for computer vendors: Their
next-generation machines will become widely known as the ones on which
many programs crash.

These same computer vendors advocate more multithreaded programming to
provide the concurrency that can exploit the parallelism they would
like to sell us. Intel, for example, has embarked on an active campaign
to get leading computer science academic programs to put more emphasis
on multithreaded programming. If they succeed, and programmers make
more intensive use of multithreading, the next generation of computers
will become nearly unusable.
FIXING THREADS BY MORE AGGRESSIVE PRUNING

Several approaches to solving this concurrency problem share a common
feature. Specifically, they preserve the essential thread model of
computation for programmers, but provide them with more aggressive
mechanisms for pruning its enormously nondeterministic behavior.
Software engineering processes

Better software engineering processes provide the first technique.
While essential for reliable multithreaded programs, these processes
are not sufficient.

An anecdote from the Ptolemy Project is telling. In early 2000, my
group began developing the Ptolemy II kernel,6 a modeling environment
supporting concurrent computation models. An early objective was to
permit modification of concurrent programs via a graphical user
interface while those programs executed. The challenge involved
ensuring that no thread could ever see an inconsistent view of the
program structure. The strategy used Java threads with monitors
(http://ptolemy.eecs.berkeley.edu).

Part of the Ptolemy Project experiment sought to determine whether we
could develop effective software engineering practices for an academic
research setting. We developed a process that included a four-level
code maturity rating system (red, yellow, green, and blue), design
reviews, code reviews, nightly builds, regression tests, and automated
code coverage metrics. We wrote the kernel portion that ensured a
consistent view of the program structure in early 2000, design reviewed
to yellow, and code reviewed to green. The reviewers included
concurrency experts, not just inexperienced graduate students.

We wrote regression tests that achieved 100 percent code coverage. The
nightly build and regression tests ran on a two-processor SMP machine,
which exhibited different thread behavior than the development
machines, which all had a single processor.

The Ptolemy II system itself became widely used, and every use of the
system exercised this code. No problems were observed until the code
deadlocked in April 2004, four years later.

Our relatively rigorous software engineering practice had identified
and fixed many concurrency bugs. But that a problem as serious as a
deadlock could go undetected for four years despite this practice is
alarming. How many more such problems remain? How long must we test
before we can be sure to have discovered all such problems?
Regrettably, I must conclude that testing might never reveal all the
problems in nontrivial multithreaded code.

There are tantalizingly simple rules for avoiding deadlock: for
example, always acquire locks in the same order.8 However, this rule
is difficult to apply because no method signature in any widely
used programming language indicates what locks the method acquires. We
must examine the source code of all methods called—and all methods
that those methods call—to confidently invoke a method.

Even if we fix this language problem by making locks part of the method
signature, this rule makes it extremely difficult to implement
symmetric accesses, where interactions can originate from either end.
And no such fix gets around the extreme difficulty of reasoning about
mutual exclusion locks. If programmers cannot understand their code,
the code will not be reliable.

We might conclude that the problem lies in how Java realizes threads.
Perhaps the synchronized keyword is not the best pruning tool. Indeed,
version 5.0 of Java, introduced in 2005, added several other mechanisms
for synchronizing threads. These mechanisms do enrich the programmer's
toolkit for pruning nondeterminacy. But using mechanisms such as
semaphores still requires considerable sophistication, and the result
likely will still be incomprehensible programs with subtle lurking
bugs.
Design patterns

Software engineering process improvements alone will not do the job.
Another helpful approach uses vetted design patterns for concurrent
computation.8,9 Indeed, these are an enormous help when the problem
being solved matches one of the patterns.

However, this approach presents two difficulties. First, implementation
of the patterns, even with careful instructions, is still subtle and
tricky. Programmers will make errors, and there are no scalable
techniques for automatically checking compliance of implementations to
patterns. Second, combining the patterns can be difficult. Because
their properties are not typically composable, nontrivial programs that
require using more than one pattern are unlikely to be understandable.

Databases are an example of a common use of patterns in concurrent
computation, particularly with the notion of transactions. Transactions
support speculative unsynchronized computation on a copy of the data
followed by a commit or abort. A commit occurs when it can be shown
that no conflicts have occurred.
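In miniature (my illustration; a real transaction system is far richer), the speculate/commit/abort cycle is an optimistic retry loop around an atomic compare-and-set: compute on a snapshot without synchronization, commit only if the shared value is still the snapshot, otherwise abort and retry:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

public class TinyTransaction {
    // Speculatively compute f(snapshot); the commit succeeds only if the
    // shared value is unchanged, i.e. no conflicting write occurred.
    static void atomically(AtomicInteger shared, IntUnaryOperator f) {
        while (true) {
            int snapshot = shared.get();                         // read
            int result = f.applyAsInt(snapshot);                 // unsynchronized speculation
            if (shared.compareAndSet(snapshot, result)) return;  // commit
            // otherwise: abort and retry with a fresh snapshot
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < 1000; k++) atomically(counter, x -> x + 1);
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(counter.get());  // every increment eventually commits
    }
}
```

No deadlock is possible, but which thread's transaction aborts and retries is nondeterministic, as the article notes.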

Transactions can be supported on distributed hardware, as is common for
databases; in software on shared-memory machines; or, most
interestingly, in hardware on shared-memory machines. In the latter
case, the technique meshes well with the cache consistency protocols
required anyway on these machines.

Although transactions eliminate unintended deadlocks, despite recent
extensions for composability,10 they remain a highly nondeterministic
interaction mechanism. They are well-suited to intrinsically
nondeterminate situations, where for example multiple actors compete
nondeterministically for resources. But they are poorly suited for
building determinate concurrent interactions.

MapReduce11 is a particularly interesting use of patterns inspired by
the higher-order functions found in Lisp and other functional
languages. Google has used this pattern for large-scale distributed
processing of huge data sets. Whereas most patterns provide
fine-grained shared-data structures with synchronization, MapReduce
provides a framework for the construction of large distributed
programs. The pattern's parameters are pieces of functionality
represented as code rather than as pieces of data.
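The shape of the pattern (my illustration using Java's stream library, not Google's implementation) is visible in a word count: the user supplies only the map and reduce functions, and the framework owns partitioning and merging:

```java
import java.util.*;
import java.util.stream.*;

public class WordCount {
    public static void main(String[] args) {
        List<String> docs = List.of("the problem with threads",
                                    "threads are the problem");
        // Map phase: split each document into words.
        // Shuffle/reduce phase: group by word and count occurrences.
        // The parallel stream plays the role of the distributed framework.
        Map<String, Long> counts = docs.parallelStream()
            .flatMap(d -> Arrays.stream(d.split(" ")))
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        System.out.println(new TreeMap<>(counts));  // sorted for stable output
    }
}
```

Even though the underlying execution may use many threads in a nondeterministic order, the result is deterministic because the reduce function is associative and the framework controls all interaction.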

Experts can encapsulate patterns into libraries as with the concurrent
data structures in Java 5.0 and STAPL in C++. Although this greatly
improves the reliability of implementations, constraining all
concurrent interactions to occur via these libraries requires some
programmer discipline. Folding the capabilities of these libraries into
languages in which syntax and semantics enforce these constraints could
eventually lead to more easily constructed concurrent programs.

Higher-order patterns such as MapReduce offer some particularly
interesting challenges and opportunities for language designers. These
patterns function at the level of coordination languages rather than
more traditional programming languages. New coordination languages
compatible with established programming languages, such as Java and
C++, are more likely to gain acceptance than new programming languages
that replace established ones.

A common compromise extends established programming languages with
annotations or a few selected keywords to support concurrent
computation. This compromise admits the reuse of significant portions
of legacy code when concurrency is not an issue, but requires rewriting
to expose concurrency. For example, Split-C12 and Cilk13—both C-like
languages supporting multithreading—follow this strategy.

A related approach combines language extensions with constraints that
limit the expressiveness of established languages to get more
consistent and predictable behavior. For example, Guava14 constrains
Java so that it cannot access unsynchronized objects from multiple
threads. It further makes explicit the distinction between locks that
ensure the integrity of read data and locks that enable safe
modification of the data.

These language changes prune away considerable nondeterminacy without
sacrificing much performance, but they still have deadlock risk.
Other techniques

Promises, also called futures, provide another approach that puts more
emphasis on the avoidance of deadlock, as seen in the E programming
language (www.erights.org/). Here, instead of blocking access to shared
data, programs proceed with a proxy of the data they expect to get
eventually, using the proxy as if it were the data itself.
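Java's CompletableFuture (an analogous mechanism, not E's promises) shows the style: the program composes further work onto a proxy for the eventual value instead of blocking on it:

```java
import java.util.concurrent.CompletableFuture;

public class PromiseDemo {
    public static void main(String[] args) {
        // supplyAsync returns immediately with a proxy for the eventual value.
        CompletableFuture<Integer> promise =
            CompletableFuture.supplyAsync(() -> 6 * 7);
        // Rather than blocking, attach further computation to the proxy;
        // the pipeline runs when the value becomes available.
        CompletableFuture<String> message =
            promise.thenApply(v -> "answer: " + v);
        System.out.println(message.join());  // synchronize only at the very end
    }
}
```

Because no lock is ever held while waiting, the lock-cycle deadlocks of the monitor-based observer pattern cannot arise in this style.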

Yet another approach leaves the programming languages and mechanisms
for expressing concurrency unchanged, instead introducing formal
program analysis to identify potential concurrency bugs in
multithreaded programs. This is done, for example, in the Intel thread
checker and can help considerably by revealing program behaviors
difficult for a human to spot. Similarly, less formal techniques, such
as performance debuggers, can also help, making it easier for
programmers to sort through the vast nondeterminacy of program
behaviors.

Although promising, applying both the formal and informal techniques
still requires considerable expertise, and the techniques also suffer
from scalability limitations.

All these techniques prune away some of the threads' nondeterminacy.
However, they all still result in nondeterministic programs. For
applications with intrinsic nondeterminacy, such as servers, concurrent
database accesses, or competition for resources, this is appropriate.
But achieving deterministic aims through nondeterministic means remains
difficult.

Achieving deterministic concurrent computation requires approaching the
problem differently. Instead of starting with a highly nondeterministic
mechanism like threads and relying on the programmer to prune that
nondeterminacy, we should start with deterministic, composable
mechanisms and introduce nondeterminism only where needed.
ALTERNATIVES TO THREADS

Consider again the simple observer pattern shown in Figures 1 and 2,
which is not so easy to implement using threads.


Figure 3. Observer pattern implemented in a rendezvous-based
coordination language with a visual syntax. Two possible three-way
rendezvous interactions can occur, repeatedly and in nondeterministic
order.
Rendezvous director

Now, consider Figure 3, which shows the observer pattern implemented in
Ptolemy II's Rendezvous domain. The box at the upper left, labeled
Rendezvous director, provides an annotation specifying that this
diagram represents a CSP-like concurrent program, in which each
component, represented by an icon, is a process, and communication is
by rendezvous. The system specifies the processes themselves using
ordinary Java code, so this framework is properly viewed as a
coordination language, which happens to have a visual syntax.

Reo15 inspired the Ptolemy II implementation of this rendezvous domain,
which includes a Merge block that specifies a conditional rendezvous.
In the diagram, the block specifies that either of the two Value
Producers can rendezvous with both the Value Consumer and Observer.
That is, two possible three-way rendezvous interactions can occur,
repeatedly and in nondeterministic order.

Once the icons' meanings become clear, the diagram expresses the
observer pattern. Everything about the program is deterministic except
the explicitly nondeterministic interaction specified by the Merge
block. Were that block absent, the program would specify deterministic
interactions between deterministic processes. Deadlock is provably
absent—in this case, the lack of cycles in the diagram ensures no
deadlock. The multiway rendezvous ensures that the Value consumer and
Observer see new values in the same order. The observer pattern becomes
trivial, as it should be.
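The rendezvous discipline itself is easy to demonstrate in Java (a two-party sketch only, my illustration; the three-way conditional rendezvous of Figure 3 needs the Merge machinery of the coordination language): a SynchronousQueue has no buffer, so each put() blocks until a matching take() arrives:

```java
import java.util.concurrent.SynchronousQueue;

public class RendezvousDemo {
    public static void main(String[] args) throws Exception {
        // Zero capacity: every transfer is a two-party rendezvous in which
        // producer and consumer must meet before either proceeds.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int v = 1; v <= 3; v++) channel.put(v);  // blocks until taken
            } catch (InterruptedException e) { }
        });
        producer.start();

        for (int i = 0; i < 3; i++) {
            System.out.println("rendezvous: " + channel.take());
        }
        producer.join();
    }
}
```

The transfer order is fully determined by the channel discipline, not by the scheduler, which is the point of starting from a deterministic interaction mechanism.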
PN director

Now that the trivial programming problem is truly trivial, we can start
to consider interesting elaborations. We can replace the Rendezvous
director in Figure 3 with a "PN director" that realizes the Kahn
process networks (PN) model of concurrency.16 In this model, each icon
again represents a process, but instead of rendezvous-based
interactions, the processes communicate via message passing with
conceptually unbounded FIFO queues and blocking reads.

In the original PN model, the blocking reads ensure that every network
defines a deterministic computation. In this case, the Merge block
explicitly merges streams nondeterministically. Augmenting the PN model
with such explicit nondeterminism is common for embedded software
applications.17
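A two-process Kahn network can be sketched with blocking queues (my illustration; Ptolemy II's PN director additionally handles scheduling and bounded-memory execution, which this ignores). Each process reads its input channel with a blocking take(), so the output is identical on every run even though the threads' instruction interleaving is not:

```java
import java.util.concurrent.*;

public class KahnNetwork {
    public static void main(String[] args) throws Exception {
        // FIFO channels, conceptually unbounded; take() is the blocking read
        // that makes every network compute a deterministic function.
        BlockingQueue<Integer> c1 = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> c2 = new LinkedBlockingQueue<>();

        Thread source = new Thread(() -> {
            for (int v = 1; v <= 5; v++) c1.add(v);  // produce 1..5 on channel c1
        });
        Thread square = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    int v = c1.take();               // blocking read from c1
                    c2.add(v * v);                   // emit v^2 on channel c2
                }
            } catch (InterruptedException e) { }
        });
        source.start();
        square.start();

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 5; i++) out.append(c2.take()).append(' ');
        source.join();
        square.join();
        System.out.println(out.toString().trim());   // same on every run
    }
}
```

Nondeterminism would enter only where explicitly introduced, for example by a nondeterministic merge of two such channels.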

The PN implementation has all the cited advantages of the
implementation in Figure 3, with the added property that the Observer
need not keep up with the Value consumer. Notifications can be queued
for later processing. In a thread-based implementation, we are
unlikely ever to get to the point of asking such questions because the
programmer effort required to get any form of the observer pattern
correct is so excessive.
SR director

A third implementation would elaborate on the nature of the
nondeterminism that the nondeterministic merge represents. The
implementation could use the principles of synchronous languages18 to
ensure fairness. In Ptolemy II, the same model can be implemented with
an SR (synchronous/reactive) director, which implements a synchronous
model related to Esterel, SIGNAL, and Lustre. The last of these has
been used successfully to design highly concurrent, safety-critical
software for aircraft-control applications. Using threads for such
applications would not be wise.
DE director

A fourth implementation would focus on the timing of nondeterministic
events. In Ptolemy II, a similar model using the DE (discrete events)
director would provide a timed specification with rigorous semantics
related to that of hardware description languages such as VHDL and
Verilog and to network modeling frameworks such as Opnet Modeler.
Judicious nondeterminism

In all four cases—rendezvous, PN, SR, and DE—we started with an
interaction mechanism that was deterministic in how it performed the
computation, although in the first three cases it was not deterministic
in the sense of timing. These designs judiciously introduce
nondeterminism exactly where needed. This style of design differs from
the threads style, which starts with a wildly nondeterministic
interaction mechanism and attempts to prune away undesired
nondeterminism.

The implementation shown in Figure 3 and the PN version both use Java
threads. However, the programmer's model does not use threads. Compared
to all the techniques described in the previous sections, this is
closest to MapReduce, which has a similar flavor of streaming data
through computations. But unlike MapReduce, it receives support from a
rigorous coordination language sufficiently expressive to describe a
wide range of interactions. Four distinct coordination languages are
mentioned here, with rendezvous, PN, SR, and DE semantics.

This established style of concurrency, in which data flows through
components, has been called "actor-oriented."19 These architectures can
take many forms. Unix pipes resemble PN, although they are more limited
in that they do not support cyclic graphs. Message passing packages
like MPI and OpenMP include facilities for implementing rendezvous and
PN, but in a less structured context that emphasizes expressiveness
rather than determinacy. A naive user of such packages can easily
encounter unexpected nondeterminacy.
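The streaming, pipe-like style described above can be sketched with ordinary Java library primitives. This is an illustrative sketch in the spirit of Kahn process networks, not Ptolemy II code, and all names are mine: each stage reads from a bounded blocking channel and writes to the next, so the output stream is identical under every thread schedule.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A two-stage pipeline in the Kahn-process-network style: source -> square.
// Bounded channels with blocking reads and writes make the computed stream
// deterministic regardless of how the threads are scheduled.
public class PnPipeline {
    static String run() {
        BlockingQueue<Integer> c1 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> c2 = new ArrayBlockingQueue<>(4);
        Thread source = new Thread(() -> {
            try { for (int i = 1; i <= 5; i++) c1.put(i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread square = new Thread(() -> {
            try { for (int i = 0; i < 5; i++) { int v = c1.take(); c2.put(v * v); } }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        source.start();
        square.start();
        StringBuilder out = new StringBuilder();
        try {
            for (int i = 0; i < 5; i++) out.append(c2.take()).append(' ');
            source.join();
            square.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return out.toString().trim();
    }
    public static void main(String[] args) {
        System.out.println(run());  // always prints: 1 4 9 16 25
    }
}
```

Even though Java threads execute the stages, the programmer's model is the network of blocking channels, which is where the determinacy comes from.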

Languages such as Erlang make message-passing concurrency an integral
part of a general-purpose language. Languages such as Ada make
rendezvous an integral part. Functional languages and single-assignment
languages also emphasize deterministic computations, but they are less
explicitly concurrent, so controlling and exploiting concurrency can be
more challenging. Data-parallel languages also emphasize determinate
interactions, but require low-level rewrites of software.

All these approaches offer pieces of the solution. But it seems
unlikely that any one will become mainstream.

CHALLENGES AND OPPORTUNITIES

Threads continue to dominate the concurrent programming landscape
despite the existence of alternatives. Many obstacles prevent these
alternatives from taking root, probably the most important being that
the very notion of programming, and the core abstractions of
computation, are deeply rooted in the sequential paradigm to which most
widely used programming languages adhere. Syntactically, threads
provide either a minor extension to these languages, as in Java, or
just an external library. Semantically, they thoroughly disrupt the
languages' essential determinism.

Regrettably, programmers seem more governed by syntax than semantics.
The alternatives to threads that have taken root, like MPI and OpenMP,
share this same key feature. They make no syntactic change to
languages. Alternatives that replace these languages with entirely new
syntax, such as Erlang or Ada, have not taken root, and probably will
not. Even languages with minor syntactic modifications to established
languages, like Split-C or Cilk, remain esoteric.

The message is clear. We should not replace established languages. We
should instead build on them. However, building on them using only
libraries is unsatisfactory. Libraries offer little structure, no
pattern enforcement, and few composable properties.

Coordination languages

The right answer can be found in coordination languages, also called
composition languages, which introduce new syntax. That syntax,
however, serves purposes orthogonal to those of established programming
languages.

Whereas a general-purpose concurrent language like Erlang or Ada must
include syntax for mundane operations such as arithmetic expressions, a
coordination language need not specify anything more than coordination.
Given this, the syntax can be noticeably distinct.

The program shown in Figure 3 uses a visual syntax to specify
actor-oriented coordination. Although here a visual syntax serves only
pedagogical purposes, conceivably such visual syntaxes eventually will
be made scalable and effective, as certain parts of UML have been for
object-oriented programming. If not, we can easily envision scalable
textual syntaxes that specify the same structure.

Coordination languages themselves have been around for a long time.20
They too have failed to take root. One reason for this is that their
acceptance amounts to capitulation on one key front: homogeneity. A
prevailing undercurrent in programming languages research is that any
worthy programming language must be general purpose. It must be, at a
minimum, sufficiently expressive to express its own compiler. Adherents
to the language view as traitors any of their colleagues seduced by
another language.

A key development, however, has broken the ice. UML—properly viewed
as a family of languages, each with a visual syntax—is routinely
combined with C++ and Java. Programmers have begun accepting the use of
more than one language, especially when the disjoint languages provide
complementary features. The program in Figure 3 follows this spirit in
that it diagrammatically specifies a large-grained structure quite
orthogonal to fine-grained computation.

Concurrency models with stronger determinism than threads, such as Kahn
process networks, CSP, and dataflow, have also been available for some
time. Some have led to programming languages, such as Occam, and some
have led to domain-specific frameworks such as YAPI.17 Most, however,
have principally been used to build elaborate process calculi, and they
have not had much effect on mainstream programming. I believe this can
change if we use these concurrency models to define coordination
languages rather than replacement ones.

Coordination language design

Designing good coordination languages is full of pitfalls and no easier
than designing good general-purpose languages. For example, programmers
can easily be trapped by the false elegance of a spare set of
primitives. In general-purpose languages, we know that seven primitives
are sufficient, as in Turing machines, but no one builds a serious
programming language on these.

[Figure 4 image]

Figure 4. Two ways to accomplish deterministic interleaving using
rendezvous. The upper model uses nondeterministic mechanisms to
accomplish deterministic aims. In contrast, in the lower model,
well-chosen language primitives enable simple, direct, deterministic
expression of deterministic aims.

Figure 4 shows two implementations of a simple concurrent computation.
In the upper program, an adaptation of an example from Farhad Arbab's
work,15 the system deterministically interleaves successive outputs
from Data source 1 and 2, which appear in alternating order at the
Display block. This is a quite complex, even puzzling, way to provide
this rather simple functionality.

In contrast, Figure 4's lower program is easily understood. The
Commutator block performs a rendezvous with each of its inputs in
top-to-bottom order and thus accomplishes the same interleaving.
Judicious choice of language primitives enables simple, direct, and
deterministic expressions of deterministic aims. The upper model uses
nondeterministic mechanisms, albeit more expressive ones, to accomplish
deterministic aims, making it much more obtuse.
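The lower model's Commutator can be sketched with Java's SynchronousQueue standing in for a rendezvous channel (a put blocks until a matching take, and vice versa). This is illustrative only, not the Ptolemy II implementation, and all names are mine; the point is that reading the inputs in a fixed top-to-bottom order makes the interleaving deterministic.

```java
import java.util.concurrent.SynchronousQueue;

// Figure 4's lower model, roughly: two data sources rendezvous with a
// commutator that reads its inputs in fixed top-to-bottom order, so the
// merged stream appears at the display in a deterministic order.
public class Commutator {
    static String run() {
        SynchronousQueue<String> in1 = new SynchronousQueue<>();
        SynchronousQueue<String> in2 = new SynchronousQueue<>();
        Thread src1 = new Thread(() -> {
            try { for (int i = 0; i < 3; i++) in1.put("a" + i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread src2 = new Thread(() -> {
            try { for (int i = 0; i < 3; i++) in2.put("b" + i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        src1.start();
        src2.start();
        StringBuilder display = new StringBuilder();
        try {
            for (int i = 0; i < 3; i++) {  // fixed input order = determinism
                display.append(in1.take()).append(' ');
                display.append(in2.take()).append(' ');
            }
            src1.join();
            src2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return display.toString().trim();
    }
    public static void main(String[] args) {
        System.out.println(run());  // always prints: a0 b0 a1 b1 a2 b2
    }
}
```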

Coordination languages must develop scalability and modularity features
analogous to those in established languages. Ptolemy II, for example,
provides a sophisticated, modern type system at the
coordination-language level. Moreover, it offers a preliminary form of
inheritance and polymorphism adapted from object-oriented techniques.19
A huge opportunity exists in adapting the concept of higher-order
functions to coordination languages, which would enable constructs like
MapReduce at the coordination-language level.

A more challenging, long-term opportunity would adapt the theory of
computation to provide better foundations for concurrent computations.
Although researchers have made considerable progress in this direction,
much more must be done. For example, in addition to sequential
computation, modeled as a function

f : B** → B**

(see the "Nondeterminism and Threads" sidebar), a corresponding
concurrent model21 gives concurrent computation as a function

f : (T → B**) → (T → B**)

with T a partially or totally ordered set of tags, where the ordering
can represent time, causality, or more abstract dependency relations. A
computation viewed in this way maps an evolving bit pattern into an
evolving bit pattern. This basic formulation has been shown to be
adaptable to many concurrent computation models.
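A toy concrete reading of this formulation, under the assumption that tags are integer timestamps and values are strings (names are illustrative, not from the cited framework): a signal is a map from tags to values, and a component is a function from signals to signals.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// A signal as a function T -> B**: here T is integer timestamps and the
// "bit sequences" are strings. A component is then a function from
// signals to signals; this unit delay shifts every event by one tick,
// preserving the tag ordering.
public class TaggedSignal {
    static NavigableMap<Integer, String> delay(NavigableMap<Integer, String> s) {
        NavigableMap<Integer, String> out = new TreeMap<>();
        s.forEach((tag, value) -> out.put(tag + 1, value));  // shift each tag
        return out;
    }
    public static void main(String[] args) {
        NavigableMap<Integer, String> sig = new TreeMap<>();
        sig.put(0, "x");
        sig.put(5, "y");
        System.out.println(delay(sig));  // prints: {1=x, 6=y}
    }
}
```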

Achieving concurrency in software is difficult. However, much of this
difficulty arises from the abstractions for concurrency we have chosen.
Threads provide the dominant method in use today for general-purpose
computing. But nontrivial multithreaded programs are incomprehensible
to humans. Design patterns, better granularity of atomicity, improved
languages, and formal methods can improve the programming model. These
techniques, however, merely chip away at the unnecessarily enormous
nondeterminism of the threading model, which remains intrinsically
intractable.

If we expect concurrent programming to become mainstream, and if we
demand reliability and predictability from programs, we must discard
threads as a programming model. We can construct concurrent programming
models that are much more predictable and understandable than threads
based on a simple principle: Deterministic ends should be accomplished
with deterministic means. Nondeterminism should be judiciously and
carefully introduced where needed, and it should be explicit in
programs. This principle seems obvious, yet threads do not accomplish
it. They must be relegated to the engine room of computing, to be
suffered only by expert technology providers.

Acknowledgments

I acknowledge the thought-provoking comments and suggestions from Joe
Buck (Synopsys), Mike Burrows (Google), Stephen Edwards (Columbia), Jim
Larus (Microsoft), Sandeep Shukla (Virginia Tech), and Mark Miller.

References

1. H. Sutter and J. Larus, "Software and the Concurrency
Revolution," ACM Queue, vol. 3, no. 7, 2005, pp. 54-62.
2. M. Creeger, "Multicore CPUs for the Masses," ACM Queue, vol. 3,
no. 7, 2005, pp. 63-64.
3. U. Banerjee et al., "Automatic Program Parallelization," Proc.
IEEE, vol. 81, no. 2, 1993, pp. 211-243.
4. W. Pugh, "Fixing the Java Memory Model," Proc. Conf. Java Grande,
ACM Press, 1999, pp. 89-98.
5. E. Gamma et al., Design Patterns: Elements of Reusable
Object-Oriented Software, Addison-Wesley, 1994.
6. J. Eker et al., "Taming Heterogeneity—The Ptolemy Approach,"
Proc. IEEE, vol. 91, no. 2, 2003, pp. 127-144.
7. H.J. Reekie et al., Software Practice in the Ptolemy Project,
tech. report series GSRC-TR-1999-01, Gigascale Semiconductor Research
Center, Univ. of California, Berkeley, Apr. 1999.
8. D. Lea, Concurrent Programming in Java: Design Principles and
Patterns, Addison-Wesley, 1997.
9. D.C. Schmidt et al., Pattern-Oriented Software
Architecture—Patterns for Concurrent and Networked Objects, Wiley,
2000.
10. T. Harris et al., "Composable Memory Transactions," Proc. ACM
Symp. Principles and Practice of Parallel Programming (PPoPP), ACM
Press, 2005, pp. 48-60.
11. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing
on Large Clusters," Proc. 6th Symp. Operating System Design and
Implementation (OSDI), Usenix Assoc., 2004, pp. 137-150.
12. D.E. Culler et al., "Parallel Programming in Split-C," ACM/IEEE
Conf. Supercomputing, ACM Press, 1993, pp. 262-273.
13. R.D. Blumofe et al., "Cilk: An Efficient Multithreaded Runtime
System," Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel
Programming (PPoPP), ACM Press, 1995, pp. 207-216.
14. D.F. Bacon, R.E. Strom, and A. Tarafdar, "Guava: A Dialect of
Java without Data Races," ACM SIGPLAN Notices, vol. 35, 2000, pp.
382-400.
15. F. Arbab, "A Behavioral Model for Composition of Software
Components," L'Object, to appear 2006.
16. G. Kahn and D.B. MacQueen, "Coroutines and Networks of Parallel
Processes," Information Processing, B. Gilchrist, ed., North-Holland
Publishing, 1977, pp. 993-998.
17. E.A. de Kock et al., "YAPI: Application Modeling for Signal
Processing Systems," Proc. 37th Design Automation Conf. (DAC 00), ACM
Press, 2000, pp. 402-405.
18. A. Benveniste and G. Berry, "The Synchronous Approach to Reactive
and Real-Time Systems," Proc. IEEE, vol. 79, no. 9, 1991, pp. 1270-1282.
19. E.A. Lee and S. Neuendorffer, "Classes and Subclasses in
Actor-Oriented Design," Proc. ACM/IEEE Conf. Formal Methods and Models
for Codesign (MEMOCODE), 2004;
http://ptolemy.eecs.berkeley.edu/publications/papers/04/Classes/.
20. G. Papadopoulos and F. Arbab, "Coordination Models and
Languages," Advances in Computers—The Engineering of Large Systems,
vol. 46, M. Zelkowitz, ed., Academic Press, 1998, pp. 329-400.
21. E.A. Lee and A. Sangiovanni-Vincentelli, "A Framework for
Comparing Models of Computation," IEEE Trans. Computer-Aided Design of
Circuits and Systems, vol. 17, no. 12, 1998, pp. 1217-1229.


Edward A. Lee is a professor, chair of the Electrical Engineering
Division, and associate chair of the Electrical Engineering and
Computer Science Department at the University of California, Berkeley.
His research interests include embedded software, concurrency,
real-time software, signal processing, and communications. Lee received
a PhD in electrical engineering and computer sciences from the
University of California, Berkeley. He is a Fellow of the IEEE. Contact
him at e...@eecs.berkeley.edu.

David Schwartz

unread,
Aug 18, 2006, 4:32:59 AM8/18/06
to

I don't know what I can say other than that I disagree completely with
almost everything he said.

Fundamentally, concurrency should be the rule with serialization
enforced only where there are dependencies.

Having to find the concurrency is harder than having to find the
serialization. Suppose you have ten people that have totally different
tasks to do, but they only share two cars with drivers. Finding the
serialization is easy -- it's the cars and drivers. Finding the
concurrency is much harder -- it's every possible combination of
things that two people could do at once.
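The serialization in this scenario maps directly onto a counting semaphore (an illustrative Java sketch; all names are mine): the two cars are the only point of contention, and the ten tasks otherwise proceed concurrently.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Ten independent tasks, two shared cars: the Semaphore(2) is the only
// serialization point; everything else runs concurrently.
public class CarPool {
    static int runErrands() {
        Semaphore cars = new Semaphore(2);
        AtomicInteger done = new AtomicInteger();
        Thread[] people = new Thread[10];
        for (int i = 0; i < 10; i++) {
            people[i] = new Thread(() -> {
                try {
                    cars.acquire();          // wait for a free car
                    done.incrementAndGet();  // the errand itself
                    cars.release();          // return the car
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            people[i].start();
        }
        try { for (Thread p : people) p.join(); }
        catch (InterruptedException e) { throw new IllegalStateException(e); }
        return done.get();
    }
    public static void main(String[] args) {
        System.out.println(runErrands());  // prints: 10
    }
}
```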

DS

David Hopwood

unread,
Aug 18, 2006, 11:53:41 AM8/18/06
to

Finding the concurrency is easy: it's the ten people and the two
(car with driver)s, i.e. 12 processes (assuming that a car is permanently
associated with a driver). Concurrency is not the same thing as
parallelism.

It is a mistake to assume that an implementation using 12 processes
would incur the overhead of 12 heavyweight threads, or that an increased
number of processes per se complicates the system.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

John Hickin

unread,
Aug 18, 2006, 1:41:26 PM8/18/06
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1155889979.9...@i42g2000cwa.googlegroups.com...

>
> I don't know what I can say other than that I disagree completely with
> almost everything he said.
>
> Fundamentally, concurrency should be the rule with serialization
> enforced only where there are dependencies.

Finding the dependencies can be hard. Finding new ones can be even harder.

>
> Having to find the concurrency is harder than having to find the
> serialization. Suppose you have ten people that have totally different
> tasks to do, but they only share two cars with drivers. Finding the
> serialization is easy -- it's the cars and drivers. Finding the
> concurrency is much harder -- it's every possible combination of
> things that two people could do at once.
>

This example is quite good and it explains what I don't like with threads.
In the real world, 8 of the 10 people in your example will go on to do
something else. In a multi-threaded world, 8 threads are blocked. Unless, of
course, you use some sort of lock-free algorithm. Unless I've missed the
mark, lock-free is quite avant-garde, and not taught at the undergraduate
level. It may be time for a change there.

Anyway, the article was interesting to read. The author is an IEEE Fellow,
and you don't get that designation without earning it.


Regards, John.


Joe Seigh

unread,
Aug 18, 2006, 6:20:11 PM8/18/06
to
(snip)

I'm a little leery of some of the proposed cures for this so-called problem.
They tend to be problem domain specific and proponents seem to think that
because they solve all of their problems within that specific problem domain
then they're good for everybody no matter what the other problem domains are.

I think the main problem is there aren't as many thread savvy programmers out
there as some may think, and some aren't as thread savvy as they think. If
you're having a lot of problems with deadlock, it's *not* because you're a
thread expert and threading is really difficult. The definition of a thread
expert is one who knows how to avoid deadlock and other threading problems.

My impression is that there are a lot of programming organizations whose
product is thread sensitive (servers, databases, etc...) and which are
rather mediocre in the thread skills department. It's this attitude, that
you can get by with rather mediocre skills, that is causing damage to the
efficient exploitation of multi-threading.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

unread,
Aug 18, 2006, 6:34:04 PM8/18/06
to
"John Hickin" <hic...@nortelnetworks.com> wrote in message
news:ec4u46$hvv$1...@zcars129.ca.nortel.com...

>
> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:1155889979.9...@i42g2000cwa.googlegroups.com...
[...]

>
> This example is quite good and it explains what I don't like with threads.
> In the real world, 8 of the 10 people in your example will go on to do
> something else. In a multi-threaded world, 8 threads are blocked.

http://groups.google.com/group/comp.programming.threads/msg/a3e38e27df971e0d
( refer to the pseudo-code at the bottom of the msg )

Any thoughts?


> Unless, of
> course, you use some sort of lock-free algorithm. Unless I've missed the
> mark, lock-free is quite avant guard, and not taught at the undergraduate
> level. It may be time for a change there.

Totally agreed:

http://groups.google.com/group/comp.lang.c++.moderated/msg/d07b79e9633f3e52

FWIW:

I don't agree with the common assertion that lock-free programming is far
too complicated for a "normal" programmer to learn.

I also don't agree with the assertion that threads are vastly too difficult
and fragile for any programmer to use... The paper the OP linked to seems to
attempt to make this type of assertion.


>
> Anyway, the article was interesting to read. The author is an IEEE Fellow,
> and you don't get that designation without earning it.

Humm... Did you read the part where it basically says that programmers that
use threads are insane? He points to a "folk saying",


'repeatedly performing the same action and expecting different results
defines a form of insanity'
(paraphrase)


Then he says this, again paraphrasing

'if programmers that use threads were not insane, they simply would not be
able to understand any of their programs'


I wonder if he was being serious... Anyway....


Yikes! I guess that I am INSANE!!!!!!!!!
YeeeHaw.... Weeeee.....

;)


I find it amusing that the IEEE Fellow seems to not be able to create a
thread-safe observer pattern... I have several lock-free observer patterns
that can rely on my vZOOM or Joe Seighs RCU-SMR algorithms.

For some reason, I don't think that the author could create a highly
scaleable thread-safe, lock-based or lock-free, observer pattern
implementation if his life depended on it???

:O


Sean Kelly

unread,
Aug 18, 2006, 7:32:10 PM8/18/06
to
The author presents a somewhat skewed viewpoint in his attempt to make
a convincing argument, but I agree with the general gist of the
article--in essence, he is saying that traditional multithreaded
programming is difficult to get right, and its typically
nondeterministic nature makes verifying program correctness quite
difficult. He goes on to say that the historic solutions for these
problems (process algebras, mostly) have had limited success because
popular languages offer little to no "in language" support for them,
and suggests a language overlay (similar to UML) as a way to provide
such support.

I do think that traditional multithreaded programming using Hoare
Monitors and such operates at too low a level for day-to-day use, and
that simply abstracting the model through keywords such as Java's
'synchronized' is insufficient. I also agree that CSP and similar
algebras are quite appealing, and that it should be quite possible to
implement something fairly reasonable without requiring the target
language semantics to be modified--a library solution, for example.
Frankly, I'm surprised that this process isn't farther along. I've
seen a few Java projects for implementing CSP, and perhaps one C++
project as well, but they're largely academic in origin and receive
basically no public attention so far as I can tell. And these are
concepts that are over 25 years old.


Sean

kimo

unread,
Aug 18, 2006, 8:10:31 PM8/18/06
to
> "Chris Thomasson":

> I find it amusing that the IEEE Fellow seems to not be able to create a
> thread-safe observer pattern... I have several lock-free observer patterns
> that can rely on my vZOOM or Joe Seighs RCU-SMR algorithms.
>
> For some reason, I don't think that the author could create a highly
> scaleable thread-safe, lock-based or lock-free, observer pattern
> implementation if his life depended on it???

I have lots of respect for you practitioners on c.p.t. but I find one of
the points in the article persuasive - this issue of composability of
patterns - in that, even if one creates safe patterns - combining them
safely is a whole n'other ball of wax.

And I do appreciate his acknowledgment that for the solution to work it
must satisfy real-world constraints like preserving current code when
possible.

FYI: Here's the recent Harris paper on attempts to support
Composability with Software Transactional Memory

http://lambda-the-ultimate.org/node/463

Chris Thomasson

unread,
Aug 18, 2006, 9:07:59 PM8/18/06
to
"kimo" <kimocr...@gmail.com> wrote in message
news:1155946231....@i42g2000cwa.googlegroups.com...

>> "Chris Thomasson":
>> I find it amusing that the IEEE Fellow seems to not be able to create a
>> thread-safe observer pattern... I have several lock-free observer
>> patterns
>> that can rely on my vZOOM or Joe Seighs RCU-SMR algorithms.
>>
>> For some reason, I don't think that the author could create a highly
>> scaleable thread-safe, lock-based or lock-free, observer pattern
>> implementation if his life depended on it???
>
> I have lots of respect for you practioners on c.p.t. but I find one of
> the points in the article persuasive - this issue of composibility of
> patterns - in that, even if one creates safe patterns - combining them
> safely is a whole n'other ball of wax.

Humm... It depends on what patterns you are going to throw together... I can
create observer/command/event patterns out of various lock-free
reader/writer patterns... Humm...


I need to think some more on this...


> And I do appreciate his acknowledgment that for the solution to work it
> must apply real world contrants like preserving current code when
> possible.
>
> FYI: Here's the recent Harris paper on attempts to support
> Composibility with Software Transactional Memory
>
> http://lambda-the-ultimate.org/node/463

I don't really like TM all that much...

:O


Humm... Some brief thoughts on STM:


http://groups.google.com/group/comp.arch/msg/1b9e405080e93149


http://groups.google.com/group/comp.arch/msg/bbbc035cf1a8502c?hl=en


http://groups.google.com/group/comp.arch/msg/11b14c4bda2d5d82?hl=en


http://groups.google.com/group/comp.arch/msg/9b00fda2752966f9?hl=en


http://groups.google.com/group/comp.arch/msg/335aeb22fd6fe526?hl=en


http://groups.google.com/group/comp.arch/msg/1ace9400b1b16cd4


http://groups.google.com/group/comp.arch/msg/995379a16beb3b69
(simply excellent Wheeler post)


Some more VERY brief thoughts:


IMHO, if anybody actually looked under hood of the various STM
implementations that are out there, even in Concurrent Haskell, they would
quickly realize that they have major issues with forward progress, and are
just saturated with frequent atomic operations and multiple #StoreLoad
dependencies (e.g., expensive memory barriers)... IMO, anytime your code
depends on some form of explicit contention manager, you know you have
"problems"... STM heavily depends (e.g., totally relies) on complex
contention management schemes just to help them along during periods of
load...


IMO, STM basically sacrifices performance and forward progress for
so-called "ease-of-use" characteristics. For instance, go ahead and compare a
STM version of a single producer/consumer queue vs a lock-free version. The
lock-free version is 100% loopless and can avoid atomic operations and
StoreLoad memory barriers... The STM will have to use barriers to "announce"
the transaction, then use barriers and validation techniques to "commit" the
transaction... All of this is in a giant loop... The transaction might not
even commit, and will have to be retried! Yikes!


I guarantee you that the test results for the STM vs. Lock-Free version of
a single producer/consumer queue won't even be in the same dimension... This
holds true for many other lock-free algorithms...
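For reference, the kind of single-producer/single-consumer queue being contrasted with STM can be sketched as a ring buffer in which each index has exactly one writer, so no compare-and-swap retry loop is needed. This is an illustrative Java sketch (all names mine); note that Java volatile accesses still imply memory fences, so the "no StoreLoad barrier" point applies more cleanly at the hardware level than in this sketch.

```java
// Sketch of a bounded single-producer/single-consumer queue: only the
// producer writes `tail` and only the consumer writes `head`, so both
// sides make progress without any atomic read-modify-write loop.
public class SpscQueue {
    private final Object[] buf;
    private volatile long head = 0;  // written only by the consumer
    private volatile long tail = 0;  // written only by the producer

    SpscQueue(int capacity) { buf = new Object[capacity]; }

    boolean offer(Object x) {                         // producer side
        if (tail - head == buf.length) return false;  // queue full
        buf[(int) (tail % buf.length)] = x;
        tail = tail + 1;  // single writer: a plain volatile store suffices
        return true;
    }

    Object poll() {                                   // consumer side
        if (head == tail) return null;                // queue empty
        Object x = buf[(int) (head % buf.length)];
        head = head + 1;
        return x;
    }
}
```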


Also, STM has some issues wrt incorporating I/O into their transaction
logic.


I should do a write-up on STM... Maybe I will... It might be worthwhile...


;)


David Hopwood

unread,
Aug 18, 2006, 9:11:56 PM8/18/06
to
Sean Kelly wrote:
> [...] I've seen a few Java projects for implementing CSP, and perhaps

> one C++ project as well, but they're largely academic in origin and receive
> basically no public attention so far as I can tell.

Perhaps part of the problem is the widespread prejudice against projects
that are considered "academic". Greater acknowledgement that the vast majority
of good ideas in computer science have come from academia would be a start.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Dave Butenhof

unread,
Aug 18, 2006, 2:10:45 PM8/18/06
to
You know, I have a lot of respect for someone who will go out and spend
years of their time researching and experimenting to come up with new
models -- for concurrency, synchronization abstractions, languages, or
whatever.

But this seems to be just wasting everyone's time griping about the best
models developed by other researchers through the past 30 or 40 years.
Threads are widespread and popular because they work, and while they're
lower level and sometimes (but not always) more complicated than MPI or
OpenMP they flourish because they have distinct advantages -- largely in
that they're more general and flexible.

I will agree that creating an illusion of shared memory in distributed
computing is rather strange (I'll stop short of saying "dumb", although
that's the word that first came to mind); but that has nothing to do
with threads, which aren't designed, intended, or deployed for
distributed computing. (And while MPI can be deployed locally, the fact
that many people have spent a lot of time on a 2-level model employing
local shared memory threads and remote MPI message passing is far more
than just a "comfort level" thing -- there are significant performance
advantages here.)

Non-determinism is good for most purposes, because it's efficient. Even
"synchronous" I/O is asynchronous under the covers, because it's silly
for the entire OS to wait for the disk platter to rotate. To the extent
that useful non-determinism can be hidden "transparently" to the
"end-programmer", that's fine; but let's have ideas for doing that
rationally. The real world is NOT deterministic. Physics is; but the
application of physics is never perfectly reproducible even IN a lab
much less outside. Just as the physics of semiconductors and other
circuitry are (mostly, at a coarse enough grain) deterministic... even
though the behavior of the complete system is (like the real world)
effectively non-deterministic.

So a GOOD, NATURAL, USABLE programming model will support
non-determinism but provide tools to "determine" critical areas just as
we do naturally in the real world. Anyone who starts out assuming
otherwise is, so far as I'm concerned, not of much interest.

I don't need (or respect) another "I'm threadaphobic and I'm going to
treat you to all the justifications I can devise to share my pain". I'd
love to hear some creative alternative models that address actual needs.

Frank Cusack

unread,
Aug 18, 2006, 10:32:06 PM8/18/06
to
On 17 Aug 2006 20:21:29 -0700 "kimo" <kimocr...@gmail.com> wrote:
> See the link for tables that didn't make it into the text below.
>
>
> http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl&
>
>
>
> May 2006
>
> Cover Feature
> The Problem with Threads
> Edward A. Lee
> University of California, Berkeley
...

You forgot to paste in

This site and all contents (unless otherwise noted) are Copyright
©2006, IEEE, Inc. All rights reserved.

kimo

unread,
Aug 19, 2006, 12:24:44 AM8/19/06
to
Chris Thomasson wrote:

> I don't really like TM all that much...

Well, let's not diverge onto TM in this thread - my point is that
composability is a big issue: when you bring together multiple
threadsafe patterns, you suddenly introduce all sorts of issues - I
think his point here is valid.

David Schwartz

unread,
Aug 19, 2006, 12:31:51 AM8/19/06
to

John Hickin wrote:

> This example is quite good and it explains what I don't like with threads.
> In the real world, 8 of the 10 people in your example will go on to do
> something else. In a multi-threaded world, 8 threads are blocked. Unless, of
> course, you use some sort of lock-free algorithm. Unless I've missed the
> mark, lock-free is quite avant guard, and not taught at the undergraduate
> level. It may be time for a change there.

Lock-free is nutty. Locks are good because they deschedule contending
threads. Allowing threads to continue running as they contend is
pointless.

The 8 threads would only be blocked if they could not proceed without
access to a resource they needed, in which case they need to be
blocked. If there was any work that could be done without that
resource, there should be a thread trying to do it.

> Anyway, the article was interesting to read. The author is an IEEE Fellow,
> and you don't get that designation without earning it.

True.

Chris Thomasson

unread,
Aug 19, 2006, 12:53:41 AM8/19/06
to
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1155961911.1...@m73g2000cwd.googlegroups.com...

>
> John Hickin wrote:
>
>> This example is quite good and it explains what I don't like with
>> threads.
>> In the real world, 8 of the 10 people in your example will go on to do
>> something else. In a multi-threaded world, 8 threads are blocked. Unless,
>> of
>> course, you use some sort of lock-free algorithm. Unless I've missed the
>> mark, lock-free is quite avant guard, and not taught at the undergraduate
>> level. It may be time for a change there.
>
> Lock-free is nutty. Locks are good because they deschedule contending
> threads. Allowing threads to continue running as they contend is
> pointless.

Humm...


Barry Kelly

unread,
Aug 19, 2006, 1:19:52 AM8/19/06
to
David Schwartz wrote:

> I don't know what I can say other than that I disagree completely with
> almost everything he said.
>
> Fundamentally, concurrency should be the rule with serialization
> enforced only where there are dependencies.

It's funny, I didn't read him saying the reverse. Did I miss it?

-- Barry

--
http://barrkel.blogspot.com/

Barry Kelly

unread,
Aug 19, 2006, 1:41:05 AM8/19/06
to
Dave Butenhof wrote:

> Threads are widespread and popular because they work, and while they're
> lower level and sometimes (but not always) more complicated than MPI or
> OpenMP they flourish because they have distinct advantages -- largely in
> that they're more general and flexible.

I think they're widespread because, until now, the vast majority of our
computer systems have had one fast core, and thus the most performant
approach has actually been to have one thread where possible. I don't
think an explicit threading approach is going to scale to lots of slow
cores. I think it's useful, as a mental experiment, to imagine what
programming with (say) 1 million 4kHz cores would be like. What would a
programming language for that look like? What kind of primitives would
it have? What would it need in order to be as fast, for today's tasks,
as a single 4 GHz processor? How do you find enough work to keep all
those processes busy?

> Non-determinism is good for most purposes, because it's efficient.

Sure - but you need to abstract away the non-determinism before you
produce a result, otherwise the result itself won't be deterministic.

> So a GOOD, NATURAL, USABLE programming model will support
> non-determinism but provide tools to "determine" critical areas just as
> we do naturally in the real world. Anyone who starts out assuming
> otherwise is, so far as I'm concerned, not of much interest.

I didn't read him as saying non-determinism can be avoided. Once you
have genuine parallelism or distribution, you pretty much get
non-determinism for free - the challenge is to iron it out of your
results, and preferably out of the boundaries between your composable
abstractions. It's my opinion that most programmers will be more
productive with composable abstractions that have deterministic
contracts, because the difficulty of reasoning about non-determinism rises
exponentially (literally) with the number of relevant concurrent operations.
That mental model doesn't scale into the millions.

> I don't need (or respect) another "I'm threadaphobic and I'm going to
> treat you to all the justifications I can devise to share my pain". I'd
> love to hear some creative alternative models that address actual needs.

I'm interested in the Kahn process networks that he points to at the end
of the article. They fit in well with the 'futures' concept he also
mentions, also for example see Herb Sutter's "C++ Futures" talk. I can
imagine a language that superficially looks like Java/whatever, but
actually manipulates futures and dataflow pipes. What's more, I've
programmed systems in the past that would have benefited from this
approach - in fact, I see them everywhere, flows of data tying input to
output.

Joe Seigh

Aug 19, 2006, 8:48:18 AM
David Schwartz wrote:
> John Hickin wrote:
>
>
>>This example is quite good and it explains what I don't like with threads.
>>In the real world, 8 of the 10 people in your example will go on to do
>>something else. In a multi-threaded world, 8 threads are blocked. Unless, of
>>course, you use some sort of lock-free algorithm. Unless I've missed the
>>mark, lock-free is quite avant garde, and not taught at the undergraduate
>>level. It may be time for a change there.
>
>
> Lock-free is nutty. Locks are good because they deschedule contending
> threads. Allowing threads to continue running as they contend is
> pointless.

It would be if that's what lock-free algorithms actually did. Continuing
time after time to make the same invalid point is pointless.

Joe Seigh

Aug 19, 2006, 8:55:47 AM
Composability is a big issue if you are into functional programming.
It doesn't make a lot of sense outside of that area of application.
Composability is not a requirement. Some people think it's needed
because they haven't figured out how to deal with multithreading.
It's an admission of failure, not a proof of requirement.

Barry Kelly

Aug 19, 2006, 12:12:01 PM
Joe Seigh wrote:

> kimo wrote:
> > Chris Thomasson wrote:
> >
> >>I don't really like TM all that much...
> >
> > Well let's not diverge onto TM in this thread - my point is that
> > composability is a big issue when you bring together multiple
> > threadsafe patterns suddenly you introduce all sorts of issues - I
> > think his point here is valid.
> >
> Composability is a big issue if you are into functional programming.

I don't agree. Composability is an artifact of the components of any
successful complex system. If the complex system isn't componentized,
then it's too complex to understand as a whole - you need to use
functional programming, object orientation, procedural decomposition or
some other abstraction tool to break it apart into components. These
components, whatever they are, must be understandable and simpler in
and of themselves, so that they can be written in isolation and tested
etc., with well understood semantics. If they don't compose well, then
you haven't successfully reduced the complexity of the system through
abstractions - because your abstractions are leaking too much. This is
as true in concurrency as it is anywhere else in software.

> It doesn't make a lot of sense outside of that area of application.
> Composability is not a requirement. Some people think it's needed
> because they haven't figured out how to deal with multithreading.
> It's an admission of failure, not a proof of requirement.

I don't follow. It seems to me that saying that composability isn't
required, yet also agreeing that a large tranche (almost certainly the
vast majority) of software developers can't deal with low-level issues
like memory visibility and barriers, does not lead to the conclusion
that a market doesn't exist for composable abstractions for concurrency.
I simply don't see how your conclusion is supported by your
arguments.

It isn't that a proof of requirement is needed - there simply needs to
be a market incentive! There is no proof of requirement for assemblers
because people who haven't figured out how to deal with machine code are
failing, similarly for compilers - all those people who admit failure
with assembler.

The challenge is to make Joe Random Programmer productive with
concurrency. If Joe Random Programmer can't deal with 50,000
concurrently executing threads, that doesn't mean he's incompetent and
should now be out on the streets, looking for a new job - it simply
means that a better model for Joe Random Programmer hasn't been
invented.

David Hopwood

Aug 19, 2006, 12:27:12 PM
Joe Seigh wrote:
> kimo wrote:
>> Chris Thomasson wrote:
>>
>>> I don't really like TM all that much...
>>
>> Well let's not diverge onto TM in this thread - my point is that
>> composability is a big issue when you bring together multiple
>> threadsafe patterns suddenly you introduce all sorts of issues - I
>> think his point here is valid.
>>
> Composability is a big issue if you are into functional programming.
> It doesn't make a lot of sense outside of that area of application.
> Composability is not a requirement.

I'm flabbergasted. That's the most fundamentally wrong statement I've heard
this month.

Composability is *always* a requirement. How are you going to construct
programs of nontrivial size or complexity without composability? How are you
going to maintain or extend them?

--
David Hopwood <david.nosp...@blueyonder.co.uk>

David Hopwood

Aug 19, 2006, 12:37:56 PM
David Schwartz wrote:
> John Hickin wrote:
>
>>This example is quite good and it explains what I don't like with threads.
>>In the real world, 8 of the 10 people in your example will go on to do
>>something else. In a multi-threaded world, 8 threads are blocked. Unless, of
>>course, you use some sort of lock-free algorithm. Unless I've missed the
>>mark, lock-free is quite avant garde, and not taught at the undergraduate
>>level. It may be time for a change there.
>
> Lock-free is nutty. Locks are good because they deschedule contending
> threads. Allowing threads to continue running as they contend is
> pointless.

Mutual exclusion is good partly because it deschedules contending
threads. Locks have many disadvantages that are not shared by the
mechanisms for mutual exclusion used by other concurrency models, such
as message passing, or some transactional models.

Lock-free is nutty because it is far too complex to apply for what it
does; not because of performance issues.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Joe Seigh

Aug 19, 2006, 2:28:03 PM

I hate stealth terminology. Composability is a term from functional
programming. There's a little bit more to it than just putting things
together. I refuse to get co-opted in that manner.

David Hopwood

Aug 19, 2006, 7:01:24 PM

Your refusal to accept this usage is idiosyncratic. Try Googling for
"composability"; at least the first 80 links (I stopped checking after
that) are consistent with kimo's intended usage -- essentially as a
synonym for "compositionality". That includes links not specifically
about computing, e.g. uses in cryptography and physics. Few if any of
the links are specific to functional programming.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

David Schwartz

Aug 19, 2006, 9:59:11 PM

David Hopwood wrote:

> > Lock-free is nutty. Locks are good because they deschedule contending
> > threads. Allowing threads to continue running as they contend is
> > pointless.

> Mutual exclusion is good partly because it deschedules contending
> threads. Locks have many disadvantages that are not shared by the
> mechanisms for mutual exclusion used by other concurrency models, such
> as message passing, or some transactional models.

How do you pass messages without locks to protect the messages? (Or a
lock-free message passing algorithm.)

You can construct message passing schemes with locks, but you're still
using locks to synchronize/exclude.

> Lock-free is nutty because it is far too complex to apply for what it
> does; not because of performance issues.

I agree that it is nutty for that reason too.

DS

David Schwartz

Aug 19, 2006, 10:33:25 PM

Joe Seigh wrote:
> David Schwartz wrote:

> > Lock-free is nutty. Locks are good because they deschedule contending
> > threads. Allowing threads to continue running as they contend is
> > pointless.

> It would be if that's what lock-free algorithms actually did.

That is precisely what lock-free algorithms do. Consider, for example,
two threads accessing the same linked-list.

With a lock-based algorithm, only one thread would be scheduled. The
remaining thread would run full speed and other CPUs would run other
tasks.

With a lock-free algorithm, two threads could run on the same
collection at the same time. Neither would be descheduled and the two
threads would constantly ping-pong the linked-list across the FSB, each
running at a crawl.

> Continuing
> time after time to make the same invalid point is pointless.

Why is it invalid? Contention is bad. Locks minimize contention by
finding threads that do not contend to run in parallel. Lock-free
maximizes contention by making contending threads run slower.

DS

Chris Thomasson

Aug 19, 2006, 11:48:30 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1156041205.8...@m79g2000cwm.googlegroups.com...

My go^! We have been over and over this... I challenge you to design a pure
lock-based design that outperforms a lock-based design that has lock-free
techniques scattered throughout its infrastructure. I doubt that a 100%
lock-based design has the inherent ability to outperform a 100% lock-free
solution; but, forget that statement for a moment... Na... Been there done
that...

;)

A 100% lock-based solution would not be able to cope with the scalability
characteristics exhibited by an effective marriage of lock-based and
lock-free programming... Hence, my AppCore project, or my vZOOM project....

Okay... Lets go back to my old SenderX...


I f$ing dare you to design a 100% lock-based observer pattern that
outperforms any observer pattern design I throw at it. My designs happen to
blend lock-based and lock-free programming together in a perfect marriage;
divorce is highly unlikely in my scenario, simply because lock-free
programming gives you this kind of "god"-like control... When you design
your synchronization protocols from scratch (asm, or in Woz terms, blasting
a keyboard with hexadecimal from data stores in your brain -- time-share
assemblers cost too much $/time)... You seem to dismiss the performance
benefits of lock-free programming by deferring to the good ol' statement
about how lock-free algorithms are just nutty!!!

Any thoughts?


David Hopwood

Aug 20, 2006, 12:03:41 AM
David Schwartz wrote:
> David Hopwood wrote:
>
>>>Lock-free is nutty. Locks are good because they deschedule contending
>>>threads. Allowing threads to continue running as they contend is
>>>pointless.
>
>>Mutual exclusion is good partly because it deschedules contending
>>threads. Locks have many disadvantages that are not shared by the
>>mechanisms for mutual exclusion used by other concurrency models, such
>>as message passing, or some transactional models.
>
> How do you pass messages without locks to protect the messages? (Or a
> lock-free message passing algorithm.)
>
> You can construct message passing schemes with locks, but you're still
> using locks to synchronize/exclude.

You seem to be confusing implementation with semantics.

The semantics just specifies that each process acts on incoming messages
sequentially, i.e. one message at a time. This is the model that an
application programmer deals with, and it has no concept of locks. Depending
to some extent on the precise variant of message passing, this avoids most
of the problems associated with shared-state lock-based models -- e.g. lack
of composability, and difficulty of avoiding deadlocks and race conditions in
large programs -- *regardless* of how the message passing is implemented.

The implementation of a message passing system also does not require locks,
nor is there any particular reason to use them. For example, if we have n
scheduling entities (SEs) where each process is assigned to an SE, then we
can use n^2 single-reader single-writer FIFO queues between the SEs. On most
architectures, the memory model is strong enough that this requires no
explicit synchronization. Many other implementations are of course possible
(using locks or not).

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Chris Thomasson

Aug 20, 2006, 12:41:27 AM
"David Hopwood" <david.nosp...@blueyonder.co.uk> wrote in message
news:xaRFg.18235$fV1....@fe1.news.blueyonder.co.uk...

> David Schwartz wrote:
>> David Hopwood wrote:
>>
>>>>Lock-free is nutty. Locks are good because they deschedule contending
>>>>threads. Allowing threads to continue running as they contend is
>>>>pointless.
>>
>>>Mutual exclusion is good partly because it deschedules contending
>>>threads. Locks have many disadvantages that are not shared by the
>>>mechanisms for mutual exclusion used by other concurrency models, such
>>>as message passing, or some transactional models.
>>
>> How do you pass messages without locks to protect the messages? (Or a
>> lock-free message passing algorithm.)
>>
>> You can construct message passing schemes with locks, but you're still
>> using locks to synchronize/exclude.
>
> You seem to be confusing implementation with semantics.
>
[...]

>
> The implementation of a message passing system also does not require
> locks,
> nor is there any particular reason to use them. For example, if we have n
> scheduling entities (SEs) where each process is assigned to an SE, then we
> can use n^2 single-reader single-writer FIFO queues between the SEs. On
> most
> architectures, the memory model is strong enough that this requires no
> explicit synchronization. Many other implementations are of course
> possible
> (using locks or not).

http://groups.google.ca/group/comp.programming.threads/msg/a1f155896395ffb9
( refer to the last paragraph of my response)

Yes, highly efficient message passing between threads/processes can be
achieved with a single-producer/single-consumer lock-free FIFO:

http://appcore.home.comcast.net/

Agreed, David?


Chris Thomasson

Aug 20, 2006, 12:42:13 AM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:JeqdnaP1-vReeHrZ...@comcast.com...

Yikes, I meant:

Agreed, Davis S.

Sorry for any confusion!


Chris Thomasson

Aug 20, 2006, 12:49:50 AM
"David Hopwood" <david.nosp...@blueyonder.co.uk> wrote in message
news:xaRFg.18235$fV1....@fe1.news.blueyonder.co.uk...
> David Schwartz wrote:
>> David Hopwood wrote:
[...]

> The implementation of a message passing system also does not require
> locks,
> nor is there any particular reason to use them. For example, if we have n
> scheduling entities (SEs) where each process is assigned to an SE, then we
> can use n^2 single-reader single-writer FIFO queues between the SEs. On
> most
> architectures, the memory model is strong enough that this requires no
> explicit synchronization. Many other implementations are of course
> possible
> (using locks or not).

Also, check this simple scheme out:

http://groups.google.com/group/comp.lang.c++.moderated/msg/7ff4c4ba29d901a9

Multiple per-thread single producer/consumer w/ multiplex solution...


Humm... Interesting...


Chris Thomasson

Aug 20, 2006, 12:57:09 AM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:OdmdnaJuU7EhenrZ...@comcast.com...

My per-thread 100% lock-free distributed memory allocator is an *award
winning* design that was examined by experts, <marketing side, (can't help
it!)>, CoolThreads Contest:

;)

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69/e3efa5628aad4a82?lnk=gst&q=lock-free+slab+memory+allocator&rnum=2#e3efa5628aad4a82

It is a perfect fit for message passing, or for any highly parallel and/or
concurrent application... It will be included in my vZOOM package after the
second-round winner is announced. I may lose; however, at least I know that
this particular allocator design is still worth commercializing... Most
definitely... Indeed!

:)


Any thoughts?

David Hopwood

Aug 20, 2006, 1:23:50 AM
Chris Thomasson wrote:

> "David Hopwood" <david.nosp...@blueyonder.co.uk> wrote:
>
>> The implementation of a message passing system also does not require
>> locks, nor is there any particular reason to use them. For example, if
>> we have n scheduling entities (SEs) where each process is assigned to
>> an SE, then we can use n^2 single-reader single-writer FIFO queues
>> between the SEs. On most architectures, the memory model is strong
>> enough that this requires no explicit synchronization. Many other
>> implementations are of course possible (using locks or not).
>
> http://groups.google.ca/group/comp.programming.threads/msg/a1f155896395ffb9
> ( refer to the last paragraph of my response)

Right. Of course there is a bit more to it than this; for example load
balancing between SEs is important. It can also be useful to have a facility
for "urgent" messages with a higher priority than other messages, or efficient
support for "selective receive" (choosing the next message matching a pattern),
as used in Erlang.

A nice property of this approach is that it is easy to implement strong
message ordering guarantees, such as causal ordering or E-ORDER
(Chapter 19 of http://www.cypherpunks.to/erights/talks/thesis/).

--
David Hopwood <david.nosp...@blueyonder.co.uk>

David Schwartz

Aug 20, 2006, 7:07:50 PM

David Hopwood wrote:

> You seem to be confusing implementation with semantics.

If you want to argue that synchronization structures above primitives
that provide lock-free semantics are often preferred over ones that
don't, I won't argue with you. But at that point I don't think you're
saying much.

For example, I have a network layer that provides send and receive
queues between the network server or client code and the network I/O
code. I wouldn't call that lock-free or lockless.

Even notionally, when one thread is adding a message to the queue,
another thread checking the queue will either get that message or not
get that message, depending upon which acquired a "notional lock"
first.

If you can say that algorithms that are implemented with locks are
semantically lock free, why can't I say that those algorithms can just
as easily be understood semantically as having locks?

Would I be wrong to describe a message passing scheme as *semantically*
deciding whether one thread gets a message if another thread passes it
at the same time as depending upon which one got there first? And is
"there" somehow semantically different if it's a lock or not?

I guess it depends what level you're looking at. You can always find a
low-enough level where there is a lock of some kind (even if it's 'LOCK
INC' on a P4) and a high-enough level where there isn't.

DS

Chris Thomasson

Aug 20, 2006, 8:14:28 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:gIidnSaAB4fGRHrZ...@comcast.com...

Ouch! Sorry for sounding like a fu%king jerk David S.; I let the old SenderX
out of his box and chains, not always a wise choice!

:O


However, the challenge still stands...

I think that you would have kind of a hard time designing a 100% lock-based
design that can outperform a blend of locks and lock-free techniques... Lets
focus of scalability and throughput...


David Schwartz

Aug 21, 2006, 3:52:13 AM

Chris Thomasson wrote:

> I think that you would have kind of a hard time designing a 100% lock-based
> design that can outperform a blend of locks and lock-free techniques... Lets
> focus of scalability and throughput...

Let me be clear, I am not 100% opposed to the use of lock-free
algorithms. I am, however, 100% opposed to cheerleading for lock-free
algorithms without pointing out their weaknesses.

A lock-based algorithm will pretty much always perform well so long as there
is enough work that can be done by threads that do not happen to
contend. So lock-free algorithms work best in situations where this is
not likely to be the case, for example inside the kernel and inside the
threading library. However, they work badly in situations where this is
likely to be the case, which is most application-level structures.

The problem is that this doesn't show on the benchmarks if your
measurement is work done by the algorithm per unit time. It shows up
quite well if your benchmark is work done by the system per unit time.

The reality is that if two threads are both scheduled at the same time
and accessing the same collection, there will be a performance penalty.
The threads will drop to FSB speed and other CPUs will also be
penalized as they contend for the CPU.

The fundamental error in the assumptions made by the cheerleaders for
lock-free is that they forget what it is we are trying to avoid --
contention. Lock-free algorithms are contention maximizing, that is,
they work to convince the system to schedule threads that contend.

DS

Chris Thomasson

Aug 21, 2006, 4:22:50 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1156041205.8...@m79g2000cwm.googlegroups.com...
>
> Joe Seigh wrote:
>> David Schwartz wrote:
>
>> > Lock-free is nutty. Locks are good because they deschedule contending
>> > threads. Allowing threads to continue running as they contend is
>> > pointless.
>
>> It would be if that's what lock-free algorithms actually did.
>
> That is precisely what lock-free algorithms do. Consider, for example,
> two threads accessing the same linked-list.
>
> With a lock-based algorithm, only one thread would be scheduled. The
> remaining thread would run full speed and other CPUs would run other
> tasks.
>
> With a lock-free algorithm, two threads could run on the same
> collection at the same time. Neither would be descheduled and the two
> threads would constantly ping-pong the linked-list across the FSB, each
> running at a crawl.

Remember to keep in mind that the internal state that makes up a mutex can
be thrashed across the FSB as well:

http://groups.google.com/group/comp.programming.threads/msg/7c15b3488642d1a6
(please read whole* msg)


*

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f80fb2901b42e447/7c15b3488642d1a6?#7c15b3488642d1a6
(If you have some time to kill, I would recommend reading through this
thread... Especially the 'Lock-Free Not Useful?' parts)

Enjoy!

;)


Joe Seigh

Aug 21, 2006, 6:15:40 AM

You're overlooking the fact that both Chris and myself have been promoting
read lock-free as a solution to the reader/writer problem where it
clearly reduces contention or avoids it altogether.

If we haven't mentioned that lock-free doesn't apply to everything, I should
point out that threading in general doesn't apply to everything either, and
nobody here qualifies every single post with that. And there are people out
there who dislike multi-threading intensely and have hissy fits every time
threading is mentioned. Nobody here takes them too seriously.

Dave Butenhof

Aug 21, 2006, 8:16:05 AM
Barry Kelly wrote:
> Dave Butenhof wrote:
>
>> Threads are widespread and popular because they work, and while they're
>> lower level and sometimes (but not always) more complicated than MPI or
>> OpenMP they flourish because they have distinct advantages -- largely in
>> that they're more general and flexible.
>
> I think they're widespread because, until now, the vast majority of our
> computer systems have had one fast core, and thus the most performant
> approach has actually been to have one thread where possible.

Thread programming has never been focused on "one core", fast or
otherwise. So, sure, if your exclusive target is single core computer
systems, like traditional single core/single processor PCs, then threads
offer nothing more than a decent programming abstraction for concurrent
I/O management. While it's not the BEST abstraction for that purpose,
it's effective and general, and widely understood. Sometimes that's
actually more important than pure "performance". Parallelizing
computation on a single core system is wasteful; but THREADING the
application that performs the computation (and doing it well) means that
it can be faster when you DO have more than one core. And the converse
of your argument is of course that now even many mid-range consumer PCs
DO have multiple cores.

> I don't
> think an explicit threading approach is going to scale to lots of slow
> cores. I think it's useful, as a mental experiment, to imagine what
> programming with (say) 1 million 4kHz cores would be like. What would a
> programming language for that look like? What kind of primitives would
> it have? What would it need in order to be as fast, for today's tasks,
> as a single 4 GHz processor? How do you find enough work to keep all
> those processes busy?

A problem, indeed; and one that far transcends threads in the abstract
and concrete sense. How DOES one keep them busy? How DO they communicate
or cooperate? Intriguing questions, to be sure, and entirely irrelevant
to the discussion here. A mega-node network is complicated; and any form
of temporal determinism is both difficult and extremely expensive. How
do you even manage a coherent stream of time across all of them to
reliably, always, know which events come first? (Or is determinism not
THAT important?)

Nobody so far has done a shared memory multiprocessor on anywhere near
that scale, and one hits the serious obstacles at far lower levels than
"threads", or even an OS. Without shared memory, "threads" are
irrelevant in the sense we use the term, because the whole point is
independent cooperative execution streams within the same address space.

Very possibly an entirely different, and likely entirely NEW model will
prove best. That doesn't mean such a model will be better for the
systems we program and design NOW. Just as you point out that "single"
(or non-) threaded compute-bound programming will generally win on a
single core system, anything that will beat threads in performance and
"programmability" for mega-core systems may well fall flat on its
metaphorical face for systems of mere tens of cores. Besides... people
cling to familiar models that are usable (however barely) rather than
flocking to better models.

>> Non-determinism is good for most purposes, because it's efficient.
>
> Sure - but you need to abstract away the non-determinism before you
> produce a result, otherwise the result itself won't be deterministic.

SOMETIMES, determinism is critical. Most of the time it's not. If I want
to "find" all files in a filesystem with some characteristics, or grep
for matching text, there are certainly times when it's critical that the
output is in some pre-defined order. But most of the time I only want a
complete list, and the faster the better. The same goes for prime
factors, or the likely location of oil deposits, tornados, etc.

"Determinism" is a very relative term. Over-determinism is often bad (or
at least unjustifiably costly); and applying the "wrong" determinism can
even make you miss the answer you "should have wanted".

Deterministic software constructs will always be built on top of
non-determinism... at best, ultimately he's only suggesting that the
management of non-determinism move down a level in programming. He seems
to even admit this at the end: "[Threads] must be relegated to the
engine room of computing, to be suffered only by expert technology
providers." The engine room, after all, is where most of the "real work"
takes place. It's nice for the "control room" up above to require no
more than a wheel and a lever labeled "forward, reverse, and stop"; but
that interface has little impact on the operation of the engine room below.

> I'm interested in the Kahn process networks that he points to at the end
> of the article. They fit in well with the 'futures' concept he also
> mentions, also for example see Herb Sutter's "C++ Futures" talk. I can
> imagine a language that superficially looks like Java/whatever, but
> actually manipulates futures and dataflow pipes. What's more, I've
> programmed systems in the past that would have benefited from this
> approach - in fact, I see them everywhere, flows of data tying input to
> output.

Certainly all useful areas for research, and if this encourages people
to learn more about the ideas he mentions, there's perhaps at least some
small redeeming value. But claiming "For concurrent programming to
become mainstream, we must discard threads as a programming model" is
just stupid. Virtually every application running on Windows or Mac OS X
is multithreaded, and probably most on Linux; even if the application
itself doesn't create any... and there are enormous practical benefits.
Most of them actually work. Concurrent programming IS mainstream, and
has been for quite a while; and most of that concurrency is expressed
(correctly and reliably) using threads. Of course there are bugs, and
programmers who don't know enough to apply the essential concepts
correctly. That's always been true, for any technology, and always will
be true. Deterministic broken programs are no less broken than
non-deterministic broken programs. And in complicated systems
deterministic bugs aren't necessarily even much easier to find.

Very likely there are better concurrency programming models. Certainly
neither C, C++, nor Java are the theoretical ultimate languages to
express algorithms -- but to start out complaining that they're
irrelevant and inherently wrong isn't going to win you many friends. Is
the guy just trying to piss people off and get some free attention?
Perhaps. Maybe he'll even use any notoriety he gains for something more
constructive, though I'm not sure that could justify this little polemic.

Oh, yeah, and then there's his idiotic and juvenile misapplication of
the classic "insanity" joke... guaranteed to win friends and influence
people.

David Schwartz

Aug 21, 2006, 11:58:17 AM

Chris Thomasson wrote:

> Remember to keep in mind that the internal state that makes up a mutex can
> be thrashed across the FSB as well:

Only if the implementation stupidly tries to make mutexes too fair.
Otherwise, all but one of the contending threads will rapidly be
descheduled, allowing only threads that do not contend to run in
parallel.

DS

Chris Thomasson
Aug 21, 2006, 5:34:16 PM
"Barry Kelly" <barry....@gmail.com> wrote in message
news:ds7de2phe5atbd2ok...@4ax.com...

> Dave Butenhof wrote:
>
>> Threads are widespread and popular because they work, and while they're
>> lower level and sometimes (but not always) more complicated than MPI or
>> OpenMP they flourish because they have distinct advantages -- largely in
>> that they're more general and flexible.
>
> I think they're widespread because, until now, the vast majority of our
> computer systems have had one fast core, and thus the most performant
> approach has actually been to have one thread where possible. I don't
> think an explicit threading approach is going to scale to lots of slow
> cores. I think it's useful, as a mental experiment, to imagine what
> programming with (say) 1 million 4kHz cores would be like. What would a
> programming language for that look like?

Well, for me, the programming language will hopefully look exactly like C
and assembly.

:)


I would treat a million cores as a super computer on a single chip. I would
"try" to avoid things that super computers do not like to execute (e.g.,
atomic-ops/membars). RCU was designed for this sort of thing. I have no
problem with a million+ cores.

Heck, I would LOVE to see vZOOM running on that processor. Neat! It already
has the ability to scale to that many processors, no problem. I am not
intimidated...


> What kind of primitives would
> it have? What would it need in order to be as fast, for today's tasks,
> as a single 4 GHz processor? How do you find enough work to keep all
> those processes busy?

Well, it totally depends on an application's workload... If an application
simply can't generate enough load fast enough to justify using ALL of the
cores on the system, so be it... No problem...


Herb's futures thingy is basically a thread-pool w/ FIFO queue, very
straightforward and simple; IMHO, it's almost reinventing the wheel? Can an
implementation of futures scale to one million cores? Most likely it
depends on the actual implementation of the thread-pool and its queues...
Will an implementation of futures for a 1,000,000+ core processor use pools
of threads behind the scenes? IMO, 100% for sure...

So, what's the problem with threads again?

;)


BTW and FWIW, here is my pseudo-code implementation of a highly distributed
allocator that has the ability to scale up to 1 million cores:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69/e3efa5628aad4a82?lnk=gst&q=lock-free+slab+memory+allocator&rnum=2#e3efa5628aad4a82

We can code scalable applications with the programming languages that exist
today... If you're really interested in getting your application to scale to
a million+ cores, I would study up on lock-free distributed algorithms...


Chris Thomasson
Aug 21, 2006, 5:46:05 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:5YGdncTFvv4CuXfZ...@comcast.com...

> "Barry Kelly" <barry....@gmail.com> wrote in message
> news:ds7de2phe5atbd2ok...@4ax.com...
>> Dave Butenhof wrote:
[...]

Here is another example of a highly distributed thread-to-thread
communication scheme that can definitely scale to 1,000,000+ cores:

http://groups.google.ca/group/comp.programming.threads/msg/a1f155896395ffb9

http://groups.google.com/group/comp.programming.threads/msg/e8eede3c6121881f?hl=en

http://groups.google.com/group/comp.programming.threads/msg/ae88b25edcf6cf7f?hl=en

I should do a complete implementation of this... I have done a similar
implementation in the past, but I think that this deserves a rewrite; my
skills have drastically increased since way back then... At least, I HOPE
they have!

;)


Humm... I already have the necessary queuing and eventcount functionality
here:

http://appcore.home.comcast.net/

I think that all I would need to do is build an extremely robust, and
highly distributed, thread-pool... Then I can mess around with lots of "fun"
stuff... I could do a complete implementation of Herb's futures with this
scheme... Humm...


Would anybody be interested in such a thing?


David Schwartz
Aug 21, 2006, 8:03:56 PM

Joe Seigh wrote:

> You're overlooking the fact that both Chris and myself have been promoting
> read lock-free as a solution to the reader/writer problem where it
> clearly reduces contention or avoids it altogether.

I agree about that. On hardware where memory synchronization is cheaper
than a synchronized operation, lock-free readers are a win. The more
frequent the readers are in comparison to the writers, the bigger the win.

> If we haven't mentioned that lock-free doesn't apply to everything, I should point
> out that threading in general doesn't apply to everything and that everyone
> here doesn't qualify every single post with that. And there are people out there
> who dislike multi-threading intensely and have hissy fits every time threading is
> mentioned. Nobody here takes them too seriously.

Fair enough. The problem is, I see a lot of harm done by people who
hear the lock-free cheerleaders and write really bad code. It is very
important to understand that lock-free algorithms can perform
wonderfully in a test harness and disastrously in a real-world
application.

For example, I once saw a program that used a lock-free collection in
the case that is the very worst possible for lock-free. It was almost
textbook.

Imagine you have two collections, 1 and 2. Imagine you have two threads
that want to access collection 1, A1 and B1, and two threads that want
to access collection 2, A2 and B2. Each thread accesses the collection
a lot.

With two CPUs, a lock-based collection will quickly schedule A1 with A2
or B2, or B1 with A2 or B2. It will not schedule A1 with B1 or A2 with
B2, because the lock will deschedule the contending thread and allow two
threads that do not contend to run almost immediately.

With a lock-free collection, it is just as likely that A1 and B1 will
run at the same time, blasting the shared collection information across
the FSB. If there were another CPU, say one not allocated to this
process, it would crawl as it fought the two slow threads for the FSB.

However, in a benchmark with just two threads accessing the same
collection, the lock-free collection may look much better. (The CPU
utilization will be higher, and that's good, right? More work per unit
time will get done. Never mind that the CPUs are being used
inefficiently.)

DS

Barry Kelly
Aug 21, 2006, 8:51:48 PM
Chris Thomasson wrote:

> "Barry Kelly" <barry....@gmail.com> wrote in message
> news:ds7de2phe5atbd2ok...@4ax.com...
> > Dave Butenhof wrote:
> >
> >> Threads are widespread and popular because they work, and while they're
> >> lower level and sometimes (but not always) more complicated than MPI or
> >> OpenMP they flourish because they have distinct advantages -- largely in
> >> that they're more general and flexible.
> >
> > I think they're widespread because, until now, the vast majority of our
> > computer systems have had one fast core, and thus the most performant
> > approach has actually been to have one thread where possible. I don't
> > think an explicit threading approach is going to scale to lots of slow
> > cores. I think it's useful, as a mental experiment, to imagine what
> > programming with (say) 1 million 4kHz cores would be like. What would a
> > programming language for that look like?
>
> Well, for me, the programming language will hopefully look exactly like C
> and assembly.
>
> :)

Of course you would :) I'm thinking that shared memory starts becoming
the bottleneck - not algorithmically, but bus-wise. It seems to me, as
an interested observer looking on, that there's a trend towards NUMA to
get the architecture to scale. That means that traditional C code, for
which a thread of execution isn't particularly closely tied to a set of
data, wouldn't necessarily map well to the primitives of the
architecture (i.e. cores with local memory, at that stage).

I think, looking back over the history of C, that its strongest
feature is the close map of the language primitives to the machine. Not
many other languages of its day had primitives like '+=' and '++' (not
to mention differentiating between pre-increment and post-increment),
yet one can write a relatively simple C compiler that can generate
relatively optimized code, simply because the programmer can use
machine-language-like constructs. Abstractions that aren't native to the
machine, such as odd calling conventions like call-by-name etc., or
closures which generally require fairly complex transformations turning
locals into heap-allocated variables etc., simply aren't included.

If the machine starts looking like something else, a non-von-Neumann
architecture, what happens then? Is C really the best choice?

> > What kind of primitives would
> > it have? What would it need in order to be as fast, for today's tasks,
> > as a single 4 GHz processor? How do you find enough work to keep all
> > those processes busy?
>
> Well, it totally depends on an application's workload... If an application
> simply can't generate enough load fast enough to justify using ALL of the
> cores on the system, so be it... No problem...

I've got ideas about how it could be done, but it's all speculation
about how hardware develops as well as software. The way I see it,
hardware (i.e. CPUs) and software are tied together trying to run a
three-legged race - neither can get too far ahead or behind. From what I
understand of the hardware, shared memory gets more expensive with more
cores. Thus, it seems to me, we've got to look at how to structure
software such that it doesn't require shared memory. IOW, consider
message passing (i.e. other forms of OO), dataflow / functional
approaches etc.

I'm not anti-thread in any way for today and tomorrow's software, since
that's the primitive we have and we can make it work right now with care
and dedication.

Basically, I think the current C-based thread primitive will become
increasingly unwieldy in the decades to come because, slowly but surely,
it will no longer map well to the underlying architecture - simply
because it only ties *code* to a processor, and doesn't encapsulate
*memory*.

> Herb's futures thingy is basically a thread-pool w/ FIFO queue, very
> straightforward and simple; IMHO, it's almost reinventing the wheel? Can an
> implementation of futures scale to one million cores? Most likely it
> depends on the actual implementation of the thread-pool and its queues...
> Will an implementation of futures for a 1,000,000+ core processor use pools
> of threads behind the scenes? IMO, 100% for sure...

WRT queuing, I don't think it'll be too long before it becomes a CPU
primitive; but what will the primitive look like? I can't see it dealing
with shared memory, but I could see it working with "local" cores -
cores physically close together on the chip.

> So, what's the problem with threads again?

For massive concurrency in the future, I think the problem is that
threads encapsulate code, and don't encapsulate the notion of local
data. That is, when accessing non-local memory becomes even more
expensive, the source of programs written in C-based languages and their
kind won't reveal the increased cost simply by perusal of the code.

Anyway, this whole subthread is a completely idle speculation, don't
read too much seriousness / rigour into anything I spewed :)

Chris Thomasson
Aug 22, 2006, 12:45:24 AM
"Barry Kelly" <barry....@gmail.com> wrote in message
news:0kjke2l0j6lnpaalh...@4ax.com...
> Chris Thomasson wrote:
[...]

>> So, what's the problem with threads again?
>
> For massive concurrency in the future, I think the problem is that
> threads encapsulate code, and don't encapsulate the notion of local
> data.

[...]

Please excuse me for not commenting on your whole post, I should think about
it some more... Anyway:


I think I have to quickly disagree with you on the concept that threads may
have "problems with local data"...


Well, POSIX for a 1,000,000+ core processor would, wow... IMHO, include a highly
efficient "pthread_getspecific_malloc(...)"-like function that will allow
you to allocate from "core local" memory. You can use core local memory to
communicate, unless the memory system treats core-local access by a foreign
core as an exception; I think that would be a damn shame... I mean, think
about this...

All of the cores are on the same chip!

How hard would it be for core local memory to be accessed by a foreign core
just *fractions* of a nanometer away??? Ahhh, let the hardware guys figure
it all out for me...

;)

Visualize the entire shared memory as a "network" of per-core-local memory....
Humm. Perhaps... vZOOM could still use proxy gc techniques to get the job
done... Humm... A couple of hundred thousand "reader" cores executing
searches on extensive linked data-structures that get generated by the
massive server load of the future, using absolutely no membars/atomics,
beautiful!!!

lol

Shared memory may not be obsolete after all! lol...

;)


Conceptually, a thread would be analogous to a hardware thread, so the
thread could truly be bound to a core... Local memory wrt the thread and the
core would be a natural extension of the POSIX API... IMHO, the current
pthread_getspecific(...) function could be augmented with "core *local*
memory techniques"...


> Anyway, this whole subthread is a completely idle speculation, don't
> read too much seriousness / rigour into anything I spewed :)

Humm. You have made some good points...

Thank you.


Chris Thomasson
Aug 22, 2006, 4:05:42 AM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:rOCdnTVFxMg3FHfZ...@comcast.com...

> "Barry Kelly" <barry....@gmail.com> wrote in message
> news:0kjke2l0j6lnpaalh...@4ax.com...
>> Chris Thomasson wrote:
> [...]
>>> So, what's the problem with threads again?
>>
>> For massive concurrency in the future, I think the problem is that
>> threads encapsulate code, and don't encapsulate the notion of local
>> data.
>
> [...]
>
> Please excuse me for not commenting on your whole post, I should think
> about it some more... Anyway:
>
>
> I think I have to quickly disagree with you on the concept that threads
> may have "problems with local data"...
[...]


Humm... Let me think here...


<brainstorming/babbling-section: RCU-SMR or vZOOM on mega-core, in the year
20XX>


Basic hardware assist for implementing "partially copy-on-write deferred
reclamation (PDR)" (e.g., PDR == Proxy GC) techniques on mega-core chips
could "probably" rely on "something" like the following "very
crude" setup:

For your own protection: I am no hardware guy!

lol ;)


In the future, on-chip memory and coherency systems could possibly partition
their numerous cores into programmatically adjustable **prioritized logical
groups of cores** during hard OS system boots. The priority and layout of
the logical groups could be adjusted/introspected as needed by the OS via
inputs from one or two assembly instructions. A user-level application could
use a simple OS API for adjustments/introspection into the logical groups of
cores that are partitioned from the chip. The memory system would generate
some sort of signal, on a periodic/episodic basis, every time it detects
that 'all' of the cores in a logical 'group' recently executed 'anything'
that has functionality analogous to a 'memory barrier'... The
periodic/episodic signals can then be "picked up" by software via a
"special" interrupt/signal handler, perhaps w/ OS assist, an "interface" of the
"future"...

;)

crude special handler scheme for PDR:

#define CPU_GROUP_SYNC_EPOCH    1
#define CPU_COMPLETE_SYNC_EPOCH 2


/* syncepoch processing */
extern void process_cpu_group_epoch(int64_t groupid, int64_t priority);
extern void process_cpu_complete_epoch(void);


/* current syncepoch */
static int64_t volatile p_current_syncepoch = 0;


/* special signal handler */
void special_cpu_syncepoch_signal_handler
  ( int64_t sig_flags,
    int64_t groupid,
    int64_t priority ) {

  if (sig_flags & CPU_COMPLETE_SYNC_EPOCH) {
    process_cpu_complete_epoch();
    memory_barrier();

  } else if (sig_flags & CPU_GROUP_SYNC_EPOCH) {
    process_cpu_group_epoch(groupid, priority);
    memory_barrier();
  }
}


A group of reader threads gets bound to a logical core group... A logical
group synchronization epoch means that the memory system indicates that all
reader threads in that particular group have just recently gone into some
sort of quiescent state (e.g., executed something like a membar)... So you
can focus the PDR polling logic on the reader threads that belong to that
"particular group of cores"... A complete synchronization epoch indicates
that every logical group of cores has just recently gone into some sort of
quiescent state (e.g., executed something like a membar)... So you can
focus polling logic on all reader threads in every logical group of
cores...


As you can probably see, this is attempting to come up with some sort of
solution that basically amortizes PDR synchronization epoch detection over a
chip's **prioritized logical groups of cores**...

Could work out fine... And its distributed nature suggests that synchronization
epoch detection can be highly scalable.


</brainstorming/babbling-section: back to reality!?!?!>


Any thoughts on my pointless ramblings?


Yikes!

lol ;)


Chris Thomasson
Aug 25, 2006, 7:45:31 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1156175897....@m73g2000cwd.googlegroups.com...

David S... Did you carefully read my post?:

http://groups.google.com/group/comp.programming.threads/msg/7c15b3488642d1a6

You will realize that I mentioned that the internal state of the mutex will
be thrashed across the FSB if your threads make frequent accesses to
uncontended mutexes... IMHO, an ideal locking design is one that avoids the
kind of contention that results in blocking...

However, uncontended mutexes still have overhead... Think about it for a
minute...


Markus Elfring
Aug 26, 2006, 10:24:34 AM
> A 100% lock-based solution would not be able to cope with the scalability
> characteristics exhibited by an effective marriage of lock-based and
> lock-free programming... Hence, my AppCore project, or my vZOOM project....

I am curious about the conditions/licence under which it will become
available after the publication of the final results from the
CoolThreads contest.

A lot of lock-based libraries exist. The usable selection of lock-free
implementations seems to be much smaller.
Are there any chances that they move from academic and operating systems
research into more applications?
Would a lock-/wait-free STL be useful?
Do intellectual property and corresponding patent issues prevent any
progress in this software area?

Will the situation be improved if well-known atomic interfaces are
added to the usual toolbox?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/package-summary.html

Is there another interesting and open competition running between
alternative approaches for non-blocking common data structures?
- Academia:
University of Cambridge Computer Laboratory
Brown University, Computer Science Department
University of Rochester, Computer Science Department
University of Pittsburgh, Computer Science Department
Rensselaer Polytechnic Institute, Computer Science Department
Chalmers University of Technology, Computing Science Department
University of Illinois, Center for Reliable and High-Performance Computing
Technical University of Dresden, Institute for System Architecture
Columbia University, Computer Music Center
University of Groningen

- Companies:
Parallel Scalable Solutions
Hewlett-Packard Development
Intel
IBM
Sun
Necklace Ltd.
Centre national de création musicale

Regards,
Markus

Joe Seigh
Aug 26, 2006, 11:34:03 AM
Markus Elfring wrote:
>> A 100% lock-based solution would not be able to cope with the
>> scalability characteristics exhibited by an effective marriage of
>> lock-based and lock-free programming... Hence, my AppCore project, or
>> my vZOOM project....
>
>
> I am curious about the conditions/licence under which it will become
> available after the publication of the final results from the
> CoolThreads contest.
>
> A lot of lock-based libraries exist. The usable selection of lock-free
> implementations seems to be much smaller.
There's a lack of portable lower-level APIs to build lock-free libraries
from.

> Are there any chances that they move from academic and operating systems
> research into more applications?

Someone needs to do a killer app first.

> Would a lock-/wait-free STL be useful?

The STL API hard-codes single threading. A lock-free library would have
a different API, so it wouldn't be plug-compatible with STL.

> Do intellectual properties and corresponding patent issues prevent any
> progress in this software area here?

IP and patent issues have prevented any progress since about the mid-'90s,
when patenting took off, at least as far as open source is concerned.
Commercial development is limited because relatively few commercial
vendors understand it or are even aware of it.

>
> Will the situation be improved if well-known atomic interfaces are
> added to the usual toolbox?
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/package-summary.html

The C++0x proposal is very low level. You'd still need the mid-level stuff
so you could start creating library-based APIs. The Java stuff might give you
an idea of what things might look like, but Java has atomically thread-safe
GC, albeit w/ stop-the-world, and it is rather hard to add new low-level stuff
to Java, so progress in Java is slow.

Richard
Aug 26, 2006, 12:04:09 PM
to John Hickin
John Hickin wrote:
> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:1155889979.9...@i42g2000cwa.googlegroups.com...
>> I don't know what I can say other than that I disagree completely with
>> almost everything he said.
>>
>> Fundamentally, concurrency should be the rule with serialization
>> enforced only where there are dependencies.
>
> Finding the dependencies can be hard. Finding new ones can be even harder.
>
>> Having to find the concurrency is harder than having to find the
>> serialization. Suppose you have ten people that have totally different
>> tasks to do, but they only share two cars with drivers. Finding the
>> serialization is easy -- it's the cars and drivers. Finding the
>> concurrency is much harder -- it's every possible combination of
>> things that two people could do at once.
>>
>
> This example is quite good and it explains what I don't like with threads.
> In the real world, 8 of the 10 people in your example will go on to do
> something else. In a multi-threaded world, 8 threads are blocked. Unless, of
> course, you use some sort of lock-free algorithm. Unless I've missed the
> mark, lock-free is quite avant-garde, and not taught at the undergraduate
> level. It may be time for a change there.
>
> Anyway, the article was interesting to read. The author is an IEEE Fellow,
> and you don't get that designation without earning it.

I have learned that this means nothing, absolutely nothing.
I once had someone on the hostap mailing list who was part of the team
that worked on IEEE 802.11b. He complained about a non-working
feature (association without an SSID wasn't working for his adapter); we
mentioned that a hidden SSID is basically nonsense, since it doesn't
increase your security at all. He totally disagreed until we explained
to him why and how you can quickly find out the SSID of some AP.

Therefore, sometimes people don't know their own code, design, etc. So I
wouldn't believe everything just because someone has a certain name,
degree, etc.

>
>
> Regards, John.
>
>

David Hopwood
Aug 26, 2006, 12:59:06 PM
Joe Seigh wrote:
> [...] The Java stuff might give you

> an idea of what stuff might look like but Java has atomically thread-safe
> GC, albeit w/ stop the world, [...]

Java (more precisely any given version of it) is a language; it doesn't
define a particular GC algorithm.

The most commonly used implementation, Sun's JDK, has four choices of GC
algorithm in recent versions, including concurrent and incremental
"low pause" collectors. While these do briefly pause all threads, they are
not "stop the world" GCs in the sense that term is normally meant.

<http://java.sun.com/docs/hotspot/gc1.4.2/#4.3.%20When%20to%20Use%20the%20Concurrent%20Low%20Pause%20Collector|outline>
<http://java.sun.com/docs/hotspot/gc1.4.2/#4.5.%20When%20to%20Use%20the%20Incremental%20Low%20Pause%20Collector|outline>

JDK 1.5 has further GC improvements, although I don't know the details.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Markus Elfring
Aug 26, 2006, 3:49:29 PM
> JDK 1.5 has further GC improvements, although I don't know the details.

Just look at
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.%20The%20Concurrent%20Low%20Pause%20Collector%7Coutline

Markus Elfring
Aug 26, 2006, 5:15:48 PM