Re: Multicore programming - is that really hard?

Chris Thomasson

unread,

Feb 14, 2008, 12:05:49 PM2/14/08

to

[comp.programming.threads added into discussion]

<hzm...@hotmail.com> wrote in message
news:d7cee607-dd33-4c54...@p69g2000hsa.googlegroups.com...
> I read a number of people saying few knows how to program on a
> multicore computers. They did not elaborate and to me this has become
> rhetoric. We have had symmetric multiprocessors and pthreads for a
> long time. Can't we just use pthreads, or maybe some other things
> too, to program on a multicore computer? Or somehow multicores do
> present some new challenges that we did not have before, like on
> SMPs? Please enlighten me.

Are you trying to achieve very-high scalability? What is your exact problem
domain? PThreads will work fine in most cases... However, if you want to
realize outstanding performance and scalability characteristics, then you
might want to take a look into some other types of algorithms:

http://groups.google.com/group/comp.programming.threads/msg/fdc665e616176dce

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/ef3f26bbe1c52e0b

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/1f6e0e7eca04b6d0

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/205dcaed77941352

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/380ee52d7237fe3a

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/96f280d49a63bb9f

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8329a6ddcb95b8ac

(try and read as much as possible...)

Keep in mind that future multi-core designs are going to revolve around NUMA
which means heavily distributed multi-threaded programming techniques are
essential...:

http://groups.google.com/group/comp.programming.threads/msg/4230a9d3cbc20ea9

http://groups.google.com/group/comp.arch/msg/a7501814f19731cd

http://groups.google.com/group/comp.arch/msg/ac41b1c179f0e7ca

Designing high-speed distributed algorithms is VERY difficult if you don't
know _exactly_ what your doing...

Any questions/comments?

--

tim

unread,

Feb 14, 2008, 12:10:42 PM2/14/08

to

In article
<d7cee607-dd33-4c54...@p69g2000hsa.googlegroups.com>,
> <hzm...@hotmail.com> wrote:
>>I read a number of people saying few know how to program on a multicore

>>computers. They did not elaborate and to me this has become rhetoric.
>>We have had symmetric multiprocessors and pthreads for a long time.
>>Can't we just use pthreads, or maybe some other things too, to program
>>on a multicore computer? Or somehow multicores do present some new
>>challenges that we did not have before, like on SMPs? Please enlighten
>>me.

Some programs are relatively easy to split into massively parallel
threads. A good example is the generation of 3D graphics. Even so, this is
regarded as a difficult area to program, and dropout rates in at least one
graphics programming course I am aware of are very high as a result of the
technical difficulties. To get good performance all sorts of arcane matters
need to be taken into account. Have a look at any of the guides to 3D
graphics programming on the ATI or Nvidia sites for examples.

There is also a big move to exploit the graphics chips for computation. In
some cases this is very successful, but often at a high cost. Algorithms
may have to be redone from scratch. For example traditional sort methods
do not parallelize well and you need to use something like bitonic sort.

Transactional systems such as many web sites can often be split into
threads in a way that is transparent to the programmer. As a result,
transactional programming is not that hard, but you need support from
experts when things go wrong. Suddenly your application starts running
slowly. Why? It could be database contention or many other things. Maybe
you have memory cache line clashes.

Sometimes you can spin off a thread to do some computationally intensive
work while the foreground thread looks after the ordinary work. I did this
for a wav2midi program I was trying to get going. Even in this simple
case, it was annoyingly tedious to get the synchronizing of the threads
working correctly. And because the problems are timing related, they do
not always recur on schedule. So it is hard to test these sorts of
applications. Debugging a large and seriously parallel program can quickly
develop a nightmarish quality.

Operating systems and application servers and databases have had to deal
with these issues for some time. Have a look at the Linux code to deal
with multiple CPUs. It is quite complex.

I'm writing a compiler and I found I can spread the load across multiple
CPUs but to do this required several days of work rewriting parts of the
code. I had to ensure each stage could process its work incrementally,
passing incremental outputs as soon as they were available. I had to put
locks around common data items (like global error counts).

Sometimes the CPU or compiler can parallelize work for you. Modern CPUs do
this automatically, but the limits are quite modest. A lot of hardware is
needed to ensure nothing interferes when it shouldn't. Compilers also can
parallelize work, with varying degrees of success. Gcc does this, which in
part explains why GCC compiles take many more processing cycles than they
used to.

In many cases there is no simple and easy way to split a program into
parallel components in a way that will work correctly and will perform well.

Part of the problem is that programming languages lack good, standardised
support for parallel programming constructs. Another challenge is that the
degree of parallelism required is rapidly increasing.

It all gets harder again when you have non-uniform access to memory.
Search for NUMA for information about this, or look at the PS3 chip
performance documentation.

Tim Josling
au.org.melbpc@tej words backwards.

--

Chris Thomasson

unread,

Feb 14, 2008, 12:20:39 PM2/14/08

to

"tim" <Ti...@internet.com> wrote in message
news:13r81lc...@corp.supernews.com...

> In article
> <d7cee607-dd33-4c54...@p69g2000hsa.googlegroups.com>,
>> <hzm...@hotmail.com> wrote:
>>>I read a number of people saying few know how to program on a multicore
>>>computers. They did not elaborate and to me this has become rhetoric.
>>>We have had symmetric multiprocessors and pthreads for a long time.
>>>Can't we just use pthreads, or maybe some other things too, to program
>>>on a multicore computer? Or somehow multicores do present some new
>>>challenges that we did not have before, like on SMPs? Please enlighten
>>>me.
>
> Some programs are relatively easy to split into massively parallel
> threads. A good example is the generation of 3D graphics. Even so, this is
> regarded as a difficult area to program, and dropout rates in at least one
> graphics programming course I am aware of are very high as a result of the
> technical difficulties. To get good performance all sorts of arcane
> matters
> need to be taken into account. Have a look at any of the guides to 3D
> graphics programming on the ATI or Nvidia sites for examples.
>
> There is also a big move to exploit the graphics chips for computation.

[...]

Indeed:

http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=%22Chris+Thomasson%22+GPU

The GPU can crunch arrays better than the CPU... Turn array into texture and
tell GPU to render it in special ways...

--

David Schwartz

unread,

Feb 14, 2008, 4:57:23 PM2/14/08

to

On Feb 14, 9:05 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> Any questions/comments?

When you hear someone say "multicore programming is hard", 99 times
out of 100 what they really mean is that multi-threading is hard.
This is both true and false.

If you are already familiar with multi-thread programming on SMP
machines, there is almost nothing new you need to learn to program for
multi-core processors. Even if you are writing operating systems, the
differences (caused by things like shared caches) are miniscule.

So if you've been writing high-end applications for a decade or so
that run on SMP machines, you're in great shape. Desktop machines are
just starting to look like servers have forever.

On the other hand, if you never bothered to learn multi-threading
because you were only writing mass-market programs to run on very few
machines that had more than one CPU ...

DS

--

Szabolcs Ferenczi

unread,

Feb 14, 2008, 5:59:23 PM2/14/08

to

On Feb 14, 6:05 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> Any questions/comments?

Yes, concurrent programming is hard.

The situation is similar to the relation between classical physics and
quantum physics. If you have learnt the basic principles in classical
physics, you will be in trouble with the indeterministic world in
quantum physics.

Similarly, if you are very much adapted to sequential/deterministic
programming concepts, you can not very easily adapt to the non-
deterministic nature of concurrent programs.

It is well illustrated with the inadequate programming fragments that
newcomers publish in this forum. They proudly announce that they have
come up with a super fast queue but the get operation returns false
when the queue is empty. Thus, they are not aware of the non-
deterministic nature of the concurrent programs. Q.E.D.

Best Regards,
Szabolcs

--

Dmitriy V'jukov

unread,

Feb 14, 2008, 8:49:06 PM2/14/08

to

[Your PST c.p. spam filtering moderator is going home for the evening.
Post to be sent out tomorrow; I have V-day commitments.]

On 15 фев, 01:59, Szabolcs Ferenczi <szabolcs.feren...@gmail.com> wrote:

> It is well illustrated with the inadequate programming fragments that
> newcomers publish in this forum. They proudly announce that they have
> come up with a super fast queue but the get operation returns false
> when the queue is empty. Thus, they are not aware of the non-
> deterministic nature of the concurrent programs. Q.E.D.

Citation, please. Provide some links to 'non-deterministic aware'
algorithms.
What they must return? Trulse? Falue?

Dmitriy V'jukov

--

Eugene Miya

unread,

Feb 15, 2008, 5:50:10 PM2/15/08

to

The c.p. moderator has to run a field experiment out in Death Valley for
the long weekend. I may have some wireless access but don't count on it
(like a Starbucks or if I swing past JPL or Caltech).
I will for certain be back on line Wednesday. Get posts to me before
430 PST (just under 2 hrs.) or you will have to wait until Wed. or trim
the Newsgroups down to c.p.t. (losing the c.p.).

--

Szabolcs Ferenczi

unread,

Feb 18, 2008, 5:08:22 PM2/18/08

to

On Feb 14, 6:05 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> Any questions/comments?

Yes, I have comments but I removed comp.parallel since it seems to be
over-moderated. So I repeat my message of last Friday, which I have
sent to both list but has been moderated out for some reason:

First about the phrase "multicore programming": It is the trend that
people are trying to use new terms just to indicate that it is about
something very up to date issue. However, most of the time there is
nothing new.

It should be well known that multiple cores, i.e. multiple processors,
can be put together in two basic ways: (1) so that the cores are
coupled via pieces of shared memory; as well as (2) so that cores are
coupled via communication channels only. Multi-threading is concerned
with shared memory communication. On a single processor multi-
threading logically matches with multiple cores coupled via shared
memory but not with multiple cores connected via communication
channels.

What is called multi-threading nowadays is originating back in the
early '70s. It is the amalgam of low level concepts of early
concurrent programming: forking, semaphores, and very much reduced
versions of some more elaborate concepts such as Critical Regions,
Conditional Critical Regions, and Monitors. So much reduced that it
has almost nothing to do with the original concepts.

Not to mention the so-called lock-free and other xxx-free fever, where
people are a-keen to gain a couple of processor cycles in the short
term scheduling but, most of the time, the long term scheduling is
forgotten. I.e. they do not care about how many cycles they loose due
to forgetting about long term synchronization. Thus, one aspect of the
non-deterministic nature of processes are ignored at the macro-level
while enormous effort is put into the micro-level.

Most of the time the misunderstanding comes from that people used to
sequential/deterministic programming cannot really understand the
simple fact: in a multiprogramming environment the "once true, always
true" or "once false, always false" is not valid for the shared
variables. It used to be valid for a sequential program..---This
misunderstanding results in a get operation returning a boolean.

Finally, I would say, there are two _basic_ programming paradigms for
programming the so-called multicore machines both originating from the
'78s. One is the Distributed Processes (Per Brinch Hansen: Distributed
Processes: A Concurrent Programming Concept. Commun. ACM 21(11):
934-941 (1978)) and the other is the Communicating Sequential
Processes (Communicating Sequential Processes, C.A.R. Hoare.
Communications of the ACM 21(8):666-677, August 1978). These two basic
concurrent programming paradigms should give the basis of the so-
called multi-core programming concepts. They map nicely to the two
basic kinds of multiprocessors: shared memory coupled processors and
distributed memory ones.

Especially, CSP is designed in a way that handling of non-determinism
is designed directly into the notation (see e.g. the alternative
construction). You cannot find this feature, i.e. handling of non-
determinism by direct language means, in the current multi-threaded
repertoire. Even worse, todays most popular multi-threading arsenal is
at the level of libraries rather than at the language level.

Anyway, ceterum censeo, it is hard to build a good program from
cooperating sequential processes just because it requires a basically
different thinking.

Best Regards,
Szabolcs

Chris Thomasson

unread,

Feb 18, 2008, 6:46:16 PM2/18/08

to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:d725e878-0f77-407c...@s12g2000prg.googlegroups.com...

> > On Feb 14, 6:05 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> > Any questions/comments?

> Yes, I have comments but I removed comp.parallel since it seems to be
> over-moderated. So I repeat my message of last Friday, which I have
> sent to both list but has been moderated out for some reason:

Eugene is currently busy:

http://groups.google.com/group/comp.programming.threads/msg/03441a0ff6311538

> First about the phrase "multicore programming": It is the trend that
> people are trying to use new terms just to indicate that it is about
> something very up to date issue. However, most of the time there is
> nothing new.

Multi-Core systems are new in the sense that there are now multiple
processors on a single chip. Inter-processor communication is cheaper,
making multi-threading more efficient. However, the new chips that are going
to come out will have a NUMA-like setup. Therefore, distributed threading
techniques will be in demand; especially algorithms like RCU-SMR/vZOOM.

> It should be well known that multiple cores, i.e. multiple processors,
> can be put together in two basic ways: (1) so that the cores are
> coupled via pieces of shared memory; as well as (2) so that cores are
> coupled via communication channels only.
> Multi-threading is concerned
> with shared memory communication. On a single processor multi-
> threading logically matches with multiple cores coupled via shared
> memory but not with multiple cores connected via communication
> channels.

Humm, if I understand you correctly; that's not entirely true. For instance,
you still use multi-threading techniques all over the place on the Cell
processor. I would say that the Cell is a hybrid of shared-memory and
DMA-Channel communications. Each SPU has a main-thread which uses a
DMA-channel interface for communication with other SPU's or the main PPC(s)
where there can be multiple-threads:

http://groups.google.com/group/comp.arch/msg/198a048b98fabe45

http://groups.google.com/group/comp.arch/browse_frm/thread/93b1619857dfce03

They have memory barriers and LL/SC syncronization; They even have LL/SC
hooked up to the DMA channel interface; very cool!:

http://groups.google.com/group/comp.arch/msg/56b9fcc296e8564a

What do you think of the Cell architecture? I like it a lot.

> What is called multi-threading nowadays is originating back in the
> early '70s. It is the amalgam of low level concepts of early
> concurrent programming: forking, semaphores, and very much reduced
> versions of some more elaborate concepts such as Critical Regions,
> Conditional Critical Regions, and Monitors. So much reduced that it
> has almost nothing to do with the original concepts.

> Not to mention the so-called lock-free and other xxx-free fever, where
> people are a-keen to gain a couple of processor cycles in the short
> term scheduling but, most of the time, the long term scheduling is
> forgotten. I.e. they do not care about how many cycles they loose due
> to forgetting about long term synchronization. Thus, one aspect of the
> non-deterministic nature of processes are ignored at the macro-level
> while enormous effort is put into the micro-level.

Are you seriously thinking that xxx-free fever gains only a couple of
processor cycles? There are many applications where the clever application
of such synchronization schemes give you an overall boost in performance and
scalability. The gains are usually an order of magnitude:

http://groups.google.com/group/comp.programming.threads/msg/097c0fe56dda4582

For instance, applying a algorithm such as vZOOM or RCU-SMR to a main-memory
database can give extremely dramatic performance, and more importantly
scalability...

> Most of the time the misunderstanding comes from that people used to
> sequential/deterministic programming cannot really understand the
> simple fact: in a multiprogramming environment the "once true, always
> true" or "once false, always false" is not valid for the shared
> variables. It used to be valid for a sequential program..---This
> misunderstanding results in a get operation returning a boolean.

Who cares if a get function returns boolean? You can still hook it up to
conditional blocking. For instance, an eventcount can convert get function
which returns bool into a get function that returns an object:

___________________________________________________________
struct collection {
[...]

void put(node* n) {};

bool get() {...};
};

struct collection_wrapper {
collection m_col;
eventcount m_ec;

void put(node* n) {
m_col.put(n);
m_ec.signal();
}

node* get() {
node* n;
while (! (n = m_col.get())) {
eventcount::key const key = m_ec.get();
if (n = m_col.get()) { break; }
m_ec.wait(key);
}
return n;
}
};
___________________________________________________________

or you can use a POSIX condvar. I am not too sure why you think a get
function that returns bool is bad...

> Finally, I would say, there are two _basic_ programming paradigms for
> programming the so-called multicore machines both originating from the
> '78s. One is the Distributed Processes (Per Brinch Hansen: Distributed
> Processes: A Concurrent Programming Concept. Commun. ACM 21(11):
> 934-941 (1978)) and the other is the Communicating Sequential
> Processes (Communicating Sequential Processes, C.A.R. Hoare.
> Communications of the ACM 21(8):666-677, August 1978). These two basic
> concurrent programming paradigms should give the basis of the so-
> called multi-core programming concepts. They map nicely to the two
> basic kinds of multiprocessors: shared memory coupled processors and
> distributed memory ones.

I agree that addressing NUMA is important because that's going to be the way
things are wrt future multi-core designs. Also, you mention message-passing.
Lock-free programming can be used to address both of those programming
paradigms quite nicely. In fact, RCU was created to run on NUMA systems.
Erlang, and basically any other message-passing system can greatly beneifit
from using ultra-overhead wait-free queues.

> Especially, CSP is designed in a way that handling of non-determinism
> is designed directly into the notation (see e.g. the alternative
> construction). You cannot find this feature, i.e. handling of non-
> determinism by direct language means, in the current multi-threaded
> repertoire. Even worse, todays most popular multi-threading arsenal is
> at the level of libraries rather than at the language level.

You are kind of sounding like you support the fallacy of the following
article mentioned in this thread:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926
(read all!)

> Anyway, ceterum censeo, it is hard to build a good program from
> cooperating sequential processes just because it requires a basically
> different thinking.

non-determinism can be a key aspect of scalability. IMHO, trying to hide it
adds too much overhead. It only helps people who cannot program threads.
Their brains cannot think concurrently, there they have to get rid of the
non-deterministic nature of things just so they can begin to think clearly.
Too bad.

Dmitriy V'jukov

unread,

Feb 19, 2008, 1:27:48 AM2/19/08

to

On Feb 19, 2:46 am, "Chris Thomasson" <cris...@comcast.net> wrote:

> > Most of the time the misunderstanding comes from that people used to
> > sequential/deterministic programming cannot really understand the
> > simple fact: in a multiprogramming environment the "once true, always
> > true" or "once false, always false" is not valid for the shared
> > variables. It used to be valid for a sequential program..---This
> > misunderstanding results in a get operation returning a boolean.
>
> Who cares if a get function returns boolean? You can still hook it up to
> conditional blocking. For instance, an eventcount can convert get function
> which returns bool into a get function that returns an object:

Object is even more deterministic than boolean. I think Szabolcs
Ferenczi wants something undeterministic like 'trulse' or 'falue'.
Or he wants to plug-in performance-killing freshness algorithm
everywhere.

> ___________________________________________________________
> struct collection {
> [...]
>
> void put(node* n) {};
>
> bool get() {...};
>
> };
>
> struct collection_wrapper {
> collection m_col;
> eventcount m_ec;
>
> void put(node* n) {
> m_col.put(n);
> m_ec.signal();
> }
>
> node* get() {
> node* n;
> while (! (n = m_col.get())) {
> eventcount::key const key = m_ec.get();
> if (n = m_col.get()) { break; }
> m_ec.wait(key);
> }
> return n;
> }};

More importantly one can combine several producer-consumer queues,
scheduler, epoch-detection mechanism and other stuff with one
eventcount per thread. Something one can not make with blocking
queues.

Dmitriy V'jukov

Chris Thomasson

unread,

Feb 20, 2008, 3:18:13 PM2/20/08

to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:k_adnR0N6bsmiyfa...@comcast.com...

^^^^^^^^^^^^^^^^^^^^^

node* get() {...};

Of course!

return value for get is NULL for false; any other value is true.

> };
>
>
> struct collection_wrapper {
> collection m_col;
> eventcount m_ec;
>
> void put(node* n) {
> m_col.put(n);
> m_ec.signal();
> }
>
> node* get() {
> node* n;
> while (! (n = m_col.get())) {
> eventcount::key const key = m_ec.get();
> if (n = m_col.get()) { break; }
> m_ec.wait(key);
> }
> return n;
> }
> };
> ___________________________________________________________

[...]

Phil Hobbs

unread,

Feb 21, 2008, 5:50:57 PM2/21/08

to

Eugene Miya wrote:
> In article <d7cee607-dd33-4c54...@p69g2000hsa.googlegroups.com>,
> <hzm...@hotmail.com> wrote:
>> I read a number of people saying few know how to program on a
>> multicore computers. They did not elaborate and to me this has become
>> rhetoric. We have had symmetric multiprocessors and pthreads for a
>> long time. Can't we just use pthreads, or maybe some other things
>> too, to program on a multicore computer? Or somehow multicores do
>> present some new challenges that we did not have before, like on
>> SMPs? Please enlighten me.
>

> Moderator hat off.
>
> I would doubt new challenges significantly distinct from other threading.
> It has to be be shown otherwise.
> Your "can't?" question can be turned around substituting message passing.
> The more loosely coupled would say that's enough.
>
> %A Marc Mezard
> %T Passing Messages Between Disciplines
> %J Science
> %V 301
> %N 5640
> %D 19 September 2003
> %P 1685-1686
> %K physics/computer science (AI) perspectives,
> information theory error correction, belief propagation (BP),
> discrete optimization satisfiability, statistical physics spin glasses,
> %X Nice short paper on the meeting of 2 disciplines.
> It glosses over certain global/long distance topics, but quite nice.
>
> The CS community can't even agree on a decent parallel programming
> paradigm or model.
>
> And a further step removed can be Why can't we just do it all automagically?
> These guys are merely sitting back waiting, can't stand message passing, and
> have been perhaps burned by attempts at using threads.
>
> Knuth is in this camp. He challenges anyone to parallelize TeX and get
> significant speed up in a reasonably portable way. He noted last April
> or May that he wrote it serially. No prize money mentioned.
>
>
> Anyone else?

Part of the difficulty is that performance on different classes of
problem is so dependent on the degree of connection among processors and
between them and memory. To achieve flat or nearly uniform memory
access, as in a classical SMP, the amount of cache-consistency traffic
grows astronomically with the number of processors and the size of the
caches, and so does its cost and power consumption. Since
single-processor performance requires lots of cache, there's a lot of
pressure to make machines less and less SMP-like. Highly multicore
machines will be doing several more or less unrelated things at once,
and so most of the cache consistency traffic will be meaningless. It's
like the old saw about advertising: "I know half our advertising budget
is wasted, I just don't know which half." It's hard to justify that
much extra hardware when most of it is doing nothing useful, most of the time.

Getting good performance out of a NUMA-type machine requires pretty deep
knowledge of how the machine is organized, and even of what else is
going to be running. That's intrinsically much more complicated and
machine-specific than any uniprocessor or small-SMP program, and it gets
harder and harder as the number of cores increases, which progressively
degrades the interconnection bandwidth and latency.

That's a significant part of the reason the programming model remains
broken, IMO.

Cheers,

Phil Hobbs

--

Chris Thomasson

unread,

Feb 23, 2008, 1:48:34 AM2/23/08

to

"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:a0f381e5-f9dd-4fad...@s12g2000prg.googlegroups.com...

> On Feb 19, 2:46 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > Most of the time the misunderstanding comes from that people used to
>> > sequential/deterministic programming cannot really understand the
>> > simple fact: in a multiprogramming environment the "once true, always
>> > true" or "once false, always false" is not valid for the shared
>> > variables. It used to be valid for a sequential program..---This
>> > misunderstanding results in a get operation returning a boolean.
>>
>> Who cares if a get function returns boolean? You can still hook it up to
>> conditional blocking. For instance, an eventcount can convert get
>> function
>> which returns bool into a get function that returns an object:
>
> Object is even more deterministic than boolean.

Yeah.

> I think Szabolcs
> Ferenczi wants something undeterministic like 'trulse' or 'falue'.
> Or he wants to plug-in performance-killing freshness algorithm
> everywhere.

I don't know why he does not seem to like get functions that return bool. I
would have to say that, perhaps the function would be better if it were
named something like 'tryget'. In general, I like the flexibility of a
tryget function, over its waitget counterpart. Sometimes I like to try and
perform other tasks if the tryget fails. Also, with external
synchronization, one could always convert a tryget to waitget. It does not
have to be in the base algorithm itself.

>
>
>> ___________________________________________________________
[...]

>
> More importantly one can combine several producer-consumer queues,
> scheduler, epoch-detection mechanism and other stuff with one
> eventcount per thread. Something one can not make with blocking
> queues.

Yeah, eventcount's are fun to play with! ;^)

IMHO, the low-overhead implementation adds to the overall functionality of
many non-blocking algorithms. The added ability is the all important
conditional blocking ability. It's the ultimate marriage between
lock/wait-free and lock-based programming. The ability to separate out naked
signals from fence signals is a plus. In the next update of AppCore, I am
adding the ability to just send a single signal to the eventcount waitset,
instead of just broadcasting. Here is native prototype implementation for
Windows:

http://groups.google.com/group/comp.os.ms-windows.programmer.win32/browse_frm/thread/ad679279179144e

Funny, on windows, I find it easier to implement a native eventcount first,
and stick a condvar on top of it... Rather than implement a condvar, and use
it as a slow-path in the eventcount pseudo-code I posted some years ago.

Chris Thomasson

unread,

Mar 6, 2008, 1:04:56 PM3/6/08

to

[comp.programming.threads added...]
[C.P. Moderator: OK.]

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message

news:3c6d1366-fe78-4f67...@e10g2000prf.googlegroups.com...
> On Mar 2, 4:56 pm, hzmo...@hotmail.com wrote:
>> ... How are multicores different from
>> multiprocessors (i.e. sockets)? Didn't all these problems these
>> researchers are trying to solve now already exist when multiprocessors
>> emerged many years ago? Are these researchers re-discovering these
>> problems and perhaps re-inventing the solutions (or the solutions had
>> never been found)?
>
> I think multicores are not any different from (shared memory)
> multiprocessors from the point of view of the programming model. On
> the other hand, multicores are quite different from the parallel
> computers such as vector processors.

Multi-Core systems are new in the sense that there are now multiple
processors on a single chip. Inter-processor communication is cheaper,

making multi-threading more efficient. If you want something like vector
processors, well, try some general-purpose GPU programming:

http://groups.google.com/group/comp.parallel/msg/7869ec858cfe7883

[...]

> Now, from the programming point of view, there emerged two basic
> paradigms: (1) the shared resource based concurrent programs, and
> (2) message based concurrent programs.

How do you account for the fact that nearly all of the existing
message-based languages are implemented with shared-memory multi-threading?

> What we call multi-threading today, belongs to the first class.

Humm, if I understand you correctly; that's not entirely true. For instance,
you still use multi-threading techniques all over the place on the Cell
processor. I would say that the Cell is a hybrid of shared-memory and
DMA-Channel communications. Each SPU has a main-thread which uses a
DMA-channel interface for communication with other SPU's or the main PPC(s)
where there can be multiple-threads:

http://groups.google.com/group/comp.arch/msg/198a048b98fabe45

http://groups.google.com/group/comp.arch/browse_frm/thread/93b1619857...

They have memory barriers and LL/SC syncronization; They even have LL/SC
hooked up to the DMA channel interface; very cool!:

http://groups.google.com/group/comp.arch/msg/56b9fcc296e8564a

What do you think of the Cell architecture?

> You have a correct guess that in early
> days of the programming of shared memory multiprocessors there has
> been attempts to come up with language level solutions such as
> Critical Regions (CR), Conditional Critical Regions (CCR), and
> Monitors. Unfortunately, none of these language concepts are taken
> over properly in today's multi-threaded programming repertoire.

Non-sense! How do you explain POSIX Threads? Of course, we have been over
this:

http://groups.google.com/group/comp.programming.threads/msg/ed695d0cb48fc551

http://groups.google.com/group/comp.programming.threads/msg/7cfbb24f5e6d5a3f

> Probably the researchers of multicore programming should do nothing
> else but carefully study and properly adapt what has been already
> discovered in the programming of shared memory multiprocessors.

Tell that to SUN research:

http://research.sun.com/projects/dashboard.php?id=29

They are into developing new lock/wait-free algorithms... However, quite a
few of these algorithms originated in the 80's after CAS was invented...
Anyway, why should I not try invent new non-blocking synchronization
algorithms?

--

Szabolcs Ferenczi

unread,

Mar 6, 2008, 5:02:16 PM3/6/08

to

On Mar 6, 7:04 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Szabolcs Ferenczi" <szabolcs.feren...@gmail.com> wrote in message

> news:3c6d1366-fe78-4f67...@e10g2000prf.googlegroups.com...
> > On Mar 2, 4:56 pm, hzmo...@hotmail.com wrote:
> >> ... How are multicores different from
> >> multiprocessors (i.e. sockets)? Didn't all these problems these
> >> researchers are trying to solve now already exist when multiprocessors
> >> emerged many years ago? Are these researchers re-discovering these
> >> problems and perhaps re-inventing the solutions (or the solutions had
> >> never been found)?
>
> > I think multicores are not any different from (shared memory)
> > multiprocessors from the point of view of the programming model. On
> > the other hand, multicores are quite different from the parallel
> > computers such as vector processors.
>
> Multi-Core systems are new in the sense that there are now multiple
> processors on a single chip. Inter-processor communication is cheaper,
> making multi-threading more efficient.

What you are talking about are qualitative issues only but not any
architectural issues. Besides, I was talking about the programming model.

>From the point of view of the programming model, I even should not
have used the term memory but instead I should have used address space
which would have been more appropriate.

> If you want something like vector
> processors,

No, I do not want any vector processors. If I want one, I get access
to it. I just mentioned it as an example for a non-MIMD architecture.

> ...
> > Now, from the programming point of view, there emerged two basic
> > paradigms: (1) the shared resource based concurrent programs, and
> > (2) message based concurrent programs.
>
> How do you account for the fact that nearly all of the existing
> message-based languages are implemented with shared-memory multi-threading?

I am afraid you do not know again what you are talking about. You can
implement message-based programs on the single processor as well. You
can even implement multi-threading on a single processor. Oh yes, you
favourite cell model can also be implemented on the sequential
computer. So your counter argument just does not apply. You intermix
some issues again.

> > What we call multi-threading today, belongs to the first class.
>
> Humm, if I understand you correctly;

No, I am afraid you do not.

> ...

> What do you think of the Cell architecture?

I think we are not talking about it in this discussion thread. If you
want to discuss them, please open a discussion thread about it and
dump your valuable thoughts about it there. You are welcome.

> > You have a correct guess that in early
> > days of the programming of shared memory multiprocessors there has
> > been attempts to come up with language level solutions such as
> > Critical Regions (CR), Conditional Critical Regions (CCR), and
> > Monitors. Unfortunately, none of these language concepts are taken
> > over properly in today's multi-threaded programming repertoire.
>
> Non-sense! How do you explain POSIX Threads?

You call something "non-sense" what you do not know about. On the
other hand, if you do not know POSIX threads either, I am not going to
explain it to you here, just use Google. Google is your friend.
Besides, POSIX threads are not at the language level and I was talking
about language means. If you do not understand this, please refrain
yourself calling it "non-sense".

> ...

> > Probably the researchers of multicore programming should do nothing
> > else but carefully study and properly adapt what has been already
> > discovered in the programming of shared memory multiprocessors.
>
> Tell that to SUN research:
>
> http://research.sun.com/projects/dashboard.php?id=29
>
> They are into developing new lock/wait-free algorithms...

Lock-free algorithms are some very low level programming techniques
but they have no language consequences whatsoever. At least so far
not. You can try and work on the language issues for the lock-free
techniques, of course. Nevertheless, since lock-free techniques are
even more complicated and therefore more error prone ones, they make
multicore programming even harder.

Best Regards,
Szabolcs

--

Eugene Miya

unread,

Mar 6, 2008, 5:15:17 PM3/6/08

to

In article <d800ca6e-33de-4b48...@s37g2000prg.googlegroups.com>,
<hzm...@hotmail.com> wrote:
>my original questions:

>How are multicores different from multiprocessors (i.e. sockets)?

Well, I come from software. A socket means Berkeley-style IPC to me.

>Didn't all these problems these researchers are trying to
>solve now already exist when multiprocessors emerged many years ago?

Well, let me ask you rhetorically:
do you think those researchers solved those problems years ago?

Think about that for a moment. Give yourself context and assumptions.

No cheating. Take the moment.

The answer should be no. Two reasons for this: we overconcentrated on
hardware (the easy part) and we never came up with software transition
aids which didn't have great limitations. The community assumed we
would leave that to programmers. ha.

The other thing is which multiprocessors? We have no standardized
architecture, and likely won't in the foreseeable future, and you can
expect to wallow in that.

>Are these researchers re-discovering these problems and
>perhaps re-inventing the solutions (or the solutions had
>never been found)?

General solutions have likely never been found.
Well, we have had some limited success with vectorization (easy
vectorization, the history of that is some what interesting but not seen
by most programmers). I'm not certain I would use the word
"re-discovered." Whom do you think these researchers are?
Do you mean compilers people?
Do you realize the number of institutions where a person can go learn
about parallelizing compilers in the USA can be counted on 1 hand even
missing a couple of fingers?

--

Chris Thomasson

unread,

Mar 6, 2008, 7:49:51 PM3/6/08

to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message

news:095060e2-43a5-4ab0...@m36g2000hse.googlegroups.com...

>
> On Mar 6, 7:04 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Szabolcs Ferenczi" <szabolcs.feren...@gmail.com> wrote in message
>> news:3c6d1366-fe78-4f67...@e10g2000prf.googlegroups.com...
>> > On Mar 2, 4:56 pm, hzmo...@hotmail.com wrote:

[...]

>> > Now, from the programming point of view, there emerged two basic
>> > paradigms: (1) the shared resource based concurrent programs, and
>> > (2) message based concurrent programs.
>>
>> How do you account for the fact that nearly all of the existing
>> message-based languages are implemented with shared-memory
>> multi-threading?
>
> I am afraid you do not know again what you are talking about.

I am pointing out that there are more than 2 basic paradigms. You forget
hybrid approaches.

> You can
> implement message-based programs on the single processor as well. You
> can even implement multi-threading on a single processor. Oh yes, you
> favourite cell model can also be implemented on the sequential
> computer. So your counter argument just does not apply. You intermix
> some issues again.

>> > What we call multi-threading today, belongs to the first class.
>>
>> Humm, if I understand you correctly;
>
> No, I am afraid you do not.
>
>> ...
>> What do you think of the Cell architecture?
>
> I think we are not talking about it in this discussion thread. If you
> want to discuss them, please open a discussion thread about it and
> dump your valuable thoughts about it there. You are welcome.

The Cell architecture is a good example model for a shared-memory
message-passing hybrid. You mention those two models as if they are mutually
excludable. I am trying to tell you that they are not. It seems like you
fail to understand that shared-memory multi-threading and message-passing
can be efficently combined. Its not one or the other.

>> > You have a correct guess that in early
>> > days of the programming of shared memory multiprocessors there has
>> > been attempts to come up with language level solutions such as
>> > Critical Regions (CR), Conditional Critical Regions (CCR), and
>> > Monitors. Unfortunately, none of these language concepts are taken
>> > over properly in today's multi-threaded programming repertoire.
>>
>> Non-sense! How do you explain POSIX Threads?
>
> You call something "non-sense" what you do not know about. On the
> other hand, if you do not know POSIX threads either, I am not going to
> explain it to you here, just use Google. Google is your friend.
> Besides, POSIX threads are not at the language level and I was talking
> about language means. If you do not understand this, please refrain
> yourself calling it "non-sense".

I have to say non-sense again... your assertion that POSIX threads are not
at the language level is erroneous. POSIX Threads is most certainly on the
language level as it puts restrictions on what a C compiler can do. Please
read here:

http://groups.google.com/group/comp.programming.threads/msg/729f412608a8570d

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/63f6360d939612b3

POSIX Threads and C have a relationship that goes back many years.

>> > Probably the researchers of multicore programming should do nothing
>> > else but carefully study and properly adapt what has been already
>> > discovered in the programming of shared memory multiprocessors.
>>
>> Tell that to SUN research:
>>
>> http://research.sun.com/projects/dashboard.php?id=29
>>
>> They are into developing new lock/wait-free algorithms...
>
> Lock-free algorithms are some very low level programming techniques
> but they have no language consequences whatsoever. At least so far
> not.

What do you mean they have no language consequences? I suggest you read this
entire thread:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/29ea516c5581240e

> You can try and work on the language issues for the lock-free
> techniques, of course.

Standard C++ is going to support all the tools you need to create high-end
non-blocking algorithms. Atomic operations and memory barrier functionality
is going to be defined by the Standard, and become an integral part of the
C++ language.

Java has standardized support for lock/wait-free algorithms as well.

> Nevertheless, since lock-free techniques are
> even more complicated and therefore more error prone ones, they make
> multicore programming even harder.

This kind of sounds like somebody attacking C because it does not check
array boundaries or something. Anyway, I am happy to know that C++ will
allow programmers to create very high-performance multi-threaded
applications under a common Standard.

--

Eugene Miya

unread,

Mar 13, 2008, 5:53:41 PM3/13/08

to

In article <D5-dnY9pof5r5lLa...@comcast.com>,
Chris Thomasson <cri...@comcast.net> wrote:
>http://groups.google.com/group/comp.parallel/msg/7869ec858cfe7883
...
>http://groups.google.com/group/comp.arch/msg/198a048b98fabe45
...
>http://groups.google.com/group/comp.arch/browse_frm/thread/93b1619857...
...
>http://groups.google.com/group/comp.arch/msg/56b9fcc296e8564a
...

>> Critical Regions (CR), Conditional Critical Regions (CCR), and

...
>http://groups.google.com/group/comp.programming.threads/msg/ed695d0cb48fc551
>
>http://groups.google.com/group/comp.programming.threads/msg/7cfbb24f5e6d5a3f

All these archive links are dead.

>Tell that to SUN research:
>http://research.sun.com/projects/dashboard.php?id=29

This link remains alive.
--

Chris Thomasson

unread,

Mar 14, 2008, 2:59:29 PM3/14/08

to

"Eugene Miya" <eug...@cse.ucsc.edu> wrote in message
news:47d99455$1@darkstar...

> In article <D5-dnY9pof5r5lLa...@comcast.com>,
> Chris Thomasson <cri...@comcast.net> wrote:
>>http://groups.google.com/group/comp.parallel/msg/7869ec858cfe7883
> ...
>>http://groups.google.com/group/comp.arch/msg/198a048b98fabe45
> ...

>>http://groups.google.com/group/comp.arch/browse_frm/thread/93b1619857...
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The link above is the only one which shows up with [no subject] error.

> ...
>>http://groups.google.com/group/comp.arch/msg/56b9fcc296e8564a
> ...
>>> Critical Regions (CR), Conditional Critical Regions (CCR), and
> ...
>>http://groups.google.com/group/comp.programming.threads/msg/ed695d0cb48fc551
>>
>>http://groups.google.com/group/comp.programming.threads/msg/7cfbb24f5e6d5a3f
>
>
> All these archive links are dead.

They take to proper Google Groups pages when I click on them. Humm, weird...
On my end, all but one link you listed is dead; perhaps Google was doing
some maintenance or something...

[Moderator: Possible; even likely.]

Szabolcs Ferenczi

unread,

Mar 14, 2008, 3:59:20 PM3/14/08

to

On Feb 13, 11:43 pm, hzmo...@hotmail.com wrote:
> I read a number of people saying few knows how to program on a

> multicore computers. They did not elaborate and to me this has become
> rhetoric. We have had symmetric multiprocessors and pthreads for a
> long time.

First of all let me stress that multicore programming is hard.
It is hard because it is basically concurrent programming.
Concurrent programming requires a kind of thinking quite different to sequential
programming and it is hard to get rid of old habits. We can see
symptoms when people are trying to make concurrent programs with a
mind imprinted with sequential/deterministic programming habits. Those
attempts result in over controlling the interactions thus making the
system inefficient.

Let me note also that from the point of view of scaling, shared memory
multiprocessors (called multicores nowadays) are not so good as
distributed memory multiprocessors. From the programming point of
view, on the other hand, programming with shared memory is easier
since the data structures are somewhat closer to the ones in
sequential programs. Only access conflicts must be avoided, which is
not easy though.

From the programming point of view, the key feature is whether a
programming language has any statement suitable for handling non-
deterministic events. Practically, it means that the Guarded Commands
(GC) of Dijkstra must be adapted. It is irrespective of whether
communication happens via message exchange or via shared resources.
Dijkstra's GC has been adapted to message based language concepts
already since it is one of the basic elements in the Communicating
Sequential Processes language proposal (CSP) of Hoare. Brinch Hansen,
on the other hand, has shown in the Edison language how to adapt GC to
shared resource communication.

In the Edison language it is like this:

when B1 do SL1
else B2 do SL2
else Bn do SLn
end

When a process arrives at this statement, it will be delayed until one
of the conditions B1 .. Bn is true. When one is true, the
corresponding statement is performed, if more are true, one is
selected and the corresponding statement is performed. Note that the
`when' statement in Edison is a Conditional Critical Region with
multiple conditions. If you want a simple Critical Region you just
specify the condition `true': when true do S.

Multicore programming is especially hard because one observation from
Dijkstra is often ignored by the novice. It should always be kept in
mind by anyone making concurrent programs that in a uniprogramming
environment we have "once all false, always all false", which
justifies the use of the deterministic selection statements such as
`if' in sequential programs. In a multiprogramming environment,
inspecting the state of the shared variables or communication channels
should not result in the thread going on and "doing something else"
when "all false". Rather the thread must wait until one of the
conditions become true and then take the corresponding action.

> Can't we just use pthreads, or maybe some other things
> too, to program on a multicore computer? Or somehow multicores do
> present some new challenges that we did not have before, like on
> SMPs? Please enlighten me.

Working with library-level calls like the pthread library is almost as
error prone and is at a low level in concurrent programming as it is
programming in assembly language.

It is necessary that high level programming languages incorporate new
constructions such as the Guarded Commands and Conditional Critical
Regions so that programmers should be able to think at a higher level
of abstraction. Also compilers will have a chance to filter out most
of the programming errors that are made nowadays just because of the
low level of programming (e.g. pthread library calls).

The other low level feature in the pthread-like libraries is the
forking. In high level languages the main tool for process creation
should be the structured form of parallel statement. There are some
attempts towards it, but (of course) it is not called parallel
statement but rather something like `fork/join'.

Best Regards,
Szabolcs

--

Chris Thomasson

unread,

Mar 17, 2008, 1:48:20 PM3/17/08

to

"Chris Thomasson" <cri...@comcast.net> wrote in message

news:7q2dnbmwifeeMETa...@comcast.com...

>
> "Eugene Miya" <eug...@cse.ucsc.edu> wrote in message
> news:47d99455$1@darkstar...

[...]

>> ...
>>>http://groups.google.com/group/comp.arch/msg/56b9fcc296e8564a
>> ...
>>>> Critical Regions (CR), Conditional Critical Regions (CCR), and
>> ...
>>>http://groups.google.com/group/comp.programming.threads/msg/ed695d0cb48fc551
>>>
>>>http://groups.google.com/group/comp.programming.threads/msg/7cfbb24f5e6d5a3f
>>
>>
>> All these archive links are dead.
>
> They take to proper Google Groups pages when I click on them. Humm,
> weird...
> On my end, all but one link you listed is dead; perhaps Google was doing
> some maintenance or something...

Crap!! Well, I meant to say:

All but _one_ link takes me to proper Google Groups pages when I click on
them.
Humm, weird... On my end, _only__one__link_ you listed is dead; perhaps

Google was doing some maintenance or something...

> [Moderator: Possible; even likely.]

I think I cut a couple of characters off the dead one (e.g., no subject).

Sorry for the major typo!

Jesper Larsson Traff

unread,

Mar 19, 2008, 1:12:50 PM3/19/08

to

Just for information, some of the issues that have come
up in this thread will be discussed at the IPDPS 2008 Panel:

"How to avoid making the same Mistakes all over again:
What the parallel-processing Community has (failed) to offer
the multi/many-core Generation"

Panelists

Hideharu Amano, Keio University
Anwar Ghuloum, Intel
John Gustavson, Clearspeed
Keshav Pingali, University of Austin, Texas
Vivek Sarkar, Rice University, Texas
Uzi Vishkin, University of Maryland
Kathy Yelick, University of Berkeley, California

best regards

Jesper Larsson Traff, IPDPS 2008 Panel moderator
NEC Laboratories Europe, St. Augustin, Germany

hzm...@hotmail.com wrote:

>
> I read a number of people saying few knows how to program on a
> multicore computers. They did not elaborate and to me this has become
> rhetoric. We have had symmetric multiprocessors and pthreads for a

> long time. Can't we just use pthreads, or maybe some other things

> too, to program on a multicore computer? Or somehow multicores do
> present some new challenges that we did not have before, like on
> SMPs? Please enlighten me.

--

Szabolcs Ferenczi

unread,

Apr 19, 2008, 4:20:50 PM4/19/08

to

On Mar 19, 7:12 pm, Jesper Larsson Traff <tr...@ccrl-nece.de> wrote:
> Just for information, some of the issues that have come
> up in this thread will be discussed at the IPDPS 2008 Panel:
>
> "How to avoid making the same Mistakes all over again:
> What the parallel-processing Community has (failed) to offer
> the multi/many-core Generation"
>
> Panelists
>
> Hideharu Amano, Keio University
> Anwar Ghuloum, Intel
> John Gustavson, Clearspeed
> Keshav Pingali, University of Austin, Texas
> Vivek Sarkar, Rice University, Texas
> Uzi Vishkin, University of Maryland
> Kathy Yelick, University of Berkeley, California
>
> best regards
>
> Jesper Larsson Traff, IPDPS 2008 Panel moderator
> NEC Laboratories Europe, St. Augustin, Germany

Can you summarise the discussion for us, please?

Best Regards,
Szabolcs

--