
Multithreaded programming: is the C++ standardization committee listening?


Andrei Alexandrescu (See Website for Email)

Aug 18, 2004, 8:05:56 AM
Years ago, Java decided to add support for multithreading to the core
language. Many MT experts said that the support was not very good. Standard
C++, as we all know, chose (and as far as I understand, is undecided as of
the next standardization iteration) to stay away from MT altogether.

Since then, a predictable sequence of events has happened.

The Java community got better at what they were doing, and the C++ community
fragmented deeper and deeper into standard-conforming programmers who don't
use MT, multi-standard programmers who use, say, pthreads and its C binding
(and tread carefully as far as mixing C++ with pthreads goes), and hard-core
MT programmers who rely on the nonportable documentation provided by their
own platform and compiler.

Once it became evident that threads must be part of the language, the
pursuit of better language-level abstractions for multithreading naturally
continued within the Java research and industrial communities. Also,
hard-core C and C++ MT programmers became a more and more isolated clique
(occasionally perceived as snooty by the standard-compliant programmers who
don't understand the complexities that are at stake).

As a result, much of the research on the recent lock-free data structures
(which surpass the now-familiar lock-based structures in performance) has
been done in Java or uses C/C++ only as "pseudocode", the real C/C++ code
being not portable.

Java even has lock-free structures in their current distribution
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html).
If this trend continues, Java will soon be the language of choice for
portable multithreaded programming, and standard C++ will be left out in the
cold, clenching its teeth.

Does the C++ standardization committee plan to address threads at all, or
continue to ignore them?


Andrei

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Francis Glassborow

Aug 18, 2004, 2:34:45 PM
In article <2ofbi8F...@uni-berlin.de>, "Andrei Alexandrescu (See
Website for Email)" <SeeWebsit...@moderncppdesign.com> writes

>Does the C++ standardization committee plan to address threads at all, or
>continue to ignore them?

I would have thought that someone with your level of expertise would be
exactly the kind of person WG21 could expect to propose such a feature,
and then pursue the issue to its completion. What about a preliminary
paper from you (which should include consideration of current library
based proposals) for the next meeting. That meeting is just down the
road for you and so it should not be too difficult for you to attend and
present the paper.


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

David

Aug 18, 2004, 2:46:55 PM

Andrei,

While I understand your position, it would be unreasonable to add
a thread concept to either the C or C++ standards. The primary
reason for this would be that the compiler would need to know how
the target computer implements threads and if it did not, for it to
provide some underlying code to pretend that threading did exist.

While we can compare C/C++ vs. Java as languages, Java also has
certain other concepts that C/C++ does not. Java has the sandbox,
files, threads/processes, and other concepts more associated with
a definition of an operating system.

While it may be useful to have C/C++ provide these extensions,
their use/capability/reliability might be unpredictable on certain
systems. C/C++ was not intended to provide the notion of even a
single thread -- it defines a sequence of operations.

Much of my work is on systems that are at the embedded or
kernel level. Perhaps 40% of my time is spent in the application
world. Yet most of my code is C/C++. Assembler, Forth, Java,
Ada, and a few others are used perhaps once a year. Adding
a thread concept to C/C++ would not be beneficial at my level
of coding since it would require knowledge of the system.

David

Andrei Alexandrescu (See Website for Email)

Aug 19, 2004, 8:01:01 AM
"David" <FlyLike...@United.Com> wrote in message
news:rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost...

> While I understand your position, it would be unreasonable to add
> a thread concept to either the C or C++ standards. The primary
> reason for this would be that the compiler would need to know how
> the target computer implements threads and if it did not, for it to
> provide some underlying code to pretend that threading did exist.

Not necessarily. That's actually a fallacy that I'd like to dispel right
here and now.

When writing

std::cout << "Hello, world! World? World? Where are you?\n";

a program might not display anything anywhere. There are systems without a
console, and there are systems (such as Windows) in which you could display
a console if you wanted, but only by using nonstandard calls. On those
systems, the code above compiles and links but doesn't do zilch.

And nobody blinks an eye.

Do systems lacking a console simulate it? No. Why, then, say that adding
thread support means the language would need to simulate threads on machines
that don't have them? On those machines, calls to thread creation functions
would fail, and the synchronization-related keywords and classes would do
nothing.

And nobody will blink an eye.
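
As an illustration (the 'thread' class and its interface below are invented
for the example; no such class exists in the standard), such a platform
could ship something like this:

    #include <stdexcept>

    // Invented interface: how thread creation might behave on a platform
    // with no threading support. It compiles and links, but always fails
    // at run time, just as console output silently goes nowhere.
    class thread {
    public:
        explicit thread(void (*)()) {
            throw std::runtime_error("no resources to start another thread");
        }
        void join() {}
    };

    void worker() { /* work that would run concurrently if it could */ }

    int main() {
        try {
            thread t(&worker);
            t.join();
        } catch (const std::runtime_error&) {
            worker();   // graceful fallback: run the work inline
        }
    }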

> While we can compare C/C++ vs. Java as languages, Java also has
> certain other concepts that C/C++ does not. Java has the sandbox,
> files, threads/processes, and other concepts more associated with
> a definition of an operating system.

C++ does have files, doesn't it? "Java already has things C++ doesn't" is
not a serious argument anyway. If Java has good things, then why not look
into doing those good things, and preferably doing them better?

> While it may be useful to have C/C++ provide these extensions,
> their use/capability/reliability might be unpredictable on certain
> systems. C/C++ was not intended to provide the notion of even a
> single thread -- it defines a sequence of operations.

I know that. I am talking about the upcoming standardization step.

> Much of my work is on systems that are at the embedded or
> kernel level. Perhaps 40% of my time is spent in the application
> world.

Programmer time or application time? :o)

> Yet most of my code is C/C++. Assembler, Forth, Java,
> Ada, and a few others are used perhaps once a year. Adding
> a thread concept to C/C++ would not be beneficial at my level
> of coding since it would require knowledge of the system.

I understand that. But providing performant portable threading would empower
a lot of programmers who spend 90% of their programming time writing
portable applications.


Andrei

Hyman Rosen

Aug 19, 2004, 8:08:03 AM
David wrote:
> The primary reason for this would be that the compiler would need
> to know how the target computer implements threads and if it did
> not, for it to provide some underlying code to pretend that threading
> did exist.

First of all, how is "pretend threading" different from "real threading"?
Second, Ada provides support for concurrency defined portably within the
language without any trouble at all. So does Java, for that matter, albeit
less well.

> Much of my work is on systems that are at the embedded or kernel level.

Then you should certainly be aware of Ada and its tasking mechanisms.

Adam Zell

Aug 19, 2004, 8:12:48 AM
"Andrei Alexandrescu \(See Website for Email\)" <SeeWebsit...@moderncppdesign.com> wrote:
: Java even has lock-free structures in their current distribution

: (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html).
: If this trend continues, Java will soon be the language of choice for
: portable multithreaded programming, and standard C++ will be left out in
: the cold, clenching its teeth.
:
Many lock-free algorithms depend on garbage collection, which puts the discussion
back into the great GC debate. Other methods for GC-like management (such as Michael's
SMR) are complex, or untested in production.

http://www.research.ibm.com/people/m/michael/podc-2002.pdf
http://www.cs.tau.ac.il/~shanir/reading%20group/p73-michael.pdf
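
For readers who haven't seen SMR, the core of its read protocol is small,
even though getting it fully right is not. A simplified sketch, with
volatile standing in for the atomic accesses and memory barriers a real
implementation needs:

    struct Node { int value; Node* next; };

    Node* volatile list_head;     // shared; writers update it with CAS
    Node* volatile hazard_slot;   // this thread's published hazard pointer

    int read_head_value() {
        Node* p;
        do {
            p = list_head;          // 1. read the shared pointer
            hazard_slot = p;        // 2. publish it as "in use"
        } while (p != list_head);   // 3. re-validate; retry if it changed
        // A thread wanting to free a node first scans all hazard slots;
        // since p is published there, it cannot be freed under us.
        int v = p ? p->value : -1;
        hazard_slot = 0;            // 4. clear the hazard
        return v;
    }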

Antonio Maiorano

Aug 19, 2004, 8:28:05 PM
"David" <FlyLike...@United.Com> wrote in message
news:rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost...

<snip>


> Much of my work is on systems that are at the embedded or
> kernel level. Perhaps 40% of my time is spent in the application
> world. Yet most of my code is C/C++. Assembler, Forth, Java,
> Ada, and a few others are used perhaps once a year. Adding
> a thread concept to C/C++ would not be beneficial at my level
> of coding since it would require knowledge of the system.
>
> David

Hi David,

I'm not sure I understand your reasoning. You say that it would not be
beneficial to have threading concepts in C++ because it would require you to
have knowledge of the system, but that is exactly the problem we have now,
because there is NO threading concept in C++. Today, to code a multithreaded
program in C/C++, you must read and understand the platform-specific APIs.
Not only is this time-consuming, but the subtle differences between
platforms are usually learned the hard way.

Perhaps you meant to say that programming multithreaded code is different on
all systems, and should not be part of the language. This has often been the
argument against it, but I must disagree with this wholeheartedly. Wherever
I've worked, we've always had some abstraction of threads and thread locking
constructs in some library that is cross-platform. Indeed, Boost.Thread
exists for this reason. The fact is, if C++ finally included some
multithreading constructs as part of the standard, then vendors would
finally be forced to implement them on the different platforms. If the
lowest common denominator provided by the language did not give you enough
control, you could always turn to the platform-specific library, but that
would be up to the programmer.

--
Antonio Maiorano
Software Engineer, Electronic Arts
E-Mail: m a i o r a n o AT v i d e o t r o n DOT c a

Joe Seigh

Aug 19, 2004, 8:30:14 PM

David wrote:
>

> While I understand your position, it would be unreasonable to add
> a thread concept to either the C or C++ standards. The primary
> reason for this would be that the compiler would need to know how
> the target computer implements threads and if it did not, for it to
> provide some underlying code to pretend that threading did exist.
>

It's not necessary for the compiler to implement threading. That
can be done in a library. It would be nice for the compiler to
supply the primitives necessary to implement threading correctly,
things like optimization directives, memory alignment, how to avoid
word tearing, etc...

The abstract concept of threading is well understood; you don't really need
to define that. You also don't need to define a memory model: just assume
totally ordered memory and let the thread API implementors use the
optimization directives and platform-dependent memory barriers to deal with
that.

Joe Seigh

Nicola Musatti

Aug 19, 2004, 8:50:38 PM
"David" <FlyLike...@United.Com> wrote in message news:<rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost>...
[...]

> While I understand your position, it would be unreasonable to add
> a thread concept to either the C or C++ standards. The primary
> reason for this would be that the compiler would need to know how
> the target computer implements threads and if it did not, for it to
> provide some underlying code to pretend that threading did exist.
>
> While we can compare C/C++ vs. Java as languages, Java also has
> certain other concepts that C/C++ does not. Java has the sandbox,
> files, threads/processes, and other concepts more associated with
> a definition of an operating system.

I agree with you. In a way Andrei is addressing the wrong body; he should
probably lobby Microsoft and the Open Group (and possibly others for less
common platforms) to get together and develop a portable, modern C++ API to
the underlying OS services.

Even if the two were to develop independent, incompatible APIs, it would be
an improvement over the current situation.

Cheers,
Nicola Musatti

Tokyo Tomy

Aug 19, 2004, 8:59:15 PM
"Andrei Alexandrescu \(See Website for Email\)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<2ofbi8F...@uni-berlin.de>...

> Years ago, Java decided to add support for multithreading to the core
> language. Many MT experts said that the support was not very good. Standard
> C++, as we all know, chose (and as far as I understand, is undecided as of
> the next standardization iteration) to stay away from MT altogether.
>
> Since then, a predictable sequence of events has happened.
>
> ...

I expect threads in the C++ standard. We need a common baseline for MT
programming, one that is useful in many MT applications, even if not in all
of them.

I like boost::thread because it is simple for beginners like me, but it is
apparently not so versatile for MT experts. For example, boost::thread
cannot control thread priority.

Many portable thread libraries available today show limitations that come
from their very portability: such a library must restrict itself to the GCM
(Greatest Common Measure) of the functionality provided by the platforms.
Windows provides named mutexes, which can lock across process boundaries,
but other platforms do not provide such a mutex, or they offer different
mechanisms for the purpose. Therefore, it is impossible to build a portable,
versatile MT library that satisfies MT experts. But please give us a
baseline for MT programming.
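
For example, here is a Win32-only capability, sketched with the standard
Win32 calls (the mutex name itself is made up):

    #include <windows.h>

    // Win32-specific: a named mutex is visible across process boundaries,
    // something a lowest-common-denominator portable library cannot offer.
    void do_exclusive_work() {
        HANDLE m = CreateMutex(NULL, FALSE, TEXT("MyApp.SharedLock"));
        if (m == NULL)
            return;
        WaitForSingleObject(m, INFINITE);  // may block on ANOTHER process
        // ... critical section shared between processes ...
        ReleaseMutex(m);
        CloseHandle(m);
    }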

Look at File system:
We have at least four methods to handle files on Windows.
1. C style
2. C++ style
3. Win32 style
4. MFC style
I usually use the C++ style, but I have the freedom to use the Win32 style
when I need more control over file security.

I usually use std::string, but sometimes I use CString for Windows
programming.

C++ programmers have this kind of freedom, but I don't think Java
programmers do, because their programs must work on the JVM, which is what
gives them portability.
This may be true:


> Years ago, Java decided to add support for multithreading to the core
> language. Many MT experts said that the support was not very good.

But I think that will not be true for C++. Maybe C++ can do it without
changing the core language.

> As a result, much of the research on the recent lock-free data structures
> (which surpass the now-familiar lock-based structures in performance) has
> been done in Java or uses C/C++ only as "pseudocode", the real C/C++ code
> being not portable.

Lock-free data structures are charming. I hope the idea has nothing to do
with GC; if it does, C++ must at least provide a conventional locking
mechanism. In any case, C++ needs a baseline for MT programming.

Craig

Aug 19, 2004, 9:03:39 PM
David wrote:
<snip>

> Andrei,
>
> While I understand your position, it would be unreasonable to add
> a thread concept to either the C or C++ standards. The primary
> reason for this would be that the compiler would need to know how
> the target computer implements threads and if it did not, for it to
> provide some underlying code to pretend that threading did exist.
>
> While we can compare C/C++ vs. Java as languages, Java also has
> certain other concepts that C/C++ does not. Java has the sandbox,
> files, threads/processes, and other concepts more associated with
> a definition of an operating system.
>
> While it may be useful to have C/C++ provide these extensions,
> their use/capability/reliability might be unpredictable on certain
> systems. C/C++ was not intended to provide the notion of even a
> single thread -- it defines a sequence of operations.
>
> Much of my work is on systems that are at the embedded or
> kernel level. Perhaps 40% of my time is spent in the apllication
> world. Yet most of my code is C/C++. Assembler, Forth, Java,
> Ada, and a few others are used perhaps once a year. Adding
> a thread concept to C/C++ would not be beneficial at my level
> of coding since it would require knowledge of the system.
>
> David

Perhaps what Andrei should be proposing is a threading standard library.
We have the STL which is present on most platforms and gives us very
useful standard containers that behave identically on each platform. I
see a standard threading model as being the best way forward. If an
existing threading model were adopted, it could be used exactly as the STL
is, and to the same effect. I do hope the docs are better, though! <grin>
Which thread model, though? POSIX?
Dave's comments are very important, and a standard thread model is the only
way I see for it to be integrated into the C++ language. BTW, comparing
Java, which runs in a virtual machine, to C++ with regard to threads etc. is
a little unfair. I find C++ increasingly frustrating to
use when Java is so much more productive in some types of programming.

Craig

Robert Kindred

Aug 19, 2004, 9:19:19 PM

"David" <FlyLike...@United.Com> wrote in message
news:rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost...
> On Wed, 18 Aug 2004 12:05:56 UTC, "Andrei Alexandrescu \(See Website for
> Email\)" <SeeWebsit...@moderncppdesign.com> wrote:
>

[]


> > Does the C++ standardization committee plan to address threads at all, or
> > continue to ignore them?
> >
> >
> > Andrei
>
> Andrei,
>
> While I understand your position, it would be unreasonable to add
> a thread concept to either the C or C++ standards. The primary
> reason for this would be that the compiler would need to know how
> the target computer implements threads and if it did not, for it to
> provide some underlying code to pretend that threading did exist.
>

Actually, compilers don't know many other things, such as how to open a
file. This is left to the run-time library.

Also, the original threads library did not use the operating system. I
don't know of an official name for fork/join/quit, but it was a convenient
way for the programmer to express parallelism, though there was no support
from the os. I have seen this used in DOS. Since each pseudo-thread
implemented run-to-completion, it was also more efficient than "real"
threading. If I wrote a C++ program that used a standardized threading
system, I would be satisfied that it ran unchanged in DOS. It is not
possible to over-emphasize the importance of standards.
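
A minimal sketch of such a run-to-completion scheme (all names invented; the
point is that no OS support is involved):

    #include <queue>

    typedef void (*Task)();

    std::queue<Task> ready;

    void fork_task(Task t) { ready.push(t); }  // express parallelism

    void join_all() {                          // run until every task is done
        while (!ready.empty()) {
            Task t = ready.front();
            ready.pop();
            t();   // runs to completion; never preempted
        }
    }

    void step_b() { /* second half of the work */ }
    void step_a() { fork_task(&step_b); /* first half of the work */ }

    int main() {
        fork_task(&step_a);
        join_all();
        return 0;
    }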

[]

Craig Henderson

Aug 19, 2004, 9:20:10 PM

"Andrei Alexandrescu (See Website for Email)"
<SeeWebsit...@moderncppdesign.com> wrote in message
news:2ohoobF...@uni-berlin.de...

> When writing
>
> std::cout << "Hello, world! World? World? Where are you?\n";
>
> a program might not display anything anywhere. There are systems without a
> console, and there are systems (such as Windows) in which you could display
> a console if you wanted, but only by using nonstandard calls. On those
> systems, the code above compiles and links but doesn't do zilch.
>
> And nobody blinks an eye.
>
> Do systems lacking a console simulate it? No. Why, then, say that adding
> thread support means the language would need to simulate threads on
> machines that don't have them? On those machines, calls to thread creation
> functions would fail, and the synchronization-related keywords and classes
> would do nothing.
>
> And nobody will blink an eye.
>

Fail? Isn't that inconsistent? In a Windows environment, for example, does
this throw an exception?

std::cout << "Hello, world! World? World? Where are you?\n";

if (std::cout.fail())
    throw std::runtime_error("No console");

No, it fails silently.

Craig

Balog Pal

Aug 19, 2004, 9:21:44 PM

"Francis Glassborow" <fra...@robinton.demon.co.uk> wrote in message
news:pg$VmMDxW...@robinton.demon.co.uk...

> In article <2ofbi8F...@uni-berlin.de>, "Andrei Alexandrescu (See
> Website for Email)" <SeeWebsit...@moderncppdesign.com> writes
> >Does the C++ standardization committee plan to address threads at all, or
> >continue to ignore them?
>
> I would have thought that someone with your level of expertise would be
> exactly the kind of person WG21 could expect to propose such a feature,
> and then pursue the issue to its completion. What about a preliminary
> paper from you (which should include consideration of current library
> based proposals) for the next meeting. That meeting is just down the
> road for you and so it should not be too difficult for you to attend and
> present the paper.

Yess. YESS! :)

Even a mere outline or summary of what we have now could be a start. And a
magnet for contributions.

Paul

Balog Pal

Aug 19, 2004, 9:22:26 PM
"David" <FlyLike...@United.Com> wrote in message
news:rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost...

> While I understand your position, it would be unreasonable to add


> a thread concept to either the C or C++ standards. The primary
> reason for this would be that the compiler would need to know how
> the target computer implements threads and if it did not, for it to
> provide some underlying code to pretend that threading did exist.

Would you explain that in detail?

Suppose C++ formally defined threads, memory model, rules of interthread
communications. And put in some essential primitives like mutex and
thread_start.

Now, how does that apply to a compiler that already has thread support? It
just has to map the few primitives to OS calls (just as it implements the C
library or anything else) [and certainly revise the library for std-mandated
thread safety where necessary].

Where no thread support exists, all the primitives do nothing and the call
to thread_start always fails with 'no resources to start yet another
thread', just as it could fail on a system with thread support when asked
for too much.

Having threads standardized would relieve us from poring over tons of other
documents or relying on heuristics. We can and do use threads even now with
C++ on many platforms; it is just cumbersome and puts us in a fragile
situation.

> While it may be useful to have C/C++ provide these extensions,
> their use/capability/reliability might be unpredictable on certain
> systems.

You mean systems with, or without, OS threading support?

> C/C++ was not intended to provide the notion of even a
> single thread -- it defines a sequence of operations.

That's generally how a thread is defined, IMHO. ;-) So we already have the
defined behavior of a single thread. The TODO is defining interaction with
another.

Paul

Brian Neal

Aug 19, 2004, 9:23:25 PM
I am a real-time embedded systems programmer, and I work almost
entirely in C++. I also do not feel the strong need to add threading
to C++. We have our own set of classes for threads, mutexes,
semaphores, etc, and we have multiple versions of these classes for
different OS's. I have never felt the need for a special thread safe
container sanctioned by the library. If you want a thread safe queue
so that threads can communicate with each other, we just wrap access
to an STL queue (or deque, whatever) up in a templated class using one of
our mutex objects. Done and done. Would it be nice to have all this
standardized? Maybe, but requirements for real-time embedded systems vary
wildly, and I fear that a general solution would not satisfy everyone.
Besides, it's just too easy to make your own given
C++'s existing expressive power and multiple paradigm features.
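
Something along these lines, where Mutex and ScopedLock are stand-ins for
in-house, per-OS classes (or, say, boost::mutex and its scoped lock):

    #include <queue>

    class Mutex {   // would wrap pthread_mutex_t, CRITICAL_SECTION, ...
    public:
        void lock()   { /* platform-specific */ }
        void unlock() { /* platform-specific */ }
    };

    class ScopedLock {  // RAII: releases even if an exception is thrown
        Mutex& m_;
    public:
        explicit ScopedLock(Mutex& m) : m_(m) { m_.lock(); }
        ~ScopedLock() { m_.unlock(); }
    };

    template <typename T>
    class GuardedQueue {
        std::queue<T> q_;
        Mutex mx_;
    public:
        void push(const T& v) {
            ScopedLock guard(mx_);
            q_.push(v);
        }
        bool try_pop(T& out) {   // returns false if the queue was empty
            ScopedLock guard(mx_);
            if (q_.empty()) return false;
            out = q_.front();
            q_.pop();
            return true;
        }
    };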

For what I do everyday, I find nothing lacking in C++, and cringe at
the thought of the everything and the kitchen sink approach of Java.

BN

(If you need to contact me, don't use the return address on the post,
use brian at surfguitar101.com)

David Abrahams

Aug 19, 2004, 9:28:52 PM
"David" <FlyLike...@United.Com> writes:

> While it may be useful to have C/C++ provide these extensions,
> their use/capability/reliability might be unpredictable on certain
> systems.

Sounds like FUD to me.

> C/C++ was not intended to provide the notion of even a single thread

From what I've heard him say about this, I think Bjarne Stroustrup
would disagree with you.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Andrei Alexandrescu (See Website for Email)

Aug 19, 2004, 9:35:57 PM
"Adam Zell" <ze...@zell.best.vwh.net> wrote in message
news:mpWUc.1393$NC6...@newsread1.mlpsca01.us.to.verio.net...

> "Andrei Alexandrescu \(See Website for Email\)"
> <SeeWebsit...@moderncppdesign.com> wrote:
> : Java even has lock-free structures in their current distribution
> :
> (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html).
> : If this trend continues, Java will soon be the language of choice for
> : portable multithreaded programming, and standard C++ will be left out in
> : the cold, clenching its teeth.
> :
> Many lock-free algorithms depend on garbage collection, which puts the
> discussion
> back into the great GC debate.

Amendment: Many *old* lock-free algorithms depend on garbage collection. (I
hope this argument will be the reinforced-concrete brick wall that GC
counterarguments can safely run into.)

For the past 2.5 years the field has been burgeoning with algorithms that
don't rely on GC, and this year Maged published the first general-purpose
lock-free memory allocator in the world
(http://www.research.ibm.com/people/m/michael/pldi-2004.pdf).

> Other methods for GC-like management (such as Michael's
> SMR) are complex, or untested in production.
>
> http://www.research.ibm.com/people/m/michael/podc-2002.pdf

A more polished version is at
http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf.

I think it has become clear that lock-free is the up-and-coming way to do
things. "The algorithms are complex" is not an argument for the standard
continuing to thwart an important area and letting other languages lead the
way. And by the way, Hoard and ptmalloc are an order of magnitude more
complex than Maged's. The C++ standardization committee also has quite some
track record of standardizing things that were untested in production :o).
Unfortunately, those things were not proven theoretically or
semi-theoretically either, and this is embarrassingly visible today.

On the other hand, lock-free structures have solid theoretical underpinnings
(see Herlihy's extraordinary paper "A methodology for implementing highly
concurrent data objects"
http://www.cs.brown.edu/people/mph/Herlihy93/herlihy93methodology.pdf) and
now, predictably, the field is swarming with better and better
implementations of said structures.

Those waters are inaccessible to C++. Do we want to continue that way?


Andrei

Walt Karas

Aug 19, 2004, 9:37:48 PM
"David" <FlyLike...@United.Com> wrote in message news:<rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost>...


If we take this line of reasoning to an extreme, we'd have to take
the STL and much of the I/O library out of the standard library,
because many small embedded systems don't have a heap or a file
system.

I think a better reason not to add threads to the C++ standard lib.
is that it probably makes more sense for the POSIX committee to add
an alternative C++ interface to their threads library.

Jim Melton

Aug 19, 2004, 9:45:25 PM
"Hyman Rosen" <hyr...@mail.com> wrote in message
news:10928615...@master.nyc.kbcfp.com...

> David wrote:
> > The primary reason for this would be that the compiler would need
> > to know how the target computer implements threads and if it did
> > not, for it to provide some underlying code to pretend that threading
> > did exist.
>
> First of all, how is "pretend threading" different from "real threading"?
> Second, Ada provides support for concurrency defined portably within the
> language without any trouble at all. So does Java, for that matter, albeit
> less well.
>
> > Much of my work is on systems that are at the embedded or kernel level.
>
> Then you should certainly be aware of Ada and its tasking mechanisms.


Isn't Ada conceptually a cross between C++ and Java, in that Java only works
in the context of the Java Virtual Machine (single, standardized run-time
environment), C++ has absolutely no run-time environment requirements, and
Ada has a required run-time environment (akin to, but not quite a virtual
machine)?

In C++, threading/tasking (when provided either through a C binding or an
unportable extension) is directly mapped to the underlying kernel thread
capabilities (hence, the lack of portability). Don't both Ada and Java
define a threading/tasking model that must be mapped (with varying degrees
of success) to the underlying kernel mechanisms? I do remember some
"impedance mismatch" issues in Ada tasking, but that was many years ago, and
I haven't stayed current.

I'm not opposed to language support for MT, but it seems that the first
thing required would be to derive a platform-independent construct that is a
superset of all possible threading/tasking/scheduling models. For example,
would the proposed standard support thread priorities? What then, would a
platform that doesn't support preemptive multi-tasking do to comply with the
standard?

It seems that the problem may be too much for a *language* standards body to
tackle. Remember that Java standardized the run-time environment, which is
the only way they could standardize a threading model.
--
<disclaimer>
Opinions posted are those of the author.
My company doesn't pay me enough to speak for them.
</disclaimer>
--
Jim Melton
Software Architect, Fusion Programs
Lockheed Martin IS&S
(303) 971-3846

Jim Melton

Aug 19, 2004, 9:46:09 PM
"Andrei Alexandrescu (See Website for Email)"
<SeeWebsit...@moderncppdesign.com> wrote in message
news:2ohoobF...@uni-berlin.de...
> "David" <FlyLike...@United.Com> wrote in message
> news:rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost...
> > While I understand your position, it would be unreasonable to add
> > a thread concept to either the C or C++ standards. The primary
> > reason for this would be that the compiler would need to know how
> > the target computer implements threads and if it did not, for it to
> > provide some underlying code to pretend that threading did exist.
>
> Not necessarily. That's actually a fallacy that I'd like to dispel right
> here and now.

My first reaction to David's point was, "so what? the compiler already knows
about the underlying system". But in the case of cross-compilers and
embedded programming, the compiler may very well not have any (or
sufficient) knowledge of the underlying system.

> When writing
>
> std::cout << "Hello, world! World? World? Where are you?\n";
>
> a program might not display anything anywhere. There are systems without a
> console, and there are systems (such as Windows) in which you could display
> a console if you wanted, but only by using nonstandard calls. On those
> systems, the code above compiles and links but doesn't do zilch.

There may (or may not) be a difference. std::cout << "Hello world" is
compiled into function calls. Function calls are resolved at link (or run)
time. An environment that doesn't have a console needs to have a library
implementation that silently swallows the call. Now, it is true that the
standard does mandate the standard library (and maybe that is where you were
heading), but there is a huge difference between a compiler (language)
implementation and a library implementation.

--
<disclaimer>
Opinions posted are those of the author.
My company doesn't pay me enough to speak for them.
</disclaimer>
--
Jim Melton
Software Architect, Fusion Programs
Lockheed Martin IS&S
(303) 971-3846

Andrei Alexandrescu (See Website for Email)

Aug 20, 2004, 6:05:24 AM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:4123DBE6...@xemaps.com...

> It's not necessary for the compiler to implement threading. That
> can be done in a library.

I am sorry, that is not true. Any threading library must cooperate with the
compiler so as to enforce correct code generation.

> It would be nice for the compiler to
> supply the primitives necessary to implement threading correctly,
> things like optimization directives, memory alignment, how to avoid
> word tearing, etc...
>
> The abstract concept of threading is well understood, you don't really
> need to define that. You also don't need to define a memory model, just
> assume totally ordered memory and let the thread api implementors use
> the optimization directives and platform dependent memory barriers to
> deal with that.

That's entirely unworkable. Compilers don't know statically which threads
execute which code, so they'll need to assume the worst for any code they
compile. It would disable pretty much all meaningful optimizations. That's
like instant 3x speed reduction. Then, the hardware reorders instructions
too. That would force compilers to insert cache flushing and/or membar
instructions at every C++ sequence point. That's like another 50x speed
reduction, if not more. It's just totally unworkable.

A memory model is essential to MT programs. An MT program wants to benefit
from optimizations and reorderings most of the time, except at a few key points
where synchronization is needed. Those few key points will be conveyed by
the programmer to the compiler. As of today there is /no way/ for the
programmer to specify those key points.
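
Concretely, the classic publication race illustrates such a key point. In
this sketch nothing tells the compiler or the CPU that the two stores in
producer() must not be reordered, and standard C++ today gives no way to say
so:

    int  data  = 0;
    bool ready = false;

    void producer() {
        data  = 42;
        ready = true;   // compiler/hardware may reorder these two stores
    }

    void consumer() {
        while (!ready) {}   // may even be hoisted to 'if (!ready) for(;;);'
        // 'data' may still read 0 here on a relaxed-memory machine
        int local = data;
        (void)local;
    }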

I agree that sticking with a sequentially consistent model would be the easiest
way out. It's also the most naive.


Andrei

Andrei Alexandrescu (See Website for Email)

Aug 20, 2004, 6:05:46 AM
"Nicola Musatti" <Nicola....@ObjectWay.it> wrote in message
news:a327cf48.0408...@posting.google.com...

> I agree with you. In a way Andrei is addressing the wrong body, when
> probably he should lobby Microsoft and the Open Group (and possibly
> others for less common platforms) to get together and develop a
> portable, modern C++ API to the underlying OS services.

I'm trying to address the body that is most interested in the future of C++
and in the ability of the C++ community to write good programs.

Andrei

Hyman Rosen

Aug 20, 2004, 6:14:19 AM
Jim Melton wrote:
> My first reaction to David's point was, "so what? the compiler already knows
> about the underlying system".

What the compiler knows and what the programmer knows must be in sync,
and therefore must be specified. That's why it has to be a part of the
standard and of the language, so that everyone knows what is expected
and what is required.

Hyman Rosen

Aug 20, 2004, 6:14:48 AM
Jim Melton wrote:
> Ada has a required run-time environment

No. Ada has a tasking model that's defined within the language, and it can
be supported in a wide variety of ways on different underlying systems.

> derive a platform-independent construct that is a superset of all possible
> threading/tasking/scheduling models.

Absolutely not. The language support for concurrency should be good enough
so that a large variety of useful things and techniques can be programmed,
but there is no necessity to try and support every possible thing.

> For example, would the proposed standard support thread priorities?

Ada does.

> What then, would a platform that doesn't support preemptive multi-tasking
> do to comply with the standard?

You don't need preemptive multitasking to support priorities. You just need
a system where a task begins running in response to an external event, like
interrupt or signal handlers. The only way a suspended higher priority task
can come to life is through an external event. But you don't need time-slicing.

> It seems that the problem may be too much for a *language* standards body to
> tackle.

It worked very well for Ada.

Andrei Alexandrescu (See Website for Email)

Aug 20, 2004, 9:47:19 AM
"Craig" <benbowc...@attglobal.net> wrote in message
news:aX%Uc.4123$zS6.4...@news02.tsnz.net...

> Perhaps what Andrei should be proposing is a threading standard library.
> We have the STL which is present on most platforms and gives us very
> useful standard containers that behave identically on each platform. I
> see a standard threading model as being the best way forward. If an
> existing threading model was adopted it could be used exactly like the
> STL is and to the same effect. I do hope the docs are better
> though!<grin>
> Which thread model though? Posix?
> Daves comments are very important and a Standard Thread model is the
> only way I see for it to be integrated into the C++ language.

Yes; the difference is that, while STL can be written in portable C++,
threads cannot.

> BTW
> comparing Java which runs in a virtual machine to C++ with regard to
> threads etc. is a little unfair. I find C++ increasingly frustrating to
> use when Java is so much more productive in some types of programming.

Me too. Let's not forget, however, that each language has an abstract
machine it runs on. So I don't think there's anything that fundamentally
prevents C++ from doing threads. And by the way, Java VMs use OS-native
threads more often than not.


Andrei

Andrei Alexandrescu (See Website for Email)

Aug 20, 2004, 9:48:09 AM
"Brian Neal" <SurfCed...@netscape.net> wrote in message
news:26d6ee84.04081...@posting.google.com...

>I am a real-time embedded systems programmer, and I work almost
> entirely in C++. I also do not feel the strong need to add threading
> to C++. We have our own set of classes for threads, mutexes,
> semaphores, etc, and we have multiple versions of these classes for
> different OS's.

This viewpoint is basically: we've implemented them ourselves for each OS we
need, and as such we don't feel a particular need for them in the Standard.
I think this can just as well be seen as an argument in favor of
standardization, so that others would not need to duplicate your work, nor
would you need to invest more work when porting your synchronization
primitives to other platforms.

> I have never felt the need for a special thread safe
> container sanctioned by the library. If you want a thread safe queue
> so that threads can communicate with each other, we just wrap access
> to a stl queue (or dequeue, whatever) up in a templated class using
> one of our mutex objects. Done and done.

That's fine for your particular performance needs. What if you or others
need to scale up your queue and it chokes in the face of contention, while
your competitor has a queue that's 50x faster?

> Would it be nice to have all
> this standardized? Maybe, but requirements for real-time embedded
> systems vary wildly, and I fear that a general solution would not
> satisfy everyone. Besides, it's just too easy to make your own given
> C++'s existing expressive power and multiple paradigm features.

Wrong. You can NOT implement threading support portably.

> For what I do everyday, I find nothing lacking in C++, and cringe at
> the thought of the everything and the kitchen sink approach of Java.

What is in Java that would be best removed?


Andrei

Hyman Rosen

Aug 20, 2004, 9:49:24 AM
Craig wrote:
> Perhaps what Andrei should be proposing is a threading standard library.

No. A library is insufficient. Threading support will impact areas
of the language proper, and those impacts must be specified by the
standard.

Jim Rogers

Aug 20, 2004, 9:50:30 AM
Jim Melton <jim.m...@lmco.com> wrote in
news:cg31vu$ja...@cui1.lmms.lmco.com:

> Isn't Ada conceptually a cross between C++ and Java, in that Java only
> works in the context of the Java Virtual Machine (single, standardized
> run-time environment), C++ has absolutely no run-time environment
> requirements, and Ada has a required run-time environment (akin to,
> but not quite a virtual machine)?

This is a fascinating bit of fiction. Ada can be implemented with a run-time
environment so that it can run on a bare system with no OS. Most Ada
compilers are targeted at a particular OS / hardware combination, just like
C++ compilers. In fact, the GNAT compiler is part of the GNU compiler tool
chain. It uses exactly the same back end as gcc and g++.

>
> In C++, threading/tasking (when provided either through a C binding or
> an unportable extension) is directly mapped to the underlying kernel
> thread capabilities (hence, the lack of portability). Don't both Ada
> and Java define a threading/tasking model that must be mapped (with
> varying degrees of success) to the underlying kernel mechanisms? I do
> remember some "impedance mismatch" issues in Ada tasking, but that was
> many years ago, and I haven't stayed current.

Ada tasking mechanisms are typically at a higher level of abstraction than
the threading model provided by a particular OS. In fact, the Ada tasking
model does not even require threads. It can be implemented using
inter-process communication. Most frequently, however, Ada tasking
constructs are mapped to OS thread mechanisms.

One advantage of the higher abstraction levels for Ada tasking is
the portability of source code. Any tasking constructs that work
on a Linux system will also work on a Win32 system or an IBM
mainframe. The only requirement is to recompile the unchanged
source code for the target system.

>
> I'm not opposed to language support for MT, but it seems that the
> first thing required would be to derive a platform-independent
> construct that is a superset of all possible
> threading/tasking/scheduling models. For example, would the proposed
> standard support thread priorities? What then, would a platform that
> doesn't support preemptive multi-tasking do to comply with the
> standard?
>

I do not think you want to wait until you have a superset of all
possible models. That could take an infinite amount of time. What you
do want is an abstraction of all currently significant models, with
an emphasis on being able to map your abstraction to divergent models.
New threading abstractions could be added in subsequent updates of the
C++ standard.

Although thread priorities sound like a wonderful tool, and are
available on many platforms, they are just as problematic as
process priorities. Improper use of thread priorities can result
in deadlocks. On the other hand, thread priorities can be useful
tools for implementing lock-free synchronization. Some Ada
compilers temporarily promote a task to the highest priority while
it is executing a method of a protected object (similar to a
critical section) and then return the task to its normal priority
as soon as it completes the method. This priority promotion gives
the task much of the control of a lock without the problems of
lock acquisition and release.
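
A sketch of that technique in C++ terms; get_my_priority/set_my_priority are
invented stand-ins for platform calls such as pthread_setschedparam or
SetThreadPriority (the trivial definitions here only make the sketch
self-contained):

    static int current_priority = 10;   // simulated thread priority
    int  get_my_priority()      { return current_priority; }
    void set_my_priority(int p) { current_priority = p; }

    const int CEILING_PRIORITY = 99;    // highest priority in the system

    class PriorityCeilingGuard {
        int saved_;
    public:
        PriorityCeilingGuard() : saved_(get_my_priority()) {
            set_my_priority(CEILING_PRIORITY);  // no task can now preempt us
        }
        ~PriorityCeilingGuard() {
            set_my_priority(saved_);   // restore on leaving the region
        }
    };

    void protected_operation() {
        PriorityCeilingGuard guard;  // behaves like a lock on a uniprocessor
        // ... the "protected object" method body runs without preemption ...
    }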

Ada also allows the programmer to define, within limits, which
operations are atomic. Atomic operations cannot be interrupted.
For certain designs, atomic operations are useful and are much
faster than any form of locking synchronization.

It is conceivable that a C++ standard for threading would either provide an
exception for platforms not supporting threading, or would allow a compiler
to generate code to provide fundamental threading support itself. The first
option limits portability. The second option would create much larger
executables, due to the bundled threading system.

The Ada tasking model is significantly orthogonal to its type model.
This feature prevents the creation of task inheritance hierarchies.
One can create task types, but not inherit from a task type. Java
allows the creation of inheritance hierarchies for threads. I have
not encountered a situation where the Java model makes design or
implementation easier. In fact, the Java model is actually much
harder to refactor than the Ada model if your design undergoes
radical change.

If C++ does implement MT in its standard, the standards body should
carefully consider the various kinds of communication between threads.
The two major kinds of communication between threads are
asynchronous communication through some intermediate shared buffer
object, and direct synchronous communication. Java allows direct
or indirect communication, but the only thread-safe communication
is always synchronized. The Ada model properly allows both
synchronous and thread-safe asynchronous communications.

> It seems that the problem may be too much for a *language* standards
> body to tackle. Remember that Java standardized the run-time
> environment, which is the only way they could standardize a threading
> model. --

The JVM was not the product of threading. Threading was implemented in
a common (but poor) abstraction in Java through the JVM.

If C++ does create some standard abstractions for MT, I hope more care
is given to real threading theory and practice. The original Java
threading abstractions are unnecessarily crude and inefficient. I do not
think C++ should try to mirror such inefficiencies.

Jim Rogers

Joe Seigh

Aug 20, 2004, 10:07:28 AM

"Andrei Alexandrescu (See Website for Email)" wrote:
>
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:4123DBE6...@xemaps.com...
> > It's not necessary for the compiler to implement threading. That
> > can be done in a library.
>
> I am sorry, that is not true. Any threading library must cooperate with the
> compiler so as to enforce correct code generation.

True, but not as much as you think. If part of the semantics are defined
by the library, then the language doesn't need to know about that part.

>
> > It would be nice for the compiler to
> > supply the primitives necessary to implement threading correctly,
> > things like optimization directives, memory alignment, how to avoid
> > word tearing, etc...
> >
> > The abstract concept of threading is well understood, you don't really
> > need to define that. You also don't need to define a memory model, just
> > assume totally ordered memory and let the thread api implementors use
> > the optimization directives and platform dependent memory barriers to
> > deal with that.
>
> That's entirely unworkable. Compilers don't know statically which threads
> execute which code, so they'll need to assume the worst for any code they
> compile. It would disable pretty much all meaningful optimizations. That's
> like instant 3x speed reduction. Then, the hardware reorders instructions
> too. That would force compilers to insert cache flushing and/or membar
> instructions at every C++ sequence point. That's like another 50x speed
> reduction, if not more. It's just totally unworkable.

Assuming totally ordered memory and enforcing it are two different things.

>
> Memory model is essential to MT programs. An MT program wants to benefit of
> optimizations and reorderings most of the time, except in a few key points
> where synchronization is needed. Those few key points will be conveyed by
> the programmer to the compiler. As of today there is /no way/ for the
> programmer to specify those key points.
>
> I agree that sticking with a sequentially consistent model would be easiest
> way out. It's also the most naive.


The decision of when and where to insert memory barriers depends on the
implementation of the various synchronization constructs. Even making memory
barriers a built-in is problematic, since there is no standard definition of
memory barriers; it varies by platform. If you used Intel-style memory
barriers, you would lose some of the benefits of Sun SPARC memory barriers,
which can be more finely tuned. Though hard-coding memory barrier semantics
in a language might be a good way to dope-slap hardware architects, who now
pretty much do whatever they please without regard to the consequences of
their actions.
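
To make this concrete, a sketch (GCC-style inline assembly; note that the
x86 mfence requires SSE2, and the SPARC membar shown requests just one of
several possible orderings):

    // One portable-looking function, two very different barriers underneath.
    inline void store_load_barrier() {
    #if defined(__i386__) || defined(__x86_64__)
        __asm__ __volatile__("mfence" ::: "memory");   // one coarse full fence
    #elif defined(__sparc__)
        __asm__ __volatile__("membar #StoreLoad" ::: "memory");  // finely tuned
    #else
    #error "no memory barrier known for this platform"
    #endif
    }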

Joe Seigh

Pete Becker

Aug 20, 2004, 11:35:14 PM
Brian Neal wrote:
>
> I am a real-time embedded systems programmer, and I work almost
> entirely in C++. I also do not feel the strong need to add threading
> to C++. We have our own set of classes for threads, mutexes,
> semaphores, etc, and we have multiple versions of these classes for
> different OS's. I have never felt the need for a special thread safe
> container sanctioned by the library. If you want a thread safe queue
> so that threads can communicate with each other, we just wrap access
> to a stl queue (or dequeue, whatever) up in a templated class using
> one of our mutex objects. Done and done. Would it be nice to have all
> this standardized? Maybe, but requirements for real-time embedded
> systems vary wildly, and I fear that a general solution would not
> satisfy everyone. Besides, it's just too easy to make your own given
> C++'s existing expressive power and multiple paradigm features.
>
> For what I do everyday, I find nothing lacking in C++, and cringe at
> the thought of the everything and the kitchen sink approach of Java.
>

Hear, hear! I'd like to underscore what you said in your first
paragraph: the problem with most "solutions" to multithreading problems
is that they try too hard or promise too much. Writing correct
multi-threaded code requires top down application design. It is not
something you can add with compiler switches or a library. What you can
do with compiler switches and a library is provide a toolkit that the
implementors of a well designed application can use to make it easier to
write their code and easier to port it. As you say, this means threads,
mutexes, semaphores, etc. It does not mean synchronized containers or
iostreams, because that imposes overhead but doesn't ensure correct
synchronization. Here's my usual example:

void f()
{
    std::cout << "Hello, " << "world\n";
}

If this function is called repetitively from multiple threads without
any higher level synchronization the "Hello"s and the "world"s will be
intertwined, regardless of whether individual insertions are guarded.
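
To spell out what "higher level synchronization" means here, a sketch using
a hypothetical Mutex/ScopedLock pair (stand-ins for, e.g., boost::mutex and
its scoped lock): the guard must span the whole composite operation, not
each insertion.

    #include <iostream>

    class Mutex {   // stand-in for a real per-OS mutex
    public:
        void lock()   { /* platform-specific */ }
        void unlock() { /* platform-specific */ }
    };

    class ScopedLock {
        Mutex& m_;
    public:
        explicit ScopedLock(Mutex& m) : m_(m) { m_.lock(); }
        ~ScopedLock() { m_.unlock(); }
    };

    Mutex cout_mutex;

    void f()
    {
        ScopedLock guard(cout_mutex);        // held across BOTH insertions
        std::cout << "Hello, " << "world\n"; // now emitted as one unit
    }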

Writing fast, robust multithreaded applications is not easy. Having the
appropriate low level tools helps, but ultimately it's a matter of
painstaking design and implementation. Most programmers are not capable
of writing solid multithreaded applications (I include myself in that
group), and giving them the illusion that they are simply produces more
applications that fail in subtle ways.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Walter

Aug 20, 2004, 11:51:15 PM

"Hyman Rosen" <hyr...@mail.com> wrote in message
news:jtfVc.7142$si.7007@trndny06...

> Craig wrote:
> > Perhaps what Andrei should be proposing is a threading standard library.
>
> No. A library is insufficient. Threading support will impact areas
> of the language proper, and those impacts must be specified by the
> standard.

I'm pretty ignorant of the subtleties of multithreaded programming. What
needs to be done to the core language to have a solid foundation upon which
to build?

Alexey Sarytchev

Aug 20, 2004, 11:54:10 PM
"Andrei Alexandrescu (See Website for Email)"
<SeeWebsit...@moderncppdesign.com> wrote in message
news:2olgi4F...@uni-berlin.de...

> "Craig" <benbowc...@attglobal.net> wrote in message
> news:aX%Uc.4123$zS6.4...@news02.tsnz.net...
> > Perhaps what Andrei should be proposing is a threading standard library.
> > We have the STL which is present on most platforms and gives us very
> > useful standard containers that behave identically on each platform. I
> > see a standard threading model as being the best way forward. If an
> > existing threading model was adopted it could be used exactly like the
> > STL is and to the same effect. I do hope the docs are better
> > though!<grin>
> > Which thread model though? Posix?
> > Daves comments are very important and a Standard Thread model is the
> > only way I see for it to be integrated into the C++ language.
>
> Yes; the difference is that, while STL can be written in portable C++,
> threads cannot.

I think there is some misunderstanding between people here. It looks to me
like a lot of people think about MT issues only as an API: how one can
start/stop threads, create a mutex, and so on. But there are more basic
issues. Basically, the C++ standard doesn't say anything about how multiple
threads can interact or how they can use shared resources, in particular
memory. That's not surprising: the standard doesn't say anything about
multiple threads at all. So, unless you are using some knowledge about a
particular platform, you can't assume anything. And the only half-portable
way to work in an MT environment is to synchronize _all_ access to shared
memory with some sort of platform-dependent mechanism: mutex, critical
section, whatever. But this is suboptimal at best. In some situations it can
lead to huge performance degradation, especially on machines with a relaxed
memory model.

As I understand it, Andrei's proposal is to add a way for a programmer to
specify how threads can work with shared memory. And that requires changes
in the core language; it can't be done by a library. Creating a standard API
for working with multiple threads is a less important issue, IMHO. There are
different threading models and different system APIs, and it's impossible to
find a way to satisfy everyone. But I think it's quite feasible to define a
model for access to shared memory which will be general enough to cover the
majority of widely used architectures. It can also be "free" for situations
where it is not required, so you will incur no penalty if your platform
doesn't support threads, or if access to shared memory has strong guarantees
from the hardware and you don't have to use special instructions for it.

I think it was a great idea to raise that question and I think
multithreading issues MUST be addressed by C++ standard.
Way to go, Andrei!

Regards,
Alexey Sarytchev

Chris Noonan

Aug 20, 2004, 11:55:23 PM
"Andrei Alexandrescu \(See Website for Email\)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<2ok24cF...@uni-berlin.de>...

> Starting 2.5 years ago, the field is burgeoning with algos that don't rely
> on GC and this year Maged published the first general purpose lock-free
> memory allocator in the world
> (http://www.research.ibm.com/people/m/michael/pldi-2004.pdf).

If I may correct you on one small point, the LeapHeap
allocator (http://www.leapheap.com) has been available
since September 2002. LeapHeap version 1.0 was lock-free
in the same slightly restricted sense as Michael's allocator
is. The Nexus Database Systems memory manager for Delphi
(http://www.nexusdb.com) is also lock-free, though I do not
know whether the algorithms for it are in the public domain.
Granted, Maged Michael's paper may be the first *academic*
publication of its kind.


Chris

(Please do not use the given email address; it is spammed out)

llewelly

Aug 20, 2004, 11:57:39 PM
SurfCed...@netscape.net (Brian Neal) writes:
[snip]

> For what I do everyday, I find nothing lacking in C++, and cringe at
> the thought of the everything and the kitchen sink approach of Java.
[snip]

I find it interesting that years ago, C++ was viewed as the 'kitchen
sink' language, and Java was viewed as small and neat. The
rabbit-like population explosion in the Java standard libraries
has changed everything.

llewelly

Aug 20, 2004, 11:58:01 PM
David Abrahams <da...@boost-consulting.com> writes:

> "David" <FlyLike...@United.Com> writes:
>
>> While it may be useful to have C/C++ provide these extensions,
>> their use/capability/reliability might be unpredictable on certain
>> systems.
>
> Sounds like FUD to me.
>
>> C/C++ was not intended to provide the notion of even a single thread
>
>>From what I've heard him say about this, I think Bjarne Stroustrup
> would disagree with you.

I don't know whose position this supports, but IIRC, one of the first
things implemented in C with Classes was a tasking library.

Brian Neal

Aug 21, 2004, 12:06:42 AM
Hyman Rosen <hyr...@mail.com> wrote in message news:<0KeVc.7134$si.2503@trndny06>...

>
> It worked very well for Ada.

I have programmed in both Ada83 and Ada95, and while I admire a great many
things about Ada, its tasking model is not one of them. The rendezvous
concept is extremely bizarre, and every project I worked on abandoned using
that feature. Having tasks communicate by sending messages through queues
always won out over the rendezvous, with its myriad and mind-boggling set of
options and flavors.

The other thing about Ada is that its complexity is so great, and the Ada
compiler has to do so many things and get them all right, that it is a
wonder anyone can make an Ada compiler at all. This is probably why a
decent Ada compiler was so expensive. When we used Ada on top of a
proprietary real-time OS, we found that the compiler was not
translating our Ada tasks, semaphore calls, etc, into the proper
native OS calls. It would invariably leave some tasking option that
the OS supported out of our reach as Ada application developers. It
was very frustrating. In some cases we had to make the native OS calls
ourselves because the compiler missed something, and needless to say
that is pretty bad as portability is Ada's strong point.

If C++ is going to support threading, I hope it doesn't fall into this
trap. That is why I favor putting that stuff in a library (set of
standard OS abstraction classes). I am not sure why the language
itself has to get involved, yet..... This is based upon my experiences
as a C++ real-time embedded systems programmer.

Brian Neal

unread,
Aug 21, 2004, 12:07:26 AM8/21/04
to
"Andrei Alexandrescu \(See Website for Email\)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<2olg17F...@uni-berlin.de>...

> "Brian Neal" <SurfCed...@netscape.net> wrote in message
> news:26d6ee84.04081...@posting.google.com...
> >I am a real-time embedded systems programmer, and I work almost
> > entirely in C++. I also do not feel the strong need to add threading
> > to C++. We have our own set of classes for threads, mutexes,
> > semaphores, etc, and we have multiple versions of these classes for
> > different OS's.
>
This viewpoint is basically: we've implemented them ourselves for each OS we
need, and as such we don't feel a particular need for them in the
Standard. I think this can be seen as an argument in favor of
standardization: others would not need to duplicate your work, nor would
you need to invest more work when porting your synchronization primitives
to other platforms.

Yes I agree. I am in favor of having a library for threads,
semaphores, mutexes, condition variables, etc. Something like boost
but much more "meat" is needed. I am not convinced that the language
needs to support this directly however.

>
> > I have never felt the need for a special thread safe
> > container sanctioned by the library. If you want a thread safe queue
> > so that threads can communicate with each other, we just wrap access
> > to a stl queue (or dequeue, whatever) up in a templated class using
> > one of our mutex objects. Done and done.
>
> That's fine for your particular performance needs. What if you or others
> need to scale up your queue and it chokes in the face of contention, while
> your competitor has a queue that's 50x faster?

Huh? So I rewrite it or tune it. How would using some queue sanctioned
by the language avoid the same problem? I am still going to have
problems with it under certain conditions, so I would have to write my
own or tune it somehow in any event. I can't tell you how much of my
job is to work around broken implementations or "standard"
implementations that don't cut it in my specific environment.

> > Would it be nice to have all
> > this standardized? Maybe, but requirements for real-time embedded
> > systems vary wildly, and I fear that a general solution would not
> satisfy everyone. Besides, it's just too easy to make your own given
> > C++'s existing expressive power and multiple paradigm features.
>
> Wrong. You can NOT implement threading support portably.

Nice. Whatever. Nothing is truly portable in a real-time embedded
system. However my team has written highly portable software without
direct language support. We have done this by developing a set of OS
abstraction classes without C++ knowing what we are doing.

Maybe you can help me understand what you want added to the language?
What is the problem you are so concerned about?

Jim Rogers

unread,
Aug 21, 2004, 12:07:48 AM8/21/04
to
Joe Seigh <jsei...@xemaps.com> wrote in
news:4125EAEA...@xemaps.com:

> The decision of when and where to insert memory barriers depends on
> the implementation of the various synchronization constructs. Even
> making memory barriers a built in is problematic since there is no
> standard definition of memory barriers. It varies by platform. If
> you used Intel style memory barriers, you would lose some of the
> benefits of Sun Sparc memory barriers which can be more finely tuned.
> Though hard-coding memory barrier semantics in a language might be a
> good way to dope slap hardware architects who now pretty much do
> whatever they please without regard to the consequences of their
> actions.

The source of your confusion is that you are assuming the wrong
abstraction level. Any language supporting threading must provide an
abstraction above any particular OS implementation. Compilers targeted
to a specific platform would implement that abstraction in the
semantics of the native threading capabilities.

C++ could specify a memory barrier abstraction. On an Intel machine
that abstraction would be implemented using Intel style memory
barriers, while on a Sun Sparc machine the Sun Sparc memory barrier
implementation would be used. This also allows reasonable porting
of the language standard to new hardware technologies without
changing or violating the language standard.
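
As a sketch of what such an abstraction could look like (the function name
is invented here; the instructions shown are the real ones for each
platform, and a full fence is the most conservative choice):

inline void memory_barrier() {   // one portable name, per-platform bodies
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");       // IA-32 (SSE2) / x86-64
#elif defined(__sparc__)
    __asm__ __volatile__("membar #LoadLoad | #LoadStore | #StoreStore | #StoreLoad"
                         ::: "memory");                // SPARC v9 full fence
#else
#   error "no memory barrier implementation for this platform"
#endif
}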

The problem is similar to the manner in which a language supports file
systems. There are many different file system implementations on
different platforms. The language standard does not specify the
low level characteristics of a file system. It describes abstract
operations such as opening and closing files.

Jim Rogers

Jim Melton

unread,
Aug 21, 2004, 12:14:01 AM8/21/04
to
"Jim Rogers" <jimmaure...@worldnet.att.net> wrote in message
news:Xns954AE76706679...@127.0.0.1...

> Jim Melton <jim.m...@lmco.com> wrote in
> news:cg31vu$ja...@cui1.lmms.lmco.com:
>
> > Isn't Ada conceptually a cross between C++ and Java, in that Java only
> > works in the context of the Java Virtual Machine (single, standardized
> > run-time environment), C++ has absolutely no run-time environment
> > requirements, and Ada has a required run-time environment (akin to,
> > but not quite a virtual machine)?
>
> This is a fascinating bit of fiction.

Sorry for my ignorance.

> > In C++, threading/tasking (when provided either through a C binding or
> > an unportable extension) is directly mapped to the underlying kernel
> > thread capabilities (hence, the lack of portability). Don't both Ada
> > and Java define a threading/tasking model that must be mapped (with
> > varying degrees of success) to the underlying kernel mechanisms? I do
> > remember some "impedance mismatch" issues in Ada tasking, but that was
> > many years ago, and I haven't stayed current.
>
> Ada tasking mechanisms are typically at a high level of abstraction
> compared to a threading model provided by a particular OS. In fact,
> the Ada tasking model does not even require threads. It can be
> implemented using inter-process communication. Most frequently,
> however, Ada tasking constructs are mapped to OS thread mechanisms.

So, my assertion is correct, that Ada defines a model that must be mapped to
the underlying kernel mechanisms. The fact that this model can be
implemented independent of threads or any other specific kernel capability
is a testament to the fact that it is high level indeed. And I do know that
people are using Ada in real-time and embedded systems, so it is not a
totally broken model.

However, I have to think that, given C++'s heritage, any multi-threading
support that might be included in the language would have to allow for the
level of "down to the metal" programming available today. If the approach is
to pick a model, and say that anyone who needs more than the model allows
must use non-standard extensions, then I'd say you've failed.

The problem that I see is that thread support is more than preventing two
threads from accessing the same resource at the same time. It also has to
address execution timelines, scheduling, and thread lifecycles. In my narrow
thinking, this is necessarily OS dependent.

> One advantage of the higher abstraction levels for Ada tasking is
> the portability of source code. Any tasking constructs that work
> on a Linux system will also work on a Win32 system or an IBM
> mainframe.

Should this read "Any ADA tasking constructs..."? Or do you mean to imply
that Win32 has a proper superset of tasking constructs available on Linux,
and IBM mainframe has a proper superset of Win32 constructs?

> > I'm not opposed to language support for MT, but it seems that the
> > first thing required would be to derive a platform-independent
> > construct that is a superset of all possible
> > threading/tasking/scheduling models. For example, would the proposed
> > standard support thread priorities? What then, would a platform that
> > doesn't support preemptive multi-tasking do to comply with the
> > standard?
> >
>
> I do not think you want to wait until you have a superset of all
> possible models. That could take an infinite amount of time. What you
> do want is an abstraction of all currently significant models, with
> an emphasis on being able to map your abstraction to divergent models.
> New threading abstractions could be added in subsequent updates of the
> C++ standard.

OK, poorly worded. It is likely impossible to create a superset of ALL
possible models. And POSIX threads are (should be?) already portable between
Win32 and Unix. I suspect, though, that there are other models that need to
be accommodated, or else we would all just use POSIX threads.

> Although thread priorities sound like a wonderful tool, and are
> available on many platforms, they are just as problematic as
> process priorities.

So the solution is to exclude them from the model? Seems a bit Draconian.

> Improper use of thread priorities can result
> in deadlocks.

Improper use of mutexes can result in deadlocks. And improper memory
management can result in core dumps. Improper syntax can result in
compilation errors. A language that prevents you from improperly using
something (by not letting you use it at all) would not be C++.

> On the other hand, thread priorities can be useful
> tools for implementing lock-free synchronization. Some Ada
> compilers temporarily promote a task to the highest priority while
> it is executing a method of a protected object (similar to a
> critical section) and then return the task to its normal priority
> as soon as it completes the method. This priority promotion gives
> the task much of the control of a lock without the problems of
> lock acquisition and release.

... as long as the task doesn't do anything that might cause it to block.
Priority inversion is a tricky problem in distributed programming, in
particular.

> It is conceivable that a C++ standard for threading would either
> provide an exception for platforms not supporting threading, or would
> allow a compiler to generate code to provide fundamental threading
> support. The first option limits portability. The second option will
> create much larger executables due to the threading system.

It seems to me, and I could certainly be wrong, that threading is outside
the purview of the compiler. The (C++) compiler converts language constructs
to machine code. It does not define execution sequencing. The fact that the
Ada language includes definition of execution-time things (like atomic
operations or thread creation/management) demonstrates that the language
defines both a syntax and an execution environment. C++ does not (currently)
include this constraint. I do believe that in order to support threading in
the language you would have to add these constraints, but I'm not convinced
that the C++ community could swallow the change to the language.

> If C++ does implement MT in its standard the standard body should
> carefully consider various kinds of communication between threads.

Agreed.

> > It seems that the problem may be too much for a *language* standards
> > body to tackle. Remember that Java standardized the run-time
> > environment, which is the only way they could standardize a threading
> > model. --
>
> The JVM was not the product of threading. Threading was implemented in
> a common (but poor) abstraction in Java through the JVM.

Right, but since the JVM is an abstraction on top of hardware/OS (in effect,
a virtual hardware/OS), it is reasonable to define a single threading model
for this new virtual platform. That there are problems just shows how hard a
standard execution model is to define.

> If C++ does create some standard abstractions for MT, I hope more care
> is given to real threading theory and practice. The original Java
> threading abstractions are unnecessarily crude and inefficient. I do not
> think C++ should try to mirror such inefficiencies.

Fair enough.

--
<disclaimer>
Opinions posted are those of the author.
My company doesn't pay me enough to speak for them.
</disclaimer>
--
Jim Melton
Software Architect, Fusion Programs
Lockheed Martin IS&S
(303) 971-3846

llewelly

unread,
Aug 21, 2004, 6:16:01 AM8/21/04
to
Hyman Rosen <hyr...@mail.com> writes:

> Jim Melton wrote:
> > My first reaction to David's point was, "so what? the compiler already knows
> > about the underlying system".
>
> What the compiler knows and what the programmer knows must be in sync,
> and therefore must be specified. That's why it has to be a part of the
> standard and of the language, so that everyone knows what is expected
> and what is required.

I think the barrier being encountered here is that such things are
already specified. Not by ISO C++, but by POSIX, or by Win32.

The fact that it is in some sense already available in a standardized
form in many environments I think blinds people to two issues:
(1) the standards in question have considerable effect on the
semantics and implementation of the language; (2) the semantics
in question were aimed at either Unix + C, or Windows and C, and
therefore may not be entirely suitable for C++.

Widely used third-party threading libraries for C++, such as
Boost Threads and ACE, give the impression that a library-only
approach is enough; the fact that these libraries rely on the
compiler providing the aforementioned POSIX or Win32 threading
semantics is an implementation detail, which is easy to remain
ignorant of or to forget.

Now I'll bring up the issue of existing practice. As far as I know,
one of the committee's duties is to standardize existing
practice. I look around, and I see that many environments provide
threading, and many applications rely on it. So threading is part
of existing practice.

llewelly

unread,
Aug 21, 2004, 6:17:24 AM8/21/04
to
Jim Melton <jim.m...@lmco.com> writes:

> "Hyman Rosen" <hyr...@mail.com> wrote in message
> news:10928615...@master.nyc.kbcfp.com...
>> David wrote:
>> > The primary reason for this would be that the compiler would need
>> > to know how the target computer implements threads and if it did
>> > not, for it to provide some underlying code to pretend that threading
>> > did exist.
>>
>> First of all, how is "pretend threading" different from "real threading"?
>> Second, Ada provides support for concurrency defined portably within the
>> language without any trouble at all. So does Java, for that matter, albeit
>> less well.
>>
>> > Much of my work is on systems that are at the embedded or kernel
> level.
>>
>> Then you should certainly be aware of Ada and its tasking mechanisms.
>
>
> Isn't Ada conceptually a cross between C++ and Java, in that Java only works
> in the context of the Java Virtual Machine (single, standardized run-time
> environment), C++ has absolutely no run-time environment requirements, and
> Ada has a required run-time environment (akin to, but not quite a virtual
> machine)?

Every language - even C++ - has a required run-time environment.
If you mean to say that Ada falls midway between C++ and Java in the
degree of insulation from platform-specific issues provided, that
seems correct, but I really can't say. Hopefully someone who
knows more Ada than me (like Hyman) will comment. If on the other
hand, you mean to say that Ada falls midway between C++ and Java
in terms of the overhead of the runtime environment, I don't
think so - I think an Ada runtime can be expected to have
approximately as much overhead as a typical C++ runtime.

>
> In C++, threading/tasking (when provided either through a C binding or an
> unportable extension) is directly mapped to the underlying kernel thread
> capabilities (hence, the lack of portability). Don't both Ada and Java
> define a threading/tasking model that must be mapped (with varying degrees
> of success) to the underlying kernel mechanisms? I do remember some
> "impedance mismatch" issues in Ada tasking, but that was many years ago, and
> I haven't stayed current.
>
> I'm not opposed to language support for MT, but it seems that the first
> thing required would be to derive a platform-independent construct that is a
> superset of all possible threading/tasking/scheduling models. For example,
> would the proposed standard support thread priorities? What then, would a
> platform that doesn't support preemptive multi-tasking do to comply with the
> standard?

[snip]

I think you are forgetting something. The committee would like to
deliver a new standard by 200[89]. There are only 2 meetings per
year - so, 7 more meetings before 2008 rolls around, 9 more
before 2009. It seems to me any threading proposal this late in
the game must strive for minimalism and simplicity.

I am a long, long way from being a threads expert, but, if I
understand correctly, the minimal functionality is:

* An atomic operation, which could be any of:
- atomic compare and swap
- atomic test and set
- load-locked + store conditional

* Memory barriers.

* Create a new thread of execution.

* Destroy an existing thread of execution.

I don't think a threads proposal should try for more, on the grounds
that time is running short.
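
To show how little is needed to build on: given just those primitives, a
spinlock falls out directly. (A sketch only; atomic_test_and_set,
acquire_barrier, and release_barrier are hypothetical names for the
proposed operations, not an existing API.)

class SpinLock {
public:
    SpinLock() : flag_(0) {}

    void lock() {
        while (atomic_test_and_set(&flag_))  // returns the previous value;
            /* spin */;                      // keep trying while it was held
        acquire_barrier();  // later memory operations may not move above this
    }

    void unlock() {
        release_barrier();  // earlier memory operations may not move below this
        flag_ = 0;
    }

private:
    volatile int flag_;
};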

Jim Rogers

unread,
Aug 21, 2004, 6:18:01 AM8/21/04
to
Jim Melton <jim.m...@lmco.com> wrote in
news:cg60am$ja...@cui1.lmms.lmco.com:

> So, my assertion is correct, that Ada defines a model that must be
> mapped to the underlying kernel mechanisms. The fact that this model
> can be implemented independent of threads or any other specific kernel
> capability is a testament to the fact that it is high level indeed.
> And I do know that people are using Ada in real-time and embedded
> systems, so it is not a totally broken model.

The Ada threading (or tasking, as it is named in Ada) model is very
workable. It is also extremely efficient. In that sense it is
definitely not broken.

>
> However, I have to think that, given C++'s heritage, any
> multi-threading support that might be included into the language would
> have to allow for the level of "down to the metal" programming
> available today. If the approach is to pick a model, and say that
> anyone who needs more than the model allows must use non-standard
> extensions, then I'd say you failed.

This idea of being "down to the metal" seems to be the fundamental
argument against MT support in C++. As I mentioned in another post,
C++ provides abstractions for file operations that are somewhat
above the metal. I do not understand why an abstract threading model
would be so much less acceptable.

>
> The problem that I see is that thread support is more than preventing
> two threads from accessing the same resource at the same time. It also
> has to address execution timelines, scheduling, and thread lifecycles.
> In my narrow thinking, this is necessarily OS dependent.

The problem with trying to stay "down to the metal" is that you will
always be stuck with OS dependent threading models. Unfortunately,
most of the OS threading libraries are written in C and not C++, so
they are pretty far from even being language standard libraries.

The solution used by languages (including Java and Ada) that provide
an abstraction for threading includes abstractions for priority control,
scheduling, and management of thread lifecycles.

Thinking about it now, I understand where the C++ distaste for higher
level abstractions at the language level may come from. Most C++
programmers use Java as their example of threading abstractions. That
model is inefficient and clumsy for all but the simplest designs.
The Java model need not be the basis for a C++ implementation.
C++ can quite clearly produce a much more efficient and elegant
threading model than the Java model.

>
>> One advantage of the higher abstraction levels for Ada tasking is
>> the portability of source code. Any tasking constructs that work
>> on a Linux system will also work on a Win32 system or an IBM
>> mainframe.
>
> Should this read "Any ADA tasking constructs..."? Or do you mean to
> imply that Win32 has a proper superset of tasking constructs available
> on Linux, and IBM mainframe has a proper superset of Win32 constructs?

I mean "Any Ada tasking constructs ...". I do not mean to imply any
proper subset relationship between Linux, Win32 and IBM mainframe
threading constructs. There is, however, a significant overlap of
threading abstractions between the different OS's. For instance,
the locking abstraction provided by a Linux mutex can be mapped to
the locking abstraction of a Win32 critical section. While one
construct is not a strict subset of the other, they provide similar
abstract capabilities, although the details may be quite different.

>
>> Improper use of thread priorities can result
>> in deadlocks.
>
> Improper use of mutexes can result in deadlocks. And improper memory
> management can result in core dumps. Improper syntax can result in
> compilation errors. A language that prevents you from improperly using
> something (by not letting you use it at all) would not be C++.

I did not mean to imply that thread priorities should be excluded from
a C++ MT model. I simply wanted to identify priorities as being one of
the more dangerous tools in threading. My general approach to priorities
is to NOT use them in my program unless performance issues dictate a
need for their use.

>
>> On the other hand, thread priorities can be useful
>> tools for implementing lock-free synchronization. Some Ada
>> compilers temporarily promote a task to the highest priority while
>> it is executing a method of a protected object (similar to a
>> critical section) and then return the task to its normal priority
>> as soon as it completes the method. This priority promotion gives
>> the task much of the control of a lock without the problems of
>> lock acquisition and release.
>
> ... as long as the task doesn't do anything that might cause it to
> block. Priority inversion is a tricky problem in distributed
> programming, in particular.

Absolutely true. The Ada standard declares that all operations on
protected objects must not be "potentially blocking". It then goes
on to define which kinds of actions are "potentially blocking".

The standard also goes on to define some rather sophisticated behavior
for protected operations.

Ada protected objects can have any combination of three different kinds
of operations:

* Procedures provide unconditional exclusive access to the protected
object. Procedures are allowed to alter the state of the protected
object.

* Entries provide conditional exclusive access to the protected object.
Entries are also allowed to alter the state of the object when a
declared boundary condition is true.

* Functions provide unconditional shared access to the protected object.
Functions are not supposed to alter the state of the object.

The standard describes this selective ability to update the protected
object in the following terms:

Within the body of a protected function (or a function declared immediately
within a protected_body), the current instance of the enclosing protected
unit is defined to be a constant (that is, its subcomponents may be read
but not updated). Within the body of a protected procedure (or a procedure
declared immediately within a protected_body), and within an entry_body,
the current instance is defined to be a variable (updating is permitted).

A protected entry automatically maintains a queue of tasks waiting for
the boundary condition to evaluate to TRUE. The tasks in the queue are
processed either in FIFO order, or in priority order, as selected by the
programmer.

When the boundary condition of a protected entry becomes TRUE, the next
task in the queue (following the chosen queuing policy) executes the
body of the protected entry. When the body execution is complete the
boundary condition is re-evaluated. If the condition is still TRUE,
and at least one more task is waiting in the entry queue, and the
first task still has time left on its time slice, the task
again executes the entry body as a proxy for the waiting task. This
achieves the efficiency of avoiding task switching. This cycle continues
until the boundary evaluates to FALSE, or the queue is empty, or the task
time slice expires.

I think you will recognize that this efficiency in the Ada tasking model
is not directly supported by any OS threading primitives. In fact, it is
more efficient than a simplistic "down to the metal" approach. The Ada
syntax for a protected object is also very simple, and therefore easy
for the programmer to understand.

There are many common design patterns for protected objects. I will
demonstrate a few. Remember that a protected object in Ada is a shared
memory region accessed by two or more tasks (threads). The protected
part implies protection from inappropriate simultaneous access. It has
no relation to the C++ concept of protected.

If you want a buffer designed so that reading tasks will read the
current value simultaneously, while writing tasks will have exclusive
unconditional access to the buffer you could write the following
code.

Note: Ada requires a separation of the interface specification from
the implementation. You will see that as what looks like two
definitions of the same thing, with different details.

protected Read_Many is
   procedure Write(Item : in Integer);
   function Read return Integer;
private
   Internal : Integer := 0;
end Read_Many;

protected body Read_Many is
   procedure Write(Item : in Integer) is
   begin
      Internal := Item;
   end Write;

   function Read return Integer is
   begin
      return Internal;
   end Read;
end Read_Many;

In this example I have used a procedure to provide unconditional
exclusive update access and a function to provide unconditional
shared read access. The locking necessary to ensure this works
properly is defined in Posix as a read-write lock. The Ada compiler
generates all the necessary lock manipulation code.
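
For comparison, here is a rough C++ rendering of Read_Many over POSIX
read-write locks: a hand-written sketch of the bookkeeping that, as noted
above, an Ada compiler generates for you (class and member names are mine):

#include <pthread.h>

class ReadMany {
public:
    ReadMany() : internal_(0) { pthread_rwlock_init(&lock_, 0); }
    ~ReadMany() { pthread_rwlock_destroy(&lock_); }

    void write(int item) {               // exclusive, unconditional update
        pthread_rwlock_wrlock(&lock_);
        internal_ = item;
        pthread_rwlock_unlock(&lock_);
    }

    int read() {                         // shared, unconditional read
        pthread_rwlock_rdlock(&lock_);
        int v = internal_;
        pthread_rwlock_unlock(&lock_);
        return v;
    }

private:
    int              internal_;
    pthread_rwlock_t lock_;
};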

Another example is where we want the reading threads to only read
value instances they have never read before. This model will
result in significant synchronization between the reading and
writing threads.

protected Read_Exclusive is
   entry Write(Item : in Integer);
   entry Read(Item : out Integer);
private
   Internal : Integer;
   Is_New : Boolean := False;
end Read_Exclusive;

protected body Read_Exclusive is
   entry Write(Item : in Integer) when not Is_New is
   begin
      Internal := Item;
      Is_New := True;
   end Write;

   entry Read(Item : out Integer) when Is_New is
   begin
      Item := Internal;
      Is_New := False;
   end Read;
end Read_Exclusive;

In this example both the Read and Write operations modify the
state of the protected object. Each operation is a conditional
operation. The Read operation can only read new values. The
write operation can only overwrite old values. Two entries are
used for this purpose. Notice that the boundary conditions are
part of the implementation and not part of the interface.

You will also notice that the operations within the protected
object bodies are very simple and very fast. The longer the
operation, the slower the threads run because of the locking
semantics. You want to execute the critical sections as
quickly as possible.

Remember I mentioned a queue for each entry. The queue and its
operations are all handled by the compiler. The programmer only
deals with the domain-specific behavior of the protected
object.
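
Again for comparison, a rough C++/POSIX sketch of Read_Exclusive. The entry
queues and proxy execution described above have no direct pthreads analogue;
two condition variables stand in for the two entry queues (names are mine):

#include <pthread.h>

class ReadExclusive {
public:
    ReadExclusive() : internal_(0), is_new_(false) {
        pthread_mutex_init(&mtx_, 0);
        pthread_cond_init(&readable_, 0);
        pthread_cond_init(&writable_, 0);
    }
    ~ReadExclusive() {
        pthread_cond_destroy(&writable_);
        pthread_cond_destroy(&readable_);
        pthread_mutex_destroy(&mtx_);
    }

    void write(int item) {               // blocks while an unread value exists
        pthread_mutex_lock(&mtx_);
        while (is_new_)
            pthread_cond_wait(&writable_, &mtx_);
        internal_ = item;
        is_new_ = true;
        pthread_cond_signal(&readable_);
        pthread_mutex_unlock(&mtx_);
    }

    int read() {                         // blocks until a new value arrives
        pthread_mutex_lock(&mtx_);
        while (!is_new_)
            pthread_cond_wait(&readable_, &mtx_);
        int v = internal_;
        is_new_ = false;
        pthread_cond_signal(&writable_);
        pthread_mutex_unlock(&mtx_);
        return v;
    }

private:
    int             internal_;
    bool            is_new_;
    pthread_mutex_t mtx_;
    pthread_cond_t  readable_, writable_;
};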

Jim Rogers

Andrei Alexandrescu (See Website for Email)

unread,
Aug 21, 2004, 6:18:35 AM8/21/04
to
"Brian Neal" <SurfCed...@netscape.net> wrote in message
news:26d6ee84.04082...@posting.google.com...

> Yes I agree. I am in favor of having a library for threads,
> semaphores, mutexes, condition variables, etc. Something like boost
> but much more "meat" is needed. I am not convinced that the language
> needs to support this directly however.

As I mentioned on csc++, the language needs to change the execution model so
that it accommodates threads. That, and not the library, is the hardest part.

>> That's fine for your particular performance needs. What if you or others
>> need to scale up your queue and it chokes in the face of contention,
>> while
>> your competitor has a queue that's 50x faster?
>
> Huh? So I rewrite it or tune it. How would using some queue sanctioned
> by the language avoid the same problem?

I was replying to the statement that a thread-safe queue is nothing more
than a queue with locking slapped onto it.

If the language doesn't offer the fine-grained synchronization mechanisms
needed for high-performance synchronization, no amount of "tuning" can do
anything. It's like the DCLP (the Double-Checked Locking Pattern) - it's
fast, but doesn't work portably because the language doesn't have the
expressiveness to make it work.

Java has.
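
For readers unfamiliar with it, here is the classic formulation of the DCLP
(a sketch; Singleton, instance_, mutex_, and the RAII Lock type are assumed
helpers). It is exactly the code whose portability the language currently
cannot guarantee:

Singleton* Singleton::instance() {
    if (instance_ == 0) {               // first check, without the lock
        Lock guard(mutex_);
        if (instance_ == 0)             // second check, under the lock
            instance_ = new Singleton;  // the store to instance_ may become
    }                                   // visible before the object's fields do
    return instance_;
}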

> I am still going to have
> problems with it under certain conditions, so I would have to write my
> own or tune it somehow in any event. I can't tell you how much of my
> job is to work around broken implementations or "standard"
> implementations that don't cut it in my specific environment.

At least you have the language in which to write the alternate
implementation.

>> Wrong. You can NOT implement threading support portably.
>
> Nice. Whatever. Nothing is truly portable in a real-time embedded
> system. However my team has written highly portable software without
> direct language support. We have done this by developing a set of OS
> abstraction classes without C++ knowing what we are doing.

Well good for you. But that can't form an argument against standardization.

> Maybe you can help me understand what you want added to the language?
> What is the problem you are so concerned about?

I wrote an explanatory post on csc++, in response to Ed Diener I recall.


Andrei

Joe Seigh

unread,
Aug 21, 2004, 11:15:07 PM8/21/04
to

The question is what the semantics of the memory barrier abstractions
should be. The problem is most people don't realize that there is
quite a variety of memory barriers. For instance, Sparc has all
possible combinations of
#LoadLoad
#LoadStore
#StoreStore
#StoreLoad

while Intel IA-32 just has
LFENCE
SFENCE
MFENCE

That doesn't include the various kinds of dependent loads and store ordering
which have important performance implications in lock-free programming.

Should we define 15+ kinds of memory barriers as the api, or just 3? If the
latter, some programmers may not be willing to take the performance hit of
using overly strong memory barriers and ignore the use of a portable api.

I'm not arguing that we should have memory barriers as an API, since there
are problems when you go to too low a level for the API, as has been shown.
But if you do want them, it's just not as simplistic as some would believe.

Joe Seigh

Andrei Alexandrescu (See Website for Email)

unread,
Aug 21, 2004, 11:15:30 PM8/21/04
to
"llewelly" <llewe...@xmission.dot.com> wrote in message
news:86brh5u...@Zorthluthik.local.bar...

> I am a long, long way from being a threads expert, but, if I
> understand correctly, the minimal functionality is:
>
> * An atomic operation, which could be any of:
> - atomic compare and swap
> - atomic test and set
> - load-locked + store conditional
>
> * Memory barriers.
>
> * Create a new thread of execution.
>
> * Destroy an existing thread of execution.
>
> I don't think a threads proposal should try for more, on the grounds
> that time is running short.

You forgot changes to the observable behavior.

Also, IIRC, test & set is less powerful than CAS or LL/SC.

Andrei

Andrei Alexandrescu (See Website for Email)

unread,
Aug 22, 2004, 6:45:16 AM8/22/04
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:412767F8...@xemaps.com...
[snip]

> Should we define 15+ kinds of memory barriers as the api, or just 3? If
> the
> latter, some programmers may not be willing to take the performance hit of
> using overly strong memory barriers and ignore the use of a portable api.
>
> I'm not arguing that we should have memory barriers as an api since there
> are problems when you go to too low a level for the api as has been shown.
> But if you do want them, it's just not a simplistic as some would believe.

I think that's a great point and question to be asked.

My (not crystal-clear yet) view is to introduce causality semantics, such as
"this thing must happen before that thing as seen by all threads", and let
the compiler figure out exactly what barriers to insert. The key is to allow
programmers to "express themselves". Giving a very granular set of 16
barriers is one way; but giving only 1-2 causality constructs could have the
compiler insert any of the 16 barriers, depending on what expressions the
causality construct separates.
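
To sketch the idea (the names below are invented, not a proposal; the point
is only that the programmer states the ordering and the compiler picks the
fence, if any is needed on the target):

// Writer: everything written before the construct must be visible to any
// thread that observes what is written after it.
data = compute();
causal_release();   // compiler might emit membar #StoreStore on SPARC,
ready = true;       // and possibly nothing on a strongly ordered machine

// Reader: having observed the later write, we must also see the earlier one.
while (!ready) { /* spin */ }
causal_acquire();   // e.g. membar #LoadLoad on SPARC
use(data);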


Andrei

Joe Seigh

unread,
Aug 22, 2004, 7:00:34 PM8/22/04
to

"Andrei Alexandrescu (See Website for Email)" wrote:
>

> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:412767F8...@xemaps.com...
> [snip]
> > Should we define 15+ kinds of memory barriers as the api, or just 3? If
> > the
> > latter, some programmers may not be willing to take the performance hit of
> > using overly strong memory barriers and ignore the use of a portable api.
> >
> > I'm not arguing that we should have memory barriers as an api since there
> > are problems when you go to too low a level for the api as has been shown.
> > But if you do want them, it's just not a simplistic as some would believe.
>
> I think that's a great point and question to be asked.
>
> My (not crystal-clear yet) view is to introduce causality semantics, such as
> "this thing must happen before that thing as seen by all threads" and let
> the compiler figure out what barriers exactly to insert. Key is to allow
> programmers to "express themselves". Giving a very granular set of 16
> barriers is one way; but giving only 1-2 causality constructs could have the
> compiler insert any of the 16 barriers, depending on what expressions the
> causality construct separates.

Well, I have done semantic definitions in terms of observable semantics at the
program level: basically, rules that say "if you see this then you will see this"
or "if you do not see this then you will not see this", which you can use to
make straightforward inference proofs of various program properties.

The semantics were for mutual exclusion, which I posted to c.p.t. here
http://groups.google.com/groups?threadm=3A111C5A.A49B55CA%40genuity.com
sort of to give you an idea of what it might look like. There are
problems with that version (besides the awkward wording), which I found
when I looked at defining other synchronization constructs such as
semaphores, etc. I ended up breaking the visibility definition out from
the mutex definition so the visibility rules could be applied to the
other constructs as well. They are very different from what I posted
earlier.

When you make up the rules, you go for "sufficient, necessary, and useful"
as your guidelines. For example, one of the rules for an atomic<T> type
might be

If x is a variable of type atomic<T> and y is a variable of any type, and
if a thread does a store into y followed by a store into x, then if any
thread sees the store into x, it will see the store into y.

Sounds a little bit like memory barriers stuff, but we didn't actually mention
memory barriers and it is left up to the implementation of atomic<T> on various
platforms whether memory barriers are required and if so, which ones.

So when you define the pattern for DCL, you specify the flag variable as being
of type atomic<T>, and using the above rule and other rules you can prove all the
necessary properties for DCL. And so, if you have atomic<T> then you can
do DCL correctly and portably.
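
As a sketch, DCL written against such an atomic<T>, relying only on the
visibility rule stated above (atomic<T> and its load/store members are
hypothetical here, as are the RAII Lock type and mutex_):

atomic<Singleton*> instance_;            // the "flag variable" of the pattern

Singleton* instance() {
    Singleton* p = instance_.load();
    if (p == 0) {
        Lock guard(mutex_);
        p = instance_.load();            // re-check under the lock
        if (p == 0) {
            p = new Singleton;           // stores into *p happen first, so by
            instance_.store(p);          // the rule above, any thread that sees
        }                                // instance_ set sees the object too
    }
    return p;
}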

One of the things in multi-threaded programming that takes a while, maybe forever,
to get used to is how to specify the properties that you want to prove about
your programs. It's different from the pre-condition, post-condition properties
that most people are used to. Those don't seem to work with multi-threading.

However, you can provide well known patterns for people to use that already have all
the important properties proven. If you're good you can invent new patterns and
prove the necessary properties about them yourself. If you're really good you
can invent new synchronization constructs and rules for them that turn out to be
useful for inventing new patterns.

Getting back to memory barriers. I think they are at too low a level to be easily
defined in a portable way. I've done synchronization implementations on many different
platforms, i.e. mainframes, Unix, Win32, etc., and have a pretty good idea of what is
portable. Memory barriers are not one of those things. I suppose you could do some
generic ones, but I don't see much use or need for them except in implementing other
synchronization constructs, and I don't think they should be used at the general level.
But since memory barriers usually need to be inlined for performance reasons, you may
want to provide a standard mechanism so that even though they are platform dependent,
implementations need not be compiler dependent.

Also, note that when you do definitions this way, you don't actually define what threads
are directly. Thread behavior is defined indirectly. So you don't need to define
everything about threads up front, just the basics like creation and deletion, and
refine their definitions with the various synchronization constructs. Plus it leaves
things open-ended and extensible.

Joe Seigh

llewelly

unread,
Aug 23, 2004, 6:25:33 AM8/23/04
to
"Andrei Alexandrescu \(See Website for Email\)" <SeeWebsit...@moderncppdesign.com> writes:

> "llewelly" <llewe...@xmission.dot.com> wrote in message
> news:86brh5u...@Zorthluthik.local.bar...
>> I am a long, long way from being a threads expert, but, if I
>> understand correctly, the minimal functionality is:
>>
>> * An atomic operation, which could be any of:
>> - atomic compare and swap
>> - atomic test and set
>> - load-locked + store conditional
>>
>> * Memory barriers.
>>
>> * Create a new thread of execution.
>>
>> * Destroy an existing thread of execution.
>>
>> I don't think a threads proposal should try for more, on the grounds
>> that time is running short.
>
> You forgot changes to the observable behavior.

I forgot it because it was obvious. :-)

>
> Also, IIRC, test & set is less powerful than CAS or LL/SC.

Could be. I'm a long way from being a threads expert. It's certainly
less expressive. I'd appreciate an explanation if someone has one.

Hyman Rosen

unread,
Aug 23, 2004, 6:32:08 AM8/23/04
to
Joe Seigh wrote:
> I'm not arguing that we should have memory barriers as an api since there
> are problems when you go to too low a level for the api as has been shown.
> But if you do want them, it's just not a simplistic as some would believe.

The number of programmers who would benefit from a defined, portable, standard
way to do concurrent programming in C++ is vastly larger than the number who need
to know about the micro-minutiae of memory barriers, CAS, and the other low-level
ways of supporting it.

Programming languages don't normally provide support for all the arcane things
any piece of hardware can do. Both the Motorola 68000 family and the Intel 8086
family of processors have support for decimal arithmetic. Yet that is nowhere to
be found in C++, and isn't especially missed.

I can't repeat this often enough - C++ would do well to investigate Ada, which
has already been through a couple of iterations of this design process. Ada
doesn't come with memory barriers or other access to low-level system-specific
primitives, but it is still entirely adequate even for demanding real-time code.

Andrei Alexandrescu (See Website for Email)

unread,
Aug 23, 2004, 6:28:50 PM8/23/04
to
"llewelly" <llewe...@xmission.dot.com> wrote in message
news:86k6vqs...@Zorthluthik.local.bar...

> > Also, IIRC, test & set is less powerful than CAS or LL/SC.
>
> Could be. I'm a long way from being a threads expert. It's certainly
> less expressive. I'd appreciate an explanation if someone has one.

http://www.podc.org/dijkstra/2003.html

Herlihy proved in 1991 what "strength" each primitive has with respect to
multithreading. That seminal work has had a huge theoretical and practical
influence on the definition of processor instruction sets and on how MT
software is written.

Andrei

David

unread,
Aug 27, 2004, 7:12:52 AM8/27/04
to
On Fri, 20 Aug 2004 00:50:38 UTC, Nicola....@ObjectWay.it (Nicola
Musatti) wrote:

> "David" <FlyLike...@United.Com> wrote in message news:<rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost>...
> [...]
> > While I understand your position, it would be unreasonable to add
> > a thread concept to either the C or C++ standards. The primary


> > reason for this would be that the compiler would need to know how
> > the target computer implements threads and if it did not, for it to
> > provide some underlying code to pretend that threading did exist.
> >

> > While we can compare C/C++ vs. Java as languages, Java also has
> > certain other concepts that C/C++ does not. Java has the sandbox,
> > files, threads/processes, and other concepts more associated with
> > a definition of an operating system.
>
> I agree with you. In a way Andrei is addressing the wrong body, when
> probably he should lobby Microsoft and the Open Group (and possibly
> others for less common platforms) to get together and develop a
> portable, modern C++ API to the underlying OS services.
>
> Even if the two were to develop independent, incompatible API's it
> would be an improvement over the current situation.
>
> Cheers,
> Nicola Musatti

Hello Nicola,

I've been away on business for a week and am enjoying the wonderful
and thoughtful responses to Andrei's post.

I agree with you. As stated elsewhere in this thread, "most
programmers are not sufficiently trained in the use of threads --
this leads to subtle problems the original developer will not find."
At least that is a paraphrase of the post.

Before we can argue for a far-reaching concept to be put into a
compliant C/C++ compiler, we must first agree on what multi-
threading concepts would be supported. Since there is not much
standardization in most shops on how to perform synchronization
on a single object, how can we expect to define such abstractions
for all potential users?

The STL and Boost are nice tools that came a bit late in the game.
It took a relatively long time for many people's ideas of what
something like a std::string should look like to converge. I doubt
the STL even covers what most people need or works in the way that
they first expect. It takes time for a developer to understand and
be proficient with all the concepts a given toolset provides them.

Multi-threading concepts are useful tools to the skilled developer.
Since our society allows for anyone to develop nearly anything, it
might be useful to place an industrial strength warning label on
any proposed multi-threading toolset. Something like "WARNING:
USE OF THIS CONCEPT WILL LIKELY PROVE FATAL TO YOUR PRODUCT AT AN
UNFORESEEN TIME IN THE FUTURE. USE THIS POWER-TOOL WISELY TO
ACHIEVE YOUR GOALS."

I'd add a disclaimer about non-certified developers or apprenticeship
but my opinion on that topic is that most, if not all developers,
must lose a foot or two when learning to use new concepts. We all make
mistakes and tend to want to create our masterpiece on the first try
without making a few models or learning to build smaller bridges
first.

I'd also like to comment on the thought that "the compiler" is the
right place to add a multi-threading concept. Concepts must be supported
cooperatively by all the components used to develop and run the
resulting product. For instance, there were many good developers and
organizations that were working with threads on the Intel processors
for years that found their models broke when symetric multi-processor
systems were introduced and again later when the hyper-threading
processors came out. You could have a wonderful compiler and toolset
that creates a valid product, and have the next generation of hardware
or toolset upset the delicate balance that you once had and perhaps
later do not understand why your product now fails.

I'll post this later in a new thread but I'll mention it here too.
I was reading the July 2004 issue of the IEEE Computer magazine and
its focus covered many of the topics we are discussing in this thread.
It is fairly deep and is great reading. One of the concepts they
covered is the new EDGE model of processor design. They covered a
processor that is being designed, along with all the support tools
and language components, to support a massively parallel CPU. Their
first hardware will load up to 1,024 instructions as a single block
that will be run on 128 smaller processors that may, or may not, be
cooperating on the same problem. They just blew our whole concept
of multi-threading to pieces with something much larger to
comprehend. For those of you who aren't a member of the IEEE-CS
or take this publication, you may be able to find it at your local
library.

David

David

unread,
Aug 27, 2004, 7:15:01 AM8/27/04
to
Hello Paul,

I'll get to the rest of your message tomorrow. I wanted to comment
on your last point before I lost my train of thought.

On Fri, 20 Aug 2004 01:22:26 UTC, "Balog Pal" <pa...@lib.hu> wrote:

> "David" <FlyLike...@United.Com> wrote in message
> news:rOdGr40LMPU3-pn2-oQeJjQNZyuoY@localhost...
>

> > While I understand your position, it would be unreasonable to add
> > a thread concept to either the C or C++ standards. The primary
> > reason for this would be that the compiler would need to know how
> > the target computer implements threads and if it did not, for it to
> > provide some underlying code to pretend that threading did exist.
>

<snip>


>
> > C/C++ was not intended to provide the notion of even a

> > single thread -- it defines a sequence of operations.
>
> That's how generally a thread is defined IMHO. ;-) So we already have
> defined behavior of a single thread. TODO is defining interaction with
> another.
>
> Paul

That may be your impression of a thread, but it's not necessarily true.
The language you write in C/C++/Java/etc may show you a concept of a
thread for a given code fragment. However, your compiler, subsequent
tools, or even the hardware, may choose to perform your sequence of
operations in a different and hopefully equivalent manner. They may
even choose to substitute entirely different and hopefully equivalent
operations. You are telling all your tools how to perform a task.
As the tools change, so may what they do just below the level of
our understanding. That can become very important. What we
define in the language we work in (our abstraction layer) may become
something totally different when it reaches the target machine.
We have some control over this through compiler and linker switches.
The hardware itself could perform additional transformations that
we don't plan on. We have to understand our tools, what they
are capable of doing, when they can be safely used, and when they
might fail or otherwise behave unexpectedly.

Hyman Rosen

unread,
Aug 27, 2004, 10:36:39 PM8/27/04
to
David wrote:
> Since there is not much standardization in most shops on how to
> perform synchronization on a single object, how can we expect to
> define such abstractions for all potential users?

By looking at the experience of programming languages which
have already successfully done so. Like Ada.

> their models broke

This is why we keep saying that MT must be known to the compiler.
The people whose models broke were relying on assumed rather than
guaranteed behavior.

Joe Seigh

unread,
Aug 29, 2004, 7:10:24 AM8/29/04
to

Hyman Rosen wrote:
>
> David wrote:
> > Since there is not much standardization in most shops on how to
> > perform synchronization on a single object, how can we expect to
> > define such abstractions for all potential users?
>
> By looking at the experience of programming languages which
> have already successfully done so. Like Ada.

The original post that started this was about support for things
like lock-free multi-threaded programming. Ada has statically
preserved the state of multi-threaded programming as it was when
it was defined in the 80's. If there are frontiers in concurrent
programming to be conquered, it won't be with Ada. It almost
didn't happen in Java either. Somebody incorporated a lock-free
algorithm in AWTEventMulticaster that wasn't supported by the original
Java memory model. Bug reports were filed. Rather than fix it,
Java support asked for a testcase that would reproduce the bug, knowing
that it would not show up in the current JVM implementations. They later
"fixed" the problem with JSR133/166, which officially allows that and
other techniques for lock-free programming. But note that these lock-free
techniques came from outside Java.

>
> > their models broke
>
> This is why we keep saying that MT must be known to the compiler.
> The people whose models broke were relying on assumed rather than
> guaranteed behavior.


The models referred to as breaking were those used when going from
single-processor multi-threading to multi-processor multi-threading. This
lesson was learned in the late 60's by IBM on the mainframes. Of course
they didn't have things like POSIX pthreads back then. But POSIX doesn't
help for things like hyperthreading, because POSIX has no concept of cache,
which may work for or against you when using hyperthreading.

But you can't future-proof your code. You can try. You could just run
as a single thread and that will always work. But it's not going to be
feasible for implementing a server on a 1024-processor system. Even
POSIX in its current form won't help. POSIX has no forward-progress
guarantees, and that, along with its having no notion of cache and the
fact that cache contention will be extremely expensive on a 1024-way
system, means that scalability will become an important issue. And
scalability is one of the issues being addressed in current research
on lock-free algorithms.

I don't think the language can solve all these problems, but it can
certainly be less of an obstacle to solving them with better support
at some level. It just won't be the all-encompassing solution that some
people have in mind.

Joe Seigh

Jim Rogers

unread,
Aug 29, 2004, 6:44:40 PM8/29/04
to
Joe Seigh <jsei...@xemaps.com> wrote in
news:41308AE0...@xemaps.com:

> The original post that started this was about support for things
> like lock-free multi-threaded programming. Ada has statically
> preserved the state of multi-threaded programming as it was when
> it was defined in the 80's. If there are frontiers in concurrent
> programming to be conquered, it won't be with Ada.

<< SNIP >>

The Ada language reference model does not preclude the use of lock-free
mechanisms. In fact, lock-free mechanisms have been used by some
compiler vendors since the 1995 standard of Ada.

Ada protected objects are used to provide shared data buffers between
concurrent tasks. There are three kinds of operations allowed on a
protected object.

Protected procedures provide unconditional exclusive access to the
protected object. Procedures are used to alter the state of the protected
object without any associated condition or condition variable.

Protected entries provide conditional exclusive access to the protected
object. Entries are used to alter the state of a protected object when
that object is in a specified state. Entries always have an associated
queue for tasks waiting for the condition variable to become true.

Protected functions provide unconditional shared access to the protected
object. Functions provide read-only access to the protected object.

The Ada standard does not specify the mechanism to achieve exclusive
access for procedures and entries. A conforming Ada compiler can use
locks, and many have. A conforming Ada compiler can also use lock-free
mechanisms. The requirement for exclusive access is functional. It
describes the abstract condition in effect during the execution of a
procedure or entry. The requirement does not specify the mechanism or
implementation.

The requirement for exclusive access of procedures and entries can be
implemented in a manner that is subject to priority-inversion deadlocks.
Some Ada compiler vendors have fixed that problem by the clever use of
a simple approach. While a task is executing a protected procedure or
entry its priority is temporarily promoted to the highest priority on
its processor. That allows the task to execute without interruption
and without the use of a lock. The task priority is returned to its
"normal" level upon completion of the protected procedure or entry.

Ada compiler vendors are free to implement lock-free mechanisms at
any time. They can do so without violating the standard. Why, then,
would Ada implementations be left without lock-free mechanisms that
are shown to be reliable, deterministic, and efficient?

This is an example of the point I tried to make earlier, that many
people in this discussion are thinking at the wrong level. Trying to
specify a C++ threading standard at the implementation level is both
non-portable and future-limiting. Any C++ standard for threading should
be specified at the functional, abstract level. This allows each
compiler implementation to achieve efficiencies based upon the target
processing environment. It also allows compiler vendors to include new
mechanisms, such as lock-free mechanisms, without violating or
"extending" the standard.

Jim Rogers

Andrei Alexandrescu (See Website for Email)

unread,
Aug 30, 2004, 7:23:21 PM8/30/04
to
"Jim Rogers" <jimmaure...@worldnet.att.net> wrote in message
news:Xns955454524BE1D...@204.127.36.1...

> The Ada standard does not specify the mechanism to achieve exclusive
> access for procedures and entries. A conforming Ada compiler can use
> locks, and many have. A conforming Ada compiler can also use lock-free
> mechanisms. The requirement for exclusive access is functional. It
> describes the abstract condition in effect during the execution of a
> procedure or entry. The requirement does not specify the mechanism or
> implementation.

I doubt that could fly within the next 20-30 years. Lock-free programming is
"optimistic concurrency control", which means an atomic, reversible operation
is tried again and again until it succeeds. Finding the smallest units of
work that take the system reversibly from one stable state to the next is
pure voodoo, and I highly doubt that, with today's technology, an optimizing
compiler could simply generate lock-free code for a procedure.
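
The retry loop in question has a characteristic shape; here it is for a
lock-free stack push (cas() is a hypothetical compare-and-swap returning
true on success; this is a hand-written illustration, not anything a
compiler could derive from a locked version):

void push(Node* volatile & head, Node* node) {
    Node* old;
    do {
        old = head;                   // snapshot the current state
        node->next = old;             // build the tentative new state
    } while (!cas(&head, old, node)); // commit only if nobody interfered;
}                                     // otherwise undo is free: just retry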

Andrei

Joe Seigh

unread,
Aug 31, 2004, 6:05:59 AM8/31/04
to

Jim Rogers wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in
> news:41308AE0...@xemaps.com:
>
> > The original post that started this was about support for things
> > like lock-free multi-threaded programmong. Ada has statically
> > preserved the state of multi-threaded programming as it was when
> > it was defined in the 80's. If there are frontiers in concurrent
> > programming to be conquered, it won't be with Ada.
>
> << SNIP >>
>
> The Ada language reference model does not preclude the use of lock-free
> mechanisms. In fact, lock-free mechanisms have been used by some
> compiler vendors since the 1995 standard of Ada.
>

(snip)


> Ada compiler vendors are free to implement lock-free mechanisms at
> any time. They can do so without violating the standard. Why, then,
> would Ada implementations be left without lock-free mechanisms that
> are shown to be reliable, deterministic, and efficient?
>
> This is an example of the point I tried to make earlier, that many
> people in this discussion are thinking at the wrong level. Trying to
> specify a C++ threading standard at the implementation level is both
> non-portable and future limiting. Any C++ standard for threading should
> be specified at the functional, abstract level. This allows each
> compiler implementation to achieve efficiencies based upon the target
> processing environment. It also allows compiler vendors to include new
> mechanisms, such as lock-free mechanisms, without violating or
> "extending" the standard.

Lock-free semantics are different from standard lock semantics. Lock-free
implementations require specific knowledge of the data structures, something
a compiler could not have. There have to be external synchronization constructs
that allow you to do lock-free programming at the application level. Ada does
not have this.
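
As a sketch of what "lock-free programming at the application level" means -
a push onto a Treiber-style stack - with the CAS spelled using C++11 atomics
purely so the fragment is self-contained:

#include <atomic>

struct Node { int value; Node* next; };
std::atomic<Node*> head(nullptr);

void push(int v)
{
Node* n = new Node{v, head.load()};
// Publish n only if head is still what we read into n->next; on
// failure, compare_exchange_weak refreshes n->next with the current
// head and we retry. Knowing that next may be republished this way
// is structure-specific knowledge a compiler cannot infer.
while (!head.compare_exchange_weak(n->next, n)) { }
}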

Joe Seigh

Hans-J. Boehm

unread,
Aug 31, 2004, 11:01:08 PM8/31/04
to
llewelly <llewe...@xmission.dot.com> wrote in message news:<86brh5u...@Zorthluthik.local.bar>...

> I am a long, long way from being a threads expert, but, if I
> understand correctly, the minimal functionality is:
>
> * An atomic operation, which could be any of:
> - atomic compare and swap
> - atomic test and set
> - load-locked + store conditional
>
> * Memory barriers.
>
> * Create a new thread of execution.
>
> * Destroy an existing thread of execution.
>

In my view, the first priority is to make it possible to write
correct lock-based code. Lock-free code comes second. (It's
much harder to get right, and in my experience not always the
best performing solution. I do agree that there are important
cases in which it's the only solution with reasonable performance,
and it needs to be handled eventually.)

Currently there is no way to write guaranteed-correct lock-based
C or C++ code. The main problems, as I see them, are:

0) Since there is no discussion of threads in the standards, there
is no language-based specification that even prohibits reordering of
memory operations and locking operations. For Windows C++ users, I
know of no place this is specified at all. For the rest of this,
I'll concentrate on pthreads, which does try to specify some
things.

1) There is no specification (C, C++, or pthreads) that prohibits
extra writes to memory that may introduce races, not visible in the
source. Compilers regularly introduce such writes, and sometimes
they have to. As an example, consider:

struct { short a; char b; char c; } x;

where the programmer intended a and c to be protected by one lock,
and b by another. If I write

x.a = 17; x.c = 'a';

the compiler may choose to read x.b, assemble the word representing
the resulting structure in a register, and write the resulting (32-bit?)
quantity back to x with a single store instruction. But that may
lose a concurrent update to x.b.

The current pthreads spec explicitly allows this, no matter what the
field types are. I believe it even allows it for adjacent (after
possible reordering) global variables. I believe this needs
to be severely constrained to have any hope of writing correct portable
multithreaded code. (In Java, a blanket prohibition of this sort
of thing is clearly called for. In C/C++, bit fields complicate matters
a bit. And there are rumored to be architectures that can't do byte
writes. I would argue that those are unsuitable for multithreaded use
with C/C++ or at least they can't pack bytes by default.)
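
A sketch of that transformation, written out as the code a compiler might
effectively generate (the word size and layout here are assumptions):

#include <string.h>

struct S { short a; char b; char c; };

void update(struct S *x)
{
struct S tmp;
memcpy(&tmp, x, sizeof tmp); /* reads b along with a and c */
tmp.a = 17;
tmp.c = 'a';
memcpy(x, &tmp, sizeof tmp); /* one word-sized store writes b back too,
losing any update made concurrently
under b's own lock */
}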

2) Even the pthreads standard is very unclear about prohibiting
movement of memory operations around locks. Consider the
C code (this isn't really possible with the usual C++ locking
style, though it's certainly possible in the C++ language):

for (...) {
if (...) pthread_mutex_lock(...);
x = ... x ...
if (...) pthread_mutex_unlock(...);
}

(x is a global protected by the lock. It is common to acquire
locks conditionally like this to avoid the locking overhead in
the single threaded case.)

Assume the compiler determines (e.g. based on profile feedback)
that the conditionals are usually not taken, e.g. because this
application rarely creates a second thread. Ignoring the
semantics of pthread_mutex_lock() and pthread_mutex_unlock(),
it is beneficial to promote x to a register r in the loop,
which gets us

r = x;
for (...) {
if (...) { x = r; pthread_mutex_lock(...); r = x; }
r = ... r ...
if (...) { x = r; pthread_mutex_unlock(...); r = x; }
}
x = r;

The pthread standard requires that memory must be
"synchronized with" the logical program state at the
pthread_mutex_lock() and pthread_mutex_unlock() calls. By a
straightforward interpretation of that statement, I think it's
satisfied by the transformation. The problem is that I've
introduced extra reads and writes outside the lock, and thus the
resulting code is completely broken, in spite of the fact that
the implementation seems to satisfy the letter of the spec, and
is performing transformations that are reasonable without threads.
(This was extracted from a real failure case. Unfortunately,
fixing it imposes a real performance penalty, since you can't in
general tell whether a function call will have lock() or unlock()
semantics.)

Once we get past these, we can look at the issue for lock-free
code. Here I think that the Java model basically gives us the
right answers, though there are no doubt a few minor issues that
we want to reconsider since C++ is positioned differently.

Once you get to lock-free algorithms, I'd like to make a pitch for
primitives that look more like the atomic_ops package distributed with
qprof. It is described briefly at
http://www.hpl.hp.com/research/linux/qprof/README_atomic_ops.txt
The important distinction is that it implements operations which are
pairs of (atomic operation, ordering semantics). This allows
programmers to express more precisely what is required in a particular
implementation, and allows it to be automatically and efficiently implemented
on a variety of hardware. In particular, it can take advantage of
something like the (rather elegant and efficient, in my opinion) acquire-load
and release-store instructions on Itanium.
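
A sketch of the (atomic operation, ordering semantics) pairing, using names
along the lines of those in the README (treat the exact signatures as
approximate and check the README itself):

#include "atomic_ops.h"

AO_t ready = 0;
int data;

void producer(void)
{
data = 42;
AO_store_release(&ready, 1); /* store paired with release ordering */
}

int consumer(void)
{
while (!AO_load_acquire(&ready)) /* load paired with acquire ordering */
;
return data; /* sees the producer's write to data */
}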

Hans

llewelly

unread,
Sep 1, 2004, 6:08:17 AM9/1/04
to
Hans_...@hp.com (Hans-J. Boehm) writes:
[snip]

> 1) There is no specification (C, C++, or pthreads) that prohibits
> extra writes to memory that may introduce races, not visible in the
> source. Compilers regularly introduce such writes, and sometimes
> they have to. As an example, consider:
>
> struct { short a; char b; char c; } x;
>
> where the programmer intended a and c to be protected by one lock,
> and b by another. If I write
>
> x.a = 17; x.c = 'a';
>
> the compiler may choose to read x.b, assemble the word representing
> the resulting structure in a register, and write the resulting (32-bit?)
> quantity back to x with a single store instruction. But that may
> lose a concurrent update to x.b.
>
> The current pthreads spec explicitly allows this, no matter what the
> field types are. I believe it even allows it for adjacent (after
> possible reordering) global variables. I believe this needs
> to be severely constrained to have any hope of writing correct portable
> multithreaded code. (In Java, a blanket prohibition of this sort
> of thing is clearly called for. In C/C++, bit fields complicate matters
> a bit. And there are rumored to be architectures that can't do byte
> writes. I would argue that those are unsuitable for multithreaded use
> with C/C++ or at least they can't pack bytes by default.)
[snip]

Here are some notions:

(a) Provide a qualifier which specifies that an object so
qualified is not allowed to be affected by such actions:

thread_shared: If an object is declared 'thread_shared', reads
and writes to other objects shall not result in reads or writes
to any thread_shared object. [Note: On some architectures,
padding will be required for thread_shared objects smaller than
the architecture's word size.] A bitfield cannot be
thread_shared, unless it is a member of a compound object which
is thread_shared. It seems to me 'thread_shared' would most
naturally be a cv-qualifier, but I am far from certain that is
the right approach. The goal is to prohibit extraneous reads and
writes where the programmer specifies they be prohibited, and
nowhere else.

(b) Require that objects with an alignment at least as strict as int
shall not be read or written to as a result of reads or writes to
other objects. The goal is to prohibit extraneous reads or writes
for objects large enough to allow individual writes.

Both (a) and (b) require that the standard define what 'reads' and
'writes' are, and what operations result in 'reads' and/or
'writes' to what objects.

I am suggesting these ideas primarily to find out what people with
more threads expertise than me think of them.
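
Hypothetical usage of (a), reusing the struct from Hans's example (the
keyword exists in no compiler; this is only to illustrate the intent):

struct S {
short a;
thread_shared char b; /* writes to a or c shall not read or rewrite b;
the implementation may pad around b */
char c;
};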

Alexander Terekhov

unread,
Sep 1, 2004, 4:44:23 PM9/1/04
to

"Hans-J. Boehm" wrote:

[... memory isolation ...]

You might want to take a look at this thread:

http://google.com/groups?threadm=EN7u9.8900%24Af5.345070%40newsfep2-win.server.ntli.net
(Subject: Re: Memory isolation)

Here's copy&paste.

----
> 2. "isolation" scopes [might also be nested; possibly] for defs of
> objects of static storage duration and non-static class members:
>
> isolated {
>
> static char a;
> static char b;
> static mutex m1;
>
> }
>
> isolated {
>
> static char c;
> static char d;
> static mutex m2;
>
> }

For ease of comparison I'll re-write your examples as I would
write them if the isolation rules I proposed were in place.
(Just to show that we don't *need* a new keyword.)

static struct {
char a;
char b;
mutex m1;
} e;
static struct {
char c;
char d;
mutex m2;
} f;

or

static char a;
static char b;
static mutex m1;
static char c;
static char d;
static mutex m2;

That last is "over-isolated" compared to the others,
but given the relatively small amount of static data
in most programs any extra padding is likely to be
negligible (and the mutexes will most likely already be
aligned and padded to avoid cross-thread
interference in any case).

(Corrections applied to following.)
> struct something {
>
> isolated {
>
> char a;
> char b;
> mutex m1;
>
> }
>
> isolated {
>
> char c;
> char d;
> mutex m2;
>
> }
>
> } s; // isolated by default -- see below

struct something {
struct internal {
char a;
char b;
mutex m1;
};
struct internal c;
struct internal d;
} s;
> This would allow one to clearly express isolation boundaries.

Your use of the "isolated" keyword is sufficient, but it's
not necessary.

> By default, definitions of objects of static storage duration
> shall be treated as being isolated from each other:
>
> static char a; // isolated { static char a; }
> static char b; // isolated { static char b; }

Yes, I agree, and so do the rules I posted.

> Objects of automatic storage duration need NOT be isolated
> [the isolation of the entire thread stack aside] unless an
> address/ref is taken and it can't be proven that access to
> it from some other thread is impossible.

I agree with your intent, but you don't need to say that.
Just say they "shall be isolated" and let the implementations
figure out what they actually have to do to achieve it for
each object. If that's nothing and they can easily deduce
that during compilation, they will do.

> 3. Array elements can be made isolated ONLY using class type
> with "isolated" member(s):
>
> char c_array[2]; // no isolation with respect to elems

This is the same with my model.

(Corrections applied to following.)
> struct isolated_char {
>
> isolated { char c; }
>
> } ic_array[2]; // fully isolated ic_array[0].c
> // and ic_array[1].c

struct ichar { char c; } ic_array[2];

I think the two sets of rules are the same except that
in your model sub-objects or array elements of
class-type are not guaranteed to be isolated from
their "sibling" sub-objects or elements, and in mine
they are. Both models work.

In other words your "isolated" has the same semantics
with respect to isolation as sub-object structs/classes/
unions in mine.

I don't think there is any real difference in the
isolation boundaries achievable with the two sets
of rules: they just differ in their defaults and how
you control them.

So, the default amount of isolation in your model
is slightly less than in mine, which means you are
forced to hand-tweak the isolation boundaries
slightly more often to ensure isolation safety. In
favour of your rules you will undoubtedly save a
few bytes here and there in many programs. Since
ideally we would just be standardising current
practice, it would be interesting to know what current
compilers do with respect to isolation units (if they
even consider them).

The other main difference is the addition of the
keyword. Why are new keywords a bad thing?

- Any programs that already use the identifier
"isolated" (e.g. for a variable or type) will break. If
you choose the uglier "__isolated" instead you would
avoid breaking standard-conforming programs
provided no compiler vendor had already used this
as an extension. (The language standards say that
names containing double underscores are reserved
for implementations.)

- You create a mismatch between pre- and post-
isolated-aware code. Any single use of the new keyword
means the program will not compile on a pre-isolated
compiler.

> 4. Introduce something ala offsetof-"magic" with respect to
> alignment/padding that would provide the means to write
> thread-shared *AND* thread-private allocators entirely in
> standard C/C++.

What problems are there at the moment that wouldn't
be fixed by either set of suggested isolation rules?

> 5. In the single threaded "mode", isolation scopes can simply
> be ignored.

Again, you are right, but you do not need to say so
explicitly: implementations can figure that out for
themselves, provided they can tell the difference
between a single- and a multi-threaded build.

I bet that most of them will choose to keep the sizes
of all types the same across the different build models
though. Doing otherwise is not wrong but is likely to
break code that works but assumes more than it
should about structure layouts. (Some people think it
is a good thing for compilers to go out of their way to
break non-conforming code, though. Maybe I'm too
soft ;-)

Kind regards

Garry Lancaster
Codemill Ltd
----

regards,
alexander.

David Abrahams

unread,
Sep 1, 2004, 4:50:02 PM9/1/04
to
Hans_...@hp.com (Hans-J. Boehm) writes:

> r = x;
> for (...) {
> if (...) { x = r; pthread_mutex_lock(...); r = x; }
> r = ... r ...
> if (...) { x = r; pthread_mutex_unlock(...); r = x; }
> }
> x = r;
>
> The pthread standard requires that memory must be
> "synchronized with" the logical program state at the
> pthread_mutex_lock() and pthread_mutex_unlock() calls. By a
> straightforward interpretation of that statement, I think it's
> satisfied by the transformation.

The only way I can see to support that interpretation is by
considering "memory" to include registers, but in the context of MT,
that seems like a perverse way to read "memory." It seems to me that
"memory must be synchronized" *has* to mean that the state of all
variables is visible to all threads and processors. Variables whose
values are stored in registers (and in a SMP system, processor-local
caches) must have been written to main (in an SMP system, shared)
memory.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Joe Seigh

unread,
Sep 1, 2004, 5:20:40 PM9/1/04
to

"Hans-J. Boehm" wrote:
>
> 1) There is no specification (C, C++, or pthreads) that prohibits
> extra writes to memory that may introduce races, not visible in the
> source. Compilers regularly introduce such writes, and sometimes
> they have to. As an example, consider:
>
> struct { short a; char b; char c; } x;
>
> where the programmer intended a and c to be protected by one lock,
> and b by another. If I write
>
> x.a = 17; x.c = 'a';
>
> the compiler may choose to read x.b, assemble the word representing
> the resulting structure in a register, and write the resulting (32-bit?)
> quantity back to x with a single store instruction. But that may
> lose a concurrent update to x.b.
>
> The current pthreads spec explicitly allows this, no matter what the
> field types are. I believe it even allows it for adjacent (after
> possible reordering) global variables. I believe this needs
> to be severely constrained to have any hope of writing correct portable
> multithreaded code. (In Java, a blanket prohibition of this sort
> of thing is clearly called for. In C/C++, bit fields complicate matters
> a bit. And there are rumored to be architectures that can't do byte
> writes. I would argue that those are unsuitable for multithreaded use
> with C/C++ or at least they can't pack bytes by default.)

This is called word tearing in c.p.t. (comp.programming.threads). It's a
known issue/problem in that it is not addressed by Posix pthreads. I don't
know why, except that part of the solution might require cooperation from
the ANSI C committee.

AFAIK Posix pthread implementations just use normal compiler behavior
for external function calls. That's a little stricter than the unwritten
Posix rules require, but I don't think any compiler does anything
special for Posix lock and unlock functions like it might do for
something like setjmp().

What I suspect may have happened in the above example is either the
compiler had a bug (unlikely) or somebody made incorrect assumptions
about forward progress of visibility of memory access. Posix makes
no guarantees about forward progress of memory visibility when Posix
synchronization is not used. If you make a store into a variable,
there is no guarantee that any other thread will see that store
within a fixed amount of time. Even if you get a lock. Unless of
course you have to see the store to be consistent with the synchronization
of other stores by Posix lock functions. Of course, if you make
any kind of external function call or use volatile on the variables,
then the compiler will issue memory loads and stores for the variable,
and you will see the updates PDQ on computers with strongly coherent
cache even without Posix synchronization.



>
> Once we get past these, we can look at the issue for lock-free
> code. Here I think that the Java model basically gives us the
> right answers, though there are no doubt a few minor issues that
> we want to reconsider since C++ is positioned differently.
>
> Once you get to lock-free algorithms, I'd like to make a pitch for
> primitives that look more like the atomic_ops package pakaged with
> qprof. It is described briefly at
> http://www.hpl.hp.com/research/linux/qprof/README_atomic_ops.txt
> The important distinction is that it implements operations which are
> pairs of (atomic operation, ordering semantics). This allows
> programmers to express more precisely what is required in a particular
> implementation, and allows it to be automatically and efficiently implemented
> on a variety of hardware. In particular, it can take advantage of
> something like the (rather elegant and efficient, in my opinion) acquire-load
> and release-store instructions on Itanium.
>

Sort of like some of the JSR-166 stuff and the apocryphal atomic<T> and atomic<T*>
stuff proposed in c.p.t. acquire/release are useful, but the Itanium memory model
isn't generally applicable elsewhere. In c.p.t. we informally define them as equivalent
to the effects of lock and unlock respectively on memory visibility. It's
informal since there can be variation in how you define them w.r.t. things such
as immutability, dependent loads, etc...

Joe Seigh

Ben Hutchings

unread,
Sep 1, 2004, 5:31:48 PM9/1/04
to
llewelly wrote:
> Hans_...@hp.com (Hans-J. Boehm) writes:
> [snip]
> > 1) There is no specification (C, C++, or pthreads) that prohibits
> > extra writes to memory that may introduce races, not visible in the
> > source. Compilers regularly introduce such writes, and sometimes
> > they have to. As an example, consider:
> >
> > struct { short a; char b; char c; } x;
> >
> > where the programmer intended a and c to be protected by one lock,
> > and b by another. If I write
> >
> > x.a = 17; x.c = 'a';
> >
> > the compiler may choose to read x.b, assemble the word representing
> > the resulting structure in a register, and write the resulting (32-bit?)
> > quantity back to x with a single store instruction. But that may
> > lose a concurrent update to x.b.
<snip>
> Here are some notions:
>
> (a) Provide a qualifier which specifies that an object so
> qualified is not allowed to be affected by such actions:
<snip>

Like "volatile", I think this would be more trouble than it's worth.

> (b) Require that objects with an alignment at least as strict as int
> shall not be read or written to as a result of reads or writes to
> other objects. The goal is to prohibit extraneous reads or writes
> for objects large enough to allow individual writes.

<snip>

Reading bits of other objects only to discard them does not seem to
affect correctness - though it certainly can affect speed - and must
be expected since cache lines are typically 16-64 bytes long. Only
writing seems to be problematic.

I think that the requirement in (b) should apply to some type alias
like thread_atomic_t (by analogy with sig_atomic_t) and not to a
specific standard type.

Another useful guarantee would be that modification of two objects
need not be synchronised so long as their complete objects (1.8/3) are
distinct. This would only require some padding between small static
objects and any small automatic objects that may become accessible to
another thread.
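
For example (hypothetical, under that proposed guarantee):

static char a; /* distinct complete objects: these could be modified by */
static char b; /* different threads with no synchronisation, the
implementation padding them apart if necessary */

struct P { char x; char y; } p; /* p.x and p.y are sub-objects of one
complete object, so no such guarantee
would apply between them */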

> Both (a) and (b) require that the standard define what 'reads' and
> 'writes' are, and what operations result in 'reads' and/or
> 'writes' to what objects.

<snip>

I think it's simple enough to define when an object is written:
when a built-in assignment operation is performed on the object or a
sub-object, and when the object is constructed or destroyed if it is
of class type. (Though the meaning of "when" here is dependent on
the definition of sequencing of operations.)

The hard part is defining when memory is or may be written.

--
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.

Hans-J. Boehm

unread,
Sep 1, 2004, 5:44:21 PM9/1/04
to
llewelly <llewe...@xmission.dot.com> wrote in message news:<86u0uio...@Zorthluthik.local.bar>...

> Hans_...@hp.com (Hans-J. Boehm) writes:
> [snip]
> > 1) There is no specification (C, C++, or pthreads) that prohibits
> > extra writes to memory that may introduce races, not visible in the
> > source. Compilers regularly introduce such writes, and sometimes
> > they have to. As an example, consider:
> >
> > struct { short a; char b; char c; } x;
> >
> > where the programmer intended a and c to be protected by one lock,
> > and b by another. If I write
> >
> > x.a = 17; x.c = 'a';
> >
>
> [snip]
>
> Here are some notions:
>
> (a) Provide a qualifier which specifies that an object so
> qualified is not allowed to be affected by such actions:
>
> thread_shared: If an object is declared 'thread_shared', reads
> and writes to other objects shall not result in reads or writes
> to any thread_shared object. [Note: On some architectures,
> padding will be required for thread_shared objects smaller than
> the architecture's word size. ] A bitfield cannot be
> thread_shared, unless it is a member of a compound object which
> is thread_shared . It seems to me 'thread_shared' would most
> naturally be a cv-qualifier, but I am far from certain that is
> the right approach. The goal is to prohibit extraneous reads and
> writes where the programmer specifies it be prohibited, and
> no-where else.
>
> (b) Require that objects with an alignment at least as strict as int
> shall not be read or written to as a result of reads or writes to
> other objects. The goal is to prohibit extraneous reads or writes
> for objects large enough to allow individual writes.
>
Clearly we need something along these lines. My strawman proposal
would be simply:

"Writes to bit fields may read and rewrite adjacent bit-fields. No
other writes may occur unless they are specified in the source."
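
To illustrate the bit-field carve-out in that rule (sketch):

struct Flags {
unsigned f0 : 1; /* f0 and f1 share a storage unit, so under the rule
"s.f0 = 1;" may still read and rewrite f1; the two
must therefore be protected by the same lock */
unsigned f1 : 1;
};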

For provably single-threaded code, this would have no effect, since
the compiler can't get caught if it cheats. In practice, this may
imply a "I promise this is single-threaded" compiler flag, which could
take the place of the current "multithreaded" flag found on some
compilers.

This rule would introduce lots of spurious padding for multiprocessor
architectures not supporting byte writes. This was true for some very
early Alphas. Are there any modern machines for which it is still an
issue?

(If a uniprocessor uses a preemptive thread implementation and doesn't
support byte-width writes, you could work around the problem by
inhibiting preemption between the extra read and rewrite of the
adjacent field. This is extra work for the implementor, but probably
much less extra work than you would otherwise get from the
intermittent application failures due to client bugs.)

I think there are two kinds of problems with the thread_shared
approach:

1) It would have to apply to fields that are shared between threads,
but are protected by a lock. In practice I think it is critical that
you be able to take a single-threaded library, wrap locks around all
the calls, and continue to use it in a multithreaded context, since
you are now assured that at most one thread will be in the library at
a time. And that generally works at the moment. A "thread_shared"
attribute would force modification of the library code.

2) It would have to be applied to fields that are in fact not shared
between threads. In my example above, the same problem applies if a
and c are protected by a lock, and b is accessed by a single thread.
And in this case it's really b that you want to protect.

Your proposal (b) would be acceptable to me, if there is a strong
reason for it (e.g. an architecture that's likely to be of tremendous
practical importance around 3 years from now and that can handle int- but
not char-width stores). I personally think that's unlikely, since
other practically important languages (Java, and I think C#) won't be
efficiently implementable on such multiprocessors. (Embedded
multi-core processors may be an issue. But I think ARM and most other
major architectures already support byte stores, at least at the
macro-architecture level.)

I agree that my strawman proposal may seem a bit drastic, in that it
effectively imposes an architectural requirement on the hardware. But
I think we've decided that the future lies with multiprocessors and
multithreading. And we all believe that reliable code is important.
I don't think the combination is possible without some real guarantees
in this area.

Hans

Stefan Heinzmann

unread,
Sep 2, 2004, 9:26:01 PM9/2/04
to
Hans-J. Boehm wrote:

> Clearly we need something along these lines. My strawman proposal
> would be simply:
>
> "Writes to bit fields may read and rewrite adjacent bit-fields. No
> other writes may occur unless they are specified in the source."
>
> For provably single-threaded code, this would have no effect, since
> the compiler can't get caught if it cheats. In practice, this may
> imply a "I promise this is single-threaded" compiler flag, which could
> take the place of the current "multithreaded" flag found on some
> compilers.
>
> This rule would introduce lots of spurious padding for multiprocessor
> architectures not supporting byte writes. This was true for some very
> early Alphas. Are there any modern machines for which it is still an
> issue?

Yes, I'm currently working with an embedded processor which is only able
to write 16-bit words, i.e. it writes bytes using read-modify-write cycles.

> (If a uniprocessor uses a preemptive thread implementation and doesn't
> support byte-width writes, you could work around the problem by
> inhibiting preemption between the extra read and rewrite of the
> adjacent field. This is extra work for the implementor, but probably
> much less extra work than you would otherwise get from the
> intermittent application failures due to client bugs.)

--
Cheers
Stefan

Gerhard Menzl

unread,
Sep 2, 2004, 9:50:24 PM9/2/04
to
Walter wrote:

> I'm pretty ignorant of the subtleties of multithreaded programming. What
> needs to be done to the core language to have a solid foundation upon which
> to build?

For instance, you have to find a way of defining the behaviour of

struct X
{
X ();
void g ();
};

void f ()
{
static X x;
x.g ();
}

in the presence of multiple threads. The way the initialization of
static locals is currently specified makes it impossible to reliably
prevent the constructor of X from being executed more than once.
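
One portable workaround today, assuming a pthreads platform, is to route the
one-time construction through pthread_once, which runs its init routine
exactly once even with concurrent callers (sketch):

#include <pthread.h>

static pthread_once_t x_once = PTHREAD_ONCE_INIT;
static X* x_instance = 0;

static void init_x() { x_instance = new X; }

void f()
{
pthread_once(&x_once, init_x);
x_instance->g();
}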

--
Gerhard Menzl

Humans may reply by replacing the obviously faked part of my e-mail
address with "kapsch".

Hans-J. Boehm

unread,
Sep 3, 2004, 7:59:52 AM9/3/04
to
David Abrahams <da...@boost-consulting.com> wrote in message news:<uk6vec...@boost-consulting.com>...
> Hans_...@hp.com (Hans-J. Boehm) writes:
[Original program:

for (...) {
if (...) pthread_mutex_lock(...);
x = ... x ...
if (...) pthread_mutex_unlock(...);
}
]

>
> > r = x;
> > for (...) {
> > if (...) { x = r; pthread_mutex_lock(...); r = x; }
> > r = ... r ...
> > if (...) { x = r; pthread_mutex_unlock(...); r = x; }
> > }
> > x = r;
> >
> > The pthread standard requires that memory must be
> > "synchronized with" the logical program state at the
> > pthread_mutex_lock() and pthread_mutex_unlock() calls. By a
> > straightforward interpretation of that statement, I think it's
> > satisfied by the transformation.
>
> The only way I can see to support that interpretation is by
> considering "memory" to include registers, but in the context of MT,
> that seems like a perverse way to read "memory." It seems to me that
> "memory must be synchronized" *has* to mean that the state of all
> variables is visible to all threads and processors. Variables whose
> values are stored in registers (and in a SMP system, processor-local
> caches) must have been written to main (in an SMP system, shared)
> memory.

I think we're misunderstanding each other here. I assert that at the
pthread_mutex_lock() and pthread_mutex_unlock() calls, the value of x
is exactly what it would have been without the transformation. The "x
= r" assignments before the calls were inserted specifically to make
sure this was the case. Thus I don't understand in what way these
functions fail to "synchronize memory with respect to other threads",
which is what the standard requires. My view of memory does not
include registers.

As Joe points out, most compilers in fact seem to just treat
pthread_mutex_lock() and friends as calls to external functions, which
they know nothing about. Thus the state in global variables
automatically has to be correct ("synchronized"?) at the point of the
call. As this example points out, that isn't good enough.

I encountered this example in a production compiler which was
subsequently "fixed". I was told that at least one research compiler
still has this problem. I expect most other production compilers were
also "fixed" when someone discovered this, or they still suffer from
the problem.

I am not convinced that any of these "fixes" are correct; they do
avoid the failures in common cases like the above one. Without a real
spec, it is not even clear what it would mean to be "correct".
Completely avoiding this problem would require (almost?) never
promoting globals to registers, and then restoring and reloading them
around calls to unknown functions, since those functions might
eventually acquire locks. And that unfortunately is probably an
optimization that buys you at least a percent or two on SPEC. (I'm
guessing here, but promoting globals to registers is clearly
important, and many loops contain external calls to handle error
conditions. It may even be significantly more.)

Hans

Hans-J. Boehm

unread,
Sep 3, 2004, 8:09:13 AM9/3/04
to
Alexander Terekhov <tere...@web.de> wrote in message news:<4135A3B5...@web.de>...

> "Hans-J. Boehm" wrote:
>
> [... memory isolation ...]
>
> You might want to take a look at this thread:
>
> http://google.com/groups?threadm=EN7u9.8900%24Af5.345070%40newsfep2-win.server.ntli.net
> (Subject: Re: Memory isolation)
>
> Here's copy&paste.
>
> ----
Thanks very much for the pointer. I was aware that this issue had
been considered previously in the context of the pthreads standard, but
I wasn't aware of this discussion.

It looks like we actually have a reasonable amount of agreement on
this issue, at least to the extent that there should be some
"isolation" guarantee.

Some conclusions/observations after reading this thread:

1) It is not safe to rely on ordering of declarations here. As was
pointed out, this gives you no guarantees. Even more importantly,
objects are not necessarily laid out in memory in the order in which
they are declared. For example, on many architectures "small"
objects are allocated in a separate region for faster access. (I
vaguely recall that one or two aggressive compilers do this for struct
fields with the "I'm compiling SPECbench" compiler option(s), though
that's a lot harder.)

2) I think our concern should be primarily with correctness and not
the false sharing issues in the cache. False sharing is clearly
important for performance, but I think it's much harder to address in
a completely portable way. There will always be some need for
performance tuning on particular platforms using non-portable
techniques.

3) I think we need to avoid any technique that requires a special
declaration to guarantee isolation between either static variables or
dynamically allocated objects, since either of those would prevent
wrapping a legacy single-threaded library inside a lock, something
which I think has to work.

4) It still seems to me that requiring isolation everywhere except
between adjacent bit fields in the same struct or class is probably
the best solution. I couldn't find any reference to any architecture
except Alpha for which this might be problematic. And byte-width
memory operations were introduced into the Alpha architecture quite a
while ago. It does cost a tiny bit of performance in a few cases
(which one could recover by explicit bit-field use). But it has the
substantial advantage that it makes essentially all existing code
correct in this respect. And it (along with one of the other
proposals) requires no new keywords. And I think it's always
preferable to default to safety. Allowing the programmer to shoot
him/herself in the foot may be OK; doing it automatically is not so
good.

Hans

llewelly

unread,
Sep 3, 2004, 9:32:53 PM9/3/04
to
Hans_...@hp.com (Hans-J. Boehm) writes:
[snip]
> Your proposal (b) would be acceptable to me, if there is a strong
> reason (e.g. an architecture that's likely to be of tremendous
> practical importance around 3 years from now) that can handle int- but
> not char-width stores.

I looked into DSPs some time back, and it seemed a majority of them
had this property. I don't know if threads would be appropriate
for any of them, however.

> I personally think that's unlikely, since other practically
> important languages (Java, and I think C#) won't be efficiently
> implementable on such multiprocessors.

Notably, people targeting the above-mentioned DSPs do not use Java
or C#, but instead use C, C++ or some subset of C++, in part for
reasons like that.

[snip]

> I agree that my strawman proposal may seem a bit drastic, in that it
> effectively imposes an architectural requirement on the hardware.

But it's easy to make the requirement a lot less drastic.

> But
> I think we've decided that the future lies with multiprocessors and
> multithreading. And we all believe that reliable code is important.
> I don't think the combination is possible without some real guarantees
> in this area.

Agreed.

Sergei Organov

unread,
Sep 3, 2004, 9:54:51 PM9/3/04
to
Hans_...@hp.com (Hans-J. Boehm) writes:
[...]

> for (...) {
> if (...) pthread_mutex_lock(...);
> x = ... x ...
> if (...) pthread_mutex_unlock(...);
> }

A compiler effectively replaces this with:

> r = x;
> for (...) {
> if (...) { x = r; pthread_mutex_lock(...); r = x; }
> r = ... r ...
> if (...) { x = r; pthread_mutex_unlock(...); r = x; }
> }
> x = r;

[...]


> (This was extracted from a real failure case. Unfortunately,
> fixing it imposes a real performance penalty, since you can't in
> general tell whether a function call will have lock() or unlock()
> semantics.)

Well, I'm aware of the claims that one never needs 'volatile' if locking
is used correctly, but isn't the above an example where declaring 'x'
volatile would "fix" the "broken" compiler?

--
Sergei.

Joe Seigh

unread,
Sep 3, 2004, 10:02:06 PM9/3/04
to

"Hans-J. Boehm" wrote:
>
> David Abrahams <da...@boost-consulting.com> wrote in message news:<uk6vec...@boost-consulting.com>...
> > Hans_...@hp.com (Hans-J. Boehm) writes:
> > The only way I can see to support that interpretation is by
> > considering "memory" to include registers, but in the context of MT,
> > that seems like a perverse way to read "memory." It seems to me that
> > "memory must be synchronized" *has* to mean that the state of all
> > variables is visible to all threads and processors. Variables whose
> > values are stored in registers (and in a SMP system, processor-local
> > caches) must have been written to main (in an SMP system, shared)
> > memory.
>
> I think we're misunderstanding each other here. I assert at the
> pthread_mutex_lock() and pthread_mutex_unlock() calls, the value of x
> is exactly what it would have been without the transformation. The "x
> = r" assignments before the calls were inserted specifically to make
> sure this was the case. Thus I don't understand in what way these
> functions fail to "synchronize memory with respect to other threads",
> which is what the standard requires. My view of memory does not
> include registers.
>
> As Joe points out, most compilers in fact seem to just treat
> pthread_mutex_lock() and friends as calls to external functions, which
> they know nothing about. Thus the state in global variables
> automatically has to be correct ("synchronized"?) at the point of the
> call. As this example points out, that isn't good enough.

Locks are not actually required to "synchronize" memory. All that is required
is that you get a consistent view of shared memory when using locks. Locked
regions can be merged together, and statements outside a locked region can be
moved into it. So you can't make any assumptions about what may or
may not occur when a lock or unlock is performed. Lock and unlock actions aren't
directly observable.

While strongly coherent cache makes stores visible to other processors fairly
quickly, not all systems have that (ccNUMA vs. NUMA). Also, the compiler
optimization of keeping values in registers is caching of a sort. As long
as these follow the unwritten rules of locking, they're not a bug per se. But
it does mess things up if you are assuming the stores become visible as quickly
as if they were directly stored into shared memory.

The atomic APIs "fix" this by causing the compiler to drop optimization, which
then makes stores visible at the speed of the shared memory. But this is
really a side effect of the implementation based on current compiler and
hardware technology. You could get the same effect by putting calls to
any external function in the loop, as long as you didn't care about the
atomicity and ordering of the stores.

What is missing, and really needed, are forward progress mechanisms for
threading. I think these should be tied to signaling and things like volatile
rather than locking, to allow more flexibility in optimization. The atomic APIs
may or may not be one of the things you want to attach forward progress to. It
depends on how you intend to use them.

Joe Seigh

Ben Hutchings

unread,
Sep 3, 2004, 10:21:02 PM9/3/04
to
Stefan Heinzmann wrote:
> Hans-J. Boehm wrote:
>
>> Clearly we need something along these lines. My strawman proposal
>> would be simply:
>>
>> "Writes to bit fields may read and rewrite adjacent bit-fields. No
>> other writes may occur unless they are specified in the source."
>>
>> For provably single-threaded code, this would have no effect, since
>> the compiler can't get caught if it cheats. In practice, this may
>> imply a "I promise this is single-threaded" compiler flag, which could
>> take the place of the current "multithreaded" flag found on some
>> compilers.
>>
>> This rule would introduce lots of spurious padding for multiprocessor
>> architectures not supporting byte writes.

This rule would even affect POD structures in OS APIs, requiring a
change to both C and C++ ABIs on such architectures. That's just too
disruptive to be acceptable to users or implementers. There has to be
a less drastic alternative.

>> This was true for some very
>> early Alphas. Are there any modern machines for which it is still an
>> issue?
>
> Yes, I'm currently working with an embedded processor which is only able
> to write 16-bit words, i.e. it writes bytes using read-modify-write cycles.

If you mean exactly what you say, that's not a problem. The problem
is with processors which may have to read and write memory adjacent to
the object being assigned to and where the read and write are separate
memory accesses.

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'

Rob Williscroft

unread,
Sep 5, 2004, 7:41:03 AM9/5/04
to
Ben Hutchings wrote in
news:slrncjhgeb.83h.b...@shadbolt.i.decadentplace.org.uk in
comp.lang.c++.moderated:

>> Yes, I'm currently working with an embedded processor which is only
>> able to write 16-bit words, i.e. it writes bytes using
>> read-modify-write cycles.
>
> If you mean exactly what you say, that's not a problem.

Did you miss the implication of "to write 16-bit words", that a
"byte" is 8 bits?

If not:

struct X
{
char data[2];
mutex_t mutex[2];
};

The problem exists above if *logically* data[ n ] is guarded by
mutex[ n ] (0 <= n < 2).

The problem also exists if the compiler doesn't put a padding byte
between two adjacent chars { char a, b; }, and they are not
synchronized as a unit.

> ... The problem

Since char is 8 bits, it needs to rewrite data[1] when it updates
data[0], but the program hasn't locked mutex[1], so mutex[1] doesn't
(can't) guard data[1] (also true for data[0]/mutex[0]).

> is with processors which may have to read and write memory adjacent to
> the object being assigned to and where the read and write are separate
> memory accesses.
>

I'm not sure object-level access is even the problem.

Imagine an MP system that only marks blocks of 128 bits (say) as
written to in its caches; if a compiler has CHAR_BIT < 128, then
we have a problem even if the CPUs allow byte (CHAR_BIT) level
addressing.

Rob.
--
http://www.victim-prime.dsl.pipex.com/

Joe Seigh

unread,
Sep 5, 2004, 10:05:21 PM9/5/04
to

Rob Williscroft wrote:
>
> Ben Hutchings wrote in
> news:slrncjhgeb.83h.b...@shadbolt.i.decadentplace.org.uk in
> comp.lang.c++.moderated:

> > is with processors which may have to read and write memory adjacent to
> > the object being assigned to and where the read and write are separate
> > memory accesses.
> >
>
> I'm not sure object-level access is even the problem.
>
> Imagine an MP system that only marks blocks of 128 bits (say) as
> written to in its caches; if a compiler has CHAR_BIT < 128, then
> we have a problem even if the CPUs allow byte (CHAR_BIT) level
> addressing.

Coherent cache addresses and fixes that problem, though you may get
what is called false sharing, a performance problem caused by
allocating shared objects in the same cache line: the line
ping-pongs between caches when separate threads update what they
believe are separate objects. The problem can partly be solved by
how shared objects are allocated from memory.

Joe Seigh

Ben Hutchings

unread,
Sep 6, 2004, 7:19:16 AM9/6/04
to
Rob Williscroft wrote:
> Ben Hutchings wrote in
> news:slrncjhgeb.83h.b...@shadbolt.i.decadentplace.org.uk in
> comp.lang.c++.moderated:
>
> >> Yes, I'm currently working with an embedded processor which is only
> >> able to write 16-bit words, i.e. it writes bytes using
> >> read-modify-write cycles.
> >
> > If you mean exactly what you say, that's not a problem.
>
> Did you miss the implication of "to write 16-bit words", that a
> "byte" is 8 bits?

I assumed that to be the case.

However, I understand "read-modify-write cycle" to mean a single
(therefore atomic) memory transaction. If the read and write are
separate operations and another memory operation could intervene then
I think the use of the word "cycle" is incorrect.

> If not:
>
> struct X
> {
> char data[2];
> mutex_t mutex[2];
> };
>
> The problem exists above if *logically* data[ n ] is guarded by
> mutex[ n ] (0 <= n < 2).
>
> The problem also exists if the compiler doesn't put a padding byte
> between two adjacent char's { char a, b; }, and they are not
> syncronized as a unit.
>
> > ... The problem
>
> char is 8 bits's it need's to rewrite data[1] when it updates data[0],
> but the programme hasn't locked mutex[1], so mutex[1] doesn't (can't)
> guard data[1] (also true for data/mutex[0]).

Given that data[0] and data[1] are updated with a read-modify-write
cycle as opposed to separate read and write operations, an update to
one cannot undo an update to another.

> > is with processors which may have to read and write memory adjacent to
> > the object being assigned to and where the read and write are separate
> > memory accesses.
> >
>
> I'm not sure object-level access is even the problem.
>
> Imagine an MP system that only marks blocks of 128 bits (say) as
> written to in its caches; if a compiler has CHAR_BIT < 128, then
> we have a problem even if the CPUs allow byte (CHAR_BIT) level
> addressing.

I don't see the problem. When a processor needs to modify some part
of memory it requests an exclusive copy of the latest version of the
line. The request may be satisfied by another processor, a lower
level of cache, or main memory. Every other cached copy is marked
invalid. So if two processors are trying to write to memory addresses
corresponding to a single cache line, they will make very slow
progress but they will not lose updates - assuming, of course, that
the writes are atomic. There may well be some MP systems that don't
support this kind of cache coherency, but I think they would simply
be unsuitable for multithreading.

--
Ben Hutchings
Time is nature's way of making sure that everything doesn't happen at once.

Rob Williscroft

unread,
Sep 6, 2004, 4:28:33 PM9/6/04
to
Ben Hutchings wrote in
news:slrncjm6ji.6k1.b...@decadentplace.org.uk in
comp.lang.c++.moderated:

> Rob Williscroft wrote:
> > Ben Hutchings wrote in
> > news:slrncjhgeb.83h.b...@shadbolt.i.decadentplace.org.uk
> > in comp.lang.c++.moderated:
> >
> > >> Yes, I'm currently working with an embedded processor which is
> > >> only able to write 16-bit words, i.e. it writes bytes using
> > >> read-modify-write cycles.
> > >
> > > If you mean exactly what you say, that's not a problem.
> >
> > Did you miss the implication of "to write 16-bit words", that a
> > "byte" is 8 bits?
>
> I assumed that to be the case.
>
> However, I understand "read-modify-write cycle" to mean a single
> (therefore atomic) memory transaction. If the read and write are
> separate operations and another memory operation could intervene then
> I think the use of the word "cycle" is incorrect.
>

Agreed, though from a software POV "instruction" might be better.

I did, however, read an implication that the compiler was applying
these "cycles", so I probably did a mental substitution
s/cycles/multiple instructions/ that you didn't.

> > If not:
> >
> > struct X
> > {
> > char data[2];
> > mutex_t mutex[2];
> > };
> >
> > The problem exists above if *logically* data[ n ] is guarded by
> > mutex[ n ] (0 <= n < 2).
> >
> > The problem also exists if the compiler doesn't put a padding byte
> > between two adjacent chars { char a, b; }, and they are not
> > synchronized as a unit.
> >
> > > ... The problem
> >
> > Since char is 8 bits, it needs to rewrite data[1] when it updates
> > data[0], but the program hasn't locked mutex[1], so mutex[1] doesn't
> > (can't) guard data[1] (also true for data[0]/mutex[0]).
>
> Given that data[0] and data[1] are updated with a read-modify-write
> cycle as opposed to separate read and write operations, an update to
> one cannot undo an update to another.

That is in effect saying the CPU allows thread-safe writes to byte
(8-bit) objects; the problem is it doesn't, only to 16-bit objects.
However, the compiler writer has said (correctly, IMO) that a char
is 8 bits.

>
> > > is with processors which may have to read and write memory
> > > adjacent to the object being assigned to and where the read and
> > > write are separate memory accesses.
> > >
> >
> > I'm not sure object-level access is even the problem.
> >
> > Imagine an MP system that only marks blocks of 128 bits (say) as
> > written to in its caches; if a compiler has CHAR_BIT < 128, then
> > we have a problem even if the CPUs allow byte (CHAR_BIT) level
> > addressing.
>
> I don't see the problem. When a processor needs to modify some part
> of memory it requests an exclusive copy of the latest version of the
> line.

All problems can be solved by an extra layer of indirection.

It can also be solved by having a "line" be CHAR_BIT wide.
Or by having CHAR_BIT be 128.

> The request may be satisfied by another processor, a lower
> level of cache, or main memory. Every other cached copy is marked
> invalid. So if two processors are trying to write to memory addresses
> corresponding to a single cache line, they will make very slow
> progress but they will not lose updates - assuming, of course, that
> the writes are atomic.

A lot of assumptions for a programmer to make with a language Standard
that doesn't even mention threading.

> There may well be some MP systems that don't
> support this kind of cache coherency, but I think they would simply
> be unsuitable for multithreading.
>

Clearly not for Java, say, but for a future MP C++ standard I don't see
why not; the only issue is how to tell the compiler which objects need
to be shared:

shared int i;

struct X { char a, b; };

shared X x;

struct Y
{
int a;
shared char b; /* also implies alignment and size for Y */
};

Also have new and new [] return suitably aligned/sized memory.

Clearly executables would only work with a particular cache
line size, but that would be an implementation (and QOI) detail.

For systems that do have cache coherency, the above would
have no overhead; again, a QOI matter.

Rob.
--
http://www.victim-prime.dsl.pipex.com/

Rob Williscroft

unread,
Sep 6, 2004, 4:33:03 PM9/6/04
to
Joe Seigh wrote in news:413B01D3...@xemaps.com in
comp.lang.c++.moderated:

>
>
> Rob Williscroft wrote:
>>
>> Ben Hutchings wrote in
>> news:slrncjhgeb.83h.b...@shadbolt.i.decadentplace.org.uk
>> in comp.lang.c++.moderated:
>> > is with processors which may have to read and write memory adjacent
>> > to the object being assigned to and where the read and write are
> > separate memory accesses.
>>
>> I'm not sure object-level access is even the problem.
>>
>> Imagine an MP system that only marks blocks of 128 bits (say) as
>> written to in its caches; if a compiler has CHAR_BIT < 128, then
>> we have a problem even if the CPUs allow byte (CHAR_BIT) level
>> addressing.
>
> > Coherent cache addresses and fixes that problem, though you may get
> > what is called false sharing, a performance problem caused by
> > allocating shared objects in the same cache line: the line
> > ping-pongs between caches when separate threads update what they
> > believe are separate objects. The problem can partly be solved by
> > how shared objects are allocated from memory.
>

Is that the *only* way to build an MP system? This strikes me
as MP hardware solving a problem created by single-processor
software, or maybe by compilers/languages that are really only
designed to create single-processor software. IOW, I can see
why it's done this way now, but I don't see that it has to be
done this way.

Is there a standard for such a system? (I suspect not, since
Java has incorporated its own memory model - but maybe that's
a different thing.)

I'm just thinking that maybe in the future, after scaling to
a million-plus processors, the cache management hardware might
be producing more heat than the CPUs themselves, and we will
be tempted to abandon that model; it worked for RISC :-).

Rob.
--
http://www.victim-prime.dsl.pipex.com/

Sean Kelly

unread,
Sep 7, 2004, 6:47:27 AM9/7/04
to
Ben Hutchings <ben-publ...@decadentplace.org.uk> wrote in message news:<slrncjm6ji.6k1.b...@decadentplace.org.uk>...

> Rob Williscroft wrote:
>
> I don't see the problem. When a processor needs to modify some part
> of memory it requests an exclusive copy of the latest version of the
> line. The request may be satisfied by another processor, a lower
> level of cache, or main memory. Every other cached copy is marked
> invalid. So if two processors are trying to write to memory addresses
> corresponding to a single cache line, they will make very slow
> progress but they will not lose updates - assuming, of course, that
> the writes are atomic. There may well be some MP systems that don't
> support this kind of cache coherency, but I think they would simply
> be unsuitable for multithreading.

Exactly. Are there any modern processors that don't support CAS?


Sean

Joe Seigh

unread,
Sep 7, 2004, 5:59:02 PM9/7/04
to

Rob Williscroft wrote:
>
> Joe Seigh wrote in news:413B01D3...@xemaps.com in
> comp.lang.c++.moderated:

> > Coherent cache addresses and fixes that problem, though you may get
> > what is called false sharing, a performance problem caused by
> > allocating shared objects in the same cache line: the line
> > ping-pongs between caches when separate threads update what they
> > believe are separate objects. The problem can partly be solved by
> > how shared objects are allocated from memory.
> >
>
> Is that the *only* way to build an MP system? This strikes me
> as MP hardware solving a problem created by single-processor
> software, or maybe by compilers/languages that are really only
> designed to create single-processor software. IOW, I can see
> why it's done this way now, but I don't see that it has to be
> done this way.
>
> Is there a standard for such a system? (I suspect not, since
> Java has incorporated its own memory model - but maybe that's
> a different thing.)
>
> I'm just thinking that maybe in the future, after scaling to
> a million-plus processors, the cache management hardware might
> be producing more heat than the CPUs themselves, and we will
> be tempted to abandon that model; it worked for RISC :-).
>

If and how they implement cache is a decision for the hardware
designers. Whatever they come up with is an issue only for
the implementers of the threading API, not the users of the
API. There's also the possibility of removing the function
from hardware and doing it in software, a la Tanenbaum's views
on shared virtual objects (vs. shared virtual memory), which
are not the same thing as remote objects.

But if we get up to 1000s of processors, the applications won't
be aware of it or use it until we come up with mechanisms to reduce
contention. The present synchronization APIs don't really work
well up in that range of concurrency.

Joe Seigh

Ben Hutchings

unread,
Sep 7, 2004, 6:17:29 PM9/7/04
to
Rob Williscroft wrote:
> Ben Hutchings wrote in
> news:slrncjm6ji.6k1.b...@decadentplace.org.uk in
> comp.lang.c++.moderated:
>
>> Rob Williscroft wrote:
>> > Ben Hutchings wrote in
>> > news:slrncjhgeb.83h.b...@shadbolt.i.decadentplace.org.uk
>> > in comp.lang.c++.moderated:
<snip>

>> > > is with processors which may have to read and write memory
>> > > adjacent to the object being assigned to and where the read and
>> > > write are separate memory accesses.
>> > >
>> >
>> > I'm not sure object-level access is even the problem.
>> >
>> > Imagine an MP system that only marks blocks of 128 bits (say) as
>> > written to in its caches; if a compiler has CHAR_BIT < 128, then
>> > we have a problem even if the CPUs allow byte (CHAR_BIT) level
>> > addressing.
>>
>> I don't see the problem. When a processor needs to modify some part
>> of memory it requests an exclusive copy of the latest version of the
>> line.
>
> All problems can be solved by an extra layer of indirection.
>
> It can also be solved by having a "line" be CHAR_BIT wide.
> Or by having CHAR_BIT be 128.

Since cache line size can vary between members of the same
architecture, and is now commonly 256 or 512 bits, this really isn't
going to work very well. ;-)

>> The request may be satisfied by another processor, a lower
>> level of cache, or main memory. Every other cached copy is marked
>> invalid. So if two processors are trying to write to memory addresses
>> corresponding to a single cache line, they will make very slow
>> progress but they will not lose updates - assuming, of course, that
>> the writes are atomic.
>
> A lot of assumptions for a programmer with a language Standard
> that doesn't even mention threading to make.

I'm not saying that anyone should make assumptions like this about
today's C++ implementations - I'm stating how I understand today's
computers work and what I think can reasonably be required by a future
standard that does mention threading.

>> There may well be some MP systems that don't
>> support this kind of cache coherency, but I think they would simply
>> be unsuitable for multithreading.
>>
>
> Clearly not for Java, say, but for a future MP C++ standard I don't
> see why not; the only issue is how to tell the compiler what objects
> need to be shared:
>
> shared int i;
>
> struct X { char a, b; };
>
> shared X x;
>
> struct Y
> {
> int a;
> shared char b; /* also implies alignment and size for Y */
> };
>
> Also have new and new [] return suitably aligned/sized memory.
>
> Clearly executables would only work with a particular cache
> line size, but that would be an implementation (and QOI) detail.
>
> For systems that do have cache coherency then the above would
> have no overhead, more QOI.

The introduction of a new keyword is not "no overhead", but I accept
that this might work.

--
Ben Hutchings
Life is what happens to you while you're busy making other plans.
- John Lennon

Jim Melton

unread,
Sep 7, 2004, 6:38:28 PM9/7/04
to
"Rob Williscroft" <r...@freenet.co.uk> wrote in message
news:Xns955CA3798D9D9uk...@130.133.1.4...

> I'm just thinking that maybe in the future, after scaling to
> a million-plus processors, the cache management hardware might
> be producing more heat than the CPUs themselves, and we will
> be tempted to abandon that model; it worked for RISC :-).


Straying a bit off-charter, but for massively parallel systems, cache is
localized to CPUs in clusters. That is, even if the memory model does not
expose "near" and "far" addressing, the underlying OS implements it. I
know that SGI had a strategy whereby it would migrate running tasks to a
single processor cluster if there was a lot of sharing between them.

Universal memory models don't scale well.
--
<disclaimer>
Opinions posted are those of the author.
My company doesn't pay me enough to speak for them.
</disclaimer>
--
Jim Melton
Software Architect, Fusion Programs
Lockheed Martin IS&S
(303) 971-3846

Hans-J. Boehm

unread,
Sep 8, 2004, 4:22:44 AM9/8/04
to
se...@f4.ca (Sean Kelly) wrote in message news:<c628f43d.04090...@posting.google.com>...

> Ben Hutchings <ben-publ...@decadentplace.org.uk> wrote in message news:<slrncjm6ji.6k1.b...@decadentplace.org.uk>...
> > Rob Williscroft wrote:
> >
> > I don't see the problem. When a processor needs to modify some part
> > of memory it requests an exclusive copy of the latest version of the
> > line. The request may be satisfied by another processor, a lower
> > level of cache, or main memory. Every other cached copy is marked
> > invalid. So if two processors are trying to write to memory addresses
> > corresponding to a single cache line, they will make very slow
> > progress but they will not lose updates - assuming, of course, that
> > the writes are atomic. There may well be some MP systems that don't
> > support this kind of cache coherency, but I think they would simply
> > be unsuitable for multithreading.
>
> Exactly. Are there any modern processors that don't support CAS?
>
Historically, there have been many multiprocessors that were
cache-coherent, but did not support CAS. I don't think the two are
related.

PA-RISC does not have hardware CAS, though I'm not sure that a
software emulation would necessarily be slower than on many machines
that do. There are probably other architectures that don't either.
That's certainly something that needs to be supported to some extent.
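
For concreteness, a minimal sketch of such an emulation, with
illustrative names; a production emulation (e.g. one built on
PA-RISC's load-and-clear primitive) would typically hash the address
into an array of locks rather than serialize everything on one:

    #include <pthread.h>

    static pthread_mutex_t cas_lock = PTHREAD_MUTEX_INITIALIZER;

    // Compare-and-swap emulated with a mutex: atomic with respect to
    // other emulated_cas calls, though not with respect to plain
    // stores to *addr made outside this function.
    bool emulated_cas(long* addr, long expected, long desired)
    {
        pthread_mutex_lock(&cas_lock);
        bool ok = (*addr == expected);
        if (ok)
            *addr = desired;
        pthread_mutex_unlock(&cas_lock);
        return ok;
    }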

I'm not sure that non-cache-coherent machines need to be supported,
except to the extent that the absence of threads needs to be
supported. I think most such machines are programmed differently,
with an explicit distinction between local and global memory. I think
they're also fairly rare. But I'm not at all an expert on these.

Hans

Sean Kelly

unread,
Sep 9, 2004, 5:59:29 AM9/9/04
to
Hans_...@hp.com (Hans-J. Boehm) wrote in message news:<1178a29f.04090...@posting.google.com>...

>
> Historically, there have been many multiprocessors that were
> cache-coherent,
> but did not support CAS. I don't think the two are related.

True enough. I should have asked more generally whether there are any
multiprocessor machines that don't support some method for addressing
the word/byte update issue that was being discussed. I can see how
this would be done with CAS, but wasn't sure if there were affected
architectures where this may not be possible. And I suppose a more
applicable question is, as you've mentioned below, whether a typical
threading model actually applies to these machines anyway.

> I'm not sure that non-cache-coherent machines need to be supported,
> except to the extent that the absence of threads needs to be
> supported. I think most such machines are programmed differently,
> with an explicit distinction between local and global memory. I think
> they're also fairly rare. But I'm not at all an expert on these.

How are I/O library features handled now on architectures where the
model doesn't fit? There must be some compromise between the standard
and compiler implementations in some cases.


Sean

Rob Williscroft

unread,
Sep 9, 2004, 6:13:50 AM9/9/04
to
Ben Hutchings wrote in news:slrncjrlr4.6k1.ben-public-
nos...@decadentplace.org.uk in comp.lang.c++.moderated:

>> All problems can be solved by an extra layer of indirection.
>>
>> It can also be solved by having a "line" be CHAR_BIT wide.
>> Or by having CHAR_BIT be 128.
>
> Since cache line size can vary between members of the same
> architecture, and is now commonly 256 or 512 bits, this really isn't
> going to work very well. ;-)

Hehe, still, such an architecture can solve *its* problem using
cache coherency; this doesn't mean *all* architectures need to
implement such a thing. RISC worked 'cause it abandoned all
previous notions of "The One True Way(tm)".

>
>>> The request may be satisfied by another processor, a lower
>>> level of cache, or main memory. Every other cached copy is marked
>>> invalid. So if two processors are trying to write to memory addresses
>>> corresponding to a single cache line, they will make very slow
>>> progress but they will not lose updates - assuming, of course, that
>>> the writes are atomic.
>>
>> A lot of assumptions for a programmer with a language Standard
>> that doesn't even mention threading to make.
>
> I'm not saying that anyone should make assumptions like this about
> today's C++ implementations - I'm stating how I understand today's
> computers work and what I think can reasonably be required by a future
> standard that does mention threading.

Ok, I'm saying we should revisit all of these assumptions.

Currently all or most MP hardware supports some form of sophisticated
memory management hardware (cache coherency). The assumption is that
this is *necessary* for MP multi-threading, but perhaps it's only
necessary because the hardware needs to run code written in
non-thread-aware languages/compilers (all that is currently available).

Certainly an MP/MT C++ Standard should support such hardware, but
should it restrict itself to it?

>
>>> There may well be some MP systems that don't
>>> support this kind of cache coherency, but I think they would simply
>>> be unsuitable for multithreading.
>>>
>>
>> Clearly not for Java, say, but for a future MP C++ standard I don't
>> see why not; the only issue is how to tell the compiler what objects
>> need to be shared:
>>
>> shared int i;
>>
>> struct X { char a, b; };
>>
>> shared X x;
>>
>> struct Y
>> {
>> int a;
>> shared char b; /* also implies alignment and size for Y */
>> };
>>
>> Also have new and new [] return suitably aligned/sized memory.
>>
>> Clearly executables would only work with a particular cache
>> line size, but that would be an implementation (and QOI) detail.
>>
>> For systems that do have cache coherency then the above would
>> have no overhead, more QOI.
>
> The introduction of a new keyword is not "no overhead", but I accept
> that this might work.
>

How about:

namespace std
{
    template < typename T >
    struct shared
    {
        /* Implementation specifics:
           - possibly just padding after "item".
           - possibly compiler magic.
           - possibly nothing at all :).
        */

        T item;
    };
}

Oh dear, did I just add "an extra layer of indirection" ;-)
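
For what it's worth, one possible fleshing-out of the padding variant,
purely as a sketch: the 64-byte line size is an assumption fixed at
build time, and a real implementation would also need the object
itself aligned to a line boundary (the new/new[] point above):

    #include <cstddef>

    const std::size_t LINE_SIZE = 64; // illustrative, per-build value

    // Stand-in for the std::shared above: round the object up to a
    // whole number of lines so nothing else can share its last line.
    // (Wastes a full line when sizeof(T) is already a multiple, but
    // stays legal -- the array bound never reaches zero.)
    template < typename T >
    struct shared
    {
        T item;
        char pad[LINE_SIZE - sizeof(T) % LINE_SIZE];
    };

    shared<int> i; // i.item gets its cache line(s) to itself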

Rob.
--
http://www.victim-prime.dsl.pipex.com/

Hans-J. Boehm

unread,
Sep 9, 2004, 9:19:39 PM9/9/04
to
Sergei Organov <o...@javad.ru> wrote in message news:<ch9qtv$ogk$1...@n6.co.ru>...

> Well, I'm aware of the claims that one never needs 'volatile' if locking
> is used correctly, but isn't the above an example where declaring 'x'
> volatile would "fix" the "broken" compiler?

That's probably true in practice. But I don't think we want to
encourage that style. For one thing, having to declare variables
volatile even if they're protected by a lock would again make it
impossible to "wrap" existing single-threaded libraries with a lock.

Hans

ka...@gabi-soft.fr

unread,
Sep 10, 2004, 10:23:41 AM9/10/04
to
Hans_...@hp.com (Hans-J. Boehm) wrote in message
news:<1178a29f.04090...@posting.google.com>...

> For one thing, having to declare variables
> volatile even if they're protected by a lock would again make it
> impossible to "wrap" existing single-threaded libraries with a lock.

But my experience is that this doesn't work anyway. Large, existing
single-threaded libraries use functions like localtime or readdir. At
least under Solaris, such functions cannot be used if another thread is
using the re-entrant versions localtime_r or readdir_r, even if the code
that uses them is protected by a lock.
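
For reference, a sketch of the reentrant variant in question (POSIX
interfaces; the function name is illustrative). The usual hazard with
the non-reentrant form is its shared static result buffer, which no
lock of ours can protect from other threads' calls:

    #include <time.h>

    void format_timestamp()
    {
        time_t now = time(0);

        // localtime() returns a pointer to shared static storage;
        // localtime_r() writes into caller-provided storage instead.
        struct tm buf;
        localtime_r(&now, &buf);
        // ... format using buf ...
    }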

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Andrei Alexandrescu (See Website for Email)

unread,
Sep 11, 2004, 1:01:25 AM9/11/04
to
<ka...@gabi-soft.fr> wrote in message
news:d6652001.0409...@posting.google.com...

> Hans_...@hp.com (Hans-J. Boehm) wrote in message
> news:<1178a29f.04090...@posting.google.com>...
>
>> For one thing, having to declare variables
>> volatile even if they're protected by a lock would again make it
>> impossible to "wrap" existing single-threaded libraries with a lock.
>
> But my experience is that this doesn't work anyway. Large, existing
> single-threaded libraries use functions like localtime or readdir. At
> least under Solaris, such functions cannot be used if another thread is
> using the re-entrant versions localtime_r or readdir_r, even if the code
> that uses them is protected by a lock.

Yah, but the STL does work by wholesale lock-wrapping.

I agree with Hans - another thing is that volatile disables many
optimizations that would be legit inside a locked region.
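
A small sketch of the kind of pessimization meant here, with pthreads
standing in for the lock and illustrative names:

    #include <pthread.h>

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    volatile int total; // declared volatile "for thread safety"

    void add_batch(const int* v, int n)
    {
        pthread_mutex_lock(&m);
        // With a plain int, the compiler could accumulate in a
        // register and store once before the unlock; volatile forces
        // a load and a store of `total` on every iteration, even
        // though the lock is held the whole time.
        for (int i = 0; i < n; ++i)
            total += v[i];
        pthread_mutex_unlock(&m);
    }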


Andrei

ka...@gabi-soft.fr

unread,
Sep 13, 2004, 4:15:30 PM9/13/04
to
"Andrei Alexandrescu \(See Website for Email\)"
<SeeWebsit...@moderncppdesign.com> wrote in message
news:<2qeakdF...@uni-berlin.de>...

> <ka...@gabi-soft.fr> wrote in message
> news:d6652001.0409...@posting.google.com...
> > Hans_...@hp.com (Hans-J. Boehm) wrote in message
> > news:<1178a29f.04090...@posting.google.com>...

> >> For one thing, having to declare variables
> >> volatile even if they're protected by a lock would again make it
> >> impossible to "wrap" existing single-threaded libraries with a
> >> lock.

> > But my experience is that this doesn't work anyway. Large, existing
> > single-threaded libraries use functions like localtime or readdir.
> > At least under Solaris, such functions cannot be used if another
> > thread is using the re-entrant versions localtime_r or readdir_r,
> > even if the code that uses them is protected by a lock.

> Yah, but the STL does work by wholesale lock-wrapping.

??? The STL, by its very nature, is lock-independent. Or should be --
I guess allocators could be a problem.

I'm not too sure what point you're trying to make. You certainly aren't
suggesting that we should systematically lock all of our accesses to the
STL, but I don't see what you are suggesting.

> I agree with Hans - another thing is that volatile disables many
> optimizations that would be legit inside a locked region.

I agree with him in regards to volatile. That wasn't my point. I
disagree that just wholesale locking around a major single-threaded
library is going to automatically work. It may work for some libraries,
but unless you know the details of the library, you never know.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Nicola Musatti

unread,
Sep 13, 2004, 4:36:30 PM9/13/04
to
> Hans_...@hp.com (Hans-J. Boehm) wrote in message
> news:<1178a29f.04090...@posting.google.com>...
>
> > For one thing, having to declare variables
> > volatile even if they're protected by a lock would again make it
> > impossible to "wrap" existing single-threaded libraries with a lock.
>
> But my experience is that this doesn't work anyway. Large, existing
> single-threaded libraries use functions like localtime or readdir. At
> least under Solaris, such functions cannot be used if another thread is
> using the re-entrant versions localtime_r or readdir_r, even if the code
> that uses them is protected by a lock.

Personally I understood Hans Boehm's hypothetical library lock to
reside within the library, not in user code.

Cheers,
Nicola Musatti

Hans-J. Boehm

unread,
Sep 13, 2004, 4:49:08 PM9/13/04
to
Joe Seigh <jsei...@xemaps.com> wrote in message news:<41387547...@xemaps.com>...

> Locks are not actually required to "synchronize" memory. All that is required
> is that you get a consistent view of shared memory when using locks. Locked
> regions can be merged together with statements outside the locked region moved
> into the locked region. So you can't make any assumptions about what may or
> may not occur when a lock or unlock is performed. Lock and unlock actions aren't
> directly observable.
>

I think we're still going around in circles. The C and C++ standards
say nothing about locks. I haven't found any statements about memory
visibility in the win32 specs. (There are some statements about it in
the C#/CLI specs, but we're talking about C++.) The Single Unix Spec
V3 does state:

"Applications shall ensure that access to any memory location by more
than one thread of control (threads or processes) is restricted such
that no thread of control can read or modify a memory location while
another thread of control may be modifying it. Such access is
restricted using functions that synchronize thread execution and also
synchronize memory with respect to other threads. The following
functions synchronize memory with respect to other threads:

...
pthread_mutex_lock()
...
pthread_mutex_unlock()
... "

I believe this came directly from earlier versions of the Pthreads
spec.

We agree that you can perform optimizations that invisibly fail to
"synchronize memory". (This becomes an issue if you relax the
restriction at the beginning of the above paragraph, e.g. to support
lock-free algorithms, but that's not the topic here.) But the problem
I've been trying to point out is that if the compiler literally
follows the letter of the above spec, it may still generate code that
visibly diverges from everyone's intuition about how locks should
work. And I claim that's an obstacle to reliable multithreaded code.
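
One example of the kind of transformation at issue -- a sketch of
what a sufficiently aggressive optimizer could legally do under a
letter-of-the-spec reading; none of this code comes from the spec:

    int x; // shared; every thread writes x only under its lock

    void f(bool cond)
    {
        if (cond)
            x = 1;
    }

    // A compiler reasoning single-threadedly may rewrite f as:
    //
    //     int tmp = x;
    //     x = 1;
    //     if (!cond) x = tmp;
    //
    // which stores to x even when cond is false. That's invisible
    // sequentially, but it can stomp a concurrent update made under
    // the lock by another thread the source code never raced with.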

Hans

Joe Seigh

unread,
Sep 14, 2004, 8:13:27 AM9/14/04
to

"Hans-J. Boehm" wrote:

> I think we're still going around in circles. The C and C++ standards
> say nothing about locks. I haven't found any statements about memory
> visibility in the win32 specs. (There are some statements about it in
> the C#/CLI specs, but we're talking about C++.) The Single Unix Spec
> V3 does state:
>
> "Applications shall ensure that access to any memory location by more
> than one thread of control (threads or processes) is restricted such
> that no thread of control can read or modify a memory location while
> another thread of control may be modifying it. Such access is
> restricted using functions that synchronize thread execution and also
> synchronize memory with respect to other threads. The following
> functions synchronize memory with respect to other threads:
>
> ...
> pthread_mutex_lock()
> ...
> pthread_mutex_unlock()
> ... "
>
> I believe this came directly from earlier versions of the Pthreads
> spec.


This means there is no formal definition of Posix semantics and thus there is no way
to actually prove or know you have a correct implementation.

>
> We agree that you can perform optimizations that invisibly fail to
> "synchronize memory". (This becomes an issue if you relax the
> restriction at the beginning of the above paragraph, e.g. to support
> lock-free algorithms, but that's not the topic here.) But the problem
> I've been trying to point out is that if the compiler literally
> follows the letter of the above spec, it may still generate code that
> visibly diverges from everyone's intuition about how locks should
> work. And I claim that's an obstacle to reliable multithreaded code.
>

Well, don't think in terms of hardware memory. Think in terms of what is observable
to the program. Programs don't see memory, they see program variables. So a
synchronization requires whatever it takes to make the value of a variable seen
by one thread be seen by another thread. If it's shared memory, then memory
stores and memory barriers have to be used somewhere. If it's distributed
shared memory then other quite different mechanisms need to be used.

So the first thing is to come up with a formal definition of
"synchronization". The problem is that there is none. From the
Rationale (Basic Definitions, under General Concepts) in the Single
Unix Specification:

Formal definitions of the memory model were rejected as unreadable by the vast
majority of programmers. In addition, most of the formal work in the literature has
concentrated on the memory as provided by the hardware as opposed to the application
programmer through the compiler and runtime system. It was believed that a simple
statement intuitive to most programmers would be most effective.
IEEE Std 1003.1-2001 defines functions that can be used to synchronize access to
memory, but it leaves open exactly how one relates those functions to the semantics of
each function as specified elsewhere in IEEE Std 1003.1-2001. IEEE Std 1003.1-2001
also does not make a formal specification of the partial ordering in time that the
functions can impose, as that is implied in the description of the semantics of each
function. It simply states that the programmer has to ensure that modifications do not
occur "simultaneously" with other access to a memory location.

Just because no one would understand it isn't an excuse for not doing a formal
specification. Java did it. But anyway we're stuck without one and there won't
be any help from Posix.

I could do a formal specification but I'm not really in a position of any kind of
authority so there's no point in my doing it, even as an academic exercise.

So, assuming you get a specification somewhere, the implementation becomes hardware
and compiler dependent. If you don't want the implementations to be compiler dependent
(which is what I think we're discussing here) then you can define standard language
mechanisms to implement thread synchronization with. But you're not defining threads,
just these mechanisms. In the same way as memory barriers do not define threads.
In fact, you can be off the hook for defining threads or having a definition at all.
Just define the mechanisms and say that you think they'd be useful for implementing
threads. Even though Posix can't tell you what threads are, they can give you
an opinion on the mechanisms.

Joe Seigh

Andrei Alexandrescu (See Website for Email)

unread,
Sep 14, 2004, 8:21:42 AM9/14/04
to
<ka...@gabi-soft.fr> wrote in message
news:d6652001.04091...@posting.google.com...

> "Andrei Alexandrescu \(See Website for Email\)"
> <SeeWebsit...@moderncppdesign.com> wrote in message
>> Yah, but the STL does work by wholesale lock-wrapping.
>
> ??? The STL, by its very nature, is lock-independant. Or should be --
> I guess allocators could be a problem.
>
> I'm not too sure what point you're trying to make. You certainly aren't
> suggesting that we should systematically lock all of our accesses to the
> STLL, but I don't see what you are suggesting.

I was just saying that the STL containers work if you use locking outside of
them. This is because (in all implementations I know of) they don't use
stealth shared state. The notable offender is std::string on implementations
that use refcounting, but then string is not part of the STL :o).
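
For concreteness, a sketch of that external-locking style (pthreads
for the lock; names illustrative):

    #include <pthread.h>
    #include <vector>

    std::vector<int> v;
    pthread_mutex_t v_lock = PTHREAD_MUTEX_INITIALIZER;

    // External locking: every access to v, reads included, goes
    // through the same mutex. This is safe precisely because the
    // container keeps no hidden shared state behind our backs.
    void append(int value)
    {
        pthread_mutex_lock(&v_lock);
        v.push_back(value);
        pthread_mutex_unlock(&v_lock);
    }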

>> I agree with Hans - another thing is that volatile disables many
>> optimizations that would be legit inside a locked region.
>
> I agree with him in regards to volatile. That wasn't my point. I
> disagree that just wholesale locking around a major single-threaded
> library is going to automatically work. It may work for some libraries,
> but unless you know the details of the library, you never know.

Violent agreement. Now I'll let go of your shirt, and please let go of mine
:oD.


Andrei

Balog Pal

unread,
Sep 14, 2004, 8:58:59 PM9/14/04
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:41463A59...@xemaps.com...

> Formal definitions of the memory model were rejected as unreadable
> by the vast majority of programmers.

DOH

> I could do a formal specification but I'm not really in a position
> of any kind of authority so there's no point in my doing it, even as
> an academic exercise.

Of course there is. C++ needs that formal spec. The to-be-assembled
proposal on thread support must have it. (Hopefully the first move
happens at the next meeting.) For a while it may be just a TODO item,
but the sooner it is ready the better.

If you can do it please do, it will not be a wasted work.

Paul

Ben Hutchings

unread,
Sep 15, 2004, 5:57:02 AM9/15/04
to
Balog Pal wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:41463A59...@xemaps.com...
>
>> Formal definitions of the memory model were rejected as
>> unreadable by the vast majority of programmers.
>
> DOH
>
>> I could do a formal specification but I'm not really in a position
>> of any kind of authority so there's no point in my doing it, even
>> as an academic exercise.
>
> Of course there is. C++ needs that formal spec. The to-be-assembled
> proposal on thread support must have it. (Hopefully the first move happens
> at the recent meeting.)

The first paper, entitled "Memory model for multithreaded C++", has
been submitted as N1680.

> For a while it may be just a TODO spot, but the sooner it is ready
> the better.

This first paper simply explains why the memory model needs to be
redefined for multithreaded environments and what sort of special
operations may be needed.

> If you can do it please do, it will not be a wasted work.

It might be if it's duplicated work. That's not meant to discourage
anyone from working on the memory model; just to point out that people
with experience are already doing so (Doug Lea and Bill Bugh were co-
authors on both JSR 133 and this paper) and it might be a good idea to
work with them rather than separately.

--
Ben Hutchings
If at first you don't succeed, you're doing about average.

Balog Pal

unread,
Sep 15, 2004, 2:33:07 PM9/15/04
to
"Ben Hutchings" <ben-publ...@decadentplace.org.uk> wrote in message
news:slrnckf9h5.pk.b...@decadentplace.org.uk...

> The first paper, entitled "Memory model for multithreaded C++", has
> been submitted as N1680.

That's cool. How can one access those papers? I started at
http://www.open-std.org/jtc1/sc22/wg21/ and tried the recent papers at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/#mailing2004-07

The last one there is N1667. Some other links require password.

> > If you can do it please do, it will not be a wasted work.
>
> It might be if it's duplicated work. That's not meant to discourage
> anyone from working on the memory model; just to point out that people
> with experience are already doing so (Doug Lea and Bill Bugh were co-
> authors on both JSR 133 and this paper) and it might be a good idea to
> work with them than separately.

Cool. How can one do that?

Paul

Joe Seigh

unread,
Sep 15, 2004, 3:28:09 PM9/15/04
to

Balog Pal wrote:
>
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:41463A59...@xemaps.com...
>
> > Formal definitions of the memory model were rejected as unreadable
> > by the vast majority of programmers.
>
> DOH
>
> > I could do a formal specification but I'm not really in a position
> > of any kind of authority so there's no point in my doing it, even
> > as an academic exercise.
>
> Of course there is. C++ needs that formal spec. The to-be-assembled
> proposal on thread support must have it. (Hopefully the first move happens
> at the recent meeting.) For a while it may be just a TODO spot, but the
> sooner it is ready the better.
>
> If you can do it please do, it will not be a wasted work.
>

Well, the first attempt I worked on, while it was maybe only a few
weeks' work, took about two years due to trying to come up with the
proper concepts. A second attempt wouldn't take that long, but it's
still a lot of work. It's in something like Guttag's algebraic
specification form, which, while compact, is work-intensive to create.

I don't think you actually need to define threading itself to have it
supported by C++, though you probably have to see the definitions for
threading to realize that.

The actual support you'd need probably would look something like the
following. Among other things, you'd have attributes that would make
whatever they were applied to look something like a sequence point.
So you could have a _release attribute that would force the compiler
to issue stores for anything in registers, and an _acquire attribute
that would force the registers to be (re)loaded by the compiler after
that point. Default attributes for external function calls would be
both _release and _acquire, which is the de facto compiler behavior
now.

So a lock function would be declared with the _acquire attribute and
an unlock function would be declared with the _release attribute. So
far, so good. That's what you'd expect.

Now comes the interesting part. You have this statement in the Single Unix Specification
(right before the part I quoted previously).

Conforming applications may only use the functions listed to synchronize threads of
control with respect to memory access. There are many other candidates for functions
that might also be used. Examples are: signal sending and reception, or pipe writing
and reading. In general, any function that allows one thread of control to wait for an
action caused by another thread of control is a candidate. IEEE Std 1003.1-2001 does
not require these additional functions to synchronize memory access since this would
imply the following:

All these functions would have to be recognized by advanced compilation systems
so that memory operations and calls to these functions are not reordered by
optimization.

All these functions would potentially have to have memory synchronization
instructions added, depending on the particular machine.

The additional functions complicate the model of how memory is synchronized and
make automatic data race detection techniques impractical.

Now suppose you want to write one of these functions, say a queue push
operation. You need the stores into the object to complete before you
push onto the queue; otherwise a dequeueing thread could see an
incompletely initialized object. Well, you'd declare the push function
with the _release attribute. The default, both _release and _acquire,
would work also, which is why things work today, if you're wondering.
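
As a pthreads rendering of that push, purely a sketch (types and names
illustrative, and the mutex is assumed to have been initialized with
pthread_mutex_init). The lock/unlock pair inside it is exactly the
_acquire/_release pair that makes the function-level attribute
redundant once the call is inlined, as discussed next:

    #include <pthread.h>

    struct Node { Node* next; int payload; };

    struct Queue
    {
        pthread_mutex_t mtx; // initialized elsewhere
        Node* head;
    };

    void push(Queue* q, Node* n, int value)
    {
        n->payload = value;            // initialize while still private
        pthread_mutex_lock(&q->mtx);   // the _acquire point
        n->next = q->head;
        q->head = n;
        pthread_mutex_unlock(&q->mtx); // the _release point: the
                                       // stores above are visible to
                                       // the next thread that locks
    }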

The interesting thing is that when you inline the function, the attributes can be
elided, at least in this case. That's because the synchronization provided internally by
the push function would provide the necessary attributes. It would likely contain
a lock _acquire, unlock _release pair which would be sufficient. This bears further
investigation to see if it generalizes.

Other thread support would be things discussed previously, such as
atomicity of individual stores and loads, word tearing and data
alignment, and how you'd get the compiler to emit the proper memory
barriers and interlocked instructions, given that they're
platform-dependent and given the variation in inline assembler syntax.
I'd rather not have this hard-coded into the compiler by the language
standard, given that most of the lock-free synchronization mechanisms
aren't well known or defined at this time, and you want C/C++ to be
able to deal with that. I'd rather have it decoupled from the language
spec and done at the library level. That way you can extend threading
functionality without having to go through the spec and a long
compiler development and release cycle.


Joe Seigh

Ben Hutchings

unread,
Sep 15, 2004, 7:42:25 PM9/15/04
to
I wrote:
> (Doug Lea and Bill Bugh were co-authors on both JSR 133 and this paper)

I meant Bill _Pugh_, of course.

Joe Seigh

unread,
Sep 16, 2004, 7:23:14 AM9/16/04
to

Ben Hutchings wrote:
>
> I wrote:
> > (Doug Lea and Bill Bugh were co-authors on both JSR 133 and this paper)
>
> I meant Bill _Pugh_, of course.
>

The JSR-133 stuff is here http://www.cs.umd.edu/~pugh/java/memoryModel/

Joe Seigh

David Eng

unread,
Sep 16, 2004, 10:36:09 PM9/16/04
to
"Andrei Alexandrescu \(See Website for Email\)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<2ofbi8F...@uni-berlin.de>...
<snip>
> Does the C++ standardization committee plan to address threads at all, or
> continue to ignore them?

Hi Andrei, I am so excited to read this post and article N1680. You
did C++ programmers a big favor. I have long been looking for a C++
multithreading model; I cannot overstate how important it will be for
C++, but I just don't have the ability to do it myself. Now, with
people like Hans Boehm and you involved, I can safely say we will have
a nice C++ multithreading model. In this era of parallel and
distributed computing, multithreading is not an auxiliary choice;
rather, it is a must-have feature in a modern programming language. I
want to say thank you again.

The only thing I want to add to your article N1680 is that the thread
library from Rogue Wave is also a good one (the most complete and
comprehensive of the C++ thread libraries, in my opinion). It would be
worth examining along with the others for any proposed standard
library.
