
Multi-Processor Concurrency Problem


Peter Olcott

Jan 8, 2002, 4:52:47 PM
inline const char *FastString::c_str() const
{
    String[NextByte] = 0;
    return (String);
}

The String is an array of char. This is written in C++. The rest of
the code can be found at: http://home.att.net/~olcott/FastString.cpp

I want to know if there can be any possible problem with the above
function in either a multi-threaded or multi-processor environment,
such that multiple concurrent accesses to the above function,
pertaining to the same data, could cause errors in the absence of any
locking mechanism.

I am guessing that this is a computer hardware problem,
and the problem is whether or not two completely
simultaneous attempts to update this exact same address
with zero could collide with each other such that a
value other than zero could be written.

Daniel Hams

Jan 8, 2002, 5:54:41 PM
Peter Olcott wrote:

> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }
>
>


I'm a newbie at this, but my guess is that because you are using a
class instance variable (NextByte), that's a problem when multiple
threads make calls into a single instance of the class.

Let's say I hit another method in FastString that updates that last
character location with a new value, and then updates the value of
NextByte. In between these operations, a thread has called c_str,
causing that new char to be overwritten with your null terminator. The
new character has been lost.

I don't think that's your desired behavior?
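
A minimal sketch of that interleaving, assuming a simplified FastString
with the two members from the original post and a hypothetical
push_back() writer (the real class at the URL above may differ):

struct FastString {
    mutable char String[256];   // mutable only so the const c_str() below compiles
    int NextByte;
    FastString() : NextByte(0) {}

    void push_back(char c) {    // hypothetical writer, no bounds checking
        String[NextByte] = c;   // (1) store the new character
        ++NextByte;             // (2) then advance the index
    }
    const char *c_str() const {
        String[NextByte] = 0;   // (3) write the terminator at NextByte
        return String;
    }
};
// If another thread runs (3) between (1) and (2), it still sees the old
// NextByte and overwrites the character just stored with the terminator.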

Could you not initialise your entire String char array to the NULL
character to start with?

D

Mark Johnson

Jan 8, 2002, 6:13:14 PM
Peter Olcott wrote:
>
> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }
>
> The String is an array of char. This is written in C++.
> http://home.att.net/~olcott/FastString.cpp
> This is where the rest of the code can be found.
>
> I want to know if there can be any possible problem
> with the above function in either a multi-threaded
> or multi-processor environment, such that multiple
> concurrent access to the above function, pertaining
> to the same data could cause errors in the absence
> of any locking mechanism.
>
Let's look at it assuming you can be preempted at any point in the
process to enumerate some possible problems...
- the value of String (a *char) is not valid or changes during the
execution of the first statement
- the value of String (a *char) is made invalid or changes between the
first and second statements
- the value of NextByte (an int) is not valid during execution of the
first statement
- the value at String[NextByte] is set to a non zero value between the
first and second statements
- the characters that String points to can be destroyed during (or
after) execution of this function
These could all happen with a single CPU with multiple threads of
execution. Your class has plenty of functions that can make these
happen. If you can guarantee that none of these can occur with some
other mechanism, you are OK. If not, your code is broken.

If you have multiple CPU's, the problem gets messy if memory visibility
is not taken into account. For example, CPU 0 can set NextByte, while
CPU 1 merrily uses a cached value of NextByte in the assignment
statement. To fix this, you need a mutex or other mechanism for making
the memory consistent.
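
One conventional fix for the visibility problem described above is to
make the writers and c_str() take the same POSIX mutex, so a stale
cached value of NextByte cannot be used. A hedged sketch, with a
simplified class and no error checking (the real FastString would need
the same lock in every mutator):

#include <pthread.h>

class FastString {
    mutable pthread_mutex_t lock;
    mutable char String[256];
    int NextByte;
public:
    FastString() : NextByte(0) { pthread_mutex_init(&lock, 0); }
    ~FastString() { pthread_mutex_destroy(&lock); }

    const char *c_str() const {
        pthread_mutex_lock(&lock);     // acquire: prior writes become visible
        String[NextByte] = 0;
        const char *p = String;
        pthread_mutex_unlock(&lock);   // release: publishes the terminator
        return p;
    }
};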

> I am guessing that this is a computer hardware problem,
> and the problem is whether or not two completely
> simultaneous attempts to update this exact same address
> with zero could collide with each other such that a
> value other than zero could be written.

I don't see how hardware helps or hurts you that much. As I said in the
first statement, it can happen with multiple threads on a single CPU
with a strong memory model if the rest of the FastString object does not
use care in each and every action. Your statement about multiple
assignments of zero is correct; however, that does not mean that another
function in FastString won't change the exact same address to a non-zero
value while this function is running at the same time.

Peter Olcott

Jan 8, 2002, 6:23:34 PM

"Daniel Hams" <dann...@huntthepickle.org> wrote in message
news:3C3B78B1...@huntthepickle.org...

I am not talking about that problem. I am restricting
consideration ONLY to this particular c_str()
member function. Can concurrent calls to this
member function interfere with each other?


Peter Olcott

Jan 8, 2002, 6:29:45 PM
"Mark Johnson" <Mark_H_...@Raytheon.com> wrote in message news:3C3B7D0A...@Raytheon.com...

> Peter Olcott wrote:
> >
> > inline const char *FastString::c_str() const
> > {
> > String[NextByte] = 0;
> > return (String);
> > }

> If you have multiple CPU's, the problem gets messy if memory visibility


> is not taken into account. For example, CPU 0 can set NextByte, while

No, that is defined as impossible. The analysis is restricted
to this one function in isolation. This one function does not
change the value of NextByte.

> > I am guessing that this is a computer hardware problem,
> > and the problem is whether or not two completely
> > simultaneous attempts to update this exact same address
> > with zero could collide with each other such that a
> > value other than zero could be written.
>
> I don't see how hardware helps or hurts you that much. As I said in the
> first statement - it can happen with multiple threads in a single CPU
> with a strong memory model if the rest of the FastString object does not
> use care in each and every action. Your statement that multiple
> assignments of zero is correct, however that does not mean that another
> function in FastString won't change the exact same address to a non zero
> value while this function is running at the same time.

Yes it does, because we are restricting the analysis to this
one function in isolation, and this one function can ONLY
change this address to zero. Basically I am telling all my
users that this is a ReadOnly function. With all the functions
capable of writing, it is their responsibility to handle
multiple concurrent accesses.

Chris Smith

Jan 8, 2002, 7:56:27 PM
Okay, clearer problem this time, but this is essentially exactly what it
came down to last round. Here's your answer:

No, POSIX does not guarantee that this function can be executed
concurrently in two different threads on the same data. Memory is being
written in two different threads without synchronization, and the results
are not defined.

No, C++ by itself doesn't guarantee that *anything* works in any kind of
multithreading environment. The C++ standard says nothing about
concurrency or consistency guarantees between threads. That's why you
need to be able to refer to another standard, such as POSIX, to determine
if concurrent access is safe.

I find it unlikely that this code will cause problems on any current
hardware architecture that I'm aware of. However, since you haven't
mentioned any hardware architectures, and I'm not familiar with all
existing (and definitely not all future) hardware architectures, I
wouldn't put any money on this being portably safe.

The *only* reasonable way that you're going to be able to provide the
guarantee that you want is to list supported platforms for your code, and
make and verify it individually for each supported platform. POSIX
allows you to treat all POSIX platforms as one by only relying on
guarantees made by the POSIX standard (in which case you'd need to revise
your code). If you need to support non-POSIX environments like Win32
threads, you'll need to separately verify those platforms.

Chris Smith

Daniel Hams

Jan 8, 2002, 8:04:29 PM
> I am not talking about that problem. I am restricting
> consideration ONLY to this particular c_str()
> member function. Can concurrent calls to this
> member function interfere with each other?


Feasibly, yes. As far as I know, it's undefined behavior, and as such it means that concurrent threads don't _have_ to play nice.


Maybe ask the GCC guys what happens in this kind of scenario.


It will probably work most of the time, but you'll have to do some
extensive testing to find out :-)

Dan

Peter Olcott

Jan 8, 2002, 8:30:30 PM

"Daniel Hams" <dann...@huntthepickle.org> wrote in message
news:3C3B971D...@huntthepickle.org...

Testing is out of the question because I must
know that it will work on machines not yet
invented. I think that this can be known,
because I think that designing the architecture
any other way might not make any sense.


Peter Olcott

Jan 8, 2002, 8:35:59 PM
> I find it unlikely that this code will cause problems on any current
> hardware architecture that I'm aware of. However, since you haven't
> mentioned any hardware architectures, and I'm not familiar with all
> existing (and definitely not all future) hardware architectures, I
> wouldn't put any money on this being portably safe.
>
> The *only* reasonable way that you're going to be able to provide the
> guarantee that you want is to list supported platforms for your code, and
> make and verify it individually for each supported platform. POSIX
> allows you to treat all POSIX platforms as one by only relying on
> guarantees made by the POSIX standard (in which case you'd need to revise
> your code). If you need to support non-POSIX environments like Win32
> threads, you'll need to separately verify those platforms.

I can't do this. It must run on machines not even invented yet,
and it must run in every environment, including those not
yet written. I am not getting any replies from the guys that
know this bare-metal stuff, and I think that's what I need.

Apart from any operating system, is there ever a case
where writing an ASCII zero byte from two different processes
to the exact same address would result in anything other than
ASCII zero being written? This is a computer circuit
question.


Peter Olcott

Jan 8, 2002, 8:49:26 PM
inline const char *FastString::c_str() const
{
    String[NextByte] = 0;
    return (String);
}

The String is an array of char. This is written in C++. The rest of
the code can be found at: http://home.att.net/~olcott/FastString.cpp

I want to know if there can be any possible problem with the above
function, taken in isolation from every other possible function, in
either a multi-threaded or multi-processor environment, such that
multiple concurrent accesses to the above function, pertaining to the
same data, could cause errors in the absence of any locking mechanism.

I am guessing that this is a computer hardware problem, and the
problem is whether or not two completely simultaneous attempts to
update this exact same address with zero could collide with each other
such that a value other than zero could be written.

Remember, this is two processes writing the same value (ASCII zero) to
the exact same address; there are no other possibilities. I only want
to consider the concurrent use of the above function in complete
isolation from any and all other functions.


del cecchi

Jan 8, 2002, 10:14:15 PM

"Peter Olcott" <olc...@worldnet.att.net> wrote in message
news:GkN_7.240411$WW.12...@bgtnsc05-news.ops.worldnet.att.net...
I don't read code, sorry. But any properly functioning memory
controller serializes the memory writes, or in the case of a cache,
maybe multi-ports it. In any case in some sense it is impossible for
the writes to actually be simultaneous. Of course one could have poorly
designed hardware with bus arbitration problems or some such.
Metastability can make things act funny as can timing violations.

Summary: Hardware is broken or the software isn't doing what you think
it is.

del cecchi


cjt

Jan 8, 2002, 10:24:29 PM
Peter Olcott wrote:
>
> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }
>
> The String is an array of char. This is written in C++.
> http://home.att.net/~olcott/FastString.cpp
> This is where the rest of the code can be found.
>
I can certainly imagine a machine whose instructions only work on units
bigger than a byte getting in trouble, since a read is then involved (to
get the rest of the bytes in a word, e.g., so they can be held invariant).

Call two instances with values of NextByte that point to adjacent bytes in
the same word and see what happens.
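
A sketch of the hazard being described: on a machine whose stores are
word-wide, each byte store is really a read-modify-write of the
containing word (hypothetical helper, not taken from the original code):

#include <cstdint>

void store_byte(uint32_t *word, int byte_in_word, uint8_t value)
{
    uint32_t old = *word;                          // read the whole word
    uint32_t mask = 0xFFu << (8 * byte_in_word);
    *word = (old & ~mask)                          // keep the other bytes...
          | (static_cast<uint32_t>(value) << (8 * byte_in_word));  // ...merge in the new one
}
// If two CPUs run store_byte on adjacent bytes of the same word at the same
// time, each writes back a stale copy of the other CPU's byte, so one of the
// two updates can be silently lost.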

Peter Olcott

Jan 8, 2002, 10:43:29 PM
> I don't read code, sorry. But any properly functioning memory
> controller serializes the memory writes, or in the case of a cache,

Even in the case of two or more physical processors sharing
the same memory space?

Peter Olcott

Jan 8, 2002, 10:46:23 PM

"cjt" <chel...@prodigy.net> wrote in message news:3C3BB79F...@prodigy.net...

> Peter Olcott wrote:
> >
> > inline const char *FastString::c_str() const
> > {
> > String[NextByte] = 0;
> > return (String);
> > }
> >
> > The String is an array of char. This is written in C++.
> > http://home.att.net/~olcott/FastString.cpp
> > This is where the rest of the code can be found.
> >
> I can certainly imagine a machine whose instructions only work on units
> bigger than a byte getting in trouble, since a read is then involved (to
> get the rest of the bytes in a word, e.g., so they can be held invariant).
>
> Call two instances with values of NextByte that point to adjacent bytes in
> the same word and see what happens.

This will not ever happen. I can guarantee that this
will not ever occur. The only thing that I am looking at
is two or more physical processors writing the same
value to the same exact byte address, at any time at
all, which can include exactly the same instant.


David Schwartz

Jan 8, 2002, 10:51:19 PM
Peter Olcott wrote:

> Testing is out of the question because I must
> know that it will work on machines not yet
> invented. I think that this can be known,
> because I think that designing the architecture
> any other way might not make any sense.

You are doing nothing more than assuming.

DS

David Schwartz

Jan 8, 2002, 10:52:17 PM
Peter Olcott wrote:

> Apart from any operating system, is there every a case
> where writing a ASCII zero byte from two different processes
> to the exact same address would result in anything other than
> ASCII zero, being written? This is a computer circuit
> question.

In principle, a machine could write a 'zero' to an address by fetching
its current value and then atomically subtracting that value. That
wouldn't violate any standard that's applicable in this case.
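
The hypothetical fetch-and-subtract machine, sketched as code: if two
unsynchronized "clears" interleave, the location need not end up at
zero (the interleaving is simulated by hand here):

#include <cstdio>

int memory = 7;                    // shared location holding a non-zero value

int main()
{
    int cpu0_old = memory;         // CPU 0 fetches 7
    int cpu1_old = memory;         // CPU 1 also fetches 7, before CPU 0 writes
    memory -= cpu0_old;            // CPU 0 "writes zero": memory == 0
    memory -= cpu1_old;            // CPU 1 "writes zero": memory == -7
    std::printf("%d\n", memory);   // prints -7, not 0
}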

DS

Peter Olcott

Jan 8, 2002, 11:05:45 PM

"David Schwartz" <dav...@webmaster.com> wrote in message news:3C3BBE37...@webmaster.com...

Ah, but you are merely assuming that I am
doing nothing more than assuming. Actually
I linked up with several computer-hardware-only
newsgroups, and am discussing it with them.


Peter Olcott

Jan 8, 2002, 11:07:23 PM

"David Schwartz" <dav...@webmaster.com> wrote in message news:3C3BBE71...@webmaster.com...

Except that in practice it would screw up concurrency,
and thus would be avoided as a bad circuit design. That
is not even counting the fact that this requires twice
as much work.


Alex Colvin

Jan 8, 2002, 11:24:39 PM
> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }

>The String is an array of char. This is written in C++.
>http://home.att.net/~olcott/FastString.cpp
>This is where the rest of the code can be found.

>I want to know if there can be any possible problem
>with the above function taken in isolation from
>every other possible function, in either a multi-threaded
>or multi-processor environment, such that multiple
>concurrent access to the above function, pertaining
>to the same data could cause errors in the absence
>of any locking mechanism.

Well, without looking at the rest of your code (I'm lazy too) I'll note
that you have a const method modifying String. If that's a member, you're
asking for trouble. Unless you've also declared it "mutable" or some
such...
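
For what the const complaint amounts to: a const member function may
only modify members declared mutable, so here is a hedged sketch of a
version that at least compiles (member names taken from the post,
everything else assumed):

class FastString {
    mutable char String[256];   // without 'mutable' the assignment below is ill-formed
    int NextByte;
public:
    FastString() : NextByte(0) {}
    const char *c_str() const {
        String[NextByte] = 0;   // legal only because String is mutable
        return String;
    }
};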

I assume that String is just a character pointer or array.
It had better not be a STL-string with an implicit conversion
taking place in the return statement (and what's with the parentheses
around the return value? This isn't PL/I).

With the STL strings, the storage returned by c_str() isn't guaranteed to
stick around when you do things to the string that owns it.

Take a good look at what your code is actually doing - maybe at the
assembly language. The old Cfront made this easier. Be especially wary of
things like reference counts that aren't locked.

>I am guessing that this is a computer hardware problem,

Highly unlikely, unless you're running your own homebrew multiprocessor.

Although sometimes it's not a hardware problem only because the hardware
is defined to work that way.

My advice is to avoid sharing memory, especially anything involving
pointers, among threads. In fact, my advice is to avoid threads.
Use fork() instead.


--
mac the naïf

CBFalconer

Jan 8, 2002, 11:57:46 PM

You are being ridiculous. Moving here from c.l.c or
comp.programming does not change the reality.

Take two identical balls. Label them 'zero'. Hold one in each
hand. Try to stuff them both in the same corner of the same
drawer at the same time. Computers are no more complicated than
this. There is no magic.

Memories are NOT written to simultaneously. Operations are
queued. If you have two processes running on two separate CPUs
both trying to write to the same physical address, some bus
mastery protocol will order the requests. Similarly, there can be
no simultaneous reads. If the processes are running on the same
CPU they can't even try to write simultaneously.

However systems generally have caches to postpone those
reads/writes until it is convenient and efficacious to perform
them. These caches can get out of sync with the actual memory,
and create problems. Architects have gone to great lengths to
avoid these invalid cache hits.

Protocols such as threading, critical areas, semaphores, monitors,
queues etc. have been designed and built to make these results
unambiguous. Use them.

Maybe you should describe what you are actually trying to do.
This horse is dead and maggot infested.

--
Chuck F (cbfal...@yahoo.com) (cbfal...@XXXXworldnet.att.net)
Available for consulting/temporary embedded and systems.
(Remove "XXXX" from reply address. yahoo works unmodified)
mailto:u...@ftc.gov (for spambots to harvest)

Peter Olcott

Jan 9, 2002, 12:05:58 AM

"Alex Colvin" <al...@world.std.com> wrote in message news:GpnLL...@world.std.com...

> > inline const char *FastString::c_str() const
> > {
> > String[NextByte] = 0;
> > return (String);
> > }
>
> >The String is an array of char. This is written in C++.
> >http://home.att.net/~olcott/FastString.cpp
> >This is where the rest of the code can be found.
>
> >I want to know if there can be any possible problem
> >with the above function taken in isolation from
> >every other possible function, in either a multi-threaded
> >or multi-processor environment, such that multiple
> >concurrent access to the above function, pertaining
> >to the same data could cause errors in the absence
> >of any locking mechanism.
>
> Well, without looking at the rest of your code (I'm lazy too) I'll note
> that you have a const method modifying String. If that's a member, you're
> asking for trouble. Unless you've also declared it "mutable" or some
> such...
>
> I assume that String is just a character pointer or array.

I already said that it's an array of char; that was my first sentence.

> With the STL strings, the storage returned by c_str() isn't guaranteed to
> stick around when you do things to the string that owns it.

I want to know more about this. Can you cut and paste
me some quotes, or give me a link?

> Take a good look at what your code is actually doing - maybe at the
> assembly-language. The old Cfront made this easier. Be especially wary of
> things like reference count reference counts that aren't locked.
>
> >I am guessing that this is a computer hardware problem,
>
> Highly unlikely, unless you're running your own homebrew multiprocessor.

My goal is to provide a 100% platform-independent, C++-source-code-only
version of at least std::string. I want to provide at least
the basic exception guarantee, and the thread safety guarantee
that SGI STL provides. This thread safety guarantee basically
says: we are not going to do anything to help you make this
code work in a threaded environment; all that we can guarantee
is that we won't do anything to screw the threads up. You must
provide all the concurrency control mechanisms yourself.

The reason that this is a hardware problem, and the reason
that this post was made to hardware-only groups, is that
the ONLY remaining issue is the concurrency of the single
function listed above.

Is there any hardware platform where writing the same zero
value to the same byte address could ever result in something
besides zero being written in a multiple processor concurrency
environment?

(Obviously the follow-up to a YES answer would be:
which hardware platform, and exactly what happens?
The follow-up to a NO answer would be: how do you
know this for sure?)

This above question (and follow ups) forms the total and
complete full scope of every single detail of my complete
investigation.

Peter Olcott

Jan 9, 2002, 12:22:50 AM
> Take two identical balls. Label them 'zero'. Hold one in each
> hand. Try to stuff them both in the same corner of the same
> drawer at the same time. Computers are no more complicated than
> this. There is no magic.
>
> Memories are NOT written to simultaneously. Operations are
> queued. If you have two processes running on two separate CPUs
> both trying to write to the same physical address, some bus
> mastery protocol will order the requests. Similarly, there can be
> no simultaneous reads. If the processes are running on the same
> CPU they can't even try to write simultaneously.
>
> However systems generally have caches to postpone those
> reads/writes until it is convenient and efficacious to perform
> them. These caches can get out of sync with the actual memory,
> and create problems. Architects have gone to great lengths to
> avoid these invalid cache hits.
>
> Protocols such as threading, critical areas, semaphores, monitors,
> queues etc. have been designed and built to make these results
> unambiguous. Use them.
>
> Maybe you should describe what you are actually trying to do.
> This horse is dead and maggot infested.

I sure wouldn't say that. This is the very first definitive answer
that I have received after about one hundred or more answers.
Every other answer was about some operating system protocol,
or other rule-of-thumb, or conventional standard.

I don't think that there could be an invalid cache hit, because
the processor would have to know that if it has a write that
precedes a read in its queue, and these both refer to the same
memory location, it had better do the write first.

I am trying to write an implementation of std::string that
is as fast as it possibly can be. It must meet SGI STL's
thread safety guarantee. It must meet at least the basic
guarantee of exception safety, and it must be utterly
platform independent, and 100% portable in source
code only form. I don't want to get into any issues about
whether this combination of criteria is right, I have already
hashed that through to oblivion.

So let me verify the most crucial point that you, and no one
else, has made: all memory reads and all memory writes are
always synchronized, on every machine, all the time?

In other words, is it physically impossible for any
machine to physically write to the same address at
the same time, so that any attempt, under any conditions,
will always, each and every time, be synchronized?


Mike Mowbray

Jan 9, 2002, 12:54:31 AM
Peter Olcott wrote:

> > I am trying to write an implementation of std::string
> > that is as fast as it possibly can be. It must meet SGI
> > STL's thread safety guarantee.

I have to ask... why are you inserting the null byte in c_str()?
Why not just keep the char array null-terminated whenever some
other non-const mbr fn is called which changes the string's
contents?

I doubt the extra overhead of writing some extra null bytes
occasionally will ever be noticed.
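
The alternative being suggested, sketched with a simplified class and a
hypothetical push_back(): keep the array terminated in the mutators, so
c_str() never writes at all:

class FastString {
    char String[256];
    int NextByte;
public:
    FastString() : NextByte(0) { String[0] = 0; }

    void push_back(char c) {        // hypothetical mutator, no bounds checking
        String[NextByte++] = c;
        String[NextByte] = 0;       // keep the array terminated here...
    }
    const char *c_str() const {
        return String;              // ...so this really is read-only
    }
};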

- MikeM.


Peter Olcott

Jan 9, 2002, 1:30:23 AM

"Mike Mowbray" <mi...@ot.com.au> wrote in message news:3C3BDB17...@ot.com.au...

When I say fastest possible I mean that in the very most literal
terms. Something on the order of many of the brightest minds
getting together and spending years trying to derive even the
slightest trace of improvement. This is actually quite easily
feasible. Linux is at least in this ballpark, and has a much more
open ended specification, thus much more difficult to perfect.


Kaz Kylheku

Jan 9, 2002, 1:43:00 AM
In article <%lP_7.40636$fe1.7...@bgtnsc06-news.ops.worldnet.att.net>,

Why would it screw up concurrency? Under concurrency, the software
could issue a special instruction to bring about an atomic read-modify-write
cycle on the bus, so the subtraction would happen indivisibly.

Do you really think that locations don't have to be read before being
written, on any system? What if reads and writes have to be done in 64
bit words, but the program wants to store, say, a 32 bit word? You have
to get an up-to-date copy of the 64 bit word, mask in the 32 bit word,
and then write back a whole 64 bit word. Whether this is done by
compiler-generated code, or by the hardware, is irrelevant. In principle,
it could be done either way.

If you don't do this atomically, what if some other thread or
processor simultaneously writes to the other 32 bit half of that 64 bit
word? Someone has to have the last word, no pun intended, and the two
don't know about each other's halves.

Now this read-modify-write of the 64 bit word could be done atomically,
but *that* would reduce concurrency by needlessly locking up parts of
the hardware. When you don't need such updates to be done atomically,
because you know you don't have to worry about concurrency, you won't
issue the special instruction to do it atomically.

Now in standard C or C++ there is no way to specify whether or not to
use that atomic update.

Peter Olcott

Jan 9, 2002, 2:19:11 AM
> >Except that in practice is would screw up concurrency,
> >thus would be avoided as a bad circuit design. Not
> >even counting the fact that this requires twice as much
> >work.
>
> Why would it screw up concurrency? Under concurrency, the software
> could issue a special instruction to bring about an atomic read-modify-write
> cycle on the bus, so the subtraction would happen indivisibly.
>
You did not specify this, and I did not assume it.
It's still likely slower than necessary. A read-modify-write
would either be slower or it would require a hardware
circuit more complicated than necessary for this
particular operation. A comparable operation could
be very useful for the more complex case of multiple
concurrent access, where DIFFERING values are
being written to DIFFERING addresses.

> Do you really think that locations don't have to be read before being
> written, on any system? What if reads and writes have to be done in 64
> bit words, but the program wants to store, say, a 32 bit word? You have
> to get an up-to-date copy of the 64 bit word, mask in the 32 bit word,
> and then write back a whole 64 bit word. Whether this is done by
> compiler-generated code, or by the hardware, is irrelevant. In principle,
> it could be done either way.

I would think that this would be the exception rather than the rule.
I would think that in most cases a machine word could be read
and written independently. I would even guess the same for bytes.

> If you don't do this atomically, what if some other thread or
> processor simultaneously writes to the other 32 bit half of that 64 bit
> word? Someone has to have the last word, no pun intended, and the two
> don't know about each other's halves.

Sure this could be required. Making one part of the design complex
to make another part simple.

> Now this read-modify-write of the 64 bit word could be done atomically,
> but *that* would reduce concurrency by needlessly locking up parts of
> the hardware. When you don't need such updates to be done atomically,
> because you know you don't have to worry about concurrency, you won't
> issue the special instruction to do it atomically.
>
> Now in standard C or C++ there is no way to specify whether or not to
> use that atomic update.

I have already hashed this into non-existence in other groups.
Ultimately C++ can't do any more than the hardware can
do, thus the hardware forms the bottom line. Two people
have told me that every single write to memory is serialized
to occur sequentially. It is literally physically impossible
to write to the same location at the same time, even with
hundreds of physical processors sharing the same memory
space. If this is the case, then my problem is fully solved.

My little function taken in isolation can be considered
a read only function as far as concurrency is concerned,
even though in actual fact it does write an ASCII zero.


Terje Mathisen

Jan 9, 2002, 3:17:32 AM
Peter Olcott wrote:
> Is there any hardware platform where writing the same zero
> value to the same byte address could ever result in something
> besides zero being written in a multiple processor concurrency
> environment?

No.

Not on a working system.

On the original Alpha with 32-bit load/store operations the second of
those writes might take a while, _if_ it was compiled to guard against
simultaneous updates:

In that case the LL/SC sequence would notice that the surrounding word
(really only the target byte!) had been modified, and it would therefore
have to retry the operation.

OTOH, since there's not even a 'volatile' in your source code, it would
be compiled to use a normal load/modify/store sequence, which would have
been OK.

Terje

PS. I just thought of a possible exception: If the target memory is
really some kind of memory-mapped device, then all bets are off. I.e. on
the 1984-vintage IBM AT with EGA card, load/store operations to the
graphics card could do more or less anything you wanted, because the
actual operation was specified using external ports, and the load/store
ops only indicated the address range to work on.

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Arnold Hendriks

Jan 9, 2002, 4:11:23 AM
Peter Olcott <olc...@worldnet.att.net> wrote:
> Every other answer was about some operating system protocol,
> or other rule-of-thumb, or conventional standard.
Because for a lot of people, standards matter. Who knows what may change
in the future that destroys all the assumptions you've made? Either base
your choices on a standard, and you can guarantee portability to all
systems that implement that standard, or forego your goal of complete
portability.

That's how standards work. That's how portability works. Assuming that
hardware works in a certain way, and will always work that way in the
future, will come back to haunt you. If you don't want that, stick to
standards. That's why they're there.

> I am trying to write an implementation of std::string that

Seeing the state of your code, you have a lot of other issues to resolve
anyway before you should be bothering about speeding up the class in
a high-concurrency environment. Why not tackle one problem at a time?
Perhaps the speed problem you're worrying about will have already vanished
by the time you have properly implemented all the other semantics of
std::string.
(Premature optimization...)

--
Arnold Hendriks <a.hen...@b-lex.com>
B-Lex Information Technologies, http://www.b-lex.com/

Rupert Pigott

Jan 9, 2002, 7:23:51 AM
Peter Olcott <olc...@worldnet.att.net> wrote in message
news:WcQ_7.40761$fe1.7...@bgtnsc06-news.ops.worldnet.att.net...

>
> "Alex Colvin" <al...@world.std.com> wrote in message
news:GpnLL...@world.std.com...
> > With the STL strings, the storage returned by c_str() isn't guaranteed
to
> > stick around when you do things to the string that owns it.
>
> I want to know more about this. Can you cut and paste
> me some quotes, or give me a link?

RTFM. Go to SGI's website and pull out their documentation on
their STL implementation. It's fairly precise about what is
guaranteed and what isn't. :)

[SNIP]


> My goal is to provide a 100% platform independent C++ source
> code only version of at least std::string. I want to provide at least
> the basic exception guarantee, and the thread safety guarantee
> that SGI STL provides. This thread safety guarantee basically
> say we are not going to do anything to help you make this
> code work in a threaded environment, all that we can guarantee
> is that we won't do anything to screw the threads up, You must
> provide all the concurrency control mechanisms yourself.

<APOLOGIST>
This is probably because there are lots of concurrency
control mechanisms, none of which is available everywhere.
Some are more common than others, eg : POSIX.
</APOLOGIST>

As a general point, I would seriously consider using concurrency
controls on that piece of code anyways. AFAIK two processes
running on two different processors banging the same memory
location at the same time is *NOT* something that C gives any
strong guarantees on.

I don't have a copy of the ANSI C++ spec to hand at the
moment, and I suspect to answer this to my satisfaction it
would need a pretty thorough perusal. However my cautious
instinct tells me USE CONCURRENCY CONTROLS for the
following reasons :
1) You are probably operating in the realms of "undefined
behaviour" of the C++ spec.
2) Your code is meant to be ultra-portable and it's a
library... So if destructive undefined behaviour does occur
in your STL(?) library it will be particularly hard
to track it down (especially as an inline).
3) If you are operating in the realms of "undefined
behaviour", your code is incorrect by definition. Fix it.

I assume you're trying to play fast & loose on the
thread-safety thing to save some processor cycles. I'd
recommend that you profile some code which makes heavy
use of your STL library and see what kind of impact the
explicitly thread-safe version makes. i.e.: Is it really
worth writing broken code to save a few cycles?

Cheers,
Rupert


Peter Olcott

Jan 9, 2002, 9:01:55 AM
> > Is there any hardware platform where writing the same zero
> > value to the same byte address could ever result in something
> > besides zero being written in a multiple processor concurrency
> > environment?
>
> No.
>
> Not on a working system.
>
> On the original Alpha with 32-bit load/store operations the second of
> those writes might take a while, _if_ it was compiled to guard against
> simultaneous updates:

I want to know whether it is possible or impossible. When I use these
words, I mean them with complete literal precision. I am thus assuming
the case where the programmer does everything possible, from a
programmer's point of view, to force this condition to occur.
The only things that are disallowed are physical alterations to the
hardware.

> In that case the LL/SC sequence would notice that the surrounding word
> (really only the target byte!) had been modified, and it would therefore
> have to retry the operation.
>
> OTOH, since there's not even a 'volatile' in your source code, it would
> be compiled to use a normal load/modify/store sequence, which would have
> been OK.
>
> Terje
>
> PS. I just thought of a possible exception: If the target memory is
> really some kind of memory-mapped device, then all bets are off. I.e. on
> the 1984-vintage IBM AT with EGA card, load/store operations to the
> graphics card could do more or less anything you wanted, because the
> actual operation was specified using external ports, and the load/store
> ops only indicated the address range to work on.

Yet would simultaneous attempts to write a zero to the same memory
location ever possibly result in something other than zero being written?


Peter Olcott

Jan 9, 2002, 9:13:38 AM

"Rupert Pigott" <Dark...@btinternet.com> wrote in message
news:a1hcon$bds$1...@paris.btinternet.com...

> Peter Olcott <olc...@worldnet.att.net> wrote in message
> news:WcQ_7.40761$fe1.7...@bgtnsc06-news.ops.worldnet.att.net...
> >
> > "Alex Colvin" <al...@world.std.com> wrote in message
> news:GpnLL...@world.std.com...
> > > With the STL strings, the storage returned by c_str() isn't guaranteed
> to
> > > stick around when you do things to the string that owns it.
> >
> > I want to know more about this. Can you cut and paste
> > me some quotes, or give me a link?
>
> RTFM. Go to SGI's website and pull out their documentation on
> their STL implementation. It's fairly precise about what is
> guaranteed and what isn't. :)

I already did this, and this single precise point is the one thing
that is still unspecified.

> [SNIP]
> > My goal is to provide a 100% platform independent C++ source
> > code only version of at least std::string. I want to provide at least
> > the basic exception guarantee, and the thread safety guarantee
> > that SGI STL provides. This thread safety guarantee basically
> > say we are not going to do anything to help you make this
> > code work in a threaded environment, all that we can guarantee
> > is that we won't do anything to screw the threads up, You must
> > provide all the concurrency control mechanisms yourself.
>
> <APOLOGIST>
> This is probably because there are lots of concurrency
> control mechanisms, none of which is available everywhere.
> Some are more common than others, eg : POSIX.
> </APOLOGIST>

This, of course, is perfectly reasonable when this code is to
be distributed in 100% source-code-only format. In fact, in
this case it's impossible to avoid.

> As a general point, I would seriously consider using concurrency
> controls on that piece of code anyways. AFAIK two processes
> running on two different processors banging the same memory
> location at the same time is *NOT* something that C gives any
> strong guarantees on.

The reason that I posted to the hardware groups is that I want
to know what kind of guarantees the bare metal gives
on this, because the bare metal is the ultimate arbiter.

> I don't have a copy of the ANSI C++ spec to hand at the
> moment, and I suspect to answer this to my satisfaction it
> would need a pretty thorough perusal. However my cautious
> instinct tells me USE CONCURRENCY CONTROLS for the
> following reasons :

This is a physical impossibility thus completely out of the
question. The reason that I don't say why I need the
question answered is it gets into an endless debate of
my design criteria. These design criteria are 100% utterly
immutable.

> 1) You are probably operating in the realms of "undefined
> behaviour" of the C++ spec.
> 2) Your code is meant to be ultra-portable and it's a
> library... So if destructive undefined behaviour does occur
> in your STL(?) library it will be particularly hard to
> to track it down (especially as an inline).
> 3) If you are operating in the realms of "undefined
> behaviour", your code is incorrect by definition. Fix it.

Although it may be undefined behavior according to
any language standard, apparently according to two
hardware experts it is completely defined at the hardware
level. Every single read or write to memory is always
completely synchronized to occur sequentially, it is
physically impossible to occur otherwise. Given this
single premise, I can know for sure that there can not
possibly be any problem with my c_str() function
on any possible machine architecture.

> I assume you're trying to play fast & loose on the
> thread-safety thing to save some processor cycle. I'd
> recommend that you profile some code which makes heavy
> use of your STL library and see what kind of impact the
> explicitly thread-safe version makes. ie: Is it really
> worth writing broken code to save the few cycles ?

It is not broken code according to the hardware people.
If a sequence can not ever possibly occur, then it's not
wise to protect against this impossible sequence.

>
> Cheers,
> Rupert
>
>


Nick Maclaren

Jan 9, 2002, 9:14:18 AM

In article <n3Y_7.241382$WW.13...@bgtnsc05-news.ops.worldnet.att.net>,

"Peter Olcott" <olc...@worldnet.att.net> writes:
|> > > Is there any hardware platform where writing the same zero
|> > > value to the same byte address could ever result in something
|> > > besides zero being written in a multiple processor concurrency
|> > > environment?
|> >
|> > No.
|> >
|> > Not on a working system.
|> >
|> > On the original Alpha with 32-bit load/store operations the second of
|> > those writes might take a while, _if_ it was compiled to guard against
|> > simultaneous updates:
|>
|> I want to know that it is possible or impossible. When I use these
|> words, I mean them with complete literal precision. I am thus assuming
|> the case where the programmer does everything possible from a
|> programmer's point of view to force this condition to occur.
|> The only things that are disallowed are physical alterations to the
|> hardware.

The answer is "perhaps". There have been a fair number of systems
where that could result in one or more of the following:

One or both CPUs hanging
One or both CPUs interrupting (e.g. SIGILL)
Both writes being cancelled

Theoretically, I can imagine something else being written, but
I have never heard of it except as a result of one of the above
events.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

Del Cecchi

Jan 9, 2002, 9:26:32 AM
In article <WcQ_7.40761$fe1.7...@bgtnsc06-news.ops.worldnet.att.net>,

You are pretty pushy for a stranger here. Why are you so sure that your code is
correct?

|>
|> > Take a good look at what your code is actually doing - maybe at the
|> > assembly-language. The old Cfront made this easier. Be especially wary of
|> > things like reference count reference counts that aren't locked.
|> >
|> > >I am guessing that this is a computer hardware problem,
|> >
|> > Highly unlikely, unless you're running your own homebrew multiprocessor.
|>
|> My goal is to provide a 100% platform independent C++ source
|> code only version of at least std::string. I want to provide at least
|> the basic exception guarantee, and the thread safety guarantee
|> that SGI STL provides. This thread safety guarantee basically
|> say we are not going to do anything to help you make this
|> code work in a threaded environment, all that we can guarantee
|> is that we won't do anything to screw the threads up, You must
|> provide all the concurrency control mechanisms yourself.
|>
|> The reason that this is a hardware problem, and the reason
|> that this post was made to hardware only groups was because
|> the ONLY remaining issue is the concurrency of the single
|> function listed above.

If you have an example of hardware where this actually does happen,
then trot it out. Get the logic analyzer going. My
money is on a software problem.

|>
|> Is there any hardware platform where writing the same zero
|> value to the same byte address could ever result in something
|> besides zero being written in a multiple processor concurrency
|> environment?
|>
|> (Obviously the follow-up to a YES question would be
|> which hardware platform, and exactly what happens?
|> the follow up to a NO answer, would be How do you
|> know this for sure?)

Obviously no one can know this for sure for every single hardware platform ever
made. But let us break the problem down.

Uniprocessors: Impossible to actually do the two writes at the same actual time.

SMP: The writes to memory get serialized by the memory controller or bus
arbitration (if there is a bus involved, and there is if you go deep enough)

CCNUMA see above SMP

DSM, software coherence. You could get one or the other, or even have different
processors believing that a location contains different data. This is still a
software problem.

The only way one could get data that is different from what either process is
writing is if the hardware is broken. I will type this slowly so you can
understand. B R O K E N.

It may be broken in design, although I know of no hardware that is. A
mis-designed synchronizer in a piece of logic with asynchronous inputs could
possibly cause this, but it would cause all sorts of other problems also and
would be a relatively infrequent occurrence.

This is production level hardware you are talking about, right?

|>
|> This above question (and follow ups) forms the total and
|> complete full scope of every single detail of my complete
|> investigation.
|>
|> > Although sometimes it's not a hardware problem only because the hardware
|> > is defined to work that way.

Nah, not a hardware problem. First rule of debugging: It is the software.

:-)

|> >
|> > My advice is to avoid sharing memory, especially anything involving
|> > pointers, among threads. In fact, my advice is to avoid threads.
|> > Use fork() instead.
|> >
|> >
|> > --
|> > mac the naïf
|>
|>

--

Del Cecchi
cec...@us.ibm.com
vlsi circuit technology
rochester, MN

Del Cecchi

Jan 9, 2002, 9:27:38 AM
In article <B%O_7.40589$fe1.7...@bgtnsc06-news.ops.worldnet.att.net>,

"Peter Olcott" <olc...@worldnet.att.net> writes:
|> > I don't read code, sorry. But any properly functioning memory
|> > controller serializes the memory writes, or in the case of a cache,
|>
|> Even in the case of two or more physical processors sharing
|> the same memory space?

Absolutely Positively. At least every one I have ever looked at.

|>
|> > maybe multi-ports it. In any case in some sense it is impossible for
|> > the writes to actually be simultaneous. Of course one could have poorly
|> > designed hardware with bus arbitration problems or some such.
|> > Metastability can make things act funny as can timing violations.
|> >
|> > Summary: Hardware is broken or the software isn't doing what you think
|> > it is.
|> >
|> > del cecchi
|> >
|> >
|>
|>

--

Del Cecchi
cecchi@rchland

Mark Johnson

Jan 9, 2002, 9:36:16 AM
Peter Olcott wrote:
>
> [snip] Basically I am telling all my
> users that this is a ReadOnly function. With all the functions
> capable a writing, it is their responsibility to handle the
> multiple concurrent access.

But it's not a "read only" function. It changes the value of String when
executed. It also requires a host of preconditions to be true (String is
a valid pointer, NextByte is a "reasonable value", ...). All those other
functions in your class can affect the way this function works in the
ways I described. Mutexes or not, any use of those other functions while
this function is active can invalidate those preconditions and cause
this one to fail.

--Mark

Peter Olcott

Jan 9, 2002, 9:41:55 AM
> > [snip] Basically I am telling all my
> > users that this is a ReadOnly function. With all the functions
> > capable a writing, it is their responsibility to handle the
> > multiple concurrent access.
>
> But its not a "read only" function. It changes the value of String when

Yet apparently according to computer hardware experts this
can not ever make any possible difference. It can't make a
difference because every single access to memory is always
synchronized to occur sequentially. This is a function of the
hardware, and can not ever be superseded by any software.

> executed. It also requires a host of preconditions to be true (String is
> a valid pointer, NextByte is a "reasonable value", ...). All those other
> functions in your class can affect the way this function works in the
> ways I described. Mutexes or not, any use of those other functions while
> this function is active can invalidate those preconditions and cause
> this one to fail.

All of those other cases can be assumed away.

> --Mark


Peter Olcott

Jan 9, 2002, 9:51:30 AM

"Arnold Hendriks" <a.hen...@b-lex.com> wrote in message news:a1h1fr$pug$1...@news.btcnet.nl...

> Peter Olcott <olc...@worldnet.att.net> wrote:
> > Every other answer was about some operating system protocol,
> > or other rule-of-thumb, or conventional standard.
> Because for a lot of people, standards matter. Who knows what may change
> in the future that destroys all the assumptions you've made? Either base
> your choices on a standard, and you can guarantee portability to all
> sytems that implement this standard, or forego your goal of complete
> portability.
>
> That's how standards work. That's how portability works. Assuming hardware
> works in a certain way, and will always work that way in the future, will
> come back to haunt you in the future. If you don't wan that, stick to
> standards. That's why they're there.

According to the hardware experts, the kind of design that would
give me a problem would be considered bad design, and never
be built.

> > I am trying to write an implementation of std::string that
> Seeing the state of your code, you have a lot of other issues to resolve
> anyway before you should be bothering about speeding up the class in
> a high-concurrency environment. Why not tackle one problem at a time?
> Perhaps the speed problem you're worrying about has already vanished by
> the time you properly implemented all other semantics of std::string.
> (Premature optimization...)

If you see any problems with my current code, please feel free
to make comments. This code has already gone through
several levels of peer review, and many changes have already
been made. More than this, it has beaten the next fastest
implementation of std::string, STLport, on the data movement
benchmarks by an average factor of two. It has beaten everyone
else by a much larger factor.

I think that in some relatively rare cases this whole idea of
premature optimization is fundamentally incorrect. The best
possible optimization can only occur if it forms the design
criteria at every stage of the design, and is not merely tacked
on as an afterthought at the end. Although it may take three
times as long, it results in code of at least ten-fold better
performance. For most things this makes no difference.
It does not matter if an application program completes your
calculations in 1/10,000 of a second or ten times slower at
1/1,000 of a second. For some things it does matter very much.


Rupert Pigott

Jan 9, 2002, 10:01:57 AM
Peter Olcott <olc...@worldnet.att.net> wrote in message
news:meY_7.241405$WW.13...@bgtnsc05-news.ops.worldnet.att.net...
[SNIP]

> > RTFM. Go to SGI's website and pull out their documentation on
> > their STL implementation. It's fairly precise about what is
> > guaranteed and what isn't. :)
>
> I already did this, and this single precise point is the one thing
> that is still unspecified.

Bummer... Me being Mr Boring, if something isn't specified as
thread-safe I would assume that it is not, just to be on the safe
side? :)

[SNIP]


> > As a general point, I would seriously consider using concurrency
> > controls on that piece of code anyways. AFAIK two processes
> > running on two different processors banging the same memory
> > location at the same time is *NOT* something that C gives any
> > strong guarantees on.
>
> The reason that I posted to the hardware groups is I want
> to know what kind of guarantees that the bare metal gives
> on this because this bare metal is the ultimate arbiter.

My advice : Put big warnings around any code which makes
assumptions about behaviour which is "undefined" by the
language spec. That way people have a chance to sort problems
quickly should your assumptions come unstuck one day.

> > I don't have a copy of the ANSI C++ spec to hand at the
> > moment, and I suspect to answer this to my satisfaction it
> > would need a pretty thorough perusal. However my cautious
> > instinct tells me USE CONCURRENCY CONTROLS for the
> > following reasons :
>
> This is a physical impossibility thus completely out of the
> question. The reason that I don't say why I need the
> question answered is it gets into an endless debate of
> my design criteria. These design criteria are 100% utterly
> immutable.

So you're having to produce a portable thread-safe library
with no concurrency control facilities available to you.

Nice requirements, can't say I envy you. While I admire your
resolution, it's not a crime to re-evaluate your design
criteria. You may also find that people are able to spot
opportunities and mistakes in your design criteria which
you have missed.

[SNIP]

> Although it may be undefined behavior according to
> any language standard, apparently according to two
> hardware experts it is completely defined at the hardware

Yes, but you miss the point. If you write code whose
behaviour is classed as "undefined" by the language standard,
then its behaviour is not guaranteed, regardless of
the hardware experts, hardware, or compilers. I'm being an
arse about this because I've spent a significant amount of
time untangling problems caused by such code.

> level. Every single read or write to memory is always
> completely synchronized to occur sequentially, it is
> physically impossible to occur otherwise. Given this

Um, maybe on an x86 with gcc -O0. I wouldn't like to stick
my neck out for the other bazillion hardware/compiler/option
combos out there.

> single premise, I can know for sure that there can not
> possibly be any problem with my c_str() function
> on any possible machine architecture.

"on any possible machine architecture". Have you got a
proof for that ? ;)

> > I assume you're trying to play fast & loose on the
> > thread-safety thing to save some processor cycle. I'd
> > recommend that you profile some code which makes heavy
> > use of your STL library and see what kind of impact the
> > explicitly thread-safe version makes. ie: Is it really
> > worth writing broken code to save the few cycles ?
>
> It is not broken code according to the hardware people.
> If a sequence can not ever possibly occur, then its not
> wise to protect against this impossible sequence.

There is a huge difference between these two things :
A) Two hardware bods (for whom I have a great deal of respect),
saying that in your particular case, to the best of their
knowledge there shouldn't be a problem.

B) "a sequence can not ever possibly occur"

Looking at your code, Alex Colvin is right... The
String[NextByte]=0 line is broken. You can't modify a
non-mutable datamember in a const method.

I'm having trouble understanding how you can guarantee
that c_str() is going to work while
operator+=( const FastString& source ) method is running.
It looks to me as if you could NULL terminate your string
in the middle of it being copied (bleah), or even worse
you could write to a freed chunk of memory should it
reallocate (bleah bleah).

Maybe I've missed some ultra1337 trick or there are
going to be some external unwritten (bad) usage restrictions
you'll impose upon it.

My fortune cookie says that you are going to live in
Interesting Times. ;)

Cheers,
Rupert

P.S. : Good to see you using static_cast.


Rob Young

Jan 9, 2002, 10:02:26 AM
In article <a1hjuo$t7e$1...@news.rchland.ibm.com>, cec...@signa.rchland.ibm.com (Del Cecchi) writes:

> |>
> |> > Although sometimes it's not a hardware problem only because the hardware
> |> > is defined to work that way.
>
> Nah, not a hardware problem. First rule of debugging: It is the software.
>
> :-)
>

And the corollary:

"It isn't a network problem, it must be a system problem."

So you must prove it is a network problem... hence networking
has a good thing going... "free" troubleshooting. Been there,
done that, time and time again.

Rob

CBFalconer

Jan 9, 2002, 10:03:03 AM
Peter Olcott wrote:
>
... snip ...

>
> So let me verify the most crucial point that you, and no one
> else has made. All memory reads, and all memory writes are
> always synchronized on every machine all the time?
>
> In other words is not physically possible for any
> machine to physically write to the same address at
> the same time, and any attempt under any conditions
> will always, each and every time be synchronized?

In any machine I ever heard or conceived of. However, I am *not*
omniscient, as has been shown several times before.

It would be possible to design a system where this doesn't hold,
however it would be useless.

Nick Maclaren

unread,
Jan 9, 2002, 10:34:59 AM1/9/02
to

In article <a1hjuo$t7e$1...@news.rchland.ibm.com>,
cec...@signa.rchland.ibm.com (Del Cecchi) writes:
|> |>
|> |> Is there any hardware platform where writing the same zero
|> |> value to the same byte address could ever result in something
|> |> besides zero being written in a multiple processor concurrency
|> |> environment?
|>
|> Obviously no one can know this for sure for every single hardware platform ever
|> made. But let us break the problem down.
|>
|> The only way one could get data that is different from what either process is
|> writing is if the hardware is broken. I will type this slowly so you can
|> understand. B R O K E N.
|>
|> It may be broken in design, although I know of no hardware that is. A
|> mis-designed synchronizer in a piece of logic with asynchronous inputs could
|> possibly cause this, but it would cause all sorts of other problems also and
|> would be a relatively infrequent occurrence.
|>
|> This is production level hardware you are talking about, right?

The systems that I heard of having problems with all this were
NOT what I would call production level hardware. Back around 1980,
there were a fair number of experimental systems which attached
multiple CPUs to some kind of a bus which then was attached to the
memory. I am pretty sure that some of them could fail in these
circumstances, but cannot remember which.

As far as I can recall, the symptoms were never that the wrong
value got written, but more that the system would hang, interrupt
or simply lose BOTH writes. The problems about dealing with such
issues when building a (non-coherent) SMP system out of cheap
chips from several vendors, and a large amount of string, sealing
wax and chewing gum were reasons that such systems went out of
favour, even as experimental designs.

When low-end SMP came back again a decade later, it was all done
a lot more professionally.


The reason that the language standards forbid this is that there
ARE systems where storing to different, larger objects will end
up with a junk object that is made of up some bytes from either
object. And, quite rightly, the language standards simply say
that such things are undefined behaviour, because they don't want
to waste time defining the effect of a half-broken program.

Dave Hansen

unread,
Jan 9, 2002, 10:48:00 AM1/9/02
to
On 9 Jan 2002 14:26:32 GMT, cec...@signa.rchland.ibm.com (Del Cecchi)
wrote:

[...]


>Obviously no one can know this for sure for every single hardware platform ever
>made. But let us break the problem down.
>
>Uniprocessors: Impossible to actually do the two writes at the same actual time.
>
>SMP: The writes to memory get serialized by the memory controller or bus
>arbitration (if there is a bus involved, and there is if you go deep enough)
>
>CCNUMA see above SMP
>
>DSM, software coherence. You could get one or the other, or even have different
>processors believing that a location contains different data. This is still a
>software problem.
>
>The only way one could get data that is different from what either process is
>writing is if the hardware is broken. I will type this slowly so you can
>understand. B R O K E N.

I worked on a system about 8 years ago with an IDT (I think) dual-port
RAM chip between an 80186 and an 8051. The spec on the RAM said you
absitively posolutely should *not* access the same location from both
sides simultaneously. Even both sides reading could cause not only a
bad read value, but change the value in the location as well.

Fortunately, the chip also had what they called "semaphores" that
could be used at the software level to provide mutual exclusion. Each
side would "request" a semaphore by writing a 0, spin lock until the
value of the semaphore reads as zero, then "release" the semaphore by
writing a 1. IIRC, the chip had 4K bytes and 8 semaphores.
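
For illustration, a rough sketch of that request/spin/release
protocol as it might look against a memory-mapped semaphore
register. The address, macro name and function names here are
invented; the real chip's data sheet defines the actual layout.

#define DPRAM_SEM0 ((volatile unsigned char *)0x0FF8u)  /* hypothetical address */

static void sem_request(volatile unsigned char *sem)
{
    *sem = 0;             /* request ownership by writing a 0          */
    while (*sem != 0)     /* spin until it reads back as 0, i.e. until */
        ;                 /* the other side no longer holds it         */
}

static void sem_release(volatile unsigned char *sem)
{
    *sem = 1;             /* release by writing a 1 */
}

/* usage: sem_request(DPRAM_SEM0); ...touch the shared location... sem_release(DPRAM_SEM0); */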

Of course, I don't really think the OP would really want to worry
about this particular target, especially with a C++ string library...

Regards,

-=Dave
--
Change is inevitable, progress is not.

Maxim S. Shatskih

unread,
Jan 9, 2002, 8:45:57 AM1/9/02
to
> that SGI STL provides. This thread safety guarantee basically
> say we are not going to do anything to help you make this
> code work in a threaded environment

NT (which will be the most popular OS in the next few years) and advanced UNIXen like Solaris (and yes, even Linux has __clone(), has
pthreads mapped to __clone() and is the most popular UNIX for now) all have threads.
Limiting users to a single-threaded version is really amazing under these circumstances.
MS suggested stopping writing _even the UI components_ (OCXes) as single-threaded long ago, around '96 or '97.

I would suggest having the MT-safe code with locking implemented by macros that expand in a platform-specific way - to a no-op on
single-threaded platforms (ancient-style UNIXen). The classic mutex is all you need - even a non-recursive one is OK.
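
A rough sketch of such macros (every name below is invented; a real
library would pick its own names, include <windows.h> or <pthread.h>
as appropriate, and give each object a cs / mtx member):

#if defined(FASTSTRING_SINGLE_THREADED)
  #define FS_LOCK(obj)    ((void)0)                         /* no-op */
  #define FS_UNLOCK(obj)  ((void)0)
#elif defined(_WIN32)
  #define FS_LOCK(obj)    EnterCriticalSection(&(obj)->cs)
  #define FS_UNLOCK(obj)  LeaveCriticalSection(&(obj)->cs)
#else                                                       /* POSIX */
  #define FS_LOCK(obj)    pthread_mutex_lock(&(obj)->mtx)
  #define FS_UNLOCK(obj)  pthread_mutex_unlock(&(obj)->mtx)
#endif

Each mutating member function would then bracket its body with
FS_LOCK(this) / FS_UNLOCK(this).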

It will not cause any significant development cost increase, if you have any good Win32 programmers on your staff. :-)

Max

Peter Olcott

unread,
Jan 9, 2002, 11:03:33 AM1/9/02
to
> > So let me verify the most crucial point that you, and no one
> > else has made. All memory reads, and all memory writes are
> > always synchronized on every machine all the time?
> >
> > In other words is not physically possible for any
> > machine to physically write to the same address at
> > the same time, and any attempt under any conditions
> > will always, each and every time be synchronized?
>
> In any machine I ever heard or conceived of. However, I am *not*
> omniscient, as has been shown several times before.
>
> It would be possible to design a system where this doesn't hold,
> however it would be useless.
>
> --
> Chuck F

That seems to be a pretty definitive statement. Another
tangential issue has arisen. Is there any machine architecture
where this would necessarily be much slower in a multi-processor
or threaded environment? In other words the hardware would
invoke some special operation that slows everything down?


Peter Olcott

unread,
Jan 9, 2002, 11:04:33 AM1/9/02
to
> |> > I don't read code, sorry. But any properly functioning memory
> |> > controller serializes the memory writes, or in the case of a cache,
> |>
> |> Even in the case of two or more physical processors sharing
> |> the same memory space?
>
> Absolutely Positively. At least every one I have ever looked at.
Thanks


Peter Olcott

unread,
Jan 9, 2002, 11:08:13 AM1/9/02
to
> |> I want to know that it is possible or impossible. When I use these
> |> words, I mean them with complete literal precision. I am thus assuming
> |> the case where the programmer does everything possible from a
> |> programmer's point of view to force this condition to occur.
> |> The only things that are disallowed are physical alterations to the
> |> hardware.
>
> The answer is "perhaps". There have been a fair number of systems
> where that could result in one or more of the following:
>
> One or both CPUs hanging
> One or both CPUs interrupting (e.g. SIGILL)
> Both writes being cancelled
>
> Theoretically, I can imagine something else being written, but
> I have never heard of it except as a result of one of the above
> events.
>
>
> Regards,
> Nick Maclaren,
The answer "perhaps" seems to signify that you are uncertain
of the degree of your knowledge of this. I have received a
number of "not possible" answers. Since you seem uncertain
of your own answer, and I have no other basis than what
you have said, I must for the time being de-emphasize the
weight of this answer. I will count it as a tentative no vote.


Peter Boyle

unread,
Jan 9, 2002, 11:15:13 AM1/9/02
to

> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }
>

> I want to know if there can be any possible problem
> with the above function taken in isolation from
> every other possible function, in either a multi-threaded
> or multi-processor environment, such that multiple
> concurrent access to the above function, pertaining
> to the same data could cause errors in the absence
> of any locking mechanism.

What is, and where is, NextByte, and how does it get assigned?
Is it thread local or global, and does it have the same value
on the two threads?
Ditto String.
As others have observed, if NextByte is not identical between
the two threads, read-modify-write issues apply on certain
non-byte-addressable (but still cache-coherent) machines.

In lockless threaded code, software bugs are the number one
suspect by a mile, and you will always have to pretty rigorously
prove it isn't software: state of String before, state of
String after, value of NextByte in each thread
(perhaps even arrange for a variety of time orderings
- hack the assembler output to insert scheduler calls
between the r-m-w sequence, etc.).

> I am guessing that this is a computer hardware problem,

> and the problem is whether or not two completely
> simultaneous attempts to update this exact same address
> with zero could collide with each other such that a
> value other than zero could be written.

System busses are not ethernet - generally contention is the arbiter's
job!

> Remember this is two processes writing the same value
> (ASCII Zero) to the exact same address, there are

So NextByte is identical?

> no other possibilities. I only want to consider the
> concurrent use of the above function in complete
> isolation from any and all other functions.

Oh, to have that luxury!

Peter


Peter Olcott

unread,
Jan 9, 2002, 11:26:40 AM1/9/02
to
> Bummer... Me being Mr Boring, if something isn't specified as
> thread-safe I would assume that is is not to be on the safe
> side ? :)

No, most often these subtle little details merely escape the
far-less-than-omniscient perspective of the person who
wrote the spec.

> > The reason that I posted to the hardware groups is I want
> > to know what kind of guarantees that the bare metal gives
> > on this because this bare metal is the ultimate arbiter.
>
> My advice : Put big warnings around any code which makes
> assumptions about behaviour which is "undefined" by the
> language spec. That way people have a chance to sort problems
> quickly should your assumptions come unstuck one day.

First I must ascertain the degree of certainty that it will
ever be a problem. So far at least two computer hardware
experts have said that if this is ever a problem, it would have
to be on a machine with such a faulty design that the
machine itself would not be capable of doing anything
useful anyway. In other words, they said it would never
happen.

> > This is a physical impossibility thus completely out of the
> > question. The reason that I don't say why I need the
> > question answered is it gets into an endless debate of
> > my design criteria. These design criteria are 100% utterly
> > immutable.
>
> So you're having to produce a portable thread-safe library
> with no concurrency control facilities available to you.
>
> Nice requirements, can't say I envy you. While I admire your
> resolution, it's not a crime to re-evaluate your design
> criteria. You may also find that people are able to spot
> opportunities and mistakes in your design criteria which
> you have missed.

Its already been hashed out to oblivion.

> Yes, but you miss the point. If you write code whose
> behaviour is classed as "undefined" by the language standard,
> then its behaviour is not guaranteed, regardless of the
> hardware experts, hardware or compilers. I'm being an
> arse about this because I've spent a significant amount of
> time untangling problems caused by such code.

If it cannot ever occur on any hardware, if it is completely
impossible, then the language spec is merely missing a
detail, and thus the language spec errs in not having this detail.

> > level. Every single read or write to memory is always
> > completely synchronized to occur sequentially, it is
> > physically impossible to occur otherwise. Given this
>
> Um, maybe on a x86 with gcc -O0. I wouldn't like to stick
> my neck out for the other bazillion hardware/compiler/option
> combos out there.

Hardware designers have already stuck their necks out on
this issue, saying that any architecture where this would
be a problem would be a completely useless architecture
that would never be built.

> > single premise, I can know for sure that there can not
> > possibly be any problem with my c_str() function
> > on any possible machine architecture.
>
> "on any possible machine architecture". Have you got a
> proof for that ? ;)

One can not possibly have any proof of future events.
I have substantial evidence in terms of expert opinion.

> > It is not broken code according to the hardware people.
> > If a sequence can not ever possibly occur, then its not
> > wise to protect against this impossible sequence.
>
> There is a huge difference between these two things :
> A) Two hardware bods (for whom I have a great deal of respect),
> saying that in your particular case, to the best of their
> knowledge there shouldn't be a problem.
>
> B) "a sequence can not ever possibly occur"

It would result in a completely useless architecture if it did
occur.

> Looking at your code, Alex Colvin is right... The
> String[NextByte]=0 line is broken. You can't modify a
> non-mutable datamember in a const method.

Not according to experts on the C++ standards committee.
The distinction is logical versus physical constness, or
something like that. I have not yet exactly memorized
all the terminology.

> I'm having trouble understanding how you can guarantee
> that c_str() is going to work while
> operator+=( const FastString& source ) method is running.

Very simple: c_str() is guaranteed not to be running while
operator+= is running on this same data.

> It looks to me as if you could NULL terminate your string
> in the middle of it being copied (bleah), or even worse

Can't happen; it is guaranteed that no function except
possibly another instance of c_str() itself will be running
against this same data.

> you could write to a freed chunk of memory should it
> reallocate (bleah bleah).
>
> Maybe I've missed some ultra1337 trick or there are
> going to be some external unwritten(bad) usage restrictions
> you'll impose upon it.

There is external, as-yet-unwritten synchronization code that will
be the user's responsibility to write. I am shooting for
the same thread-safety guarantee that SGI STL provides;
thus the ONLY remaining issue for the thread safety of
my FastString is multiple concurrent instances of c_str().
ALL the other issues have been resolved by explicitly
making them the responsibility of the user.


Peter Olcott

unread,
Jan 9, 2002, 11:28:07 AM1/9/02
to
> NT (which will be the most popular OS in the next few years) and advanced UNIXen like Solaris (and yes,
> even Linux has __clone(), has pthreads mapped to __clone() and is the most popular UNIX for now) all have threads.
> Limiting users to a single-threaded version is really amazing under these circumstances.
> MS suggested stopping writing _even the UI components_ (OCXes) as single-threaded long ago, around '96 or '97.
>
> I would suggest having the MT-safe code with locking implemented by macros that expand in a
> platform-specific way - to a no-op on single-threaded platforms (ancient-style UNIXen).
> The classic mutex is all you need - even a non-recursive one is OK.
>
> It will not cause any significant development cost increase, if you have any good Win32
> programmers on your staff. :-)

Great idea for a future enhancement, yet totally out of the scope
of the current project.


Peter Olcott

unread,
Jan 9, 2002, 11:40:57 AM1/9/02
to
> |> > With the STL strings, the storage returned by c_str() isn't guaranteed to
> |> > stick around when you do things to the string that owns it.
> |>
> |> I want to know more about this. Can you cut and paste
> |> me some quotes, or give me a link?
>
> You are pretty pushy for a stranger here. Why are you so sure that your code is
> correct?

My request is a totally different issue: it pertains not
to the correctness of my code, but to the
correctness of the std::string that I am implementing.
I have a completely separate issue from the one that I
brought up here, and if std::string errs, then this helps
to resolve that other issue. There is no sense in exactly
replicating erroneous behavior.

>
> Obviously no one can know this for sure for every single hardware platform ever
> made. But let us break the problem down.
>
> Uniprocessors: Impossible to actually do the two writes at the same actual time.
>
> SMP: The writes to memory get serialized by the memory controller or bus
> arbitration (if there is a bus involved, and there is if you go deep enough)
>
> CCNUMA see above SMP
>
> DSM, software coherence. You could get one or the other, or even have different
> processors believing that a location contains different data. This is still a
> software problem.
>
> The only way one could get data that is different from what either process is
> writing is if the hardware is broken. I will type this slowly so you can
> understand. B R O K E N.
>
> It may be broken in design, although I know of no hardware that is. A
> mis-designed synchronizer in a piece of logic with asynchronous inputs could
> possibly cause this, but it would cause all sorts of other problems also and
> would be a relatively infrequent occurrence.
>
> This is production level hardware you are talking about, right?
>
> |>
> |> This above question (and follow ups) forms the total and
> |> complete full scope of every single detail of my complete
> |> investigation.
> |>
> |> > Although sometimes it's not a hardware problem only because the hardware
> |> > is defined to work that way.
>
> Nah, not a hardware problem. First rule of debugging: It is the software.

Yet you not only just proved that it was a hardware problem,
you also provided the answer that I was looking for. The problem
is that we used the word "problem" in two different contexts.

When I used the word "problem", I meant it as in algebra:
the answer comes from the hardware, not from the software.
When you used the word "problem", you meant something is
wrong, there is an error somewhere. If you substitute my
usage of the word "problem" with the word "answer" then
my point becomes clear.

Since I must ONLY concern myself with the simultaneous
concurrent invocations of c_str() itself, taken in complete
isolation from everything else, the answer that no
invocation of c_str() can affect any other invocation of
c_str() does come from the way that hardware is designed.

c_str() will not be able to interfere with itself such that
the ASCII zero is written as any value other than
ASCII zero.


Peter Olcott

unread,
Jan 9, 2002, 11:42:25 AM1/9/02
to

"Rob Young" <you...@encompasserve.org> wrote in message
news:jSKPGF...@eisner.encompasserve.org...

Problem as in algebra: this question needs an answer;
not problem as in there is something wrong, there is an
error somewhere.


Peter Olcott

unread,
Jan 9, 2002, 11:45:08 AM1/9/02
to
> I worked on a system about 8 years ago with an IDT (I think) dual-port
> RAM chip between an 80186 and an 8051. The spec on the RAM said you
> absitively posolutely should *not* access the same location from both
> sides simultaneously. Even both sides reading could cause not only a
> bad read value, but change the value in the location as well.
>
> Fortunately, the chip also had what they called "semaphores" that
> could be used at the software level to provide mutual exclusion. Each
> side would "request" a semaphore by writing a 0, spin lock until the
> value of the semaphore reads as zero, then "release" the semaphore by
> writing a 1. IIRC, the chip had 4K bytes and 8 semaphores.
>
> Of course, I don't really think the OP would really want to worry
> about this particular target, especially with a C++ string library...
>
> Regards,
>
> -=Dave
> --
> Change is inevitable, progress is not.

So it is possible, yet not likely.


Maxim S. Shatskih

unread,
Jan 9, 2002, 11:55:34 AM1/9/02
to
> Since I must ONLY concern myself with the simultaneous
> concurrent invocations of c_str() itself, taken in complete

Also - maybe the compiler is buggy?
A wrong optimization in the outer context (the function is inline, so this is possible).
Try declaring it without "inline" and see what happens.
Then try on another machine of a similar type.

Max

Rupert Pigott

unread,
Jan 9, 2002, 11:59:45 AM1/9/02
to
Peter Olcott <olc...@worldnet.att.net> wrote in message
news:4b__7.241595$WW.13...@bgtnsc05-news.ops.worldnet.att.net...
[SNIP]

> > Looking at your code, Alex Colvin is right... The
> > String[NextByte]=0 line is broken. You can't modify a
> > non-mutable datamember in a const method.
>
> Not according to experts on the C++ standards committee.
> The distinction is logical versus physical constness, or
> something like that. I have not yet exactly memorized
> all the terminology.

I'm gonna have to re-read all that stuff. Obviously I've
replaced all the useful bits with chewed up wood shavings. ;)

> > I'm having trouble understanding how you can guarantee
> > that c_str() is going to work while
> > operator+=( const FastString& source ) method is running.

> There is external, as-yet-unwritten synchronization code that will
> be the user's responsibility to write. I am shooting for
> the same thread-safety guarantee that SGI STL provides;
> thus the ONLY remaining issue for the thread safety of
> my FastString is multiple concurrent instances of c_str().
> ALL the other issues have been resolved by explicitly
> making them the responsibility of the user.

Righty Ho.

So... In essence what you're doing is producing an analog of
an STL class, with the sole difference that it guarantees
thread safety for one method in it... Providing that you
adhere to some external usage rules and you're comfortable
with a bunch of assumptions about hardware and the way
compilers work.

Well, at least you're putting some leg work into researching
it, even if it does make my toes curl with horror.

Cheers,
Rupert


Maxim S. Shatskih

unread,
Jan 9, 2002, 12:20:53 PM1/9/02
to
> > Looking at your code, Alex Colvin is right... The
> > String[NextByte]=0 line is broken. You can't modify a
> > non-mutable datamember in a const method.
>
> Not according to experts on the C++ standards committee.
> The distinction is logical versus physical constness, or
> something like that. I have not yet exactly memorized
> all the terminology.

From Stroustrup/Ellis:
- "const" does not mean "place to read only memory", nor it means "compile time constant".
- nevertheless, a "const" member function has "this" as "const C*", so, modifying anything growing from "this" is disallowed in it.

Max

Stephen Fuld

unread,
Jan 9, 2002, 12:30:01 PM1/9/02
to

"Peter Olcott" <olc...@worldnet.att.net> wrote in message
news:meY_7.241405$WW.13...@bgtnsc05-news.ops.worldnet.att.net...
>

snip

>
> Although it may be undefined behavior according to
> any language standard, apparently according to two
> hardware experts it is completely defined at the hardware
> level. Every single read or write to memory is always
> completely synchronized to occur sequentially, it is
> physically impossible to occur otherwise. Given this
> single premise, I can know for sure that there can not
> possibly be any problem with my c_str() function
> on any possible machine architecture.

You might have missed the impact of Terje's comments earlier. While this is
true if you can guarantee that what you are writing to is truly hardware
memory, in an arbitrary hardware design you cannot guarantee that. It would
be entirely possible (quite easy even) to have a memory mapped address be
some kind of hardware device that did almost anything. For example, writing
one bits to all of physical memory (thus overwriting your program), sending
an e-mail, powering off the whole system, etc. Now I am not commenting on
the utility of any of these, or their commonness, but you did say "any
possible machine architecture".

--
- Stephen Fuld
e-mail address disguised to prevent spam


CBFalconer

unread,
Jan 9, 2002, 1:47:23 PM1/9/02
to
Peter Olcott wrote:
>
> "Rupert Pigott" <Dark...@btinternet.com> wrote in message
> > >
... snip ...

>
> Although it may be undefined behavior according to
> any language standard, apparently according to two
> hardware experts it is completely defined at the hardware
> level. Every single read or write to memory is always
> completely synchronized to occur sequentially, it is
> physically impossible to occur otherwise. Given this
> single premise, I can know for sure that there can not
> possibly be any problem with my c_str() function
> on any possible machine architecture.
>
> > I assume you're trying to play fast & loose on the
> > thread-safety thing to save some processor cycle. I'd
> > recommend that you profile some code which makes heavy
> > use of your STL library and see what kind of impact the
> > explicitly thread-safe version makes. ie: Is it really
> > worth writing broken code to save the few cycles ?
>
> It is not broken code according to the hardware people.
> If a sequence can not ever possibly occur, then its not
> wise to protect against this impossible sequence.

If you think about a physical write mechanism, you will see that
any attempt to write from more than one source requires a physical
connection at some point. These would conflict at least some of
the time, and eventually destroy the actual driving gates.
Meanwhile the value written would be indeterminate. Nobody in
their right mind would ever design such a thing, and if they did
it wouldn't live long.

This is *NOT* the same thing as a wired or.

At any rate, I still would not trust concurrent use of software
that did not use suitable concurrency protocols. Not too many
years ago many C libraries were broken because of (relatively)
global variables. There are still implied globals, such as data
FILE*s point to.

If you HAVE to write a zero, that implies somewhere, sometime, in
some continuum, something wrote a non-zero. If that process is
concurrent, there are NO, none, zip, blech, guarantees about its
timing, including 'before' and 'after'. That is why we have
synchronization primitives, such as 'test and set'. Providing
those, on some hardware, is non-trivial. With core memory 'test
and reset' was easy - just reading the value reset it, so the
problem was inverted to provide a 'read modify write' cycle. I
don't see much core memory around today.
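
For reference, a minimal sketch of the "test and set" idea as a spin
lock. It uses std::atomic_flag, which C++ only standardized many years
later, so this is an illustration of the primitive rather than anything
available to the original posters:

#include <atomic>

class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire))
            ;                     /* spin until we are the one who set it */
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};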

Come to think of it, I often still don't trust most software :-)

Mark Johnson

unread,
Jan 9, 2002, 1:46:32 PM1/9/02
to
Peter Olcott wrote:
> [snip]

> > But its not a "read only" function. It changes the value of String when
>
> Yet apparently according to computer hardware experts this
> can not ever make any possible difference. It can't make a
> difference because every single access to memory is always
> synchronized to occur sequentially. This is a function of the
> hardware, and can not ever be superceded by any software.

Let me give you a real world example from the system we are currently
implementing. We have a high speed interface card and driver that maps
remote memory into the address space of the local PC. One of the
operations implemented by that card when reading a location is a "load,
increment, store, return old value". With this and a store zero
operation, you can implement a mutex. For operations on a remote PC's
memory, this works well.
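
A sketch of how those two primitives can compose into a lock, written
here with C++11 atomics standing in for the card's operations (the
names are invented, and a real version would add back-off so the
counter cannot wrap while spinning):

#include <atomic>

static std::atomic<unsigned> lockWord(0);

void lock() {
    /* the card's "load, increment, store, return old value" */
    while (lockWord.fetch_add(1) != 0)
        ;          /* non-zero old value: someone else holds the lock */
}

void unlock() {
    lockWord.store(0);    /* the plain "store zero" operation */
}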

However, you can also map the local PC's memory through this card as
well. If you use the card to perform both operations (incrementing read,
store zero), everything works OK. If you use the card to do the
incrementing read and the PC to do the store zero, you can end up with
deadlock. The reason is the PC's store zero occurs between the card's
read and write cycles. So, yes - every access is sequential, but without
serialization in the card, it caused problems. When we did this
[incorrectly] with our application a deadlock would occur within
minutes.

Based on that kind of real world experience, I emphasize the need for
guarantees that the necessary preconditions are in place prior to
running a function - even a simple one like your c_str function. I
consider safety and correctness far more important than any efficiency
argument in the systems I deliver. The customer cares first that it
works in ALL cases, so I do too.
--Mark

Anne & Lynn Wheeler

unread,
Jan 9, 2002, 2:32:17 PM1/9/02
to
"Peter Olcott" <olc...@worldnet.att.net> writes:

> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }

(w/o looking at the rest of the code) depending on the compiler,
machine, etc .... the compiler could be generating code that loads
NextByte in a register and keeps it there for extended period of time.
If NextByte is being changed (increment, decrement, etc), it is
possible that multiple processors wouldn't necessarily see changes
that each other are doing and NextByte gets "out-of-sync" ... so the
different processors are now working with different NextByte values.

Then on lots of machines, one byte storage alterations require
fetching a word (or more), updating the specific byte and then storing
back the whole unit of storage.

If processor-1 was working with NextByte=71 and processor-2 with
NextByte=70:

* proc1 fetches the word for byte 71,
* proc2 fetches the same word for byte 70,
* proc1 updates the byte and stores the word back,
* proc2 update the byte and stores the word back ...

wiping out the change done by proc1.

aka the problem can be a combination of

1) compiler not generating code to maintain NextByte syncronized in
multiple processors (i.e. keeping values in local registers & no
serialization of value cross processors) and

2) the way processors may do storage alterations on smaller than
minimum storage fetch/store unit.
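
A sketch of point 2), showing roughly what a compiler might emit for a
one-byte store on a machine whose smallest store is a 32-bit word
(little-endian byte numbering assumed, purely for illustration):

#include <cstdint>
#include <cstring>

void store_byte_via_word(char *base, std::size_t index, char value)
{
    char *word_addr = base + (index & ~std::size_t(3)); /* containing word   */
    std::uint32_t word;
    std::memcpy(&word, word_addr, 4);                   /* fetch whole word  */
    std::size_t shift = (index & 3) * 8;
    word &= ~(std::uint32_t(0xFF) << shift);            /* clear target byte */
    word |= std::uint32_t(std::uint8_t(value)) << shift;
    std::memcpy(word_addr, &word, 4);                   /* store whole word  */
}

/* If proc1 runs this for index 71 while proc2 runs it for index 70,
   both can fetch the same word before either stores it back, and
   whichever store lands last silently undoes the other byte's update. */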

--
Anne & Lynn Wheeler | ly...@garlic.com - http://www.garlic.com/~lynn/

Alexander Terekhov

unread,
Jan 9, 2002, 2:34:59 PM1/9/02
to

Dave Hansen wrote:
[...]

> Of course, I don't really think the OP would really want to worry
> about this particular target, especially with a C++ string library...

OP should really worry about undefined/illegal behavior
and consequences of breaking the memory synchronization/
visibility rules (POSIX/4.10), and should rather carefully
consider the following recommendation from a well known
chip manufacturer:

"IA-32 Intel ® Architecture
Software Developer's
Manual
.
.
.
Despite the fact that Pentium 4, Intel Xeon, and P6 family
processors support processor ordering, Intel does not guarantee
that future processors will support this model. To make software
portable to future processors, it is recommended that operating
systems provide critical region and resource control constructs
and API's (application program interfaces) based on I/O, locking,
and/or serializing instructions be used to synchronize access to
shared areas of memory in multiple-processor systems. "

regards,
alexander.

Steve Watt

unread,
Jan 9, 2002, 2:32:18 PM1/9/02
to
In article <PSJ_7.39991$fe1.7...@bgtnsc06-news.ops.worldnet.att.net>,
Peter Olcott <olc...@worldnet.att.net> wrote:
[ the same question he asked in comp.programming.threads earlier
about lazy null termination ]

It has been repeatedly pointed out to you that it violates the POSIX
memory visibility standard to do what you are doing. Furthermore,
you are now demanding that we consider only this single function
in isolation, and not view it as part of a system.

If you want to be maximally portable, which you claim, you may only
have a single thread write to a memory location without some kind
of synchronization. If nobody else has been able to posit a hardware
environment where it would fail, I will: Some RAM controllers can
return bogus data if a read and a write at the same address occur
simultaneously. Or a memory cell where writing a value requires
"erasing" to all ones and then the value you wish (think flash).

On any modern architecture, writing one extra byte (the null) to
the cache while you own the lock will have such minimal impact that
the effect will be swamped by branch mispredictions and cache effects
from not having your data and code perfectly laid out relative to
the cache lines.

[ flame follows ]
As has been pointed out before, you are attempting a silly
micro-optimization that will a: have no or very minimal visible
effect on performance, and b: violate the POSIX memory visibility
rules. Stamping your feet and saying "I want it this way because
it'll be faster" doesn't change those *facts*.

And now, I will do with this thread what I should've the first time.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Internet: steve @ Watt.COM Whois: SW32
Free time? There's no such thing. It just comes in varying prices...

Del Cecchi

unread,
Jan 9, 2002, 3:39:18 PM1/9/02
to
In article <3C3C9B63...@web.de>,

Alexander Terekhov <tere...@web.de> writes:
|>
|> Dave Hansen wrote:
|> [...]
|> > Of course, I don't really think the OP would really want to worry
|> > about this particular target, especially with a C++ string library...
|>
|> OP should really worry about undefined/illegal behavior
|> and consequences of breaking the memory synchronization/
|> visibility rules (POSIX/4.10), and should rather carefully
|> consider the following recommendation from a well known
|> chip manufacturer:
|>
|> "IA-32 Intel Ž Architecture

|> Software Developer's
|> Manual
|> .
|> .
|> .
|> Despite the fact that Pentium 4, Intel Xeon, and P6 family
|> processors support processor ordering, Intel does not guarantee
|> that future processors will support this model. To make software
|> portable to future processors, it is recommended that operating
|> systems provide critical region and resource control constructs
|> and API's (application program interfaces) based on I/O, locking,
|> and/or serializing instructions be used to synchronize access to
|> shared areas of memory in multiple-processor systems. "
|>
|> regards,
|> alexander.

Please note that the behaviour discussed was not that one or the other write
happened, but that some other data entirely that matched neither write happened.

This has nothing to do with processor ordering vs. weak ordering vs. strong
ordering. In a properly designed processor, one or the other write should occur
last.

Of course the memory system operates on physical or real addresses, and many
modern processes throw address spaces and pages around like they were popcorn.
--

Del Cecchi
cecchi@rchland

Alexander Terekhov

unread,
Jan 9, 2002, 4:34:19 PM1/9/02
to

Del Cecchi wrote:
[...]

> Please note that the behaviour discussed was not that one or the other write
> happened, but that some other data entirely that matched neither write happened.
>
> This has nothing to do with processor ordering vs. weak ordering vs. strong
> ordering.

Peter's original post to this group was rather confusing.
His real issue is the design of a thread-safe library, and
later he even mentioned it here:

http://groups.google.com/groups?as_umsgid=WcQ_7.40761%24fe1.764498%40bgtnsc06-news.ops.worldnet.att.net&rnum=7

"This thread safety guarantee basically
say we are not going to do anything to help you make this

code work in a threaded environment, all that we can guarantee
is that we won't do anything to screw the threads up, You must
provide all the concurrency control mechanisms yourself."

Now, with his broken "const" c_str method which writes
to some potentially shared object(s) (that might be
protected by some read-write lock, for example), he
DOES "screw the threads up":

http://groups.google.com/groups?as_umsgid=3C3AAA5C.B3841089%40web.de
http://groups.google.com/groups?as_umsgid=3C3C6036.2C7CE683%40web.de
http://groups.google.com/groups?as_umsgid=3C2CE9F9.838A79B9%40web.de

So, he should just get rid of const in his function
declaration which would tell everybody and his dog to
use MUTEX form of synchronization (when invoking any
of his non-const methods on shared "FastString" objects,
if any), or even better, not bother at all with that lazy
'\0' termination and provide a real const c_str method.

regards,
alexander.

David Schwartz

unread,
Jan 9, 2002, 5:08:29 PM1/9/02
to
Peter Olcott wrote:

> Ah, but, you are merely assuming that I am
> doing nothing more than assuming. Actually
> a linked up several computer hardware only
> newsgroups, and am discussing it with them.

That makes no sense, since your question is not about any particular
type of hardware.

DS

David Schwartz

unread,
Jan 9, 2002, 5:09:48 PM1/9/02
to
Peter Olcott wrote:

> > In principle, a machine could write a 'zero' to an address by fetching
> > its current value and then atomically subtracting that value. That
> > wouldn't violate any standard that's applicable in this case.

> Except that in practice it would screw up concurrency,

No, it would screw up your assumptions about concurrency.

> thus would be avoided as a bad circuit design.

Today's bad circuit design is tomorrow's optimization.

> Not
> even counting the fact that this requires twice as much
> work.

Two little steps may be more efficient than one big one. When dealing
with unknown platforms, one can either make assumptions or follow
standards. You are choosing the former.

DS

Rupert Pigott

unread,
Jan 9, 2002, 5:13:07 PM1/9/02
to
Alexander Terekhov <tere...@web.de> wrote in message
news:3C3CB75B...@web.de...
[SNIP]

> So, he should just get rid of const in his function
> declaration which would tell everybody and his dog to
> use MUTEX form of synchronization (when invoking any
> of his non-const methods on shared "FastString" objects,
> if any), or even better, not bother at all with that lazy
> '\0' termination and provide a real const c_str method.

He's assured us that it won't be a problem because all
the applications which use it will conform to a particular
model of usage.

However, I think a smarter way for him to do his lazy
termination and thereby avoid all this BS would be to
put the termination in when assignment or modification
of the string occurs. That way it's done ONCE only,
rather than every time his *CONST* method is called.
That also has the side benefit of voiding his whole
read/write dilemma.
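
Something along these lines (a toy sketch, not Peter's actual class),
where only the mutators write the terminator and c_str() becomes a
pure read:

#include <cstddef>
#include <cstring>

class EagerString {
    char        *String;     /* buffer with room for the terminator */
    std::size_t  NextByte;   /* index one past the last character   */
public:
    EagerString &append(const char *s, std::size_t n) {
        std::memcpy(String + NextByte, s, n);  /* capacity checks omitted */
        NextByte += n;
        String[NextByte] = 0;                  /* terminate once, here    */
        return *this;
    }
    const char *c_str() const { return String; }   /* no write at all */
};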

However it doesn't void his assertion that c_str's
execution will never overlap concurrent execution of
a mutator. For that he needs to use concurrency
controls at some level.

That comes down to a choice of putting the controls
in the application and trusting everyone to play
ball or putting them in the library and thereby
ensuring it's actually threadsafe.

Cheers,
Rupert


Chris Smith

unread,
Jan 9, 2002, 5:41:39 PM1/9/02
to
Peter Olcott wrote ...
> It must run on machines not even invented yet,

> I am not getting any replies from the guys that
> know this bare metal stuff, and think that's what I need.

These are two conflicting statements. Pick which you want. If you need
the code to run on all possible future machines, you *have* to adhere to
a published standard, *not* rely on hardware knowledge. Without
requiring platforms to adhere to any kind of published standard, you can
*never* guarantee that the implementation will work on any non-existing
hardware or software environment.

As a simple counter-example, I could build a machine whose sole purpose
is to recognize your code and intentionally screw things up royally. You
are ignoring the simple fact that it's impossible to attain the goal
you've set out for yourself. It's possible to write thread-safe code
under POSIX, but it's not possible to write thread-safe code under
"everything". The best you can do is write code that has a high
probability of running, and you've already done some of that.

Here's what you should do:

1. Re-read David's (Schwartz) response where he proves that a hardware
architecture could be designed that causes your simultaneous unlocked
writes of zero to the same memory address to fail (and don't reply that
the architecture looks bad... the point is that it's valid and breaks
your code; you don't know enough about computer engineering in the year
2106 to know if it's bad design or not).

2. Re-read my response where I list in very specific terms the *only* way
you're going to be able to guarantee the kind of thread-safety that you
want.

3. Stop asking for a guarantee that your code will run on all possible
platforms without adhering to any published standard. It's a dumb
question, and has no answer.

Chris Smith

Christian Bau

unread,
Jan 9, 2002, 6:32:25 PM1/9/02
to
Peter Olcott wrote:
>
> > > Is there any hardware platform where writing the same zero
> > > value to the same byte address could ever result in something
> > > besides zero being written in a multiple processor concurrency
> > > environment?
> >
> > No.
> >
> > Not on a working system.
> >
> > On the original Alpha with 32-bit load/store operations the second of
> > those writes might take a while, _if_ it was compiled to guard against
> > simultaneous updates:

>
> I want to know that it is possible or impossible. When I use these
> words, I mean them with complete literal precision. I am thus assuming
> the case where the programmer does everything possible from a
> programmer's point of view to force this condition to occur.
> The only things that are disallowed are physical alterations to the
> hardware.

You need to understand that there are two different levels: The level of
the programming language (C in this case), and the level of the actual
hardware.

On the hardware level, if an instruction is executed that stores a zero
then a zero will be stored. On a multiprocessor system, obviously some
other processor might execute an instruction that stores a one to the
same location. One instruction will execute before the other, and you
will end up with a zero or a one stored, but with nothing else. What is
not necessarily guaranteed is the order of different actions. If
processor A stores 1 into location X and 2 into location Y, and
processor B stores 4 into location Y and then 3 into location X, you may
end up with 1 in location X and 4 in location Y which could come
unexpected.

But there is a different level: You don't know what code the C compiler
will execute. For example, you might have a processor where the smallest
addressable memory unit is larger than a byte. If you try to store a
zero byte, then the compiler might produce code that reads for example
eight bytes into a register, modifies one of the eight bytes, and writes
all the eight bytes back. If one processor tries to set byte X to zero
using this method, and another processor tries to set byte X+1 to zero,
then it is _not_ guaranteed that both bytes will be zero. And you don't
need multiple processors for this at all, the same thing can happen with
a single processor and multiple threads running. And you don't need
fancy hardware for this to happen; if you don't tell the compiler that
some memory location is volatile then it will assume that no other
threads refer to that location, and it can do other things than you
expect.
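
A tiny sketch of that last point, the sort of thing that goes wrong
when the compiler is allowed to assume no other thread touches a
location (the flag and function are invented for illustration; this
shows the hazard, not a recommended pattern):

bool done = false;        /* shared with another thread, no volatile/atomic */

void wait_for_done()
{
    while (!done)         /* the compiler may hoist the load out of the */
        ;                 /* loop, turning this into an infinite spin   */
}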

David Schwartz

unread,
Jan 9, 2002, 6:36:50 PM1/9/02
to
Peter Olcott wrote:

> I have already hashed this into non-existence in other groups.
> Ultimately C++ can't do any more than the hardware can
> do, thus the hardware forms the bottom line.

You seem determined to be deliberately wrong. C++ has nothing
whatsoever to do with hardware so that a properly written C++ program
can run correctly on any hardware. Modern hardware does any number of
things that would have been unthinkable years ago, and it's only
standards compliance that will protect you from future and unknown
hardware.

> Two people
> have told me that every single write to memory is serialized
> to occur synchronously. It is literally physically impossible
> to write to the same location at the same time, even with
> hundreds of physical processors sharing the same memory
> space. If this is the case, then my problem is fully solved.

How does this solve or affect your problem? Again, consider a RISC
platform that features a load and a subtract but no write. One zeroes an
address by loading it and then subtracting that value. Ownership of the
memory address could ping-pong between two processors' local caches
in between the reads and subtracts.
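
A sketch of that hypothetical zeroing sequence, split into two steps to
show the window (purely illustrative, not any real instruction set):

void zero_by_subtract(int *p)
{
    int old = *p;        /* load the current value                         */
    *p = *p - old;       /* subtract it; a store from another CPU landing  */
                         /* between the two steps leaves a non-zero result */
}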

Does any hardware that exists right now do this? Not that I know of.
Could this turn out to be the most efficient way to do things in 2017? I
don't know any better than you do.

> My little function taken in isolation can be considered
> a read only function as far as concurrency is concerned,
> even though in actual fact it does write an ASCII zero.

According to what standard? C++? POSIX? WIN32?

DS

Peter Olcott

unread,
Jan 9, 2002, 8:12:40 PM1/9/02
to
> > There is external, as-yet-unwritten synchronization code that will
> > be the user's responsibility to write. I am shooting for
> > the same thread-safety guarantee that SGI STL provides;
> > thus the ONLY remaining issue for the thread safety of
> > my FastString is multiple concurrent instances of c_str().
> > ALL the other issues have been resolved by explicitly
> > making them the responsibility of the user.
>
> Righty Ho.
>
> So... In essence what you're doing is producing an analog of
> an STL class, with the sole difference that it guarantees
> thread safety for one method in it... Providing that you

Nope, it's not that. I make all the write functions the user's
responsibility, and tell them that the read functions are
safe ONLY if you are not writing to the same data. This
is the same thread safety guarantee that SGI STL provides.
They can't provide a stronger one than this AND stay portable.
All I gotta know is if c_str() can interfere with itself,
because I'm telling my users that it's a ReadOnly function.

Peter Olcott

unread,
Jan 9, 2002, 8:18:05 PM1/9/02
to

"Maxim S. Shatskih" <ma...@storagecraft.com> wrote in message news:a1htvp$1h89$1...@gavrilo.mtu.ru...
Stroustrup page 231.
Occasionally, a member function is logically const, but it still
needs to change the value of a member. To a user, the function
appears not to change the state of its object. However, some
detail that the user cannot directly observe it updated. This is
often called logical constness.


Peter Olcott

unread,
Jan 9, 2002, 8:21:02 PM1/9/02
to
> You might have missed the impact of Terje's comments earlier. While this is
> true if you can guarantee that what you are writing is truely hardware
> memory, in an arbitrary hardware design you cannot guarantee that. It would

I think that I can guarantee this. With each and every hardware
platform, it must always make writes come before reads to
the exact same location in memory, iff this is the way that the
programmer specified them.

Peter Olcott

unread,
Jan 9, 2002, 8:26:34 PM1/9/02
to
> If you think about a physical write mechanism, you will see that
> any attempt to write from more than one source requires a physical
> connection at some point. These would conflict at least some of
> the time, and eventually destroy the actual driving gates.
> Meanwhile the value written would be indeterminate. Nobody in
> their right mind would ever design such a thing, and if they did
> it wouldn't live long.

They tell me that all memory access is synchronized to
be sequential, all the time, by the hardware.

> At any rate, I still would not trust concurrent use of software
> that did not use suitable concurrency protocols. Not too many

I can't possibly do this in a 100% portable, completely
platform-independent, C++-source-code-only distribution.
I must shoot for the best within this binding constraint.

> If you HAVE to write a zero, that implies somewhere, sometime, in
> some continuum, something wrote a non-zero. If that process is

Any writing besides this single c_str() instance of writing
a zero to a fixed location in memory is explicitly the library
user's responsibility. I can't help them in standard C++.

Peter Olcott

unread,
Jan 9, 2002, 8:28:37 PM1/9/02
to
I must stick with 100% portable platform independent
C++ source code distribution.

> OP should really worry about undefined/illegal behavior
> and consequences of breaking the memory synchronization/
> visibility rules (POSIX/4.10), and should rather carefully
> consider the following recommendation from a well known
> chip manufacturer:
>

> "IA-32 Intel Ž Architecture

Peter Olcott

unread,
Jan 9, 2002, 8:30:29 PM1/9/02
to
> Please note that the behaviour discussed was not that one or the other write
> happened, but that some other data entirely that matched neither write happened.
>
> This has nothing to do with processor ordering vrs weak ordering vrs strong
> ordering. In a properly designed processor, one or the other write should occur
> last.
>
> Of course the memory system operates on physical or real addresses, and many
> modern processes throw address spaces and pages around like they were popcorn.
> --
>
> Del Cecchi
> cecchi@rchland

Yet no processor will ever get its reads and writes to the
same physical address out of order, and still be correct.


Peter Olcott

unread,
Jan 9, 2002, 8:34:53 PM1/9/02
to

> Now, with his broken "const" c_str method which writes
> to some potentially shared object(s) (that might be
> protected by some read-write lock, for example), he
> DOES "screw the threads up":

According to Stroustrup, last paragraph on page 231,
my function is not a broken const function; it is merely
logically const, instead of physically const.

Peter Olcott

unread,
Jan 9, 2002, 8:38:01 PM1/9/02
to
> However, I think a smarter way for him to do his lazy
> termination and thereby avoid all this BS would be to
> put the termination in when assignment or modification
> of the string occurs. That way it's done ONCE only,
> rather than every time his *CONST* method is called.
> That also has the side benefit of voiding his whole
> read/write dilemma.

and slowing all the other code down 10-20%.

> However it doesn't void his assertion that c_str's
> execution will never overlap concurrent execution of
> a mutator. For that he needs to use concurrency
> controls at some level.
>
> That comes down to a choice of putting the controls
> in the application and trusting everyone to play
> ball or putting them in the library and threreby
> ensuring it's actually threadsafe.

Can't possibly do this in a 100% portable, completely
platform-independent, standard-C++-only way.

> Cheers,
> Rupert
>
>


Peter Olcott

unread,
Jan 9, 2002, 8:42:29 PM1/9/02
to

"Maxim S. Shatskih" <ma...@storagecraft.com> wrote in message news:a1hsga$1fr5$1...@gavrilo.mtu.ru...

This code has been run on many platforms, across many
different operating systems and compilers. It has proven
to be on average twice as fast on data movement as the
next fastest one, STLport. It has proven to be much
faster than this on the other platforms. In one case beating
MSVC++ 6.0 by 178-fold.

If you are referring to c_str() compiling as const
(even though I add a zero byte at the end),
this is normal and correct, according to members
of the C++ standards committee.


Peter Olcott

unread,
Jan 9, 2002, 8:47:09 PM1/9/02
to

"Peter Boyle" <pbo...@holyrood.ed.ac.uk> wrote in message
news:Pine.SOL.4.33.02010...@holyrood.ed.ac.uk...

>
>
> > inline const char *FastString::c_str() const
> > {
> > String[NextByte] = 0;
> > return (String);
> > }
> >
> > I want to know if there can be any possible problem
> > with the above function taken in isolation from
> > every other possible function, in either a multi-threaded
> > or multi-processor environment, such that multiple
> > concurrent access to the above function, pertaining
> > to the same data could cause errors in the absence
> > of any locking mechanism.
>
> What is, where is, Nextbyte, and how does it get assigned.
> Is it thread local or global, does it have the same value
> on the two threads.

||||||with the above function taken in isolation from
||||||every other possible function,
All these other functions are the user's responsibility for
concurrency.

> Ditto string.

char *

> As others have observed if Nextbyte is not identical between
> the two threads, on certain machines read-modify-write
> issues apply on non-byte addressable (but still cache coherent)machines.

All this stuff is moot, that's what I mean by taking this
function in ISOLATION.

> > Remember this is two processes writing the same value
> > (ASCII Zero) to the exact same address, there are
>
> So NextByte is identical?
>
> > no other possibilities. I only want to consider the
> > concurrent use of the above function in complete
> > isolation from any and all other functions.
>
> Oh, to have that luxury!

I can't do any better than this in standard C++...

>
> Peter
>
>
>
>


Peter Olcott

unread,
Jan 9, 2002, 8:50:15 PM1/9/02
to

"Anne & Lynn Wheeler" <ly...@garlic.com> wrote in message news:uwuyr7...@earthlink.net...

> "Peter Olcott" <olc...@worldnet.att.net> writes:
>
> > inline const char *FastString::c_str() const
> > {
> > String[NextByte] = 0;
> > return (String);
> > }
>
> (w/o looking at the rest of the code) depending on the compiler,
> machine, etc .... the compiler could be generating code that loads
> NextByte in a register and keeps it there for extended period of time.
> If NextByte is being changed (increment, decrement, etc), it is
> possible that multiple processors wouldn't necessarily see changes
> that each other are doing and NextByte gets "out-of-sync" ... so the
> different processors are now working with different NextByte values.
>
> Then on lots of machines, one byte storage alterations require
> fetching a word (or more), updating the specific byte and then storing
> back the whole unit of storage.
>
> If processor-1 was working with NextByte=71 and the processor-2 with
> NextByte=70;
>
> * proc1 fetches the word for byte 71,
> * proc2 fetches the same word for byte 70,
> * proc1 updates the byte and stores the word back,
> * proc2 update the byte and stores the word back ...

All of these problems are the user's responsibility.
I am providing this library in standard C++ source.
It's gotta be 100% portable across all platforms.


cjt

unread,
Jan 9, 2002, 9:49:59 PM1/9/02
to
Christian Bau wrote:
>
<snip>

>
> But there is a different level: You don't know what code the C compiler
> will execute. For example, you might have a processor where the smallest
> addressable memory unit is larger than a byte. If you try to store a
> zero byte, then the compiler might produce code that reads for example
> eight bytes into a register, modifies one of the eight bytes, and writes
> all the eight bytes back. If one processor tries to set byte X to zero
> using this method, and another processor tries to set byte X+1 to zero,
> then it is _not_ guaranteed that both bytes will be zero. And you don't
> need multiple processors for this at all, the same thing can happen with
> a single processor and multiple threads running. And you don't need
> fancy hardware for this to happen; if you don't tell the compiler that
> some memory location is volatile then it will assume that no other
> threads refer to that location, and it can do other things than you
> expect.

I tried to make that point early in this discussion, and he dismissed it out
of hand. He seems to have some sort of agenda I don't understand.

CBFalconer

unread,
Jan 9, 2002, 10:39:12 PM1/9/02
to
Peter Olcott wrote:
>
> > > So let me verify the most crucial point that you, and no one
> > > else has made. All memory reads, and all memory writes are
> > > always synchronized on every machine all the time?
> > >
> > > In other words is not physically possible for any
> > > machine to physically write to the same address at
> > > the same time, and any attempt under any conditions
> > > will always, each and every time be synchronized?
> >
> > In any machine I ever heard or conceived of. However, I am *not*
> > omniscient, as has been shown several times before.
> >
> > It would be possible to design a system where this doesn't hold,
> > however it would be useless.
>
> That seems to be a pretty definitive statement. Another
> tangential issue has arisen. Is there any machine architecture
> where this would necessarily be much slower in a multi-processor
> or threaded environment? In other words the hardware would
> invoke some special operation that slows everything down?

Seems quite likely. CPU1 is using the memory, CPU2 asserts a bus
request, uses it for some time, then releases. Now CPU1 can
finish its operation, unless CPU3 has higher priority and wants
the bus. After that CPU2 wants it again, etc. There are *NO*
timing guarantees (didn't I say that before?).

Standard sort of concurrency problem. Look for 'fairness'. Also,
conspiracy can cause infinite lockout. As I said before, Ben-Ari
is readable and very instructive.

Hank Oredson

unread,
Jan 9, 2002, 11:42:11 PM1/9/02
to

"Peter Olcott" <olc...@worldnet.att.net> wrote in message
news:os__7.241620$WW.13...@bgtnsc05-news.ops.worldnet.att.net...
> > I worked on a system about 8 years ago with an IDT (I think) dual-port
> > RAM chip between an 80186 and an 8051. The spec on the RAM said you
> > absitively posolutely should *not* access the same location from both
> > sides simultaneously. Even both sides reading could cause not only a
> > bad read value, but change the value in the location as well.
> >
> > Fortunately, the chip also had what they called "semaphores" that
> > could be used at the software level to provide mutual exclusion. Each
> > side would "request" a semaphore by writing a 0, spin lock until the
> > value of the semaphore reads as zero, then "release" the semaphore by
> > writing a 1. IIRC, the chip had 4K bytes and 8 semaphores.

> >
> > Of course, I don't really think the OP would really want to worry
> > about this particular target, especially with a C++ string library...
> >
> > Regards,
> >
> > -=Dave
> > --
> > Change is inevitable, progress is not.
>
> So it is possible, yet not likely.

And thus you must code your library function differently.

--

... Hank

http://horedson.home.att.net
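
A sketch of the dual-port-RAM "semaphore" protocol as Dave describes it; the
register address, the byte type, and the polarity are guesses taken from his
wording rather than from any datasheet:

    typedef volatile unsigned char dpram_sem;

    // Hypothetical: semaphore register 0 of the dual-port RAM.
    dpram_sem *const SEM0 = reinterpret_cast<dpram_sem *>(0x8000);

    void sem_acquire(dpram_sem *sem)
    {
        *sem = 0;                 // request: write a 0 from this side
        while (*sem != 0)
            ;                     // spin until the chip grants it to us
    }

    void sem_release(dpram_sem *sem)
    {
        *sem = 1;                 // release: write a 1
    }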

Hank Oredson

unread,
Jan 9, 2002, 11:49:25 PM1/9/02
to

"Peter Olcott" <olc...@worldnet.att.net> wrote in message
news:rr6%7.242370$WW.13...@bgtnsc05-news.ops.worldnet.att.net...

If the user (me for example) had to worry about the stuff going
on below the library functions (hardware), I would (and do)
provide my own library functions that work correctly.

C++ string handling is one of those cases ... particularly when
writing things like interrupt service routines. But your code would
never be called from an ISR, would it? You would disallow that?

CBFalconer

unread,
Jan 10, 2002, 3:27:03 AM1/10/02
to
Peter Olcott wrote:
>
> Yet no processor will ever get its reads and writes to the
> same physical address out of order, and still be correct.

Oh yes it can. Imagine a fragment like this:

   while (barf[i] == 0) {
      /* lots of code, never writing to barf[i] for the original i */
      barf[i] = 0;
      /* more code, no restrictions */
   }

a smart compiler/optimizer is entitled to discard the assignment
entirely. If the second mass of code never affects i or barf[i]
then it can also optimize to:

   if (0 == barf[i]) {
   label: /* first code block */
      /* second code block */
      goto label;
   }

saving the assignment code and the repeated execution of the test.

However, we do have things like volatile, mutex, semaphores,
wait/signal etc. to handle this sort of problem in a fairly
portable way.

CBFalconer

unread,
Jan 10, 2002, 3:27:06 AM1/10/02
to

So the user writes:

   i := 0;
   lockarea;
   WHILE a.getchar(i) <> 0 DO BEGIN
      unlockarea;
      i := i + 1;
      lockarea;
   END;
   (* whee, I have the string length in i *)
   (* woops, forgot to unlockarea on exit - HUNG *)

having done 2*i system calls, each of which may well have blocked
his process. I don't think this is going to be popular.
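
A sketch of the cheaper pattern, holding the lock across the whole scan; the
names a and a_lock stand in for whatever the library user actually shares,
and pthreads is assumed only because the thread keeps citing POSIX:

    #include <pthread.h>
    #include <cstddef>

    extern pthread_mutex_t a_lock;     // protects the shared string "a"
    extern const char     *a;          // shared, zero-terminated string

    std::size_t shared_strlen()
    {
        pthread_mutex_lock(&a_lock);   // one blocking call on entry
        std::size_t i = 0;
        while (a[i] != 0)
            ++i;
        pthread_mutex_unlock(&a_lock); // one on exit, and not forgotten
        return i;
    }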

Maxim S. Shatskih

unread,
Jan 10, 2002, 4:09:27 AM1/10/02
to
> Stroustrup page 231.
> Occasionally, a member function is logically const, but it still
> needs to change the value of a member. To a user, the function
> appears not to change the state of its object. However, some
> detail that the user cannot directly observe is updated. This is
> often called logical constness.

Is "mutable" keyword necessary for this?

Max
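
In standard C++, yes: mutable is what lets a const member function update a
member without a cast. A minimal single-threaded sketch of the lazy caching
Stroustrup is describing; the class is illustrative, not Olcott's:

    #include <cstring>

    class CachedLength {
        char         text[256];
        mutable int  length;            // cache: may change in const members
        mutable bool length_valid;
    public:
        CachedLength(const char *s) : length(0), length_valid(false)
        {
            std::strncpy(text, s, sizeof text - 1);
            text[sizeof text - 1] = 0;
        }
        int size() const                // logically const
        {
            if (!length_valid) {
                length = (int)std::strlen(text);  // legal only because mutable
                length_valid = true;
            }
            return length;
        }
    };

As the follow-ups point out, this only covers the single-threaded case.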

Mark Horsburgh

unread,
Jan 10, 2002, 4:08:20 AM1/10/02
to
On Thu, 10 Jan 2002 01:12:40 GMT, Peter Olcott <olc...@worldnet.att.net> wrote:
>> > There is external unwritten synchronization code that will
>> > be the user's responsibility to write. I am shooting for
>> > the same thread-safety guarantee that SGI STL provides,
>> > thus the ONLY remaining issue of the thread safety of
>> > my FastString, is multiple concurrent instances of c_str().
>> > ALL the other issues have been resolved by explicitly
>> > making them the responsibility of the user.
>>
>> Righty Ho.
>>
>> So... In essence what you're doing is producing an analog of
>> an STL class, with the sole difference that it guarantees
>> thread safety for one method in it... Providing that you
>
> Nope, it's not that. I make all the write functions the user's
> responsibility, and tell them that the read functions are
> safe ONLY if you are not writing to the same data. This
> is the same thread safety guarantee that SGI STL provides.
> They can't provide a stronger one than this AND stay portable.
> All I gotta know is if c_str() can interfere with itself,
> because I'm telling my users that it's a ReadOnly function.

And therein lies your real problem. You're telling users that a
function is ReadOnly when it's not. Don't do that.

Mark

Alexander Terekhov

unread,
Jan 10, 2002, 4:58:10 AM1/10/02
to

Peter Olcott wrote:
>
> > Now, with his broken "const" c_str method which writes
> > to some potentially shared object(s) (that might be
> > protected by some read-write lock, for example), he
> > DOES "screw the threads up":
>
> According to Stroustrup last paragraph on page 231,
> my function is not a broken const function, it is merely
> logically const, instead of physically const.

[[ flames on ]]

You know, you are being ridiculous, really.

- stop polluting the usenet groups which have nothing
to do with your problems/issues; go back to c.p.t;

[[ flames off ]]

- Stroustrup does NOT discuss threaded C++;
he discusses SINGLE-THREADED C++ programming language.

- without declaring your member as mutable ("mutable union
{char *String;...}") your code is broken/undefined anyway,
threads aside;

- the lazy stuff he is talking about:

   if ( this->cache_valid == false ) {

      // ...
      this->cache_valid = true;

   }

   return this->cache;

does NOT work in threaded (MP) environment without:

a) protecting "everything" with mutex (or some rwlock which would
support promoting access to WRITE operation inside "if" block,
but since it could always end up being not atomic, it might
break your object caller's assumptions if he or she owns the
lock and uses it to protect some other objects)

-or-

b) using an extra *thread-specific* this->per_thread_cache_valid
flag which would only work "once" per *this object lifetime:

http://groups.google.com/groups?as_umsgid=3B53E398.8A8C5979%40web.de

-or-

c) using some special (that is, NOT ANSI C/C++) "volatile"
semantics for this->cache_valid flag (again, would only
work "once" per *this object lifetime):

http://jcp.org/jsr/detail/133.jsp
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

regards,
alexander.
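
A sketch of option (a) above, with every reader of the lazy cache taking the
same mutex; the class, buffer sizes, and names are illustrative rather than
Olcott's FastString, and the returned pointer is still only as safe as the
caller's own locking of later writers:

    #include <pthread.h>
    #include <cstddef>
    #include <cstring>

    class LazyCStr {
        char                    data[256];   // unterminated payload
        std::size_t             len;
        mutable pthread_mutex_t lock;
        mutable char            cache[257];  // zero-terminated copy
        mutable bool            cache_valid;
    public:
        LazyCStr(const char *s) : cache_valid(false)
        {
            len = std::strlen(s);
            if (len > sizeof data)
                len = sizeof data;
            std::memcpy(data, s, len);
            pthread_mutex_init(&lock, 0);
        }
        const char *c_str() const
        {
            pthread_mutex_lock(&lock);
            if (!cache_valid) {              // the lazy write, now serialized
                std::memcpy(cache, data, len);
                cache[len] = 0;
                cache_valid = true;
            }
            pthread_mutex_unlock(&lock);
            return cache;
        }
    };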

Alexander Terekhov

unread,
Jan 10, 2002, 5:23:38 AM1/10/02
to

David Bradley

unread,
Jan 10, 2002, 7:27:45 AM1/10/02
to
Peter Olcott wrote:

> My little function taken in isolation can be considered
> a read only function as far as concurrency is concerned,
> even though in actual fact it does write an ASCII zero.

Do a search on "word tearing" for examples of problems with this assumption.

lokie.spods

unread,
Jan 10, 2002, 7:47:59 AM1/10/02
to
"Peter Olcott" <olc...@worldnet.att.net> wrote in message
news:38N_7.240393$WW.12...@bgtnsc05-news.ops.worldnet.att.net...
> > I find it unlikely that this code will cause problems on any current
> > hardware architecture that I'm aware of. However, since you haven't
> > mentioned any hardware architectures, and I'm not familiar with all
> > existing (and definitely not all future) hardware architectures, I
> > wouldn't put any money on this being portably safe.
> >
> > The *only* reasonable way that you're going to be able to provide the
> > guarantee that you want is to list supported platforms for your code, and
> > make and verify it individually for each supported platform. POSIX
> > allows you to treat all POSIX platforms as one by only relying on
> > guarantees made by the POSIX standard (in which case you'd need to revise
> > your code). If you need to support non-POSIX environments like Win32
> > threads, you'll need to separately verify those platforms.
>
> I can't do this. It must run on machines not even invented yet,
> and it must run in every environment, including those not
> yet written. I am not getting any replies from the guys that
> know this bare metal stuff, and think that's what I need.
>
> Apart from any operating system, is there every a case
> where writing a ASCII zero byte from two different processes
> to the exact same address would result in anything other than
> ASCII zero, being written? This is a computer circuit
> question.
>
Sorry for taking so long to reply, but you've no idea how difficult it is to
acquire sufficient entrails to divine the future up to the point the sun goes
nova, when no one will care much for a string class.

Your problem can be broken down thus:
You wish to make guarantees to your users, past, present and future, regarding
the safety, portability and efficiency of your code. You also want to do this
by disregarding the guarantees provided to you by the main implementation
standards in favour of assumptions about how hardware will continue to be
designed (to say nothing of OSes).

A true story I often ram down the throats of my trainees regarding
assumptions involves a little company called Canon. Many moons ago, so the
story is told, a young technician accidentally touched the nib of a fountain
pen with a soldering iron, a prank that's been repeated many times since.
Instead of assuming that everybody knew ink would squirt out, or that it
was irrelevant, the technician wrote his findings up and passed them on.
The result a while later became the BubbleJet printer, and the rest, as they
say, is history.

There are other "classic mistakes" made in design and R&D which have become
viable mainstream products or must-have features. You may for example want
to ask IBM about its parallel port cockup; thank god no one spotted that one,
otherwise the bi-directional printer port would still be a pipe dream.

Personally I believe it'd be a very brave (wo)man who'd dare make a
prediction about what will change in the IT industry over the next 10 years,
let alone make guarantees for a longer period than that. Take the advice of
your peers within this group and comply with today's standards; hell, what's it
going to cost to update your code to tomorrow's standards when they appear?

Anthony McDonald
--
Spammer's please note, all spam sent to this address immediately raises a
complaint with your network administrator.

Joe Seigh

unread,
Jan 10, 2002, 10:44:48 AM1/10/02
to

Peter Olcott wrote:
>
> inline const char *FastString::c_str() const
> {
> String[NextByte] = 0;
> return (String);
> }
>
> The String is an array of char. This is written in C++.
> http://home.att.net/~olcott/FastString.cpp
> This is where the rest of the code can be found.
>
> I want to know if there can be any possible problem
> with the above function in either a multi-threaded
> or multi-processor environment, such that multiple
> concurrent access to the above function, pertaining
> to the same data could cause errors in the absence
> of any locking mechanism.
>
> I am guessing that this is a computer hardware problem,
> and the problem is whether or not two completely
> simultaneous attempts to update this exact same address
> with zero could collide with each other such that a
> value other than zero could be written.

To rephrase this, you are basically asking if there are
any assumptions you can make about storage models in
a multi-threading environment. The answer is no.
That's because hardware engineers don't inhabit the
same universe that software engineers do. Basically,
they analyze a lot of code (mostly badly written code)
to figure out how to make hardware that runs it faster.
As a consequence, the clever code that you wrote is
going to break or run slower because the assumptions
you make will no longer hold at some point.

The only safe thing to do is pick some standard, like
say POSIX threads, which will guarantee some sort of storage
model (if you follow their rules). Because if a
hardware engineer creates an architecture that breaks
POSIX threads he will have to contend with an irate
POSIX committee and even more irate POSIX thread
implementers, which is more of an incentive than just
a single irate user could be. Mobs can be a good thing.

As an example of some of the strange ideas that hardware
engineers can get into their heads - coherent cache (not
cache but coherent cache). Coherent cache addresses several
issues. One of which is lost stores. Another of which is
staleness of data. IMO, the latter is unnecessary. The only
effect of coherent cache is that it makes multi-threaded programs with
race condition bugs less likely to break and as a consequence
harder to debug. I maintain that you will not find any correctly
written POSIX thread program that suffered any consequences
of a caching scheme that allowed stale data. But I've argued
with enough HW types who swear the universe will end if you
allow stale data in cache, though they could not explicitly state
exactly why.

Joe Seigh
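
A sketch of what "pick POSIX threads and follow its rules" might look like
around the store that c_str() performs: because it writes, it has to take
the write side of the lock, not the read side. The buffer and lock names are
illustrative:

    #include <pthread.h>

    extern char             shared_buf[];
    extern pthread_rwlock_t shared_lock;

    const char *terminate_and_read(int next_byte)
    {
        pthread_rwlock_wrlock(&shared_lock); // write lock: we store a byte
        shared_buf[next_byte] = 0;           // the same store c_str() does
        const char *p = shared_buf;
        pthread_rwlock_unlock(&shared_lock);
        return p;
    }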

hack

unread,
Jan 10, 2002, 11:18:31 AM1/10/02
to
In article <3C3D0113...@prodigy.net>, cjt <chel...@prodigy.net> wrote:
>Christian Bau wrote:
>...

>I tried to make that point early in this discussion, and he dismissed it out
>of hand. He seems to have some sort of agenda I don't understand.

The only way I can make sense of this is that the OP has a system in which he has
taken care of all multiple-thread issues except one: a particular function
may have two instances in "simultaneous" execution, but *only* in the case
where they deal with the exact same string, so that the relevant byte address
is identical. Since nothing else can logically execute at that time, the
size of the memory access granule is indeed irrelevant.

Whether you believe that he manages to enforce the constraint described above
is a different story.

Michel.

Kaz Kylheku

unread,
Jan 10, 2002, 1:53:22 PM1/10/02
to
In article <3C3CD412...@webmaster.com>, David Schwartz wrote:
>Peter Olcott wrote:
>
>> I have already hashed this into non-existence in other groups.
>> Ultimately C++ can't do any more than the hardware can
>> do, thus the hardware forms the bottom line.
>
> You seem determined to be deliberately wrong. C++ has nothing

No, he is determined to convince everyone, no matter who, that he
is right. See some recent threads in comp.lang.c++ regarding a string
class.

Olcott is a troll. Which is why I have largely stayed away from this
thread, and the one in comp.lang.c++.

Stephen Fuld

unread,
Jan 10, 2002, 3:16:33 PM1/10/02
to

"Peter Olcott" <olc...@worldnet.att.net> wrote in message
news:206%7.242293$WW.13...@bgtnsc05-news.ops.worldnet.att.net...

> > You might have missed the impact of Terje's comments earlier. While this
> > is true if you can guarantee that what you are writing is truly hardware
> > memory, in an arbitrary hardware design you cannot guarantee that. It would
>
> I think that I can guarantee this. With each and every hardware
> platform, it must always make writes come before reads to
> the exact same location in memory, iff this is the way that the
> programmer specified them.

I think you are still missing the point. I could have a hardware design
where a particular "memory" location isn't really (in the hardware) a memory
at all, but a register on some peripheral device. Reading or writing that
location can trigger the external device to do whatever the hardware
designer wants it to do. This has absolutely nothing to do with real
physical memory, but with a commonly used mechanism, called memory-mapped
I/O, for accessing external devices. If the address to which you are
writing is not physical memory, but some memory mapped device, you have
absolutely no guarantees whatever from the hardware of what is going to
happen. What will happen is whatever the hardware designer makes happen,
and this could be almost anything, most of which are not what you would
want.
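
A sketch of the memory-mapped register point; the address and the UART are
invented, the only claim being that a store to such a location is a command
to a device rather than a plain memory write:

    typedef volatile unsigned char io_reg;

    // Hypothetical UART transmit register mapped at this address.
    io_reg *const UART_TX = reinterpret_cast<io_reg *>(0xFFFF0000);

    void send_byte(unsigned char b)
    {
        *UART_TX = b;   // not "memory": the hardware transmits b, and two
                        // CPUs storing here concurrently send two bytes in
                        // an order the program did not choose
    }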

Alex Colvin

unread,
Jan 10, 2002, 4:27:34 PM1/10/02
to
> > inline const char *FastString::c_str() const
> > {
> > String[NextByte] = 0;
> > return (String);
> > }

> >The String is an array of char. This is written in C++.
> >http://home.att.net/~olcott/FastString.cpp
> >This is where the rest of the code can be found.

> >I want to know if there can be any possible problem

> >with the above function taken in isolation from

Sorry, I assumed you had found a problem and were attempting to
debug it, rather than to design something bug-free.

There is a theoretical model of parallel shared memory that requires
exclusive Read/Write (EREW). As far as I know,
you'll never run into such a critter.
Even if memory is replicated, consistent interleaved updates should
be fine.

However, the "const" issue is one of those dark corners of the
standard that might allow the compiler to screw things up for you.
Though I doubt any real compiler could get away with that.

--
mac the naïf

CBFalconer

unread,
Jan 10, 2002, 7:06:38 PM1/10/02
to

Scenario - a multiprocessor environment.

Time slice 1 - CPU1 runs P1, loads expression into temp.
Time slice 2 - CPU2 runs P1, alters expression in temp.
Time slice 3 - CPU1 runs P1 again, where does it get temp?

if the cache belongs to the memory system, no problem. If the
cache belongs to the CPU, all hell breaks loose unless it is
marked invalid and dirty values written out on every dispatcher
call. That tends to increase the average access time.

Bill Todd

unread,
Jan 11, 2002, 2:35:21 AM1/11/02
to

"Emil Naepflein" <Emil.Na...@philosys.de> wrote in message
news:oo9q3uol3vlihuhkf...@4ax.com...
> On Thu, 10 Jan 2002 01:26:34 GMT, "Peter Olcott"
> <olc...@worldnet.att.net> wrote:
>
> > Any writing beside this single c_str() instance of writing
> > a zero to a fixed location in memory is explicitly the library
> > user's responsibility. I can't help them in standard C++.
>
> So the library users have to use concurrency control anyway. The only
> thing you have to guarantee is the thread-safety of the library. So
> effectively the discussion about concurrent use of this function is
> moot: because of the concurrency control the user has to implement, it
> will never happen that two threads/processors write this memory location
> concurrently. ;-)

Well, since he's told the user the function is const, the user might be
forgiven for simply ensuring that it did not execute concurrently with any
non-const functions (but could execute concurrently with itself)...

- bill


Joe Seigh

unread,
Jan 11, 2002, 6:16:26 AM1/11/02
to

CBFalconer wrote:
>
> Joe Seigh wrote:
...


> > As an example of some of the strange ideas that hardware
> > engineers can get into their heads - coherent cache (not
> > cache but coherent cache). Coherent cache addresses several
> > issues. One of which is lost stores. Another of which is
> > staleness of data. IMO, the latter is unnecessary. The only
> > effect of coherent cache is that it makes muti-threaded programs with
> > race condition bugs less likely to break and as a consequence
> > harder to debug. I maintain that you will not find any correctly
> > written POSIX thread program that suffered any consequences
> > of a caching scheme that allowed stale data. But I've argued
> > with enough HW types who swear the universe will end if you
> > allowed stale data in cache though they could not explcitly state
> > why exactly.
>
> Scenario - a multiprocessor environment.
>
> Time slice 1 - CPU1 runs P1, loads expression into temp.
> Time slice 2 - CPU2 runs P1, alters expression in temp.
> Time slice 3 - CPU1 runs P1 again, where does it get temp?
>
> if the cache belongs to the memory system, no problem. If the
> cache belongs to the CPU, all hell breaks loose unless it is
> marked invalid and dirty values written out on every dispatcher
> call. That tends to increase the average access time.
>

I'm not clear on your scenario. I take it to mean

1) cpu 1 fetches value from temp (from cache)
2) cpu 2 stores new value into temp (in cache)
3) cpu 1 fetches value from temp (from cache)

and the question is during (3) is the value fetched from
temp allowed to be stale, i.e. cpu 1's cache entry was not refreshed
between (2) and (3)?

I would say it all depends. It would depend on whether cpu 2 and cpu 1 performed
lock actions during that interval or not. If not, then I'll argue that
a stale value is ok.

Now obviously I've left a lot out. Cache is now no longer transparent, and I
have (deliberately) not specified any mechanisms to deal with that; suffice it
to say that the issues would not have to be dealt with in general programming
but specifically by the routines that implement locking and such. So in that
sense it would still be transparent at the general programming level.


Joe Seigh
