What are these kinds of objects good for?
Why this (as far as I can see) "forced" parallelism between "volatile" and
"const"?
Tito
volatile means that the object's value can be changed by means outside
of the program (e.g. a variable representing the current environment
temperature). volatile tells the compiler to avoid certain optimisations
on that variable so as to ensure valid behaviour.
--
Ioannis
* Ioannis Vranos
* Programming pages: http://www.noicys.freeurl.com
* Alternative URL: http://run.to/noicys
Yes, what you say is true when talking about objects of primitive types
("ordinary" variables).
When I talk about "objects" I'm referring to instances of user defined types
(structs or classes).
Tito
It's the same, I guess. The compiler avoids optimisations regarding
their state.
Then making an object volatile is the same as if all the attributes of its
class were volatile?
But one question remains: why is it only permitted to call volatile member
functions on volatile objects?
Tito
The aim of C/C++ volatile keyword is simply to confuse people...
resulting in a worldwide annual loss of productivity measured in
quite a few $10^6 ["probably"]. And it [i.e. loss] is growing
constantly.
regards,
alexander.
--
http://groups.google.com/groups?selm=3DBFF494.5FB11101%40web.de
Well, as was said, volatile means that an object can be changed by
outside means. So if you have
volatile int x=0;
cout<<x<<endl;
x can be something other than 0 (like 3).
A volatile object means that its state can be changed by outside
means. I guess a volatile member function prompts the compiler to not
apply specific optimisations to the code of that function. So we
declare the member function volatile to help the compiler; probably it
was considered difficult for the average compiler to protect volatile
class members across all member functions, so a little help was
required.
Because the code you generate to deal with volatile data often has to
be different from the code you generate to deal with non-volatile
data. Consider something like this:
class whatever {
    int a;
    int b;
    int c;
public:
    void add()          { a += c; b += c; a += c; }
    void add() volatile { a += c; b += c; a += c; }
};
Now, even though the code _I've_ written in the two member functions
looks identical, what the compiler generates for them may be entirely
different. In particular, in the non-volatile member function, the
compiler will normally generate code that only loads c into a
register once, and then adds the contents of that register to both a
and b. In the volatile version, it can't do that: it HAS to re-load
c from memory each time it's used. Likewise, in the non-volatile
version, it can see that b+=c has no effect on the two a+=c
statements. As such, in that version it will typically produce code
that's essentially equivalent to a+=c<<1; Again, in the volatile
version, it can't do that: the order and number of writes to volatile
objects may be meaningful in and of itself, so it can't combine the
code for different statements like this.
--
Later,
Jerry.
The universe is a figment of its own imagination.
The purpose of volatile qualification is to provide a method
for the programmer to instruct the compiler that the access
pattern for a specific object needs to be exactly as the
programmer has written it (i.e. according to the C abstract
machine), thus the compiler must not change that pattern in
the process of optimizing the generated machine code.
> The aim of C/C++ volatile keyword is simply to confuse people...
Maybe that's *your* aim.
I have to agree that the effect has been something along those lines,
but the aim has been rather clear: to provide a mechanism by which the
programmer can relieve certain burdens on the implementation and
replace them with other burdens:
relieved:
* the burden that the object needs to remember the value
last stored to it by the program
added:
* the burden that all computations involving the value of a
volatile object must use a very fresh copy.
* the burden that all modification of a volatile object must
be committed by the next sequence point.
In other words, objects of volatile-qualified types are viewed as
I/O registers.
Okay, so that's the aim. The problem with C's notion of volatility is
that its granularity is not sufficiently fine, i.e., the notion that
one size fits all:
1. Input registers are not output registers and vice versa.
2. Neither are objects whose values are written by signal handlers.
3. Neither are automatic objects, local to functions that contain a
call to setjmp, whose updates should be remembered.
Why do any of these three kinds of objects need to have their values
stored to memory by the first sequence point following a modification
of that object?
Tom Payne
No. I've explained this before, but you persist in giving
misinformation and drawing incorrect inferences from it.
When I use the term "viewed as", I'm making an analogy. This
particular analogy is supported by C89, 6.5.3, footnote 67:
A volatile declaration may be used to describe an object
corresponding to a memory-mapped input/output port ...
The point of the analogy is that a conforming implementation must
treat objects of volatile-qualified types as though they are:
1) subject to spontaneous changes in value, just like input
registers
2) subject to observation, just like output registers, i.e.
their values at sequence points are part of the program's
behavior.
Point #1 above implies that, for an object of volatile-qualified type,
an implementation should not use possibly stale values that have been
cached in registers. Point #2 implies that, for an object of
volatile-qualified type, an implementation should promptly commit any
newly assigned value to the object's location (rather than simply
caching it in a register). I don't see either of those points as being
incompatible with your comment that:
The purpose of volatile qualification is to provide a method
for the programmer to instruct the compiler that the access
pattern for a specific object needs to be exactly as the
programmer has written it (i.e. according to the C abstract
machine), thus the compiler must not change that pattern in
the process of optimizing the generated machine code.
Have I missed something here?
Tom Payne
Then probably you should have used "like". Also, drawing an analogy
between a specific technical aspect and other, irrelevant technical
aspects can only cause confusion.
If your point is that I/O ports are irrelevant to volatility, I
couldn't agree less. IMHO, memory-mapped I/O ports are the
fundamental paradigm for volatile objects. In fact, if a program
attempts to access a memory-mapped I/O port via an lvalue whose type
is not volatile-qualified, the resulting behavior is undefined.
Tom Payne
Yeah, ``how nice.'' <http://tinyurl.com/2r53> ("Forget C/C++
volatiles, Momchil. ...")
> > The aim of C/C++ volatile keyword is simply to confuse people...
>
> Maybe that's *your* aim.
Nope. My aim is simply to get C/C++ volatile and *jmp deprecated,
replace async. signals with threads... uhhm, and merge C and C++
and POSIX.1... making exceptions work in "C" language core and
with "C" bindings of POSIX.1++ (in addition to "C++" bindings).
That's it. ;-)
regards,
alexander.
No; they're the most obvious *application* for this
facility. There are other, quite different, reasons
for using volatile qualification as have been
discussed here and in other newsgroups (such as
comp.os.plan9). If you want a *paradigm*, try the
original Ritchie PDP-11 C compiler which performed
minimal optimizations and thus could be treated
almost as a high-level assembler. (However, there
were still other access issues in certain
circumstances, but "volatile" wouldn't address
those other than through the requirement that the
implementation document access semantics.)
which contains several erroneous claims.
Ah. Okay.
http://groups.google.com/groups?selm=3DD47620.6040909%40null.net
(comp.os.plan9; Re: [9fans] how to avoid a memset() optimization)
<quote>
> The intent of volatile was to capture appropriate
> behavior of memory-mapped registers and similar
> things (like a clock in user space updated by the
> OS.) So, things like
> *p = 0;
> *p = 0;
> should generate two stores if p is volatile *int.
Yes, the C standard requires that, with the correction
that it is the int that must be volatile-qualified,
not the pointer. I.e., volatile int* if we're using C
abstract types. It is still up to the implementation
to determine whether the store involves a read also
and how wide the access is (e.g., if int is 32 bits on
a 64-bit word bus, the store would necessitate fetch
of 64 bits, modification of 32 of them, and write-back
of 64 bits). There doesn't seem to be any point in
trying to let the programmer specify such details,
since they're normally built into the hardware. But
volatile as it is specified at least lets the programmer
control the *compiler* (code generator), which is
partial control and quite often good enough.
</quote>
regards,
alexander.
``such as...'' ? ;-)
regards,
alexander.
I'm meaning paradigm in the sense of an example that serves as a
pattern or model, i.e., one that captures all of the essential aspects
of the concept.
+ There are other, quite different, reasons
+ for using volatile qualification as have been
+ discussed here and in other newsgroups (such as
+ comp.os.plan9).
Fine. Then please furnish just one specific example of an object that
needs to have volatile-qualified type for a reason that is
fundamentally different from the reasons that memory-mapped I/O ports
must have volatile-qualified types.
Tom Payne
Like what? Please be specific!
Tom Payne
I believe that the C standard tried to make that point in general via
the sentence:
What constitutes an access to an object that has volatile-qualified
type is implementation defined.
[C89, 6.5.3]
The term "constitutes" is unfortunately ambiguous here, e.g.,
This letter does not constitute an offer of employment.
versus
A father, a mother and their children constitute a family.
I have to go along with Doug's claim that the standard doesn't make
sense if we take the first interpretation of constitute. Rather, the
only way for the standard to make sense is to interpret it as saying
that the implementation is free to add constituents of the sort you
mentioned above to reads and writes of volatile objects.
Tom Payne
I just did.
Two examples were auto variables whose values you
want to rely on after longjmp()ing back, and a buffer
that you really want cleared with no other use made
of the zero fill. The former is necessary for
reasons having to do with allowable code optimization
while the latter is necessary because some external
agent beyond the scope of the C standard will be
examining the contents (sometimes volatile
qualification can help with run-time debugging for a
similar reason). Memory-mapped device registers used
for input have values that are potentially changed
from their last-stored (by the program) value, due to
action of external agents. These are three different
situations, although of course they are connected due
to being cases where volatile qualification is useful.
No one of them captures all the relevant aspects.
In the case of the zero-filled object, the zero-filling must not be
optimized away because the object might be externally observable,
i.e., it might behave like an output register.
In the case of the automatics that are local to a function that
invokes setjmp, there are potentially two continuations from the call
to setjmp:
- the first one begins with the normal return from setjmp()
- the second begins when setjmp returns as a result of a longjmp()
That second continuation must read values written to local automatics
by the first continuation. In demanding that such local automatics
have volatile-qualified type, the standard is requiring the first
continuation to treat such objects as output ports and the second
continuation to treat them as input ports, which suffices to
achieve reliable post-longjmp values.
However, it's overkill to require that such variables be written out
at each sequence point -- for purposes under discussion, they need
only be saved at function calls. What the Standard requires of the
handling of objects of volatile-qualified type is determined by what's
needed in handling I/O registers. These automatics don't exactly fit
that paradigm. And, in requiring that they have volatile-qualified
types, the standard has imposed an unfortunate burden of inefficiency
on them.
Tom Payne
That's rather far-fetched.
> However, it's overkill to require that such variables be written out
> at each sequence point --
Volatile qualification is a simple mechanism with multiple
uses. It isn't specifically tailored for just one of them.
In news://twIA9.23319$nB.2761@sccrnsc03 an OP questioned why the
following snippet resulted in 100098 rather than 100099.
float fTemp = 1000.99f;
fTemp *= 100;
unsigned un = fTemp;
The problem was due to different rounding behaviour of an 80 bit
floating point register and a 32 bit float variable. Making fTemp
volatile forces the write to un to be from the variable rather than the
register. Unfortunately, the implementation does not obey the
instruction. (There are other implementation-dependent ways to force the
desired behaviour.)
I care less for efficiency than that an implementation does as told.
--
Walter Briscoe
I count three that are mentioned in the standard:
1) I/O registers and things that behave like them.
2) Local automatics of functions that invoke setjmp.
3) Static objects that are written by signal handlers.
+ It isn't specifically tailored for just one of them.
The requirements that the standard places on volatiles are (almost)
correct for case #1. As I just pointed out they are overkill for case
#2. They are insufficient for case #3 and have to be supplemented
with requirements of atomicity.
Tom Payne
What about objects shared with other processes?
-Mike
It is ``brain-dead!'', so to speak. <http://tinyurl.com/2s3e>,
<http://tinyurl.com/2s38>,
<http://tinyurl.com/2s2q>.
> and a buffer that you really want cleared with no other
> use made of the zero fill.
``Undefined'' AND ``brain-dead!''. <http://tinyurl.com/2s34>,
<http://tinyurl.com/2s2s>,
<http://tinyurl.com/2s2u>
regards,
alexander.
Hi Mike, contributing to the "quite a few $10^6" {growing} amount or what?
regards,
alexander.
I have no idea what you're asking me. English please.
-Mike
A) http://groups.google.com/groups?selm=3DD4E119.CB51D2D6%40web.de
(Subject: Re: Volatile declared objects)
B) http://groups.google.com/groups?selm=3D47F3CA.E5B4239%40web.de
(Subject: Re: "memory location")
"....
And, BTW, I'd like to urge folks at comp.std.c to consider *FIXING*
the C99 rationale as well..."
C) http://groups.google.com/groups?selm=3DC9930A.10CB5BE%40web.de
(Subject: Re: When to use the 'volatile' keyword ?)
regards,
alexander.
The requirements that the Standard places on the handling of objects
of volatile qualified types were designed for things that act like I/O
registers. The authors of the Standard, however, found that by
imposing those same requirements they could solve some shared-access
problems involving setjmp and longjmp and other shared-access problems
involving signal handlers. The fit wasn't all that good, but it
worked, at the expense of significant overhead. That success led a
lot of people to believe that shared-access issues in multithreading
can be solved by the silver bullet of requiring volatile-qualified
types. Dave Butenhof has worked very hard to point out that
volatile-qualified types are neither necessary nor sufficient to
guarantee coherence among thread-shared objects. Most of Dave's
postings on the matter have appeared in comp.programming.threads, but
some of them have also appeared in this group.
Tom Payne
Just because somebody makes a bogus argument doesn't
mean there is actually a problem with the specification.
So you're quoting yourself. I've already read those threads.
You've just effectively told me "Alexander is correct because
Alexander says so." Huh?
-Mike
Read again those threads.
regards,
alexander.
I don't see ANY arguments in the text you've quoted here.
> doesn't mean there is actually a problem with the specification.
Open your eyes [phrases in "emotionally loaded language" aside].
Really.
regards,
alexander.
[ ... ]
> So you're quoting yourself. I've already read those threads.
> You've just effectively told me "Alexander is correct because
> Alexander says so." Huh?
Eventually you'll learn that even though Alexander is a fairly
intelligent person, it's best to just plonk him and be done with it.
--
Later,
Jerry.
The universe is a figment of its own imagination.
http://groups.google.com/groups?selm=3CA25853.2B30E9BC%40web.de
(Subject: Re: Q: use CreateThread() and TerminateThread() and message procedure)
regards,
alexander.
Done. (Result of another thread)
-Mike
Since neither C nor C++ supports other processes or shared objects,
what about them?
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
Yeah. What a terrible loss. I'll never be bothered by a reply of yours
again. Excellent.
> (Result of another thread)
It's now here as well.
< Forward Inline >
-------- Original Message --------
Message-ID: <3DD88CD7...@web.de>
Newsgroups: comp.lang.c++
Subject: Re: Temporaries and const-ness
Mike Wahler wrote:
[...]
> <shrug>
> *PLONK*
Ouch.
regards,
alexander. < almost dead but smiling nevertheless >
Sure they do. Just not portably, and not always
in a way that one would desire.
> > Since neither C nor C++ supports other processes
> > or shared objects, what about them?
>
> Sure they do. Just not portably,
Yep, PORTABLE [POSIX(R) "THR" option <http://tinyurl.com/2sbj>]
threads aside, of course.
> and not always in a way that one would desire.
Yup, < Forward Inline >
-------- Original Message --------
Message-ID: <3D0DCB20...@web.de>
Date: Mon, 17 Jun 2002 13:42:24 +0200
Newsgroups: comp.lang.c++
Subject: Re: Multithreading, synchronization and variables
josh wrote:
>
> On Sat, 15 Jun 2002 23:05:50 +0200, Alexander Terekhov
> <tere...@web.de> wrote:
> > josh wrote:
> > > On Sat, 15 Jun 2002 19:20:28 +0200, Bernd Fuhrmann
> > > <Silver...@gmx.de> wrote:
> > > > Suppose I've got a variable and two concurrent threads
> > > > that use (i.e. change) it. Would it be neccessary to make
> > > > that variable volatile
> >
> > > No. Volatile's got nothing to do with threads.
> >
> > Standard C/C++ POSIX Threads. But there are C/C+
> > implementations which provide non standard volatile semantics
> > that ARE relevant w.r.t. threading -- e.g. to ensure the BYTE
> > memory granularity to fight 'word-tearing' race condition,
> > etc. Also, in Java, REVISED volatiles are quite relevant
> > w.r.t. threading.
> What's "w.r.t"?
'with respect to'
> On topic: this is interesting, any more detail
> on that? Hopefully in just a few plain words <g>.
Nah, in just one single web link:
http://groups.google.com/groups?selm=c29b5e33.0205241457.24f12178%40posting.google.com
(Subject: Re: Parallel Programming in C++; see "B) Your 'volatile'-bullshit")
As for granularity, try this:
http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0007.HTM#gran_sec
("3.7 Granularity Considerations....")
regards,
alexander.
-------- Original Message --------
Message-ID: <3D0DCB4E...@web.de>
Date: Mon, 17 Jun 2002 13:43:10 +0200
Newsgroups: comp.lang.c++
Subject: Re: Multithreading, synchronization and variables
Jack Klein wrote:
[...]
> But POSIX threads, and anything at all concerning threading, is
> off-topic here.
Nothing is off-topic here (this group has no CHARTER to begin with).
> The meaning of the volatile keyword, as defined by
> both ISO C and ISO C++, has nothing at all to do with threads, since
> neither ISO C nor ISO C++ define or support any sort of threading.
I personally don't care what is supported or not supported. Also, you
may want to fire a search on 'thread' here:
http://std.dkuug.dk/JTC1/SC22/WG21/docs/papers/2002/n1361.html
> josh was 100% exactly right in the context of this group,
Show me please your c.l.c++-Judge certificate or something
like that. Well, Y'know, I actually DO accept complaints on
topicality submitted via:
http://www.slack.net/~shiva/complain.txt
So please take a few spare minutes and submit the form...
regards,
alexander.
Well, I must clarify that, while I was totally convinced that
"volatile" is not required in a POSIX environment, existing
implementations have quite compatible notions of what constitutes a
"volatile object access" and notably similar behaviour (not putting
volatiles in registers, not reordering volatile accesses across
sequence points (as mandated by the standard)), etc., so volatile is
quite usable, especially together with non-standard primitives (a
POSIX implementation is rarely implemented on top of a POSIX
implementation, right?)
> Nope. My aim is simply to get C/C++ volatile and *jmp deprecated,
> replace async. signals with threads...
Oh, no ... we already have Visual Basic, don't we ?
> uhhm, and merge C and C++ and POSIX.1
Uh-oh, I imagine monstrosity of such a beast ...
>... making exceptions work in "C" language core and
Amen. But first namespaces, eh ?
~velco
There are extensions to C/C++ that support threads, and the question
of whether thread-shared objects must (or should) have
volatile-qualified types occurs frequently. The discussion quickly gets to
issues of what the standards require of volatile objects and why, at
which point (IMHO) the discussion in this group becomes an appropriate
forum.
Also, there have been a number of suggestions that some form of
threading be added to the standard and discussions of what such a
specification might look like, which again is appropriate for this
group.
Tom Payne
Objects that represent the states of coordination mechanisms, e.g.,
mutexes, must be treated as volatiles --- it does no good to lock a
mutex and keep it cached in a register.
+ so volatile is
+ quite usable, especially together with non-standard primitives (a
+ POSIX implementation is rarely implemented on top of a POSIX
+ implementation right ?)
It is often claimed that one of C's early goals was to replace
assembly language. I'd like to be able to implement concurrency
packages like Pthreads portably in C, without the need to resort to
assembly code.
Tom Payne
Nah, please solve {accept one of "proposed"/find another solution,
reach consensus/legislate/blah-blah} THIS
http://groups.google.com/groups?selm=3DC944D5.EC2EF029%40web.de
(Subject: Re: Memory isolation)
*first*.
regards,
alexander.
I'd recommend that you try being a little less terse. Your point isn't
clear; and quoting yourself as you did further down doesn't make your
point any clearer.
The point was that the Std. C Rationale with respect to volatile and
threading is wrong and, IMO, simply confuses people. The quote was used
as a pointer to the relevant context within the referenced message:
"....
And, BTW, I'd like to urge folks at comp.std.c to consider *FIXING*
the C99 rationale as well..." ---> 2 x ">>!!!!ATTN WRONG ATTN WRONG
ATTN WRONG ATTN WRONG ATTN WRONG!!!!<<"
regards,
alexander.
Thanks. For some reason my browser got hung up at the second level of
indirection. But I think that your point is that the necessary
extensions/adjustments to the C Standard are not likely to be
forthcoming. I realize that. But, so far, they've not been
proposed. Nor am I sure of exactly what they should be.
Tom Payne
Keep in mind that the "folks at comp.std.c" have no authority to fix it.
That's an issue you need to bring up with the Committee, not with this
newsgroup. There's only a small overlap between this newsgroup and the
Committee, and you shouldn't rely on that overlap for actually getting
anything done.
> the C99 rationale as well..." ---> 2 x ">>!!!!ATTN WRONG ATTN WRONG
> ATTN WRONG ATTN WRONG ATTN WRONG!!!!<<"
Yes, that's your general point. I was more concerned about your specific
point - the one you were making in your response to Mike Wahler's
question:
> What about objects shared with other processes?
Your response to that particular question was extremely cryptic. I have
no idea what it was intended to mean. And apparently, neither did Mike.
And the follow up messages only served to raise the temperature of the
discussion, without clarifying your response to that particular
question.
<copy&paste>
G'Day,
FYI...
A) http://www.opengroup.org/austin/docs/austin_107.txt
(Defect in XBD 4.10 Memory Synchronization (rdvk# 26), Rationale
for rejected or partial changes)
"Our advice is as follows.
Hardware that does not allow atomic accesses cannot have
a POSIX implementation on it.
We propose no changes to the standard.
Please note that the committee is not required to give advice,
this sort of topic may be better to be discussed initially on the group
reflector prior to any aardvark submission."
B)
http://groups.google.com/groups?threadm=h0Ss9.855%24HI1.63365%40newsfep1-win.server.ntli.net
-------- Original Message --------
From: "Garry Lancaster" <glanc...@ntlworld.com>
Newsgroups: comp.programming.threads
Subject: Memory isolation
Message-ID: <h0Ss9.855$HI1....@newsfep1-win.server.ntli.net>
Date: Mon, 21 Oct 2002 13:01:55 +0100
Hi All
Say I have two global char variables:
char c1 = 0;
char c2 = 0;
and two mutexes:
Mutex m1;
Mutex m2;
(Assume Mutex is a C++ class with the obvious Lock and
Unlock functions, wrapping a mutex API such as the
pthread_mutex_init/destroy/lock/unlock functions.)
I have two functions, f1 and f2.
void f1() {
m1.Lock();
c1 = 1; // Line X.
if (1 != c1) abort(); // Line Y.
c1 = 0;
m1.Unlock();
}
void f2() {
m2.Lock();
c2 = 1; // Line Z.
if (1 != c2) abort();
c2 = 0;
m2.Unlock();
}
The critical sections in f1 and f2 may run concurrently because
they use different mutexes.
I spawn several threads. Some run f1, others f2.
It is my understanding that, even though I have protected my
variables using mutexes, the two variable values may interfere
with one another. For example, on a platform with only word-
sized memory access (and, naturally, more than a single 1 byte
char per word), if the two globals reside in adjacent memory
locations the write to c2 at line Z may generate a word
read including both c1 and c2, followed by a word write of the
same. If line X is run in between this read and write, it will
effectively be ignored, since the new value will be overwritten
by the old, and the program will abort at line Y.
In other words:
State: c1 = 0 c2 = 0
Action: Line Z word read.
State: c1 = 0 c2 = 0
Action: Line X word read
State: c1 = 0 c2 = 0
Action: Line X word write
State: c1 = 1 c2 = 0
Action: Line Z word write
State: c1 = 0 c2 = 1
Action: Line Y, condition false so abort.
Even though each variable is protected by its own mutex,
since they are not using the *same* mutex, they still
interfere.
I know the avoidance of this behaviour is part of what is
meant by atomicity. But, since it is not the whole of what
is meant (it does not address interruptibility), I am currently
using a different term: isolation. (I hope someone will
correct me if there is a standard term for this.)
Is the scenario I post possible under pthreads (or any other
threading system for that matter) or have I missed something
that means the problem will not occur?
If lack of isolation *is* a problem, what is the most portable
solution?
Thanks in advance.
Kind regards
Garry Lancaster
-------- Original Message --------
From: David Butenhof <David.B...@compaq.com>
Subject: Re: Memory isolation
Newsgroups: comp.programming.threads
Message-ID: <BKTs9.20$ZG7.4...@news.cpqcorp.net>
Date: Mon, 21 Oct 2002 13:58:57 GMT
Garry Lancaster wrote:
> Say I have two global char variables:
>
> char c1 = 0;
> char c2 = 0;
>
> and two mutexes:
>
> Mutex m1;
> Mutex m2;
>
> (Assume Mutex is a C++ class with the obvious Lock and
> Unlock functions, wrapping a mutex API such as the
> pthread_mutex_init/destroy/lock/unlock functions.)
>
> I have two functions, f1 and f2.
>
> void f1() {
> m1.Lock();
> c1 = 1; // Line X.
> if (1 != c1) abort(); // Line Y.
> c1 = 0;
> m1.Unlock();
> }
>
> void f2() {
> m2.Lock();
> c2 = 1; // Line Z.
> if (1 != c2) abort();
> c2 = 0;
> m2.Unlock();
> }
>
> The critical sections in f1 and f2 may run concurrently because
> they use different mutexes.
>
> I spawn several threads. Some run f1, others f2.
>
> It is my understanding that, even though I have protected my
> variables using mutexes, the two variable values may interfere
> with one another. For example, on a platform with only word-
> sized memory access (and, naturally, more than a single 1 byte
> char per word), if the two globals reside in adjacent memory
> locations the write to c2 at line Z may generate a word
> read including both c1 and c2, followed by a word write of the
> same. If line X is run in between this read and write, it will
> effectively be ignored, since the new value will be overwritten
> by the old, and the program will abort at line Y.
>
> In other words:
>
> State: c1 = 0 c2 = 0
> Action: Line Z word read.
> State: c1 = 0 c2 = 0
> Action: Line X word read
> State: c1 = 0 c2 = 0
> Action: Line X word write
> State: c1 = 1 c2 = 0
> Action: Line Z word write
> State: c1 = 0 c2 = 1
> Action: Line Y, condition false so abort.
>
> Even though each variable is protected by its own mutex,
> since they are not using the *same* mutex, they still
> interfere.
>
> I know the avoidance of this behaviour is part of what is
> meant by atomicity. But, since it is not the whole of what
> is meant (it does not address interruptibility), I am currently
> using a different term: isolation. (I hope someone will
> correct me if there is a standard term for this.)
>
> Is the scenario I post possible under pthreads (or any other
> threading system for that matter) or have I missed something
> that means the problem will not occur?
>
> If lack of isolation *is* a problem, what is the most portable
> solution?
There is no completely portable solution, because standards do not
provide means to control the exact layout of data in memory, nor
the instructions generated by a compiler to access them. (Even
"volatile" provides only very loose constraints on the
instructions used, and they're not useful here.)
Note that aside from problems that can destroy data, like "word
tearing", there are performance problems such as "false sharing".
False sharing won't hurt your final data (or even intermediate
data), but can drastically affect your performance when multiple
threads (running on separate CPUs) concurrently write to
non-adjacent data in the same cache line(s). (Because of cache
invalidate thrashing in the memory system.)
Your best bet to avoid the functional problems and minimize the
performance risks is to avoid declaring shared data as you've
shown. That is, instead of:
char c1 = 0;
char c2 = 0;
Mutex m1;
Mutex m2;
That not only places the shared data adjacent to each other (in
most implementations), but actually interleaves the shared data
to guarantee you'll get cache conflicts. (c1 is separated from
m1; while c1 and c2, and m1 and m2, are pushed together.)
Instead, use:
char c1 = 0;
Mutex m1;
char c2 = 0;
Mutex m2;
This still doesn't guarantee cache isolation, though at least
you know that the machine is far less likely to have atomicity
problems accessing c1 and c2 with respect to each other. For one
thing, on most machines without atomic access to the char data
type, the compiler will generate padding between the char and
the Mutex (which most likely has wider data, such as int or
long or pointer).
Or even better,
typedef struct {char c; Mutex m;} Data;
Data *d1;
Data *d2;
d1 = malloc (sizeof (Data));
d2 = malloc (sizeof (Data));
Now you're letting the heap manager buy you some reasonable
minimal data alignment, as well as a high likelihood (though
still not a guarantee) that the two allocations will be in
separate cache lines. For further assurance, you could easily
pad the allocations to some reasonable size; 64 bytes is a
common cache line size.
"Mounting" your data into a structure comes as close as you
can in C to controlling the actual layout of data in memory.
While this is a little less trivially simple than the original,
it's not horrendously complicated, either. It'll buy you a lot
of flexibility to adapt to various architectures, as well as a
fair level of builtin basic protection.
--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/
-------- Original Message --------
From: David Butenhof <David.B...@compaq.com>
Subject: Re: Memory isolation
Newsgroups: comp.programming.threads
Message-ID: <PpRt9.10$Nx2.3...@news.cpqcorp.net>
Date: Thu, 24 Oct 2002 12:09:19 GMT
Alexander Terekhov wrote:
> Max Khesin wrote:
>>
>> why not something like this:
>>
>> char sharedData[sizeof(int)+1];
>>
>> char& c1 = sharedData[0];
>> char& c2 = sharedData[sizeof(int)];
>>
>> this would seem (assuming "int" is the largest size read at once) to
>> sufficiently separate the data.
>
> assert( sizeof( char ) == sizeof( int ) );
Again, the real problem with this alternative (as already pointed
out by several) is that a char array has no required alignment
and element 0 need not have (int*) alignment either. Now you need
to do bit masking to align the address of c1 as well as c2.
> [...]
>> > Instead, use:
>> >
>> > char c1 = 0;
>> > Mutex m1;
>> > char c2 = 0;
>> > Mutex m2;
>
> assert( sizeof( char ) == sizeof( Mutex ) );
This would almost certainly still be better than having c1 and c2
in adjacent bytes.
However, in general, yes, you've successfully described ONE (of
many) of the possibilities that caused me to say that these
strategies will often help but provide no real guarantees. I don't
really see why you bothered. (Or, even worse, why I'm bothering to
respond.)
>> > This still doesn't guarantee cache isolation, though at least
>> > you know that the machine is far less likely to have atomicity
>> > problems accessing c1 and c2 with respect to each other. For one
>> > thing, on most machines without atomic access to the char data
>> > type, the compiler will generate padding between the char and
>> > the Mutex (which most likely has wider data, such as int or long
>> > or pointer).
>> >
>> > Or even better,
>> >
>> > typedef struct {char c; Mutex m;} Data;
>> > Data *d1;
>> > Data *d2;
>> >
>> > d1 = malloc (sizeof (Data));
>> > d2 = malloc (sizeof (Data));
>
> assert( 2*sizeof( Data ) <= sizeof( pthread_memory_granule_np_t ) );
Again, and more directly this time: "exactly: and so what"?
> http://groups.google.com/groups?selm=yahiuqxlycg.fsf%40berling.diku.dk
>
> "....
> However, I still think that malloc(1) on an implementation where all
> pointers have the same representation may still return a pointer that
> is not "aligned" in the everyday, non-standardese meaning of the term.
> Because it is undefined anyway what happens when one tries to use the
> pointer to access an object bigger than the size I asked malloc() for."
On a machine with no address alignment requirements, there are no
alignment requirements on malloc(). But address alignment rules aren't
the same as atomic access rules, and this can complicate "isolation".
On a machine like Alpha that requires natural data alignment, a (short*)
MUST have the low address bit clear, (int*) must have 2 low address bits
clear, and so forth. Therefore an implementation of malloc() that did
not return a value with the maximum number of cleared low address bits
would be erroneous. (Yes, 'malloc(1)' could return an unaligned address,
'malloc(2)' could return an address with a single cleared low bit, and
so forth, though this is an unlikely implementation. Certainly
'malloc(2)' cannot return a value with the low address bit set, because
it cannot legally presume the storage will be mapped to 'char[2]' rather
than 'int' even though that's all the information it has.)
It's possible, (though I know of no examples except one subtly broken
model of the VAX family), that a machine without address alignment
rules could have restrictions on atomic access to unaligned data. In
such an implementation, malloc(8) might return an address with the low
bit set, restricting atomic access to that data. Possible, but unlikely.
Except for very early VAX models, unaligned data access may have been
LEGAL, but was extremely inefficient (it meant locking the memory bus,
doing multiple atomic ALIGNED fetches, unlocking the memory bus, and
gluing the data together) -- and of course every VAX data access was
required to be atomic, so there was no way to skip that overhead. No
rational implementation of malloc() would ever return unaligned
addresses even though it might be "legal".
The only real solution to this has to be at the language level, an area
where POSIX and SUS can't tread. There must be language syntax, and it
must be general and simple. I don't recall the context of discussions
cited regarding an "isolated" keyword, but I doubt that'd be practical
or usable except in, uh, "isolated" instances.
Better might be a general compiler option, perhaps a standard #pragma,
to force all "discrete" data allocations to be sufficiently isolated
for atomic access on the target hardware. At the simplest (and most
easily usable) level it would appear in a header file (perhaps
<pthread.h>?) to cause all externs, statics, and allocated return
values (e.g., from malloc()) to be sufficiently separated to ensure
atomicity with respect to other values so allocated.
But... what about 'char foo[2];'? Clearly the address "&foo" must be
"aligned". But what about "&foo[1]"? If it IS, then you really need
to force the compiler to change the definition of sizeof(char) in
that compilation scope or break many patterns in previously portable
code. For example, "char *bar = foo; bar[1] = 0;". (One could
construct nastier examples that would be harder to detect and fix.)
What about structures? Is each field in the structure expanded?
Essentially what we're saying is that if the machine can access
'long', but not 'int', 'short', or 'char', atomically, then we
really allocate nothing smaller than 'long'. Is that acceptable?
How does it impact application code (and data sizes)?
The best strategy would probably be to say that AN array or A
structure is an "atomicity unit". You don't, by default, gain any
guaranteed atomic access to members of the unit. (This could be
provided for by an additional pragma, or by something like
'isolated'; though the pragma would probably be cleaner.)
Often we want a larger alignment than strictly needed, for
efficiency. The best unit here is almost always the machine's cache
line size -- a value not commonly communicated to application code.
This has proven particularly critical in designing data structures
for NUMA environments, but compiler support tends to be pretty bad.
Perhaps something like "#pragma align_all ({cache|atomicity})"
(Where "cache" is required to subsume "atomicity", just to remove
ambiguity.)
I'm not entirely sure that'd be sufficient, either, but it's another
idea to consider.
--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/
-------- Original Message --------
From: "Garry Lancaster" <glanc...@ntlworld.com>
Newsgroups: comp.programming.threads
Subject: Re: Memory isolation
Message-ID: <mVTt9.7507$Af5.2...@newsfep2-win.server.ntli.net>
Date: Thu, 24 Oct 2002 16:00:07 +0100
[snip]
David Butenhof:
> The only real solution to this has to be at the language level,
> an area where POSIX and SUS can't tread.
I think POSIX *could* do it, but the languages *should*
do it. But then I also think that a lot of what POSIX
currently does would, in an ideal world, be done by
the languages. At the moment this has somehow
fallen through the cracks because it's quite subtle.
> There must be language syntax, and it must
> be general and simple. I don't recall the context of discussions
> cited regarding an "isolated" keyword, but I doubt that'd be
> practical or usable except in, uh, "isolated" instances.
>
> Better might be a general compiler option, perhaps a standard
> #pragma, to force all "discrete" data allocations to be
> sufficiently isolated for atomic access on the target hardware.
> At the simplest (and most easily usable) level it would appear
> in a header file (perhaps <pthread.h>?) to cause all externs,
> statics, and allocated return values (e.g., from malloc()) to
> be sufficiently separated to ensure atomicity with respect to
> other values so allocated.
We have to be careful not to confuse atomicity with
isolation. Depending on exactly how you define it,
atomicity is probably sufficient for isolation, but isolation
is not sufficient for atomicity e.g. a machine with only
byte access to memory can isolate multi-byte words,
but cannot access them atomically (at least not without
a global system lock or some other extra form of
synchronisation.)
If I can take it that you're actually talking about isolation
rather than full atomicity, I tend to agree with most of
what you write. Anyway, I make that assumption in my
subsequent comments...
You write that we need to isolate globals and dynamics.
As Alexander pointed out earlier, you also need
to isolate "thread private" objects. This includes
automatics (a.k.a. stack dwellers). In very many cases
the compiler has to do nothing special at all in order to
isolate automatics (specifically, where it can prove that
all objects existing within a given natural word/isolation
boundary are only accessed by the same single thread),
so it shouldn't waste much space, but requiring automatics
also to be isolated spells out that those cases where
special action is necessary must be dealt with correctly
by the compiler.
> But... what about 'char foo[2];'? Clearly the address "&foo"
> must be "aligned". But what about "&foo[1]"? If it IS, then
> you really need to force the compiler to change the definition
> of sizeof(char) in that compilation scope or break many
> patterns in previously portable code. For example, "char
> *bar = foo; bar[1] = 0;". (One could construct nastier
> examples that would be harder to detect and fix.)
Changing sizeof(char) is a no-no. This, and the
rules for sizing arrays, provide a good reason why
the Java-esque default of having everything isolated
from everything else is not tenable for C and C++.
> What about structures? Is each field in the structure expanded?
> Essentially what we're saying is that if the machine can access
> 'long', but not 'int', 'short', or 'char', atomically, then we
> really allocate nothing smaller than 'long'. Is that acceptable?
> How does it impact application code (and data sizes)?
Right: this wouldn't be acceptable.
> The best strategy would probably be to say that AN array or A
> structure is an "atomicity unit". You don't, by default, gain
> any guaranteed atomic access to members of the unit. (This could
> be provided for by an additional pragma, or by something like
> 'isolated'; though the pragma would probably be cleaner.)
Yes, and/but:
- For the reasons stated above, I prefer "isolation unit".
- For what are properly known in C and C++ as arrays
of arrays, but which are often termed multi-dimensional
arrays, only the topmost array is an isolation unit (for one
thing because the language rules insist that T a[n][n] always
has to be n times the size of T b[n]). In contrast, structs
(and classes and unions) are allowed extra internal byte
padding, so they can always be isolation units.
- Objects that are not members of a struct/class/union
nor elements of an array should be in their own isolation
unit. An entirely non-isolated object is a dangerous
thing in a multi-threaded program: the languages would
do no service to their users by permitting it.
- The idea of a standard #pragma is, at least currently,
a contradiction in terms: they are specified to be used
for *implementation-defined* purposes. There
is always the possibility of changing this by introducing
the first ever standard #pragma, but I think it would be
difficult to sell this as better than a new keyword. (Plus
there is general dislike of the pre-processor amongst
the C++ standards people: they are unlikely to go for
anything that extends its role.) You don't need a pragma
anyway: when you need additional isolation units, just
refactor into multiple structs. For example,
// Members not guaranteed isolated from each other.
struct a {
char b;
char c;
};
// Members guaranteed isolated from each other.
struct ia {
struct { char b; } bb;
struct { char c; } cc;
};
Admittedly an anonymous-struct would be a nice
extension here, but for single member isolation
we can, in C++ at least, use an anonymous union
to permit the access syntax to remain unchanged.
// Members guaranteed isolated from each other.
// Anonymous-union syntax is C++ only.
struct ia2 {
union { char b; };
union { char c; };
};
> Often we want a larger alignment than strictly needed, for
> efficiency. The best unit here is almost always the machine's
> cache line size -- a value not commonly communicated to
> application code. This has proven particularly critical in
> designing data structures for NUMA environments, but compiler
> support tends to be pretty bad.
>
> Perhaps something like "#pragma align_all ({cache|atomicity})"
> (Where "cache" is required to subsume "atomicity", just to
> remove ambiguity.)
>
> I'm not entirely sure that'd be sufficient, either, but it's
> another idea to consider.
Aligning to cache lines *is* something that is suitable
for a #pragma: an environment-specific efficiency tweak.
This wouldn't be something that a language standard
would specify.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
-------- Original Message --------
From: "Garry Lancaster" <glanc...@ntlworld.com>
Newsgroups: comp.programming.threads
Subject: Re: Memory isolation
Message-ID: <EN7u9.8900$Af5.3...@newsfep2-win.server.ntli.net>
Alexander Terekhov:
> After spending some time [thanks to "mainframe schedulers: if you
> don't have cpu utilization at 99+%, something is seriously wrong"]
> trying to digest messages from Garry Lancaster and David Butenhof,
> I'm now thinking in the following direction:
>
> 1. std::thread_allocator<T>, thread_new/thread_delete, operator
> thread_new/operator thread_delete, thread_malloc()/thread_free(),
> etc. -- thread specific memory allocation facilities that would
> allow slightly more optimized/less expensive operations with
> respect to synchronization and isolation for data that is *NOT*
> meant to be thread-shared.
Alexander's later correction:
> Well, "*NOT* meant to be thread-shared" was probably confusing.
> The allocator AND all its allocated objects COULD be accessed by
> different threads, but serialized/synchronized -- with precluded
> asynchrony on some "higher" level. Sort of "dynamic segmented
> stack model" where the entire stack can be passed from thread to
> thread [if needed].
If I understand your correction correctly, all objects allocated
using these per-thread techniques are isolated except that
those allocated on the same thread need not be isolated
from each other. Makes sense.
I can think of two reasons why you might want thread-specific
allocation as part of a language standard:
1. You can reduce the padding between adjacent allocations
if you know they are only going to be used by the same
thread since isolation with respect to each other is not an
issue. This is sound in theory, however, the smallest allocation
chunks in most language library allocation routines are already
at or beyond the granularity of the natural isolation/word boundary.
If you only ask for 1 byte, you probably get 8 or 16 in many cases.
This happens because general purpose allocators need to
supply memory aligned to the maximum alignment requirement
of any type in the system and because of the bookkeeping space
overhead of small allocations. Your type-specific
std::thread_allocator<T> could get around the alignment issue,
but it is likely that a relatively simple user-defined allocator
tailored for a specific purpose could out-perform it, so why
bother supplying this half-way house as standard?
2. Per-thread allocators can avoid the need for global
synchronization during each allocation and deallocation by
maintaining per-thread allocation and free lists etc. But
this doesn't require a special interface - thread local
storage is just as available to the current allocation
interfaces as it would be to your newly suggested ones.
I'm guessing things like the Hoard allocator do this.
So there is no advantage over what we have now. At
first glance you might think that using std::thread_allocator<T>
could get around the need to use TLS to implement
this, but the standard allocator interface doesn't work
like that: any state must be shared between objects.
(Bizarrely, all allocators of the same type must be able
to free each other's allocations. Don't ask me why it is
that way. It just is.)
> 2. "isolation" scopes [might also be nested; possibly] for defs of
> objects of static storage duration and non-static class members:
>
> isolated {
>
> static char a;
> static char b;
> static mutex m1;
>
> }
>
> isolated {
>
> static char c;
> static char d;
> static mutex m2;
>
> }
For ease of comparison I'll re-write your examples as I would
write them if the isolation rules I proposed were in place.
(Just to show that we don't *need* a new keyword.)
static struct {
char a;
char b;
mutex m1;
} e;
static struct {
char c;
char d;
mutex m2;
} f;
or
static char a;
static char b;
static mutex m1;
static char c;
static char d;
static mutex m2;
That last is "over-isolated" compared to the others,
but given the relatively small amount of static data
in most programs any extra padding is likely to be
negligible (and the mutexes will most likely already be
aligned and padded to avoid cross-thread
interference in any case).
(Corrections applied to following.)
> struct something {
>
> isolated {
>
> char a;
> char b;
> mutex m1;
>
> }
>
> isolated {
>
> char c;
> char d;
> mutex m2;
>
> }
>
> } s; // isolated by default -- see below
struct something {
struct internal {
char a;
char b;
mutex m1;
};
struct internal c;
struct internal d;
} s;
> This would allow one to clearly express isolation boundaries.
Your use of the "isolated" keyword is sufficient, but it's
not necessary.
> By default, definitions of objects of static storage duration
> shall be treated as being isolated from each other:
>
> static char a; // isolated { static char a; }
> static char b; // isolated { static char b; }
Yes, I agree, and so do the rules I posted.
> Objects of automatic storage duration need NOT be isolated
> [the isolation of the entire thread stack aside] unless an
> address/ref is taken and it can't be proven that access to
> it from some other thread is impossible.
I agree with your intent, but you don't need to say that.
Just say they "shall be isolated" and let the implementations
figure out what they actually have to do to achieve it for
each object. If that's nothing and they can easily deduce
that during compilation, they will do.
> 3. Array elements can be made isolated ONLY using class type
> with "isolated" member(s):
>
> char c_array[2]; // no isolation with respect to elems
This is the same with my model.
(Corrections applied to following.)
> struct isolated_char {
>
> isolated { char c; }
>
> } ic_array[2]; // fully isolated ic_array[0].c
> // and ic_array[1].c
struct ichar { char c; } ic_array[2];
I think the two sets of rules are the same except that
in your model sub-objects or array elements of
class-type are not guaranteed to be isolated from
their "sibling" sub-objects or elements, and in mine
they are. Both models work.
In other words your "isolated" has the same semantics
with respect to isolation as sub-object structs/classes/
unions in mine.
I don't think there is any real difference in the
isolation boundaries achievable with the two sets
of rules: they just differ in their defaults and how
you control them.
So, the default amount of isolation in your model
is slightly less than in mine, which means you are
forced to hand-tweak the isolation boundaries
slightly more often to ensure isolation safety. In
favour of your rules you will undoubtedly save a
few bytes here and there in many programs. Since
ideally we would just be standardising current
practice, it would be interesting to know what current
compilers do with respect to isolation units (if they
even consider them).
The other main difference is the addition of the
keyword. Why are new keywords a bad thing?
- Any programs that already use the identifier
"isolated" (e.g. for a variable or type) will break. If
you choose the uglier "__isolated" instead you would
avoid breaking standard-conforming programs
provided no compiler vendor had already used this
as an extension. (The language standards say that
names containing double underscores are reserved
for implementations.)
- You create a mismatch between pre- and post-
isolated-aware code. Any single use of the new keyword
means the program will not compile on a pre-isolated
compiler.
> 4. Introduce something ala offsetof-"magic" with respect to
> alignment/padding that would provide the means to write
> thread-shared *AND* thread-private allocators entirely in
> standard C/C++.
What problems are there at the moment that wouldn't
be fixed by either set of suggested isolation rules?
> 5. In the single threaded "mode", isolation scopes can simply
> be ignored.
Again, you are right, but you do not need to say so
explicitly: implementations can figure that out for
themselves, provided they can tell the difference
between a single- and a multi-threaded build.
I bet that most of them will choose to keep the sizes
of all types the same across the different build models
though. Doing otherwise is not wrong but is likely to
break code that works but assumes more than it
should about structure layouts. (Some people think it
is a good thing for compilers to go out of their way to
break non-conforming code, though. Maybe I'm too
soft ;-)
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
-------- Original Message --------
From: Alexander Terekhov <tere...@web.de>
Newsgroups: comp.programming.threads
Subject: Re: Memory isolation
Date: Fri, 25 Oct 2002 20:31:09 +0200
Message-ID: <3DB98DED...@web.de>
Garry Lancaster wrote:
>
> Alexander Terekhov:
> > After spending some time [thanks to "mainframe schedulers: if you
> > don't have cpu utilization at 99+%, something is seriously wrong"]
> > trying to digest messages from Garry Lancaster and David Butenhof,
> > I'm now thinking in the following direction:
> >
> > 1. std::thread_allocator<T>, thread_new/thread_delete, operator
> > thread_new/operator thread_delete, thread_malloc()/thread_free(),
> > etc. -- thread specific memory allocation facilities that would
> > allow slightly more optimized/less expensive operations with
> > respect to synchronization and isolation for data that is *NOT*
> > meant to be thread-shared.
>
> Alexander's later correction:
> > Well, "*NOT* meant to be thread-shared" was probably confusing.
> > The allocator AND all its allocated objects COULD be accessed by
> > different threads, but serialized/synchronized -- with precluded
> > asynchrony on some "higher" level. Sort of "dynamic segmented
> > stack model" where the entire stack can be passed from thread to
> > thread [if needed].
>
> If I understand your correction correctly, all objects allocated
> using these per-thread techniques are isolated except that
> those allocated on the same thread need not be isolated
> from each other.
Well, I'd say that all objects allocated by the same allocator
are isolated from all other objects allocated by some other
allocator(s) but aren't necessarily isolated with respect to
each other. This would mean that the allocator and all its
allocated objects shall be accessed by only one thread at any
time, but the "ownership" can be "transferred" from thread to
thread [optionally; if needed/wanted].
> Makes sense.
>
> I can think of two reasons why you might want thread-specific
> allocation as part of a language standard:
>
> 1. You can reduce the padding between adjacent allocations
> if you know they are only going to be used by the same
> thread since isolation with respect to each other is not an
> issue.
Yes.
> This is sound in theory, however, the smallest allocation
> chunks in most language library allocation routines are already
> at or beyond the granularity of the natural isolation/word boundary.
> If you only ask for 1 byte, you probably get 8 or 16 in many cases.
> This happens because general purpose allocators need to
> supply memory aligned to the maximum alignment requirement
> of any type in the system and because of the bookkeeping space
> overhead of small allocations.
Well, yes. And even the "buckets"-things like
http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixbman/prftungd/2365c35.htm#HDRI45811
(see MALLOCBUCKETS...)
http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/malloc_buckets.htm
("Malloc Buckets")
have some "restrictions" w.r.t. sizing/alignment:
"The bucket sizing factor must be a multiple of 8 for 32-bit
implementations and a multiple of 16 for 64-bit implementations
in order to guarantee that addresses returned from malloc
subsystem functions are properly aligned for all data types."
> Your type-specific
> std::thread_allocator<T> could get around the alignment issue,
I'm not sure how one would "get around the alignment issue"...
> but it is likely that a relatively simple user-defined allocator
> tailored for a specific purpose could out-perform it, so why
> bother supplying this half-way house as standard?
Well, yes. I've played a bit with "user-defined allocator
tailored for a specific purpose" myself. You might want to
take a look at the following: [that's rather old stuff, but
it illustrates some ideas -- modulo bugs ;-) ]
http://www.terekhov.de/hsamemal.hpp
http://www.terekhov.de/hsamemal.inl
http://www.terekhov.de/hsamemal.cpp
http://www.terekhov.de/hsamemal.c
But I'd really prefer to use something "Standard" instead.
[...]
> > 2. "isolation" scopes [might also be nested; possibly] for defs of
> > objects of static storage duration and non-static class members:
> >
> > isolated {
> >
> > static char a;
> > static char b;
> > static mutex m1;
> >
> > }
> >
> > isolated {
> >
> > static char c;
> > static char d;
> > static mutex m2;
> >
> > }
>
> For ease of comparison I'll re-write your examples as I would
> write them if the isolation rules I proposed were in place.
> (Just to show that we don't *need* a new keyword.)
>
> static struct {
> char a;
> char b;
> mutex m1;
> } e;
> static struct {
> char c;
> char d;
> mutex m2;
> } f;
>
> or
>
> static char a;
> static char b;
> static mutex m1;
> static char c;
> static char d;
> static mutex m2;
>
> That last is "over-isolated" compared to the others,
> but given the relatively small amount of static data
> in most programs any extra padding is likely to be
> negligible (and the mutexes will most likely already be
> aligned and padded to avoid cross-thread
> interference in any case).
>
> (Corrections applied to following.)
> > struct something {
> >
> > isolated {
> >
> > char a;
> > char b;
> > mutex m1;
> >
> > }
> >
> > isolated {
> >
> > char c;
> > char d;
> > mutex m2;
> >
> > }
> >
> > } s; // isolated by default -- see below
>
> struct something {
> struct internal {
> char a;
> char b;
> mutex m1;
> };
> struct internal c;
> struct internal d;
> } s;
>
> > This would allow one to clearly express isolation boundaries.
>
> Your use of the "isolated" keyword is sufficient, but it's
> not necessary.
Apart from the problem of "over-isolation". ;-)
> > By default, definitions of objects of static storage duration
> > shall be treated as being isolated from each other:
> >
> > static char a; // isolated { static char a; }
> > static char b; // isolated { static char b; }
>
> Yes, I agree, and so do the rules I posted.
>
> > Objects of automatic storage duration need NOT be isolated
> > [the isolation of the entire thread stack aside] unless an
> > address/ref is taken and it can't be proven that access to
> > it from some other thread is impossible.
>
> I agree with your intent, but you don't need to say that.
> Just say they "shall be isolated" and let the implementations
> figure out what they actually have to do to achieve it for
> each object. If that's nothing and they can easily deduce
> that during compilation, they will do.
>
> > 3. Array elements can be made isolated ONLY using class type
> > with "isolated" member(s):
> >
> > char c_array[2]; // no isolation with respect to elems
>
> This is the same with my model.
Yes, you've convinced me that introduction of yet another type
qualifier [e.g. "isolated char", where sizeof( isolated char )
>= sizeof( char )] would be rather messy.
> (Corrections applied to following.)
> > struct isolated_char {
> >
> > isolated { char c; }
> >
> > } ic_array[2]; // fully isolated ic_array[0].c
> > // and ic_array[1].c
>
> struct ichar { char c; } ic_array[2];
>
> I think the two sets of rules are the same except that
> in your model sub-objects or array elements of
> class-type are not guaranteed to be isolated from
> their "sibling" sub-objects or elements, and in mine
> they are. Both models work.
Yes, I'm just fearing a bit the problem/overhead of "over-
isolation". Imagine how much memory could be wasted when
isolation is done at a granularity of around 64 bytes [the cache
line size]...
> In other words your "isolated" has the same semantics
> with respect to isolation as sub-object structs/classes/
> unions in mine.
>
> I don't think there is any real difference in the
> isolation boundaries achievable with the two sets
> of rules: they just differ in their defaults and how
> you control them.
>
> So, the default amount of isolation in your model
> is slightly less than in mine, which means you are
> forced to hand-tweak the isolation boundaries
> slightly more often to ensure isolation safety. In
> favour of your rules you will undoubtedly save a
> few bytes here and there in many programs. Since
> ideally we would just be standardising current
> practice, it would be interesting to know what current
> compilers do with respect to isolation units (if they
> even consider them).
>
> The other main difference is the addition of the
> keyword. Why are new keywords a bad thing?
They are NOT Good Things, for sure. ;-)
> - Any programs that already use the identifier
> "isolated" (e.g. for a variable or type) will break. If
> you choose the uglier "__isolated" instead you would
> avoid breaking standard-conforming programs
> provided no compiler vendor had already used this
> as an extension. (The language standards say that
> names containing double underscores are reserved
> for implementations.)
>
> - You create a mismatch between pre- and post-
> isolated-aware code. Any single use of the new keyword
> means the program will not compile on a pre-isolated
> compiler.
Yes, that's a problem. However, consider that I'm sort
of dreaming to have even more new keywords/constructs...
http://groups.google.com/groups?selm=3DA6C62A.AB8FF3D3%40web.de
(Subject: Re: local statics and TLS objects)
So, few keywords less here and there... ``big deal.'' ;-) ;-)
> > 4. Introduce something ala offsetof-"magic" with respect to
> > alignment/padding that would provide the means to write
> > thread-shared *AND* thread-private allocators entirely in
> > standard C/C++.
>
> What problems are there at the moment that wouldn't
> be fixed by either set of suggested isolation rules?
First off, under your rules, struct Char { char c; }
could be way too big [and with no gains whatsoever] for
purely "thread-specific"/"intra-thread" stuff. I really
don't like it. Under "my rules", I'd probably need to
know how much extra space needs to be added to make my
custom user allocator "inter-thread" safe.
regards,
alexander.
-------- Original Message --------
From: "Garry Lancaster" <glanc...@ntlworld.com>
Newsgroups: comp.programming.threads
Subject: Re: Memory isolation
Message-ID: <bx7v9.135$P55....@newsfep1-win.server.ntli.net>
Date: Mon, 28 Oct 2002 09:35:32 -0000
[snip]
Garry Lancaster:
> > This is sound in theory, however, the smallest allocation
> > chunks in most language library allocation routines are already
> > at or beyond the granularity of the natural isolation/word
> > boundary. If you only ask for 1 byte, you probably get 8 or 16
> > in many cases. This happens because general purpose allocators
> > need to supply memory aligned to the maximum alignment
> > requirement of any type in the system and because of the
> > bookkeeping space overhead of small allocations.
[snip]
> > Your type-specific
> > std::thread_allocator<T> could get around the alignment issue,
Alexander Terekhov:
> I'm not sure how one would "get around the alignment issue"...
I simply mean that when T is known at compile time
an allocator can be designed that only satisfies T's
alignment requirements, rather than having to satisfy
the most conservative alignment requirements of all
types.
If a concrete example helps, think of std::thread_allocator<char>
implemented as a simple array-based allocator.
Maybe this allocator discussion would be better as a separate
thread. It is only tenuously related to the main subject.
[snip]
> > Your use of the "isolated" keyword is sufficient, but it's
> > not necessary.
> Apart from the problem of "over-isolation". ;-)
It's not necessary to avoid over-isolation either.
Any data layout you can achieve with the isolated
keyword can also be achieved without it by my
suggested rules.
[snip]
> Yes, I'm just fearing a bit the problem/overhead of "over-
> isolation". Imagine how much memory could be wasted when
> isolation is done on something around 64 bytes [cache
> line size]...
Aren't you overstating your case a bit here? Aren't most
platforms' natural isolation boundaries the same or less
than their natural word size, so typically 8 bytes or less
rather than 64 bytes?
I understood from previous comments that cache size
alignment was an efficiency issue, not an isolation
issue (recall I defined isolation as what is necessary
to avoid word-tearing).
If I misunderstood then I would agree that some rethink
is required.
[snip]
> > The other main difference is the addition of the
> > keyword. Why are new keywords a bad thing?
>
> They are NOT Good Things, for sure. ;-)
>
> > - Any programs that already use the identifier
> > "isolated" (e.g. for a variable or type) will break. If
> > you choose the uglier "__isolated" instead you would
> > avoid breaking standard-conforming programs
> > provided no compiler vendor had already used this
> > as an extension. (The language standards say that
> > names containing double underscores are reserved
> > for implementations.)
> >
> > - You create a mismatch between pre- and post-
> > isolated-aware code. Any single use of the new keyword
> > means the program will not compile on a pre-isolated
> > compiler.
>
> Yes, that's a problem. However, consider that I'm sort
> of dreaming to have even more new keywords/constructs...
>
> http://groups.google.com/groups?selm=3DA6C62A.AB8FF3D3%40web.de
> (Subject: Re: local statics and TLS objects)
>
> So, few keywords less here and there... ``big deal.'' ;-) ;-)
Smileys noted, but I don't think the arguments against new
keywords are quite so easily dismissed. Particularly when
there is a counter-proposal without any new keywords.
> > > 4. Introduce something ala offsetof-"magic" with respect to
> > > alignment/padding that would provide the means to write
> > > thread-shared *AND* thread-private allocators entirely in
> > > standard C/C++.
> >
> > What problems are there at the moment that wouldn't
> > be fixed by either set of suggested isolation rules?
>
> First off, under your rules, struct Char { char c; }
> could be way too big [and with no gains whatsoever] for
> purely "thread-specific"/"intra-thread" stuff.
> I really don't like it.
(Ignoring isolation I think most people would write the above
as
typedef char Char;
anyway. But I take your general point even if I quibble over
your exact example.)
Same with (admittedly even more unlikely)
struct Char { isolated { char c; } };
under your rules.
Someone who understood isolation would tend to avoid
over-isolation in any case, so we are mostly talking about
supporting those who are not familiar with the concept.
Fair enough, that is a large enough chunk of people at the
moment (and until recently I was one of them!), and I wouldn't
expect that the standard tackling isolation would
improve matters that much.
This is a hard decision, but at the moment I prefer a
solution that offers more of a safety net for these people:
one that more often produces code that is over-isolated
and wastes a few extra bytes than code that is under-
isolated and doesn't work. Not that even my suggestion
is totally safe for these people: we already rejected the
Java rules as too inefficient. The best solution in any case
is programmer education, but is that realistic here?
[snip]
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
I mean is 'volatile' a reasonable way to deal with them,
as it is with 'i/o ports'.
-Mike
Yeah. Note that Mike STILL doesn't get it (given his latest message).
>
> Your response to that particular question was ....
< Heck, it took me almost *1.5* cigarette. >
2002-11-16 10:40:01 PST [http://tinyurl.com/2st3]
<http://tinyurl.com/2r53> was posted by me to this thread pointing
to the "Talking about volatile and threads synchronization..."
c.p.t. discussion.
2002-11-17 16:49:14 PST [http://tinyurl.com/2st5]
Mike posts the question "What about objects shared with other processes."
2002-11-17 17:24:20 PST [http://tinyurl.com/2st7]
I reply "Hi Mike, contributing to the "quite a few $10^6" {growing}
amount or what?"
2002-11-17 18:35:34 PST [http://tinyurl.com/2st9]
http://groups.google.com/groups?selm=3DC9930A.10CB5BE%40web.de was
posted by me to this thread {AGAIN} pointing to the "Talking about
volatile and threads synchronization..." c.p.t. discussion.
2002-11-17 21:14:49 PST [http://tinyurl.com/2sta]
Mike replies: "I've already read those threads. You've just effectively
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
told me "Alexander is correct because Alexander says so." Huh?". And
a bit later joins Jerry in my collection of "plonkers" (I consider it
as sort of "trophies", so to speak).
Well, http://groups.google.com/groups?selm=3CA25853.2B30E9BC%40web.de
(but this time it's Mike/"above", not Jerry/"below").
regards,
alexander.
I suggested at the recent C standards meeting that we needed
to be working on thread support, but the general response was
to the effect that other people are already working on it.
I can only hope that they do a better job than I fear they
might.
I don't understand this. Could you please elaborate?
With some example(s), if possible. TIA.
regards,
alexander.
Me too.
There are lots of efforts directed toward specification of threading
libraries -- pthreads is a noteworthy example. I would like to extend
C just enough that those libraries could be written in C. Such
support could be based on minor adaptations of:
- setjmp/longjmp
- volatility
- sig_atomic_t
plus
- a library function that behaves as a barrier.
- some way of guaranteeing isolation, e.g., requiring that dynamically
allocated structs be isolated or adding an alternate version of malloc()
that provides such a guarantee.
Tom Payne
#include <extensions.h>
typedef union {
ext_cache_line_aligned_t dummy;
uint_least32_t data;
} cla_ul32;
cla_ul32 *p = malloc(sizeof(cla_ul32));
*p->data = 42; // whatever
Of course there are ways to clean up the cosmetics.
Oops, get the extra * off there and fix it up any other
way you think appropriate. The point about the union
with a special type should be clear, though.
volatile-qualification of type can be useful for the state objects of
locks, etc. For the lock-protected, thread-shared data objects it is
neither necessary nor sufficient. Besides that, it imposes a large
and unnecessary performance penalty.
Tom Payne
Clever idea.
Tom Payne
Does this mean that in addition to cache line alignment, sizeof
such union is ALWAYS guaranteed to be a multiple of "cache line"
(if sizeof(data) >= cache line)?
> + cla_ul32 *p = malloc(sizeof(cla_ul32));
Do you mean that this version of malloc will cache align all
allocations of "N * sizeof( ext_cache_line_aligned_t )" size?
I mean: what about malloc(1) and the implementation where all
pointers have the same representation?
> + *p->data = 42; // whatever
> +
> + Of course there are ways to clean up the cosmetics.
Such as
typedef union {
isolated { uint_least32_t data; }
} cla_ul32;
typedef struct {
isolated { uint_least32_t data1; }
isolated { uint_least32_t data2; }
} cla_ul32s;
<?> ;-)
regards,
alexander.
>
>t...@cs.ucr.edu wrote:
>>
>> In comp.std.c Douglas A. Gwyn <DAG...@null.net> wrote:
>> + Alexander Terekhov wrote:
>> +> "Douglas A. Gwyn" wrote:
>> +> > There is nothing to prevent introduction of some new
>> +> > data (C object) type with any access property you
>> +> > think all implementations can provide, and having
>> +> > programmers alias their other types against it using
>> +> > unions.
>> +> I don't understand this. Could you please elaborate?
>> +> With some example(s), if possible. TIA.
>> +
>> + #include <extensions.h>
>> + typedef union {
>> + ext_cache_line_aligned_t dummy;
>> + uint_least32_t data;
>> + } cla_ul32;
>
>Does this mean that in addition to cache line alignment, sizeof
>such union is ALWAYS guaranteed to be a multiple of "cache line"
>(if sizeof(data) >= cache line)?
sizeof a union is always a multiple of the lowest common multiple of
the alignment requirements of the members of the union. Otherwise
arrays couldn't exist, because array[1] wouldn't be correctly aligned.
>> + cla_ul32 *p = malloc(sizeof(cla_ul32));
>
>Do you mean that this version of malloc will cache align all
>allocations of "N * sizeof( ext_cache_line_aligned_t )" size?
>I mean: what about malloc(1) and the implementation where all
>pointers have the same representation?
malloc has to return a pointer suitably aligned for *all* types, so no
change to malloc is required.
Tom
Well, ``I know.'' Now, what if array[]/ptr arithmetic isn't used
ANYWHERE in the application?
> >> + cla_ul32 *p = malloc(sizeof(cla_ul32));
> >
> >Do you mean that this version of malloc will cache align all
> >allocations of "N * sizeof( ext_cache_line_aligned_t )" size?
> >I mean: what about malloc(1) and the implementation where all
> >pointers have the same representation?
>
> malloc has to return a pointer suitably aligned for *all* types,
> so no change to malloc is required.
A) "suitably aligned" might be a rather "tricky" thing -- open to
various interpretations, I'm afraid (see the 1354-line message).
B) If you're saying that each and every dynamic allocation shall
consume a multiple of cache line on MP (AFAIK, 64 bytes is
quite common; currently), then well, it's "OK".
regards,
alexander.
>> sizeof a union is always a multiple of the lowest common multiple of
>> the alignment requirements of the members of the union. Otherwise
>> arrays couldn't exist, because array[1] wouldn't be correctly aligned.
>
>Well, ``I know.'' Now, what if array[]/ptr arithmetic isn't used
>ANYWHERE in the application?
This isn't a practical problem worth discussing.
>> >> + cla_ul32 *p = malloc(sizeof(cla_ul32));
>> >
>> >Do you mean that this version of malloc will cache align all
>> >allocations of "N * sizeof( ext_cache_line_aligned_t )" size?
>> >I mean: what about malloc(1) and the implementation where all
>> >pointers have the same representation?
>>
>> malloc has to return a pointer suitably aligned for *all* types,
>> so no change to malloc is required.
>
>A) "suitably aligned" might be a rather "tricky" thing -- open to
> various interpretations, I'm afraid (see the 1354-line message).
>B) If you're saying that each and every dynamic allocation shall
> consume a multiple of cache line on MP (AFAIK, 64 bytes is
> quite common; currently), then well, it's "OK".
If this cache line type thing exists at all (which it may not on
current implementations - I don't know of a type with a 64 byte
alignment requirement!), then I believe that it is OK, since malloc
will have to align allocations suitably for it, on 64-byte boundaries
(making memory allocation rather wasteful!).
But I think the relevant infrastructure for this should be in POSIX,
not C and C++. Perhaps a new allocation function called ialloc
(isolated alloc) could be added, to prevent having to make malloc so
wasteful. Perhaps you have a better suggestion?
Tom
Well,
void f() {
union LOCAL {
ext_cache_line_aligned_t dummy;
X data; // sizeof(X) == cache line + 1
} cla_X;
char c;
/* ... */
// &cla_X.data is passed to some other thread
// and f() does something with its char c and
// NO array[]/ptr arithmetic (for cla_X) is
// used here. Of course, f()'s author SHOULD
// synchronize with other threads (cla_X's
// lifetime). But what about c's and cla_X's/
// .data's ISOLATION with respect to each
// other?
}
[...]
> Perhaps you have a better suggestion?
All that I currently have can be found in the "1354-line"
message and [some "extra" stuff] in this c.p.t. thread:
<http://tinyurl.com/2tua>
regards,
alexander.
I would presume that, if the size of ext_cache_line_aligned_t is as
great as that of a cache line, cla_X and c are isolated. Otherwise,
we will have an case of false sharing. If we have strong cache
coherence, the consequence of that false sharing will be performance
degradation. Otherwise, ...
+ [...]
+> Perhaps you have a better suggestion?
+
+ All that I currently have can be found in the "1354-line"
+ message and [some "extra" stuff] in this c.p.t. thread:
+
+ <http://tinyurl.com/2tua>
By the way, thanks for that 1354-line posting -- I'm still studying it.
Tom Payne
No, but the union must be properly aligned for each of its
component types.
> Do you mean that this version of malloc will cache align all
> allocations of "N * sizeof( ext_cache_line_aligned_t )" size?
Well, probably I shouldn't have used malloc but rather
have simply declared the variable.
Note that the same issue arises no matter how you want
to specify such alignment.