Those are some good points.
It is /relatively/ easy to tell if you have no data races in your code
even if you are not strict about using C++11/C11 atomics or
implementation-specific atomics. If your code does not access
potentially shared objects that are bigger than the hardware's write
size, and it only uses plain whole-object reads or writes to them (it
doesn't assume that "x += 1;" is atomic), then you are not going to get
data races. When two 32-bit cores both try to write to the same 32-bit
memory address, one core's write will simply land first - you are not
going to get a mixed write unless you have a rather unusual and
specialised system.
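To make those restrictions concrete, here is the sort of pattern that
stays within them (a sketch of mine, with made-up names - "volatile" is
used purely so the compiler actually performs each access):

#include <stdint.h>

volatile uint32_t stop_request;  /* no wider than the hardware's write size */

void controller(void)            /* one context only ever writes the flag */
{
    stop_request = 1;            /* a single 32-bit store - cannot be torn */
}

void worker(void)                /* another context only ever reads it */
{
    if (stop_request) {
        /* shut down cleanly */
    }
}

Each context does only whole-object loads or stores - never a
read-modify-write like "stop_request++;".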
You might have noticed quite a few "if"s there - you have to be quite
restricted in your code to avoid the possibility of data races (as
defined by the C++ standard). And without additional guarantees from
your hardware, such as a total ordering on volatile accesses, avoiding
data races alone is not going to be enough to be useful. If your system
is relatively simple - like a single-core microcontroller - then you
/do/ have such guarantees, and "volatile" can be enough. The OP,
however, does not have such hardware - and "volatile" is not going to
cut it.
Proving that you don't have any more general "race conditions" or
improper synchronisation is a lot harder.
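A classic illustration of that (my own sketch, not from any code under
discussion) is a check-then-act sequence - every individual access is
fine, but the combination is not:

#include <stdint.h>

#define MAX 100u

volatile uint32_t count;

void try_add(void)
{
    if (count < MAX) {       /* read */
        count = count + 1;   /* separate read and write - another context
                                can change count in between, so MAX can be
                                overshot even though no access is torn */
    }
}

No single access is a problem, yet the overall behaviour is still a
race condition in the everyday sense of the term.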
>
> In any machine I have ever come across, using a volatile scalar which
> is atomic at the hardware level, such as an int, has identical effect
> to using the equivalent atomic variable with relaxed memory ordering.
> The code emitted by the compiler is identical.
>
Agreed.
I can imagine machines for which that is not the case - perhaps using
caches that are not kept coherent with other cpus' caches. This would
make some aspects of programming significantly harder, but could make
the hardware a lot simpler (and therefore faster and/or cheaper). I'd
expect to see it only in quite specialised systems.
> If you need lock-free synchronization for your volatile ints, you can
> use fences just as you can use fences with atomic ints with relaxed
> memory ordering. (If you are using locks to synchronize, volatile
> or atomic variables are unnecessary and a pessimization - hence the
> "happens before" in the text I have quoted.) So Bonita's use may be
> fine.
>
> The question is: now that we have C and C++ standards which provide
> atomic scalars with relaxed memory ordering, why are you using a
> volatile int at all? The answer "because I don't want to rewrite my
> code unnecessarily" seems to me to be a reasonable answer, provided the
> program is indeed adequately synchronized by some other means such as
> fences, so that it does not contain a race condition in common parlance.
>
Volatile accesses have certain advantages over atomics - even relaxed
atomics. Let us restrict ourselves to the world of single-core systems,
as is typical for microcontrollers - since when you have multi-core
systems you will be needing atomics and locks (correctness trumps
efficiency and convenience every time, and we know volatile is not
enough for most purposes in such systems). We'll assume a 32-bit system
for convenience.
Such systems are often asymmetric in their threading - you have a
hierarchy. If you have an RTOS, you have layers of threads with
strictly controlled priorities. A higher priority thread can pre-empt a
lower priority thread, but not vice versa - though this can be
complicated by priority boosting at times. Above that, you have layers
of interrupts at different priorities, usually even more strictly
prioritised.
Imagine a timer interrupt function that tracks time as a 64-bit counter.
The "global_timer" variable is only ever written within that interrupt
function. It can be declared "int64_t global_timer;", and incremented
as "global_timer++;". This is safe - the interrupt is the only writer,
and nothing that reads the variable can pre-empt it mid-increment - so
there is no need for atomics, volatile, or anything else; these would
be pessimisations. For code that reads this value from another thread
or context, you need to be smarter. Here you /do/ need volatile
accesses.
#define volatileAccess(v) *((volatile typeof((v)) *) &(v))
(Forgive the gcc'ism and C style - in C++, you'd make a template but it
doesn't affect the principle.)
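For context, the writing side might look something like this (a sketch
only - the handler name and how it gets hooked up to the timer are my
assumptions):

#include <stdint.h>

int64_t global_timer;        /* only ever written by the timer interrupt */

void timer_isr(void)
{
    /* Nothing that reads global_timer can pre-empt this handler, so a
       plain increment is fine - no volatile or atomics needed here. */
    global_timer++;
}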
You can read your global timer in various ways, such as:
disable_interrupts();
int64_t now = volatileAccess(global_timer);
enable_interrupts();
or
int64_t now = volatileAccess(global_timer);
int64_t now2 = volatileAccess(global_timer);
while (now != now2) {
    now = now2;
    now2 = volatileAccess(global_timer);
}
(You can also break this last one into separate high and low words to be
slightly more efficient.)
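A sketch of that split-word version (assuming a little-endian 32-bit
target, so word [0] is the low half and word [1] is the high half;
aliasing the int64_t through uint32_t pointers is a common embedded
idiom, though strictly you would want a union or the appropriate
compiler flags):

uint32_t *parts = (uint32_t *) &global_timer;
uint32_t hi, lo, hi2;

do {
    hi  = volatileAccess(parts[1]);   /* high word */
    lo  = volatileAccess(parts[0]);   /* low word */
    hi2 = volatileAccess(parts[1]);   /* high word again */
} while (hi != hi2);   /* retry if an increment carried into the high
                          word while we were reading */

int64_t now = (int64_t) (((uint64_t) hi << 32) | lo);

If the high word is unchanged across the two reads, the low word was
read while the high word was stable, so the combined value is one the
counter actually held.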
Both of these are much more efficient than a relaxed 64-bit atomic
access, at least as such accesses are often implemented. Even more
importantly, both of them /work/ - unlike some implementations of
atomic accesses I have seen (such as
<https://gcc.gnu.org/wiki/Atomic/GCCMM?action=AttachFile&do=view&target=libatomic.c>),
which rely on spinlocks.
A key point with volatile is that you can take a normal object and
apply volatile accesses to it only when you need them, as the macro
above does. You can't do that with atomic objects - every access to
them is atomic - which limits your flexibility to use your knowledge of
the program to pick different access types for different balances of
efficiency and synchronisation control. It's true
that a relaxed atomic load or store is going to be efficient for small
enough sizes - and the cost is just the ugly and verbose syntax. For
larger sizes, it's a different matter. If your code is within a
critical section (due to a lock, or interrupt control) and you know it
cannot possibly clash with other access to the same object, there is a
huge efficiency difference between using a normal access to the object,
and using an atomic access (even a relaxed one).
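To illustrate that last point (a sketch, reusing the same hypothetical
disable_interrupts()/enable_interrupts() primitives as above): inside
the critical section the object can be treated as a perfectly ordinary
one, and the compiler is free to combine and optimise the accesses.

#include <stdint.h>

typedef struct { int64_t samples[16]; uint32_t count; } buffer_t;
buffer_t shared_buf;

void add_sample(int64_t s)
{
    disable_interrupts();    /* nothing else can touch shared_buf now */
    shared_buf.samples[shared_buf.count] = s;   /* plain accesses - no
                                per-access atomicity or ordering cost */
    shared_buf.count++;
    enable_interrupts();
}

Doing the same with a big atomic object would typically drag in library
calls or locks for every single access, none of which is needed here.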