Q about "visibility"

Gary Schmidt

May 12, 2004, 3:07:32 AM

I've managed to get myself confused after getting involved in the
"newbie: shutting down threads" thread, and, more for completeness of
understanding than anything else, I wonder what taking a mutex does.

I have two distinct concepts, and I wonder which is more "correct".

Concept 1: On a thread taking a mutex, _all_ other threads see a
(temporarily) consistent state of memory.

Concept 2: On a thread taking a mutex, _that_ thread sees a
(temporarily) consistent state of memory.

I am pretty sure that the code is written using concept 2 (oh, I so hope
it is...), but I just wonder...

Cheers,
Gary B-)

--

Speaking strictly for myself.

David Butenhof

May 12, 2004, 10:18:48 AM

Gary Schmidt wrote:

Neither, unfortunately, is correct; though "concept 2" is a bit closer to
reality. However, beware that when you say that one thing (a thread's view
of memory) is "consistent", you've said nothing. "Consistent" in this sense
is a comparison operator -- it must be consistent WITH something.

In general, a memory visibility protocol requires two parts: first, one CPU
making the "system view" (e.g., main memory) consistent with its own view
(e.g., flushing cache and memory interface buffers); and second, another
CPU making its own view consistent with the new system view. There's
nothing either can do to force another CPU to update either its own view or
the system view.
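
As a rough illustration of that two-part protocol, here is a minimal sketch
in C++11 atomics (not something the original discussion used; the names
payload, ready and publish_and_read are made up for the example). The
release store stands in for the writer making the system view consistent
with its own, and the acquire load stands in for the reader making its own
view consistent with the system view. Neither side can do the other's half.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: names are illustrative, not from any real API.
int publish_and_read() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag used to hand the data over
    int seen = -1;

    std::thread writer([&] {
        payload = 42;  // change made in the writer's own view
        // Part 1: make earlier writes visible before the flag is.
        ready.store(true, std::memory_order_release);
    });

    std::thread reader([&] {
        // Part 2: the reader must itself pick up what was published.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        assert(payload == 42);  // ordered after the writer's store
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

If either side dropped its half (say, both used memory_order_relaxed), the
language would no longer promise that the reader sees 42, even though it
might happen to on some hardware.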

Some hardware is less aggressive than this general model permits; for
example most X86 systems ensure consistency of CPU and system views with
each read and write; SPARC generally operates in a mode where reads always
update the CPU view to match the system view but writes may not update the
system view to match the new CPU view. Alpha and Itanium, though, don't
guarantee any useful consistency across processors without explicit
barrier/fence flags.

The guarantee that's actually necessary to allow synchronization is that
when data is changed while thread T1 holds mutex M1, and T1 then UNLOCKS
M1, the next thread (T2) to lock M1 can see the data written by T1 before
unlocking M1. But not necessarily any data written by T1 AFTER it unlocked
M1, nor any data written by some OTHER thread before T1 unlocked M1.
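
A minimal sketch of that guarantee, assuming C++11 std::mutex as a stand-in
for a POSIX mutex (the struct Shared and the function lock_handoff are
hypothetical names for the example). T1 changes the data while holding M1;
T2 only looks at the data after it has locked the same M1.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical example: std::mutex plays the role of M1.
struct Shared {
    std::mutex m1;
    int data = 0;
    bool written = false;
};

int lock_handoff() {
    Shared s;
    int seen = -1;

    std::thread t1([&] {
        std::lock_guard<std::mutex> hold(s.m1);
        s.data = 7;  // changed while T1 holds M1
        s.written = true;
    });  // M1 is unlocked when hold goes out of scope

    std::thread t2([&] {
        bool done = false;
        while (!done) {
            {
                std::lock_guard<std::mutex> hold(s.m1);
                if (s.written) {
                    // T2 locked M1 after T1 unlocked it, so it must
                    // see everything T1 wrote before that unlock.
                    assert(s.data == 7);
                    seen = s.data;
                    done = true;
                }
            }
            if (!done) std::this_thread::yield();
        }
    });

    t1.join();
    t2.join();
    return seen;
}
```

Nothing here promises anything about writes T1 might make after releasing
M1, or about a third thread that never touches M1.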

Because there's no way for software or hardware to reliably associate any
particular data with a particular mutex, in practice any thread that locks
any mutex will have a view of (all) memory consistent with the view of the
last thread to unlock any mutex (at the time of that unlock). This is
essentially as if you replaced the mutex lock and unlock operations by
general (full) memory barriers.

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Alexander Terekhov

May 12, 2004, 4:38:41 PM

David Butenhof wrote:
[...]

> Some hardware is less aggressive than this general model permits; for
> example most X86 systems ensure consistency of CPU and system views with
> each read and write; ...

Uhmm, not quite. Weakly ordered memory and streaming stuff aside
for a moment... (in kinda Itanic terms)

- IA32 stores have release semantics (sink-load/store mbar for
preceding loads/stores in the program order).

- IA32 loads have acquire semantics (hoist-load/store mbar for
subsequent loads/stores in the program order).

- IA32 lock instructions have release and acquire semantics
(fully fenced).

Now, http://www.well.com/~aleks/CompOsPlan9/0005.html

<quote>

>What we need is that if the following sequence is executed
>
>    P1:        P2:
>    x = 0      y = 0
>    x = 1      y = 1
>    read y     read x
>
>has the values read will be one of
>
>    1    0
>    0    1
>    1    1
>
>0,0 blows us away.

You could get 0,0 even on the 486 or Pentium. The difference
is that the PPro has such deep pipelines and buffers that it
is more likely to expose such bugs.

</quote>

That "plan9 problem" illustrates the need/use of "StoreLoad"
barrier on IA32 (mfence/cpuid/lock, AFAIK).
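
For what it's worth, the same sequence can be sketched with C++11 atomics
(an assumption on my part; the quoted post predates them, and the helper
count_both_zero is a made-up name). Each thread stores, issues a
sequentially consistent fence, then loads. The fence is the portable
spelling of a StoreLoad barrier; on x86, compilers typically emit mfence or
a locked instruction for it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the two-thread sequence `rounds` times and
// counts how often both reads return 0.
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread p1([&] {
            x.store(1, std::memory_order_relaxed);
            // StoreLoad barrier: the store may not pass the load below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread p2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        p1.join();
        p2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With both fences in place the language rules out the 0,0 outcome, so the
count stays at zero. Take either fence out and 0,0 becomes a permitted
result again, although a short test run may never show it.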

regards,
alexander.

SenderX

May 12, 2004, 4:00:53 PM

> Because there's no way for software or hardware to reliably associate any
> particular data with a particular mutex, in practice any thread that locks
> any mutex will have a view of (all) memory consistent with the view of the
> last thread to unlock any mutex (at the time of that unlock). This is
> essentially as if you replaced the mutex lock and unlock operations by
> general (full) memory barriers.

Right. A mutex is ignorant of the memory it protects, so it can
"theoretically" be used as a crappy memory barrier...

;)