Effect of multiple-processors on memory allocation

147 views
Skip to first unread message

Rob Thorpe

unread,
Oct 20, 2006, 10:22:15 AM10/20/06
to
Someone pointed out an interesting problem to me that applies to JVMs,
I'd be interested to understand how multi-threaded lisps deal with it.

Lets say there is a variable "foo". It is assigned to by something
like (setf foo (complex-stuff)). Consider the situation where two
processors are used both running threads from the same program. Also,
lets assume that the processors do not gaurantee the order of stores.
In "complex-stuff" a data-structure is created to be assigned to foo.
Imagine another thread reads foo just as this is happening. If the
order of stores isn't gauranteed then the data-structure being created
by "complex-stuff" might not be complete from the point of view of this
second thread. Potentially the memory may not even be initialized.

How do lisp runtimes deal with this problem?

Lars Rune Nøstdal

unread,
Oct 20, 2006, 10:33:40 AM10/20/06
to

If I understand this question correctly this is a general problem that
isn't JVM- or Lisp-specific.

What one does is use "mutual exclusion" for sections of code that access
shared data. See: http://en.wikipedia.org/wiki/Mutex

--
Lars Rune Nøstdal
http://lars.nostdal.org/

Rob Thorpe

unread,
Oct 20, 2006, 11:06:01 AM10/20/06
to

Yes, I realise that. What I'm trying to understand is what it means
for memory allocation specifically.

One way to treat it correctly would be to have one heap for all objects
shared between processors and to mutex the heap. This would be
inefficient though because of the amount of waiting it would require.

Another approach would be to have mini-heaps on a per-thread basis.
When some memory is needed it is first taken from this thread-local
heap. But this doesn't seem to solve the problem completely. In the
example above the data-structure created by "complex-stuff" could be
put in the thread-local heap. But later it's assigned to a variable
visible in other threads - ie to "foo". As far as I can see before
this happens all pending stores on the data must be completed.

How do lisps do this?

pTymN

unread,
Oct 20, 2006, 11:10:58 AM10/20/06
to
Interesting question, I've been curious about this too.

Joe Marshall

unread,
Oct 20, 2006, 1:01:36 PM10/20/06
to

Rob Thorpe wrote:
>
> Yes, I realise that. What I'm trying to understand is what it means
> for memory allocation specifically.
>
> One way to treat it correctly would be to have one heap for all objects
> shared between processors and to mutex the heap. This would be
> inefficient though because of the amount of waiting it would require.
>
> Another approach would be to have mini-heaps on a per-thread basis.
> When some memory is needed it is first taken from this thread-local
> heap. But this doesn't seem to solve the problem completely. In the
> example above the data-structure created by "complex-stuff" could be
> put in the thread-local heap. But later it's assigned to a variable
> visible in other threads - ie to "foo". As far as I can see before
> this happens all pending stores on the data must be completed.
>
> How do lisps do this?

Modern processors that can perform out-of-order writes usually have
some way to ensure that pending writes have been flushed before
performing critical operations. In a single processor system, there
shouldn't be a problem because the processor ought to mask the
out-of-order writes from its own view of the world (that is, the writes
will appear to have occurred in order with regard to reads as far as
the processor is concerned, regardless of what happened in the memory
system).

As far as multiple processors on shared memory go, the issue is more
complicated. I believe there are `write gate' instructions that force
the processor to flush the pending writes. These could be used to
ensure that the data structure is fully written before exposing a
pointer to it. There are also mechanisms in the memory system that
allow you to inform the processor that out-of-order reads and writes
are prohibited (it would be bad for a memory-mapped device if things
didn't happen in order). Multiprocessors have ways of `snooping' on
other processors' caches, too. This allows the other processors to
agree on a last-written value.

Matthias Buelow

unread,
Oct 20, 2006, 1:02:24 PM10/20/06
to
Rob Thorpe <robert...@antenova.com> wrote:

> Someone pointed out an interesting problem to me that applies to JVMs,
> I'd be interested to understand how multi-threaded lisps deal with it.

[...]


> How do lisp runtimes deal with this problem?

By providing a mutex mechanism, either implicitly (which would be
transparent to the programmer but imho would also be prohibitively
expensive) or explicitly, just in any other multi-threaded language,
or something of a mix between the two.

I'm replying because I've been thinking about that for some time lately
and I think that it might be better to restrict thread interaction
to function calling (locking the mutex upon call) because otherwise
the programmer might have to care a lot more about proper locking
and avoiding deadlocks. There're also issues with garbage collection
(multi-threaded GC is a really annoying thing... the implementations
I've seen on generic hardware all seem to use workarounds and hacks..)

Matthias Buelow

unread,
Oct 20, 2006, 1:06:26 PM10/20/06
to
Rob Thorpe <robert...@antenova.com> wrote:

> One way to treat it correctly would be to have one heap for all objects
> shared between processors and to mutex the heap. This would be
> inefficient though because of the amount of waiting it would require.

My idea is one heap per thread. Typically, real implementations like Sun's
JVM use one big heap. I think that's suboptimal because while it might
work if you have a handful of threads, it will become prohibitive if you
have, say, thousands of threads on a large-scale application. If you
have one heap per thread, you don't have to mutex-lock for allocation,
and to some extend, you can also garbage-collect that heap separately
while the other threads keep running (with one big heap, there's always
a stop-the-world situation, although certain techniques can cut down on
the time the world has to be stopped).

> Another approach would be to have mini-heaps on a per-thread basis.

Yes, that's what I've been thinking of aswell.

> When some memory is needed it is first taken from this thread-local
> heap. But this doesn't seem to solve the problem completely. In the
> example above the data-structure created by "complex-stuff" could be
> put in the thread-local heap. But later it's assigned to a variable
> visible in other threads - ie to "foo". As far as I can see before
> this happens all pending stores on the data must be completed.

I've been coming up with entry/exit nodes, like with distributed GC but
I haven't yet thought it through completely. Multithreaded allocation
and GC is a real bugger to get right.

Barry Margolin

unread,
Oct 20, 2006, 11:02:37 PM10/20/06
to
In article <1161354135....@h48g2000cwc.googlegroups.com>,
"Rob Thorpe" <robert...@antenova.com> wrote:

In Lisp, when you assign a data structure to a variable (or other
container, such as the CAR/CDR of a cons, a slot of a structure, array
element, etc.), you're actually just putting a reference in the place,
you're not actually copying the structure contents. And this won't
actually take place until the data structure is allocated and
initialized. The SETF itself just involves storing the reference that
was returned by (complex-stuff), which under the covers is just a
pointer assignment (there are a few simple data types that most
implementations assign directly, such as fixnums, but they don't matter
much for this discussion).

So you don't really have to worry abot seeing a partially-constructed
data structure. However, there still may be a need for mutual
exclusion. On most computer systems, storing a pointer into a memory
location requires multiple memory cycles. So while you won't see a
partial data structure, you might see a partial pointer.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

William D Clinger

unread,
Oct 21, 2006, 9:19:17 AM10/21/06
to
Barry Margolin wrote:
> ....On most computer systems, storing a pointer into a memory

> location requires multiple memory cycles. So while you won't see a
> partial data structure, you might see a partial pointer.

I don't think this part of Barry's response is true. As
others have noted, the fact that only the storing processor
is guaranteed to see the stores in the expected order means
that other processors may perceive partially initialized data
structures unless a memory barrier is inserted following the
initializing stores, before the newly allocated pointer is
returned.

And while stores may require multiple cycles, that doesn't
mean they aren't atomic (all-or-nothing) transactions. On
most systems, word-length stores at word-aligned locations
are atomic. If pointers are word-length, as in most (but
not all) systems, then pointer stores should be atomic.

For more background on this, I'd suggest reading the "Java
Language Specification", third edition, chapter 17, especially
sections 17.4 and 17.5 [1].

Will

[1] http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html

Kaz Kylheku

unread,
Oct 21, 2006, 2:02:08 PM10/21/06
to

By using memory barriers, like everyone else?

What are you asking: how do Lisp runtimes deal with this internally in
their memory management code, or whether Lisp runtimes take care of
this problem for application programs doing these types of unprotected
shared variable accesses?

Kaz Kylheku

unread,
Oct 21, 2006, 2:06:34 PM10/21/06
to

Rob Thorpe wrote:
> Lars Rune Nøstdal wrote:
> > On Fri, 20 Oct 2006 07:22:15 -0700, Rob Thorpe wrote:
> >
> > > Someone pointed out an interesting problem to me that applies to JVMs,
> > > I'd be interested to understand how multi-threaded lisps deal with it.
> > >
> > > Lets say there is a variable "foo". It is assigned to by something
> > > like (setf foo (complex-stuff)). Consider the situation where two
> > > processors are used both running threads from the same program. Also,
> > > lets assume that the processors do not gaurantee the order of stores.
> > > In "complex-stuff" a data-structure is created to be assigned to foo.
> > > Imagine another thread reads foo just as this is happening. If the
> > > order of stores isn't gauranteed then the data-structure being created
> > > by "complex-stuff" might not be complete from the point of view of this
> > > second thread. Potentially the memory may not even be initialized.
> > >
> > > How do lisp runtimes deal with this problem?
> >
> > If I understand this question correctly this is a general problem that
> > isn't JVM- or Lisp-specific.
> >
> > What one does is use "mutual exclusion" for sections of code that access
> > shared data. See: http://en.wikipedia.org/wiki/Mutex
>
> Yes, I realise that. What I'm trying to understand is what it means
> for memory allocation specifically.

Huh?

> One way to treat it correctly would be to have one heap for all objects
> shared between processors and to mutex the heap. This would be
> inefficient though because of the amount of waiting it would require.
> Another approach would be to have mini-heaps on a per-thread basis.

You can have multiple heaps which are not on a per-thread basis. On a
NUMA system, you'd want each node to prefer allocating from nearer
memory that is faster to access, etc.

To ward off contention issues, you can have an array of heaps, and use
wait-free atomic instructions to try to acquire one of the heaps, in a
round-robin fashion.

> When some memory is needed it is first taken from this thread-local
> heap. But this doesn't seem to solve the problem completely. In the
> example above the data-structure created by "complex-stuff" could be
> put in the thread-local heap. But later it's assigned to a variable
> visible in other threads - ie to "foo". As far as I can see before
> this happens all pending stores on the data must be completed.

Ever heard of memory barrier instructions? You'd execute a write
barrier before passing the pointer to the other threads. I.e.

- write to the data structure
- execute write barrier
- give pointer to other processor

The other processor might execute a read barrier:

- obtain the pointer
- execute read barrier
- access data

Kaz Kylheku

unread,
Oct 21, 2006, 4:34:17 PM10/21/06
to
Barry Margolin wrote:
> So you don't really have to worry abot seeing a partially-constructed
> data structure.

There are mutiprocessor architectures where if one processor stores to
a memory location X and then to a memory location Y, another processor
might the update to Y before the update to X. X could be in the data
structure, and Y could be memory location which holds the pointer to
it. Cache-coherent NUMA, for instance. Suppose Y is closer than X to
the writing processor, but X is closer than Y to the reading processor.

Rob Thorpe

unread,
Oct 23, 2006, 5:39:57 AM10/23/06
to

Yes, I understand that, but it's not relevant to the issue.
Imagine the structure being updated is a cons. The lisp compiler
issues the instructions below:-
STORE car_info, car_location
STORE cdr_info, cdr_location
STORE cons_location, relevant_place

In a single processor system these must appear to occur in this order,
the processor will use buffers and/or stalling to make sure this
happens. In a multiple-processor system without strict memory ordering
they may appear to occur in a different order from the point of view of
some other processor. If this other processor uses the pointer in
"relevant_place" before the other two have been updated then trouble
may occur.

> So you don't really have to worry abot seeing a partially-constructed
> data structure. However, there still may be a need for mutual
> exclusion. On most computer systems, storing a pointer into a memory
> location requires multiple memory cycles. So while you won't see a
> partial data structure, you might see a partial pointer.

Stores of pointers to memory may require multiple cycles, but they
appear atomic from the point of view of other code.

Rob Thorpe

unread,
Oct 23, 2006, 6:01:51 AM10/23/06
to
Kaz Kylheku wrote:
> Rob Thorpe wrote:
> > Someone pointed out an interesting problem to me that applies to JVMs,
> > I'd be interested to understand how multi-threaded lisps deal with it.
> >
> > Lets say there is a variable "foo". It is assigned to by something
> > like (setf foo (complex-stuff)). Consider the situation where two
> > processors are used both running threads from the same program. Also,
> > lets assume that the processors do not gaurantee the order of stores.
> > In "complex-stuff" a data-structure is created to be assigned to foo.
> > Imagine another thread reads foo just as this is happening. If the
> > order of stores isn't gauranteed then the data-structure being created
> > by "complex-stuff" might not be complete from the point of view of this
> > second thread. Potentially the memory may not even be initialized.
> >
> > How do lisp runtimes deal with this problem?
>
> By using memory barriers, like everyone else?

:) I thought so. I just didn't know whether there is some special
trick that could remove the need for them.

> What are you asking: how do Lisp runtimes deal with this internally in
> their memory management code, or whether Lisp runtimes take care of
> this problem for application programs doing these types of unprotected
> shared variable accesses?

Well both would be interesting to understand.

How does the VM handle these situations in a multi-processor scenario.
Does it:-
1. Ignore it, requiring the programmer to mutex code to prevent the
problem.
2. Handle it by placing a memory barrier on every construction of a
data structure.
3. Handle it by placing a memory barrier on every assignment of a data
structure to a globally visible object.

Or something else?

Kaz Kylheku

unread,
Oct 23, 2006, 1:26:20 PM10/23/06
to
Rob Thorpe wrote:
> Stores of pointers to memory may require multiple cycles, but they
> appear atomic from the point of view of other code.

Not if the processor can handle an interrupt midway through that
sequence of cycles and then complete it when the interrupt handler
returns. Then you have to add code to disable interrupts around that
instruction.

Rob Thorpe

unread,
Oct 24, 2006, 5:42:15 AM10/24/06
to

Yes, but in most OSes the interrupt will be handled by the OS. The OS
will make sure that the work it does while handling the interrupt will
not disturb the running process, so pointer updates continue to appear
atomic for code within a user process. This will even happen if the
interrupt was a page fault. Saying "they appear atomic from the point
of view of other code" was rather strong "they appear atomic from the
point of view of user code in a sane OS" would be better.

mark.h...@gmail.com

unread,
Oct 24, 2006, 7:01:21 AM10/24/06
to
Kaz Kylheku wrote:
> You can have multiple heaps which are not on a per-thread basis. On a
> NUMA system, you'd want each node to prefer allocating from nearer
> memory that is faster to access, etc.

There are even parallel programming languages that take advantage
of this locality -- they provide a "distributed shared memory" model
that lets you think in shared-memory terms, even when working on a
distributed-memory machine. UPC (Unified Parallel C), Titanium and
Co-Array Fortran are three examples. Titanium in particular is a Java
dialect and therefore has a parallel GC, so you might want to check
out their parallel GC strategies: http://titanium.cs.berkeley.edu/ and
in particular http://titanium.cs.berkeley.edu/papers.html#memorymgt

mfh

George Neuner

unread,
Oct 24, 2006, 11:49:29 AM10/24/06
to
On 23 Oct 2006 10:26:20 -0700, "Kaz Kylheku" <kkyl...@gmail.com>
wrote:

CPUs designed for narrow memory typically use a synchronous
multi-cycle transfer when the data access is wider than the bus.

The problems come with segmented architectures where pointers can be
wider than a CPU register.

George
--
for email reply remove "/" from address

Frode Vatvedt Fjeld

unread,
Oct 24, 2006, 3:20:15 PM10/24/06
to
"Rob Thorpe" <robert...@antenova.com> writes:

You probably want to know about the concept of consistency models.
<URL:http://en.wikipedia.org/wiki/Consistency_model>

> How do lisp runtimes deal with this problem?

I think most systems today provide sequential consistency, meaning
that the problem you describe is a non-issue for lisp or other
software-level systems. For a JVM or other virtual machine to somehow
detract from this and provide weaker consistency, would seem to me a
very unlikely and bad idea.

There probably exist concurrent architectures with weaker than
sequential consistency, and there it is that much more difficult to
write concurrent programs. You'd probably use mutexes or some other
means of synchronization. Nothing about this is particular to lisp, I
believe.

--
Frode Vatvedt Fjeld

Kaz Kylheku

unread,
Oct 24, 2006, 6:46:18 PM10/24/06
to
Rob Thorpe wrote:
> Kaz Kylheku wrote:
> > Rob Thorpe wrote:
> > > Stores of pointers to memory may require multiple cycles, but they
> > > appear atomic from the point of view of other code.
> >
> > Not if the processor can handle an interrupt midway through that
> > sequence of cycles and then complete it when the interrupt handler
> > returns. Then you have to add code to disable interrupts around that
> > instruction.
>
> Yes, but in most OSes the interrupt will be handled by the OS. The OS
> will make sure that the work it does while handling the interrupt will
> not disturb the running process, so pointer updates continue to appear

After the OS has finished servicing the interrupt, before returning
from it, it may choose to preempt the current thread in favor of
another one. That new thread could access the same piece of "atomic"
data.

> atomic for code within a user process. This will even happen if the
> interrupt was a page fault.

Threads can be preempted due to a page fault. In fact, they have to be.
The thread can't run until the page comes in, which could be much
later. Another thread gets to run.

Mind you, you would not straddle a memory access that is intended to be
atomic across a page boundary.

So that one isn't going to be a problem.

Rob Thorpe

unread,
Oct 25, 2006, 6:51:02 AM10/25/06
to
Kaz Kylheku wrote:
> Rob Thorpe wrote:
> > Kaz Kylheku wrote:
> > > Rob Thorpe wrote:
> > > > Stores of pointers to memory may require multiple cycles, but they
> > > > appear atomic from the point of view of other code.
> > >
> > > Not if the processor can handle an interrupt midway through that
> > > sequence of cycles and then complete it when the interrupt handler
> > > returns. Then you have to add code to disable interrupts around that
> > > instruction.
> >
> > Yes, but in most OSes the interrupt will be handled by the OS. The OS
> > will make sure that the work it does while handling the interrupt will
> > not disturb the running process, so pointer updates continue to appear
>
> After the OS has finished servicing the interrupt, before returning
> from it, it may choose to preempt the current thread in favor of
> another one. That new thread could access the same piece of "atomic"
> data.

Not normally. Generally there is special exception in the OS scheduler
that prevents it from choosing a thread that lives within an address
space with an outstanding incomplete instruction.

> > atomic for code within a user process. This will even happen if the
> > interrupt was a page fault.
>
> Threads can be preempted due to a page fault. In fact, they have to be.
> The thread can't run until the page comes in, which could be much
> later. Another thread gets to run.

Yes. But again, while servicing a page-fault the scheduler will not
select any thread from the process space affected by the fault. Only
after the fault has been serviced will it select threads from that
process space.

Rob Thorpe

unread,
Oct 25, 2006, 7:13:25 AM10/25/06
to
Frode Vatvedt Fjeld wrote:
> "Rob Thorpe" <robert...@antenova.com> writes:
>
> > Someone pointed out an interesting problem to me that applies to JVMs,
> > I'd be interested to understand how multi-threaded lisps deal with it.
> >
> > Lets say there is a variable "foo". It is assigned to by something
> > like (setf foo (complex-stuff)). Consider the situation where two
> > processors are used both running threads from the same program. Also,
> > lets assume that the processors do not gaurantee the order of stores.
> > In "complex-stuff" a data-structure is created to be assigned to foo.
> > Imagine another thread reads foo just as this is happening. If the
> > order of stores isn't gauranteed then the data-structure being created
> > by "complex-stuff" might not be complete from the point of view of this
> > second thread. Potentially the memory may not even be initialized.
>
> You probably want to know about the concept of consistency models.
> <URL:http://en.wikipedia.org/wiki/Consistency_model>

Yes I know about them. Above I was explaining the problem in the terms
I thought would be understandable to the broadest range of people on
c.l.l.

> > How do lisp runtimes deal with this problem?
>
> I think most systems today provide sequential consistency, meaning
> that the problem you describe is a non-issue for lisp or other
> software-level systems. For a JVM or other virtual machine to somehow
> detract from this and provide weaker consistency, would seem to me a
> very unlikely and bad idea.

Yes. My question though was about the case of an architecture with
weak consistency, "where the processors do not gaurantee the order of
stores".

Preferably operating systems & software should not weaken consistency
further than that the architecture can provide. It happens though, for
example sometimes compilers rearrange stores, in the linux kernel there
are special ways of preventing this happening in some code.

> There probably exist concurrent architectures with weaker than
> sequential consistency, and there it is that much more difficult to
> write concurrent programs. You'd probably use mutexes or some other
> means of synchronization. Nothing about this is particular to lisp, I
> believe.

No, but the memory problem is specific to languages that attempt to be
"safe" - that is where incomplete objects can never be encountered in a
program. Lisp is one of these.

In C if you were to read from a pointer in thread A that had been
initialized in thread B you wouldn't be surprised if there was a
possibility of finding junk at the other end that could crashs your
entire program. In Lisp though it is normally taken that bad code
should not sink the whole VM.

I've had a little look at some other languages. What seems to happen is
one of three things:-
1. Allocations are done with memory barriers
2. Variables that sit between threads have memory barriers around them
3. Threads are done inside the language runtime rather than the OS,
thereby not utilizing multiple processors.
4. Don't both with threads

Smalltalks, Schemes, Ruby, Python and Perl seem to take approach #3.

OCaml, many schemes & lisps, and MLs take option #4.

I think Steel bank Common Lisp takes approach #2 since the only
variables that one thread can access/view in another thread are special
variables.

It seems Java does #1 from what others have said so far. But I'm not
sure.

Frode Vatvedt Fjeld

unread,
Oct 25, 2006, 11:19:37 AM10/25/06
to
"Rob Thorpe" <robert...@antenova.com> writes:

> [..] the memory problem is specific to languages that attempt to be


> "safe" - that is where incomplete objects can never be encountered
> in a program. Lisp is one of these.
>
> In C if you were to read from a pointer in thread A that had been
> initialized in thread B you wouldn't be surprised if there was a
> possibility of finding junk at the other end that could crashs your
> entire program. In Lisp though it is normally taken that bad code
> should not sink the whole VM.

Ok, then I think I understand better where you are coming
from. Although I don't quite understand where reordering of writes
enters the picture.

Anyways, having designed a CL runtime myself, I can describe how I
chose to deal with this problem there. That is, the probem of fresh
but uninitialized objects--let's call them "fetuses".

First, note that you don't need anything more fancy than exceptions
for this problem to arise. For example, if the initialization code
performs a division by zero, the exception handler might see the fetus
on the stack somehow (if nothing else, by inspecting the stack with a
debugger). The problem is even more accute if there are asynchronous
exceptions such as interrupts present.

I chose to deal with this problem by using roll-back atomic
sequences. That is, before starting the allocation the
program/thread/local-run-time-state enters an atomic sequence. If some
kind of exception occurs while inside the atomic sequence, this has
two consequences: 1. All local state (CPU registers etc) that might
hold references to the fetus is "thrown away", i.e. disregarded
(simply overwritten by something uninteresting like NIL). 2. When the
exception handler completes (should it not perform a non-local exit),
the program is re-started at the point where it entered the atomic
sequence (hence "roll-back").

This approach seems to work OK, although it has not been tested
extensively with multi-threading (and not at all with multi-CPU). I
don't really see any pitfalls there, though (famous last words..).

I think I remember that SBCL does almost the same thing: push-forward
atomic sequences. Here, exception handlers are delayed until the
atomic sequence has completed. This has the advantage over roll-back
that the atomic sequence is executed precisely once. For roll-back,
the atomic sequence must be written with some care to allow roll-backs
to occur (in practice I haven't found this to be a big
problem). Furthermore, very frequent interrupts may cause the atomic
sequence never to complete (however, such a situation is most likely a
symptom that something (else) is not right in your system). On the
other hand, roll-back has the advantage that it doesn't introduce a
non-deterministic delay on interrupt handling (for example, consider
push-forwarding the initialization of a 1 GB array).

> It seems Java does #1 from what others have said so far. But I'm not
> sure.

I know very little about Java, but my impression has been that early
on they did quite a bit of synchronization "just to be sure" (at every
method call?), but later on had to abandon this strategy for
performance reasons. Anyways, I suppose in a VM this problem is less
interesting because it's rather easy to have state "outside" of the
VM, so you can have virtual instructions that are atomic (from the VM
perspective) even if in reality (from the hardware perspective) they
are complex. But I'm just guessing about this.

--
Frode Vatvedt Fjeld

William D Clinger

unread,
Oct 25, 2006, 11:38:15 AM10/25/06
to
Frode Vatvedt Fjeld wrote:
> You probably want to know about the concept of consistency models.
> <URL:http://en.wikipedia.org/wiki/Consistency_model>
>
> > How do lisp runtimes deal with this problem?
>
> I think most systems today provide sequential consistency, meaning
> that the problem you describe is a non-issue for lisp or other
> software-level systems.

Not so. The Pentium 4, Xeon, and P6 family provide processor
order. The Sparc family provides total store order (and two even
weaker orders). None of these provide strong ordering, which is
probably what you think sequential consistency means.

> For a JVM or other virtual machine to somehow
> detract from this and provide weaker consistency, would seem to me a
> very unlikely and bad idea.

The Java Memory Model of Java 5.0 does not require JVMs to
provide strong ordering. The new memory model was put into
Java 5.0 because everyone agreed that the original model was
broken; in particular, several routine compiler optimizations
were not compatible with the original model, but Java compilers
(both javac and various JIT compilers) were performing them
anyway.

> There probably exist concurrent architectures with weaker than
> sequential consistency, and there it is that much more difficult to
> write concurrent programs. You'd probably use mutexes or some other
> means of synchronization.

The problem we've been discussing in this thread is usually
solved with memory barriers. In Java, you specify the need for
memory barriers by declaring a variable "volatile".

> Nothing about this is particular to lisp, I believe.

That is true.

Will

Frode Vatvedt Fjeld

unread,
Oct 25, 2006, 1:33:35 PM10/25/06
to
"William D Clinger" <cesu...@yahoo.com> writes:

> Not so. The Pentium 4, Xeon, and P6 family provide processor order.
> The Sparc family provides total store order (and two even weaker
> orders). None of these provide strong ordering, which is probably
> what you think sequential consistency means.

From what I've been able to google, "strong ordering" appears to be
the kind that requires every operation to be immediately globally
visible, killing all hopes of any performance. If this is correct,
then this would correspond to "strict consistency" in my terminology
(and the one on wikipedia,
http://en.wikipedia.org/wiki/Strict_consistency
http://en.wikipedia.org/wiki/Sequential_consistency).

Also, it seems to me that (speculative) processor ordering satisfies
sequential consistency. But admittedly it's been a while since this
stuff was part of my active vocabulary, so if you can correct any
misunderstandings I'd be grateful.

> The Java Memory Model of Java 5.0 does not require JVMs to provide
> strong ordering. The new memory model was put into Java 5.0 because
> everyone agreed that the original model was broken; in particular,
> several routine compiler optimizations were not compatible with the
> original model, but Java compilers (both javac and various JIT
> compilers) were performing them anyway.

If my understanding above of strong ordering is correct, I would agree
that the original Java specification was broken indeed.

--
Frode Vatvedt Fjeld

William D Clinger

unread,
Oct 26, 2006, 9:08:52 AM10/26/06
to
Frode Vatvedt Fjeld wrote:
> http://en.wikipedia.org/wiki/Sequential_consistency).

Few modern multiprocessors guarantee sequential consistency as
defined by that Wikipedia article. In general, the processor that
issues a store instruction may see the result of that store before
other processors.

If you want sequential consistency in a multiprocessor system,
with most current hardware, you need to follow certain rules when
writing your program, and you need a compiler that inserts memory
barriers according to certain rules, and also limits its optimizations
to conform to certain rules. On x86 systems, memory barriers are
typically implemented by MFENCE instructions. On Sparcs, memory
barriers are typically implemented by membar instructions.

The rules are given by a memory model. If you are programming
in Java, the memory model is described in chapter 17 of the Java
Language Specification, third edition. If you are programming in
Common Lisp, for which there is no standard memory model (for
multiprocessor systems), you have to ask your vendor or inquire
at comp.lang.lisp, as was done by the original poster.

Will

Rob Warnock

unread,
Oct 27, 2006, 1:07:57 AM10/27/06
to
William D Clinger <cesu...@yahoo.com> wrote:
+---------------

| Frode Vatvedt Fjeld wrote:
| > http://en.wikipedia.org/wiki/Sequential_consistency).
|
| Few modern multiprocessors guarantee sequential consistency as
| defined by that Wikipedia article. In general, the processor that
| issues a store instruction may see the result of that store before
| other processors.
+---------------

That's not incompatible with "sequential consistency", which only
requires that there exist *some* potential global sequential order
of execution that matches what each processor sees. Or said another
way, your notion of "before" is too strict, implying absolute time
[which is not necessary for sequential consistency].

Cc-NUMA systems such as SGI's "Origin" and "Altix" *do* provide
sequential consistency in the usual interpretation of Leslie
Lamport's term, to which the Wikipedia article refers.

[Historical note: While "Origin" was initially designed to be able
to support "release consistency" as an option, it was found that
too many existing customer codes assumed sequential consistency,
so AFAIK release consistency was never enabled in shipped product
(and was perhaps never even fully debugged).]

+---------------


| If you want sequential consistency in a multiprocessor system,
| with most current hardware, you need to follow certain rules when
| writing your program, and you need a compiler that inserts memory
| barriers according to certain rules, and also limits its optimizations
| to conform to certain rules.

+---------------

If that is necessary, then I would say that such a multiprocessor
system does *not* provide sequential consistency in the hardware.
SGI's "Origin" and "Altix" do (and presumably a number of other
non-x86 platforms as well).


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Reply all
Reply to author
Forward
0 new messages