Architectures with non-atomic stores/loads

Dmitriy V'jukov

Jul 21, 2007, 7:27:21 AM

Are there any modern widespread architectures on which loads or stores
to aligned word-sized locations are not atomic?

Dmitriy V'jukov

Chris Thomasson

Jul 26, 2007, 2:22:36 PM

"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:1185017241....@k79g2000hse.googlegroups.com...

> Are there any modern widespread architectures on which loads or stores
> to aligned word-sized locations are not atomic?

I don't think so.

David Gay

Jul 27, 2007, 11:36:43 AM

Dmitriy V'jukov <dvy...@gmail.com> writes:
> Are there any modern widespread architectures on which loads or stores
> to aligned word-sized locations are not atomic?

Pretty much all the 8-bit processors, I'd say. And yes, they are
widespread, have C compilers, and simple operating systems. Though
they are unlikely to be in any SMP-like configuration ;-)

David Gay

Chris Thomasson

Jul 27, 2007, 12:08:33 AM

"David Gay" <o...@barnowl.research.intel-research.net> wrote in message
news:79ir85y...@barnowl.research.intel-research.net...

>
> Dmitriy V'jukov <dvy...@gmail.com> writes:
>> Are there any modern widespread architectures on which loads or stores
>> to aligned word-sized locations are not atomic?
>
> Pretty much all the 8-bit processors, I'd say. And yes, they are
> widespread, have C compilers, and simple operating systems.

Darn. I forgot about those!

> Though
> they are unlikely to be in any SMP-like configuration ;-)

Indeed.

Bill Todd

Jul 28, 2007, 9:41:20 AM

Chris Thomasson wrote:
> "David Gay" <o...@barnowl.research.intel-research.net> wrote in message
> news:79ir85y...@barnowl.research.intel-research.net...
>>
>> Dmitriy V'jukov <dvy...@gmail.com> writes:
>>> Are there any modern widespread architectures on which loads or stores
>>> to aligned word-sized locations are not atomic?
>>
>> Pretty much all the 8-bit processors, I'd say. And yes, they are
>> widespread, have C compilers, and simple operating systems.
>
> Darn. I forgot about those!

Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
the architecture vs. some de facto accepted constant size?

- bill

Eric Smith

Jul 29, 2007, 3:25:21 PM

Bill Todd writes:
> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
> the architecture vs. some de facto accepted constant size?

What's the "natural" size of the 6809? 68000? IBM 360? TMS34010?

Motorola documents the size of a "word" for the 68000 to be 16 bits, but
that doesn't seem to be because 16 bits is a more architecturally natural
size than 32 bits. (The data bus and ALU of the original 68000 are 16 bits
wide, but that's an implementation detail, not a characteristic of the
architecture.)

Eric

Nick Maclaren

Jul 29, 2007, 3:52:24 PM

In article <m3bqdv8...@donnybrook.brouhaha.com>,

Eric Smith <er...@brouhaha.com> writes:
|> Bill Todd writes:
|> > Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
|> > the architecture vs. some de facto accepted constant size?
|>
|> What's the "natural" size of the 6809? 68000? IBM 360? TMS34010?

Quite. The System/370, for example, had "natural" sizes in the sense
of atomicity and shared memory of 4 and 8 bytes, simultaneously. Yes,
a 32-bit architecture really did have the concept of a 64-bit word,
even though the only atomic operations on that size were floating-point
and some very special ones.

In the beginning was the word. And thereafter, everyone has been
arguing over what that word was.

Regards,
Nick Maclaren.

Eric P.

Jul 29, 2007, 4:14:17 PM

I don't think that is sufficient.
For 'word tearing' to be possible it must have:
- a bus that is smaller than the 'word' size so that it
requires multiple bus cycles to transfer a 'word'.
Usually this is connected to a smaller width memory bank.
- the bus ownership must be relinquished between multi-cycle
transfers to a different master.

For example, the 8088 was a 16 bit 8086 that used an 8 bit bus
and presumably an 8 bit wide memory bank. It would require 2 bus
cycles to transfer a word. However, unless it released the bus
to a new master (DMA or other CPU) between the two byte transfers
of an aligned word, word tearing would not be possible.
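
The two conditions listed above can be checked with a toy model. What follows is only a sketch under stated assumptions: two bus masters each store a 16-bit word over an 8-bit bus as a low-byte cycle followed by a high-byte cycle, and the only question is whether a master may lose the bus between its two cycles. The names `final_values` and `hold_bus` are made up for illustration and do not describe any real bus protocol.

```python
from itertools import permutations

def final_values(stores, hold_bus):
    """Return the set of 16-bit values memory can end up holding.

    stores   -- list of 16-bit values, one per bus master
    hold_bus -- True if a master keeps the bus for both of its byte cycles
    """
    # Each store becomes two bus cycles: (master, byte index, byte value).
    cycles = []
    for master, value in enumerate(stores):
        cycles.append((master, 0, value & 0xFF))         # low byte
        cycles.append((master, 1, (value >> 8) & 0xFF))  # high byte

    masters = range(len(stores))
    results = set()
    for order in permutations(cycles):
        pos = {(m, i): k for k, (m, i, _) in enumerate(order)}
        # Every master issues its low byte before its high byte.
        if any(pos[(m, 0)] > pos[(m, 1)] for m in masters):
            continue
        # If the bus is held, a master's two cycles must be back to back.
        if hold_bus and any(pos[(m, 1)] - pos[(m, 0)] != 1 for m in masters):
            continue
        mem = [0, 0]
        for _, index, byte in order:
            mem[index] = byte
        results.add(mem[0] | (mem[1] << 8))
    return results
```

In this model, holding the bus leaves only the two whole values as possible outcomes, while releasing it between cycles also admits the mixed values 0xAA55 and 0x55AA for stores of 0xAAAA and 0x5555.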

Eric

Nick Maclaren

Jul 29, 2007, 4:34:32 PM

In article <46ACF519...@sympaticoREMOVE.ca>,

|> I don't think that is sufficient.
|> For 'word tearing' to be possible it must have:
|> - a bus that is smaller than the 'word' size so that it
|> requires multiple bus cycles to transfer a 'word'.
|> Usually this is connected to a smaller width memory bank.
|> - the bus ownership must be relinquished between multi-cycle
|> transfers to a different master.

Nope. You could get it on the System/370, with a 32-bit word and
a 64-bit 'bus', in several different ways. And it came in several
SMP configurations.

Regards,
Nick Maclaren.

Eric P.

Jul 29, 2007, 4:43:19 PM

How can value tearing occur when storing a 32 bit aligned value
across a 64 bit bus? There must be something you are leaving out.

Eric

Bill Todd

Jul 29, 2007, 6:54:03 PM

Eric Smith wrote:
> Bill Todd writes:
>> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
>> the architecture vs. some de facto accepted constant size?
>
> What's the "natural" size of the 6809? 68000? IBM 360? TMS34010?

Save perhaps for the 6809, I wasn't aware that these were 8-bit
processors (the context in which my comment was made). As for a more
general definition of 'natural word size', don't we go through that
discussion every few years here (usually cutting it short by referring
back to earlier run-arounds)?

- bill

krw

Jul 29, 2007, 7:18:38 PM

In article <46ACFBE7...@sympaticoREMOVE.ca>,
eric_p...@sympaticoREMOVE.ca says...

Read-Modify-Write?

--
Keith

krw

Jul 29, 2007, 7:19:58 PM

In article <46ACF519...@sympaticoREMOVE.ca>,
eric_p...@sympaticoREMOVE.ca says...

I don't believe it was possible to interrupt the operation mid-write,
at least without buggering up the hardware.

--
Keith

Chris Thomasson

Jul 29, 2007, 10:01:33 PM

"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:f8itko$ik4$1...@gemini.csx.cam.ac.uk...

Yup. I believe a Principles of Operation manual from IBM stated this under
one of the appendices for their example of multiprocessing free-pool (e.g.,
lock-free stack) implementation examples.

Eric P.

Jul 29, 2007, 10:32:25 PM

I took the question to be WRT just loads or stores from multiple cpus.
For example for a 32 bit aligned integer, if one cpu stores
0xAAAAAAAA and another stores 0x55555555 to the same location,
can any cpu ever see 0xAAAA5555 or 0x5555AAAA in that location?
I can't see how except as I outlined.

Concurrent read modify write can lose a value due to overwrite,
but it should lose the whole value, not part of it, if aligned.

Eric

Nick Maclaren

Jul 30, 2007, 2:10:25 AM

In article <46AD4DB9...@sympaticoREMOVE.ca>,
"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:

|> krw wrote:
|> >
|> > > How can value tearing occur when storing a 32 bit aligned value
|> > > across a 64 bit bus? There must be something you are leaving out.
|> >
|> > Read-Modify-Write?
|>

|> I took the question to be WRT just loads or stores from multiple cpus.
|> For example for a 32 bit aligned integer, if one cpu stores
|> 0xAAAAAAAA and another stores 0x55555555 to the same location,
|> can any cpu ever see 0xAAAA5555 or 0x5555AAAA in that location?
|> I can't see how except as I outlined.

As soon as you have a cache, you are very likely to have 'plain'
stores implemented by a read-modify-write. "Not atomic" isn't JUST
about "word tearing" but about not corrupting adjacent locations.
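
The adjacent-location hazard described here can be illustrated with a toy model. This is a sketch only, assuming a byte store is carried out as an unprotected read of the containing 32-bit word followed by a write of the merged word; whether a given machine actually behaves this way depends on its implementation. The function name `rmw_byte_stores` and the schedule format are invented for the example.

```python
def rmw_byte_stores(mem_word, stores, schedule):
    """Apply byte stores to one 32-bit word via non-atomic read-modify-write.

    mem_word -- initial contents of the 32-bit word
    stores   -- {cpu: (byte_lane, value)}, byte_lane in 0..3
    schedule -- sequence of (cpu, "read") and (cpu, "write") steps
    """
    latched = {}  # the whole word each CPU fetched in its "read" step
    for cpu, step in schedule:
        lane, value = stores[cpu]
        if step == "read":
            latched[cpu] = mem_word
        else:
            mask = 0xFF << (8 * lane)
            # Merge the new byte into the (possibly stale) latched word.
            mem_word = (latched[cpu] & ~mask) | (value << (8 * lane))
    return mem_word


stores = {0: (0, 0x11), 1: (1, 0x22)}  # two CPUs, two different bytes
serial = [(0, "read"), (0, "write"), (1, "read"), (1, "write")]
racing = [(0, "read"), (1, "read"), (0, "write"), (1, "write")]
```

With the serial schedule both bytes land; with the racing schedule CPU 1 writes back a stale copy of the word and CPU 0's byte is lost, even though neither CPU ever stored to the other's byte.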

Regards,
Nick Maclaren.

Eric Smith

Jul 30, 2007, 2:17:45 AM

Bill Todd wrote:
>>> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
>>> the architecture vs. some de facto accepted constant size?

I wrote:
>> What's the "natural" size of the 6809? 68000? IBM 360? TMS34010?

Bill Todd wrote:
> Save perhaps for the 6809, I wasn't aware that these were 8-bit
> processors (the context in which my comment was made).

You're claiming that 'word' is defined in terms of a 'natural' size
for the architecture on 8-bit processors, but is defined in some other
way for non-8-bit processors?

Bill Todd

Jul 30, 2007, 6:03:08 AM

No: at this point I'm questioning your ability to understand the
concept of context.

- bill

David Gay

Jul 30, 2007, 11:52:03 AM

"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:

> David Gay wrote:
>>
>> Dmitriy V'jukov <dvy...@gmail.com> writes:
>> > Are there any modern widespread architectures on which loads or stores
>> > to aligned word-sized locations are not atomic?
>>
>> Pretty much all the 8-bit processors, I'd say. And yes, they are
>> widespread, have C compilers, and simple operating systems. Though
>> they are unlikely to be in any SMP-like configuration ;-)
>>
>> David Gay
>
> I don't think that is sufficient.
> For 'word tearing' to be possible it must have:
> - a bus that is smaller than the 'word' size so that it
> requires multiple bus cycles to transfer a 'word'.
> Usually this is connected to a smaller width memory bank.
> - the bus ownership must be relinquished between multi-cycle
> transfers to a different master.

True - I was actually thinking of 8-bit processors w/ 16-bit addresses
but no 16-bit loads. Then word (or address, if you want to argue over
what a natural "word" size is) loads tear even w/o worrying about
buses, etc.

The example of a modern arch w/o 16-bit loads is the Atmel AVR. Now that
I think of it, I'm not sure how many 8-bit archs didn't have 16-bit loads.
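
The single-processor case described here, where a 16-bit value must be read with two 8-bit loads, can be sketched as follows. This is a toy model under assumptions: memory is a two-element list holding the low and high byte, and an interrupt handler may run between the two loads. The names `read16`, `tick`, and `interrupt_after_low_byte` are hypothetical and do not correspond to any AVR toolchain API.

```python
def tick(mem):
    """Interrupt handler: increment the 16-bit counter stored in mem."""
    value = ((mem[0] | (mem[1] << 8)) + 1) & 0xFFFF
    mem[0] = value & 0xFF          # low byte
    mem[1] = (value >> 8) & 0xFF   # high byte

def read16(mem, isr, interrupt_after_low_byte):
    """Read a 16-bit value as two separate 8-bit loads."""
    low = mem[0]
    if interrupt_after_low_byte:
        isr(mem)                   # handler runs between the two loads
    high = mem[1]
    return low | (high << 8)
```

Starting from a counter of 0x00FF, an interrupt between the loads yields 0x01FF, a value the counter never held. Code on such parts typically avoids this by masking interrupts around the two loads.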

David Gay

Eric P.

Jul 30, 2007, 12:07:49 PM

You don't happen to have a reference handy do you?
I'd like to read this for myself. It'll probably be online somewhere.

Eric

Eric P.

Jul 30, 2007, 12:09:44 PM

Ah, ok - I didn't have a pin out for the 8088 handy.
But I do happen to have one for the NS 16032, a 32 bit processor
on a 16 bit bus and it does not appear to present a signal that
would allow a bus arbiter to determine that it was performing
a sequence of accesses and so not relinquish the bus
(though I might have missed something).

Eric

Eric Smith

Jul 30, 2007, 1:42:35 PM

Bill Todd wrote:
>>>>> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
>>>>> the architecture vs. some de facto accepted constant size?

I wrote:
>>>> What's the "natural" size of the 6809? 68000? IBM 360? TMS34010?

Bill Todd wrote:
>>> Save perhaps for the 6809, I wasn't aware that these were 8-bit
>>> processors (the context in which my comment was made).

I wrote:
>> You're claiming that 'word' is defined in terms of a 'natural' size
>> for the architecture on 8-bit processors, but is defined in some other
>> way for non-8-bit processors?

Bill Todd wrote:
> No: at this point I'm questioning your ability to understand the
> concept of context.

I fully understand that the original context was a question about 8-bit
processors. However, that doesn't explain how it is that your question
quoted above is only relevant to that context and not to other (non-8-bit)
processors. At this point I'm questioning your ability to engage in
rational discourse.

Eric P.

Jul 30, 2007, 6:17:57 PM

"Eric P." wrote:
>
> Chris Thomasson wrote:
> >
> > "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
> > news:f8itko$ik4$1...@gemini.csx.cam.ac.uk...
> > >

> > > Nope. You could get it on the System/370, with a 32-bit word and
> > > a 64-bit 'bus', in several different ways. And it came in several
> > > SMP configurations.
> >
> > Yup. I believe a Principles of Operation manual from IBM stated this under
> > one of the appendices for their example of multiprocessing free-pool (e.g.,
> > lock-free stack) implementation examples.
>
> You don't happen to have a reference handy do you?
> I'd like to read this for myself. It'll probably be online somewhere.
>
> Eric

I found a 390 PofO pdf and it does say the following
under Storage-Operand References

"All bits within a single byte of a fetch-type operand
are accessed concurrently. When an operand
consists of more than one byte, the bytes may be
fetched from storage piecemeal, one byte at a
time. Unless otherwise specified, the bytes are
not necessarily fetched in any particular sequence."

"All bits within a single byte of a store-type operand
are accessed concurrently. When an operand
consists of more than one byte, the bytes may be
placed in storage piecemeal, one byte at a time.
Unless otherwise specified, the bytes are not necessarily
stored in any particular sequence."

so they do grant themselves license to do whatever they like.
However later on they contradict this

"Block-Concurrent References
For some references, the accesses to all bytes
within a halfword, word, or doubleword are specified
to appear to be block concurrent as observed
by other CPUs. These accesses do not necessarily
appear to channel programs to include more
than a byte at a time. The halfword, word, or
doubleword is referred to in this section as a
block. When a fetch-type reference is specified to
appear to be concurrent within a block, no store
access to the block by another CPU is permitted
during the time that bytes contained in the block
are being fetched. Accesses to the bytes within
the block by channel programs may occur
between the fetches. When a store-type reference
is specified to appear to be concurrent within
a block, no access to the block, either fetch or
store, is permitted by another CPU during the time
that the bytes within the block are being stored.
Accesses to the bytes in the block by channel programs
may occur between the stores.

Consistency Specification
For all instructions in the S, RX, or RXE format,
with the exception of CONVERT TO BINARY,
CONVERT TO DECIMAL, LOAD REVERSED,
RESUME PROGRAM, STORE CLOCK
EXTENDED, STORE REVERSED, STORE
SYSTEM INFORMATION, TRAP, and the I/O
instructions, when the operand is addressed on a
boundary which is integral to the size of the
operand, the storage-operand references appear
to be block concurrent as observed by other
CPUs.

For the instructions COMPARE AND SWAP,
COMPARE AND SWAP AND PURGE, and
COMPARE DOUBLE AND SWAP, all accesses to
the storage operand appear to be block concurrent
as observed by other CPUs."

So it would appear that certain loads and stores are 'atomic'
as well as CAS, CASD, etc.
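
The alignment half of the quoted consistency rule ("addressed on a boundary which is integral to the size of the operand") can be written down as a small predicate. This is a minimal sketch that covers only that alignment condition for halfword, word, and doubleword operands; it does not model the instruction formats or the listed exceptions, and the names `BLOCK_SIZES` and `on_integral_boundary` are made up for the example.

```python
BLOCK_SIZES = (2, 4, 8)  # halfword, word, doubleword, in bytes

def on_integral_boundary(address, size):
    """True if address is a multiple of the operand size."""
    if size not in BLOCK_SIZES:
        raise ValueError("size must be 2, 4, or 8 bytes")
    return address % size == 0
```

For example, address 0x1004 is on an integral boundary for a word but not for a doubleword, so under the quoted text only the word access would fall under the block-concurrency guarantee.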

Eric

Eugene Miya

Jul 30, 2007, 11:03:41 PM

In article <1185017241....@k79g2000hse.googlegroups.com>,

Dmitriy V'jukov <dvy...@gmail.com> wrote:
>Are there any modern widespread architectures on which loads or stores
>to aligned word-sized locations are not atomic?

Do you get zero(s) with that?

--

Bill Todd

Jul 31, 2007, 12:07:05 AM

Eric Smith wrote:
> Bill Todd wrote:
>>>>>> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
>>>>>> the architecture vs. some de facto accepted constant size?
>
> I wrote:
>>>>> What's the "natural" size of the 6809? 68000? IBM 360? TMS34010?
>
> Bill Todd wrote:
>>>> Save perhaps for the 6809, I wasn't aware that these were 8-bit
>>>> processors (the context in which my comment was made).
>
> I wrote:
>>> You're claiming that 'word' is defined in terms of a 'natural' size
>>> for the architecture on 8-bit processors, but is defined in some other
>>> way for non-8-bit processors?
>
> Bill Todd wrote:
>> No: at this point I'm questioning your ability to understand the
>> concept of context.
>
> I fully understand that the original context was a question about 8-bit
> processors.

Well, that's a start, at least. Do you also therefore understand why
your original response to my comment was irrelevant?

> However, that doesn't explain how it is that your question
> quoted above is only relevant to that context and not to other (non-8-bit)
> processors.

Ah - I guess you don't understand the concept of context after all,
then. Perhaps more careful parsing will help you: my question began
with the words "Wouldn't *that*..." - an explicit reference to the
context that you don't seem to have understood was the specific thrust
of the question.

> At this point I'm questioning your ability to engage in
> rational discourse.

I'm just not very patient with people who not only fail to understand
something the first time around but persist in that failure after it has
been pointed out to them twice subsequently, I guess. For a discussion
to be rational, some limited level of comprehension on the part of
*both* parties is usually necessary.

As for the wider topic that you'd apparently have liked to divert
discussion toward, I also addressed that, albeit tangentially, by noting
that it had occurred multiple times here in the past: Google groups is
your friend if you'd like to get up to speed on those deliberations
rather than having the rest of us to go through them yet again for your
benefit.

- bill

Chris Thomasson

Jul 31, 2007, 1:27:50 AM

"Eric P." <eric_p...@sympaticoREMOVE.ca> wrote in message
news:46AE0CD5...@sympaticoREMOVE.ca...
>
>
> Chris Thomasson wrote:
[...]

>> > Nope. You could get it on the System/370, with a 32-bit word and
>> > a 64-bit 'bus', in several different ways. And it came in several
>> > SMP configurations.
>>
>> Yup. I believe a Principles of Operation manual from IBM stated this
>> under one of the appendices for their example of multiprocessing
>> free-pool (e.g., lock-free stack) implementation examples.
>
> You don't happen to have a reference handy do you?
> I'd like to read this for myself. It'll probably be online somewhere.

http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf

Appendix A/Multi-Processing Examples/Free-Pool Manipulation

Says something about how the 'LM' instruction (e.g., Load Multiple) is
required when you load the anchor (e.g., header) of a lock-free lifo
data-structure.

Nick Maclaren

Jul 31, 2007, 4:11:59 AM

In article <46AE6395...@sympaticoREMOVE.ca>,

|> I found a 390 PofO pdf and it does say the following
|> under Storage-Operand References

When it comes to this sort of thing, there have been significant
changes over the years. Since the first SMP System/370 systems,
some loads and stores have been specified as atomic, but the
detailed constraints have varied.

I doubt that even the System/360 to zArch experts here, collectively,
could tell you more than 30% of the detail. The architecture does
span over four decades, half a dozen major architectural revisions
and hundreds of minor ones and individual systems!

Regards,
Nick Maclaren.

Rob Warnock

Jul 31, 2007, 5:33:24 AM

Eric P. <eric_p...@sympaticoREMOVE.ca> wrote:
+---------------

| I found a 390 PofO pdf and it does say the following
| under Storage-Operand References

...

+---------------

*Yikes!!* No pointer-sized coherence between CPUs & DMA?!?
[IBM channels are really "smart" DMA...]

Lots of SGI MIPS/Irix DMA devices/drivers absolutely depended
for proper operation on pointer-sized coherence -- or what's
called here "Block-Concurrence" -- between the CPUs *and*
between the CPUs & DMA, especially for DMA request descriptor
and status rings, which might have simultaneous fetches/stores
going on by both CPUs & DMA. [One of the best ways to limit the
interrupt rate is to poll the DMA descriptor rings for changes
before bothering with an interrupt or "go!" doorbell write.]
Ditto several other platforms I've worked on.
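
The descriptor-ring polling mentioned above can be sketched roughly as follows. This is a hypothetical layout, not the format of any SGI or other real device: each descriptor is a dict with a single `status` word, and `DONE` is an invented bit that the device sets on completion. The point of the sketch is that the driver decides everything from one whole-word read of `status`, which is why that read must not tear against a concurrent DMA store.

```python
DONE = 0x1  # hypothetical "device has finished this descriptor" bit

def reap_completed(ring, head):
    """Consume descriptors the device has marked done, starting at head.

    Returns (new_head, indices_of_completed_descriptors).
    """
    completed = []
    for _ in range(len(ring)):          # visit each slot at most once
        status = ring[head]["status"]   # single whole-word read
        if not status & DONE:
            break                       # device has not reached this slot
        completed.append(head)
        ring[head]["status"] = 0        # hand the slot back for reuse
        head = (head + 1) % len(ring)
    return head, completed
```

A driver would call something like this from its polling path and only fall back to an interrupt or doorbell write when the ring shows no new work.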

-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Joe Seigh

Jul 31, 2007, 6:16:30 AM

LM could be used to atomically load a register pair with doubleword
aligned operands on 32 bit machines. Not really necessary with
compare and swap algorithms where the update value is derived from
the compare value. You could load the compare value with a random
value and the algorithm would still be correct although the run
time might be lengthened a bit until the right random value came along.
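
The point about the initial compare value can be shown with a small single-threaded model. This is a sketch under one assumption taken from how S/370 COMPARE AND SWAP behaves: a failed compare hands back the value it found in storage. The class `Cell` and the function `atomic_increment` are invented for illustration, and nothing here is actually concurrent.

```python
class Cell:
    """A storage location with a modeled compare-and-swap."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        """Return (True, expected) on success, (False, current) on failure."""
        if self.value == expected:
            self.value = new
            return True, expected
        return False, self.value


def atomic_increment(cell, initial_guess):
    """Increment cell using CAS; return how many CAS attempts it took."""
    expected = initial_guess     # need not be an atomically loaded value
    attempts = 0
    while True:
        attempts += 1
        ok, observed = cell.compare_and_swap(expected, expected + 1)
        if ok:
            return attempts
        expected = observed      # retry with what the failed CAS saw
```

Because the update is derived from the compare value, a wrong or even arbitrary first guess costs one extra trip around the loop but never produces a wrong result.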

Some of the lock implementations would optimize for the lock not
held case by assuming a lockword of zero and just zeroing the compare
register rather than loading the lockword. If the lock was held, well
you're going to be on the slow path anyway.
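
The lock fast path described above might look like this in the same kind of single-threaded model. This is a sketch with invented names (`LockWord`, `acquire_fast`), assuming that zero means "not held" and that a failed compare-and-swap reports the current holder.

```python
class LockWord:
    """A lockword where 0 means free and nonzero identifies the holder."""

    def __init__(self):
        self.value = 0

    def compare_and_swap(self, expected, new):
        """Return (True, expected) on success, (False, current) on failure."""
        if self.value == expected:
            self.value = new
            return True, expected
        return False, self.value


def acquire_fast(lock, owner):
    """Try to take the lock without first loading the lockword.

    The compare value is simply assumed to be 0 (free). On failure the
    caller gets the holder's id and would drop to its slow path.
    """
    return lock.compare_and_swap(0, owner)
```

The uncontended case costs a single compare-and-swap, and the contended case learns the current holder from the failed attempt.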

On the Intel processors where there's no convenient way to atomically
load a doubleword, you need to stick with algorithms that modify the
compare value, which fortunately most of them do.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Nick Maclaren

Jul 31, 2007, 6:25:34 AM

In article <2_Dri.7939$Q85.1692@trndny02>,

Joe Seigh <jsei...@xemaps.com> writes:
|>
|> LM could be used to atomically load a register pair with doubleword
|> aligned operands on 32 bit machines. ...

Yes and no. While that was true on almost all (all?) System/370s,
it wasn't part of the architecture as such, but of the implementation
details. My understanding is that it wasn't true on the System/360
SMP systems.

It also wasn't true on System/370s when you came to I/O versus CPU
conflicts, but they had the property that you could get the cache
out of step with the memory. A colleague of mine found this while
reading the microcode, and I implemented a program that showed it, but
we never reported it as we agreed that no sane programmer would ever
notice the effect :-)

Regards,
Nick Maclaren.

Guy Macon

Jul 31, 2007, 7:14:10 AM

Bill Todd wrote:

>I'm just not very patient with people who not only fail to understand
>something the first time around but persist in that failure after it has
>been pointed out to them twice subsequently, I guess.

And yet you still read Usenet newsgroups... :)

--
Guy Macon
<http://www.guymacon.com/>

Chris Thomasson

Jul 31, 2007, 7:16:56 AM

"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message

news:f8n2mu$j0a$1...@gemini.csx.cam.ac.uk...

Yikes! Insane programmers are some of the most innovative minds around?

;^) Kidding of course...

Chris Thomasson

Jul 31, 2007, 7:21:06 AM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:6aqdnQWR-PJrhDLb...@comcast.com...

Humm... Well, imho, insane programmers have the inherent ability to
purposely, or accidentally, create some of the most interesting
race-conditions imaginable; indeed!

;^)

Chris Thomasson

Jul 31, 2007, 7:26:01 AM

"Guy Macon" <http://www.guymacon.com/> wrote in message
news:h7Cdnd2yG-4...@giganews.com...

>
>
>
> Bill Todd wrote:
>
>>I'm just not very patient with people who not only fail to understand
>>something the first time around but persist in that failure after it has
>>been pointed out to them twice subsequently, I guess.
>
> And yet you still read Usenet newsgroups... :)

Well, USENET can be useful. Humm, some people don't even bother to look:

http://groups.google.com/group/comp.programming.threads/msg/6bd932519c061b77

http://groups.google.com/group/comp.programming.threads/msg/bbfaa7b4f802598f

Sad state of affairs out in the wild public these days? na... USENET along
with its high quality groups such as this one is well respected in the
academic community.

Nick Maclaren

Jul 31, 2007, 7:42:58 AM

In article <QOmdnZTjDcdyhzLb...@comcast.com>,

Yes. My colleague and I would never have dreamed of writing real
code that might trigger that effect, because it would have been
asking for trouble, and doing that is stupid. The authors of
program fetch were similarly cautious with code that could easily
have done so.

That made it different from the things that cause serious problems,
where code that 'obviously' should work, doesn't.

Regards,
Nick Maclaren.

Michel Hack

Jul 31, 2007, 10:51:20 AM

On Jul 31, 5:33 am, r...@rpw3.org (Rob Warnock) wrote:

> Eric P. <eric_patti...@sympaticoREMOVE.ca> wrote:
>
> +---------------
> | I found a 390 PofO pdf and it does say the following
> | under Storage-Operand References
> ...
> | "Block-Concurrent References
> | For some references, the accesses to all bytes
> | within a halfword, word, or doubleword are specified
> | to appear to be block concurrent as observed
> | by other CPUs. These accesses do not necessarily
> | appear to channel programs to include more
> | than a byte at a time.
> +---------------
>
> *Yikes!!* No pointer-sized coherence between CPUs & DMA?!?
> [IBM channels are really "smart" DMA...]
>
> Lots of SGI MIPS/Irix DMA devices/drivers absolutely depended
> for proper operation on pointer-sized coherence -- or what's
> called here "Block-Concurrence" -- between the CPUs *and*
> between the CPUs & DMA, especially for DMA request descriptor
> and status rings, which might have simultaneous fetches/stores
> going on by both CPUs & DMA. [One of the best ways to limit the
> interrupt rate is to poll the DMA descriptor rings for changes
> before bothering with an interrupt or "go!" doorbell write.]
> Ditto several other platforms I've worked on.

Well, old "parallel" channels really did read&write one byte at
a time, so the architectural statement was necessary. Programs
would have to copy (using CPU instructions with better atomicity
properties) to & from private buffers if this was an issue.

The channel interface is not intended for the type of I/O that
can be done with memory-mapped devices, even though 20 years
ago things like CETI (Continuously Executing Transfer Interface)
went part of the way towards that model.

OTOH, IBM channels have something no other interface has: the
change and reference bits of touched pages are set by channel
access. Win some and lose some, I guess.

Michel.

Eric P.

Jul 31, 2007, 1:36:01 PM

The problem I have is that I cannot imagine a memory or
cache circuit design which would function as you describe or
as the PofO says "When an operand consists of more than one byte,
the bytes may be fetched from storage piecemeal, one byte at a
time. Unless otherwise specified, the bytes are
not necessarily fetched in any particular sequence."

This description just does not match any memory controller or
cache design I am familiar with.

Even considering by-byte channel IO I cannot see how the piecemeal
access could actually occur in real life.

So I was thinking this was a bureaucratic spec being intentionally
wishy washy to allow maximum future flexibility even though no one
in their right mind would ever implement a circuit that way.

However it sounds from your other msgs that in fact they did
implement this at some point so I am curious how it worked.

Eric

Nick Maclaren

Jul 31, 2007, 2:51:31 PM

In article <46AF7301...@sympaticoREMOVE.ca>,

"Eric P." <eric_p...@sympaticoREMOVE.ca> writes:
|> The problem I have is that I cannot imagine a memory or
|> cache circuit design which would function as you describe or
|> as the PofO says "When an operand consists of more than one byte,
|> the bytes may be fetched from storage piecemeal, one byte at a
|> time. Unless otherwise specified, the bytes are
|> not necessarily fetched in any particular sequence."
|> This description just does not match any memory controller or
|> cache design I am familiar with.
|>
|> Even considering by-byte channel IO I cannot see how the piecemeal
|> access could actually occur in real life.

Let me guess - you started in IT after 1975? :-)

|> So I was thinking this was a bureaucratic spec being intentionally
|> wishy washy to allow maximum future flexibility even though no one
|> in their right mind would ever implement a circuit that way.
|>
|> However it sounds from your other msgs that in fact they did
|> implement this at some point so I am curious how it worked.

Think interleaving. Back before the days of commodity DIMMs or even
back when memory was real core, when the limit was bandwidth and
not latency, why shouldn't memory have been interleaved at the byte
level? And, with separate memory controllers for each byte in a
doubleword, what that specification describes is exactly what you
are likely to get.
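
Byte-level interleaving as described here can be sketched with a simple address-to-bank mapping. This is an illustration only, assuming eight banks selected by the low three address bits; the names `BANKS`, `bank_of`, and `banks_touched` are invented and do not describe any particular System/360 or System/370 model.

```python
BANKS = 8  # assumed: one memory bank (and controller) per byte of a doubleword

def bank_of(address):
    """Bank that holds the byte at this address under byte interleaving."""
    return address % BANKS

def banks_touched(address, size):
    """Banks involved in an access of `size` bytes starting at `address`."""
    return [bank_of(a) for a in range(address, address + size)]
```

An aligned doubleword touches all eight banks and an aligned word touches four. If each bank completes its byte independently, the bytes of one operand can be fetched or stored at different times and in no particular order, which matches the wording of the specification quoted earlier.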

Regards,
Nick Maclaren.

Eric Smith

Jul 31, 2007, 2:55:56 PM

Bill Todd wrote:
> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
> the architecture vs. some de facto accepted constant size?

and later:

> No: at this point I'm questioning your ability to understand the
> concept of context.

and later:

> Ah - I guess you don't understand the concept of context after all,
> then. Perhaps more careful parsing will help you: my question began
> with the words "Wouldn't *that*..." - an explicit reference to the
> context that you don't seem to have understood was the specific thrust
> of the question.

And that was why I asked why you thought your question quoted above
was relevant only to 8-bit processors (the context you keep trumpeting)
and not in general.

In particular, saying that 'word' should be a 'natural' size for the
architecture on 8-bit machines begs the question, because there's no
obvious way to determine whether the machine is 8-bit without knowing
its word size.

> I'm just not very patient with people who not only fail to understand
> something the first time around but persist in that failure after it
> has been pointed out to them twice subsequently, I guess.

I'm not very patient with people that insist that what they posted
makes sense, while pointedly refusing to address any issues others
raise with it.

> As for the wider topic that you'd apparently have liked to divert
> discussion toward, I also addressed that, albeit tangentially, by
> noting that it had occurred multiple times here in the past: Google
> groups is your friend

I have absolutely *no* interest in discussing what "word length"
means, except in so far as pointing out that you effectively were
trying to define it in your earlier posting, and doing a bad job
of it.

Eric

Bill Todd

Aug 1, 2007, 9:48:09 AM

Eric Smith wrote:
> Bill Todd wrote:
>> Wouldn't that depend upon how one defined 'word' - i.e., 'natural' for
>> the architecture vs. some de facto accepted constant size?
>
> and later:
>> No: at this point I'm questioning your ability to understand the
>> concept of context.
>
> and later:
>> Ah - I guess you don't understand the concept of context after all,
>> then. Perhaps more careful parsing will help you: my question began
>> with the words "Wouldn't *that*..." - an explicit reference to the
>> context that you don't seem to have understood was the specific thrust
>> of the question.
>
> And that was why I asked why you thought your question quoted above
> was relevant only to 8-bit processors (the context you keep trumpeting)
> and not in general.

Dear me - you appear to be utterly ineducable: it related only to 8-bit
processors because I (quite deliberately) chose to phrase it that way.
Had I wished to introduce a wider topic, rather than simply question the
specific statements to which I was responding, I would have done so.

- bill

Eric Smith

Aug 1, 2007, 10:52:03 PM

Bill Todd <bill...@metrocast.net> writes:
> Dear me - you appear to be utterly ineducable: it related only to
> 8-bit processors because I (quite deliberately) chose to phrase it
> that way. Had I wished to introduce a wider topic, rather than simply
> question the specific statements to which I was responding, I would
> have done so.

What you said was effectively "on a processor with an 8-bit word size,
the word size is a natural size for the architecture". Depending on
how you define word size, that either makes no sense, or is a
tautology. In either case, your post was just a waste of bandwidth,
and I regret having used even more bandwidth trying to find out
whether you actually had some worthwhile thought to share, since
clearly you don't.

Bill Todd

Aug 2, 2007, 5:31:48 PM

Eric Smith wrote:
> Bill Todd <bill...@metrocast.net> writes:
>> Dear me - you appear to be utterly ineducable: it related only to
>> 8-bit processors because I (quite deliberately) chose to phrase it
>> that way. Had I wished to introduce a wider topic, rather than simply
>> question the specific statements to which I was responding, I would
>> have done so.
>
> What you said was effectively "on a processor with an 8-bit word size,
> the word size is a natural size for the architecture".

Ah: perhaps your problem is that you have difficulty understanding
basic English. What I *actually* suggested was that the applicability
of the assertion about 8-bit processors seemed to depend upon *whether*
one was talking about a processor's 'natural' word size or some
commonly-used word size (such as 16 or 32 bits).
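To make the distinction concrete: if "word" means a 16-bit quantity on a processor whose natural width is 8 bits, a store is carried out as two separate byte stores, and a reader that runs between them sees a torn value. The following is only a toy simulation of that situation, not code for any real processor; `EightBitMemory` and its methods are hypothetical names invented for the illustration.

```python
class EightBitMemory:
    """Toy memory whose only atomic unit is one byte (illustrative only)."""

    def __init__(self):
        self.cells = bytearray(2)     # one 16-bit variable, low byte first

    def store16_steps(self, value):
        """Yield after each byte store, as two separate 8-bit instructions would."""
        self.cells[0] = value & 0xFF
        yield
        self.cells[1] = (value >> 8) & 0xFF
        yield

    def load16(self):
        """Combine the two bytes into one 16-bit value."""
        return self.cells[0] | (self.cells[1] << 8)


mem = EightBitMemory()
for _ in mem.store16_steps(0x00FF):
    pass                              # initial value 0x00FF, stored completely

writer = mem.store16_steps(0x0100)    # update 0x00FF -> 0x0100
next(writer)                          # low byte written, then an interrupt fires
torn = mem.load16()                   # the interrupt handler reads both bytes
next(writer)                          # high byte written
print(hex(torn), hex(mem.load16()))   # 0x0 0x100
```

The interrupted reader sees 0x0000, which is neither the old value nor the new one. With "word" taken as the natural 8-bit size, each individual byte store is indivisible and no such tearing can occur.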

- bill

Eric Smith

Aug 2, 2007, 8:48:50 PM

I wrote:
>> What you said was effectively "on a processor with an 8-bit word size,
>> the word size is a natural size for the architecture".

Bill Todd wrote:
> Ah: perhaps your problem is that you have difficulty understanding
> basic English. What I *actually* suggested was that the applicability
> of the assertion about 8-bit processors seemed to depend upon
> *whether* one was talking about a processor's 'natural' word size or
> some commonly-used word size (such as 16 or 32 bits).

Yes, fine, I agree that you expressed it as a question rather than a
statement.

That still doesn't explain how it makes sense only in the context
(which you insist is so terribly important) of 8-bit processors. Why
shouldn't it apply equally well to 16-bit processors, or 27-bit
processors?

And if it does only make sense in the context of 8-bit processors, how do
you determine whether a particular processor is an 8-bit processor, in
order to know whether your question applies?

In other words, your question still makes no sense.

Bill Todd

Aug 3, 2007, 5:51:55 AM

Eric Smith wrote:
> I wrote:
>>> What you said was effectively "on a processor with an 8-bit word size,
>>> the word size is a natural size for the architecture".
>
> Bill Todd wrote:
>> Ah: perhaps your problem is that you have difficulty understanding
>> basic English. What I *actually* suggested was that the applicability
>> of the assertion about 8-bit processors seemed to depend upon
>> *whether* one was talking about a processor's 'natural' word size or
>> some commonly-used word size (such as 16 or 32 bits).
>
> Yes, fine, I agree that you expressed it as a question rather than a
> statement.

No, you nitwit: what I expressed (a choice between two possibilities)
was entirely different from the characterization (a tautology) which you
gave it above.

But apparently you just can't let it go even after having just expressed
regret about having persisted as long as your previous post. Stubborn
incompetence and/or incomprehension combined with lack of self-control:
are you a member of the Bush Administration, by any chance?

- bill

Jan Vorbrüggen

Aug 13, 2007, 5:37:04 AM

> OTOH, IBM channels have something no other interface has: the
> change and reference bits of touched pages are set by channel
> access. Win some and lose some, I guess.

Hmmm...as the VAX/VMS CI interfaces used the OS's page tables for their
operation, I suspect they did that as well, but I don't know for sure. Can't
think how they could operate properly without so doing, however.

Jan

Michel Hack

Aug 13, 2007, 1:54:34 PM

On Aug 13, 5:37 am, Jan Vorbrüggen <jvorbrueg...@not-mediasec.de>
wrote:

I don't know either, but I suspect the I/O supervisor would set the
bits explicitly to reflect the intended I/O, not necessarily the
actual I/O, unless it explicitly checked whether short records crossed
a page boundary (perhaps the notion of short records did not exist).

Under VM/370 one could write channel programs that worked with both
real and virtual card readers. Virtual card readers could read either
punch spool files (virtual cards) or print spool files (print line
images), of different length (80 vs 132 bytes). Real card readers
would deliver the record length only after physical motion had stopped
("Device End"), so determining length read from the Channel Status
Word could have been very slow. So we would allocate the buffer so as
to cross a page boundary in the printer-only tail, and could tell from
the Change bit which type of virtual card was read, without having to
wait for Device End.
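The buffer-placement trick can be modeled in a few lines. This is a toy simulation under stated assumptions, not VM/370 code: the 4096-byte page size, the `Memory` class, and the `channel_write` and `read_record` helpers are all hypothetical names made up for the illustration. The only behavior taken from the description above is that a channel store sets the change bit of every page it touches.

```python
PAGE_SIZE = 4096          # assumed page size for the model
CARD_LEN = 80             # punch spool record (virtual card)
PRINT_LEN = 132           # print spool record (print line image)


class Memory:
    """Toy storage model: bytes plus one change bit per page."""

    def __init__(self, pages):
        self.data = bytearray(pages * PAGE_SIZE)
        self.changed = [False] * pages

    def channel_write(self, addr, record):
        """Model a channel store: every page the record touches is marked changed."""
        self.data[addr:addr + len(record)] = record
        first = addr // PAGE_SIZE
        last = (addr + len(record) - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.changed[page] = True


def read_record(mem, record):
    """Place the buffer so bytes 0..79 end page 0 and bytes 80..131 start page 1."""
    buf = PAGE_SIZE - CARD_LEN
    mem.changed[1] = False            # reset the change bit of the tail page
    mem.channel_write(buf, record)
    # The tail page is dirtied only if more than 80 bytes arrived.
    return "print line" if mem.changed[1] else "card"


mem = Memory(pages=2)
print(read_record(mem, b"C" * CARD_LEN))    # card
print(read_record(mem, b"P" * PRINT_LEN))   # print line
```

Because only the last 52 bytes of a print line image fall in the second page, its change bit alone distinguishes the two record types, with no need to consult a length field.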

Michel.

Terje Mathisen

Aug 14, 2007, 1:29:42 AM

Michel Hack wrote:
> different length (80 vs 132 bytes). Real card readers would deliver
> the record length only after physical motion had stopped ("Device
> End"), so determining length read from the Channel Status Word could
> have been very slow. So we would allocate the buffer so as to cross
> a page boundary in the printer-only tail, and could tell from the
> change bit which type of virtual card was read, without having to wait
> for Device End.

Ouch!

Today that would have given you an extra (real) page fault; I really
don't think that style of programming is coming back, at least not for
commonly taken paths. (As a zero-overhead way of getting exception
handling capability it is great!)

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Michel Hack

Aug 15, 2007, 7:39:03 PM

On Aug 14, 1:29 am, Terje Mathisen <terje.mathi...@hda.hydro.com>
wrote:

> Michel Hack wrote:
> > different length (80 vs 132 bytes). Real card readers would deliver
> > the record length only after physical motion had stopped ("Device
> > End"), so determining length read from the Channel Status Word could
> > have been very slow. So we would allocate the buffer so as to cross
> > a page boundary in the printer-only tail, and could tell from the
> > change bit which type of virtual card was read, without having to wait
> > for Device End.
>
> Ouch!
>
> Today that would have given you an extra (real) page fault,

Not really; the two pages would be re-used, so they would both be
resident. This is running in a guest. When running on a real machine,
there would be no paging (I/O is to real, or rather absolute,
addresses) -- and in any case the second page wouldn't be touched.

Another solution is to use a different path for real and virtual card
readers; the virtual ones can be read more efficiently via hypervisor
calls -- but we were purists and wanted our code to be as independent
of the hypervisor as we could.

Michel.