
How the page tables of each segment are located


Thierr...@googlemail.com

Nov 23, 2007, 5:35:18 PM
Hey,

I am just confused about the following and I hope someone can clear
it up for me :)
I have read about paging and I understand that on the Pentium the CR3
register holds the starting address of the page table in memory.
This register is generally called the Page Table Base Register (PTBR).

Pure segmentation-based mapping also needs a table; however, it is
often fully cached in the processor, as its size tends to be small.

Now it seems the modern processor instruction set, implicitly or
explicitly, deals with different segments in the application, i.e. the
code is segmented. We don't have only one linear address space but
many. However, Windows NT and Linux both ignore segmentation memory
mapping since, according to the documentation, all the application
segment descriptors are equal: they are all mapped into the same
linear space with base = 0 and limit = 4GB (32-bit processors).

Having said that, how can the page table of each segment be retrieved
when there is only one PTBR in the system? There should not be only one
PTBR!

thank you

Joe Pfeiffer

Nov 24, 2007, 12:12:05 AM
Thierr...@googlemail.com writes:
>
> Having said that, how can the page table of each segment be retrieved
> when there is only one PTBR in the system? There should not be only one
> PTBR!

The Intel scheme doesn't use a page table per segment -- it uses the
segmentation hardware to map to a linear virtual address space, and
then uses the paging system to map this to physical memory.

Tim Roberts

Nov 24, 2007, 12:29:25 AM
Thierr...@googlemail.com wrote:
>
>Pure segmentation-based mapping also needs a table; however, it is
>often fully cached in the processor, as its size tends to be small.
>
>Now it seems the modern processor instruction set, implicitly or
>explicitly, deals with different segments in the application, i.e. the
>code is segmented. We don't have only one linear address space but
>many.

No. There is only one linear address space. Different segments can start
at different offsets within that linear space (and can overlap), but the
result is always a simple linear address.

>However, Windows NT and Linux both ignore segmentation memory
>mapping since, according to the documentation, all the application
>segment descriptors are equal: they are all mapped into the same
>linear space with base = 0 and limit = 4GB (32-bit processors).

Yes.

>Having said that, how can the page table of each segment be retrieved
>when there is only one PTBR in the system? There should not be only one
>PTBR!

There is only one set of page tables. You start with a segment number and
offset (i.e., CS:123456). You add the starting address for that segment to
the offset, and that produces a linear address. You look up the linear
address in the page tables, and that produces a physical address. The
physical address goes out on the bus.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
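
To make the two-stage translation concrete, here is a minimal sketch in C
(illustrative only; the MMU does this in hardware, and a one-level page
table stands in for the real two-level structure):

    /* Sketch of x86 address translation: segment:offset -> linear ->
     * physical.  Illustrative only; not how an OS would code it. */
    #include <stdint.h>

    #define PAGE_SIZE 4096u

    uint32_t segment_base[6];      /* per-segment base, from the descriptors */
    uint32_t frame_of[1u << 20];   /* physical frame number per 4K page      */

    uint32_t translate(int seg, uint32_t offset)
    {
        uint32_t linear = segment_base[seg] + offset;  /* stage 1: segmentation */
        uint32_t page   = linear / PAGE_SIZE;          /* stage 2: paging       */
        return frame_of[page] * PAGE_SIZE + linear % PAGE_SIZE;
    }

With the flat setup Windows and Linux use, every entry of segment_base[]
is zero, so CS:100 and DS:100 yield the same linear address, 100, and
therefore the same physical byte.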

Thierr...@googlemail.com

Nov 24, 2007, 6:14:22 AM
> >However, Windows NT and Linux both ignore segmentation memory
> >mapping since, according to the documentation, all the application
> >segment descriptors are equal: they are all mapped into the same
> >linear space with base = 0 and limit = 4GB (32-bit processors).
>
> Yes.
>
> >Having said that, how can the page table of each segment be retrieved
> >when there is only one PTBR in the system? There should not be only one
> >PTBR!
>
> There is only one set of page tables. You start with a segment number and
> offset (i.e., CS:123456). You add the starting address for that segment to
> the offset, and that produces a linear address. You look up the linear
> address in the page tables, and that produces a physical address. The
> physical address goes out on the bus.
> --
> Tim Roberts, t...@probo.com
> Providenza & Boekelheide, Inc.

Many thanks for your reply. I said above, and you agreed, that Linux
and Windows ignore segmentation by setting the base in the segment
descriptor to zero. Therefore, the linear addresses of CS:123456 and
DS:123456 are the same and equal to base + 123456 = 123456. As such, any
address of the code segment, data segment, and any other segment will
point to the same entry in the page table. How can this happen? Sorry,
but I am still confused.

Thierr...@googlemail.com

Nov 24, 2007, 6:17:55 AM
On Nov 24, 5:12 am, Joe Pfeiffer <pfeif...@cs.nmsu.edu> wrote:
> ThierryBi...@googlemail.com writes:
>
> > Having said that, how can the page table of each segment be retrieved
> > when there is only one PTBR in the system? There should not be only one
> > PTBR!
>
> The Intel scheme doesn't use a page table per segment -- it uses the
> segmentation hardware to map to a linear virtual address space, and
> then uses the paging system to map this to physical memory.

Thanks, Joe... and since Windows and Linux ignore segmentation by
setting the base value of all descriptors to zero, how can we protect
the different segments from each other? As I said above, CS:offset and
DS:offset will be translated to the value offset and then passed to the
page table... where they will index the same entry!!

Matt

Nov 24, 2007, 6:24:20 AM

Yes, they all point to the same place in the same page table, because
they access the same byte of memory. The only difference is what you are
allowed to do to the byte. If it is accessed through the CS, then you
can execute it. If it is accessed through the DS, then you can
read/write it. What you can do is then controlled further by the
settings of the writable/executable/kernel/user flags in the descriptor
used. Thus, if you access the byte through the CS, but the CS is set to
be a user segment, and the page table lists the page as data, kernel
data, or kernel code, then you will get a fault.

In other words, the page table defines what type of memory something
is, whereas the segment descriptor defines what you are trying to do to
the memory. Thus, if your user code is given only a user data and a user
code descriptor (remember that it cannot create its own descriptors),
then it can only access pages marked as user data or user code. If the
descriptor and the page-table entry do not match according to a specific
set of rules, then a fault occurs.

Hope that helps.

Matt
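
One way to picture that "specific set of rules" is a simplified check
like the following (a sketch only; the real x86 rules also involve
segment limits, privilege rings, and CR0.WP):

    /* Simplified model of the page-level permission check (not the
     * exact x86 rule set). */
    enum access_kind { FETCH, READ, WRITE };

    struct pte {
        unsigned present  : 1;   /* page is mapped                    */
        unsigned writable : 1;   /* writes allowed                    */
        unsigned user     : 1;   /* user-mode (ring 3) access allowed */
        unsigned nx       : 1;   /* no-execute (later CPUs only)      */
    };

    int access_allowed(struct pte p, enum access_kind kind, int user_mode)
    {
        if (!p.present)                   return 0;  /* page fault       */
        if (user_mode && !p.user)         return 0;  /* kernel-only page */
        if (kind == WRITE && !p.writable) return 0;  /* read-only page   */
        if (kind == FETCH && p.nx)        return 0;  /* data-only page   */
        return 1;
    }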

Maxim S. Shatskih

Nov 24, 2007, 9:28:33 AM
> allowed to do to the byte. If it is accessed through the CS, then you
> can execute it.

Read too :)

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
ma...@storagecraft.com
http://www.storagecraft.com

Matt

Nov 24, 2007, 12:19:05 PM
Maxim S. Shatskih wrote:
>> allowed to do to the byte. If it is accessed through the CS, then you
>> can execute it.
>
> Read too :)
>
True,

but you can also use the page table to limit a page to execute-only,
with no read. (This probably breaks any compiler that stores constants
in the code segment, but it can be done.)

Matt

Thierr...@googlemail.com

Nov 24, 2007, 5:02:35 PM
On Nov 24, 11:24 am, Matt <travellingmatt2...@yahoo.co.uk> wrote:
> be a user segment, and the page table lists the page as data, kernel
> data, or kernel code, then you will get a fault.
>
> In other words, the page table defines what type of memory something
> is, whereas the segment descriptor defines what you are trying to do to
> the memory. Thus, if your user code is given only a user data and a user
> code descriptor (remember that it cannot create its own descriptors),
> then it can only access pages marked as user data or user code. If the
> descriptor and the page-table entry do not match according to a specific
> set of rules, then a fault occurs.
>
> Hope that helps.
>
> Matt

Thanks, Matt, for your answer, but I am still confused despite trying
to look over more online material. May I give the following example:
assume a user application with 2 segments, code and data, where each
segment has 4 pages. The operating system will therefore create a page
table of 8 valid entries. However, the page numbers in the code segment
range from 0 to 3, as do the page numbers in the data segment. They
fully overlap in one half of the page table, and as such the
application's physical address space can't be fully addressed. I really
don't know what I am missing...

John L

Nov 24, 2007, 6:07:19 PM
>assume a user application with 2 segments, code and data, where each
>segment has 4 pages. The operating system will therefore create a page
>table of 8 valid entries.

No, the OS creates a page table of four entries. In the usual 32 bit
flat model, the code and data segments map to the same place in the
linear address space. If a program makes a code reference to CS:100,
the segment setup will map that to location 100 in the linear address
space, which is in page 0, so it's 100 bytes into wherever the OS maps
page 0. If a program makes a data reference to DS:100, it is mapped
to the exact same place, location 100 in the linear address space,
which being the same location is also 100 bytes into the mapped page
zero.

For most purposes, on the 32 bit x86 it's easiest to pretend that the
segments don't exist at all. The OS generally sets up all the segments
to do a 1:1 mapping of each segment into the linear address space.

R's,
John


Thierr...@googlemail.com

Nov 24, 2007, 6:27:23 PM
On Nov 24, 11:07 pm, jo...@iecc.com (John L) wrote:
> >assume a user application with 2 segments, code and data, where each
> >segment has 4 pages. The operating system will therefore create a page
> >table of 8 valid entries.
>
> No, the OS creates a page table of four entries. In the usual 32 bit
> flat model, the code and data segments map to the same place in the
> linear address space. If a program makes a code reference to CS:100,
> the segment setup will map that to location 100 in the linear address
> space, which is in page 0, so it's 100 bytes into wherever the OS maps
> page 0. If a program makes a data reference to DS:100, it is mapped
> to the exact same place, location 100 in the linear address space,
> which being the same location is also 100 bytes into the mapped page
> zero.
>
> For most purposes, on the 32 bit x86 it's easiest to pretend that the
> segments don't exist at all. The OS generally sets up all the segments
> to do a 1:1 mapping of each segment into the linear address space.
>
> R's,
> John

:) Thanks John, but after reading your post I got more confused.
Your explanation means that DS:100 and CS:100 will point to the same
byte/word in physical memory (as they go through the same page table
entry and both have the same relative offset), although that physical
location should hold two separate items... Please forgive my
ignorance... it seems I am missing an elementary concept.

Matt

Nov 24, 2007, 7:02:00 PM
They DO hold the same information. The only reason for having a CS and
a DS is not to separate what they access, but to say what you are asking
to do with the information. If you want data and code in different
places (which is usual), you put them in different places by linking
them to different addresses.

There are two totally different mechanisms for protecting information.

1) Putting it in NON-overlapping segments. Thus CS:100 and DS:100 are
different addresses, in different pages. You separate your 4GB address
space into chunks which are then given to processes. Each chunk
(segment) has a specific type, such as user data, kernel code, etc. Each
is seen by the code as starting at 0 and ending at some arbitrary number.

2) Putting it in overlapping segments, but separating the
code/data/kernel/user stuff by actually giving them different addresses.
The page table then controls the protection, more than the
segmentation, and each segment starts and ends at some arbitrary value.
This is not a problem, as you usually compile/link your code to use
separate addresses for code and data.

Most compilers are written for the second mechanism, and assume that all
code and data segments cover the entire address range. Separating
processes is then a case of changing the page-table pointer, so that all
code thinks it is running at the same addresses, and accessing data at
the same addresses, regardless of where it is loaded in real memory.

Matt

John L

Nov 24, 2007, 7:53:38 PM
>:) Thanks John, but after reading your post I got more confused.
>Your explanation means that DS:100 and CS:100 will point to the same
>byte/word in physical memory (as they go through the same page table
>entry and both have the same relative offset)

Right. It's a single flat address space.

> although that physical location should hold two separate items...

No, it shouldn't. This is a Von Neumann architecture, not a Harvard
architecture.

Although it is possible to set up 386 segments so that the code and
data map to different places, nobody does so. On the 286 you had to
do separate code and data segments because few programs could fit all
of the code and data into a single 64K segment. On the 386 a single
segment can map the entire 32 bit linear address space, so that's what
we do.

As someone else noted, the way we keep code and data separate is to
put them at different addresses, and we can use page protection to
(mostly) prohibit broken programs from writing into their code.

Ciaran Keating

Nov 24, 2007, 8:21:26 PM
On Sun, 25 Nov 2007 10:27:23 +1100, <Thierr...@googlemail.com> wrote:

> Your explanation means that DS:100 and CS:100 will point to the same
> byte/word in physical memory (as they go through the same page table
> entry and both have the same relative offset), although that
> physical location should hold two separate items...

Indeed, at first glance it does look like something crazy's going on. The
secret is that if page 0 is a code page then your memory manager will
never return a block of memory containing the linear address 100. And so
your compiler will (should) never generate a reference to DS:100.

When your memory manager allocates memory and returns a linear address,
that address encodes the PDE, PTE and offset. The only way this address
could be 100 is if PDE=0, PTE=0, offset=100. That implies the first page
in the first page table. If the first page was previously allocated for
code, then the memory manager won't return any address in that range when
it allocates heap memory - it might map page 1 and return the address
PDE=0, PTE=1, offset=100.

So the two addresses you're thinking of (the 100th byte in code and the
100th byte in data) will look something like this (illustration only, I
haven't done the arithmetic):

CS:100 -> PDE=0, PTE=0, offset=100 -> linear 0x00000064
DS:4196 -> PDE=0, PTE=1, offset=100 -> linear 0x00001064


Cheers,
Ciaran

--
Ciaran Keating
Amadan Technologies
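
Ciaran's split is easy to verify in code. A sketch for 32-bit x86 with
4K pages (bits 31-22 index the page directory, bits 21-12 the page
table, bits 11-0 the byte within the page):

    #include <stdint.h>
    #include <stdio.h>

    /* Decompose a 32-bit linear address the way the MMU does. */
    static void split(uint32_t linear)
    {
        uint32_t pde    = linear >> 22;            /* page-directory index */
        uint32_t pte    = (linear >> 12) & 0x3FFu; /* page-table index     */
        uint32_t offset = linear & 0xFFFu;         /* offset within page   */
        printf("0x%08x -> PDE=%u, PTE=%u, offset=%u\n",
               linear, pde, pte, offset);
    }

    int main(void)
    {
        split(100);    /* prints: 0x00000064 -> PDE=0, PTE=0, offset=100 */
        split(4196);   /* prints: 0x00001064 -> PDE=0, PTE=1, offset=100 */
        return 0;
    }

This reproduces both lines of the illustration above (the arithmetic
does work out).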

Anne & Lynn Wheeler

Nov 24, 2007, 9:03:16 PM
jo...@iecc.com (John L) writes:
> No, it shouldn't. This is a Von Neumann architecture, not a Harvard
> architecture.
>
> Although it is possible to set up 386 segments so that the code and
> data map to different places, nobody does so. On the 286 you had to
> do separate code and data segments because few programs could fit all
> of the code and data into a single 64K segment. On the 386 a single
> segment can map the entire 32 bit linear address space, so that's what
> we do.
>
> As someone else noted, the way we keep code and data separate is to
> put them at different addresses, and we can use page protection to
> (mostly) prohibit broken programs from writing into their code.

and the newer "no-execute" ... countermeasure for (buffer overflow)
attacks that pollute data areas with executable instructions and then
attempt to get execution transferred there.

Researcher: CPU No-Execute Bit Is No Big Security Deal
http://www.techweb.com/wire/security/166403451
'No Execute' Flag Waves Off Buffer Attacks
http://www.washingtonpost.com/wp-dyn/articles/A55209-2005Feb26.html
What's the new /NoExecute switch that's added to the boot.ini file
http://www.windowsitpro.com/Article/ArticleID/46302/46302.html
CPU-Based Security: The NX Bit
http://hardware.earthweb.com/chips/article.php/3358421
A detailed description of the Data Execution Prevention (DEP) feature in
Windows XP Service Pack 2, Windows XP Tablet PC Edition 2005, and
Windows Server 2003
http://support.microsoft.com/kb/875352

misc. past posts mentioning buffer overflow
http://www.garlic.com/~lynn/subintegrity.html#overflow

Thierr...@googlemail.com

Nov 24, 2007, 9:11:17 PM
On Nov 25, 12:53 am, jo...@iecc.com (John L) wrote:
> >:) Thanks John, but after reading your post I got more confused.
> >Your explanation means that DS:100 and CS:100 will point to the same
> >byte/word in physical memory (as they go through the same page table
> >entry and both have the same relative offset)
>
> Right. It's a single flat address space.
>
> > although that physical location should hold two separate items...
>
> No, it shouldn't. This is a Von Neumann architecture, not a Harvard
> architecture.
>
> Although it is possible to set up 386 segments so that the code and
> data map to different places, nobody does so. On the 286 you had to
> do separate code and data segments because few programs could fit all
> of the code and data into a single 64K segment. On the 386 a single
> segment can map the entire 32 bit linear address space, so that's what
> we do.
>
> As someone else noted, the way we keep code and data separate is to
> put them at different addresses, and we can use page protection to
> (mostly) prohibit broken programs from writing into their code.

OK, I think I understand now. Many thanks to you all. Considering
Windows and Linux memory mapping on IA-32, most compilers opt for the
flat memory model. As such, the offset "X" in an instruction with
operand DS:X is actually relative to the beginning of the entire flat
logical address space of the application, and not to the data segment.
This does not match the segmentation section in the Silberschatz OS
textbook! He actually talks more about the theoretical segmentation
concept.

Please, is there any reference or textbook that I can consult
addressing compilers' practical code-generation techniques, e.g. flat
model, segmented model, etc.?

Many thanks

John L

Nov 24, 2007, 11:45:37 PM
>flat logical address space of the application, and not to the data
>segment. This does not match the segmentation section in the
>Silberschatz OS textbook! He actually talks more about the theoretical
>segmentation concept

That's not surprising--segmented architectures have long been more
popular in academia than in real life. The idea of tidily
constraining each address to its proper segment seems appealing, and
works OK for the toy programs you write in programming classes, but
doesn't work very well in interestingly large practical programs.

Segments worked adequately on Multics, back when both memories and
addresses were small, but the Intel chips killed them dead. On the
286, the segments were too small for a lot of data structures, and
poorly chosen bit layout in segment selectors made it needlessly hard
to use multiple segments for a single array or data structure. Also,
loading segment registers on a 286 was very slow, so there was a
performance advantage if you designed your program to use as few
segments as possible. The 386 segments were just as slow, and Intel's
decision to do paging in a single linear address space rather than per
segment made the segments all but useless since unlike the 286, using
segments didn't let you address more memory than the "tiny model" that
uses one code segment and one data segment both mapped to the same
memory. IBM 390 mainframes had an addressing mode similar to what the
386 would have been if there were a page table per segment, but the
newer Z series has flat 64 bit addressing.

>Please, is there any reference or textbook that I can consult
>addressing compilers' practical code-generation techniques, e.g. flat
>model, segmented model, etc.?

Compiler books generally assume flat model addressing unless they
specifically say otherwise, and in any event I can't think of many
compiler techniques useful for segmented addressing. My book "Linkers
and Loaders" might be useful to help understand how programs and data
are laid out in memory.

R's,
John

Joe Pfeiffer

Nov 25, 2007, 1:29:40 AM
Thierr...@googlemail.com writes:

Exactly. We just don't worry about CS, DS, etc., and deal with
protection strictly at the paging level.

Joe Pfeiffer

Nov 25, 2007, 1:35:10 AM
Thierr...@googlemail.com writes:
>
> :) Thanks John, but after reading your post I got more confused.
> Your explanation means that DS:100 and CS:100 will point to the same
> byte/word in physical memory (as they go through the same page table
> entry and both have the same relative offset), although that physical
> location should hold two separate items... Please forgive my
> ignorance... it seems I am missing an elementary concept.

Near as I can tell, you're assuming the segmentation is relevant --
the whole point of the way Windows and Linux do their VM is that it's
not. So CS:100 and DS:100 do indeed point to the same place: the
same place in the linear address space since CS and DS are both
zeroed, and then the same place in the physical memory since the
paging system only has one set of page tables per process.

It's just treating the whole thing as a single, linear, per-process
address space. So you're perfectly free to try to read an instruction
from address 100, or read or write data from address 100. But if you
try to do both in a single program, you're either doing something very
weird or you've got a bug (and many people would argue that you're
trying to do something weird enough that it constitutes a bug...).

l_c...@juno.com

Nov 25, 2007, 1:36:28 AM
On Nov 24, 8:45 pm, jo...@iecc.com (John L) wrote:
> >flat logical address space of the application and not the Data
> >segment. This does not match the segmentation section in the
> >silberschatz OS textbook! actually he talks more about the theoritical
> >segmentation concept
>
> That's not surprising--segmented architectures have long been more
> popular in academia than in real life. The idea of tidily
> constraining each address to its proper segment seems appealing, and
> works OK for the toy programs you write in programming classes, but
> doesn't work very well in interestingly large practical programs.
>

Actually, segmentation works quite well when it has the right hardware
and programmer mindset support. Unisys 2200 XPA-series mainframes,
which use segmentation (i.e. "banking" in Unisys-speak) on top of a
very large (2**57 word) paged intermediate address space, are still
around and have been doing useful work since their introduction in the
1990s. And the last time I looked, the OS for these Unisys 2200 series
machines was no toy and consisted of several million lines of code.
Some regard Linux as a toy by comparison.

> Segments worked adequately on Multics, back when both memories and

> addresses were small, but the Intel chips killed them dead. ...

IIRC Multics was pretty much dead before the 8008 even hit the fan.

> ... On the
> 286, the segments were too small for a lot of data structures, and
> poorly chosen bit layout in segment selectors made it needlessly hard
> to use multiple segments for a single array or data structure. Also,
> loading segment registers on a 286 was very slow, so there was a
> performance advantage if you designed your program to use as few
> segments as possible. The 386 segments were just as slow, ...

Unfortunately true.

> ... and Intel's
> decision to do paging in a single linear address space rather than per
> segment made the segments all but useless since unlike the 286, using
> segments didn't let you address more memory than the "tiny model" that
> uses one code segment and one data segment both mapped to the same
> memory. ...

IMHO, the use of a single linear address space wasn't so much the
problem as the fact that it was 2**32 bytes long, the same size as the
physical address space, rather than at least 2**35 bytes long, large
enough to hold every possible based segment. Unfortunately, I don't
think there were enough unused bits in the segment descriptors to pull
this off while maintaining backward compatibility.

> ... IBM 390 mainframes had an addressing mode similar to what the

Joe Pfeiffer

Nov 25, 2007, 1:37:54 AM
Thierr...@googlemail.com writes:
>
> OK, I think I understand now. Many thanks to you all. Considering
> Windows and Linux memory mapping on IA-32, most compilers opt for the
> flat memory model. As such, the offset "X" in an instruction with
> operand DS:X is actually relative to the beginning of the entire flat
> logical address space of the application, and not to the data segment.
> This does not match the segmentation section in the Silberschatz OS
> textbook! He actually talks more about the theoretical segmentation
> concept.

A great textbook, and the one I'm teaching senior-level OS from this
semester. But I made a point of skipping the segmentation section
(because it's just not used at this point).

> Please, is there any reference or textbook that I can consult
> addressing compilers' practical code-generation techniques, e.g. flat
> model, segmented model, etc.?

Actually, Silberschatz does it as well as anybody I know. The thing
is, an architecture supporting a model doesn't mean the OS has to use
it.

John Ahlstrom

Nov 25, 2007, 7:00:21 AM
John L wrote:
>> flat logical address space of the application, and not to the data
>> segment. This does not match the segmentation section in the
>> Silberschatz OS textbook! He actually talks more about the theoretical
>> segmentation concept
>
> That's not surprising--segmented architectures have long been more
> popular in academia than in real life. The idea of tidily
> constraining each address to its proper segment seems appealing, and
> works OK for the toy programs you write in programming classes, but
> doesn't work very well in interestingly large practical programs.
>
-- snip snip

> R's,
> John
>
Any comments from the B5500/6700/A-Series builders or customers?

--
Blaming one party is easy. Finding a solution is hard.
Patrick Scheible

John Ahlstrom

Nov 25, 2007, 7:01:14 AM

How's that working for you? Virus-wise?

Rainer Weikusat

Nov 25, 2007, 7:17:35 AM
l_c...@juno.com writes:

[...]

> IIRC Multics was pretty much dead
> before the 8008 even hit the fan.

This would have been in 1972, which makes this statement wrong to the
degree of being completely ridiculous, as easily verified on the web.

John L

Nov 25, 2007, 10:29:49 AM
>Any comments from the B5500/6700/A-Series builders or customers?

I must admit I'd forgotten about the Burroughs machines. My
impression is that they're the healthiest segmented machines around
today, but they also suffer from performance and address space issues.

John L

Nov 25, 2007, 10:46:27 AM
>> Exactly. We just don't worry about CS, DS, etc and deal with
>> protection strictly at the paging level.
>
>How's that working for you? Virus-wise?

Fine thanks. On my FreeBSD box, the protection between processes and a
design that doesn't run everything as the superuser is much more important
than putting code and data in separate address spaces.

For the latter, recent x86 models have per-page no-execute protection,
but I hear it doesn't help much.

Anne & Lynn Wheeler

Nov 25, 2007, 11:16:52 AM
jo...@iecc.com (John L) writes:
> I must admit I'd forgotten about the Burroughs machines. My
> impression is that they're the healthiest segmented machines around
> today, but they also suffer from performance and address space issues.

there is MVS and various descendants ... where the same ("segmented")
image of the kernel appears in every virtual address space ... along
with the "common segment" ... which was an early MVS gimmick allowing
the pointer-passing paradigm to continue to work between different
applications and various subsystem functions when they were moved into
different virtual address spaces (i.e. an application could squirrel
something away in the "common segment" and make a subsystem call,
passing a pointer to the "common segment" data). of course,
"dual-address" space ... and the follow-on "access registers" ... were
attempts to obsolete the need for the common segment ... aka allowing
called routines (in different virtual address spaces) to "reach" back
into the virtual address space of the calling routine.

misc. recent posts mentioning common segment
http://www.garlic.com/~lynn/2007g.html#59 IBM to the PCM market(the sky is falling!!!the sky is falling!!)
http://www.garlic.com/~lynn/2007k.html#27 user level TCP implementation
http://www.garlic.com/~lynn/2007o.html#10 IBM 8000 series
http://www.garlic.com/~lynn/2007q.html#26 Does software life begin at 40? IBM updates IMS database
http://www.garlic.com/~lynn/2007q.html#68 Direction of Stack Growth
http://www.garlic.com/~lynn/2007r.html#56 CSA 'above the bar'
http://www.garlic.com/~lynn/2007r.html#69 CSA 'above the bar'

the ingrained (MVS) "common segment" even resulted in custom hardware
support in later machine generations. in lots of implementations, the
translation-lookaside buffer (TLB) hardware is virtual address space
"associative" (each TLB entry is associated with a specific virtual
address space). Segment sharing can result in the same virtual address
(information) in the same (shared) segment appearing multiple times in
the TLB (associated with use by specific virtual address spaces). The
"common segment" use was so prevalent in MVS ... that it justified
special TLB handling ... where the dominant TLB association was virtual
address space ... but there was a special case for common segment
entries ... to eliminate all the (unnecessary) duplicate entries.

Maxim S. Shatskih

Nov 25, 2007, 12:01:07 PM
> Fine thanks. On my FreeBSD box, the protection between processes and a
> design that doesn't run everything as the superuser is much more important
> than putting code and data in separate address spaces.

If everything is in the same address space - then any non-superuser can read
anything, and the very notion of super/non-super users is null and void, like
it is in Win9x/Me.

> For the latter, recent x86 models have per-page no-execute protection,
> but I hear it doesn't help much.

NX bit is in PAE (64bit PTEs) mode only. Helps a lot.
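
For reference, the NX flag is the top bit of the 64-bit PAE (and
long-mode) page-table entry, and it only takes effect once the OS has
enabled EFER.NXE. A small sketch:

    /* Sketch: NX is bit 63 of a 64-bit PAE/long-mode PTE, honored only
     * when EFER.NXE is set. */
    #include <stdint.h>

    #define PTE_NX (1ULL << 63)

    static inline uint64_t mark_no_execute(uint64_t pte)
    {
        return pte | PTE_NX;   /* e.g. applied to stack and heap mappings */
    }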

Paul Kimpel

Nov 25, 2007, 12:37:18 PM
John's last statement is so bald and completely unsupported that I have
to take issue with it. Every architecture has performance and address
space issues, depending on the problem space in which you want to
consider it.

In the case of high-performance, numeric-intensive computing, the
Burroughs B5000/6000/7000/A Series machines (now known as Unisys
ClearPath MCP systems) admittedly do not stack up all that well. It's my
impression that for many this problem space is the only one in which to
measure "performance", but this facet of computing is only one of many.

In general server-oriented roles, especially OLTP and transactional data
base applications, the Unisys MCP architecture has superior qualities
and stacks up much better against other systems. The variable-length
segmentation is not entirely responsible for that, of course, but it
certainly is a contributing factor.

John's comment about address space is one that I really cannot
understand, though. All of the MCP machines going back to the B5500 have
had huge virtual address spaces, just not ones that are allocated all in
one piece. I'm not sure how you could compute the total virtual address
space limit on the current systems, but it's easily many trillions of
bytes PER TASK. You would hit some practical limitations, such as the
maximum size of the system's segment descriptor tables, long before
running out of virtual address space provided by the architecture. In
almost 40 years of working with these systems, I've never seen an
application on them that came even close to pressuring the virtual
address space, let alone exceeding it. There were serious PHYSICAL
address space problems with the B6700/7700 systems in the late 1970s and
early '80s, but these were resolved by the mid-80s in the A Series line,
and done so largely without impact on existing applications.

Maxim S. Shatskih

Nov 25, 2007, 12:40:06 PM
> segments as possible. The 386 segments were just as slow

So are segment register loads on any x86 CPU.

x86-style segmentation requires the _developer_ (not even the compiler) to
be aware of it.

First of all, there is the SS != DS issue, which is especially "fine" in
DLLs with their own data segment. MakeProcInstance for callbacks is a
second issue.

For tiny segments (286, including 16-bit Windows), huge pointers are a
major issue.

This, combined with the high cost of segment register loads and with
portability, caused most OS designers to abandon segment support in their
OSes in the timeframe of the late 1980s (design) and early-to-mid '90s
(market).

Rainer Weikusat

Nov 25, 2007, 2:09:44 PM
"Maxim S. Shatskih" <ma...@storagecraft.com> writes:
>> Fine thanks. On my FreeBSD box, the protection between processes and a
>> design that doesn't run everything as the superuser is much more important
>> than putting code and data in separate address spaces.
>
> If everything is in the same address space - then any non-superuser can read
> anything, and the very notion of super/non-super users is null and void, like
> it is in Win9x/Me.

First, the 'everything' that is supposed to be in the same address
space is supposed to refer to the address space of a single process,
which is flat, meaning a pointer to anything in this address space is
just a 0-based offset. Second, MMUs usually support per-page access
permission with different privilege levels. Otherwise, they would be
quite useless for implementing memory protection.

Louis Krupp

Nov 25, 2007, 2:19:07 PM
Paul Kimpel wrote:
> On 11/25/2007 7:29 AM, John L wrote:
>>> Any comments from the B5500/6700/A-Series builders or customers?
>>
>> I must admit I'd forgotten about the Burroughs machines. My
>> impression is that they're the healthiest segmented machines around
>> today, but they also suffer from performance and address space issues.
>>
>>
> John's last statement is so bald and completely unsupported that I have
> to take issue with it. Every architecture has performance and address
> space issues, depending on the problem space in which you want to
> consider it.

John said it was his "impression," which is something short of a claim
to knowing the one and only truth. I'm sure he's willing to stand
corrected, just as the rest of us are open to education.

>
> In the case of high-performance, numeric-intensive computing, the
> Burroughs B5000/6000/7000/A Series machines (now known as Unisys
> ClearPath MCP systems) admittedly do not stack up all that well. It's my
> impression that for many this problem space is the only one in which to
> measure "performance", but this facet of computing is only one of many.
>
> In general server-oriented roles, especially OLTP and transactional data
> base applications, the Unisys MCP architecture has superior qualities
> and stacks up much better against other systems. The variable-length
> segmentation is not entirely responsible for that, of course, but it
> certainly is a contributing factor.
>
> John's comment about address space is one that I really cannot
> understand, though. All of the MCP machines going back to the B5500 have
> had huge virtual address spaces, just not ones that are allocated all in
> one piece. I'm not sure how you could compute the total virtual address
> space limit on the current systems, but it's easily many trillions of
> bytes PER TASK. You would hit some practical limitations, such as the
> maximum size of the system's segment descriptor tables, long before
> running out of virtual address space provided by the architecture. In
> almost 40 years of working with these systems, I've never seen an
> application on them that came even close to pressuring the virtual
> address space, let alone exceeding it. There were serious PHYSICAL
> address space problems with the B6700/7700 systems in the late 1970s and
> early '80s, but these were resolved by the mid-80s in the A Series line,
> and done so largely without impact on existing applications.

Can you give an overview of how A-Series segmentation works? "RTFM" is
OK in my case, since I have the manual...

Louis

(I've trimmed follow-ups, since as far as I know, Linux hasn't been
ported to the A-Series. Of course, I'm willing to stand corrected on that.)

Maxim S. Shatskih

Nov 25, 2007, 4:10:10 PM
> just a 0-based offset. Second, MMUs usually support per-page access
> permission with different privilege levels.

No MMU supports ACLs based on users/groups for memory accesses. The only
support is kernel/user, read-only/writeable and sometimes - not on all CPUs -
execute/no-execute.

Joe Pfeiffer

Nov 25, 2007, 5:23:00 PM
John Ahlstrom <Ahlst...@comcast.net> writes:

It was sort of an editorial "we". But if Intel's paging
implementation had included an execute permission (before the recently
added NX bit), buffer overflow exploits wouldn't be a serious problem.

Tim Roberts

Nov 25, 2007, 10:25:03 PM
jo...@iecc.com (John L) wrote:

Control Data's last great mainframe series, the Cyber 180, was strongly
influenced by Multics. It used a segmented architecture, with 16 rings, of
which only 9 were actually used. It went into production in the mid-1980s,
although I doubt any of them are still in use today.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Joe Pfeiffer

Nov 25, 2007, 10:39:09 PM
"Maxim S. Shatskih" <ma...@storagecraft.com> writes:

>> just a 0-based offset. Second, MMUs usually support per-page access
>> permission with different privilege levels.
>
> No MMU supports ACLs based on users/groups for memory accesses. The only
> support is kernel/user, read-only/writeable and sometimes - not on all CPUs -
> execute/no-execute.

Why would it need to support ACLs? Since the page table is
per-process, and all the MMUs I'm familiar with support kernel/user,
just set the user-mode permissions based on the process owner's (or
whatever) access. No reason to make the MMU complicated to support
something that's easily done in software.

I do recognize not having execute/no-execute permission is a problem,
but that's independent of ACLs.

Josef Moellers

Nov 26, 2007, 5:00:03 AM
Thierr...@googlemail.com wrote:

> :) Thanks John, but after reading your post I got more confused.
> Your explanation means that DS:100 and CS:100 will point to the same
> byte/word in physical memory (as they go through the same page table
> entry and both have the same relative offset), although that physical
> location should hold two separate items... Please forgive my
> ignorance... it seems I am missing an elementary concept.

No, the compiler will not generate code like that. The compiler is aware
of the fact that it's generating code for a machine with "non-separate I
and D space", i.e. there is *one* linear address space per process, and
code and data share this address space. If you intend to write assembler
code, you'll have to be aware of that fact, too.
So, if you have 8k program code (i.e. 2 pages of 4k each) and 8k data
(i.e. 2 pages), then they will be mapped to different areas in this one
address space, e.g. the code might be mapped from 0..8k and the data
might be mapped to 128k..136k.
The other way round, if you have a NOP instruction at address CS:100,
then reading a byte from DS:100 will return 0x90, the opcode for a NOP
instruction.

HTH,

Josef
--
These are my personal views and not those of Fujitsu Siemens Computers!
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize (T. Pratchett)
Company Details: http://www.fujitsu-siemens.com/imprint.html
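
Josef's point can be demonstrated directly in a flat-model program. A
sketch (x86-specific and implementation-defined, since it reads code
bytes through a data pointer):

    #include <stdio.h>

    /* In the flat model, code is readable as data: a function's first
     * opcode byte can be fetched through an ordinary pointer. */
    void nothing(void) { }

    int main(void)
    {
        unsigned char first = *(unsigned char *)nothing;
        printf("first opcode byte of nothing(): 0x%02x\n", first);
        return 0;
    }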

Rainer Weikusat

Nov 26, 2007, 5:38:22 AM
"Maxim S. Shatskih" <ma...@storagecraft.com> writes:
>> just a 0-based offset. Second, MMUs usually support per-page access
>> permission with different privilege levels.
>
> No MMU supports ACLs based on users/groups for memory accesses.

What an MMU supports need not necessarily be describable in terms of
NTFS file-system access permissions, even though you may know these
better.

> The only
> support is kernel/user,

aka 'different privilege levels'

> read-only/writeable and sometimes - not on all CPUs -
> execute/no-execute.

aka 'access permissions'

Actual MMUs are not necessarily that simplistic. For instance, an
ARMv5 MMU supports per-page access permissions which can be 'no
access', 'ro-access in privileged mode, no access in user mode',
'ro-access in both privileged mode and user mode', 'r/w access in
privileged mode, no access in user mode', 'r/w access in privileged
mode, ro-access in user mode', or 'r/w access in privileged mode and in
user mode'. Additionally, each descriptor belongs to one of sixteen
'domains' and there exists a 'domain control register', which can,
individually for each process and every domain, grant either 'client'
or 'manager' access to memory belonging to that particular domain,
where 'client' accesses (in all modes) are checked against the access
permission bits, while manager accesses are not.
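
Those combinations come from two AP bits per page, with the AP=00 cases
further qualified by the CP15 S and R control bits; a simplified sketch
of the encoding (per the ARM architecture manual, details omitted):

    /* ARMv5 per-page access permissions, simplified. */
    enum armv5_ap {
        AP_NONE_OR_RO = 0,  /* 00: no access, or read-only, per the S/R bits */
        AP_PRIV_RW    = 1,  /* 01: r/w privileged, no user access            */
        AP_USER_RO    = 2,  /* 10: r/w privileged, read-only in user mode    */
        AP_USER_RW    = 3,  /* 11: r/w privileged and user                   */
    };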

BTW, the assumption that what one does not understand will certainly
be wrong is a common, but usually wrong one.

Michel Hack

Nov 26, 2007, 10:09:54 AM
On Nov 24, 11:45 pm, jo...@iecc.com (John L) wrote:

> IBM 390 mainframes had an addressing mode similar to what the
> 386 would have been if there were a page table per segment, but the
> newer Z series has flat 64 bit addressing.

Minor correction: The only difference between Z and S/390 (in this
regard) is the address space size. Z still supports Access-Register
mode (which is the one comparable to a machine with segmentation).
Note that what IBM calls "segments" are really just the second level
in the hierarchical page table, and Z adds three new "region" levels
to cover the larger address range.

Perhaps you meant to say that Z operating systems use a single flat
view where their S/390 predecessors exploited AR mode to get more
effective virtual address range (multiple 2G spaces).

Michel.

Anne & Lynn Wheeler

Nov 26, 2007, 5:20:17 PM
Anne & Lynn Wheeler <ly...@garlic.com> writes:
> and the newer "no-execute" ... countermeasure for (buffer overflow)
> attacks that pollute data areas with executable instructions and then
> attempt to get execution transferred there.

re:
http://www.garlic.com/~lynn/2007t.html#9 How the pages tables of each segment is located

hot off the press:

Buffer Overflows Are Top Threat, Report Says
http://www.darkreading.com/document.asp?doc_id=139871

from above:

Research data says buffer overflow bugs outnumber Web app
vulnerabilities, and some severe Microsoft bugs are on the decline

... snip ...

as before ... lots of past posts mentioning buffer overflow
threat/problems
http://www.garlic.com/~lynn/subintegrity.html#overflow

John L

Nov 26, 2007, 8:24:31 PM
>Perhaps you meant to say that Z operating systems use a single flat
>view where their S/390 predecessors exploited AR mode to get more
>effective virtual address range (multiple 2G spaces).

Right. Now that there's a 64 bit address space, I wouldn't expect anyone
to use AR mode other than for backward compatibility.

robert...@yahoo.com

Nov 26, 2007, 9:24:18 PM


Ah, no. AR mode is used extensively for communication between address
spaces. For example, let's say you issue an SQL request to DB2. The
DB2 subsystem can write the results of the Select directly into your
address space. That is its primary function.

A secondary use for AR mode is for dataspaces (and related entities,
like Hiperspaces), which *are* used for additional data storage that
won't fit in a single 2GB address space. It is reasonable to expect
that the use of dataspaces will decrease as more and more code becomes
64-bit aware, and that AR mode will end up mainly doing IPC.

John L

Nov 27, 2007, 1:18:56 AM
>> In general server-oriented roles, especially OLTP and transactional data
>> base applications, the Unisys MCP architecture has superior qualities
>> and stacks up much better against other systems. The variable-length
>> segmentation is not entirely responsible for that, of course, but it
>> certainly is a contributing factor.

I'd think that in OLTP, fast context switching would be important,
which you get from the stack architecture. How does the segmentation
help? Burroughs style segments are certainly helpful for reliability,
since they make it nearly impossible to clobber program code, but the
extra memory traffic to load all those segment descriptors has to be
paid for somehow.

>> John's comment about address space is one that I really cannot
>> understand, though. All of the MCP machines going back to the B5500
>> have had huge virtual address spaces, just not ones that are
>> allocated all in one piece.

The limit I'm wondering about is per-segment, not overall. On the 286,
there were plenty of segments (8K per process plus 8K global) but the
per-segment size was the problem.

Most of what I know about the Burroughs and descendants' architecture
is from Blaauw and Brooks. Their description is somewhat confusing
(not really their fault, since the hardware architecture is
phenomenally complicated), but as far as I can tell, each segment is
limited to 32K words. I realize that multidimensional arrays are
arrays of pointers, so each row of an array is a separate segment, but
do you never have structures or text blobs that don't fit in 15 bits
of intra-segment address?

>(I've trimmed follow-ups, since as far as I know, Linux hasn't been
>ported to the A-Series.

Too bad.

R's,
John

Stephen Sprunk

Nov 27, 2007, 10:59:37 AM
<Thierr...@googlemail.com> wrote in message
news:24e4d799-f790-4587...@s6g2000prc.googlegroups.com...
> I am just confused about the following and I hope someone can clear
> it up for me :)
> I have read about paging and I understand that on the Pentium the CR3
> register holds the starting address of the page table in memory.
> This register is generally called the Page Table Base Register (PTBR).
>
> Pure segmentation-based mapping also needs a table; however, it is
> often fully cached in the processor, as its size tends to be small.
>
> Now it seems the modern processor instruction set, implicitly or
> explicitly, deals with different segments in the application, i.e. the
> code is segmented. We don't have only one linear address space but
> many. However, Windows NT and Linux both ignore segmentation
> memory mapping since, according to the documentation, all the
> application segment descriptors are equal: they are all mapped
> into the same linear space with base = 0 and limit = 4GB (32-bit
> processors).
>
> Having said that, how can the page table of each segment be retrieved
> when there is only one PTBR in the system? There should not be only one
> PTBR!

32-bit x86 processors have a single "linear" address space that segments are
mapped into. Because the linear address space is 32-bit, instead of
something more sensible, common practice is to set all of the segments to
base=0, limit=4GB. That means the offset maps to the linear address space
and segmentation is effectively disabled, giving you a "flat" memory model.
This is _not_ what your textbooks call a "segmented" system, despite the
misleading presence of segments.
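
For concreteness, that flat setup amounts to a GDT like this (a sketch;
the constants follow the usual x86 descriptor encoding for base = 0,
limit = 0xFFFFF with 4K granularity, i.e. 4GB):

    /* Minimal flat-model GDT: a null descriptor plus ring-0 code and
     * data segments covering the whole 4GB linear space. */
    #include <stdint.h>

    uint64_t gdt[] = {
        0x0000000000000000ULL,  /* null descriptor         */
        0x00CF9A000000FFFFULL,  /* code: base 0, limit 4GB */
        0x00CF92000000FFFFULL,  /* data: base 0, limit 4GB */
    };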

64-bit x86 processors in Long Mode basically mandate the above behavior
since the base and limit of CS, DS, SS, and ES are all ignored. Of
course, AMD decided to make that mandatory because every 32-bit OS already
worked that way, so there was no point in wasting silicon to support
anything else.

(FS and GS still work as expected in both modes but are usually dedicated by
the OS to specific purposes, like per-CPU and per-thread structures, so
they're generally ignored by user code and compilers.)

Paging is what translates linear addresses to physical addresses, and each
process will have its own set of page tables. When the kernel switches
processes, it simply resets CR3 to the PTB for the new process, restores the
registers, and jumps back into the appropriate place in the process's code.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
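
A minimal sketch of the paging side of that switch (32-bit x86 with GCC
inline assembly; the task structure and its page_dir_phys field are
hypothetical names, not any particular kernel's API):

    /* Hypothetical per-process bookkeeping: the physical address of
     * the process's page directory. */
    struct task {
        unsigned long page_dir_phys;
    };

    /* Point CR3 at the next process's page tables.  Non-global TLB
     * entries are flushed as a side effect of the write. */
    static inline void switch_address_space(struct task *next)
    {
        asm volatile("mov %0, %%cr3" : : "r"(next->page_dir_phys) : "memory");
    }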

Dan Mills

Nov 27, 2007, 5:44:25 PM
On Tue, 27 Nov 2007 09:59:37 -0600, Stephen Sprunk wrote:
> When the kernel switches
> processes, it simply resets CR3 to the PTB for the new process, restores the
> registers, and jumps back into the appropriate place in the process's code.

IIRC there should be a TLB flush in there somewhere?

The interesting thing about this is that the jump back is just a standard
return to userspace, in the same manner as any kernel API call. This is
because the stack has actually been replaced as part of changing the
memory map.

Regards, Dan.

Stephen Sprunk

Nov 27, 2007, 7:26:50 PM
"Dan Mills" <dmi...@exponent.myzen.co.uk> wrote in message
news:474c9dc9$0$8413$db0f...@news.zen.co.uk...

> On Tue, 27 Nov 2007 09:59:37 -0600, Stephen Sprunk wrote:
>> When the kernel switches
>> processes, it simply resets CR3 to the PTB for the new process, restores
>> the
>> registers, and jumps back into the appropriate place in the process's
>> code.
>
> IIRC there should be a TLB flush in there somewhere?

IIRC, the processor will automatically (i.e. not explicitly software
directed) do a TLB flush any time CR3 is changed if it caches based on
virtual addresses. I'm not aware of any existing x86 processors that cache
based on physical addresses, but AIUI they're theoretically possible and
wouldn't require a TLB flush.

Josef Moellers

Nov 28, 2007, 4:20:02 AM
Dan Mills wrote:
> On Tue, 27 Nov 2007 09:59:37 -0600, Stephen Sprunk wrote:
>
>>When the kernel switches
>>processes, it simply resets CR3 to the PTB for the new process, restores the
>>registers, and jumps back into the appropriate place in the process's code.
>
>
> IIRC there should be a TLB flush in there somewhere?

No. "Updates to the CR3 register cause the entire TLB to be
invalidated except for global pages."
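
The "global pages" exception refers to bit 8 of a PTE, which the OS can
set on mappings (typically the kernel's) that should survive CR3
reloads; it has effect only when CR4.PGE is enabled. A sketch:

    /* Sketch: mark a PTE global (x86 PTE bit 8, requires CR4.PGE) so
     * its TLB entry survives CR3 writes. */
    #include <stdint.h>

    #define PTE_GLOBAL (1u << 8)

    static inline uint32_t make_global(uint32_t pte)
    {
        return pte | PTE_GLOBAL;
    }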

Dan Mills

Nov 28, 2007, 11:32:42 AM
On Tue, 27 Nov 2007 18:26:50 -0600, Stephen Sprunk wrote:
> "Dan Mills" <dmi...@exponent.myzen.co.uk> wrote in message

>> IIRC there should be a TLB flush in there somewhere?


>
> IIRC, the processor will automatically (i.e. not explicitly software
> directed) do a TLB flush any time CR3 is changed if it caches based on
> virtual addresses.

I stand corrected.

Regards, Dan.

Edward Reid

Dec 3, 2007, 12:14:20 AM
On Sun, 25 Nov 2007 04:00:21 -0800, John Ahlstrom wrote:
> Any comments from the B5500/6700/A-Series builders or customers?

No B6700 or its successors has ever been taken down by a buffer overflow.
Not even on a B6700 heavily loaded with mixed production and development
work in 1972. Never. How much is that worth?

Typically today these are called "MCP systems" by everyone but Unisys
marketing, since the marketdroids have changed the names so often that
nobody else can keep track of them. And all the rest of the multiple OSs
named MCP have faded away, so the term "MCP system" has become fairly
specific.

On Tue, 27 Nov 2007 06:18:56 +0000 (UTC), John L wrote:
> I'd think that in OLTP, fast context switching would be important,
> which you get from the stack architecture. How does the segmentation
> help?

Mostly it does not help with process switching. It helped more when memory
was scarce, because you did not need to make much in particular present in
order to swap out a segment. (And code segments are read-only and thus
could be replaced without swapping out.) But as with any machine, swapping
and overlaying was always a huge kludge; it didn't get around the fact that
disk is slower than RAM.

MCP systems have never been that fast on process initiation, termination,
and switching, even with the stack architecture to help. They are great in
that they carry a lot of information with the process, and that you can
switch from one very complex environment to another with no more overhead
than a trivial switch, but all in all really lightweight processes are not
part of the design.

> Burroughs style segments are certainly helpful for reliability,
> since they make it nearly impossible to clobber program code, but the
> extra memory traffic to load all those segment descriptors has to be
> paid for somehow.

Nah, they are just about always in the fastest CPU cache. Was probably more
of an issue in the B6700 days. However, in those days the fact that the
code is extremely compact made a lot of difference -- fewer memory fetches
for code, more likely to find the code word in cache. And the compactness
of the code was in part due to the work done in the CPU for the
segmentation and the stack.

>>> John's comment about address space is one that I really cannot
>>> understand, though. All of the MCP machines going back to the B5500
>>> have had huge virtual address spaces, just not ones that are
>>> allocated all in one piece.

This was Paul's comment. There is one caveat. It's true that the virtual
address space of a task has always been essentially unlimited. However, at
one time the physical address space was limited to 1 megaword (6MB). You
can't use a huge amount of virtual memory if your physical is severely
limited. This was becoming a major problem by the late 1970s. There were
several generations of systems with kludges which allowed more physical
memory on a system, but it was difficult for a single task to use more than
1MW. Finally -- I guess it was in the mid 1980s -- they completed the
architecture changes which allowed even a single task to use amounts of
physical memory which are still not common. (I think it runs into some
limitations at half a terabyte, but I'm not sure, and even that could be
very easily expanded within the current structure.) And again, that is a
limitation on physical memory, not virtual.

But this may have been where the idea about a limited address space came
from.

> The limit I'm wondering about is per-segment, not overall. On the 286,
> there were plenty of segments (8K per process plus 8K global) but the
> per-segment size was the problem.

On most current MCP systems, I can declare a single row of an array to be
up to 2^28 words. (The MCP will actually segment the physical memory
allocated to such a large array, but this is totally invisible to the
program. At one time this invisible segmentation carried a performance
penalty, but larger control stores have pretty much eliminated the
penalty.) But there's hardly ever a need to use a single array row that way
-- software which does so is usually doing structuring within a large flat
area, and on MCP system you would use at least some of the MCP's memory
structuring instead of doing it yourself.

There's probably still a limitation on the size of code segments. But the
compilers take care of that, and it's totally transparent. It's been years
since I was even aware of the size of code segments except in bizarre cases
where it would be a clue to some strange bug.

If you want a lot of big segments, just declare

ARRAY A [0:2**20-1, 0:2**20-1, 0:2**20-1];

(Note that exponentiation in Algol is **.) That gives you a trillion
array rows of a million words each. A googol of them would only take a couple
more lines. You would not be able to use them all due to physical memory
limitations and the number of lifetimes it would take to go through them,
but you could compile the program and access some random words from
anywhere just to prove the point.

Ironically, the most noticeable limit today is one you aren't likely to
hear about. IOs are still limited to 2^16 words (6 * 2^16 bytes).

> Most of what I know about the Burroughs and descendants' architecture
> is from Blaauw and Brooks.

I'm not familiar with their book. I recommend Elliott Organick's "Computer
System Organization: The B5700/B6700 Series". Used copies are not
numerous but are not hard to find either. It won't give all the details,
but it's well written. Three decades out of date, but the basics haven't
changed.

> Their description is somewhat confusing
> (not really their fault, since the hardware architecture is
> phenomenally complicated), but as far as I can tell, each segment is
> limited to 32K words.

Totally incorrect. I can't even guess where this misconception came from.
As mentioned above, the modern limit is about 2^28 words. In programming
terms it was never less than 2^20 words. There were various limits that
were smaller, but they did not affect programming.

Remember that most MCP segmentation was always transparent to the
programmer.

Code segments? Maybe code segments are limited to 2^15 words, though
actually 2^13 comes to mind. But it's totally transparent. When the
compiler fills up a code segment, it generates a branch to the next code
segment. End of problem.

> I realize that multidimensional arrays are
> arrays of pointers so each row of an array is a separate segment, but
> do you never have structures or text blobs that don't fit in 15 bits
> of intra segment address.

Huh? I do not even have a context to hang this in. Perhaps you are thinking
that structures have to be implemented within a segment? But no, if you
have a structure within a structure, the embedded structure is simply
represented by a descriptor, which has all the capabilities of a top level
descriptor. That's why the virtual address space is essentially unlimited.
This method does cause problems in copying structures, which is not a strong
point under the MCP.

If that's not it, please explain a bit more what type of structure you are
thinking of.

>>(I've trimmed follow-ups, since as far as I know, Linux hasn't been
>>ported to the A-Series.
>
> Too bad.

However, C was ported long ago, and POSIX compliant programs run pretty
well. I realize that's not the same thing by a long shot, but a lot of
useful programs have become available under the MCP as a result.

Edward
--
Art Works by Melynda Reid: http://paleost.org

Mike Hore

unread,
Dec 3, 2007, 2:14:45 AM12/3/07
to
Edward Reid wrote:

>...


> I'm not familiar with their book. I recommend Elliott Organick's "Computer
> System Organization: The B5700/B6700 Series". Used copies are not
> numerous but are not hard to find either. It won't give all the details,
> but it's well written. Three decades out of date, but the basics haven't
> changed.

http://bitsavers.org/pdf/burroughs/B5000_5500_5700/Organick_B5700_B6700_1973.pdf

Enjoy!!

Cheers, Mike.


---------------------------------------------------------------
Mike Hore mike_h...@OVE.invalid.aapt.net.au
---------------------------------------------------------------

--
Posted via a free Usenet account from http://www.teranews.com

Jim Haynes

unread,
Dec 3, 2007, 3:24:51 PM12/3/07
to
In article <q5cex7tglrbr.1wcaoin4j4mpa$.d...@40tude.net>,

Edward Reid <edw...@paleoNOTTHIS.org.NOTTHIS> wrote:
>On Sun, 25 Nov 2007 04:00:21 -0800, John Ahlstrom wrote:
>> Their description is somewhat confusing
>> (not really their fault, since the hardware architecture is
>> phenomenally complicated), but as far as I can tell, each segment is
>> limited to 32K words.
>
>Totally incorrect. I can't even guess where this misconception came from.

I would guess that it came from the B-5500 where there were only 15
bits for address. I was told once that this number came more or less
from the IBM 7090 family, which also had only 15 bits of address and
which were competitors to the B5500.

>>>(I've trimmed follow-ups, since as far as I know, Linux hasn't been
>>>ported to the A-Series.
>>
>> Too bad.
>
>However, C was ported long ago, and POSIX compliant programs run pretty
>well. I realize that's not the same thing by a long shot, but a lot of

I haven't looked at any of this, but will comment that it's hard to do
something like Linux in a Burroughs-style machine because C in a more
conventional machine lets you get away with things that the Burroughs
architecture would prohibit. And there are differences in coding
style. In the B5500 MCP there were lots of arrays, taking advantage
of the array descriptor facilities, but no structures. In Unix/Linux
there tend to be arrays or lists of structures. Things that are grouped
together in a structure in Unix tend to be scattered into a whole bunch
of arrays in MCP. In B5500 MCP a central datum was the mix index, a
small number representing a job currently in the mix and used as an index
into all the arrays of information about that job. In Unix the process ID
number serves somewhat the same purpose, but PIDs run into thousands, with
most of the available numbers being unused at any one time.

The B5500 was an outstanding batch job machine and a lousy time sharing
machine. The reason it was lousy was that absolute addresses get into the
stack, so when a job is swapped completely out it has to be swapped back
in to the same memory addresses it previously occupied. This was corrected
by an architectural change in the B6500. Also the disk I/O was slow
enough that swapping a large job in and out was seriously time consuming.
While there was a time sharing version of B5500 MCP, many sites instead
ran the batch MCP and an application called R/C or Remote/Card, which was
similar to Wylbur as used on OS/360. R/C allows a terminal user to
call up and edit a program file, submit it as a batch job for execution,
and then examine the output when the job completes.

Paul Kimpel

unread,
Dec 9, 2007, 3:16:44 PM12/9/07
to
On 11/25/2007 11:19 AM, Louis Krupp wrote:
> Paul Kimpel wrote:
>> On 11/25/2007 7:29 AM, John L wrote:
>>>> Any comments from the B5500/6700/A-Series builders or customers?
>>>
>>> I must admit I'd forgotten about the Burroughs machines. My
>>> impression is that they're the healthiest segmented machines around
>>> today, but they also suffer from performance and address space issues.
>>>
>>>
>> John's last statement is so bald and completely unsupported that I
>> have to take issue with it. Every architecture has performance and
>> address space issues, depending on the problem space in which you want
>> to consider it.
>
> John said it was his "impression," which is something short of a claim
> to knowing the one and only truth. I'm sure he's willing to stand
> corrected, just as the rest of us are open to education.
>
<snip>

>>
>> John's comment about address space is one that I really cannot
>> understand, though. All of the MCP machines going back to the B5500
>> have had huge virtual address spaces, just not ones that are allocated
>> all in one piece. I'm not sure how you could compute the total virtual
>> address space limit on the current systems, but it's easily many
>> trillions of bytes PER TASK. You would hit some practical limitations,
>> such as the maximum size of the system's segment descriptor tables,
>> long before running out of virtual address space provided by the
>> architecture. In almost 40 years of working with these systems, I've
>> never seen an application on them that came even close to pressuring
>> the virtual address space, let alone exceeding it. There were serious
>> PHYSICAL address space problems with the B6700/7700 systems in the
>> late 1970s and early '80s, but these were resolved by the mid-80s in
>> the A Series line, and done so largely without impact on existing
>> applications.
>
> Can you give an overview of how A-Series segmentation works? "RTFM" is
> OK in my case, since I have the manual...
>
> Louis
>
> (I've trimmed follow-ups, since as far as I know, Linux hasn't been
> ported to the A-Series. Of course, I'm willing to stand corrected on
> that.)


First, Louis' point concerning my comment on John L's "impression" is
well taken. As I read John's sentence concerning the Burroughs machines,
the first clause was an impression, but the second was a conclusion. If
that was not what John intended, then I apologize.

Louis asked if I could give an overview of how the Burroughs
segmentation works, and John Ahlstrom has privately encouraged me to do
so. I think I know this reasonably well from a software perspective, but
as John L points out, the architecture is phenomenally complicated. The
elements of the architecture are also extremely synergistic, and it's
impossible to talk about one aspect (such as segmentation) in the
absence of others. As a result, this is a very long post -- about 9900
words.

In order to understand the context in which memory addressing and
segmentation operate, I also have to talk to some degree about stacks,
compilers, codefiles, and the very tight integration between the
hardware architecture and the MCP (operating system) architecture. This
isn't easy, so please bear with me while I lay down some groundwork and
then to try to describe (without pictures) how segmentation and memory
addressing for this very complex, but fascinating architecture work.


Background.

I will start with a short history lesson. The origin of the Burroughs
stack/descriptor architecture was the B5000 of 1962. Bob Barton is
generally credited with the conceptual design of the architecture. This
machine was a major departure from anything Burroughs had attempted
before. It was clearly influenced by the Rice University (where Barton
had spent some time) computer, particularly by the Rice system's use of
"codewords", from which the concept of descriptors arose. I believe the
Ferranti Atlas was also an influence on the B5000 design. A somewhat
updated version of the architecture was released as the B5500 in 1964
and stayed in production until 1970. At the very end there was briefly a
version called the B5700, but it was really a B5500. (The B5900
mentioned below is a contemporary of the other Bx900 models and not a
variant of the B5500.)

Burroughs began work on a successor machine, the B6500, in 1965, but it
was not released until 1969. It had a very difficult introduction to
customer use, and did not work properly until it was updated and
re-released as the B6700 in 1971. All B6500 CPUs in the field were then
replaced with B6700s.

The B6500/B6700 was a substantially different design from the B5500 --
the two instruction sets were quite different, as were the format of
descriptors and other control words. Limits on the size of both physical
and virtual address spaces were increased dramatically. A dramatically
different way of handling character-oriented data was also introduced.
About the only things that got carried over without change were the word
size (48 bits), the integer and floating point word formats, and the
six-bit character code, known as BCL. The MCP operating system was also
completely redesigned. Nonetheless, the two designs are conceptually
similar, and both the B6500/6700 hardware and system software should be
seen as refinements of those for the B5500.

The Burroughs B6700 is the basis for the Unisys ClearPath MCP systems
that are still being sold today. There have been some tweaks to the
instruction set, and a major change was made in the mid-1980s to the way
memory is addressed to accommodate still larger physical address spaces,
but the processor architecture (at least from the perspective of
user-level software) has remained essentially the same since 1970. The
architecture is now referred to internally as "E-mode" (I think the "E"
stands for "emulation"), and its formal specification has gone through a
number of levels, known as Beta, Gamma, Delta, and (the latest) Epsilon.

The machines have had a variety of marketing names over the years. Since
the various names can be confusing to those not familiar with the
product line, here is a summary:

Burroughs B5000/5500/5700 -- the original early 1960s architecture.

Burroughs B6500/6700/7700/6800/7800/B5900/6900/7900

Burroughs/Unisys A Series (MicroA, A1-A7, A9-A13, A14-A19)

Unisys ClearPath 4600/4800/5600/5800/6800

Unisys ClearPath NX4200 (the first of the models where the hardware is a
standard Intel box and the MCP architecture is emulated on the Intel
processor. This approach is currently used on all small-to-medium
performance models)

Unisys ClearPath LX5000/6000/7000 (later emulated systems)

Unisys ClearPath Libra (these are also referred to as the "CS" series;
the Libra 300, 400, and 520 models are emulated systems)

Having so many models, we need a generic name, so I'll follow the
current Unisys convention and refer to them as MCP systems
(acknowledging that other, now obsolete, Burroughs/Unisys architectures
also used operating systems named "MCP").

With that background, I will try to describe how memory addressing and
segmentation works on the B6700, since discussing that particular model
gives a good conceptual base. The mechanism was similar, but more
primitive, on the B5500. This mechanism changed with the A Series to
expand the physical address space and implement improvements in some
other areas. I'll discuss those differences later. Even though the B6700
has been obsolete for 30 years, I'll talk about it mostly in the present
tense.


Words and Tags.

The best place to start is with word formats. The B6700 (like the B5500)
has a 48-bit data word. One of the significant changes from the B5500,
however, was the addition of some extra bits to the word, called the
"tag". The B6700 and later systems through E-mode Beta had a three-bit
tag. E-mode Gamma and subsequent systems use a four-bit tag.

The tag indicates what type of information the word contains. Generally,
tags 0, 2, 4, and 6 are data types that user-level code can freely
manipulate; the rest are control types that are used by a combination of
the hardware and operating system kernel. The tag values for the B6700 are:

0 = single-precision data word (numeric or character data)

1 = IRW (indirect reference word) or SIRW (stuffed indirect
reference word). These are effectively indirect addresses to
words in a program stack. A brief discussion of what "stuffed"
means is covered in the section on Addressing Data, below.

2 = double-precision data word (typically numeric)

3 = general purpose protected control word: stack linkage, code
segment descriptor. All words in a code segment (one containing
executable instructions) also must have tags of 3.

4 = SIW (step index word): control word for the STBR (step branch)
looping instruction -- a nice idea, but the performance was
poor, and STBR is no longer supported on current systems. This
tag value is now used by the MCP as a special marker word in
stacks (e.g., for fault trap indicators).

5 = data descriptor. These words are the basis for data
segmentation and are discussed in detail below.

6 = uninitialized operand. Words with this tag value can be
overwritten, but reading them generates a fault interrupt. Used
as the initial and NULL value for pointers.

7 = PCW (program control word). The entry point to a procedure
(subroutine). Also used as one form of dynamic branch address.

The additional tag values introduced with E-mode Gamma are primarily
used for optimizations of descriptors after a segment has been allocated
and are not significant for this conceptual discussion.

There are user-mode instructions that can read and set the tag on the
word at top of stack. You could theoretically do quite a bit of mischief
with this on the B6700, but the compilers are a trusted component of the
system, so in practice this was not a problem. In more recent systems,
the microcode carefully restricts which tag values can be set on data
words in the stack. The Set Tag instruction (STAG) running in user-level
code cannot now, for example, set tag 5 on a data word with bit 47=1
(i.e., create a present data descriptor -- see below for what this means).
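
To make the tagged-word idea concrete, here is a rough sketch in Python
(Python only because it is compact; the 51-bit layout with the tag above
the 48 data bits is the real concept, but the function names are mine):

    TAG_BITS = 3                    # four bits from E-mode Gamma onward

    def get_tag(word):
        # The tag rides above the 48 data bits of the word.
        return word >> 48

    def set_tag(word, tag):
        # Rough model of STAG: replace the tag, keep the 48 data bits.
        return (tag << 48) | (word & ((1 << 48) - 1))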

This brings up the subject of "codefiles" (files containing executable
code) and compilers. A codefile is a controlled entity under the MCP.
Codefiles can be generated only by compilers, which are programs that
must be "blessed" using a privileged MCP command. There is no assembler
for these systems, nor anything that allows a user to generate arbitrary
sequences of object code. Recompiling the source code for, say, the
COBOL-85 compiler does not generate a compiler, just an ordinary program
that inputs source code and outputs a file of instructions and other
run-time data. That file is not a codefile, however. It cannot be
executed and cannot be turned into a codefile. There is no facility for
a program (even one running with the highest privileges) to read that
file and present any of its data to the hardware as a code segment.
Words of instructions in a code segment must have tags of 3. There is no
way for user-level code to set those tags, let alone create a code
segment and branch to it. The I/O hardware has a variant of the disk
read operation that will apply tags of 3 to data as it is being read and
transferred to memory, but only the MCP has access to the physical I/O
hardware, so there is no way for user programs to initiate such an
operation.

[When I say "no way" here, I mean that the hardware/MCP architecture is
specifically designed to prohibit such a thing, and I'm not aware of
even a conceptual attack that could get around the design. There have
been some holes discovered in the past, though, and of course it's
entirely possible that there are more that have not yet been brought to
light. I won't speculate on the likelihood of their existence. To me,
the weakest part of this design is the trust that must be placed in the
compilers. If you can generate arbitrary code, or modify a valid
codefile externally (e.g., on a backup tape) and reload it, you can get
around at least some of the system's protections, but it's still not easy.]

This discussion of codefiles and compilers highlights a basic
characteristic of MCP memory segmentation: code and data are entirely
separate entities and are allocated and managed by the system
separately. There are code segments and data segments, and while they
are allocated from the same system-global heap and may be adjacent in
physical memory, logically they are separate and addressed entirely
differently. Both types of segments can be created only by the MCP. The
contents of code segments are loaded solely from codefiles. Code
segments are read-only, and as we will see, are automatically reentrant.


Data Segment Descriptors.

Descriptors are called such because they "describe" an area of memory.
MCP systems are a form of capability architecture, and the descriptors
are the capability -- you have to have access to the descriptor to
access the data it describes. Descriptors are the basis of memory
addressing and memory segmentation.

A tag-5 word in the B6700 architecture represents a data descriptor. The
word has a number of fields which I will identify using what is called
partial-word notation, [s:n], where "s" is the starting bit number and
"n" is the length of the field in bits. Bit 47 is high order and bit 0
is low order; all MCP systems use big-endian addressing.

[47:1] Presence bit (commonly known as the "P-bit")

[46:1] Copy bit

[45:1] Indexed bit

[44:1] Paged bit

[43:1] Read-only bit

[42:3] Element Size field
0 = single precision words
1 = double precision words
2 = four-bit characters (packed decimal)
3 = six-bit characters (BCL, now obsolete)
4 = eight-bit characters (EBCDIC)

[39:20] Length or index field (determined by [45:1])

[19:20] Segment starting address

Accessing the data in a segment is done by means of "indexing" the
descriptor with a zero-relative offset into the segment. There are a
series of instructions that do this, as will be detailed below.
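
To make the layout concrete, here is a minimal sketch in Python of how
these fields could be picked apart. Only the partial-word notation above
is assumed; the function and key names are mine:

    def field(word, s, n):
        # Partial-word notation [s:n]: n bits whose high-order bit is s.
        return (word >> (s - n + 1)) & ((1 << n) - 1)

    def decode_data_descriptor(word):
        return {
            "present":    field(word, 47, 1),
            "copy":       field(word, 46, 1),
            "indexed":    field(word, 45, 1),
            "paged":      field(word, 44, 1),
            "read_only":  field(word, 43, 1),
            "elem_size":  field(word, 42, 3),
            "len_or_idx": field(word, 39, 20),
            "address":    field(word, 19, 20),
        }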

The Presence bit indicates whether the memory area represented by the
descriptor is physically present in memory. It is the basis for the
virtual memory mechanism. "Virtual memory" is IBM's clever marketing
term that everyone has adopted, but it does not accurately describe what
goes on in MCP systems. "Automated memory segment overlay" is a more
accurate term, and is what Burroughs called it before IBM's term caught
everyone's attention.

When the Presence bit is 1, the segment is physically present in memory
starting at the real address in [19:20]. (Note that in A Series and
later systems, the real memory address is no longer contained in
descriptors, as will be discussed later).

When the Presence bit is zero, the segment is not present in memory (or
has never been allocated) and the descriptor is said to be "absent"
(although it's really the data area that is absent). Attempting to
access a segment through an absent descriptor generates a "Presence bit
interrupt" -- what other systems would call a page fault. When handling
this interrupt, the MCP interprets the value in the address field as
follows:

* If zero, this indicates that the segment has never been allocated.
The MCP simply allocates an area of length specified by [39:20],
relocating or overlaying other physically-present segments as
necessary. The MCP clears the allocated area to binary zeroes
(primarily to wipe out any nasty tags that may be there from prior
use of that space), fixes up the address field in the descriptor and
sets its Presence bit. Exiting from the interrupt handler causes the
hardware to restart the instruction which generated the interrupt.
The compilers generate code in the initialization section of
procedures to build these "untouched" descriptors directly in the
program's stack. The physical memory area is not allocated until
(and unless) the program actually references it.

* Some types of objects require additional information from the
compiler (e.g., for multi-dimensional array-of-arrays structures,
the length field in the descriptor only specifies the length of the
first dimension, so certain bit patterns in the address field of
untouched descriptors point to data structures the compiler has
built within the codefile that carry the necessary information for
the other dimensions).

* Otherwise, the segment was once present in memory but has been
rolled out, usually due to pressure from other memory allocation
activity. The value in the address field of the descriptor indicates
the location within the task's "overlay file" (a temporary, unnamed
file the MCP allocates for each task) where the data associated with
this descriptor has been rolled out. The MCP allocates an
appropriate area of memory, reads the segment from the overlay file,
fixes up the descriptor with the new real memory address, and exits
the interrupt handler to restart the instruction that was
interrupted.
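
In very rough outline, the handler's job looks like the Python sketch
below. The helper names and the toy memory model are my inventions, and I
am ignoring the array-of-arrays case; the real MCP logic is considerably
more involved:

    class DD:                       # toy stand-in for a data descriptor
        def __init__(self, length):
            self.length, self.address, self.present = length, 0, 0

    MEMORY, NEXT = {}, [1]          # toy backing store and allocator

    def allocate(length):           # invented helper, not an MCP name
        addr = NEXT[0]; NEXT[0] += length
        MEMORY[addr] = [0] * length # cleared, wiping any stale tags
        return addr

    def presence_bit_interrupt(dd, overlay_file=None):
        if dd.address == 0:         # never allocated: first touch
            dd.address = allocate(dd.length)
        else:                       # rolled out: address is an offset
            addr = allocate(dd.length)        # into the overlay file
            MEMORY[addr] = overlay_file[dd.address]
            dd.address = addr
        dd.present = 1
        # exiting the handler restarts the faulting instruction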

The Copy bit indicates that a tag-5 word is not the original descriptor
for an area, but rather a copy of it. A non-copy descriptor is said to
be the "mom" descriptor for an area, and there can only be one such mom.
Copy descriptors are generated automatically by a number of
instructions, principally those involved with indexing and loading words
on the top of stack (e.g., to pass an array as a parameter to a
procedure). Present copies point directly to the segment's memory area;
absent copies point to the mom.

In the B6700, there is no central page table, per se, to keep track of
memory areas. The mom descriptor effectively serves the purpose of a
page table entry. Every area of physical memory (allocated or available)
is surrounded by a set of "memory link" words that are used by the MCP
memory allocation routines to keep track of allocated areas and locate
available areas of appropriate size. For allocated areas, one of these
link words points back to the mom descriptor. The handling of moms and
copies is another thing that has changed with the A Series and later models.

The Indexed bit indicates whether the descriptor points to a whole
segment or to one element within a segment. When the bit is 0, the
length/index field contains the length of the segment. Some indexing
instructions generate an "indexed" copy descriptor with both the Indexed
and Copy bits set to 1 (all indexed descriptors are by definition
copies). Indexed descriptors are typically used as a pointer, e.g., as a
destination address for store operators or as a call-by-reference
parameter to a procedure. They are also used as starting addresses for
string manipulation instructions. Algol has a "pointer" type; it
represents an indexed data descriptor.

In an indexed descriptor, the length field is replaced by a
zero-relative offset into the segment. The physical address of that
element (assuming the segment is present in memory) is the sum of the
base address in [19:20] and the offset in the length field.

Depending on the value of the Element Size field [42:3], a descriptor
could be word oriented or character oriented. Physically, memory is
accessed as whole words. When a character-oriented descriptor is
indexed, its length field is replaced by a specially-encoded offset.
Bits [35:16] are a word offset within the segment; bits [39:4] are a
character offset within that word. This obviously limits the offset for
character pointers to 393215 for eight-bit characters and twice that for
four-bit packed decimal digits (although this limit can be effectively
eased by the use of paged areas, discussed next). The string
instructions understand these character offsets and transparently handle
character operations that start in the middle of words.
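
A sketch of the split, assuming eight-bit characters (six to a 48-bit
word); the function name is mine:

    CHARS_PER_WORD = 6              # eight-bit EBCDIC in a 48-bit word

    def char_index_fields(char_index):
        word_off = char_index // CHARS_PER_WORD   # bits [35:16]
        char_off = char_index % CHARS_PER_WORD    # bits [39:4]
        assert word_off < (1 << 16)               # hence the 393215 limit
        return word_off, char_off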

The Paged bit indicates whether the memory area for the descriptor is
monolithic (0) or paged (1). If paged (the original term for this is
"segmented", but talking about segmented segments can be confusing), the
address field does not point to the address of the data, but instead to
a vector of other descriptors. Each of these descriptors in turn points
to a data segment. In memory, this looks identical to a two-dimensional
array-of-arrays structure. The difference is in how the software
accesses the data. With a standard two-dimensional array, the software
must explicitly compute and apply an index for each dimension. With a
paged segment, the software is unaware of the second dimension. When the
indexing instructions encounter a descriptor with the Paged bit set,
they partition the index value into a page number and page offset and
automatically re-index the second dimension. If an indexed copy is
generated, it is a copy of the descriptor for the page, not the original
(paged) descriptor, and the index offset will be that within the page.
This helps alleviate the smaller limit for word offset available with
indexed character descriptors.

For the B6700, the page size was 256 words. Data areas were typically
paged when they exceeded 1024 words, but the programmer had some control
over this. The page size on current systems is 8192 words. The paging
threshold is adjustable by the system administrator, and by default is
also 8192 words. As with all memory areas in the system, the individual
pages are not allocated until first referenced. The array of page
descriptors (called a "dope vector") is not even allocated until one of
the pages is initially accessed. All pages are individually relocatable
and overlayable.
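
The index split the hardware performs amounts to this (a Python sketch;
the dope vector stands in for the vector of page descriptors):

    PAGE_SIZE = 256                 # B6700; 8192 words on current systems

    def index_paged(dope_vector, index):
        # Software never sees the second dimension; the hardware
        # re-indexes the page descriptor automatically.
        page, offset = divmod(index, PAGE_SIZE)
        return dope_vector[page], offset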

String operations that run off the end of a physical memory area
generate a "segmented [paged] array" interrupt. The MCP responds to this
interrupt by locating the next page in sequence (if there is one) and
restarting the operation. If there is no next page (which would also be
the case if the string operation ran off the end of an unpaged segment),
the result is a "Segmented Array Error", with which everyone who has
programmed for MCP systems is no doubt intimately familiar. Unless
trapped, this error (a form of boundary violation) is fatal to the program.

The Read-only bit is just that -- if 1, it marks the segment as
non-writeable. This is primarily used for segments that represent
constant pools generated by the compiler and which are loaded from the
codefile.

The Element Size field, as discussed above, determines whether the
descriptor represents single precision word, double precision word, or
character-oriented data. It is quite common, especially with COBOL
programs, to have multiple descriptors with differing size fields
pointing to the same segment. Only one of these could be the mom, of
course; the rest would have to be copies. This allows the software to
address the same memory area as a mixture of word and character fields.

The B5500 strictly used six-bit characters; the B6700 was basically an
eight-bit EBCDIC machine, but could also handle six- and four-bit
characters. Support for the six-bit codes was dropped from the
architecture in the Bx900 models, ca. 1980. I understand that the latest
Libra models have added support for 16-bit characters.

The role of the Length/index and Address fields has been covered in the
discussion above, so nothing additional needs to be said about them here.


Code Segment Descriptors.

Thus far, the discussion has been about data descriptors. There are also
descriptors for code segments. They are similar to data descriptors, but
simpler. Code segment descriptors have a tag of 3 and the Presence bit,
Copy bit, Length, and Address fields of data descriptors. Bits [45:6]
are not used. Code segments cannot be indexed or paged. They are by
definition read only. The only element size they support is single
precision words.

Code segment descriptors live in a special type of data segment called
the Segment Dictionary. An image of this segment is built by the
compiler (all descriptors being in their absent, untouched form, of
course) and stored in the codefile. The Segment Dictionary is loaded by
the MCP as part of task initiation. In addition to code segment
descriptors, the Segment Dictionary may contain (read-only) data
descriptors for constant pools and scalar constant values. The Segment
Dictionary in memory is actually a type of stack, although not for
push/pop type of activity. As we will see, stacks in MCP systems are a
central element in the addressing environment, and it is for this
purpose that a Segment Dictionary is loaded as a stack. Segment
Dictionaries are also sharable -- if multiple tasks are initiated from
the same codefile, the Segment Dictionary is loaded only once and the
separate tasks are linked to this common copy. Thus, all of the object
code and read-only constant pools for a program are automatically reentrant.

Object code is addressed using a three-part index. The first part is the
"segment number", which is the code segment descriptor's offset within
the Segment Dictionary. The second part is the word offset within the
segment. The third part is the instruction syllable (byte) offset within
the word. These numbers are zero-relative and generally written in hex,
so an address of 03C:0041:3 indicates segment #60, word offset 65, byte
offset 3.
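
Parsing such an address is trivial, but a sketch may help fix the format
in mind (all three fields are hexadecimal):

    def parse_code_address(text):
        # "03C:0041:3" -> (segment 60, word 65, syllable 3)
        seg, word, syl = text.split(":")
        return int(seg, 16), int(word, 16), int(syl, 16)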

The processor uses variable-length instructions and can branch to any
syllable offset within a segment. By using one of several dynamic branch
instructions with a Program Control Word (PCW, tag 7) as an argument, a
program can branch across segments -- in fact PCWs allow programs to
branch and call procedures across stacks. When a program branches to a
different code segment (either directly or by means of a procedure
call), the segment is made present if it is not already so, and the
segment length and base memory address are loaded into registers within
the processor designated for that purpose. Intra-segment branches use
only the word and syllable offsets, and therefore do not need to
continually reference the code segment descriptor.

The Presence bit in a code segment descriptor indicates physical
presence or absence of the segment in memory, just as for data segments.
For an absent descriptor, the address field indicates the offset within
the codefile where the segment starts. Code segments and constant pools
are not loaded from the codefile until first reference. When a program
is initiated, the Segment Dictionary is the only thing that is loaded
from the codefile. Everything else is loaded as a result of Presence bit
interrupts.

Since codefiles and code segments are guaranteed to be read only, code
segments (and constant pools loaded from the codefile) are never rolled
out to an overlay file. The physical memory area is simply deallocated
and the codefile offset restored in the absent descriptor (that offset
is stored in one of the memory link words while the segment is present
and the descriptor's address field points to the area in memory). The
code or data is simply reloaded from the codefile the next time a task
trips over the descriptor's Presence bit.


Resizing Segments.

Descriptors obviously support bounds checking, along with the dynamic
relocation and overlay of real memory areas, but they have another
significant advantage -- the dynamic resizing of data segments. Since
the length of a segment is part of the descriptor, the basis for bounds
checking is centralized. The physical memory area can be resized by a
user program at run time, the length in the descriptor will be updated,
and future indexing operations will check against the new length.

The B6700 and all later systems support the programmatic resizing of
data segments. Code segments are immutable, so resizing them is not a
meaningful operation. Support for resizing varies by language, but in
Algol, it is performed using the RESIZE intrinsic procedure.

RESIZE(A,N) resizes the array A to a new length of N. The unit of N
is determined by the Element Size field in the descriptor. The old
segment is deallocated and its contents are lost. The segment with
the new length will not be allocated until it is next referenced.

RESIZE(A,N,RETAIN) allocates a new segment of length N and copies up
to N units from the old segment to the new one. If the new length is
shorter, any remaining units from the old segment are lost; if the
new length is longer, the remainder of the segment is filled with
binary zeros. Once the data is copied to the new segment, the old
segment is deallocated.

RESIZE(A,N,PAGED) is similar to the RETAIN option, but creates the
new data segment with a paged data descriptor.

RESIZE works on individual segments. In the case of multi-dimensional
array-of-arrays structures, the row for each final dimension can be
resized independently and to differing lengths.

Actually, RESIZE can be applied to any of the dimensions of a
multi-dimensional array. Given an Algol declaration of

ARRAY M [0:99, 0:49, 0:63, 0:4095];

RESIZE(M[4,7,*,*], 75, RETAIN) would resize the M[4,7] dope vector in
the third dimension from its original length of 64 to a new length of
75. This would create 11 new, untouched descriptors of length 4096 at
the end of that dope vector.

These resizable array-of-arrays structures are often used, especially in
Algol, to implement flexible, safe, dynamic storage allocation schemes
for user programs.
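
For those who think better in code, the RETAIN behavior amounts to the
following (a Python sketch of the semantics only, with a list standing in
for the memory segment):

    def resize_retain(row, n):
        # Copy up to n units to the new segment; zero-fill any growth.
        # Shrinking simply loses the units beyond the new length.
        return row[:n] + [0] * max(0, n - len(row))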


Addressing Data -- Indexing and the Stack.

Note that in the discussion thus far, nothing has been said about
user-level programs having access to memory addresses -- real or
virtual. Data descriptors for the B6700 can hold real memory addresses,
but the content of descriptors is controlled only by hardware
instructions and the MCP, not user-level code. User-level code (and all
but small parts of the MCP kernel, for that matter) does not manipulate
addresses, it manipulates offsets into memory areas. Separate memory
areas generally relate to separate objects for a programming language
(e.g., arrays for Algol, Pascal, and Fortran, or 01 levels for COBOL).

Since there are no addresses, there can be no invalid addresses. You can
try to use an invalid offset, of course, but that offset must be applied
against a descriptor, where it is ALWAYS checked against the length
field. If the offset is less than zero or greater than or equal to the
length, your task becomes the happy recipient of the "Invalid Index"
bounds violation interrupt. Depending on the language you are using, you
can trap this interrupt (actually, what you trap is the MCP's response
to the interrupt), but you can't restart the operation which caused it
-- your only options are to branch to a recovery routine or enter the
catch portion of a try/catch block. If you do not trap the interrupt,
your task is terminated by the MCP.

Another thing that has not been mentioned yet is registers. There are
certainly registers in the processor -- the B6700 had dozens of them --
but the instruction set accesses them implicitly. None of them are
accessed directly by user-level code. (Actually, on the B6700 that
wasn't quite true -- compilers would generate code to access some of the
registers directly, but this was not common. Access to these registers
has been tightened up considerably on more recent systems). Instead of
loading addresses directly into registers and using those to read and
store real or virtual memory locations, user-level programs on MCP
systems access data in two ways: (a) directly in a stack or (b) outside
of a stack by going through a descriptor that is in a stack.

A stack in the MCP architecture is simultaneously three things:

(a) a memory area for push/pop expression evaluation,
(b) the stack frames (activation records) and return history for
procedure calls, and
(c) the basis for addressing global and local variables in a
program.

The B5000 and all of its descendants were designed to support Algol. To
understand how stack addressing works, you need to understand how Algol
programs can be structured. If you don't know Algol, think Pascal --
they're very similar.

An Algol program is constructed as a series of nested blocks. Each block
can contain local variables (including procedure declarations), but also
has addressability to the variables in all of its more global (i.e.,
containing) blocks. The body of a procedure can be considered a block,
whether it has local declarations or not. The scope rules for
identifiers in this scheme are the same as for a two-level language such
as C, it's just that there can be more levels. In MCP systems, this
nesting is referred to as the "lexicographical level" (lex level or LL).
The global code ("outer block") of most user programs runs at LL=2.
First-level procedures (such as you would have in a C program) run at
LL=3. Any procedures declared within those first-level procedures
(something you can't do in C) would run at LL=4, and so forth.

The Segment Dictionary is loaded as a stack with an environment of LL=1.
Therefore, multiple tasks initiated from the same codefile see their
code segments and constant pools as globals at a higher level -- hence
the Segment Dictionary and all of the segments based on it are
reentrant. The MCP stack (another that is purely an addressing
environment, not a data space for a task) is at LL=0. Note that stack
addressing can cross stacks. Also note that on the B6700, all MCP
globals were in the scope of the user programs. This allowed them to
call MCP service procedures directly, as if they were global to the user
program (which, in fact, they were). There is no "service call" or
"branch communicate" or other major environment change to access O/S
services -- user programs simply call what to them are global procedures.

This accessibility to the MCP stack was recognized as a serious security
issue fairly early on, and later systems blocked direct access to LL=0.
User programs now still call MCP service routines as normal procedures,
but access to the procedure entry points and other global objects is
provided through indirect addresses in the Segment Dictionary. These
indirect addresses are initially set up to cause a fault interrupt when
first accessed, at which point the MCP verifies the access and replaces
the intentionally bad word with a valid SIRW (tag=1, Stuffed Indirect
Reference Word) to the MCP global. Subsequent references to the service
routine merely require a one-level indirection to reach the entry point.

The "stuffed" for an SIRW refers its ability to address a word in a
stack outside the current scope chain, and in particular, across stacks.
What gets stuffed in the word is sufficient environment information to
allow out-of-scope, cross-stack addressing. A normal IRW (also tag=1)
only addresses within the current scope chain. There is an instruction
(STFF) that converts a normal IRW to an SIRW. Perhaps inevitably, it is
commonly called the "get stuffed" operator. (Current systems no longer
generate normal IRWs -- all reference words are now generated as SIRWs
unconditionally, so STFF has become a no-op).

The local variables for a block are allocated in the stack. In the case
of procedures, this provides efficient recursion. Simple blocks (i.e.,
nested BEGIN/ENDs containing declarations) are implemented as
parameterless procedure calls. Therefore, more-global variables are
lower in the stack, or possibly in a separate stack (note that unlike
some systems, MCP stacks grow from low addresses to high addresses -- a
push increments the top-of-stack address rather than decrementing it).

Addressing within the current scope chain is a very common thing to do,
so MCP systems provide a series of base registers, called the "D" (for
"display") registers. There is one D register for each lex level, and it
contains the absolute (real) memory address to the base of a block's
stack frame. The B6700 had 32 such D registers, but this proved (for
once) to be more than necessary. Later systems cut back to 16 D
registers (allowing user procedures nested 13 levels deep -- I doubt
that I've ever coded anything that goes more than four levels deep). The
B5900, which was the first microcoded processor and was based on
bit-slice chips, tried to get by with four D registers (0, 1, 2, and
current LL), but that didn't work too well, and that approach was
abandoned in later designs.

The simplest addressing mode for MCP systems is based on these D
registers and uses a construct known as an "address couple". The address
couple has two fields, LL (which selects a D register) and an offset
from the address in that D register. This is written "(LL,offset)" --
thus (2,17) refers to the 17-th (zero relative) word in the global
(LL=2) address space of a program. For an Algol program, this address
space would be the outer block; for COBOL, it would be WORKING-STORAGE;
for FORTRAN, it would be COMMON; for a C program it would be the
environment for static declarations.
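
Mechanically, resolving an address couple is just a base-plus-offset
fetch. A Python sketch, with D standing in for the display registers and
memory for the machine's word store (names are mine):

    def resolve(ll, offset, D, memory):
        # Address couple (LL,offset): offset from the base in D[LL].
        return memory[D[ll] + offset]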

With that introduction to stack addressing, here is the key concept:
scalar variables are allocated in the stack; non-scalar variables
(arrays, structures, records, whatever) are allocated outside the stack.
A descriptor is allocated in the stack to access that non-scalar area.
To illustrate this, here is a simple Algol program:

1: BEGIN
2: INTEGER I; % I is allocated at (2,2)
3: ARRAY A[0:99]; % descriptor for A at (2,3)
4: I:= 1;
5: DO
6: A[I]:= A[I-1]+I
7: UNTIL I:= I+1 > 100;
8: END.

The stack offset for I starts at 2 because there are two linkage words
at the base of a stack frame for (a) the procedure return address and
(b) a control word called the MSCW (Mark Stack Control Word) that allows
the processor to reconstruct the D registers to the values for the
calling environment upon procedure (or in this case, block) exit.

Here is the code that Algol would generate for this snippet (note that
this is slightly idealized and out of order from what the current Algol
compiler would generate, but both the concept and effect are accurate).

Line 1: (nothing)

Line 2: ZERO push a zero value onto the stack at (2,2)

Line 3: LT48 000006400000 push a skeletal descriptor with length=100
onto the stack at (2,3)
LT8 5 push a literal 5 onto the stack
STAG set a tag=5 onto the skeletal descriptor; this
leaves an untouched descriptor at (2,3)
PUSH flush the top-of-stack (TOS) words into memory

LT16 2048 push literal 2048 (this is a marker word for the
MCP BLOCKEXIT procedure called at the end)
BSET 47 set bit 47 in TOS word
LT8 6 push a literal 6
STAG set tag=6 on the BLOCKEXIT marker word

Line 4: ONE push a literal 1 onto the stack
NAMC (2,2) Name Call: push an IRW for I onto the stack
STOD Store Destructive: store the 1 to the (2,2)
address; pop both the IRW and literal 1

Line 5: (nothing)

Line 6: VALC (2,2) Value Call: push a copy of the value of I
NAMC (2,3) push an IRW to the descriptor for A
INDX Index: pop both parameters; push an indexed copy
descriptor for A[I] (think: address of A[I])
VALC (2,2) push another copy of I
ONE push a literal 1
SUBT Subtract: pop both parameters and push the
value of I-1
NAMC (2,3) push IRW to the descriptor for A
NXLV index and load value: index A by I-1; pop both
parameters and push the value of A[I-1]
VALC (2,2) push a copy of the value of I
ADD pop top two values; push the value of A[I-1]+I
STOD store A[I-1]+I in A[I]; pop both addr & value

Line 7: VALC (2,2) push a copy of the value of I
ONE push a literal 1
ADD pop top two values and push the value of I+1
NTGR integerize TOS word with rounding
NAMC (2,2) push an IRW for I onto the stack
STON Store Nondestructive: store I+1 back into I;
pop the address but leave value I+1 on TOS
LT8 100 push a literal 100 onto the stack
GRTR Compare Greater: [(I+1) > 100]; pop both values;
push result of comparison: 1 if true, 0 if false
BRFL 4:4 branch false: if low-order bit of TOS word is 0,
branch to Line 6 (word 4, syllable 4 in the
current code segment); pop the TOS word

Line 8: MKST construct and push a MSCW (Mark Stack Control
Word) in preparation for a procedure call
NAMC (1,4) push an IRW to the PCW for the MCP's BLOCKEXIT
procedure (actually, for an MCP intrinsic, it
would be an IRW to an SIRW in the Segment
Dictionary to the PCW in the MCP stack for
BLOCKEXIT). This procedure is not passed any
parameters, but if it were, they would be pushed
into the stack at this point.
ENTR Enter: call the BLOCKEXIT procedure
EXIT Exit Procedure: cut back the stack (thus
destroying this activation record), and exit the
outer block of the program (this exits back into
an MCP procedure which terminates the task and
disposes of the stack's memory and related
resources)

When this program is initiated, the MCP reads some information from the
codefile that tells it how to set up the data stack, including a
recommended initial size. If the Segment Dictionary is not already
present (due to another task executing the same codefile), a "code
stack" is allocated for the Segment Dictionary and its image is loaded
from the codefile. There is a base area of the data stack that the MCP
uses for task management, which it also sets up. No program globals are
loaded, however -- this will be done by stack-building code generated by
the compiler for the outer block's data segment (as is shown for I and A
in the example above). Instead, the MCP creates a dummy stack frame that
makes it appear as if this task has called a procedure, but the return
address from that call is set up as the entry point to the outer block's
code segment.

The MCP also constructs a TOSCW (Top of Stack Control Word) at the base
of the stack, which tells the hardware how to find the top of stack and
the base of the top stack frame. From that, the processor can
reconstruct all of the stack linkage, D registers, return instruction
address, and so forth. After building the initial stack image, the MCP
simply links the task into the READYQ, the prioritized list of tasks
waiting for a processor. Once the task rises to the top of this queue, a
processor is assigned to it, at which point the processor "exits" into
the entry point.

The B6700 has an instruction, MVST (Move to Stack), that switches the
processor from its current stack to another one. This instruction
constructs a TOSCW for the current stack and uses the TOSCW for the new
stack to reconstruct stack linkage and register settings. Later systems
did context switching in different ways, but it appears that on the
current Libra systems, MVST is once again how it's done.

Note that once the MCP sets up the initial stack image and releases the
new task to the READYQ, all further saving and restoring of registers
and other state information is handled automatically by the hardware.
Since all registers have specific purposes (i.e., there are no
general-purpose registers being used who knows when and for what), the
hardware knows when the value of a register needs to be pushed into
memory or recalled. This applies not only to context switches between
tasks, but also to all procedure calls. Hardware interrupts are
implemented as a forced procedure call on the stack that currently has
the interrupted processor, so the same state-saving mechanism is used
for interrupts as well.

Returning to the Algol example above, the very first thing that happens when
the processor exits to the entry point is a Presence bit interrupt as it
detects the Presence bit is zero in the descriptor for the outer block's
code segment. Execution continues once this code segment is made present.

The stack-building code at the beginning of the outer block creates the
local variables for the stack frame and pushes them onto the stack. In
the case of integer I, this is simply a literal zero; in the case of
array A, the code constructs an untouched data descriptor of length 100.

The 100 words of memory for the array will not be allocated until the
descriptor is first "touched" and its zero Presence bit detected. This
will happen the first time the NXLV (Index and Load Value) instruction
is executed in Line 6. Note that the INDX (Index) instruction executed
earlier does not cause a Presence bit interrupt, since it only generates
an indexed copy descriptor and does not attempt to access an array
element. The INDX instruction effectively acts as a "load address"
instruction. Bounds checking takes place on both INDX and NXLV, however.

The program then proceeds to initialize the value of I (some compilers
would fold this assignment into the stack-building code, but Algol does
not), and execute the DO loop that iterates through the elements of
array A. To someone used to register-based architectures, this code
probably looks like it generates a lot of memory accesses -- all those
VALC and index operators, not to mention the stack pushes and pops. On
the B6700 that certainly was true, as there was essentially no caching,
except for two TOS registers. More recent implementations use caching
extensively, however, and most of the apparent memory references would
stay inside the processor.

Another thing that may be apparent is that Algol does very little
optimization. It is a one-pass compiler and, for better or worse, emits
instructions in pretty much a what-you-code-is-what-you-get manner.

This program contains an intentional bug, which becomes apparent on the
last iteration of the DO loop. The value of I is 100, which is greater
than the upper bound of array A. When I compiled this and ran it on an
MCP system, I got the following message from the MCP:

2228 MSRHI3:INVALID INDEX @ (00100600)
2228 HISTORY: 003:0001:3 (00100600).
F-DS 2228 (PAUL)OBJECT/SIMPLE/ALGOL ON OPS.

This indicates a bounds violation on line 100600 (a sequence number that
is part of the source file line) at code segment address 003:0001:3,
which is the INDX instruction for Line 6 in the example above. This
bounds checking is not due to any debug or diagnostic mode I enabled for
the compiler or the object code -- it's implicit in the segmented
addressing mechanism for the architecture and cannot be turned off.

The value 2228 is the MCP-assigned task number. F-DS indicates the
program was terminated (discontinued, or "DS-ed" in MCP parlance) due to
a fault interrupt. Although it is not apparent from this example, the
HISTORY line is a trace of return addresses -- it shows the history of
procedure calls that got to the point where the fault occurred.

Assuming this bug did not exist (i.e., the comparison on line 7 was ">
99" instead of ">100"), the loop would have terminated when the value of
I reached 100 and control would have fallen into the END statement for
the block. The NTGR instruction for line 7 is due to the numeric format
used with all MCP systems since the B5000 -- integers are implemented as
a subset of floating-point values, and integer overflow generates a
floating-point result. NTGR normalizes the TOS value as an integer and
generates a fault interrupt if it exceeds the limits of integer
representation (+/- 2**39-1).
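
A sketch of the NTGR semantics as I understand them (the rounding detail
is a simplification):

    INT_LIMIT = 2**39 - 1           # NTGR faults beyond +/- this value

    def ntgr(x):
        n = round(x)                # integerize TOS word with rounding
        if abs(n) > INT_LIMIT:
            raise OverflowError("integer overflow fault")
        return n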

The call to BLOCKEXIT at the end of the code for the block is generated
by the compiler to dispose of any complex objects (arrays, files, etc.)
that were declared in this stack frame. The compiler generates a tag-6
marker word at the end of stack-building code that serves as a parameter
to BLOCKEXIT. This marker word contains a bitmask indicating which types
of resources BLOCKEXIT should look for. Failure to call BLOCKEXIT when
required would result in memory leaks, and the presence of this call is
another example of the trust the system places in its compilers. More
recent E-mode levels include a "blockexit bit" in one of the stack
linkage words that can be used by the MCP to enforce proper disposal of
stack frame resources before the frame can be exited.


A Series Enhancements to the Descriptor Mechanism.

The Address field of data and code segment descriptors is 20 bits wide,
which allows for a total of 1048576 words (6 MB). The B6700 has the same
maximum physical memory size, so the field width was adequate. In the
late 1960s and early '70s, 1 MW seemed near infinite, but as systems
became larger through the '70s (and especially as the use of on-line
applications and data bases grew during this period), this upper limit
on physical memory of 1 MW became grossly inadequate. The
B6800/7800/6900/7900 implemented a somewhat crude paging technique (the
infamous "Global(tm)Memory") that helped somewhat on multi-processor
systems, but the physical address space for a given processor at any one
time remained at 1 MW.

The A Series models introduced starting in the early 1980s addressed
this problem by implementing a concept known as ASD (Actual Segment
Descriptor). Heretofore, the "mom" descriptor in a program's data stack
was the owner of a memory area and pointed directly to it when the
segment was present in memory. There was no room to expand the address
field in the 48 bit descriptor word, so the role of owner was moved from
the mom to a central ASD table in memory. Instead of a real memory
address, the Address field in descriptors now holds an index into this
ASD table. On the latest processors, each entry in this table contains
eight 48-bit words, of which only the first three are used by the
hardware. The actual location, length, and status of each allocated
memory area is now stored in these table entries, hence the ASD name. It
functions similarly to the page table in other virtual memory
architectures, except that the "pages" are variable-length segments.
Most processors use caching to reduce the incidence of real memory
accesses to this table, effectively implementing a form of TLB.
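
Conceptually, the indexing instructions now do an extra table lookup,
something like this Python sketch (the entry fields are my own labels;
the real ASD entry holds more):

    def locate_segment(asd_index, asd_table):
        # A descriptor carries an ASD table index, not a real address.
        entry = asd_table[asd_index]
        return entry["base"], entry["length"], entry["present"]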

The concept of "mom" and "copy" descriptors no longer exists in the
architecture, at least not in anything like the way it was in the B6700.
All descriptors in data segments (except untouched descriptors) point to
an ASD table entry. In fact, since E-mode level Gamma introduced the use
of four-bit tags (initially for the A11 and A16 models in the early
1990s), tag-5 words are used only to represent untouched descriptors.
Once the area is allocated, descriptors accessible to user-level code
(now called "virtual segment descriptors" or VSDs) use tag values of C,
D, E, and F to identify various combinations of indexed/unindexed and
paged/unpaged descriptors.

The ASD index field in these VSD words is currently 23 bits in length,
allowing for a maximum of just over 8 million segments in the system.
The maximum length of a segment is still limited to 2**20-1 words,
although it appears this could be extended. The physical address field
in the ASD is currently 36 bits, allowing a maximum physical memory
space of 64 GW or 384 GB. The large Libra systems I have encountered
recently have 4-8 GW of physical memory, which appears to be more than
adequate.

With so much physical memory available, most systems today run with
large amounts of the memory space unallocated, and almost no overlay due
to memory allocation pressure takes place. As a result, the Presence bit
mechanism now largely serves as an allocate-on-first-reference
capability. Should you fill up the physical memory, however, the
automated overlay ("virtual memory") mechanism will still do its thing.

The size of the ASD table is established at system initialization time,
based on the total size of physical memory and a factor that is settable
by the system administrator. Running out of ASD table entries is a
no-no, and causes the system to halt.

A serious issue with the B6700 design was management of copy
descriptors. Every time the presence, location, or size of a segment
changed, not only did the mom descriptor need to be fixed up, but all of
the copies, too. The only way to find those copies was to search for
them, so copy descriptors were only permitted in stacks. There were
(still are) special instructions to search for these copies, and a
considerable software investment was made to minimize the number of
stacks that needed to be searched, but stack-search overhead could be
fierce, especially on a system that was near or beyond the thrashing point.

The ASD implementation considerably helped this situation. Copies no
longer contain real addresses to the data area or to the mom -- instead
they point to the ASD table entry for the segment. This table index does
not change through the life of the segment, so copies no longer need to
be found and fixed up when the location, size, or status of a segment
changes.

One of the nicest aspects of the ASD implementation was that it had
essentially no impact on user applications. Since descriptors are
managed by the MCP, the details of how the indexing instructions compute
real addresses are opaque to user-level code. It was necessary to
recompile some programs, although not specifically to support the ASD
addressing changes -- the B6700-era compilers emitted code that would
access fields of descriptors (e.g., to determine the length of an area
in Algol, you could obtain a tagless copy of the descriptor and isolate
bits [39:20]). Starting with E-mode level Gamma, the length field was
not even in the VSDs accessible by user-level code anymore, so new
instructions (e.g., GLEN to determine the length of a segment) were
implemented to perform these functions, and the old methods (along with
the Algol syntax that supported them) were deprecated. Some other
model-specific instruction sequences (such as directly accessing
processor registers) were eliminated at the same time, all of which
improved the security of the system and reduced somewhat the reliance on
trusted compilers. That reliance was not eliminated entirely, however.
There was a lengthy transition period that allowed users to recompile
their programs so that their codefiles would be compliant with newer
processors.


Issues with Descriptors and the MCP Architecture in General.

The first thing almost everyone comments on when first being exposed to
the stack- and descriptor-based architecture of MCP systems is the
memory access overhead of push/pop on the stack and of having to index
through descriptors to reach data. The second thing that gets comments
is the lack of user-accessible registers.

There is no question that these characteristics of the architecture add
overhead and (at least in the myopic view in which most seem to consider
performance issues) degrade performance. This overhead is at least
partially offset, however, by:

* the lack of unnecessary state saving,
* the efficiencies resulting from variable-length memory segments,
* the efficiencies resulting from avoiding unnecessary memory
allocation by delaying it until first reference,
* the efficiencies resulting from code and data segments being
closely related to language objects,
* the efficiencies resulting from being able to safely access data
and code across addressing environments and across tasks
(marshalling data across process boundaries is a foreign concept
in the MCP),
* the efficiencies in context switching,
* the efficiencies in interrupt handling, and
* the efficiencies resulting from hardware and operating system
environments that were designed specifically for each other.

In an I/O-intensive, transaction-server environment (which is what the
MCP systems are designed for), this performance trade-off balances out
better than you might think. Where the architecture loses at the micro
level of instruction performance, it gains at the macro level of system
performance. For a server, that's what you want. Need to do
high-performance numerical computation? MCP systems are probably not the
ones you should consider using. Need to do high-performance transaction
processing and safely balance the needs of hundreds or thousands of
tasks competing for processors, memory, and I/O paths? MCP systems do
quite well in that solution space.

There is another aspect to performance that I do not think is considered
often enough -- reliability performance. The lack of low-level bounds
checking in other systems terrifies me, and it should terrify you. The
idea that bound violations can be prevented simply by programmers "being
careful" is both silly and irresponsible. Giving addresses to
programmers is like giving whiskey and car keys to teenagers -- sooner
or later something stupid is going to happen, and it's probably going to
be sooner. I say this as a programmer myself. The current problems we have
with malware are largely due to unchecked memory accesses and allowing
data to be treated as code. These are problems that MCP systems simply
do not have.

As I have said before in this space, there is a cost to using
descriptors and hardware-enforced bounds protection. There is also a cost
to not using descriptors and hardware-enforced bounds protection.

In my opinion, the MCP architecture has two major problems -- and
neither of them relates to performance. The first is the reliance on
trusted compilers. As discussed briefly above, tweaks to the
architecture over the past 30 years have improved this situation, but
the security of the system is still too dependent on the quality of code
that the compilers generate. Barring a social engineering attack, it is
quite difficult to get untrusted code into the system in a form that can
be executed. Once an untrustworthy compiler is authorized, though, havoc
is possible. I am not aware that penetration of untrusted code has ever
been a problem since the early B6700 days, when some major holes were
exposed in the enforcement of codefile integrity, but this is too ripe
an area for potential abuse, and one that requires enforcement in too
many parts of the system, to be considered an adequate aspect of the
architecture.

The second problem is that the segmentation, addressing, and memory
management mechanisms of the system are built for hierarchical,
block-structured languages such as Algol and Pascal. Memory objects
effectively "belong" to the block that declared them, and are
automatically deallocated when that block exits. This approach also
works fine for COBOL, and is adequate for FORTRAN (at least through
FORTRAN-77). It works poorly, though, for languages which rely on
heap-based memory management, where an object can have a life after its
originating environment no longer exists.

The MCP compilers and operating system go to great lengths to prevent
"up-level pointers" -- the existence of references to a
locally-allocated segment that can be stored in a more global scope. The
system does not have an efficient way to locate and invalidate such
up-level references when the locally-allocated segment is deallocated,
so their use is prohibited.
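
In C terms, the prohibited pattern is the familiar dangling reference.
A minimal (and deliberately broken) C sketch:

int *uplevel(void)
{
    int local[16];            /* storage belongs to this block's environment */
    local[0] = 42;
    return &local[0];         /* an up-level pointer: it outlives the block
                                 that owns the storage */
}

In C this compiles and then misbehaves at run time; under the MCP, the
compilers and descriptor rules are arranged so that it cannot be
expressed at all.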

Current MCP systems support a C compiler, but the performance of its
code is not all that good, partly because C is basically a high-level
assembler for register-based, flat-address architectures (to which the
MCP architecture is a nearly complete antithesis), and partly because
the C heap is currently implemented as a series of array rows, with C
pointers implemented using integer offsets into the runtime
environment's heap space. It works, but the result is not very efficient.
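
As a sketch of the general offset-based technique (my own illustration,
not Unisys's actual runtime):

#include <stdint.h>
#include <string.h>

static unsigned char heap[1 << 20];  /* one big "array row" serves as the heap */
static uint32_t brk_off = 1;         /* offset 0 is reserved to play NULL */

typedef uint32_t cptr;               /* an emulated C pointer: just an offset */

static cptr emu_malloc(uint32_t n)   /* toy bump allocator */
{
    cptr p = brk_off;
    brk_off += n;
    return p;
}

static void emu_store(cptr p, const void *src, uint32_t n)
{
    memcpy(&heap[p], src, n);        /* every dereference becomes base-plus-
                                        offset arithmetic into the row */
}

Every pointer operation pays for that extra arithmetic, which is part of
why the generated code is slow.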

The problem is even worse for object-oriented languages such as Java. A
descriptor-based capability architecture should be a good fit with an
object-oriented language (a descriptor, after all, is a primitive form
of object), but the current MCP architecture is too closely tied with
the Algol memory model to work well natively with Java. I consider this
to be both a real shame, and a real threat to the future viability of
the MCP architecture.

Those issues aside, I think the MCP architecture is still the most
interesting, if least appreciated, one on the market today. There are
other interesting aspects of the architecture that I have either passed
over quickly (such as stack linkage and cross-stack addressing) or
ignored altogether (such as the process model and the concepts of task
families and synchronous and asynchronous dependent tasks). Then there
are the MCP stack, the stack vector, procedure calls, accidental entries
("thunks"), parameter passing, string and data movement instructions,
server libraries, and connection libraries. These are all also well
integrated with the segmentation, stack, and addressing issues discussed
above. In fact, one of the endlessly fascinating things about this
architecture to me is that, for all of its complexity, everything fits
into a nicely integrated whole. It's quite elegant, really.

For those willing to RTFM, the Unisys support web site allows free
access to essentially all of the documentation for current systems. You
can access this through the front door by going to
http://support.unisys.com/ and clicking the "Access documentation" link
at the very bottom of the page. Click through the agreement and you will
be presented with a page from which you can search the documentation. You
can also
access documents directly if you know their URL. Here are some useful ones:

ClearPath Libra 680/690 (latest version of E-mode level Epsilon spec):
http://public.support.unisys.com/c71/docs/Libra680-1.1/68787530-004.pdf

ClearPath NX6820/6830 (E-mode level Delta spec):
http://public.support.unisys.com/aseries/docs/Hardware/70205547-001.pdf

Current Algol language reference:
http://public.support.unisys.com/aseries/docs/ClearPath-MCP-11.1/PDF/86000098-507.pdf

Bitsavers has a good collection of documents for the older MCP systems
under:

http://www.bitsavers.org/pdf/burroughs/

In particular, you might want to look at the following documents under
that URL:

Narrative Description of the B5500 MCP:
B5000_5500_5700/1023579_Narrative_Description_Of_B5500_MCP_Oct66.pdf

B5500 Reference Manual (architecture reference):
B5000_5500_5700/1021326_B5500_RefMan_May67.pdf

B5500 Extended Algol:
B5000_5500_5700/1028024_B5500_ExtendedAlgol_Jul67.pdf

Elliot Organick's 1973 book on the MCP architecture:
B5000_5500_5700/Organick_B5700_B6700_1973.pdf

B6700 Reference Manual (architecture reference):
B6500_6700/1058633_B6700_RefMan_May72.pdf

A good paper by Hauck and Dent on the B6500/6700 stack mechanism:
B6500_6700/1035441_B6500_B7500_Stack_Mechanism_1968.pdf

If on first reading you don't understand this architecture, you're
running about average. I will happily try to reply to questions and
comments.

Paul Kimpel

unread,
Dec 9, 2007, 6:37:22 PM12/9/07
to

I will certainly agree with Edward that process initiation and
termination carry a fair amount of overhead, but disagree about
switching among processes. Because register state is saved and restored
automatically by the hardware, there is very little unsaved state that
needs to be handled during a process switch. On the B6700 and on the
latest high-end Libra machines there is a single instruction that does
the switch. The overhead Edward may be considering is operating system
overhead for process accounting -- e.g., accumulated CPU time -- but
that is done out of choice, not architectural necessity.

The current hardware limit on the size of a single data segment is
2^20-1 words. I thought that was also the upper limit on the size of a
user-level array row, and seeing 2^28 quoted (and assuming that meant
2^28-1), I had to check this out. I therefore wrote the following Algol
program:

BEGIN
DEFINE MAXBIT = 28 #;
ARRAY A[0:2**MAXBIT-1];

A[0]:= 0;
A[2**(MAXBIT DIV 2)-1]:= 2**(MAXBIT DIV 2)-1;
A[2**MAXBIT-1]:= 2**MAXBIT-1;
PROGRAMDUMP (ARRAYS);
END.

This did not compile (using the MCP 11.1 DMALGOL compiler on a Libra
300), generating an error on the declaration for array A, "This
dimension is declared with too many elements".

Then I tried MAXBIT=27. That compiled, but running the program generated
a run-time error on the assignment to A[0] (which is where the array
would have been allocated, being the first reference to one of its
elements), "DIMENSION SIZE ERROR 1=134217728".

Next, I tried MAXBIT=26, and that both compiled and executed
successfully. The memory dump for the task generated by the next-to-last
statement confirmed that the array was segmented, and allocated as 8193
pages of up to 8192 words (the last page is zero length, which is a
hardware requirement when the length of the segmented array is a
multiple of 8192). The dump also confirmed that only pages 0 and 8191
were actually allocated in memory, as they were the only ones that were
touched.

The Libra 300 is an E-mode level Delta machine (it's also an emulated
machine -- the MCP architecture is implemented in software using a
standard Intel Pentium box running Windows Server 2003). It is possible
that the latest, E-mode level Epsilon machines may allow higher limits,
but I do not have ready access to one to test this.

>
> There's probably still a limitation on the size of code segments. But the
> compilers take care of that, and it's totally transparent. It's been years
> since I was even aware of the size of code segments except in bizarre cases
> where it would be a clue to some strange bug.
>
> If you want a lot of big segments, just declare
>
> ARRAY A [0:2**20-1, 0:2**20-1, 0:2**20-1];
>
> (Note that exponentiation in Algol is **.) That gives you a trillion array
> rows of a million words each. A googol of them would only take a couple
> more lines. You would not be able to use them all due to physical memory
> limitations and the number of lifetimes it would take to go through them,
> but you could compile the program and access some random words from
> anywhere just to prove the point.
>
> Ironically, the most noticeable limit today is one you aren't likely to
> hear about. IOs are still limited to 2^16 words (6 * 2^16 bytes).

In other words, the largest unpaged memory segment that can be rolled
out to disk is 64K words in length, because the MCP requires virtual
memory paging to be done in one I/O. Permanently resident segments can
theoretically be 2^20-1 words in length, though.

>
>> Most of what I know about the Burroughs and descendants' architecture
>> is from Blaauw and Brooks.
>
> I'm not familiar with their book. I recommend Elliot Organick's "Computer
> System Organization: The B-5700-B-6700 Series". Used copies are not
> numerous but are not hard to find either. It won't give all the details,
> but it's well written. Three decades out of date, but the basics haven't
> changed.
>
>> Their description is somewhat confusing
>> (not really their fault, since the hardware architecture is
>> phenomenally complicated), but as far as I can tell, each segment is
>> limited to 32K words.
>
> Totally incorrect. I can't even guess where this misconception came from.
> As mentioned above, the modern limit is about 2^28 words. In programming
> terms it was never less than 2^20 words. There were various limits that
> were smaller, but they did not affect programming.
>
> Remember that most MCP segmentation was always transparent to the
> programmer.
>
> Code segments? Maybe code segments are limited to 2^15 words, though
> actually 2^13 comes to mind. But it's totally transparent. When the
> compiler fills up a code segment, it generates a branch to the next code
> segment. End of problem.

The word offset in a code segment is limited to 2^13-1 in a PCW (Program
Control Word, tag=7) and in the intra-segment branch address format of
branch instructions.

The important point here, I think, is that both Edward and I have been
using MCP systems for a long, long time, and both of us are quite
familiar with them, but not only do we not worry about segment size
limitations in our daily work, gosh -- we're not even sure what they are
anymore. I had to dive into the architecture reference manual to address
the issues in this thread. There are limits, but they are not
constraints on practical use. That was not the case on the B5500, which
had much lower limits all around, but for the B6700 and later systems,
virtual addressability for either code or data has never been much of a
concern.

Stephen Fuld

unread,
Dec 10, 2007, 12:57:34 AM12/10/07
to
Paul Kimpel wrote:

snip very detailed descriptions


> Those issues aside, I think the MCP architecture is still the most
> interesting, if least appreciated, one on the market today.

Thank you very much Paul for that wonderful description. I have been
looking for something like that for a long time. I also appreciate that
you talked about the disadvantages/flaws in the system as well as its
advantages as it is good to get a dispassionate post like that from an
obviously interested and dedicated "Burroughsian". :-)

One other issue which some might consider a flaw or disadvantage is that
it seems hard to see if you could do a JIT compiler for this system. Is
that right?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Louis Krupp

unread,
Dec 10, 2007, 3:42:46 AM12/10/07
to
Paul Kimpel wrote:
<snip>

> If on first reading you don't understand this architecture, you're
> running about average. I will happily try to reply to questions and
> comments.

Cool. Thanks very much for posting that. I knew some, but not nearly
all, of it, once upon a time. I'll have to reread the ASD stuff.

I'd have to say that among the disadvantages of the MCP architecture are
(1) its native use of EBCDIC instead of ASCII and (2) the 48-bit word
format. It would be hard to take a C program reading 32-bit
twos-complement integers or IEEE floating point numbers, translate it to
ALGOL, run it on an MCP system, and get the same results (after finding
the buffer overruns that were there all along without anyone knowing).

Louis

mattmill...@gmail.com

unread,
Dec 10, 2007, 12:45:10 PM12/10/07
to
On Dec 9, 9:57 pm, Stephen Fuld <S.F...@PleaseRemove.att.net> wrote:

> snip other stuff

> One other issue which some might consider a flaw or disadvantage is that
> it seems hard to see if you could do a JIT compiler for this system. Is
> that right?
>
> --
> -StephenFuld
> (e-mail address disguised to prevent spam)

Stephen,

MCP systems have supported JIT compilation since at least the mid to
late 1980s.
The first I encountered and contributed to was the S-Code compiler.
The MCP was extended to allow code to be generated "on the fly" in
memory (JIT wasn't as commonplace a term back then, since Java and .NET
were still almost a decade away).
Then, when we added TADS support for COBOL85, C, and Pascal83 using the
Slice compiler architecture, we leveraged the same MCP support.
When using these products, "JIT" compilations take place following your
input.
The Slice TADS libraries for these products include the respective
compiler, which allows you to compile a statement (or block of
statements) at a time in the intended context.
There were other experimental "JIT" tools in-house at the time that I
probably can't talk about, but needless to say there is no additional
barrier to implementing a trusted JIT versus implementing a regular
trusted compiler.

Matt.

Jim Haynes

unread,
Dec 10, 2007, 9:10:45 PM12/10/07
to
In article <13lpv08...@corp.supernews.com>,

Louis Krupp <lkr...@pssw.nospam.com.invalid> wrote:
>
>I'd have to say that among the disadvantages of the MCP architecture are
>(1) its native use of EBCDIC instead of ASCII and (2) the 48-bit word
>format. It would be hard to take a C program reading 32-bit

At the time IBM was absolutely dominant in business computing, and
IBM had turned its back on ASCII and put forth EBCDIC in its place.
We can speculate that IBM figured EBCDIC would take over and ASCII
would wither away; that was usually the way with de facto standards
established by IBM. So if Burroughs wanted to be a factor in business
computing they had to support EBCDIC. Open question whether Burroughs
thought ASCII would wither away and EBCDIC would come to dominate.

The 48-bit word format was carried over from the B-5500 and was well
thought out. I've been told that Burroughs management insisted on
some compatibility between the B-5500 and the systems that were to
follow it, and thus the architects were not free to improve on some
aspects. I think it's pretty clever that the 48 bit numeric format
allows integers and floats to have the same representation, so you
don't need to be always converting from one to the other. Then
48 bits was cool in the days of 6-bit character codes since you got
8 characters to the word.

C came along a lot later.

Stephen Fuld

unread,
Dec 11, 2007, 1:42:37 AM12/11/07
to
mattmill...@gmail.com wrote:
> On Dec 9, 9:57 pm, Stephen Fuld <S.F...@PleaseRemove.att.net> wrote:
>
>> snip other stuff
>
>> One other issue which some might consider a flaw or disadvantage is that
>> it seems hard to see if you could do a JIT compiler for this system. Is
>> that right?
>>
>> --
>> -StephenFuld
>> (e-mail address disguised to prevent spam)
>
> Stephen,
>
> MCP systems have supported JIT compilation since at least the mid to
> late 1980's.
> The first I encountered and contributed to was the S-Code compiler.
> The MCP was extended to provide the ability to allow code to be
> generated "on the fly" in memory

OK. That wasn't apparent from Paul's post. Can any compiler generate
code on the fly or is it an "extra" permission beyond being able to
generate codefiles?


--
- Stephen Fuld

Louis Krupp

unread,
Dec 11, 2007, 3:42:22 AM12/11/07
to

I'm familiar with the history (once upon a time, I ported code from the
B5500 to the B6700), and I'm not saying that the 48-bit word was a bad
idea. My point is that no machine has a chance of being taken seriously
by most programmers today unless it does native mode ASCII, 32-bit
twos-complement integers, and IEEE floating point.

In our industry, there is a tendency to believe that newer is better,
and C is automatically assumed to be an improvement over its predecessors.

Louis

Jan Vorbrüggen

unread,
Dec 11, 2007, 6:32:23 AM12/11/07
to
> My point is that no machine has a chance of being taken seriously
> by most programmers today unless it does native mode ASCII, 32-bit
> twos-complement integers, and IEEE floating point.

IEEE fp and twos-complement, I agree. ASCII as compared to EBCDIC, I
don't see as a hardware constraint. However, software that needs to know
the size of the integers it is using is, in almost all cases, broken by
design.

Summary: The 48-bitness does not matter to any software I want to care
about.

Jan

Alex Colvin

unread,
Dec 12, 2007, 10:05:39 PM12/12/07
to
>I'd have to say that among the disadvantages of the MCP architecture are
>(1) its native use of EBCDIC instead of ASCII and (2) the 48-bit word
>format. It would be hard to take a C program reading 32-bit
>twos-complement integers or IEEE floating point numbers, translate it to
>ALGOL, run it on an MCP system, and get the same results (after finding
>the buffer overruns that were there all along without anyone knowing).

Think of all those C programs that use int32 for portability!

fixed bin(17,0) is the way to go!

--
mac the naïf

mattmill...@gmail.com

unread,
Dec 12, 2007, 11:04:29 PM12/12/07
to
On Dec 10, 10:42 pm, Stephen Fuld <S.F...@PleaseRemove.att.net> wrote:

> snip other stuff

> > MCP systems have supported JIT compilation since at least the mid to
> > late 1980s.
> > The first I encountered and contributed to was the S-Code compiler.
> > The MCP was extended to allow code to be generated "on the fly" in
> > memory
>
> OK. That wasn't apparent from Paul's post. Can any compiler generate
> code on the fly or is it an "extra" permission beyond being able to
> generate codefiles?
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

The compilers that generate code JIT or on the fly need to have a
runtime container/sandpit/environment in which to generate and to
reference in-memory JIT'ed code. In the case of the Slice TADS
libraries, this is the runtime Test And Debug environment. They need
access to the necessary MCP API/entry point, and when installed the
library has to be given the appropriate additional privileges that a
regular compiler does not need.

Matt.

Stephen Fuld

unread,
Dec 12, 2007, 11:37:58 PM12/12/07
to

Thanks. That makes sense.

Edward Reid

unread,
Dec 14, 2007, 2:04:02 AM12/14/07
to
On Tue, 11 Dec 2007 01:42:22 -0700, Louis Krupp wrote:
> My point is that no machine has a chance of being taken seriously
> by most programmers today unless it does native mode ASCII, 32-bit
> twos-complement integers, and IEEE floating point.

It is indeed ironic that the B6700 handled TWO different character sets
(three if you include hex, which is still handled identically to eight-bit
code), but did not include the eventual winner. As has been pointed out, in
1970 IBM's star was in the ascendent and EBCDIC looked like the right
choice. It turned out to be the wrong choice, but I don't think anyone
could have known that at the time. The watchword was "no one ever got fired
for buying IBM". Today substitute Microsoft for IBM -- the cult has become
just as strong. Microsoft will follow IBM, not this year, maybe not even
next decade, but eventually. Who will be right then? Personally I'm not
risking any money on it. (Though it may be worth pointing out that anyone
who bought Unisys stock when it was around $1, around 1991, and sold it
10-15 years later made a pretty profit.)

Maybe programmers put value in particular integer formats, but that's
religion. For integers, I know of no advantage to one format over another.

Floating point is another matter; IEEE floating point has real advantages.
But how many programs are using IEEE floating point in 32-bit words? Today
as in 1970, 32-bit floating point is OK for toy programs but not much else.
Scientific programs running on the IBM 360 were famous for producing
cockeyed results due to insufficient precision, and many programmers got
used to automatically using double precision. OTOH, 48-bit Burroughs
floating point worked fine for most of those same programs, since it had
almost twice the mantissa precision (roughly 37 bits vs 21 bits after
considering various factors). So the 48-bit word actually saved a lot of
that expensive core memory for many programs, which on IBM equipment had to
use 64-bit floating point.
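
A quick C experiment shows the same effect in modern terms
(round-to-nearest IEEE arithmetic rather than 360-style truncation, but
the precision gap is the same in kind):

#include <stdio.h>

int main(void)
{
    float  fsum = 0.0f;              /* 24-bit significand */
    double dsum = 0.0;               /* 53-bit significand */
    for (int i = 0; i < 10000000; i++) {
        fsum += 0.1f;                /* rounding error accumulates visibly */
        dsum += 0.1;
    }
    /* Exact answer is 1000000; the float sum comes out several percent off. */
    printf("float: %.1f  double: %.1f\n", fsum, dsum);
    return 0;
}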

It's too bad that IEEE floating point doesn't specify an integer subset. I
don't think it would have been difficult. Might even be an easy extension
now. The Burroughs format still has the advantage there.

ASCII vs EBCDIC is yet another religion. I have MCP programs which read and
write files directly on Windows servers. I declare the files with
INTMODE=EBCDIC, EXTMODE=ASCII. End of problem. In fact, for the most part
the architecture is agnostic with respect to ASCII vs EBCDIC -- it's just
8-bit characters. The only exception that comes to mind is in converting
binary to decimal and vice versa, where IIRC there is no option to generate
ASCII digits instead of EBCDIC digits or to interpret zones as ASCII on
input. (I *think* that the IBM 360 did have such an option, but it's been a
loooooong time.) Certainly the software is heavily locked in to EBCDIC, but
I don't see EBCDIC on the screen any more than I am seeing ASCII as I write
this -- I just see letters in the Roman alphabet. It's really a pretty
minor issue for working programmers. But religion is always a major issue.

And in truth ... ASCII is the loser too. Modern systems use Unicode.

Edward
--
Art Works by Melynda Reid: http://paleo.org

Edward Reid

unread,
Dec 14, 2007, 2:50:48 AM12/14/07
to
On Sun, 09 Dec 2007 23:37:22 GMT, Paul Kimpel wrote:
> The current hardware limit on the size of a single data segment is
> 2^20-1 words. I thought that was also the upper limit on the size of a
> user-level array row, and seeing 2^28 quoted (and assuming that meant
> 2^28-1), I had to check this out.

This is actually documented in FAQ # 10026491. Please check it out ... I
spent a year and a half in correspondence persuading them to document it!
(I actually had a situation where a COBOL program needed to declare as
large as possible an 01 level without taking any risk that it might fail on
another CPU. Plus those of us in the business of writing general purpose
software like to know the limits of any system it might possibly run on.)

https://www.support.unisys.com/ALL/PLE/WEB-VIEW-DOC?PLATFORM=C71&NAVIGATION=LIB680&CHOSEN-DOCUMENT=FAQ&ID=10026491&OPTION=DEFAULT

or if the wrap is a problem try

http://tinyurl.com/35zwqp

Edward
--
Art Works by Melynda Reid: http://paleo.org

Edward Reid

unread,
Dec 14, 2007, 3:03:34 AM12/14/07
to
On Mon, 03 Dec 2007 20:24:51 -0000, Jim Haynes wrote:
> Things that are grouped
> together in a structure in Unix tend to be scattered into a whole bunch
> of arrays in MCP.

That's an accurate statement about traditional Algol programming and
represents the state of most software running under the MCP. However, for
some 10-15 years now Algol has had good object-oriented structures. There's
no polymorphism or inheritance, but they are still extremely useful. Algol
today can look quite a bit different from even well written Algol of 25
years ago. Still, I'm sure that the use of memory is radically different
from that of other systems even with similar declared structures.

> The B5500 was an outstanding batch job machine and a lousy time sharing
> machine. The reason it was lousy was that absolute addresses get into the
> stack, so when a job is swapped completely out it has to be swapped back
> in to the same memory addresses it previously occupied. This was corrected
> by an architectural change in the B6500.

It may have been improved, but the B6700 still had major problems with this
kind of swapping. Stacks still contained absolute addresses. They could be
swapped back in to different locations, but at the cost of going through
the stack to fix up every single descriptor. I suppose the difference was
that this was possible on the B6700 and not on the B5500 (on the B6700 the
absolute addresses occurred only in descriptors, which could be identified
by their tags). I further suppose this is probably corrected today, since
descriptors now contain ASD numbers instead of memory addresses ... now
that we have plenty of memory and don't need swapping any more.

Edward
--
Art Works by Melynda Reid: http://paleo.org

Louis Krupp

unread,
Dec 14, 2007, 4:50:18 AM12/14/07
to
Edward Reid wrote:
> On Mon, 03 Dec 2007 20:24:51 -0000, Jim Haynes wrote:
<snip>

>> The B5500 was an outstanding batch job machine and a lousy time sharing
>> machine. The reason it was lousy was that absolute addresses get into the
>> stack, so when a job is swapped completely out it has to be swapped back
>> in to the same memory addresses it previously occupied. This was corrected
>> by an architectural change in the B6500.
>
> It may have been improved, but the B6700 still had major problems with this
> kind of swapping. Stacks still contained absolute addresses. They could be
> swapped back in to different locations, but at the cost of going through
> the stack to fix up every single descriptor. I suppose the difference was
> that this was possible on the B6700 and not on the B5500 (on the B6700 the
> absolute addresses occurred only in descriptors, which could be identified
> by their tags). I further suppose this is probably corrected today, since
> descriptors now contain ASD numbers instead of memory addresses ... now
> that we have plenty of memory and don't need swapping any more.

I believe the hardware operator "Masked Search for Equal" was created to
make stack searching for copy descriptors a little faster. I vaguely
remember that when the system was thrashing, the mask showed up
prominently in the register display.

Louis

Ken Hagan

unread,
Dec 14, 2007, 5:49:27 AM12/14/07
to
On Fri, 14 Dec 2007 07:04:02 -0000, Edward Reid
<edw...@paleoNOTTHIS.org.NOTTHIS> wrote:

> Today as in 1970, 32-bit floating point is OK for toy programs but
> not much else.

Toys are a big market. CPU (and GPU) makers can hardly ignore them.
If you can double the effective size and bandwidth of your RAM with
no observable downside on your application, you'd be a pretty poor
engineer not to.

Paul Kimpel

unread,
Dec 14, 2007, 9:52:14 AM12/14/07
to

Sure enough. I changed my little Algol test program to make the arrays
one word shorter, thus:

BEGIN
DEFINE MAXBIT = 28 #;
ARRAY A[0:2**MAXBIT-2];

A[0]:= 0;
A[2**(MAXBIT DIV 2)-1]:= 2**(MAXBIT DIV 2)-1;
A[2**MAXBIT-2]:= 2**MAXBIT-2;
PROGRAMDUMP (ARRAYS);
END.

This makes the array length 2^MAXBIT-1 instead of 2^MAXBIT. Using
MAXBIT=28 now compiles on an NX4201 (an E-mode Beta system running MCP
7.0) and the Libra 300 (an E-mode Delta system running MCP 11.1) that I
tried before. It also gets a Dimension Size error on both, as the FAQ
indicated it should. MAXBIT=28 should work on E-mode Epsilon systems
(Libra 185 and later), though.

Using MAXBIT=27 allows the program to compile and run on the Libra 300.
I had to go all the way down to MAXBIT=20 in order for the program to
run on the NX4201, again as the FAQ indicated should happen.

The NX4201 (along with all the other E-mode Beta and Gamma models) is no
longer supported, so it appears that the maximum array size you can
currently count on for cross-system compatibility is 2^27-1 words.

Thanks, Edward. I learned something from this. Now all I need to do is
find an application that requires a virtually contiguous 768 MB memory
vector.

Edward Reid

unread,
Dec 14, 2007, 2:27:27 PM12/14/07
to
On Fri, 14 Dec 2007 02:50:18 -0700, Louis Krupp wrote:
> I believe the hardware operator "Masked Search for Equal" was created to
> make stack searching for copy descriptors a little faster.

I doubt it was created solely for that, but it was definitely used for that
and I imagine that the requirements of stack searching were a major
consideration in the design of the operator. Probably made it a lot faster,
not just a little -- the search still had to access every word in the
stack, but did not have to fetch or interpret instructions while doing so.

If you read the description of the masksearch intrinsic in the ALGOL
manual, there's one important thing it doesn't tell you: the tag takes part
in both the mask and the target. I don't remember if this is clear in the
architecture manuals.

OK, for those not In The Know: the operator (mnemonic SRCH) takes three
parameters on the stack: a target, a mask, and an array location. The
target and mask are single full words. The array location is an indexed
descriptor -- if you read all of Paul's treatise, then you know what this
is, but otherwise just think of it as a location in an array. The SRCH
operator compares words of the array with the target word, moving toward
the beginning of the array, comparing only the bits which are on in the
mask. You can also describe this as searching for a word where ((NOT
(target EQV arrayword)) AND mask) has all bits off -- that is, every
masked bit matches. The operator returns the array
subscript of the first match, or -1 if no match is found.
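
Ignoring tags (which ordinary C words do not carry), the comparison rule
amounts to this sketch:

#include <stdint.h>

/* Search downward from index start for a word whose mask-selected bits
   equal the target's; return the subscript, or -1 if there is no match. */
long masked_search(const uint64_t *a, long start,
                   uint64_t target, uint64_t mask)
{
    for (long i = start; i >= 0; i--)
        if (((a[i] ^ target) & mask) == 0)   /* all masked bits match */
            return i;
    return -1;
}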

When ALGOL programs use this operator via the MASKSEARCH or ARRAYSEARCH
intrinsics, the most commonly written mask is REAL (NOT FALSE), which is
all bits on (used because the generated code is only two bytes: ONE, LNOT).
But the tag is still zero. In ALGOL there is no way to create a mask
parameter with a non-zero tag, so there is no point in documenting that the
tag takes part in the comparison. A double array may have tags = 2 on the
data words, but with no way to make the tag non-zero in the mask, that's
irrelevant.

Of course the MCP languages (ESPOL in the early days, NEWP now) can set the
mask tag. That means that they can search for various kinds of descriptors,
checking both the tag and other bits such as the copy and indexed bits.
Various combinations of this were used to fix up descriptors for overlays.
As Paul described, a great deal of this is no longer applicable on ASD
systems, and I imagine that a much smaller range of the capabilities is now
used in this context.

> I vaguely
> remember that when the system was thrashing, the mask showed up
> prominently in the register display.

I never actually saw this, but I remember the story. On the B6700, the CPU
maintenance panel had a light for every bit in the CPU, including the
top-of-stack registers. The code for putting the CPU into the idle state
could be arranged so that the TOS registers generated particular patterns.
In the early days, it spelled out the word IDLE; later it was changed to
the big B logo. (Altering this pattern was one of the most popular local
MCP patches.)

Managers didn't like to see their expensive computer idle. (I'm not
claiming that all or even most managers were this clueless, just that this
is the story as I heard it.) Operators could handle this by loading more
work, even if that work didn't really need to be done. (Most never learned
that they could disable the IDLE just by entering "?L:GO L" on the console,
and in any case that didn't work until WFL was implemented on release 2.3,
around 1973.) But when the CPU was idle because all tasks were waiting on
overlays, the IDLE display showed up just as bright as when there was
really no work. So the operators would pile on more work trying to get rid
of the IDLE ...

Eventually the pattern would change and the IDLE would go away. The system
would still be waiting on overlays, totally thrashing, but the CPU would be
pegged handling the overhead of doing the overlays. Most of this overhead
was stack searching (which Paul described), mostly using the SRCH operator.
The mask would spend a lot of time in one of the TOS registers, and not
surprisingly the mask had most bits on. So there would be these waves of
solid lights moving across the display. Very impressive to the managers,
who realized that their expensive computers were very busy.

There was a name for this effect, but I can't remember it.

Greg Lindahl

unread,
Dec 14, 2007, 2:46:07 PM12/14/07
to
In article <1j79nh4dimobe.v...@40tude.net>,
Edward Reid <edw...@paleoNOTTHIS.org.NOTTHIS> wrote:

>Floating point is another matter; IEEE floating point has real advantages.
>But how many programs are using IEEE floating point in 32-bit words? Today
>as in 1970, 32-bit floating point is OK for toy programs but not much else.
>Scientific programs running on the IBM 360 were famous for producing
>cockeyed results due to insufficient precision, and many programmers got
>used to automatically using double precision.

The IBM 360 didn't have IEEE floating point, so your example doesn't work.

There are many significant modern scientific applications which can
get accurate results with 32-bit IEEE fp.

-- greg

Edward Reid

unread,
Dec 14, 2007, 3:30:11 PM12/14/07
to
On Sun, 09 Dec 2007 12:16:44 -0800, Paul Kimpel wrote:

A totally excellent technical and historical treatise. I've seen good
descriptions of the architecture, but this one is particularly lucid, and
includes integrated historical information, which I don't recall seeing
elsewhere.

> The compilers generate code in the initialization section of
> procedures to build these "untouched" descriptors directly in the

> program's stack. [...]


> The Copy bit indicates that a tag-5 word is not the original descriptor
> for an area, but rather a copy of it. A non-copy descriptor is said to
> be the "mom" descriptor for an area, and there can only be one such mom.

The alert reader will have wondered whether these conditions co-exist. In
fact the "untouched mom" was an important condition, and was referred to
internally in the code as a "virgin mom". I don't remember whether a virgin
mom was exactly the same as an untouched mom or involved some additional
distinction. In any case, this terminology for some reason never made it
into the public documentation.

> String operations that run off the end of a physical memory area
> generate a "segmented [paged] array" interrupt. The MCP responds to this
> interrupt by locating the next page in sequence (if there is one) and
> restarting the operation.

Note that at this point Paul is describing the Early Days. Paged arrays
could be expensive in some contexts due to these interrupts and the fixup
required to restart the operator -- just filling, say, a four kiloword
array with zeroes using a string operator (which would seem to be the
efficient method from a naive viewpoint) would result in 16 interrupts,
one per 256-word page. Now it's all handled in the microcode and the
overhead is minuscule, plus as Paul pointed out the pages are much larger
now and so the change of pages
happens far less often anyway.

> Segment
> Dictionaries are also sharable -- if multiple tasks are initiated from
> the same codefile, the Segment Dictionary is loaded only once and the
> separate tasks are linked to this common copy. Thus, all of the object
> code and read-only constant pools for a program are automatically reentrant.

This is literally true, and very useful. However, once one works where
reentrancy of the code itself is a given, one begins to realize that there
are more degrees of reentrancy than IBM's old classification (non-reusable,
serially reusable, reentrant) allowed for. In particular, the complexity of
the environment may mean that code has its own copy of some data but uses
shared data in other cases, and this use of shared data can arise in
unexpected ways. The fact that code is literally reentrant does not absolve
the programmer of the responsibility for coordinating access to shared
data, which turns out to be more difficult than reentrancy of the code
itself.

> the content of descriptors is controlled only by hardware
> instructions and the MCP, not user-level code. User-level code (and all
> but small parts of the MCP kernel, for that matter) does not manipulate
> addresses, it manipulates offsets into memory areas.

Fully true now, and always true for end-user programming. As Paul described
later, the compilers did generate code which built and manipulated
descriptors using bit fields, all of which had to be removed for the ASD
migration. I remember somewhere around mark 38 finding and reporting a
situation where the ALGOL compiler was still using the old version. I don't
recall the exact resolution, but it was messy because it meant that the
three-release recompile rule wasn't adequate to assure that affected
programs were recompiled before being run on ASD systems. Most of the code
migration depended on the three-release rule.

> This accessibility to the MCP stack was recognized as a serious security
> issue fairly early on, and later systems blocked direct access to LL=0.

I think that the maintenance issue was the real nightmare, more so than
security. With the compilers generating code with offsets into the MCP
stack, those offsets could not be changed. Procedures declared in the MCP
for public use had to be address-equated to specific D0 offsets, and the
assignments had to be managed manually. All of this went away with the
indirect linkage, which is basically handled the same as library linkage.

> The B6700 had 32 such D registers, but this proved (for
> once) to be more than necessary. Later systems cut back to 16 D
> registers (allowing user procedures nested 13 levels deep -- I doubt
> that I've ever coded anything that goes more than four levels deep).

I think that GEMCOS used up to about D9 or D10. But that wasn't all real
programmatic nesting; there were some levels of unnamed blocks added for
reasons I no longer recall (probably bad reasons, based on what I do
remember about the GEMCOS code).

The real advantage of having so many (16) D registers is simply that you
never have to worry about the limit, since as Paul explained, 16 is
basically limitless. If I have a procedure I want to insert within another
procedure (or a large amount of code to $include in a procedure), I can do
it. I don't need to worry what level the host procedure is at, nor how much
nesting is within the code being inserted. It's just a non-issue, and thus
something I don't have to waste time thinking about.

> bounds checking is not due to any debug or diagnostic mode I enabled for
> the compiler or the object code -- it's implicit in the segmented
> addressing mechanism for the architecture and cannot be turned off.

And because it's part of the ISA, the bounds checking is typically done in
parallel with other CPU functions. As a result, it does not incur the
performance penalties typical of software-based bounds checking.

> The first thing almost everyone comments on when first being exposed to
> the stack- and descriptor-based architecture of MCP systems is the
> memory access overhead of push/pop on the stack and of having to index
> through descriptors to reach data.

Paul gives a good summary of the fact that caching in the CPU mitigates
most of this. Even the B6700 had two explicit "top of stack" (TOS)
registers to buffer the need to push and pull from main memory (which IIRC
started as core but was thin-film by the time the B6700's life ended). The
architecture manual even had details about when these TOS registers (called
A and B) had to be pushed into memory or refreshed from memory, per the
requirements of each operator. The B7700, the higher-performance system of
the same era, had a much larger TOS cache, but I do not remember the
details.

> Need to do
> high-performance numerical computation? MCP systems are probably not the
> ones you should consider using.

However, in the days when Burroughs supported FORTRAN, you might have
wanted to develop your numerical software on MCP systems even if you
planned to run it on other systems.

When I was working for the State of Florida DHSMV in the 1970s, a federal
government agency (perhaps NHTSA) visited to install a statistics package
which they were using to collect data from every state. They had already
installed it on IBM systems and one other (perhaps Univac but I'm not
sure). Not surprisingly, they had to make more modifications for the B6700.
But at the end, they said they wished they had done the B6700 first,
because then they would have had almost no problems on the other systems.
The B6700 didn't require a lot of specialized changes, but rather allowed
(and even required) them to write standard, portable FORTRAN (as much as it
could be standardized in those days).

> In my opinion, the MCP architecture has two major problems -- and
> neither of them relates to performance.

I would add a third problem: the Burroughs/Unisys Sales Prevention
Departments. They never figured out how to sell it. My favorite example of
this is that in the 1960s and 1970s, the B5000/B6700 were very popular in
academic circles due to the interesting architecture. Quite a number of
universites had B6700s, some of the best known being UCSD, the University
of Tasmania, and SUNY Fredonia. You would think that marketing would say
hey, students are the next generation of programmers, let's sell B6700s to
universities at the very lowest possible cost so that the next generation
(who will also be driving decisions on what to buy) is familiar with them.
Instead, they appear to have said hey, this looks like a good profit
center, let's see how much money we can get out of them. Needless to say,
academic interest wasn't sufficient to make up for the pricing.

> C is basically a high-level
> assembler for register-based, flat-address architectures

In the 1970s it was said that FORTRAN was the world's most popular
assembler language. It has been replaced in this distinction by C: not only
is C now far more popular, but Fortran (yes, the capping changed) has
evolved into a good, modern programming language.

> There are
> other interesting aspects of the architecture that I have either passed
> over quickly (such as stack linkage and cross-stack addressing)

I have long held that the most important aspect of the architecture is the
environment management. This is a lot harder to explain than a stack, so
most people think "stack machine". The operators to enter and exit
procedures have to do some fancy linking to update the proper D registers
for the new environment. At first thought this may seem easy -- when you
are thinking "stack machine" -- but try to imagine what happens when you
are linked to a procedure in a library (a different environment entirely,
not part of the task's D register sequence, see Paul's description of
SIRWs), and then pass that procedure as a parameter to a higher or lower
level procedure, which calls the passed procedure, not even knowing what
lex level it's going to run at. (If you followed that, then you either have
a lot of experience with such environments or do not yet fully comprehend
the complexity.) Actually it can be done pretty simply if you don't mind a
lot of extra memory accesses, but that would mess with performance, so
determining when the D-update can be truncated is critical. The only
operators which the architecture manuals never fully explained were the
enter and exit operators -- the writers got to a certain point and then
kind of waved their arms around saying "the D registers get updated".

In the early days of ALGOL, there were test programs to see if ALGOL
compilers properly handled complex environments. I recently ran across one
such program written by Donald Knuth. (I think it's linked from the
Wikipedia article on ALGOL, which unfortunately has some bad information in
other parts.) Burroughs ALGOL never had to worry about such programs --
Knuth's test was actually a pretty trivial one, though apparently a lot of
compilers had trouble with it. The B6700 ISA just did what was needed, and
the compiler didn't have to break a sweat.

John Dallman

unread,
Dec 14, 2007, 5:18:00 PM12/14/07
to
In article <1j79nh4dimobe.v...@40tude.net>,
edw...@paleoNOTTHIS.org.NOTTHIS (Edward Reid) wrote:

> Today as in 1970, 32-bit floating point is OK for toy programs but not
> much else.

Not true. There are many things that can be done with 32-bit. But if you
need more than 32-bit, you tend to really need it, and not to be able to
make do.

> Scientific programs running on the IBM 360 were famous for producing
> cockeyed results due to insufficient precision, and many programmers
> got used to automatically using double precision.

The reasons for IBM mainframe producing weird floating point are not
primarily to do with the word length. The system regards the basic digit
of calculation as the 4-bit hex digit, rather than a single bit, and -
if I remember this correctly - this can make it discard 4 bits of
precision where IEEE would only discard 1 bit. This can make a serious
difference, once it's accumulated for a sequence of operations.

http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture

I'd never known GEC 4000 minis used IBM hexadecimal floating point.
Fits with the general awfulness of everything about them, though.

> OTOH, 48-bit Burroughs floating point worked fine for most of those
> same programs, since it had almost twice the mantissa precision
> (roughly 37 bits vs 21 bits after considering various factors).

Any idea if the 6-byte software floating point format Borland used to
use was based on the Burroughs one?

--
John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.

John K.

unread,
Dec 14, 2007, 10:44:55 PM12/14/07
to

"Edward Reid" <edw...@paleoNOTTHIS.org.NOTTHIS> wrote in message
news:bjwi3n4xgms1.1p...@40tude.net...

Stacks weren't the only place you could find absolute addresses on the
(Pre-ASD) B6700. The MCP had them hidden in a number of different structures
in those days. In some cases they weren't even in descriptors.

The B6700 never was able to swap stacks in and out like they were data
areas. The only way to get a stack written to disk was to use a mechanism
called SWAPPER. With SWAPPER, all the memory areas belonging to a task were
allocated in a contiguous subset of memory. Then, when a task was sent to
disk, all of it's memory went at the same time. When it came back into
memory, the MCP knew it had to go through the entire subspace and adjust all
the absolute addresses.

John Keiser ( retired MCP type and former SWAPPER expert )

Michael Newbery

unread,
Dec 14, 2007, 11:04:46 PM12/14/07
to
In article <clls5e4mk20e.h...@40tude.net>,
Edward Reid <edw...@paleoNOTTHIS.org.NOTTHIS> wrote:

> I never actually saw this, but I remember the story. On the B6700, the CPU
> maintenance panel had a light for every bit in the CPU, including the
> top-of-stack registers. The code for putting the CPU into the idle state
> could be arranged so that the TOS registers generated particular patterns.
> In the early days, it spelled out the word IDLE; later it was changed to
> the big B logo. (Altering this pattern was one of the most popular local
> MCP patches.)
>
> Managers didn't like to see their expensive computer idle. (I'm not
> claiming that all or even most managers were this clueless, just that this
> is the story as I heard it.) Operators could handle this by loading more
> work, even if that work didn't really need to be done. (Most never learned
> that they could disable the IDLE just by entering "?L:GO L" on the console,
> and in any case that didn't work until WFL was implemented on release 2.3,
> around 1973.) But when the CPU was idle because all tasks were waiting on
> overlays, the IDLE display showed up just as bright as when there was
> really no work. So the operators would pile on more work trying to get rid
> of the IDLE ...

Hence the widely deployed 'Speedo' patch, which replaced the 'meatball'
(The Burroughs 'B') with
W A
I T
for I/O,
P
B
for Presence Bit, and a smiley face for Idle

Terje Mathisen

unread,
Dec 15, 2007, 4:23:44 AM12/15/07
to

It had to be _similar_, but Anders Hejlsberg is the one to ask.

Turbo Pascal REAL was designed to be easy to implement in (16-bit) sw,
so I'm guessing there was no hidden bit, and that the sign+exponent
field used a nice/round number of bits (maybe 12?).

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Nick Maclaren

unread,
Dec 15, 2007, 5:44:15 AM12/15/07
to

In article <memo.2007121...@jgd.compulink.co.uk>,

j...@cix.co.uk (John Dallman) writes:
|> In article <1j79nh4dimobe.v...@40tude.net>,
|> edw...@paleoNOTTHIS.org.NOTTHIS (Edward Reid) wrote:
|>
|> > Today as in 1970, 32-bit floating point is OK for toy programs but not
|> > much else.
|>
|> Not true. There are many things that can be done with 32-bit. But if you
|> need more than 32-bit, you tend to really need it, and not to be able to
|> make do.

Er, no. Neither statement is precisely true, but Edward Reid is more
nearly correct. For anything except toy programs, only a numeric expert
can use 32-bit reliably - indeed, for anything with more than very minor
error buildup, most people will simply produce nonsense with 32-bit.
The majority of people who need it, need it because they don't have the
expertise to use 32-bit - or aren't prepared to waste the time on
maintaining full accuracy at all stages.

|> The reasons for IBM mainframe producing weird floating point are not
|> primarily to do with the word length. The system regards the basic digit
|> of calculation as the 4-bit hex digit, rather than a single bit, and -
|> if I remember this correctly - this can make it discard 4 bits of
|> precision where IEEE would only discard 1 bit. This can make a serious
|> difference, once it's accumulated for a sequence of operations.
|>
|> http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture

You are quite simply wrong, I am afraid. The reason for the loss of
precision is because it used truncation, not rounding. That is why it
was comparable in accuracy (for numeric work) with the 48-bit designs
that did round. Note that it doesn't matter exactly HOW you round, as
long as it is reasonable.

I did an experiment using AUGMENT in the late 1970s, and the result
was very clear indeed. If the myth about the loss of accuracy being
due to the base were true, the IEEE 754R proposals for decimal floating-
point would be insane - as distinct from merely being a non-solution
to the problems they claim to address.


Regards,
Nick Maclaren.

John Dallman

unread,
Dec 15, 2007, 10:34:00 AM12/15/07
to
In article <fk0b5v$ff6$1...@gemini.csx.cam.ac.uk>, nm...@cus.cam.ac.uk (Nick
Maclaren) wrote:

> You are quite simply wrong, I am afraid. The reason for the loss of
> precision is that it used truncation, not rounding.

I believe you, but could you explain a little more? Are you saying that
it simply discards excess bits, rather than using them to round the last
retained bit to give a best approximation?

Or was it discarding whole 4-bit digits rather than single bits?

John L

Dec 15, 2007, 10:46:13 AM
>|> http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture
>
>You are quite simply wrong, I am afraid. The reason for the loss of
>precision is that it used truncation, not rounding.

That's certainly one reason. Another reason is that the hidden bit
trick doesn't work with a hexadecimal base, losing a second bit.

The third reason that everyone else seems to accept is that they
botched the scaling analysis. In the paper in the IBMSJ, they assumed
that leading digits were uniformly distributed, so that there would be on
average one leading zero bit, which they expected to get back from the
smaller number of exponent bits. In reality, of course, they're
geometrically distributed, so on average there are two zero bits.
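
A back-of-envelope simulation (illustrative only, not from the IBMSJ
paper; the distributions and counts here are assumptions) shows the gap
between the two assumptions:

    # Python: average leading zero bits inside a normalized hex digit,
    # comparing a logarithmic (Benford-style) leading-digit distribution
    # with the uniform distribution the designers reportedly assumed.
    import random

    def zero_bits(d):
        # zero bits before the first 1-bit of a 4-bit digit:
        # digit 1 -> 3, digits 2-3 -> 2, digits 4-7 -> 1, digits 8-15 -> 0
        return 4 - d.bit_length()

    random.seed(7)
    n = 200_000
    # logarithmic leading digit: int(16**u) with u uniform in [0, 1)
    log_avg = sum(zero_bits(int(16.0 ** random.random())) for _ in range(n)) / n
    uni_avg = sum(zero_bits(random.randint(1, 15)) for _ in range(n)) / n
    print("logarithmic digits:", log_avg)
    print("uniform digits:    ", uni_avg)

The logarithmic model wastes roughly twice the leading bits that the
uniform assumption predicts, which is the botch in the scaling analysis.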

> If the myth about the loss of accuracy being due to the base were
> true, the IEEE 754R proposals for decimal floating- point would be
> insane - as distinct from merely being a non-solution to the
> problems they claim to address.

It is my impression that they address precision questions by keeping a
whole heck of a lot of digits.

Nick Maclaren

Dec 15, 2007, 11:13:31 AM

In article <memo.2007121...@jgd.compulink.co.uk>,
j...@cix.co.uk (John Dallman) writes:
|>
|> > You are quite simply wrong, I am afraid. The reason for the loss of
|> > precision is that it used truncation, not rounding.
|>
|> I believe you, but could you explain a little more? Are you saying that
|> it simply discards excess bits, rather than using them to round the last
|> retained bit to give a best approximation?
|>
|> Or was it discarding whole 4-bit digits rather than single bits?

Oh, no, that's not it. Yes, it simply discards excess bits, thus
discarding whole 4-bit digits (though there is a guard digit). And
that effectively reduces the precision by 3 bits compared to a
similar binary architecture. So the gain in precision by encoding
the exponent more compactly is more than lost.

But that is more-or-less a fixed loss, and does not lead to the
error building up in the same way that truncation does. To a very
crude approximation, a higher base loses O(log(base)) bits of
precision, but truncation loses O(sqrt(N)) where N is the number
of cumulative operations you perform.
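
A minimal simulation of that distinction (illustrative only, not the
AUGMENT experiment; the precision and term counts are arbitrary) is to
accumulate a long sum while trimming the significand after every add,
once by chopping and once by rounding to nearest:

    import math
    import random

    BITS = 24  # keep a 24-bit significand, roughly single precision

    def quantize(x, mode):
        # trim positive x to BITS significant bits by chop or round-to-nearest
        if x == 0.0:
            return 0.0
        m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= m < 1
        scaled = m * (1 << BITS)
        m_int = math.floor(scaled) if mode == "chop" else round(scaled)
        return math.ldexp(m_int / (1 << BITS), e)

    random.seed(42)
    terms = [random.random() for _ in range(100_000)]
    exact = sum(terms)                       # full double-precision reference
    for mode in ("chop", "round"):
        acc = 0.0
        for t in terms:
            acc = quantize(acc + t, mode)
        print(mode, abs(acc - exact) / exact)

The chopped sum drifts steadily because every truncation error has the
same sign; the rounded errors largely cancel.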


Regards,
Nick Maclaren.

Nick Maclaren

Dec 15, 2007, 11:18:25 AM

In article <fk0ss5$2ppo$1...@gal.iecc.com>, jo...@iecc.com (John L) writes:
|> >|> http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture
|> >
|> >You are quite simply wrong, I am afraid. The reason for the loss of
|> >precision is that it used truncation, not rounding.
|>
|> That's certainly one reason. Another reason is that the hidden bit
|> trick doesn't work with a hexadecimal base, losing a second bit.
|>
|> The third reason that everyone else seems to accept is that they
|> botched the scaling analysis. In the paper in the IBMSJ, they assumed
|> that leading digits were uniformly distributed, so that there would be on
|> average one leading zero bit, which they expected to get back from the
|> smaller number of exponent bits. In reality, of course, they're
|> geometrically distributed, so on average there are two zero bits.

See my response to John Dallman. Your points merely affect the fixed
loss, and not the real problem, which was the accumulation of error
due to the truncation. A very few of us confirmed our analyses by
actual experimentation, and the executive summary is that the base and
similar details are unimportant - the only important aspect is rounding
versus truncation.

|> > If the myth about the loss of accuracy being due to the base were
|> > true, the IEEE 754R proposals for decimal floating- point would be
|> > insane - as distinct from merely being a non-solution to the
|> > problems they claim to address.
|>
|> It is my impression that they address precision questions by keeping a
|> whole heck of a lot of digits.

Yes, but that's the political meaning of 'address'. I was referring
to the technical aspects.


Regards,
Nick Maclaren.

John Dallman

Dec 15, 2007, 12:11:00 PM
In article <fk0ufb$mvm$1...@gemini.csx.cam.ac.uk>, nm...@cus.cam.ac.uk (Nick
Maclaren) wrote:

> But that is more-or-less a fixed loss, and does not lead to the
> error building up in the same way that truncation does. To a very
> crude approximation, a higher base loses O(log(base)) bits of
> precision, but truncation loses O(sqrt(N)) where N is the number
> of cumulative operations you perform.

Just to check I understand you. The tricks that IEEE plays to get a
rounded value that is the correctly rounded value of the infinitely
precise result ... just don't exist in this floating-point format?
Excess precision is simply discarded, rather than rounded?

That seems unimaginably awful to me, but then I'm used to IEEE. Clearly
IBM convinced themselves at first that this was a good idea, but I see
why they had to introduce IEEE once it was commonplace on everything
else.

John L

Dec 15, 2007, 1:16:33 PM
>Just to check I understand you. The tricks that IEEE plays to get a
>rounded value that is the correctly rounded value of the infinitely
>precise result ... just don't exist in this floating-point format?
>Excess precision is simply discarded, rather than rounded?

Yup. Oops. Originally it didn't even have guard digits but that made
the results so awful that IBM added them to the 360 architecture and
at great expense field retrofitted the machines that had already been
shipped. Dunno why they didn't retrofit rounding at the same time.

>That seems unimaginably awful to me, but then I'm used to IEEE. Clearly
>IBM convinced themselves at first that this was a good idea

They were very concerned about performance, and didn't appreciate the
accuracy issues. See page 25 in this paper by Amdahl et al.

http://www.research.ibm.com/journal/rd/441/amdahl.pdf

Also see this paper by Sweeney, which made the suggestion that hex
floating point would be faster due to fewer normalization steps. He
doesn't seem to have appreciated the accuracy issues, either.

http://domino.research.ibm.com/tchjr/journalindex.nsf/e90fc5d047e64ebf85256bc80066919c/6e770ac12c2eb02c85256bfa006859ff?OpenDocument
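
The performance intuition is easy to demonstrate (a sketch, not
Sweeney's actual method; it models only postnormalization after an add
and ignores alignment shifts): count the one-digit normalization shifts
a sum needs in base 2 versus base 16:

    import random

    def shifts_needed(a, b, base):
        # one-digit shifts to bring |a+b| back into the normalized
        # fraction range [1/base, 1)
        s = abs(a + b)
        if s == 0.0:
            return 0
        n = 0
        while s >= 1.0:            # carry out: shift right one digit
            s /= base
            n += 1
        while s < 1.0 / base:      # leading zero digits: shift left
            s *= base
            n += 1
        return n

    random.seed(1)
    trials = 100_000
    for base in (2, 16):
        total = 0
        for _ in range(trials):
            a = random.uniform(1.0 / base, 1.0)   # normalized fraction
            b = random.uniform(1.0 / base, 1.0) * random.choice((1.0, -1.0))
            total += shifts_needed(a, b, base)
        print(base, total / trials)

On hardware that shifted one digit per cycle, the fewer base-16 shifts
looked like a clear win; the accuracy cost was the part nobody priced in.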

In fairness to the 360's designers, they got an amazing number of
things right. The floating arithmetic was the only major botch.
(Well, other than too few address bits, but that was inevitable.)

already...@yahoo.com

Dec 15, 2007, 1:29:01 PM
On Dec 14, 9:04 am, Edward Reid <edw...@paleoNOTTHIS.org.NOTTHIS>
wrote:

> On Tue, 11 Dec 2007 01:42:22 -0700, Louis Krupp wrote:
>
> Floating point is another matter; IEEE floating point has real advantages.
> But how many programs are using IEEE floating point in 32-bit words? Today
> as in 1970, 32-bit floating point is OK for toy programs but not much else.

The almost exact opposite is true: 32-bit floating point is perfectly
good for the vast majority of today's floating point workloads - signal
and image processing, 3D rendering, and other multimedia tasks.
Today's situation is radically different from the '70s, when the
majority of fp cycles went into solving PDEs.

Nick Maclaren

Dec 15, 2007, 1:43:11 PM

In article <fk15m1$30gp$1...@gal.iecc.com>, jo...@iecc.com (John L) writes:
|> >Just to check I understand you. The tricks that IEEE plays to get a
|> >rounded value that is the correctly rounded value of the infinitely
|> >precise result ... just don't exist in this floating-point format?
|> >Excess precision is simply discarded, rather than rounded?
|>
|> Yup. Oops. Originally it didn't even have guard digits but that made
|> the results so awful that IBM added them to the 360 architecture and
|> at great expense field retrofitted the machines that had already been
|> shipped. Dunno why they didn't retrofit rounding at the same time.

Not the IBM way? Dunno. Rounding wasn't exactly new by the time the
System/360 was designed. It is possible that all of the decisions
were taken by people who didn't know anything about numerics.

|> Also see this paper by Sweeney, which made the suggestion that hex
|> floating point would be faster due to fewer normalization steps. He
|> doesn't seem to have appreciated the accuracy issues, either.
|>
|> http://domino.research.ibm.com/tchjr/journalindex.nsf/e90fc5d047e64ebf85256bc80066919c/6e770ac12c2eb02c85256bfa006859ff?OpenDocument

That may have been true when they started the design (shortage of
gates), but wasn't by the late 1960s. They may well not have
appreciated the consequences of Moore's Law.


Regards,
Nick Maclaren.

already...@yahoo.com

Dec 15, 2007, 2:00:04 PM
On Dec 15, 11:23 am, Terje Mathisen <terje.mathi...@hda.hydro.com>
wrote:

I googled around a little.
It seems the Borland REAL has an 8-bit exponent, a 1-bit sign, and a
39-bit mantissa. It seems to have a hidden bit but no support for
denormals. Overall it is very similar to VAX F_Floating, but with 16
additional bits of mantissa and a different byte order.
http://www.usenet.com/newsgroups/comp.lang.pascal.borland/msg00479.html
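
If that report is right, a 6-byte REAL decodes mechanically. Here is a
sketch based on the commonly described layout (exponent byte biased by
129, sign in the top bit of the last byte, 39 mantissa bits with an
implied leading 1 - treat the field details as assumptions, not gospel):

    def decode_real48(b):
        # decode a 6-byte Turbo Pascal REAL
        assert len(b) == 6
        exp = b[0]
        if exp == 0:
            return 0.0                        # exponent byte 0 encodes zero
        sign = -1.0 if b[5] & 0x80 else 1.0
        # 39 mantissa bits: low 7 bits of byte 5, then bytes 4 down to 1
        mant = b[5] & 0x7F
        for byte in (b[4], b[3], b[2], b[1]):
            mant = (mant << 8) | byte
        frac = 1.0 + mant / (1 << 39)         # implied hidden bit
        return sign * frac * 2.0 ** (exp - 129)

    print(decode_real48(bytes([129, 0, 0, 0, 0, 0])))     # 1.0
    print(decode_real48(bytes([130, 0, 0, 0, 0, 0x40])))  # 3.0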


John L

Dec 15, 2007, 2:43:06 PM
>|> Yup. Oops. Originally it didn't even have guard digits but that made
>|> the results so awful that IBM added them to the 360 architecture and
>|> at great expense field retrofitted the machines that had already been
>|> shipped. Dunno why they didn't retrofit rounding at the same time.
>
>Not the IBM way? Dunno.

I think the problem was architectural. Guard digits gave them the
results that the architecture book said they were supposed to get.
Rounding would have changed the results. They'd have been better
results, but too late.

>|> Also see this paper by Sweeney, which made the suggestion that hex
>|> floating point would be faster due to fewer normalization steps. He

>|> doesn't seem to have appreciated the accuracy issues, either. ...


>
>That may have been true when they started the design (shortage of
>gates), but wasn't by the late 1960s. They may well not have
>appreciated the consequences of Moore's Law.

Since they were making these design decisions in 1963 and Moore didn't
legislate until 1965, that's understandable.

Remember that the 360 project was all about managing technology risk.
The 360 series was built out of not-quite-integrated SLT that mounted
several discrete transistors per package, because IBM wasn't confident
that they could produce working ICs soon enough. These designs were
all done in the 7090 transistor era, and the idea that you could throw
gates at the problem and use a barrel shifter was a long way in the
future.

At the time, it was a leap of faith to use 24 rather than 16 bit
addressing, and to design with expansion to 32 bits in mind. This all
had to work on a 360/30, with an 8 bit data path and a 1.5 to 2us
cycle time, where a 64 bit floating add took 65us and a floating
divide was about 1.6ms. Those trips through the shifter hurt and they
doubtless felt very clever when they figured out a cheap-looking way
to avoid them.

R's,
John

Nick Maclaren

Dec 15, 2007, 3:03:26 PM

In article <fk1aoa$olp$1...@gal.iecc.com>, jo...@iecc.com (John L) writes:
|>
|> I think the problem was architectural. Guard digits gave them the
|> results that the architecture book said they were supposed to get.
|> Rounding would have changed the results. They'd have been better
|> results, but too late.

Could be.

|> Since they were making these design decisions in 1963 and Moore didn't
|> legislate until 1965, that's understandable.

The phenomenon he described was well under way by then, and there
was no excuse not to have observed it. Realising its consequences
is another matter.

|> At the time, it was a leap of faith to use 24 rather than 16 bit

|> addressing, and to design with expansion to 32 bits in mind. ...

Not all that much. The Ferranti Atlas used 24, and that was earlier.
But IBM weren't very good at noticing what happened this side of the
pond.

|> This all
|> had to work on a 360/30, with an 8 bit data path and a 1.5 to 2us
|> cycle time, where a 64 bit floating add took 65us and a floating
|> divide was about 1.6ms. Those trips through the shifter hurt and they
|> doubtless felt very clever when they figured out a cheap-looking way
|> to avoid them.

Well, they weren't the first with that, either. But, yes, I agree
that the advantages of that loomed larger then than they do now :-)


Regards,
Nick Maclaren.

Anne & Lynn Wheeler

Dec 15, 2007, 4:31:51 PM

nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> Not all that much. The Ferranti Atlas used 24, and that was earlier.
> But IBM weren't very good at noticing what happened this side of the
> pond.

here is this quote about the science center
http://www.garlic.com/~lynn/subtopic.html#545tech

justifying its virtual memory/virtual machine project:

What was most significant was that the commitment to virtual memory was
backed with no successful experience. A system of that period that had
implemented virtual memory was the Ferranti Atlas computer, and that was
known not to be working well. What was frightening is that nobody who
was setting this virtual memory direction at IBM knew why Atlas didn't
work

... snip ...

found in melinda's historical document
http://www.princeton.edu/~melinda

another item from the above:

Creasy had decided to build CP-40 while riding on the MTA. "I launched
the effort between Xmas 1964 and year's end, after making the decision
while on an MTA bus from Arlington to Cambridge. It was a Tuesday, I
believe." (R.J. Creasy, private communication, 1989.)

... snip ...

the original implementation, cp40, was on a specially modified 360/40
with virtual memory hardware. this morphed into cp67 when 360/67 (with
virtual memory) became generally available. the 360/67 supported both
24bit and 32bit addressing modes.

the folklore is that the science center got budget from the company by
telling them that it was going to be spent on a graphical interface
project ... to avoid/sidestep the political consequences of raising
awareness that the science center would be trampling on some other
organization's turf (doing some virtual memory related stuff).

i then significantly redid lots of the cp67 virtual memory
implementation when i was an undergraduate ... after cp67 had been
installed at the univ., last week jan68 ... some related posts
http://www.garlic.com/~lynn/subtopic.html#wsclock

Nick Maclaren

Dec 15, 2007, 5:59:18 PM

In article <m3sl23z...@garlic.com>,

Anne & Lynn Wheeler <ly...@garlic.com> writes:
|>
|> > Not all that much. The Ferranti Atlas used 24, and that was earlier.
|> > But IBM weren't very good at noticing what happened this side of the
|> > pond.
|>
|> here is this quote about the science center
|> http://www.garlic.com/~lynn/subtopic.html#545tech
|>
|> justifying its virtual memory/virtual machine project:
|>
|> What was most significant was that the commitment to virtual memory was
|> backed with no successful experience. A system of that period that had
|> implemented virtual memory was the Ferranti Atlas computer, and that was
|> known not to be working well. What was frightening is that nobody who
|> was setting this virtual memory direction at IBM knew why Atlas didn't
|> work

Eh? When was that? The Ferranti Atlas worked extremely well, and the
Titan at Cambridge (effectively an Atlas) delivered a general purpose
(interactive) service in the mid-1960s that no production IBM system
could match until a decade later. Some dozens of users logged on,
plus batch work, and users allowed to compile and debug interactively.

Some of that was software, which is the problem we had when moving to
an IBM 370/165, but the lack of virtual memory hardware made the latter
system very hard to use for the above purpose.


Regards,
Nick Maclaren.

Anne & Lynn Wheeler

Dec 15, 2007, 7:26:57 PM

nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> Eh? When was that? The Ferranti Atlas worked extremely well, and the
> Titan at Cambridge (effectively an Atlas) delivered a general purpose
> (interactive) service in the mid-1960s that no production IBM system
> could match until a decade later. Some dozens of users logged on,
> plus batch work, and users allowed to compile and debug interactively.
>
> Some of that was software, which is the problem we had when moving to
> an IBM 370/165, but the lack of virtual memory hardware made the latter
> system very hard to use for the above purpose.

re:
http://www.garlic.com/~lynn/2007u.html#77 IBM Floating-point myths

I have no direct knowledge ... just that quote from Melinda's paper about
the early science center justification for doing virtual memory, virtual
machine work.
http://www.princeton.edu/~melinda

with a little bit of work, I easily got cp67 to 35-40 users on 360/67
doing mix-mode edit, compile, execute workload with subsecond response
... at a time when tss/360 (the corporate strategic virtual memory
effort) on the same hardware couldn't get subsecond response running
four users doing effectively the same workload mix.

with a little bit more work, i got it to 75-80 users on 360/67 getting
subsecond response.

Grenoble science center had a 1mbyte (about 155 pageable pages after
fixed memory requirements) 360/67 and did a modified cp67 for the
"working set" dispatcher. Cambridge science center had a 768kbyte (104
pageable pages after fixed memory requirements) 360/67 (i.e. the
Grenoble configuration had 50 percent more real storage for paging than
the Cambridge system).

Cambridge with 80 users got about the same response and thruput as
Grenoble with 35 users (both configurations running similar workload
mix).

CP67 was "officially" announced at the spring 68 SHARE meeting in Houston
and customers commonly ran 35-40 users. Some of the stuff that I had
done as an undergraduate had been incorporated and shipped in the product.
Some other stuff didn't ship until the vm370 timeframe, when my resource
manager was released
http://www.garlic.com/~lynn/subtopic.html#fairshare

however, there was a fairly close apples-to-apples comparison between
the system running in Cambridge (modulo not having as much hardware) and
the one running in Grenoble ... except Grenoble's "working set
dispatcher" couldn't achieve the peak workload thruput of the Cambridge
system (mainly because of the limitations of the working set dispatcher,
despite running on a system with more resources).

misc past posts mentioning Cambridge/Grenoble comparisons
http://www.garlic.com/~lynn/2006b.html#4 IBM 610 workstation computer
http://www.garlic.com/~lynn/2006d.html#0 IBM 610 workstation computer
http://www.garlic.com/~lynn/2006e.html#7 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#37 The Pankian Metaphor
http://www.garlic.com/~lynn/2006f.html#0 using 3390 mod-9s
http://www.garlic.com/~lynn/2006i.html#31 virtual memory
http://www.garlic.com/~lynn/2006i.html#36 virtual memory
http://www.garlic.com/~lynn/2006i.html#37 virtual memory
http://www.garlic.com/~lynn/2006i.html#42 virtual memory
http://www.garlic.com/~lynn/2006j.html#1 virtual memory
http://www.garlic.com/~lynn/2006j.html#17 virtual memory
http://www.garlic.com/~lynn/2006j.html#25 virtual memory
http://www.garlic.com/~lynn/2006l.html#14 virtual memory
http://www.garlic.com/~lynn/2006o.html#11 Article on Painted Post, NY
http://www.garlic.com/~lynn/2006q.html#19 virtual memory
http://www.garlic.com/~lynn/2006q.html#21 virtual memory
http://www.garlic.com/~lynn/2006r.html#34 REAL memory column in SDSF
http://www.garlic.com/~lynn/2006u.html#50 Where can you get a Minor in Mainframe?
http://www.garlic.com/~lynn/2006w.html#46 The Future of CPUs: What's After Multi-Core?
http://www.garlic.com/~lynn/2007i.html#15 when was MMU virtualization first considered practical?
http://www.garlic.com/~lynn/2007m.html#60 Scholars needed to build a computer history bibliography
http://www.garlic.com/~lynn/2007s.html#5 Poster of computer hardware events?

Stephen Fuld

Dec 15, 2007, 11:31:28 PM
John L wrote:

snip

> In fairness to the 360's designers, they got an amazing number of
> things right. The floating arithmetic was the only major botch.
> (Well, other than too few address bits, but that was inevitable.)


I don't think it was the only botch. There were some other *biggies*.
But they did get a lot right, and the IBM marketing force was so good
that the things they got wrong didn't matter enough.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

John L

Dec 16, 2007, 4:19:35 AM
>> In fairness to the 360's designers, they got an amazing number of
>> things right. The floating arithmetic was the only major botch.
>> (Well, other than too few address bits, but that was inevitable.)
>
>
>I don't think it was the only botch. There were some other *biggies*.

Well, I'll bite. What else was seriously wrong?

R's,
John

Nick Maclaren

Dec 16, 2007, 5:19:45 AM

In article <m3k5nfz...@garlic.com>,

Anne & Lynn Wheeler <ly...@garlic.com> writes:
|>
|> I have no direct knowledge ... just that quote from Melinda's paper about
|> the early science center justification for doing virtual memory, virtual
|> machine work.
|> http://www.princeton.edu/~melinda

Well, I think that claim is based on a mistaken analysis, and the
evidence is against it. It is perfectly possible that the Atlas had
serious trouble for some years, until the software got sorted out. I
don't know, because it was before my time and the last of my useful
contacts about that era died a year or two back.

|> with a little bit of work, I easily got cp67 to 35-40 users on 360/67
|> doing mix-mode edit, compile, execute workload with subsecond response
|> ... at a time when tss/360 (the corporate strategic virtual memory
|> effort) on the same hardware couldn't get subsecond response running
|> four users doing effectively same workload mix.
|>
|> with a little bit more work, i got it to 75-80 users on 360/67 getting
|> subsecond response.

I can believe that. As I said, I know that the problems were largely
software. The Cambridge Titan was an older machine, less suited to
such work, and ran a mixed batch and interactive workload, so the
numbers are pretty comparable.

My assertion is not that it can't be done on a physical memory design,
but that virtual memory makes it a lot easier.


Regards,
Nick Maclaren.

Nick Maclaren

Dec 16, 2007, 5:22:56 AM

The application-to-system interface, in several respects. Too many
complex data structures (security issues), no extracode facility
(leading to inconsistency problems later, e.g. 128-bit floating-point
divide), dire exception handling interface and so on.

To be fair, the last is a generic failure, and the previous one
was got right only on a few systems :-(


Regards,
Nick Maclaren.

Stephen Fuld

Dec 16, 2007, 8:51:28 AM

Of course, these are my personal opinions and others might not share them.

1. I don't view the 24-bit addressing as a mistake, in that it was a
reasonable compromise for the technology available. What was a mistake
was allowing users, and even IBM itself, to use the upper eight bits
(instead of requiring them to be zero), which made going to larger
address spaces very painful. There are still, to this day in MVS,
things that have to be "below the line" - at addresses below 16 MB.

2. The lack of relocation hardware (not necessarily virtual memory -
there were other solutions), which made programs essentially impossible
to move. This made swapping of programs almost useless, as they had to
come back to the same physical location as they left, and made the
introduction of time sharing a real nightmare. (I tried to use TSO - it
really was an *option* on a S/360 and it was *ugly*.)

3. Having only a 12-bit offset in the instructions, which necessitated
all the BALR/USING stuff where user programs had to mess around with
loading multiple base registers. (See the sketch at the end of this
post.)

4. I have spoken before of the issues with doing true variable length
records on disk drives. Everyone else, including, I believe, IBM's older
machines, used fixed length records. Again, the whole CKD mess still
haunts IBM today, where it has to be emulated on fixed length record
drives.

5. The addressing scheme that puts substantial chunks of the OS within
each user's address space. Lynn has talked of the problems that has
caused and the measures that IBM has had to invent to get around it.
Imagine a scheme where you can't add code to the OS to fix a bug because
that would expand its address space requirement, which would require all
user programs to get smaller. And, of course, since users could address
the OS, they got into the habit of using OS data structures for their
own purposes, which limited IBM's ability to change them later on. At
least two of IBM's competitors avoided this pitfall.

In general, many of these fall into the category of not hiding enough of
the physical machine from the users so that they could/had to know too
much which limited what could be done in the future.
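
For item 3, a toy illustration (the register values are made up) of why
the 12-bit displacement forces programs to burn base registers:

    def effective_address(base, index, displacement):
        # S/360-style base + index + displacement; the displacement
        # field is only 12 bits wide
        assert 0 <= displacement < 4096, "displacement only reaches 4095"
        return (base + index + displacement) & 0xFFFFFF   # 24-bit address

    BASE = 0x010000                  # value loaded into a base register
    print(hex(effective_address(BASE, 0, 4095)))  # 0x10fff, last reachable byte
    # one byte further requires a second register holding BASE + 4096
    BASE2 = BASE + 4096
    print(hex(effective_address(BASE2, 0, 0)))    # 0x11000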

John Dallman

Dec 16, 2007, 10:09:00 AM
In article <A3a9j.268996$kj1.1...@bgtnsc04-news.ops.worldnet.att.net>,
S.F...@PleaseRemove.att.net (Stephen Fuld) wrote:

> In general, many of these fall into the category of not hiding enough
> of the physical machine from the users so that they could/had to know
> too much which limited what could be done in the future.

Ah ... you've just made sense for me of the reasons why VM (the
operating system) is always described as offering a complete machine to
each user. Because, for the reasons you described, that's more necessary
with this architecture than it is with one that offers better hiding.

Anne & Lynn Wheeler

Dec 16, 2007, 10:46:00 AM
Stephen Fuld <S.F...@PleaseRemove.att.net> writes:
> 2. The lack of relocation hardware (not necessarily virtual memory -
> there were other solutions), which made programs essentially impossible
> to move. This made swapping of programs almost useless, as they had to
> come back to the same physical location as they left, and made the
> introduction of time sharing a real nightmare. (I tried to use TSO -
> it really was an *option* on a S/360 and it was *ugly*.)

re:

http://www.garlic.com/~lynn/2007u.html#79 IBM Floating-point myths

there was a separate issue with os/360 system convention ... as opposed
to 360 hardware.

os/360 system convention was that program images on disk had something
called "relocatable address constants" ... and as part of fetching the
program image into memory (real or virtual) ... the "relocatable address
constants" would be swizzeled to absolute address.

contrast this with tss/360 ... built for virtual memory 360/67 machine
... had their "relocatable address constants" as separate structures
from the program image. this allowed the same program image to appear at
different locations in different virtual address spaces ... with the
"relocatable address constant" structure adjusted appropriately for a
specific image (as previously mentioned, tss/360 got lots of other
things wrong ... including lots of thruput/performance)

As os/360 added virtual memory support and morphed into MVS ... it
maintained its "relocatable adcon" implementation. This required that, as
part of program loading, the loader run thru the (uniquely loaded)
program image, finding and modifying all the (somewhat randomly
distributed) relocatable adcons. This resulted in heavy initialization
for program loading (prefetching and modifying all the virtual pages
containing relocatable adcons). It also precluded any program loader
implementation that could leverage a page-mapped filesystem and/or
easily share a common program image across multiple virtual address
spaces (say, by leveraging segmentation hardware).

CMS (cambridge monitor system, later conversational monitor system) was
developed at the science center, originally with cp40, and then moving
along with the morph of cp40 to cp67 and then to vm370. It could be
considered similar to the genre of "virtual appliance" activity
associated with current day virtualization activity.

CMS provided os/360 compatibility simulation that was heavily used to
run os/360 applications and programs ... and suffered the same overhead
of hitting and modifying every program image virtual page containing
the (somewhat randomly distributed) relocatable adcons. CMS did have a
feature to save a fixed-address program image (after all the relocatable
adcons had been swizzled) ... and then do "fast" program reload.

In the early 70s, I had implemented a CMS (virtual memory) paged-mapped
filesystem ... some old references
http://www.garlic.com/~lynn/subtopic.html#mmap

and leveraged the virtual memory segment hardware to support page-mapped
"loading" of common program image (or any file) shared across multiple
different (CMS) virtual address spaces ... even the same image appearing
at different addresses in different virtual address spaces.

However, the (os/360) relocatable address constant convention gave me
fits; i had the choice between page-mapped loading of the program image

1) before the relocatable adcons had been swizzled ... the swizzling
would modify the image and preclude it being shared across multiple
virtual spaces

2) after the relocatable adcons had been swizzled ... the images could
be shared ... but were forced to appear at the same address in every
virtual address space.

So I developed a relative address hack ... redoing some amount of
program source so I could page-map load a common program image (across
multiple virtual address spaces) and allow it to be loaded at whatever
free address/segment was available in any virtual address space. Lots of
past posts mention the difficulty of doing the relative address hack
(despite any limitations in the underlying hardware)
http://www.garlic.com/~lynn/subtopic.html#adcon

for other topic drift ... there were some number of companies spun off
in the cp67 and early vm370 timeframe to offer (virtual machine based)
commercial timesharing
http://www.garlic.com/~lynn/subtopic.html#timeshare

Anne & Lynn Wheeler

Dec 16, 2007, 11:23:28 AM
Stephen Fuld <S.F...@PleaseRemove.att.net> writes:
> In general, many of these fall into the category of not hiding enough
> of the physical machine from the users so that they could/had to know
> too much which limited what could be done in the future.

re:

http://www.garlic.com/~lynn/2007u.html#81 IBM mainframe history, IBM Floating-point myths

one of the more complicated areas is I/O ... and the mainframe I/O
"channel programs". The I/O channels execute "channel programs" using
real addresses.

cp67, to provide virtual machine simulation, had to intercept I/O
activation, scan the associated virtual machine's channel program,
create a shadow copy of it, fetch each specified virtual page and fix it
in real memory (until the i/o operation has completed) ... and then
activate the shadow copy channel program for execution.
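
In miniature (a sketch; the names are invented and real CCWs also carry
flags, chaining, etc.), the translation step looks like this:

    from dataclasses import dataclass

    @dataclass
    class CCW:                  # simplified channel command word
        command: int            # operation code, e.g. read or write
        data_addr: int          # address of the data buffer
        count: int              # byte count

    def translate_channel_program(ccws, virt_to_real, pin_page, page=4096):
        # build a shadow channel program whose data addresses are real
        shadow = []
        for ccw in ccws:
            # assume, for brevity, no buffer crosses a page boundary
            page_base = ccw.data_addr & ~(page - 1)
            real_base = virt_to_real(page_base)  # may fault the page in
            pin_page(real_base)                  # keep it resident during the I/O
            real_addr = real_base + (ccw.data_addr & (page - 1))
            shadow.append(CCW(ccw.command, real_addr, ccw.count))
        return shadow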

OS/360 had a convention where applications built "real" channel programs
(mostly done by calling library routines), which were then passed to the
OS/360 kernel for activation. In the transition of OS/360 to a virtual
memory environment (when hardware became generally available on 370s),
it had to 1) build virtual address space tables, 2) handle page faults
and paging operations, and 3) translate the passed application channel
programs ... in a manner similar to that described for cp67.

in fact, the initial os/360 virtual memory implementation ... borrowed
the channel program translation code (CCWTRANS) from cp67.

These days if you have MVS running in a VM virtual machine, MVS will
translate the application's channel program, activating the translated
copy/shadow channel program; the VM hypervisor will then intercept the
activation and perform the translation all over again.

Early on, CP67 provided custom hypervisor interfaces ... several of
which were tailored to CMS operation to reduce virtual machine emulation
overhead. I had done a flavor of one while an undergradate to
drastically cut down on the (translation) overhead associated with doing
CMS file i/o operations. However, this retained the virtual->real
address translation metaphor (although the pathlength to scan, copy, and
translate was drastically reduced).

I completely eliminated that overhead in the early 70s, when i did
a hypervisor API that supported page-mapped filesystem operations
http://www.garlic.com/~lynn/subtopic.html#mmap

also mentioned in this previous post
http://www.garlic.com/~lynn/2007u.html#81 IBM mainframe history, IBM Floating-point myths

I/O continues to be one of the major issues in virtual machine
implementations.
