
64-BIT COMPUTING - BYTE MAGAZINE , SEPT 1991 , MASHEY


John Mashey - Dec 20, 2003, 12:07:56 AM

<as promised, here's the original. I'll follow up (in a couple of weeks)
with a 12-years-later retrospective summarizing what happened later.

There are a few typos in this, but most of it was pretty close,
unsurprising as lots of people knew all this; I just wrote it down.

In mid-1991, I somehow :-) missed predicting that true 64-bit micros:
a) Would rapidly become crucial to the Internet, i.e., via
many CISCO routers.
b) Would ship in low cost videogames (Nintendo N64) by 1996.
c) Would end up in disk controllers, wireless chips,
set top boxes, communications processors, laser printers, etc.>

======================================================================
64-BIT COMPUTING

What is a 64-bit microprocessor?
Why would you want one, especially in a personal system?

John R. Mashey
BYTE Magazine - September 1991, 135-142.
========================================
Today's most popular computers are built around 32-bit microprocessors.
The next generation of chips - 64-bit microprocessors - will bring
even more power to the desktop.

But what does it mean to call a chip 64-bit? It's easy to get confused,
because different numbers of bits are used in different parts of a
microprocessor (see the text box "What's in a Chip?" below).
Although the MIPS R4000 is currently the only 64-bit microprocessor,
64 bits is almost certainly a coming trend. At microprocessor
conferences, sessions on the future of chip technology
routinely predict widespread use of true 64-bit microprocessors
by 1995 or earlier.

You may be thinking, "My PC software still runs in 16-bit mode and
it will be years before the software catches up with 32 bits.
But 64 bits? People who predict widespread use of true 64-bit
microprocessors by 1995 must be raving lunatics!"

There are two reasons for the prediction: 64-bit integer processing
and convenient use of more than 32 bits of address space.
The first reason is a straightforward performance issue;
the second has more widespread implications. As you'll see,
applications for 64-bit microprocessors exist for both servers
and desktops.

CPU ARCHITECTURES

When it comes to CPU architectures, it helps to distinguish
between an Instruction Set Architecture (ISA), which presents an
assembly language programmer's view of a processor, and hardware
implementations of that ISA. Successful ISAs persist unchanged
or evolve in an upward-compatible direction for years. Distinct
implementations are often built to yield different cost/performance
points. At times people get confused about the difference between ISA
and implementation sizes. Table 1 may help clear up the confusion.

In Figure 1, the CPU's integer registers are R bits wide. Address
arithmetic starts with R bits, either producing a virtual address size
of V bits (V is the generated user address, V <= R) or using a
segment register to expand R bits to V bits. The memory management
unit translates V bits of virtual address to A bits of physical address
that are actually used to access memory. For each access, up to D bits
are transferred (i.e., the data bus is D bits wide). For user-level
programs, R and V are programmer-visible properties of the ISA; A and D
are usually less-visible implementation-specific characteristics.
(Floating-point register size is almost always 64 or 80 bits, and so is
not included.)

========================================================================
Figure 1. Efficient address arithmetic is limited by the integer
register width (R). Actual memory access is limited by the virtual
address size (V) and address bus width (A). Efficient memory I/O
size is limited by the data bus width (D).

 ------------------------------------------------
 | Segmentation           CPU integer registers |
 | (on some machines)           (R bits)        |
 |       |                          |           |
 |       V                          V           |
 |    Generated virtual address (V bits)        |
 |                    |                         |
 |                    V                         |
 |        Memory Management Unit                |
 ------------------------------------------------
  Physical Address    |      ^   Data
     (A bits)         |      |   (D bits)
                      V      V
             External Memory System
=======================================================================
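
To make Figure 1's four parameters concrete, here is a minimal C sketch
(mine, not the article's; the struct and its field names are invented
for illustration). It records R, V, A, and D for one row of Table 1 and
checks the basic rule that, without segmentation, the generated address
cannot be wider than the registers:

#include <stdio.h>

/* Illustrative only: the R/V/A/D parameters of Figure 1. */
struct cpu_widths {
    const char *name;
    int r;          /* integer register width (ISA) */
    int v;          /* generated user virtual address size (ISA) */
    int a;          /* physical address size (implementation) */
    int d;          /* data bus width (implementation) */
    int segmented;  /* nonzero if segmentation expands R bits to V bits */
};

int main(void) {
    /* Values from the MIPS R4000 row of Table 1 (minimum V shown). */
    struct cpu_widths r4000 = { "MIPS R4000", 64, 40, 36, 64, 0 };
    if (!r4000.segmented && r4000.v > r4000.r)
        printf("%s: impossible - V > R without segmentation\n", r4000.name);
    printf("%s: R=%d V=%d A=%d D=%d\n",
           r4000.name, r4000.r, r4000.v, r4000.a, r4000.d);
    return 0;
}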

Table 1 lists numbers for well-known computer families. For simplicity,
V is only given for user-level programs. The table shows that
physical address size (A) and data bus size (D) can vary within a processor
family. The IBM S/360 family included five data bus sizes (8 to 128 bits);
the 32-bit Intel 386 is sold with two data bus sizes - 32 and 16 bits.

========================================================
Table 1: The size that a microprocessor is called is generally the
integer register size. R = integer register size and V = generated
user address size (ISA characteristics); A = physical address size
and D = data bus size (hardware-implementation characteristics).

CPU                     Year  Called    R     V      A      D
--------------------------------------------------------------
DEC PDP-11/45           1973    16     16    16*    18     32
DEC PDP-11/70           1976    16     16    16*    22     32
DEC VAX 11/780          1978    32     32    31     32     64

IBM S/360               1964    32     32    24     24    8-128
IBM S/370XA             1983    32     32    31     32    128
IBM ESA/370             1988    32     32    31*    32    128

IBM RISC System/6000    1990    32     32    32*    32    64-128

HP Precision            1986    32     32    32*    32    32-64

Intel 386DX             1985    32     32    32*    32     32
Intel 386SX             1987    32     32    32*    24     16
Intel 860               1989    64     32    32     32     64
Intel 486DX             1989    32     32    32*    32     32
Intel 486SX             1991    32     32    32*    32     32

MIPS R2000              1986    32     32    31     32     32
MIPS R4000              1990    64     64    40-62  36     64

Motorola 68000          1980    32     32    24     24     16
Motorola 68020          1985    32     32    32     32     32
Motorola 68030          1987    32     32    32     32     32
Motorola 68040          1990    32     32    32     32     32

Sun SPARC               1987    32     32    32     36    32-64
========================================================
* These processors use some form of segmentation to obtain
more bits of user address space when necessary.

BETTER PERFORMANCE WITH BIGGER INTEGERS

For years, PDP-11 Unix systems have used 16-bit integers
for most applications, as do many PCs. Sometimes performance
can improve merely by switching to larger integers. Integer
code has proved resistant to recent speedup techniques
that have greatly helped floating-point performance,
so any integer improvement is welcome. Some applications
for 64-bit integers are the following:

- Long strings of bits and bytes.
By using 64-bit instead of 32-bit integers, some programs
may run up to twice as fast. First, operating systems
often spend 10 percent to 20 percent of their time zeroing
memory or copying blocks of memory; often, doubling the integer
size can help these operations. Second, modern global optimizing
compilers spend a great deal of time performing logical
operations on long bit vectors, where 64-bit integers nearly
double the speed. Third, the increasing disparity between
CPU and I/O device speed is increasing the use of
compression/decompression methods, some of which rely on the
main CPU, where 64 bits may be helpful.

- Graphics.
Graphics operations are a special, but important, case
of the long bit-and-byte-string problem. Using 64-bit integer
operations can speed the work required by raster graphics.
The increase in performance is especially true for large-
area operations like scrolling and area-fill, where
performance may approach a full two times that of a 32-bit CPU
(a sketch of such a fill loop appears after this list). This
approach helps raise the graphics performance of a minimal-cost
design - a CPU plus a frame buffer but without graphics-support
chips.

- Integer arithmetic.
Most chips make addition and subtraction of multiprecision
integers (i.e., 64-bit, 96-bit, 128-bit, etc.) reasonably
fast, but multiplication and division are often quite slow.
Cryptography is a heavy user of multiple-precision
multiplies and divides. Financial calculations could use
integer arithmetic; 32-bit integers are far too small,
but 64-bit integers are easily big enough to represent
objects like the US national debt or Microsoft's
annual revenue to the penny. (A signed 64-bit integer holds
about 9.2 x 10**18 - some $92 quadrillion in pennies.)
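
A sketch of the long-string and graphics cases above (my illustration,
not code from the article): an area-fill inner loop that replicates an
8-bit pixel across a 64-bit word, so each store writes eight pixels
instead of one. It assumes the span starts 8-byte aligned; the same
doubling applies to the zeroing and copying loops in operating systems.

#include <stddef.h>
#include <stdint.h>

void fill_span(uint8_t *span, size_t npixels, uint8_t pixel) {
    uint64_t word = pixel;
    word |= word << 8;
    word |= word << 16;
    word |= word << 32;               /* pixel now fills all 8 bytes */
    uint64_t *p = (uint64_t *)span;   /* assumes 8-byte alignment */
    for (size_t i = 0; i < npixels / 8; i++)
        p[i] = word;                  /* one store writes 8 pixels */
    for (size_t i = npixels & ~(size_t)7; i < npixels; i++)
        span[i] = pixel;              /* leftover pixels, one at a time */
}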

BIG-TIME ADDRESSING

Perhaps more important than using 64-bit integers for performance
is the extension of memory addressing above 32 bits,
enabling applications that are otherwise difficult to program.
It is especially important to distinguish between virtual
addressing and physical addressing.

The virtual addressing scheme often can exceed the limits of
possible physical addresses. A 64-bit address can handle literally
a mountain of memory: Assuming that 1 megabyte of RAM requires
1 cubic inch of space (using 4-megabit DRAM chips), 2**64 bytes
would require a square mile of DRAM piled more than 300 feet high!
For now, no one expects to address this much DRAM, even with
next-generation 16-Mb DRAM chips, but increasing physical memory
slightly beyond 32 bits is definitely a goal. With 16-Mb DRAM
chips, 2**32 bytes fits into just over 1 cubic foot (not including
cooling) - feasible for deskside systems.
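
Checking that arithmetic (my calculation, not the article's): 2**64
bytes at 1 MB per cubic inch is 2**44 cubic inches, and a square mile
is about 4 x 10**9 square inches, so the pile works out to roughly 365
feet - "more than 300 feet high" indeed.

#include <stdio.h>

int main(void) {
    double bytes     = 18446744073709551616.0;             /* 2**64 */
    double cubic_in  = bytes / (1024.0 * 1024.0);          /* 1 MB per cubic inch */
    double sq_inches = (5280.0 * 12.0) * (5280.0 * 12.0);  /* one square mile */
    double height_ft = cubic_in / sq_inches / 12.0;        /* inches -> feet */
    printf("pile height: %.0f feet\n", height_ft);         /* about 365 feet */
    return 0;
}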

An even more important goal is the increase of virtual addresses
substantially beyond 32 bits, so you can "waste" it to make
programming easier - or even just possible. Although this goal is
somewhat independent of the physical memory goal, the two are related.

Database systems often spread a single file across several disks.
Current SCSI disks hold up to 2 gigabytes (i.e., they use 31-bit
addresses). Calculating file locations as virtual memory
addresses requires integer arithmetic. Operating systems are
accustomed to working around such problems, but the workarounds
are unpleasant; rather than making things work well,
programmers struggle just to make something work.

The physical address limit is an implementation choice that is often
easier to change than the virtual address limit. For most computers,
virtual memory limits often exceed physical limits, because the
simplest, cheapest way to solve many performance problems is to
add physical memory. If the virtual limit is much smaller than
the physical limit, adding memory doesn't help, because software
cannot take advantage of it. Of course, some processors use segmentation
schemes to extend the natural size of the integer registers until
they are equal to or greater than the physical address limit.

THE MAINFRAME, MINICOMPUTER, AND MICROPROCESSOR

Reflect on this aphorism:

Every design mistake gets made at least three times:
once by mainframe people, once by minicomputer people,
and then at least once by microprocessor people.

An illustrative sequence is found among IBM mainframes, DEC
superminicomputers, and various microprocessors.

IBM S/360 mainframes used 32-bit integers and pointers but computed
addresses only to 24 bits, thus limiting virtual (and physical) memory
to 16 MB (see reference 1). This seemed reasonable at the time, as systems
used core memory, not DRAM chips. A "large" mainframe (such as a 360/75)
provided at most 1 MB of memory, although truly huge mainframes (360/91)
might offer as much as 6 MB. In addition, most S/360s did not support
virtual memory, so user programs generated physical addresses directly.
There was little need to consider addresses larger than the physical
address size. Although it was unfortunate that only 16MB was addressable,
it was even worse to ignore the high-order 8 bits rather than trap on
non-zero bits. Assembly language programmers "cleverly" took advantage
of this quirk to pack 8 bits of flags with a 24-bit address pointer.
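
In C terms, the trick looked something like this (a reconstruction for
illustration, not code from the article or from any real S/360 program).
It depends entirely on the promise that the top byte is ignored - exactly
the promise that 31-bit mode later revoked.

#include <stdint.h>

#define ADDR_MASK UINT32_C(0x00FFFFFF)  /* the 24 bits S/360 actually used */

/* Pack 8 bits of flags into the byte that 24-bit mode ignored. */
static uint32_t pack(uint32_t addr, uint8_t flags) {
    return ((uint32_t)flags << 24) | (addr & ADDR_MASK);
}

/* In 24-bit mode the hardware masked implicitly, so "clever" code used
   the packed word directly as an address. In 31-bit mode the flag bits
   suddenly participate in addressing, and the program breaks. */
static uint32_t unpack_addr(uint32_t packed)  { return packed & ADDR_MASK; }
static uint8_t  unpack_flags(uint32_t packed) { return (uint8_t)(packed >> 24); }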

As memory became cheaper, the "adequate" 16-MB limit clearly became
inadequate, especially as virtual addressing S/370s made it possible to run
programs larger than physical memory. By 1983, 370-XA mainframes
added a 31-bit addressing mode for user programs but were required to
retain a 24-bit mode for upward compatibility. Much software had to be
rewritten to work in the 31-bit mode. I admit I was one of those "clever"
programmers and was somewhat surprised to discover that a large program
I wrote in 1970 is still running on many mainframes - in 24-bit
compatibility mode, because it won't run any other way. "The evil
that men do lives after them, the good is oft interred with their bones."

By the mid-1980s, 31-bit addressing was also viewed as insufficient for
certain applications, especially databases. ESA/370 was designed with a
form of segmentation to allow code to access multiple 2-gigabyte regions
of memory, although it took tricky programming to do so.

In the minicomputer phase of this error, the DEC PDP-11 was a 16-bit
minicomputer. Unfortunately, a single task addressed only 64 kilobytes
of data and perhaps 64 KB of instructions. Gordon Bell and Craig Mudge
wrote, "The biggest and most common mistake that can be made in computer
design is that of not providing enough address bits for memory addressing
and management. The PDP-11 followed this hallowed tradition of skimping
on address bits, but was saved on the principle that a good design can
evolve through at least one major change. For the PDP-11, the limited
address space was solved for the short run, but not with enough finesse
to support a large family of minicomputers. This was indeed a costly
oversight." (See reference 2.)

Some PDP-11/70 database applications rapidly grew awkward on machines
with 4 MB of memory that could only be addressed in 64-KB pieces, requiring
unnatural acts to break up simple programs into pieces that would fit.
Although the VAX-11/780 was not much faster than the PDP-11/70, the
increased address space was such a major improvement that it essentially
ended the evolution of high-end PDP-11s. In discussing the VAX-11/780,
William Strecker wrote, "For many purposes, the 65-Kbyte virtual address
space typically provided on minicomputers such as the PDP-11 has not been
and probably will not continue to be a severe limitation. However, there
are some applications whose programming is impractical in a 65-Kbyte
address space, and perhaps more importantly, others whose programming is
appreciably simplified by having a large address space." (See reference 3.)

Finally, we come to microprocessors. The Intel 8086 was a 16-bit
architecture and thus, likely to fall prey to the same issues as the PDP-11.
Fortunately, unlike the PDP-11, it at least provided a mechanism for
explicit segment manipulation by the program. This made it possible for
a single program to access more than 64 KB of data, although it took
explicit action to do so. Personal computer programmers are familiar with
the multiplicity of memory models, libraries, compiler flags, extenders,
and other artifacts needed to deal with the issues.
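
The 8086's escape hatch was arithmetic on two 16-bit quantities: a
segment register shifted left 4 bits plus a 16-bit offset yields a
20-bit address. A small sketch of the address formation (mine, for
illustration):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 8086 real-mode address formation: segment * 16 + offset. */
static uint32_t phys_addr(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}

int main(void) {
    /* Many segment:offset pairs name the same byte... */
    printf("0x%05" PRIX32 "\n", phys_addr(0x1000, 0x0010));  /* 0x10010 */
    printf("0x%05" PRIX32 "\n", phys_addr(0x1001, 0x0000));  /* 0x10010 */
    /* ...and each segment spans only 64 KB, so anything larger forces
       explicit segment-register reloads - hence the memory models. */
    printf("0x%05" PRIX32 "\n", phys_addr(0xFFFF, 0xFFFF));
    /* 0x10FFEF: past 2**20; a real 8086 wraps this to 0x0FFEF. */
    return 0;
}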

The Motorola MC68000 started with a more straightforward programming model,
since it offered 32-bit integer registers and no segmentation. However,
by ignoring the high 8 bits of a 32-bit address computation,
it repeated the same mistake made 15 years earlier by the IBM S/360.
Once again, "clever" programmers found uses for those bits, and when the
MC68020 interpreted all 32 bits, programs broke. BYTE readers may
recall problems with some applications when moving from the original
Macintosh to the Mac II.

THE NEED FOR BIG COMPUTERS

Two common rules of thumb are that DRAM chips get four times bigger
every three years and that virtual memory usage grows by a factor of
1.5 to 2 per year (see reference 4). Additional memory is often the
cheapest and easiest solution to performance problems, but only if
software can easily take advantage of it.
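
Compounding those two rules of thumb shows how quickly the limits
approach. In the sketch below (my arithmetic; the 1-gigabyte 1991
starting point is an assumption, roughly a maximum server configuration
of the day), virtual demand passes 2**32 almost immediately, using the
4:1 virtual-to-physical ratio discussed later, and physical memory
follows within a few years:

#include <stdio.h>

int main(void) {
    double phys    = 1e9;           /* assumed 1991 maximum physical memory */
    double four_gb = 4294967296.0;  /* 2**32 bytes */
    for (int year = 1991; year <= 1996; year++) {
        double virt = 4.0 * phys;   /* 4:1 virtual:physical rule of thumb */
        printf("%d: phys %4.1f GB%s  virt %5.1f GB%s\n", year,
               phys / 1e9, phys >= four_gb ? " (past 32 bits)" : "",
               virt / 1e9, virt >= four_gb ? " (past 32 bits)" : "");
        phys *= 1.5874;   /* cube root of 4: DRAM quadruples every 3 years */
    }
    return 0;
}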

As the natural size of code and data reaches and then exceeds some virtual
address limit, the level of programming pain increases rapidly, because
programmers must use more and more unnatural restructuring. If the
virtual address limit is lower than the physical limit, it is especially
irritating, since buying DRAM won't do you any good. Fortunately, the
virtual address limit is typically larger than the physical limit,
so programs may work but perhaps run slowly. In this case, you can
at least add physical memory until performance becomes adequate.

There is no definite ratio between the maximum task virtual-address
limit and the physical address limit. Conversations with many people have
convinced me that a 4-to-1 ratio is reasonable (i.e., you will actually
see practical programs four times bigger than physical memory) if the
operating system can support them. Some people claim that a ratio of
4 to 1 is terribly conservative and that advanced file-mapping
techniques (as in Multics or Mach) use up virtual memory much faster
than physical memory. Certainly, in the process of chip design and
simulation at Mips Computer Systems, some of our 256-MB servers
routinely run programs with virtual images that are four to eight
times larger (1 to 2 gigabytes). Several companies (including Mips)
already sell desktops with 128 MB of memory. With 16-Mb DRAM chips,
similar designs will soon hit 512 MB - enough to have programs that
could use at least 4 gigabytes of virtual memory.

32-BIT CRISIS IN 1993

Consider the history of microprocessor-based servers from Mips Computer
Systems and Sun Microsystems. Figure 2 shows that the 32-bit limit will
become an issue even for physical memory around 1993 or 1994.

As soon as 16-Mb DRAM chips are available, some microprocessor systems will
be sold with 2 to 4 gigabytes of main memory - in fact, just by replacing
memory boards in existing cabinets. You may now be convinced that Sun
and Mips designers must be crazy to think of such things; but if so, they
have plenty of company from others, like those at Silicon Graphics,
Hewlett-Packard, and IBM. Keeping pace with DRAM growth requires
appropriate CPU chips in 1991 so that tools can be debugged in 1992
and applications debugged by 1993 or 1994 - barely in time.

========================================================================
HITTING THE 32-BIT LIMIT
Figure 2: The memory sizes of a Mips machine and a Sun machine, year
by year, using a logarithmic scale. The data points fall on a straight
line, gaining 2 bits every 3 years, as they naturally follow DRAM curves.
The top line shows virtual memory size at four times the maximum physical
memory size, hinting that large leading-edge applications may already be
pushing 32-bit limits in 1991 (and they are). The line below shows
physical memory size at 50 percent of maximum size. Vendors actually
sell a substantial number of such machines.

[I can't draw it here: it has a vertical size in number of bits,
with a band of points going from lower left to upper right.]

1991: 32-bit trouble for leading-edge systems
1994: 32-bit trouble for many systems
=========================================================================

WHY SO MUCH MEMORY?

Finally, look at applications that put pressure on the size of virtual
memory addressing. To handle virtual memory greater than 32 bits, you
need either segmentation or 64-bit integer registers.

Why 64 and not something smaller, like 48? It is difficult to introduce a
new architecture that runs the C language poorly. C prefers byte-addressed
machines whose number of 8-bit bytes per word is a power of 2. The use of
6 bytes per word requires slow addressing hardware and breaks many C
programs, so 64 is the next step after 32.

Segmentation may or may not be an acceptable solution, but there is
insufficient space here to debate the relative merits. Suffice it to
say that many people with segmentation experience consider it
a close encounter of a strange kind.

The following applications tend to consume virtual memory space quickly
and generally prefer convenient addressing of large memory space,
whether it's contiguous or sparse.

- Databases.
Modern operating systems increasingly use file mapping, in which an entire
file is directly mapped into a task's virtual memory. Since you can leave
empty space for the file to grow, virtual memory is consumed much faster
than physical memory. As CPUs rapidly increase their performance relative
to their disk-access speeds, disk accesses are often avoided by keeping
the disk blocks in large DRAM cache memories. Database managers on
mainframes have long felt the pressure here, as many installations are
already above 2**40 bytes. Distributed system designs often use some bits
of the address as a system node address. (A sketch of file mapping in the
POSIX style appears after this list.)

- Video.
For uncompressed video, a 24-bit color, 1280-by-1024-pixel screen needs
3.75 MB of memory. At 24 frames per second, 4 gigabytes of memory is
consumed by only 45 seconds of video.

- Images.
At 300 dots per inch, a 24-bit-color, 8 1/2-by-11-inch page uses 25 MB,
so 4 gigabytes is filled by 160 of these pages. Databases of such
objects get large very quickly.

- CAD.
CAD applications often include large networks of servers and desktops,
in which the servers manage the databases and run large simulations.
They naturally can make use of 64-bit software. Desktops navigate through
the huge databases, and although they are not likely to map in as much
data at one time as the servers, software compatibility is often desirable.

- Geographic information systems.
These systems combine maps, images, and other data and have most of the
stressful characteristics of video, images, and CAD.

- Traditional number crunching.
Of course, technical number-crunching application developers have never
been satisfied with any memory limits on any machine that exists.
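
For the database case above, a minimal file-mapping sketch in the POSIX
style (mine; the article predates today's standardized interfaces) shows
why mapping consumes virtual space so much faster than physical: the
whole file claims address space up front, and physical pages arrive only
as they are touched.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    /* The entire file occupies virtual address space immediately. */
    char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += base[i];             /* physical pages fault in on demand */
    printf("checksum %ld over %lld bytes\n", sum, (long long)st.st_size);
    munmap(base, st.st_size);
    close(fd);
    return 0;
}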

ON THE DESKTOP?

Perhaps you now believe that 64-bit servers may be reasonable, but you
still wonder about the desktop. Table 2 lists the application areas
discussed, showing whether the primary use of 64-bit systems is for speed
(either in desktop or server); for addressing large amounts of data
simultaneously; or for using software in a desktop system identically
to its use in a server but with less actual data. Such compatibility
is likely to be crucial for CAD applications but is also important for
others, if only to get software development done.

===========================================================================
APPLICABILITY OF 64 BITS
Table 2: The applicability of 64 bits differs for servers and desktop systems.

Server Workstation
---------------- ----------------
Application Speed Addressing Speed Compatibility
----------------------------------------------
Byte pushing X X
Graphics X
Big integers X X X
Database X X
Video X
Image X X
CAD X X
GIS* X X
Number crunch X X

* Geographic information systems
==========================================================================

For most readers, 64 bits is likely to be most important as an enabling
technology to bring powerful new applications to the desktop. The history
of the computing industry, especially of personal computers, shows there
is some merit to thinking ahead.
Some of us remember when a 640-KB limit was considered huge.

As 64-bit systems become available, some of the number-crunching people
will recompile their FORTRAN programs immediately, and some other
developers will start working in this direction. However, I'd expect
only a small fraction of applications to jump to 64 bits quickly. For
example, I do not expect to see 64-bit word processors soon.
[Editor's note: However, see "ASCII Goes Global," July BYTE.]
As a result, an important part of 64-bit chip and software design is
the ability to mix 32-bit and 64-bit programs on the same system.

Although 64-bit applications may be relatively few, some are absolutely
crucial and some are indirectly important to many people. You've probably
seen vendors' predictions of huge numbers of transistors per chip over
the next few years. Although you may not do electrical CAD yourself,
you may buy a system with those big chips; so, somewhere people will be
running programs to simulate those big chips, and those programs are huge.

I often give talks that compare computers to cars, using the CPU chip as
the engine, exception handling as the brakes, and so forth.
What kind of car is a 64-bit computer? Think of it as a car with a
four-wheel drive that you engage when necessary for better performance,
but especially when faced with really tough problems, like driving up
mountainsides. You wouldn't engage four-wheel drive to go to the
grocery store, but when you'd need it, you'd need it very badly.
Some people already have problems that require 64-bit processing,
and soon more will. The necessary vehicles - 64-bit microprocessors -
are on the way.

REFERENCES

1. Prasad, N.S. IBM Mainframes: Architecture and Design. New York:
McGraw-Hill, 1989.

2. Bell, C. Gordon, and J. Craig Mudge. "The Evolution of the PDP-11."
In Computer Engineering: A DEC View of Computer System Design, edited
by C. Gordon Bell, J. Craig Mudge, and John E. McNamara. Bedford, MA:
Digital Press, 1978.

3. Strecker, William D. "VAX-11/780: A Virtual Address Extension to the
DEC PDP-11 Family."
In Computer Engineering: A DEC View of Computer System Design, edited
by C. Gordon Bell, J. Craig Mudge, and John E. McNamara. Bedford, MA:
Digital Press, 1978.

4. Hennessy, John L., and David A. Patterson. Computer Architecture:
A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1990.


++++++++++++++SIDEBAR+++++++++++++++++++++++++++++++++++++++++++++++++++++
What's in a Chip?
Kenneth M. Sheldon

The first personal computers were built using microprocessors with integer
registers that were 8 bits wide, so they were called 8-bit chips and 8-bit
systems. Microprocessors with 16-bit registers went into 16-bit systems,
and so on. Life was simple.

The problem is, data flows to and from those registers over pathways, or
buses, that are often not the same width as the registers. The data bus
(which carries data to and from external locations, such as memory chips)
and the address bus (which carries the location of the data) may be smaller
or larger than the registers, depending on design considerations.
This makes it somewhat confusing to decide what to call a chip.

For example, the original IBM PC had an 8088 chip with 16-bit registers,
but an 8-bit data bus and a 20-bit address bus. Its fraternal twin -
the 8086 - was the same but with a 16-bit data bus. (If the 8088 were
released today, Intel would probably call it the 8086SX; the 386SX is a
32-bit chip that's similar to the 386DX, but it has a 16-bit data bus and a
24-bit address bus rather than full 32-bit buses.)

Similarly, all the members of the Motorola 680x0 family have 32-bit
registers. However, while the 68020, 68030, and 68040 have full 32-bit
data and address buses, the original 68000 has a 16-bit data bus and a
24-bit address bus.

Currently, the only microprocessor that uses 64-bit registers is the
Mips R4000. The R4000 has a 36-bit address bus, which allows it to
address up to 64 gigabytes of data.

For the sake of accuracy and consistency, the main article refers to
microprocessor size based on the width of the microprocessors'
respective internal registers, without regard to the width of their
data or address buses.
---------------------
Kenneth M. Sheldon is a senior editor for BYTE.
You can reach him on BIX as "ksheldon."
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Anton Ertl - Dec 20, 2003, 4:55:23 AM

old_sys...@yahoo.com (John Mashey) writes:
>Address
>arithmetic starts with R bits, either producing a virtual address size
>of V bits (V is the generated user address, V <= R) or using a
>segment register to expand R bits to V bits. The memory management
>unit translates V bits of virtual address to A bits of physical address
>that are actually used to access memory. For each access, up to D bits
>are transferred (i.e., the data bus is D bits wide). For user-level
>programs, R and V are programmer-visible properties of the ISA;

V is hardly application-programmer visible in architectures like
Alpha, AMD64 and most other 64-bit architectures. The CPU designers
could add another level of page tables in the next version, adding
another 10 bits or so to V, and with appropriate OS support nearly all
applications would run on the new version without even noticing the
difference.

Fortunately the architects of the 64-bit CPUs did not repeat the
IBM360 and 68K mistake of ignoring the currently-unused bits.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

John Mashey - Dec 20, 2003, 3:33:55 PM

an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote in message news:<2003Dec2...@a0.complang.tuwien.ac.at>...

> old_sys...@yahoo.com (John Mashey) writes:
> >Address
> >arithmetic starts with R bits, either producing a virtual address size
> >of V bits (V is the generated user address, V <= R) or using a
> >segment register to expand R bits to V bits. The memory management
> >unit translates V bits of virtual address to A bits of physical address
> >that are actually used to access memory. For each access, up to D bits
> >are transferred (i.e., the data bus is D bits wide). For user-level
> >programs, R and V are programmer-visible properties of the ISA;
>
> V is hardly application-programmer visible in architectures like
> Alpha, AMD64 and most other 64-bit architectures. The CPU designers
> could add another level of page tables in the next version, adding
> another 10 bits or so to V, and with appropriate OS support nearly all
> applications would run on the new version without even noticing the
> difference.
>
> Fortunately the architects of the 64-bit CPUs did not repeat the
> IBM360 and 68K mistake of ignoring the currently-unused bits.
>
> - anton

1) If V <= R, it is visible, in the sense that its behavior is part
of the contract with the user, whereas the "A" size usually isn't at
all visible to the user. It's just that the nature of the contract
changed (in R4000, Alpha, etc) to guarantee some minimum number V1,
but make sure there was no temptation to use the currently-unused bits.
Note, of course that this made it possible, that at some point, when
the actually-implemented Vn was larger, that you could be running a
program that would not work on the earlier machines. R4000s had
a 40-bit V, i.e., limited to 1 Terabyte of virtual address space.
You'd think that would hold people for a while, and it did,
but later chips raised the limits, of necessity. Machines with
1TB of shared-addressing main memory appeared no later than 2000/2001
[and maybe earlier, if someone bought a Cray T3E that size].
NASA Ames has run individual problems that used all the memory.

These days, it's easy enough to get 256GB in one rack, and if you believe
my 4:1 rule, that means that even a 1-rack machine could start finding
1TB limits uncomfortable ... although at that size, and for the kinds of
apps they're likely to be doing, I'd guess 4:1 would only apply if
there is some sparse addressing. Nobody in their right mind is going to
run a big computational-intense, ~TB-size code that pages a lot!


2) This was discussed in a Sept 2, 1991 posting you get via:
Google: mashey r4000 virtual address size group:comp.arch.*
[the whole thread is relevant, but that particular one notes that
some people wanted us to ignore some of the high-order bits so that
they could be used as flags. NO WAY, NOT AGAIN.]

3) It wasn't "fortunate" in the sense of luck; it was that people
finally learned. MIPS folks knew this early [I'd done S/360, PDP-11,
and 68K work before MIPS, hence the specific examples I used],
and DEC people certainly knew it, after the address-bits problem of
the PDP-11.

Roger Larsson - Dec 20, 2003, 7:32:52 PM

John Mashey wrote:

> [the whole thread is relevant, but that particular one notes that
> some people wanted us to ignore some of the high-order bits so that
> they could be used as flags.  NO WAY, NOT AGAIN.]
>

Wouldn't it be possible to use some LSB instead of some MSB?
Data usually is aligned anyway...

/RogerL

--
Roger Larsson
Skellefteå
Sweden

John Dallman - Dec 20, 2003, 10:28:00 PM

In article <ce9d692b.03122...@posting.google.com>,
old_sys...@yahoo.com (John Mashey) wrote:

> 2) This was discussed in a Sept 2, 1991 posting you get via:
> Google: mashey r4000 virtual address size group:comp.arch.*
> [the whole thread is relevant, but that particular one notes that
> some people wanted us to ignore some of the high-order bits so that
> they could be used as flags. NO WAY, NOT AGAIN.]

And there are idiots out there who claim, from their two years experience
of Windows programming, that there's no need to go to a bigger address
space than 32-bit, because segmentation would work fine, and be
beautifully easy to program. Humans!

---
John Dallman j...@cix.co.uk
"Any sufficiently advanced technology is indistinguishable from a
well-rigged demo"

Anton Ertl - Dec 21, 2003, 4:48:49 AM

old_sys...@yahoo.com (John Mashey) writes:
[V: number of virtual address bits]

>1) If V <= R, it is visible, in the sense that its behavior is part
>of the contract with the user,

Maybe part of the contract with the kernel programmer, but (for the
usual OSs) not part of the contract with the user-level programmer.
E.g., Linux on an EV68 lets me allocate 2047GB in one chunk (with an
appropriate setting of /proc/sys/vm/overcommit_memory), but fails with
"Cannot allocate memory" when I ask for 2048GB. I can guess that this
is related to V (which is 43 AFAIK), but there is no way to be sure of
that; it could be just an arbitrary restriction in the OS.

Anton Ertl - Dec 21, 2003, 5:02:23 AM

Roger Larsson <roger....@norran.net> writes:
>John Mashey wrote:
>
>> [the whole thread is relevant, but that particular one notes that
>> some people wanted us to ignore some of the high-order bits so that
>> they could be used as flags.  NO WAY, NOT AGAIN.]
>>
>
>Wouldn't it be possible to use some LSB instead of some MSB?
>Data usually is aligned anyway...

Depends on how many bits you want. People who implement programming
languages with tagged data (including tagged pointers) usually look at
different ways of encoding the tags in the word, and the related
decoding speed: in the bottom bits, split between bottom and top, at
the top with addresses shifted, etc.
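
To make the bottom-bits option concrete, a minimal C sketch (mine, not
from the thread): with 8-byte-aligned allocations, the low three bits
of every valid pointer are zero, so they can carry a small tag and be
masked off before each use.

#include <assert.h>
#include <stdint.h>

#define TAG_BITS 3
#define TAG_MASK ((uintptr_t)((1 << TAG_BITS) - 1))

typedef uintptr_t tagged_ptr;

/* Requires p to be 8-byte aligned; round-tripping pointers through
   uintptr_t is implementation-defined but works on common platforms. */
static tagged_ptr make_tagged(void *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0 && tag <= TAG_MASK);
    return (uintptr_t)p | tag;
}

static unsigned get_tag(tagged_ptr tp) { return (unsigned)(tp & TAG_MASK); }
static void    *get_ptr(tagged_ptr tp) { return (void *)(tp & ~TAG_MASK); }

The decoding cost mentioned above is visible here: every dereference
pays one mask, and a tag test pays an AND plus a compare.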

Jeff Kenton - Dec 21, 2003, 8:57:21 AM

Roger Larsson wrote:
> John Mashey wrote:
>>[the whole thread is relevant, but that particular one notes that
>>some people wanted us to ignore some of the high-order bits so that
>>they could be used as flags. NO WAY, NOT AGAIN.]
>
> Wouldn't it be possible to use some LSB instead of some MSB?
> Data usually is aligned anyway...

If you want a tagged memory architecture, build it explicitly. This kind of
hackery will bite you in the end. There's always a cost to this. If you need
it badly enough, build it into the hardware and get it exactly the way you
want it, with enough tag bits to do the job right. (Question: how many tag
bits would you want if you were designing it from scratch?)

--

-------------------------------------------------------------------------
= Jeff Kenton Consulting and software development =
= http://home.comcast.net/~jeffrey.kenton =
-------------------------------------------------------------------------

John Mashey - Dec 21, 2003, 7:04:01 PM

> old_sys...@yahoo.com (John Mashey) writes:
> [V: number of virtual address bits]
> >1) If V <= R, it is visible, in the sense that its behavior is part
> >of the contract with the user,
>
> Maybe part of the contract with the kernel programmer, but (for the
> usual OSs) not part of the contract with the user-level programmer.
> E.g., Linux on an EV68 lets me allocate 2047GB in one chunk (with an
> appropriate setting of /proc/sys/vm/overcommit_memory), but fails with
> "Cannot allocate memory" when I ask for 2048GB. I can guess that this
> is related to V (which is 43 AFAIK), but there is no way to be sure of
> that; it could be just an arbitrary restriction in the OS.

Let me try one more time.
Contracts guarantee some behavior and they don't guarantee others;
that they don't guarantee everything doesn't make them non-contracts.

The original IBM S/360 *CPU* offered a contract to the (user) programmer
that said: only the low 24 bits of a register participate in address
calculation, and you can use the high 8 bits for anything you want,
and people did, relying on that contract. [As noted, the 68000
had the same contract, but 68020s didn't.]

Newer S/360 successor CPUs offered a different contract, but they were forced
to retain a compatibility-mode feature to honor the existing contract.

The early S/360s were mostly real-memory OSs, for machines mostly
<= 1MB of physical memory, and the contract *never* guaranteed that
an attempt to allocate 15MB of memory would succeed. It simply guaranteed
that the 8-bits would be ignored.

Most OS's start with the limits created by the CPU, and then subtract more,
with the subtractions being driven by the OS version, the actual physical
memory present, or even the currently-available swap space.

The 64-bit *CPUs* have typically offered contracts (like R4000, etc):
1) The high-order (2) bits must be zero, or you'll get trapped.
2) The next M high-order bits may or may not be usable in a given
machine, but they are *never* ignored.

Robert Wessel - Dec 22, 2003, 2:11:19 AM

old_sys...@yahoo.com (John Mashey) wrote in message news:<ce9d692b.0312...@posting.google.com>...

> The original IBM S/360 *CPU* offered a contract to the (user) programme
> that said: only the low 24-bits of a register participates in address
> calculation, and you can use the high 8-bits for anything you want,
> and people did, relying on that contract. [As note, the 68000
> had the sasme contract, but 68020s didn't.]
>
> Newer S/360 successor CPUs offered a different contract, but they were forced
> to retain a compatibility-mode feature to honor the existing contract.


It was actually worse for the S/360s. The two *subroutine call*
instructions actually stored (non-address) data in the "unused" high
byte of the return address. At least all of the other instructions
that generated addresses stored zeros there.

Oh yeah, address space wraparound at 24 bits was *architecturally*
defined (yes, in 24 bit mode, a load register to address 0xfffffe is
defined to pick up two bytes at the top of memory, and two bytes at
address zero).

What *were* they smoking that day?

Jan C. Vorbrüggen - Dec 22, 2003, 4:22:36 AM

> > [the whole thread is relevant, but that particular one notes that
> > some people wanted us to ignore some of the high-order bits so that
> > they could be used as flags. NO WAY, NOT AGAIN.]
>
> Wouldn't it be possible to use some LSB instead of some MSB?
> Data usually is aligned anyway...

That is, in fact, done in some architectures in very specific cases.
For instance, the VAX architecture requires that queue headers are aligned
on 16-byte boundaries; the queue insertion/deletion instructions use
the low bit of the first word as a secondary interlock, because the
architecture specifies atomic access only to 4-byte entities.

But the notion that data must be aligned on natural boundaries only came
into mainstream use with "RISC" processors, which made it a requirement,
although compilers, for instance, tried to align data long before, because
their writers knew that the hardware would process aligned data faster.

Jan

glen herrmannsfeldt - Dec 22, 2003, 1:35:14 PM

Jan C. Vorbrüggen wrote:

(snip)

> But the notion that data must be aligned on natural boundaries only came
> with "RISC" processors into mainstream use and was made a requirement,
> although compilers, for instance, tried to do so long before because their
> writers knew that the hardware would process aligned data faster.

S/360 required data to be aligned. This restriction was lifted
on S/370 for most instructions, though they may run slower.
A few special instructions, like CS and CDS still require it.

S/360 has some properties of RISC machines, though most don't
consider it one.

-- glen

John Mashey - Dec 22, 2003, 11:40:09 PM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote in message news:<CPGFb.16981$VB2.37516@attbi_s51>...

See Google group search in comp.arch: "RISC CISC repost long",
which gives you a June 1995 repost of the ancient big post.
This put S/360 (barely) on the CISC side of the border.
As noted there, a big chunk of the ISA is relatively RISC-like,
i.e., if you stick with RR, RX, SI, and some RS instructions, the ISA
is pretty RISC-like. Recall that one of the best price/performance
machines (360/44) omitted the decimal operations, for example.
Even the SS instructions are nowhere near as tough as ones in the VAX
or MC68020, which could end up with indirect addressing, register
automodifications, and long instructions. S/360 instruction lengths
were at least determined at the start of the opcodes, and the address
modes were relatively simple.

Paul A. Clayton - Dec 23, 2003, 7:56:25 AM

In article <2003Dec2...@a0.complang.tuwien.ac.at>,
an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

>V is hardly application-programmer visible in architectures like
>Alpha, AMD64 and most other 64-bit architectures. The CPU designers
>could add another level of page tables in the next version, adding
>another 10 bits or so to V, and with appropriate OS support nearly all
>applications would run on the new version without even noticing the
>difference.

All but the few pieces written by 'clever' programmers who
assume that virtual addresses for AMD64 (e.g.) will be 48b
and so 64KiB-aligned slabs of memory can be tracked
with 32b references. :-\

(With OS support might one also use the upper 16b to
indicate remote memory region [assuming that some
code sequences rarely use remote memory such that
the savings of dropping a compare and {predictable}
branch would be worth the very rare exception overhead]?
If a different signal is sent to the application for an
access to unmapped but mappable [forming a system-
valid virtual address] than for an access to an unmappable
address, this could introduce an incompatibility.)


Paul A. Clayton

0 new messages