286 and 386

Anton Ertl

Jun 14, 2022, 11:01:29 AM
I just read the "Intel 386 Microprocessor Design and Development Oral
History Panel"
<http://archive.computerhistory.org/resources/text/Oral_History/Intel_386_Design_and_Dev/102702019.05.01.acc.pdf>.
Here I'll discuss some of the points mentioned there.

One interesting aspect was that the 386 project at the start (in 1982)
was the compatible upgrade for the stopgap 8086 line (while the 432
was less than a shining success), with the company putting more
resources into the P7 project (eventually i960MX). While the 386
was being developed, IBM PC compatibles became a huge success with a huge
binary-software base, and the 386 project became king.

The most interesting aspect of the 386 to me is that it turned the
specialized registers of the 8086 into general-purpose registers.
This is covered in this oral history in very little space:

|[John] Crawford: [My model] gave us the 32-bit extensions but without
|having a mode bit and without having a huge difference in the
|instruction sets. As I mentioned earlier, the price for that was we
|kept the eight-register model which was a drawback of the X86, but not
|a major one. I guess in addition I got half credit for the register
|extension because I was able to generalize the registers. On the 8086
|they were quite specialized and the software people hated that
|too. One of the things I was able to get into the architecture was to
|allow any register to be used to address memory as a base or an index
|register, and was even able to squeeze in a scale factor for the
|index.

I had programmed the 286 in assembly language in a course, and it
looked like a mess, while the 386 looks much more regular, but I think
the only relevant difference is that instead of the [BX|BP+SI|DI]
addressing mode, the 386 supports [reg], offset[reg],
offset[reg+reg*1|2|4|8]. If you then ignored all the special-register
instructions like LODS (which were slower than the sequence of
general-register instructions on the 486 anyway), you had
general-purpose registers.
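
A minimal sketch of the two effective-address forms contrasted above, assuming registers are modelled as a plain dict. The function names `ea_386` and `ea_8086` are made up for illustration, and segment handling and instruction encoding are left out:

```python
# Illustrative model only: function names and the register dict are assumptions.
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def ea_386(regs, disp=0, base=None, index=None, scale=1):
    """Effective address for the 386 form disp[base + index*scale]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if index == "esp":
        raise ValueError("ESP cannot be used as an index register")
    addr = disp
    if base is not None:
        addr += regs[base]           # any general register may be the base
    if index is not None:
        addr += regs[index] * scale  # any general register except ESP may be the index
    return addr & MASK32

def ea_8086(regs, disp=0, base=None, index=None):
    """Effective address for the 8086 form disp[BX|BP + SI|DI]."""
    if base not in (None, "bx", "bp"):
        raise ValueError("8086 base register must be BX or BP")
    if index not in (None, "si", "di"):
        raise ValueError("8086 index register must be SI or DI")
    addr = disp
    if base is not None:
        addr += regs[base]
    if index is not None:
        addr += regs[index]
    return addr & MASK16

# Example: the 4-byte element ECX of an array 8 bytes past the address in EAX.
example = ea_386({"eax": 0x1000, "ecx": 3}, disp=8, base="eax", index="ecx", scale=4)
```

In this model the only thing separating the two forms is which registers are accepted and whether a scale factor exists.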

It also supports the 8086 addressing mode encoding in some way, but as
far as I am concerned, I would not have noticed if there was a
supervisor-settable mode bit (like the 64-bit and 32-bit instruction
sets on AMD64); I never mixed the two addressing mode encodings, and
never saw any code that did. I think you can mix them with some
prefix byte, and with a user-settable mode bit.

Another interesting aspect is how fast the project was. Started in
1982, it taped out in 1985, and the Compaq Deskpro 386 was announced
in September 1986. And this time span included quite a long
"definition" (architecture, not microarchitecture) time. Supposedly
these days CPUs take 5 years to develop (although that's a number I
read many years ago; how true is it these days?).

Finally, they said quite a bit about the 286 in this panel. In
particular, they made this elaborate segment scheme to compete with
the "Zilog MMU". And apparently neither the customers nor most at
Intel liked it (including the designer of the 8086). Interestingly, I
worked on a 286 box running Xenix in a course, and it seems to me that
if you can live with 64KB processes (and swapping instead of paging),
the 286 MMU does ok.

In the 386 apparently they wanted to add paging (which is not really
mentioned that way, they only say something about a large flat address
space), and they needed a long time until they came up with the
combination of segments and paging that they eventually selected: map
segments to a linear address space and then use paging for the linear
address space. This does not look particularly sensible to me, but if
you follow the premise that OSs either use segments or paging, but not
really both at once, it's probably the solution that's cheapest to
implement.

Back to the 286: What the panelists stated was that at least in 1982
(too late for the 286) Intel was aware that the 68000 with its large
flat address space was much more attractive than the Zilog MMU
features. And Bob Childs apparently already had told them that when
the 286 was "defined":

|[Jim] Slager: I remember I think I was with Bob Childs and we were
|telling him all the great things about the 286 and he said, "Well, do
|people really want that? Don't they just really want large address
|spaces?" Oh man, <laughter> because he had it 100 percent right, but
|we didn't listen

So let's consider what might have happened if they had listened. One
approach would have been to just extend the 8086 to 32 bits, including
segment registers (which would still be used with a 4-bit shift).
That should easily fit into the 134,000 NMOS transistors that the real
80286 has (twice as many as the 68000 supposedly has). Provide for
an off-chip paging MMU.
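
A minimal sketch of the address formation this would imply, assuming the widened registers keep the 8086 rule of "segment shifted left by 4, plus offset". The function names are made up, and truncating the result to 32 bits is an assumption, since the post does not say how wide the resulting address would be:

```python
# Illustrative only: the 32-bit variant is a guess at the hypothetical design.

def linear_8086(segment, offset):
    """Real 8086 rule: 16-bit segment shifted left by 4, plus 16-bit offset, 20-bit result."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF

def linear_wide(segment, offset, address_bits=32):
    """Hypothetical widened rule: 32-bit segment and offset, same 4-bit shift.

    address_bits is an assumption; without truncation the sum can need 37 bits.
    """
    mask = (1 << address_bits) - 1
    return (((segment & 0xFFFFFFFF) << 4) + (offset & 0xFFFFFFFF)) & mask

# Example: the classic 1234:5678 real-mode address.
example = linear_8086(0x1234, 0x5678)
```

Existing 8086 code would compute the same addresses under the wide rule as long as it stays below 1 MB, which is the compatibility argument for keeping the shift.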

If there are any transistors left, John Crawford's general-purpose
register addressing modes would have been an ideal addition. However,
if we assume a 16-bit external interface like the 286 had, the
question is if the longer encoding of the general addressing modes
would have caused enough of a slowdown to discourage their use.

If Intel had gone that way, it would be interesting to see how various
software projects would have played out. Would the MMU have become a
universal add-on to the alternative-reality 286? How would OS/2 have
developed? How would Windows have developed?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

MitchAlsup

Jun 14, 2022, 12:06:30 PM
Bite your tongue.
Looking back on this, I wonder why CPU designers did not do something like
the SUN MMU--SRAM between RAS and CAS of DRAM does the translation
without adding any cycles to the access.

John Dallman

Jun 14, 2022, 1:54:56 PM
In article <2022Jun1...@mips.complang.tuwien.ac.at>,
an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> If Intel had gone that way, it would be interesting how various
> software projects would have played out. Would the MMU have become
> a universal add-on to the alternative-reality 286?

Probably a common one, judging by the usage of optional MMUs on
68000-based Macs of the period. Certainly would have been incorporated
into the next generation of Intel x86.

> How would OS/2 have developed? How would Windows have developed?

Windows would have been a cleaner design. When I first started trying to
program it in 1989, there was a lot of cruft visible from the period when
Windows programs had been limited to one 64KB data segment. Some of it
still has to be supported now in 32-bit versions of Windows 10.

There might not have been a need for OS/2, but if there was, it would
have been far less crippled by the need to run on a 286.

John

Anton Ertl

Jun 15, 2022, 7:16:48 AM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>I just read the "Intel 386 Microprocessor Design and Development Oral
>History Panel"
><http://archive.computerhistory.org/resources/text/Oral_History/Intel_386_Design_and_Dev/102702019.05.01.acc.pdf>.
>Here I'll discuss some of the points mentioned there.
[...]
>In the 386 apparently they wanted to add paging (which is not really
>mentioned that way, they only say something about a large flat address
>space), and they needed a long time until they came up with the
>combination of segments and paging that they eventually selected: map
>segments to a linear address space and then use paging for the linear
>address space. This does not look particularly sensible to me, but if
>you follow the premise that OSs either use segments or paging, but not
>really both at once, it's probably the solution that's cheapest to
>implement.

I thought some more about this problem.

I don't know much about how the 286 MMU and segmentation works and am
also no expert on MMUs in general, but anyway, here are my ideas on
that problem:

Just concatenate the 13-bit segment number and the 32-bit in-segment
address into a 45-bit virtual address and then use a 4-level page
table to translate that into the physical address. The TLB would have
to compare 33 bits of the address to the TLB entries.
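
A minimal sketch of that concatenation, assuming 4 KB pages (which is where the 33 TLB tag bits come from: 45 - 12). The 9/8/8/8 split of those 33 bits across the four levels is just one possible choice, and the function names are made up:

```python
# Illustrative only: page size and the per-level split are assumptions.
SEG_BITS = 13
OFFSET_BITS = 32
PAGE_BITS = 12             # assumption: 4 KB pages
LEVEL_BITS = (9, 8, 8, 8)  # assumption: one way to split 33 bits over 4 levels

def virtual_address(segment, offset):
    """Concatenate a 13-bit segment number and a 32-bit offset into 45 bits."""
    if not 0 <= segment < (1 << SEG_BITS):
        raise ValueError("segment number must fit in 13 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 32 bits")
    return (segment << OFFSET_BITS) | offset

def split_for_walk(vaddr):
    """Return (TLB tag, per-level table indices, offset within the page)."""
    page_offset = vaddr & ((1 << PAGE_BITS) - 1)
    vpn = vaddr >> PAGE_BITS   # the 33 bits a TLB entry would have to match
    indices = []
    shift = sum(LEVEL_BITS)
    for bits in LEVEL_BITS:
        shift -= bits
        indices.append((vpn >> shift) & ((1 << bits) - 1))
    return vpn, tuple(indices), page_offset

# Example: offset 0x12345678 in segment 5.
example = split_for_walk(virtual_address(5, 0x12345678))
```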

To implement a flat memory model (within a process), a process would
get a segment number, and all segment registers would be loaded with
that segment number. The segment number would have a role similar to
the ASID/PCID that Intel added to Westmere in 2010. This approach
would require that user-level can be prevented from changing segments;
I don't know if that is the case for the 286, or what it would take to
add that in my alternative-reality 386 if not.

If that was doable, my guess would be that Unix implementations would
at first have been limited to 8192 processes, and once that had
proved too few, it's a good question whether some hardware
mechanism for more segment numbers would have been added, or whether
ways for multiplexing the processes onto segment numbers (what now
happens with PCIDs) would have been added in software.

I also have no idea how well that would work for protected-mode 286
programs.

Extending all of that for virtual machines etc. probably also has some
challenges.

Given the holes in my knowledge, there is probably something wrong
with these ideas, and you can help close these holes, by poking holes
in what I wrote.

John Levine

Jun 15, 2022, 1:41:07 PM
It appears that Anton Ertl <an...@mips.complang.tuwien.ac.at> said:
>I don't know much about how the 286 MMU and segmentation works and am
>also no expert on MMUs in general, but anyway, here are my ideas on
>that problem:

It's worth learning about if only as a case study in how a large
sophisticated organization can shoot itself in the foot.

Segment numbers are 16 bits, but the low three bits are special. The
two low bits are the requested privilege level which has to be at
least as high as the actual level in the segment table, and the third
bit is the local/global flag. So if you wanted to simulate data bigger
than 64K, you had to shift the high half of the address left three
bits, then add in the appropriate low bits, and even so every segment
switch was really slow so you really didn't want to do that. I can
imagine some designer saying "now they'll HAVE to use segments the way
we intend." Nope.
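
A minimal sketch of the selector layout and the "shift left three bits" arithmetic described above. It assumes a large object is backed by consecutive descriptor-table entries of 64K each, starting at a table index called `first_index` here; that layout and all function names are illustrative assumptions:

```python
# Illustrative only: helper names and the consecutive-descriptor layout are assumptions.

def make_selector(index, local=True, rpl=3):
    """Pack a 13-bit table index, the local/global flag, and the 2-bit RPL."""
    if not 0 <= index < (1 << 13):
        raise ValueError("descriptor index must fit in 13 bits")
    if not 0 <= rpl <= 3:
        raise ValueError("RPL must be 0..3")
    return (index << 3) | (int(local) << 2) | rpl

def split_selector(selector):
    """Unpack a 16-bit selector into (index, is_local, rpl)."""
    return selector >> 3, bool(selector & 0b100), selector & 0b11

def huge_to_far(flat_offset, first_index, local=True, rpl=3):
    """Turn an offset into a >64K object into a (selector, 16-bit offset) pair."""
    high, low = flat_offset >> 16, flat_offset & 0xFFFF
    # The high half cannot simply be used as the selector: it has to be
    # moved past the three special low bits first.
    return make_selector(first_index + high, local, rpl), low

# Example: byte 0x2ABCD of an object whose first descriptor is local entry 10.
example = huge_to_far(0x2ABCD, first_index=10)
```

Every time `high` changes, the program has to load a new selector into a segment register, which is the slow operation mentioned above.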

>Just concatenate the 13-bit segment number and the 32-bit in-segment
>address into a 45-bit virtual address and then use a 4-level page
>table to translate that into the physical address. The TLB would have
>to compare 33 bits of the address to the TLB entries.

That's basically what zSeries does, although it treats the address as flat and
the page table boundaries as an implementation detail. But not Intel.

>To implement a flat memory model (within a process), a process would
>get a segment number, and all segment registers would be loaded with
>that segment number. The segment number would have a role similar to
>the ASID/PCID that Intel added to Westmere in 2010. This approach
>would require that user-level can be prevented from changing segments;
>I don't know if that is the case for the 286, or what it would take to
>add that in my alternative-reality 386 if not.

The 286 (and 386) have a local and global segment table, with a lot of
extra stuff in each table entry. A process can address any segment
that's valid and to which it has access. Each process has its own
local segment table. Everything related to segment switching is slow
enough that we went to considerable trouble to avoid it. Even reloading
the same value into a segment register was slow.

>I also have no idea how well that would work for protected-mode 286
>programs.

Not at all due to the segment table stuff.

>Extending all of that for virtual machines etc. probably also has some
>challenges.

There is quite a lot of stuff in modern Intel chips to support virtualization
and nested page tables and such. I haven't looked at it in detail since
I haven't had to.

It was a fortunate coincidence that the IBM 360 architecture was so easy
to virtualize with a small and well defined set of things that a VM monitor
had to intercept and simulate, and even there modern systems have added a
lot of microcode support. The Intel architecture is way harder since it
is so much more complicated.

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

BGB

Jun 15, 2022, 4:40:51 PM
On 6/15/2022 12:41 PM, John Levine wrote:
> It appears that Anton Ertl <an...@mips.complang.tuwien.ac.at> said:
>> I don't know much about how the 286 MMU and segmentation works and am
>> also no expert on MMUs in general, but anyway, here are my ideas on
>> that problem:
>
> It's worth learning about if only as a case study in how a large
> sophisticated organization can shoot itself in the foot.
>
> Segment numbers are 16 bits, but the low three bits are special. The
> two low bits are the requested privilege level which has to be at
> least as high as the actual level in the segment table, and the third
> bit is the local/global flag. So if you wanted to simulate data bigger
> than 64K, you had to shift the high half of the address left three
> bits, then add in the appropriate low bits, and even so every segment
> switch was really slow so you really didn't want to do that. I can
> imagine some designer saying "now they'll HAVE to use segments the way
> we intend." Nope.
>

Agreed.

The x86 segmentation was, if anything, an example of how not to do it.

Though, I still don't necessarily consider it the biggest blunder of x86
(TSS and Call-Gates and similar probably being worse).


>> Just concatenate the 13-bit segment number and the 32-bit in-segment
>> address into a 45-bit virtual address and then use a 4-level page
>> table to translate that into the physical address. The TLB would have
>> to compare 33 bits of the address to the TLB entries.
>
> That's basically what zSeries does, although it treats the address as flat and
> the page table boundaries as an implementation detail. But not Intel.
>

Would have been funny if an alternate-universe version of 32-bit x86
would have allowed using far pointers as a 45-bit flat address space.



Though, this does point out some potential foot-guns in the design of my
96-bit addressing extension, but they were partly inescapable.

Couldn't have gone over to linear 96-bit addresses without either
breaking binary compatibility with code built around 48-bit addressing,
or requiring separate operating modes.


Though, the latter case might not have been too much of an issue in
practice with a 128-bit ABI, since the (unavoidable) differences in the
C ABI (in the native 128-bit mode) effectively make it moot.


However, the 128-bit ABI is likely DOA anyways, which leaves me back
with my original plan for when I designed them:
Treating them effectively like 128-bit far pointers within a segmented
addressing scheme (would also have much less performance impact than
moving everything over to 128-bit pointers).

I wouldn't expect it to be too difficult to implement a
hypervisor for BJX2.

It would most likely involve effectively running everything (including
the ISRs) in user-mode with some address space trickery and emulation
(effectively doing double-translation for the LDTLB instruction, ...).

So, for example:
  Hypervisor has an emulated TLB;
  TLB Miss handler first checks for a TLBE in the emulated TLB;
  If found:
    Translate again and upload into real TLB.
  Else:
    Transfer control back to user-mode;
    Invoke the emulated TLB Miss handler.
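
A minimal sketch of that miss-handling flow, with the TLBs modelled as dicts keyed by page number. The class and method names are made up for illustration, and nothing here reflects actual BJX2 data structures; the guest's miss handler is assumed to end by executing LDTLB, which traps into `emulate_ldtlb`:

```python
# Illustrative only: names and the dict-based TLB model are assumptions.

class Hypervisor:
    def __init__(self, guest_to_host):
        self.guest_to_host = guest_to_host  # guest-physical page -> host-physical page
        self.emulated_tlb = {}              # guest-virtual page -> guest-physical page
        self.real_tlb = {}                  # guest-virtual page -> host-physical page

    def _upload(self, vpage):
        """Second translation step: guest-physical to host-physical, then load."""
        guest_ppage = self.emulated_tlb[vpage]
        self.real_tlb[vpage] = self.guest_to_host[guest_ppage]

    def emulate_ldtlb(self, vpage, guest_ppage):
        """Trapped LDTLB from the guest: record the entry and upload it."""
        self.emulated_tlb[vpage] = guest_ppage
        self._upload(vpage)

    def on_tlb_miss(self, vpage, guest_miss_handler):
        """Real TLB miss: try the emulated TLB first, else defer to the guest."""
        if vpage in self.emulated_tlb:
            self._upload(vpage)
        else:
            # Control goes back to user mode, where the guest's own handler
            # runs and is expected to issue an LDTLB for this page.
            guest_miss_handler(self, vpage)
```

A guest handler in this model is any callable that looks up its own page table and calls `emulate_ldtlb(vpage, guest_ppage)`.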

All privileged instructions (or any which affect architectural state)
would need to be trapped and emulated, but this would be needed in
usermode anyways (one doesn't want usermode code poking around in system
registers to begin with).

Things like the SR register and CPUID instruction would likely also need
to be treated as protected architectural state for sake of this.

...

Timothy McCaffrey

Jun 16, 2022, 3:32:26 PM
On Wednesday, June 15, 2022 at 1:41:07 PM UTC-4, John Levine wrote:
> It appears that Anton Ertl <an...@mips.complang.tuwien.ac.at> said:
> >I don't know much about how the 286 MMU and segmentation works and am
> >also no expert on MMUs in general, but anyway, here are my ideas on
> >that problem:
> It's worth learning about if only as a case study in how a large
> sophisticated organization can shoot itself in the foot.
>
I heard that many of the i432 designers worked on the 286.

Things that the 286 could have done to make things better:
1) Add FS & GS (like the 386 did, too little, too late). A lot
of the segment thrashing was because there was only 1 extra segment.
SS, DS & CS were pretty much untouchable (well, you could use DS if you were very careful).
(Indeed, this is how you can have more than 64K of "data" segment, use SS for some stuff, and DS for the other).
2) Interrupts were incredibly slow, they should have spent more time thinking these through.
3) Why did privileged calls only use the segment and not the offset?
4) Really did not need 4 rings, every OS that I used that used the 286 in protected mode only used 2.

Things the 386 could have done:
1) Make segment registers 32 bits. (The alternative would be to just get rid of them and call it a day)
2) Refactor the instruction set. There is a lot of cruft from the 8086 that *still* (even in AMD64) is a pain
(shift counts in CL, MUL/DIV using *AX/*DX, *DI/*SI/*CX used by LODS/MOVS/STOS/CMPS/SCAS, etc, etc.)
3) Add more registers.

Segments on the 286 could be used to provide two features, security and virtual memory. VM is difficult with segments because
you need to find a contiguous chunk of physical memory big enough for the whole segment, however it is possible
(this was implemented in OS/2 & Xenix, for two examples).
With the 386, you can use the paging MMU to give you a much larger virtual space to place your segments (not to mention
that segments could be up to 4 GB on the 386). That way you didn't need to page in an entire segment when you got a segfault,
only the page that was needed (you still needed to reserve the space in your virtual memory for the entire segment).
So, security was still handled by segmentation, but VM was pretty much handled by the paging MMU.
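
A minimal sketch of that division of labour on the 386: the segment stage checks the limit and adds the base, and the paging stage then only needs the one 4 KB page that was touched. The flat `present_pages` dict stands in for the real two-level page tables, an expand-up segment is assumed, and all names are made up for illustration:

```python
# Illustrative only: a flat dict replaces the page directory and page tables.

class SegmentLimitError(Exception):
    """Offset lies outside the segment (the protection side)."""

class PageFault(Exception):
    """The touched page is not present (the virtual-memory side)."""
    def __init__(self, linear_page):
        super().__init__(hex(linear_page))
        self.linear_page = linear_page

def to_linear(base, limit, offset):
    """Segment stage: check the limit, then add the segment base."""
    if offset > limit:
        raise SegmentLimitError(hex(offset))
    return (base + offset) & 0xFFFFFFFF

def split_linear(linear):
    """386 split of a linear address: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return linear >> 22, (linear >> 12) & 0x3FF, linear & 0xFFF

def translate(base, limit, offset, present_pages):
    """Both stages; present_pages maps linear page number -> physical frame number."""
    linear = to_linear(base, limit, offset)
    page = linear >> 12
    if page not in present_pages:
        raise PageFault(page)  # only this one page has to be brought in
    return (present_pages[page] << 12) | (linear & 0xFFF)

# Example: a 64K segment based at 4 MB, with only its second page resident.
example = translate(0x00400000, 0xFFFF, 0x1234, {0x401: 0x55})
```

The segment still reserves its whole range of linear addresses, but a fault names a single page rather than the segment.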

- Tim
