The EA jump immediately after enabling protected mode by setting PE in CR0

James Harris

Apr 26, 2015, 2:54:07 PM

You know that, per Intel's directions, after setting the CR0 PE flag
with

mov eax, cr0
or al, 1
mov cr0, eax

we are expected to have something like

jmp seg:pmode_running

I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.

1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.

2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.

I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.

3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as

EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)

where the Ss are the selector and the Os are the offset as hex bytes.

4. Depending on the mode the CPU thinks it is operating in at the time
that it hits the jump instruction the 16-bit and 32-bit forms may need
to be encoded with a leading 0x66 so they appear as

66 EA oo oo ss ss (16-bit form in 32-bit mode)
66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)
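
As a rough illustration of points 3 and 4, this is how the four encodings might be requested in NASM syntax. The selector 0x08 and the two offsets are made-up values for the example. The word/dword keyword selects the short or long offset, and the assembler adds the 0x66 prefix when that size differs from the current bits setting.

```nasm
; Illustrative values only: selector 0x08, offsets 0x2345 and 0x12345.
bits 16
    jmp word  0x08:0x2345        ; EA 45 23 08 00
    jmp dword 0x08:0x12345       ; 66 EA 45 23 01 00 08 00

bits 32
    jmp dword 0x08:0x12345       ; EA 45 23 01 00 08 00
    jmp word  0x08:0x2345        ; 66 EA 45 23 08 00
```

In every case the offset bytes come first and the selector bytes last, each in little-endian order.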

5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?

6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.

If the executing code is in the low 64k of a descriptor's space then we
can encode the simple

EA oo oo ss ss

because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.

66 EA oo oo oo oo ss ss

To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?

Solutions?

Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as

EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to

66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)

When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
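
A minimal sketch of solution 2 in NASM syntax might look like the following. Everything specific in it is an assumption made for illustration: the load address 0x10000 (LOAD_BASE), entry with CS = DS = 0x1000, the selector values, the stack location, and A20 already being enabled.

```nasm
; Sketch only. Assumed, not taken from any real loader: the code sits at
; physical 0x10000, is entered with CS = DS = 0x1000, and A20 is enabled.
LOAD_BASE equ 0x10000
CODE_SEL  equ 0x08                       ; GDT entry 1, TI = 0, RPL = 0
DATA_SEL  equ 0x10                       ; GDT entry 2, TI = 0, RPL = 0

org 0
bits 16
start:
    cli
    lgdt [gdt_ptr]                       ; DS-relative read, DS = 0x1000 assumed
    mov eax, cr0
    or al, 1
    mov cr0, eax                         ; PE = 1
    ; The offset is relative to the flat descriptor's base (0), so the
    ; load address has to be added to the assembler's label value.
    jmp dword CODE_SEL:(LOAD_BASE + pmode_running)

bits 32                                  ; placed after the jump, not before it
pmode_running:
    mov ax, DATA_SEL
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov esp, LOAD_BASE                   ; assumed free memory below the code
.halt:
    hlt
    jmp .halt

align 8
gdt:
    dq 0                                 ; null descriptor
    dw 0xFFFF, 0x0000                    ; code: limit 15..0, base 15..0
    db 0x00, 0x9A, 0xCF, 0x00            ; base 23..16, access, flags/limit 19..16, base 31..24
    dw 0xFFFF, 0x0000                    ; data: same layout
    db 0x00, 0x92, 0xCF, 0x00
gdt_end:

gdt_ptr:
    dw gdt_end - gdt - 1                 ; GDT limit
    dd LOAD_BASE + gdt                   ; linear address of the GDT
```

With solution 1 the code descriptor's base would be LOAD_BASE instead of 0, and the jump could then be the plain 5-byte form, jmp CODE_SEL:pmode_running, at the cost of running in a non-flat code segment until CS is reloaded.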

I should say that the above is just as written after working out what I
think was going on and may contain errors for which I would welcome your
corrections.

Interesting subtlety, no?

Any thoughts/comments?

James

wolfgang kern

Apr 26, 2015, 5:43:28 PM

James Harris wrote:

> You know that, per Intel's directions, after setting the CR0 PE flag
> with
>
> mov eax, cr0
> or al, 1
> mov cr0, eax
>
> we are expected to have something like
>
> jmp seg:pmode_running
>
> I had taken that jump instruction for granted but the recent Qemu/GDT
> thread has brought up some issues about the jump, as follows.

> 1. The jump appears to be necessary in order to put the correct pmode
> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
> and also to set the low bits of CS so that they contain the CPL and TI,
> all of which should be zero.

the value in a PM seg-reg is nothing else than the offset within
the GDT (lower bits are ignored/used elsewhere, so the GDT must
be 8 byte aligned). Nothing is shifted here.
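
The two descriptions can be reconciled with a small sketch in NASM syntax. The constant names and the choice of entries 1 and 2 are hypothetical. With TI = 0 and RPL = 0, "entry number shifted left 3 bits" and "byte offset within the GDT" give the same value, because each descriptor is 8 bytes long.

```nasm
; Selector layout: bits 15..3 = descriptor index, bit 2 = TI, bits 1..0 = RPL.
; All names below are illustrative only.
SEL_TI_GDT  equ 0 << 2           ; table indicator: 0 = GDT, 1 = LDT
SEL_RPL0    equ 0                ; requested privilege level 0

CODE_INDEX  equ 1                ; assumed: GDT entry 1 is the code descriptor
DATA_INDEX  equ 2                ; assumed: GDT entry 2 is the data descriptor

CODE_SEL    equ (CODE_INDEX << 3) | SEL_TI_GDT | SEL_RPL0   ; 0x08 = byte offset 8
DATA_SEL    equ (DATA_INDEX << 3) | SEL_TI_GDT | SEL_RPL0   ; 0x10 = byte offset 16
```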

> 2. On the 386 any kind of jump was needed immediately following the MOV
> to CR0 - even a near jump - in order to flush the prefetch queue. On
> Pentium Pro and later (and maybe even on the Pentium 1) there is no need
> to flush the queue but the far jump keeps things compatible as it will
> flush the prefetch queue on early CPUs as well as load the Pmode CS on
> all of them.

?? isn't this far jmp 'the switch point'.

> I knew the above but the following points are of particular interest
> just now as I had not considered them before - or if I had then I had
> forgotten the subtleties of the problem.

> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
> encoded in hex as
>
> EA oo oo ss ss (16-bit form)
> EA oo oo oo oo ss ss (32-bit form)
>
> where the Ss are the selector and the Os are the offset as hex bytes.

> 4. Depending on the mode the CPU thinks it is operating in at the time
> that it hits the jump instruction the 16-bit and 32-bit forms may need
> to be encoded with a leading 0x66 so they appear as

> 66 EA oo oo ss ss (16-bit form in 32-bit mode)
> 66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)

> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
> still in 16-bit mode. Right?

Right, the far jmp is 'the switch'.

> 6. Now, where it gets interesting is that the offset field of the EA
> jump instruction seems to be an offset not from the jump instruction but
> from the start of the segment. Is that correct? If so then we have to be
> careful which jump form is encoded, as follows.

> If the executing code is in the low 64k of a descriptor's space then we
> can encode the simple
>
> EA oo oo ss ss

> because the offset can fit in 16 bits.

Yes, and it also works if the new PM-CS uses RM-CS*16 as its base;
because the offsets are then equal, it may help stupid asm-tools too.
If a boot sequence uses 0:7c00, the PM base can be flat (zero) as well.
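
A sketch of that approach in NASM syntax, assuming org 0 and DS = CS at this point. The names gdt_code and CODE_SEL are hypothetical. The base of the code descriptor is patched at run time to RM-CS*16, so the assembler's label offsets stay valid on both sides of the switch and the 5-byte jump is enough.

```nasm
; Sketch only. Assumes DS = CS in real mode; gdt_code is meant to be
; entry 1 of a GDT that is loaded with lgdt after the patch.
CODE_SEL equ 0x08                    ; hypothetical selector for gdt_code

org 0
bits 16
    xor eax, eax
    mov ax, cs
    shl eax, 4                       ; EAX = real-mode CS * 16
    mov [gdt_code + 2], ax           ; descriptor base bits 15..0
    shr eax, 16
    mov [gdt_code + 4], al           ; descriptor base bits 23..16
    mov [gdt_code + 7], ah           ; descriptor base bits 31..24 (0 below 1 MB)
    ; ... lgdt and the CR0.PE write go here ...
    jmp CODE_SEL:pmode_running       ; EA oo oo ss ss, 16-bit offset suffices

bits 32
pmode_running:
    jmp $                            ; placeholder for the real entry code

gdt_code:
    dw 0xFFFF, 0x0000                ; limit 15..0, base 15..0 (patched above)
    db 0x00, 0x9A, 0xCF, 0x00        ; base 23..16, access, flags, base 31..24
```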

> But if the executing code is above the 64k mark relative to the
> start of the segment then we need to encode the 32-bit form for
> 16-bit mode, i.e.

> 66 EA oo oo oo oo ss ss

Sure, that's the only way if it won't fit into 16 bits.

> To make an example, say that the code that will enable Pmode is located
> in memory so that the jump target is at physical address 0x12345. If the
> GDT entry for privileged code has been set to describe all of memory,
> i.e. from address 0 to address ff....fff, then it will be impossible to
> use the 16-bit form of the EA jump instruction. Correct?

> Solutions?

> Solution 1. Set up a temporary GDT entry to point to the place in memory
> where the code is running. In the above case, the GDT entry could point
> at 0x10000 and then the jump offset would be 0x2345, leading to the jump
> instruction being encoded as

> EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

> Solution 2. Modify the jump instruction so that the code's location
> relative to the start of the privileged code segment does not matter.
> That leads to
>
> 66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)

> When I did this before I used a temporary GDT entry to point to the
> executing code, i.e. solution 1, but solution 2 also has merits.

> I should say that the above is just as written after working out what I
> think was going on and may contain errors for which I would welcome your
> corrections.
>
> Interesting subtlety, no?

> Any thoughts/comments?

My OS switches back and forth between modes, so there are both variants:
the shorter one for PM16<->RM and the 8-byte form for PM16/RM<->PM32.

__
wolfgang

Rod Pemberton

Apr 27, 2015, 12:13:22 AM

On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
<james.h...@gmail.com> wrote:

[snip]

> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
> encoded in hex as
>
> EA oo oo ss ss (16-bit form)
> EA oo oo oo oo ss ss (32-bit form)
>
> where the Ss are the selector and the Os are the offset as hex bytes.
>

Of course, mixed-mode code, i.e., using o32 (66h) and a32 (67h) in NASM,
will only work on a 386 or later, since those override prefixes were
introduced with the 386. I.e., you may need the o32 override to use the
32-bit form in 16-bit code, but that won't work on older processors.

> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
> still in 16-bit mode. Right?

I'm not clear on all the details here ...

My understanding is that CR0 enables PM, but the code segment is still
functioning as 16-bit RM until the CS segment descriptor cache is updated
for PM which is done via a far jump setting the CS selector. The PM
selector determines a PM 16-bit or PM 32-bit code segment.

> 6. Now, where it gets interesting is that the offset field of the EA
> jump instruction seems to be an offset not from the jump instruction but
> from the start of the segment. Is that correct?

Seems so ...

If you use the offset generated by the assembler label, it's likely to
be wrong since your PM base address for the code selector is likely to
be different from the address for the RM code segment in CS. E.g., RM
CS might be 07C0h while PM CS base address might be zero. So, the
assembler offset is relative to 07C0h, while the instruction offset
would need to be relative to zero.

This is probably why I've found using far returns and indirect far jumps
to be easier. You can compute the offset at runtime to match your code's
actual location relative to the base address for the descriptor.
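
A run-time computed indirect far jump might be sketched like this in NASM syntax. It assumes org 0, DS = CS in real mode, a flat (base 0) code descriptor and a GDT that has already been loaded. CODE_SEL and far_ptr are hypothetical names.

```nasm
; Sketch only. Assumes DS = CS, lgdt already done, and a flat code
; descriptor selected by the hypothetical CODE_SEL.
CODE_SEL equ 0x08

org 0
bits 16
    xor eax, eax
    mov ax, cs
    shl eax, 4                       ; EAX = linear base of the real-mode segment
    add eax, pmode_running           ; plus the target's offset in that segment
    mov [far_ptr], eax               ; 32-bit offset relative to base 0
    mov word [far_ptr + 4], CODE_SEL ; selector part of the far pointer

    mov eax, cr0
    or al, 1
    mov cr0, eax                     ; PE = 1
    jmp dword far [far_ptr]          ; loads EIP and CS from the 6-byte pointer

bits 32
pmode_running:
    jmp $                            ; placeholder for the real entry code

far_ptr:
    dd 0                             ; offset, filled in at run time
    dw 0                             ; selector, filled in at run time
```

Because the pointer is built from the live CS value, the same binary works wherever it happens to be loaded below 1 MB.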

> If so then we have to be
> careful which jump form is encoded, as follows.
>
> If the executing code is in the low 64k of a descriptor's space then we
> can encode the simple
>
> EA oo oo ss ss
>
> because the offset can fit in 16 bits. But if the executing code is
> above the 64k mark relative to the start of the segment then we need to
> encode the 32-bit form for 16-bit mode, i.e.
>
> 66 EA oo oo oo oo ss ss

As noted above, overrides may only work on the 386 or later.
IIRC, you're wanting your code to work on an 8086.

> To make an example, say that the code that will enable Pmode is located
> in memory so that the jump target is at physical address 0x12345. If the
> GDT entry for privileged code has been set to describe all of memory,
> i.e. from address 0 to address ff....fff, then it will be impossible to
> use the 16-bit form of the EA jump instruction. Correct?
>
> Solutions?
>

1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
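
Option 1 might be sketched as below in NASM syntax. It assumes org 0, a flat code descriptor at a hypothetical selector CODE_SEL, and a GDT that is already loaded. Treat it as an experiment rather than a documented sequence, since it depends on a far return being an acceptable way to load CS straight after PE is set.

```nasm
; Sketch only. Assumes lgdt has been executed and CODE_SEL (hypothetical)
; selects a flat, base 0 code descriptor.
CODE_SEL equ 0x08

org 0
bits 16
    xor ebx, ebx
    mov bx, cs
    shl ebx, 4                   ; EBX = linear base of the real-mode segment
    add ebx, pmode_running       ; EBX = target offset relative to base 0

    mov eax, cr0
    or al, 1
    mov cr0, eax                 ; PE = 1

    push dword CODE_SEL          ; 4-byte slot, only the low 16 bits are used
    push ebx                     ; 4-byte offset
    o32 retf                     ; 66 CB: pops EIP, then CS

bits 32
pmode_running:
    jmp $                        ; placeholder for the real entry code
```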

Rod Pemberton

--
Cars kill more people than guns in the U.S.
Yet, no one is trying to take away your car.

wolfgang kern

Apr 27, 2015, 2:04:35 AM

Rod Pemberton mentioned:

...

> As noted above, overrides may only work on the 386 or later.
> IIRC, you're wanting your code to work on an 8086.

Set PM on an 8086?
James needs to try much harder then :)
... now at least there is VM86.
__
wolfgang

James Harris

Apr 27, 2015, 4:58:58 AM

"wolfgang kern" <now...@never.at> wrote in message
news:mhjm5r$1hh$1...@speranza.aioe.org...
> James Harris wrote:

...

>> 1. The jump appears to be necessary in order to put the correct pmode
>> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3
>> bits) and also to set the low bits of CS so that they contain the CPL
>> and TI, all of which should be zero.
>
> the value in a PM seg-reg is nothing else than the offset within the
> GDT (lower bits are ignored/used elsewhere, so the GDT must be 8 byte
> aligned). Nothing is shifted here.

Did you see somewhere that the GDT must be so aligned? For sure it is a
good idea but is it needed?

>> 2. On the 386 any kind of jump was needed immediately following the
>> MOV to CR0 - even a near jump - in order to flush the prefetch queue.
>> On Pentium Pro and later (and maybe even on the Pentium 1) there is
>> no need to flush the queue but the far jump keeps things compatible
>> as it will flush the prefetch queue on early CPUs as well as load the
>> Pmode CS on all of them.
>
> ?? isn't this far jmp 'the switch point'.

Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
processing in Pmode before doing a far jump. That post-PE processing
could even include the LGDT instruction!

It seems the 486 was similar to the 386 in what was required after
setting PE. Link below, but it is a large download:

https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

...

>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>> still in 16-bit mode. Right?
>
> Right, the far jmp is 'the switch'.

Not on the 386 or 486. Two more that you would classify as for the
museum...?

James

James Harris

Apr 27, 2015, 9:02:30 AM

"Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message
news:op.xxqgoy2nwa0t4m@localhost...

> On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
> <james.h...@gmail.com> wrote:

...

>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>> still in 16-bit mode. Right?
>
> I'm not clear on all the details here ...
>
> My understanding is that CR0 enables PM, but the code segment is still
> functioning as 16-bit RM until the CS segment descriptor cache is
> updated
> for PM which is done via a far jump setting the CS selector. The PM
> selector determines a PM 16-bit or PM 32-bit code segment.

Those specifics seem to depend on which processor you are using. The
Pentium's requirements are different from those for 386 and 486, for
example. Fortunately Intel suggest a sequence that will work on anything
from a 386 upwards so we don't need to think about the differences.

>> 6. Now, where it gets interesting is that the offset field of the EA
>> jump instruction seems to be an offset not from the jump instruction
>> but
>> from the start of the segment. Is that correct?
>
> Seems so ...
>
> If you use the offset generated by the assembler label, it's likely to
> be wrong since your PM base address for the code selector is likely to
> be different from the address for the RM code segment in CS. E.g., RM
> CS might be 07C0h while PM CS base address might be zero. So, the
> assembler offset is relative to 07C0h, while the instruction offset
> would need to be relative to zero.
>
> This is probably why I've found using far returns and indirect far
> jumps
> to be easier. You can compute the offset at runtime to match your
> code's
> actual location relative to the base address for the descriptor.

Yes, as long as you use the 32-bit version of the indirect jump - i.e.
with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
allows the dword keyword for that.)

...

>> To make an example, say that the code that will enable Pmode is
>> located
>> in memory so that the jump target is at physical address 0x12345. If
>> the
>> GDT entry for privileged code has been set to describe all of memory,
>> i.e. from address 0 to address ff....fff, then it will be impossible
>> to
>> use the 16-bit form of the EA jump instruction. Correct?
>>
>> Solutions?
>>
>
> 1) use a far return
> 2) use an indirect far call
> 3) set the PM descriptor base address for CS to match the RM segment
> for CS, but only temporarily. After updating the base address to
> its final value, you may need to reload the selectors.

I like the idea of the indirect far call or, at least, an indirect far
jump because it makes the values easy to calculate. I am not so sure
about the far return. It may work in some or all cases but doesn't seem
to be guaranteed to work.

Your (3) sounds like one I suggested. In fact the whole GDT can be
temporary. In my case I would rather allocate space for the real GDT
once memory management is working so a temporary GDT is good just to get
Pmode working.

James

wolfgang kern

Apr 27, 2015, 9:21:25 AM

James Harris wrote:

> ...

>>> 1. The jump appears to be necessary in order to put the correct pmode
>>> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
>>> and also to set the low bits of CS so that they contain the CPL and TI,
>>> all of which should be zero.

>> the value in a PM seg-reg is nothing else than the offset within the GDT
>> (lower bits are ignored/used elsewhere, so the GDT must be 8 byte
>> aligned). Nothing is shifted here.

> Did you see somewhere that the GDT must be so aligned?
> For sure it is a good idea but is it needed?

Yes! seems all manuals from 286 onward tell it.
Yes! it will raise an exception on the first access otherwise.

>>> 2. On the 386 any kind of jump was needed immediately following the MOV
>>> to CR0 - even a near jump - in order to flush the prefetch queue. On
>>> Pentium Pro and later (and maybe even on the Pentium 1) there is no need
>>> to flush the queue but the far jump keeps things compatible as it will
>>> flush the prefetch queue on early CPUs as well as load the Pmode CS on
>>> all of them.

>> ?? isn't this far jmp 'the switch point'.

> Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
> processing in Pmode before doing a far jump. That post-PE processing
> could even include the LGDT instruction!

> It seems the 486 was similar to the 386 in what was required after
> setting PE. Link below, but it is a large download:

> https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

I have all these expensive old Intel books on my shelf.
But much time has passed since I read them.

>>>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>>>> still in 16-bit mode. Right?

>>> Right, the far jmp is 'the switch'.

>> Not on the 386 or 486. Two more that you would classify as for the
>> museum...?

:) Yes, almost. I haven't seen a working 386 in a long time,
and I never had a 386 nor any Intel CPU after the 80286.

I would have to look at these dusty old books to confirm your statement,
but what I once tried on my old AMD 486 seemed to work the same way as
modern CPUs in this respect: strange behavior if instructions sit between
the CR0 write and the far jmp, because this 'intermediate mode' may not
work as expected.
Even though we know a few brave enough to make such a mode a special
case, there is no guarantee that this works on another CPU too.
__
wolfgang

James Harris

Apr 27, 2015, 10:19:27 AM

"wolfgang kern" <now...@never.at> wrote in message

news:mhld4g$avp$1...@speranza.aioe.org...

...

>>> so the GDT must be 8 byte aligned ....

>> Did you see somewhere that the GDT must be so aligned?
>> For sure it is a good idea but is it needed?
>
> Yes! seems all manuals from 286 onward tell it.
> Yes! it will raise an exception on the first access otherwise.

Could it have been the case for the 286 and possibly some but not all
later processors? I ask because I have found a comment in a PPro manual,
as follows: "The base addresses of the GDT should be aligned on an
eight-byte boundary to yield the best processor performance."

That seems to say that it is advisable, and hence not mandatory, on the
PPro. The manual is the PPro Family Developer's Manual Vol 3: OS
Writer's Guide.

After a search I also found "The base addresses of the GDT and IDT
should be aligned on an eight-byte boundary to maximize performance of
cache line fills." in a PPro manual vol 3.

Again, "should" not "must"!

...

>>>>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU
>>>>> is
>>>>> still in 16-bit mode. Right?
>
>>>> Right, the far jmp is 'the switch'.
>
>>> Not on the 386 or 486. Two more that you would classify as for the
>>> museum...?
>
> :) yes almost, I haven't seen any working 386 since long.
> and I never had a 386 nor any Intels after 80286.
>
> I would have to look at these dusty old books to confirm your
> statement,
> but what I once tried on my old AMD 486 seem to work the same way as
> modern CPUs on this matter: strange behavior if instructions are
> between
> CR0-wr and jmp-far because this 'intermediate mode' may not work as
> expected.
> Even we know a few brave enough to make such a mode a special case,
> there is no guarantee that this work on another CPU too.

Yes, this is unimportant as long as we follow the standard sequence -
until someone asks why a certain piece of code is not working, of
course.

FYI

http://css.csail.mit.edu/6.858/2014/readings/i386/s10_03.htm

Here's a quote from a 486 manual dated 1990:

"Protected mode is entered by setting the PE bit in the CR0 register.
Either an LMSW or MOV CR0 instruction may be used to set this bit (the
MSW register is part of the CR0 register). Because the processor
overlaps the interpretation of several instructions, it is necessary to
discard the instructions which already have been read into the
processor. A JMP instruction immediately after the LMSW instruction
changes the flow of execution, so it has the effect of emptying the
processor of instructions which have been fetched or decoded.

"After entering protected mode, the segment registers continue to hold
the contents they had in real address mode. Software should reload all
the segment registers. Execution in protected mode begins with a CPL of
0."

So it seems that for 386 and 486 only a jmp is needed. I found a Pentium
manual which seems to say that no jump is needed for the Pentium but
should be included for compatibility with the 386 and 486.

Of course, this is just for enabling Pmode ... enabling paging has its
own stipulations for forward and backward compatibility. (The Pentium
manual is helpful here but off topic for this thread.)

James

Rod Pemberton

Apr 27, 2015, 6:46:14 PM

On Mon, 27 Apr 2015 09:02:28 -0400, James Harris

<james.h...@gmail.com> wrote:
> "Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message
> news:op.xxqgoy2nwa0t4m@localhost...
>> On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
>> <james.h...@gmail.com> wrote:

>>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>>> still in 16-bit mode. Right?
>>
>> I'm not clear on all the details here ...
>>
>> My understanding is that CR0 enables PM, but the code segment is still
>> functioning as 16-bit RM until the CS segment descriptor cache is
>> updated for PM which is done via a far jump setting the CS selector.
>> The PM selector determines a PM 16-bit or PM 32-bit code segment.
>
> Those specifics seem to depend on which processor you are using. The
> Pentium's requirements are different from those for 386 and 486, for
> example. Fortunately Intel suggest a sequence that will work on anything
> from a 386 upwards so we don't need to think about the differences.

True, the manuals vary on the process needed. AIR, we've discussed some
aspects of that in the past, which were many and varied.

You deviated from my point that was in response to you saying "... the
CPU is still in 16-bit mode. Right?" AIUI, you're not fully in PM simply
by setting CR0.PE. The far jump also is required to activate PM for the
executing code segment.

>>> 6. Now, where it gets interesting is that the offset field of the EA
>>> jump instruction seems to be an offset not from the jump instruction
>>> but
>>> from the start of the segment. Is that correct?
>>
>> Seems so ...
>>
>> If you use the offset generated by the assembler label, it's likely to
>> be wrong since your PM base address for the code selector is likely to
>> be different from the address for the RM code segment in CS. E.g., RM
>> CS might be 07C0h while PM CS base address might be zero. So, the
>> assembler offset is relative to 07C0h, while the instruction offset
>> would need to be relative to zero.
>>
>> This is probably why I've found using far returns and indirect far
>> jumps to be easier. You can compute the offset at runtime to match
>> your code's actual location relative to the base address for the
>> descriptor.
>
> Yes, as long as you use the 32-bit version of the indirect jump - i.e.
> with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
> allows the dword keyword for that.)

Why would you need the 32-bit version when the code is to be loaded
below 1MB? ...

>>> To make an example, say that the code that will enable Pmode is
>>> located in memory so that the jump target is at physical address
>>> 0x12345. If the GDT entry for privileged code has been set to describe
>>> all of memory, i.e. from address 0 to address ff....fff, then it will
>>> be impossible to use the 16-bit form of the EA jump instruction.
>>> Correct?
>>>
>>> Solutions?
>>>
>>
>> 1) use a far return
>> 2) use an indirect far call
>> 3) set the PM descriptor base address for CS to match the RM segment
>> for CS, but only temporarily. After updating the base address to
>> its final value, you may need to reload the selectors.
>
> I like the idea of the indirect far call or, at least, an indirect far
> jump because it makes the values easy to calculate.

...

> I am not so sure about the far return. It may work in some or all
> cases but doesn't seem to be guaranteed to work.

Why do you say it doesn't seem to be guaranteed to work? ...
I don't recall ever seeing anything in the manuals to indicate that.
Shouldn't any control transfer which sets CS to a PM selector work?

> Your (3) sounds like one I suggested. In fact the whole GDT can be
> temporary. In my case I would rather allocate space for the real GDT
> once memory management is working so a temporary GDT is good just to get
> Pmode working.

I took your comments to mean not re-using the same descriptor,
whereas mine was to re-use the same descriptor.

James Harris

Apr 28, 2015, 4:08:56 PM

"Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message

news:op.xxrv7rmmwa0t4m@localhost...

I didn't mean to deviate but I can see I wasn't explicit. I meant that,
perhaps, for the 386 and 486 you really are in Pmode once the prefetch
queue has drained. I know that the segment registers will not have been
loaded with Pmode selectors but the early manuals seem to say that you
are nonetheless in Pmode. For example, I expect that data accesses will
be 32-bit by default at that point.

Why 1MB Rod? Wouldn't you need the 32-bit version above 64k?
Specifically, I think it would be needed any time the jump target was
more than 64k beyond the new CS.

Notably, even if the calculated address is above 64k the assembler may
*silently* truncate the jump offset to 16 bits so it's something to be
aware of. Some innocuous change elsewhere in the code could make the
code bigger and thereby push the jump target over the threshold and
cause the code to fail. So it may be a good idea to code the dword
version anyway. That would make the code more resilient against
changes.

>>>> To make an example, say that the code that will enable Pmode is
>>>> located in memory so that the jump target is at physical address
>>>> 0x12345. If the GDT entry for privileged code has been set to
>>>> describe
>>>> all of memory, i.e. from address 0 to address ff....fff, then it
>>>> will
>>>> be impossible to use the 16-bit form of the EA jump instruction.
>>>> Correct?
>>>>
>>>> Solutions?
>>>>
>>>
>>> 1) use a far return
>>> 2) use an indirect far call
>>> 3) set the PM descriptor base address for CS to match the RM segment
>>> for CS, but only temporarily. After updating the base address to
>>> its final value, you may need to reload the selectors.
>>
>> I like the idea of the indirect far call or, at least, an indirect
>> far
>> jump because it makes the values easy to calculate.
>
> ...
>
>> I am not so sure about the far return. It may work in some or all
>> cases but doesn't seem to be guaranteed to work.
>
> Why do you say it doesn't seem to be guaranteed to work? ...
> I don't recall ever seeing anything in the manuals to indicate that.
> Shouldn't any control transfer which sets CS to a PM selector work?

Because Intel say that, for backwards and forwards compatibility, we
should do X, and X includes a far jump or a far call but not a far
return.

Of course, Intel is not the only manufacturer. At one time there were
many. Hopefully they work with Intel's suggested initialisation
sequence.

>> Your (3) sounds like one I suggested. In fact the whole GDT can be
>> temporary. In my case I would rather allocate space for the real GDT
>> once memory management is working so a temporary GDT is good just to
>> get
>> Pmode working.
>
> I took your comments to mean not re-using the same descriptor,
> whereas mine was to re-use the same descriptor.

OK.

James

Rod Pemberton

Apr 28, 2015, 8:05:51 PM

On Mon, 27 Apr 2015 00:13:36 -0400, Rod Pemberton <bo...@lllljunkqwer.cpm>
wrote:

>> To make an example, say that the code that will enable Pmode is located
>> in memory so that the jump target is at physical address 0x12345. If the
>> GDT entry for privileged code has been set to describe all of memory,
>> i.e. from address 0 to address ff....fff, then it will be impossible to
>> use the 16-bit form of the EA jump instruction. Correct?
>>
>> Solutions?
>>
>
> 1) use a far return
> 2) use an indirect far call
> 3) set the PM descriptor base address for CS to match the RM segment
> for CS, but only temporarily. After updating the base address to
> its final value, you may need to reload the selectors.
>

Of course, the easiest solution is a PM descriptor base address of zero,
and matching RM CS equal to zero. But, CS == 0 brings up the entire
CS=0,IP=7C00h versus CS=07C0h,IP=0 debate, again, which we've gone
over countless times now.

James Harris

Apr 29, 2015, 4:44:24 AM

"Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message

news:op.xxtukfnbwa0t4m@localhost...

> On Mon, 27 Apr 2015 00:13:36 -0400, Rod Pemberton
> <bo...@lllljunkqwer.cpm>
> wrote:
>
>>> To make an example, say that the code that will enable Pmode is
>>> located
>>> in memory so that the jump target is at physical address 0x12345. If
>>> the
>>> GDT entry for privileged code has been set to describe all of
>>> memory,
>>> i.e. from address 0 to address ff....fff, then it will be impossible
>>> to
>>> use the 16-bit form of the EA jump instruction. Correct?
>>>
>>> Solutions?
>>>
>>
>> 1) use a far return
>> 2) use an indirect far call
>> 3) set the PM descriptor base address for CS to match the RM segment
>> for CS, but only temporarily. After updating the base address to
>> its final value, you may need to reload the selectors.
>>
>
> Of course, the easiest solution is a PM descriptor base address of
> zero,
> and matching RM CS equal to zero.

You would still need the "dword" or 0x66 variant to get to 0x12345,
wouldn't you? I've come to think that having the dword variant in at all
times is probably a good idea because it is more robust, albeit that it
is three bytes longer.

Put another way, if the dword variant was the default then when omitted
someone could ask "why did you omit dword there?" and get the
answer "because we are guaranteed that the branch target will be within
64k of the Pmode CS" or similar. Having it omitted by default
(especially in sample code) leads to it not being considered and
therefore code which is either broken or fragile.

I may also use a temporary GDT entry (or a temporary GDT) which has a
privileged CS descriptor set to match the RM CS=0 point. That makes the
transition code simpler.
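As a rough sketch of the temporary-descriptor idea, the Python below packs an 8-byte descriptor whose base equals the real-mode segment base, so that the 16-bit jump offset is just the real-mode IP of the target. The helper names are hypothetical, and the access byte 0x9A (present, ring 0, readable code) is an assumption about what such an entry would normally contain.

```python
import struct

def make_descriptor(base, limit, access, flags):
    """Pack an 8-byte segment descriptor (20-bit limit, 4-bit flags nibble)."""
    if not (0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF):
        raise ValueError("base or limit out of range")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                        # limit bits 0..15
        base & 0xFFFF,                         # base bits 0..15
        (base >> 16) & 0xFF,                   # base bits 16..23
        access & 0xFF,                         # present, DPL, type
        ((flags & 0xF) << 4) | (limit >> 16),  # G, D/B, L, AVL + limit 16..19
        (base >> 24) & 0xFF,                   # base bits 24..31
    )

def temp_code_descriptor(rm_cs):
    """16-bit code descriptor (flags 0) based at the real-mode segment rm_cs."""
    return make_descriptor(base=rm_cs << 4, limit=0xFFFF, access=0x9A, flags=0x0)

# Example from the thread: target at physical 0x12345, segment based at 0x10000.
rm_cs = 0x1000
descriptor = temp_code_descriptor(rm_cs)
jump_offset = 0x12345 - (rm_cs << 4)
print(descriptor.hex(" "), hex(jump_offset))
```

With rm_cs = 0 the same helper gives a descriptor based at zero, which is the "RM CS=0" case mentioned above.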

Another good option is the indirect jump you mentioned. That, too,
allows the target address to be calculated.

At the end of the day this has been an interesting thread for me because
I have realised that it can be too easy simply not to think about what
that jump does. Wolfgang had sussed it out but it is something I had
never thought about in enough detail because sample code just makes it
look like a simple jump - often a jump to the next instruction. Easy
right? No! As we have discussed, it is only that simple under certain
conditions, not always.

Another interesting finding was that the immediate jump naturally has
16-bit operands, not 32-bit ones. So in assembly source it's right to
place the "bits 32" directive *after* the jump and not before it.

> But, CS == 0 brings up the entire
> CS=0,IP=7C00h versus CS=07C0h,IP=0 debate, again, which we've gone
> over countless times now.

AIUI that is (or was) a different issue. Although the boot sector code
(if any) will be loaded at 0x7c00 there is no guarantee that it will
move itself out of the way and load any following code there. So the
subsequent code may be somewhere completely different.

Even with CS=0 a jump to the suggested jump-target address of 0x12345
would need the dword variant wouldn't it, because of the target being
beyond the 64k point?

James

James Harris

Jun 12, 2021, 3:06:12 PM

On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:
> You know that, per Intel's directions, after setting the CR0 PE flag
> with
>
> mov eax, cr0
> or al, 1
> mov cr0, eax
>
> we are expected to have something like
>
> jmp seg:pmode_running
>
> I had taken that jump instruction for granted but the recent Qemu/GDT
> thread has brought up some issues about the jump, as follows.
>
> 1. The jump appears to be necessary in order to put the correct pmode
> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
> and also to set the low bits of CS so that they contain the CPL and TI,
> all of which should be zero.

Going back to this old thread as I have some more information.

The Sybex book Programming the 80386 says on the back that its authors are one of the 80386's logic designers and the 80386's (and thus x86-32's) chief architect, so they ought to know a thing or two about the design! On page 605 the book says that after setting the PE bit the processor enters "16-bit protected mode" - which I didn't know even existed (but see below).

The processor apparently stays in that mode (Protected Mode but 16-bit) for as long as we want, and what changes it to 32-bit PMode is an instruction which loads CS with the selector for a 32-bit descriptor.

Correspondingly, it would appear to be possible to change /back/ to 16-bit PMode by loading CS with the selector of a 16-bit descriptor.

As for what determines whether a descriptor is 32-bit or 16-bit remember that apart from Base and Limit a descriptor has one byte and one nibble for other fields. I thought the difference might be in the descriptor type code (which is in the byte) but it's actually in the B bit which is in the nibble. See

https://en.wikipedia.org/wiki/Segment_descriptor#Structure

Basically, B=0 means 16-bit and B=1 means 32-bit.

I knew about the bit, of course, but never really understood it; and I didn't realise that B=0 was the mode the processor executed in after enabling PMode prior to reloading CS.
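For anyone who wants to check a descriptor by hand, this small Python sketch reads that bit out of the raw 8 bytes. It assumes the usual layout in which the flag (called D for code segments and B for data and stack segments) is bit 6 of byte 6; the two sample descriptors are illustrative values, not taken from any particular GDT.

```python
def default_size_bits(descriptor):
    """Return 16 or 32 according to the D/B flag (bit 6 of byte 6)."""
    if len(descriptor) != 8:
        raise ValueError("a segment descriptor is 8 bytes")
    return 32 if descriptor[6] & 0x40 else 16

# Illustrative descriptors: flat 32-bit code, and a 64k 16-bit code segment.
flat_code32 = bytes.fromhex("ffff0000009acf00")
code16 = bytes.fromhex("ffff0000009a0000")
print(default_size_bits(flat_code32), default_size_bits(code16))  # 32 16
```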

>
> 2. On the 386 any kind of jump was needed immediately following the MOV
> to CR0 - even a near jump - in order to flush the prefetch queue. On
> Pentium Pro and later (and maybe even on the Pentium 1) there is no need
> to flush the queue but the far jump keeps things compatible as it will
> flush the prefetch queue on early CPUs as well as load the Pmode CS on
> all of them.

The different stages of enabling PMode are now possibly easier to guess at.

1. After enabling CR0.PE

The processor will be in 16-bit PM but early processors (including 386 and 486) did not automatically flush the prefetch queue so they could have already decoded some of the following bytes as RM. I would guess that many such decodings would be different but that some could be the same. If that's right then some instructions could be validly executed here even without flushing the prefetch queue. There's no value in doing so but ISTM informative to see exactly what is likely to change and when.

2. After flushing the prefetch queue

This will apparently be true 16-bit PM with a 64k limit on code addresses - and probably the same for data addresses.

From this point it turns out that contrary to normal practice one could load selectors for 32-bit /data/ descriptors while still running the code in 16-bit mode. I say that because the code in

https://archive.org/stream/bitsavers_intel80386ammersReferenceManual1986_27457025/230985-001_80386_Programmers_Reference_Manual_1986_djvu.txt

includes the following:

LGDT tGDT__pword

; switch to protected mode
MOV EAX,CR0
MOV EAX,1
MOV CR0,EAX

; clear prefetch queue
JMP SHORT flush
flush:

; set DS,ES,SS to address flat linear space (0 ... 4GB)
MOV BX,FLAT_DES-Temp_GDT
MOV DS,BX
MOV ES,BX
MOV SS,BX

Note the data selectors being loaded before the code selector (which the code changes much later) - and the botched update of CR0 which appeared in many Intel sources of the time.
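The "botched update" is that MOV EAX,1 throws away the value just read from CR0 instead of ORing the PE bit into it. A tiny Python model of the two variants shows the difference; the starting CR0 value is a made-up example, not a documented reset value.

```python
CR0_PE = 1 << 0  # protection enable

def set_pe_with_or(cr0):
    """mov eax, cr0 / or eax, 1 / mov cr0, eax: keeps every other bit."""
    return cr0 | CR0_PE

def set_pe_with_mov(cr0):
    """mov eax, cr0 / mov eax, 1 / mov cr0, eax: the value read is discarded."""
    return CR0_PE

example_cr0 = 0x00000010  # hypothetical starting value with only bit 4 set
print(hex(set_pe_with_or(example_cr0)), hex(set_pe_with_mov(example_cr0)))  # 0x11 0x1
```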

FWIW, the code also goes on to do a bunch of other stuff such as

; initialize stack pointer to some (arbitrary) RAM location
MOV ESP, OFFSET end_Temp_GDT

; copy eprom GDT to RAM
MOV ESI,DWORD PTR GDT_eprom+2 ; get base of eprom GDT
MOV EDI,GDTbase ; point ES:EDI to GDT base in RAM
MOV CX,WORD PTR gdt_eprom+0 ; limit of eprom GDT
INC CX
SHR CX,1 ; easier to move words
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

;copy eprom IDT to RAM
MOV ESI, DWORD PTR IDT_eprom +2 ; get base of eprom IDT
MOV EDI,IDTbase
MOV CX,WORD PTR idt_eprom +0
INC CX
SHR CX,1
CLD
REP MOVS WORD PTR ES : [EDI] , WORD PTR DS:[ESI]

etc, all before setting CS to a 32-bit descriptor.

3. After loading CS (via a far jump or far call or by a TSS switch) to refer to a 32-bit descriptor.

The code will finally be in PM32.

>
> I knew the above but the following points are of particular interest
> just now as I had not considered them before - or if I had then I had
> forgotten the subtleties of the problem.

>
> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
> encoded in hex as
>
> EA oo oo ss ss (16-bit form)
> EA oo oo oo oo ss ss (32-bit form)
>
> where the Ss are the selector and the Os are the offset as hex bytes.

The above info implies that it's the /short/ version of the jump which is required.

...

> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
> still in 16-bit mode. Right?

That turns out to be right.

--
James Harris

James Harris

Jun 12, 2021, 3:23:37 PM

On 12/06/2021 20:06, James Harris wrote:
> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:
>> You know that, per Intel's directions, after setting the CR0 PE flag
>> with
>>
>> mov eax, cr0
>> or al, 1
>> mov cr0, eax
>>
>> we are expected to have something like
>>
>> jmp seg:pmode_running
>>
>> I had taken that jump instruction for granted but the recent Qemu/GDT
>> thread has brought up some issues about the jump, as follows.

...

>>
>> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
>> encoded in hex as
>>
>> EA oo oo ss ss (16-bit form)
>> EA oo oo oo oo ss ss (32-bit form)
>>
>> where the Ss are the selector and the Os are the offset as hex bytes.
>
> The above info implies that it's the /short/ version of the jump which is required.

I found some code from AMD which confirms it. Note the two dws in the
following code fragment.

mov eax, 000000011h
mov cr0, eax

;Execute a far jump to turn protected mode on.
;code16_sel must point to the previously-established 16-bit
;code descriptor located in the GDT (for the code currently
;being executed).
db 0eah ;Far jump...
dw offset now_in_prot ; to offset...
dw code16_sel ; in current code segment

That's from the AMD64 Architecture Programmer’s Manual Volume 2: System
Programming, publication number 24593 revision 3.11 December 2005.

That's a good, clear confirmation.

BTW, it's unusual to see CR0's bit 4 (ET) set at the same time....
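Decoding the value 0x11 bit by bit shows which flags the AMD fragment writes. The Python sketch below uses the standard CR0 flag names; the table and function name are only for illustration.

```python
# Bit position -> flag name for the commonly documented CR0 flags.
CR0_FLAGS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}

def cr0_flag_names(value):
    """List the names of the flags set in a CR0 value, lowest bit first."""
    return [name for bit, name in sorted(CR0_FLAGS.items()) if (value >> bit) & 1]

amd_value = 0x00000011
print(cr0_flag_names(amd_value))  # ['PE', 'ET']
```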

--
James Harris

Rod Pemberton

Jun 12, 2021, 6:29:58 PM

On Sat, 12 Jun 2021 12:06:11 -0700 (PDT)
James Harris <james.h...@gmail.com> wrote:

> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:

> The Sybex book Programming the 80386 says on the back that its
> authors are one of the 80386's logic designers and the 80386's (and
> thus x86-32's) chief architect, so they ought to know a thing or two
> about the design! On page 605 the book says that after setting the PE
> bit the processor enters "16-bit protected mode" - which I didn't
> know even existed (but see below).
>

I didn't fully re-read the 2015 thread, but I think my perspective
hasn't changed any. I.e., after enabling CR0.PE, it's the re-loading of
the CS selector which causes the processor to switch into protected
mode, be it 16-bit PM or 32-bit PM. Without the jump to re-load CS, CS
is still set to a 16-bit RM segment. I.e., the processor is still
executing 16-bit RM code after CR0.PE is set, but prior to a far jump
re-loading CS with a valid selector for PM. In other words, simply
enabling CR0.PE doesn't activate PM, the code is still 16-bit RM code,
but perhaps some of the PM features have been enabled by setting CR0.PE.

When CR0.PE is enabled, CS isn't a selector but a segment, and so CS
doesn't point to a descriptor. No descriptor means no CS.ar.D bit
which determines a 16-bit PM or 32-bit PM code segment. CS won't be
set to a selector with a valid descriptor with a CS.ar.D bit, until a
far jump re-loads CS, but only after CR0.PE is enabled.
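The segment-versus-selector distinction can be illustrated by splitting a 16-bit value into the fields a protected-mode load would see. This is only a sketch in Python with hypothetical names; 0x18 and 0x07C0 are example values.

```python
def decode_selector(selector):
    """Split a 16-bit selector into descriptor index, table indicator and RPL."""
    return {
        "index": selector >> 3,      # upper 13 bits: descriptor number
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # requested privilege level
    }

def descriptor_byte_offset(selector):
    """Byte offset of the selected 8-byte entry within its table."""
    return (selector >> 3) * 8

print(decode_selector(0x0018))  # {'index': 3, 'ti': 0, 'rpl': 0}
# A left-over real-mode segment value, if treated as a selector:
print(decode_selector(0x07C0))  # {'index': 248, 'ti': 0, 'rpl': 0}
```

The second call shows why a stale real-mode value in a segment register is unlikely to name a sensible GDT entry if it is ever reloaded as a selector.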

> The different stages of enabling PMode are now possibly easier to
> guess at.
>
> 1. After enabling CR0.PE
>
> The processor will be in 16-bit PM but early processors (including
> 386 and 486) did not automatically flush the prefetch queue so they
> could have already decoded some of the following bytes as RM. I would
> guess that many such decodings would be different but that some could
> be the same. If that's right then some instructions could be validly
> executed here even without flushing the prefetch queue. There's no
> value in doing so but ISTM informative to see exactly what is likely
> to change and when.

I'm still of the mind that after CR0.PE is set, a valid 16-bit PM
selector must be loaded into CS via a far jump to enable 16-bit PM or
32-bit PM. Otherwise, CS still contains a 16-bit RM segment, and the
processor is still executing 16-bit RM code, perhaps with some PM
features enabled due to CR0.PE being set.

--
What is hidden in the ground, when found, is hidden there again?

wolfgang kern

Jun 13, 2021, 9:59:02 AM

On 12.06.2021 21:06, James Harris wrote:
> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:
>> You know that, per Intel's directions, after setting the CR0 PE flag
>> with
>>
>> mov eax, cr0
>> or al, 1
>> mov cr0, eax
>>
>> we are expected to have something like
>>
>> jmp seg:pmode_running
>>
>> I had taken that jump instruction for granted but the recent Qemu/GDT
>> thread has brought up some issues about the jump, as follows.
>>
>> 1. The jump appears to be necessary in order to put the correct pmode
>> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
>> and also to set the low bits of CS so that they contain the CPL and TI,
>> all of which should be zero.
>
>
> Going back to this old thread as I have some more information.
>
> The Sybex book Programming the 80386 says on the back that its authors are one of the 80386's logic designers and the 80386's (and thus x86-32's) chief architect, so they ought to know a thing or two about the design! On page 605 the book says that after setting the PE bit the processor enters "16-bit protected mode" - which I didn't know even existed (but see below).

Old docs (and some newer ones as well) are often written with doubtful wording.
I remember my early attempts to enter PM (on a 486, though): it doesn't work that
way. PM16 or PM32 starts exactly at the point where CS is altered.

> The processor apparently stays in that mode (Protected Mode but 16-bit) for as long as we want, and what changes it to 32-bit PMode is an instruction which loads CS with the selector for a 32-bit descriptor.

PM16 and PM32 differ by just one bit in the code segment descriptor.

> Correspondingly, it would appear to be possible to change /back/ to 16-bit PMode by loading CS with the selector of a 16-bit descriptor.

Yes, my mode switch from PM32 to RM16 needs one step to PM16 in between
but the switches from RM16 to PM32 or LM don't need a PM16 step.

...

> https://en.wikipedia.org/wiki/Segment_descriptor#Structure
> Basically, B=0 means 16-bit and B=1 means 32-bit.
> I knew about the bit, of course, but never really understood it; and I didn't realise that B=0 was the mode the processor executed in after enabling PMode prior to reloading CS.

yeah, it wasn't too well documented :)

> The different stages of enabling PMode are now possibly easier to guess at.

> 1. After enabling CR0.PE

> The processor will be in 16-bit PM

NO!

> but early processors (including 386 and 486) did not automatically flush the prefetch queue so they could have already decoded some of the following bytes as RM. I would guess that many such decodings would be different but that some could be the same. If that's right then some instructions could be validly executed here even without flushing the prefetch queue. There's no value in doing so but ISTM informative to see exactly what is likely to change and when.
>
> 2. After flushing the prefetch queue
>
> This will apparently be true 16-bit PM with a 64k limit on code addresses - and probably the same for data addresses.
>
> From this point it turns out that contrary to normal practice one could load selectors for 32-bit /data/ descriptors while still running the code in 16-bit mode. I say that because the code in
>
> https://archive.org/stream/bitsavers_intel80386ammersReferenceManual1986_27457025/230985-001_80386_Programmers_Reference_Manual_1986_djvu.txt

it's a bit dated :)

> includes the following:

> LGDT tGDT__pword

missing: CLI
and the GDT should already contain valid entries here.

> ; switch to protected mode

correction: prepare to switch

> MOV EAX,CR0
> MOV EAX,1
> MOV CR0,EAX

redundant:
;> ; clear prefetch queue

;> JMP SHORT flush
;> flush:

> ; set DS,ES,SS to address flat linear space (0 ... 4GB)
> MOV BX,FLAT_DES-Temp_GDT
> MOV DS,BX
> MOV ES,BX
> MOV SS,BX

> Note the data selectors being loaded before the code selector (which the code changes much later) - and the botched update of CR0 which appeared in many Intel sources of the time.

It doesn't matter where data/stack selectors are initialized as long as
they aren't used. I set up DS and SS:ESP before, and ES, FS, GS after, the far jump.

> FWIW, the code also goes on to do a bunch of other stuff such as

> ; initialize stack pointer to some (arbitrary) RAM location
> MOV ESP, OFFSET end_Temp_GDT

wherever you want it to be :)

> ; copy eprom GDT to RAM
> MOV ESI, DWORD PTR GDT_eprom +2 ; get base of eprom GDT
> MOV EDI,GDTbase
> MOV CX,WORD PTR gdt_eprom +0
> INC CX
> SHR CX,1
> CLD
> REP MOVS WORD PTR ES : [EDI] , WORD PTR DS:[ESI]
>
> ; point ES:EDI to GDT base in RAM.
> ; limit of eprom GDT
> ; easier to move words
>
> ;copy eprom IDT to RAM
> MOV ESI, DWORD PTR IDT_eprom +2 ; get base of eprom IDT
> MOV EDI,IDTbase
> MOV CX,WORD PTR idt_eprom +0
> INC CX
> SHR CX,1
> CLD
> REP MOVS WORD PTR ES : [EDI] , WORD PTR DS:[ESI]

this ROM GDT may point to an already wiped RAM area !!!
modern (already old now) BIOS use only temporary PM32 and LM.

> etc, all before setting CS to a 32-bit descriptor.

> 3. After loading CS (via a far jump or far call or by a TSS switch) to refer to a 32-bit descriptor.

I wouldn't use a far CALL or (even worse) a task switch here.
If you're brave you could use what I do for mode switches, but only after
the initial PM and stack setup:

PUSHFD ;push EFLAGS; needs 66 if within 16 bit code
CLI
PUSH new_selector ;I use immediate constants here
PUSH new_offset ;
IRET

> The code will finally be in PM32.

could be in PM16 as well.

>> I knew the above but the following points are of particular interest
>> just now as I had not considered them before - or if I had then I had
>> forgotten the subtleties of the problem.

OK, nothing new on the matter to me.

>> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
>> encoded in hex as

>> EA oo oo ss ss (16-bit form)
>> EA oo oo oo oo ss ss (32-bit form)

>> where the Ss are the selector and the Os are the offset as hex bytes.
> The above info implies that it's the /short/ version of the jump which is required.

yes of course because this jump is still RM16 [NOT PM16] code.

>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>> still in 16-bit mode. Right?
> That turns out to be right.

Yes still in 16 bit Realmode, until CS is loaded.

>> 6. Now, where it gets interesting is that the offset field of the EA
>> jump instruction seems to be an offset not from the jump instruction but
>> from the start if the segment. Is that correct? If so then we have to be
>> careful which jump form is encoded, as follows.

>> If the executing code is in the low 64k of a descriptor's space then we
>> can encode the simple

>> EA oo oo ss ss
>>
>> because the offset can fit in 16 bits. But if the executing code is
>> above the 64k mark relative to the start of the segment then we need to
>> encode the 32-bit form for 16-bit mode, i.e.
>>
>> 66 EA oo oo oo oo ss ss

only if there is already code loaded (guess how to do before PM..).

>> Solution 1. Set up a temporary GDT entry to point to the place in memory
>> where the code is running. In the above case, the GDT entry could point
>> at 0x10000 and then the jump offset would be 0x2345, leading to the jump
>> instruction being encoded as

>> EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

0000:0600 66 EA 45 23 10 00 18 00 jmp far 0018:102345 ;assume FLAT CS
FFFF:2355 ;aka 0010:2345 PM32 code
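As a quick check of the addresses in that listing, a real-mode segment:offset pair maps to a linear address as segment * 16 + offset. The Python sketch below (hypothetical function name) also models the wrap at 1 MB when the A20 line is disabled, which matters for any target above 0xFFFFF.

```python
def real_mode_linear(segment, offset, a20_enabled=True):
    """Linear address of a real-mode segment:offset pair."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    # With A20 masked, address bit 20 is forced to zero (wrap at 1 MB).
    return linear if a20_enabled else linear & 0xFFFFF

print(hex(real_mode_linear(0xFFFF, 0x2355)))                     # 0x102345
print(hex(real_mode_linear(0xFFFF, 0x2355, a20_enabled=False)))  # 0x2345
print(hex(real_mode_linear(0x0000, 0x0600)))                     # 0x600
```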

James Harris

Feb 10, 2022, 9:47:55 AM

On 12/06/2021 20:23, James Harris wrote:
> On 12/06/2021 20:06, James Harris wrote:
>> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:

>>> You know that, per Intel's directions, after setting the CR0 PE flag
>>> with
>>>
>>> mov eax, cr0
>>> or al, 1
>>> mov cr0, eax
>>>
>>> we are expected to have something like
>>>
>>> jmp seg:pmode_running
>>>
>>> I had taken that jump instruction for granted but the recent Qemu/GDT
>>> thread has brought up some issues about the jump, as follows.
>
> ...
>
>>>
>>> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
>>> encoded in hex as
>>>
>>> EA oo oo ss ss (16-bit form)
>>> EA oo oo oo oo ss ss (32-bit form)
>>>
>>> where the Ss are the selector and the Os are the offset as hex bytes.
>>
>> The above info implies that it's the /short/ version of the jump which
>> is required.
>
> I found some code from AMD which confirms it.

A further point about the machine's state just after setting CR0 bit 0
and some queries which can be taken as rhetorical as they are somewhat
academic but may be of note nonetheless if you are interested in the detail.

It boils down to how instructions are decoded after setting CR0.PE.

From memory, it is advised to put a jump immediately after enabling
Pmode so as to flush the instruction pipeline on early processors 386,
486, and possibly Pentium before Pentium Pro. But is the pipeline
flushed before or after the jump is executed? IOW, is the jump itself
decoded in real mode or in PMode16 (the state the processor is put in by
the load of CR0)?

Perhaps all jump instructions assemble to the same bytes so it wouldn't
matter. Certainly a jump to the next instruction will be coded as EB00
in either 16-bit or 32-bit mode.
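The reason EB 00 is the same everywhere is that the short jump carries only an 8-bit displacement measured from the end of the instruction, with no operand-size-dependent field. A small Python sketch with hypothetical names and example addresses:

```python
def encode_short_jmp(instruction_address, target_address):
    """Encode JMP rel8 (opcode EB); rel8 is relative to the next instruction."""
    displacement = target_address - (instruction_address + 2)
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, displacement & 0xFF])

# Jump to the very next instruction (what "jmp $ + 2" does in NASM syntax).
print(encode_short_jmp(0x7C10, 0x7C12).hex(" "))  # eb 00
# Jump to itself, for comparison.
print(encode_short_jmp(0x7C10, 0x7C10).hex(" "))  # eb fe
```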

In fact, as the assembler (Nasm, in this case but the point is
presumably generally applicable) recognises only "bits 16" or "bits 32"
and not Rmode and Pmode perhaps instructions in PM16 decode exactly as
they do in Real mode.

But then if the encodings are the same then why would flushing the
instruction queue be necessary?

To be clear, where I believe a "bits 32" directive sits is /after/ the
far jump as in

mov cr0, eax ;Set bit 0
... (potentially a large number of instructions)
jmp seg:offset
bits 32
offset:

It seems a bit of a conundrum and leads to the obvious question: exactly
what differences are there between instruction decoding in real mode and
in PM16 (the mode immediately after setting CR0 bit 0)?

As I say, this is all largely academic but if you happen to know the
answer without doing any research do say as the details look interesting.

--
James Harris

anti...@math.uni.wroc.pl

Feb 10, 2022, 1:34:51 PM

Your question is confused. Instructions are decoded in parallel with
execution of previous instructions. This may happen a few clocks
before execution. So _normally_ the jump is decoded before CR0 bit 0
is set, so it is decoded in real mode. But I can imagine a rather obscure
situation in which the jump is decoded after CR0 bit 0 is set, so in
protected mode.

> Perhaps all jump instructions assemble to the same bytes so it wouldn't
> matter. Certainly a jump to the next instruction will be coded as EB00
> in either 16-bit or 32-bit mode.

That is the recommended instruction, safe under all circumstances.

> In fact, as the assembler (Nasm, in this case but the point is
> presumably generally applicable) recognises only "bits 16" or "bits 32"
> and not Rmode and Pmode perhaps instructions in PM16 decode exactly as
> they do in Real mode.
>
> But then if the encodings are the same then why would flushing the
> instruction queue be necessary?

Correspondence between mnemonics and instruction bits is the same,
so no need for an extra assembler directive. But the processor treats
instruction bits differently depending on mode.

> To be clear, where I believe a "bits 32" directive sits is /after/ the
> far jump as in
>
> mov cr0, eax ;Set bit 0
> ... (potentially a large number of instructions)
> jmp seg:offset
> bits 32
> offset:
>
> It seems a bit of a conundrum and leads to the obvious question: exactly
> what differences are there between instruction decoding in real mode and
> in PM16 (the mode immediately after setting CR0 bit 0)?

Note: the "canonical" sequence has _two_ jumps: one short jump to
flush the decode queue and one long jump to load CS. Namely, setting
CR0 bit 0 gives you 16-bit protected mode. You switch to 32-bit
mode by loading a 32-bit segment descriptor.

IIRC the simpler sequence failed (on an actual 386). But I did my
tests more than 25 years ago so I am not 100% confident in my
memory. So, if you want to be sure or want to know what happens
beyond simple fail/works, then get a real 386 and run enough tests...
OTOH I would not expect detailed explanations: during the switch the
processor is in a strange transitory mode so can exhibit
weird behaviour. Intel documented how to avoid pitfalls,
but they have no interest in providing a deeper explanation.

--
Waldek Hebisch

wolfgang kern

Feb 10, 2022, 5:33:06 PM

On 10/02/2022 15:47, James Harris wrote:
...

> It seems a bit of a conundrum and leads to the obvious question: exactly
> what differences are there between instruction decoding in real mode and
> in PM16 (the mode immediately after setting CR0 bit 0)?

> As I say, this is all largely academic but if you happen to know the
> answer without doing any research do say as the details look interesting.

1. this EB 00 after the write to CR0 was never required, at least not by me.
2. setting PE does nothing on its own; the CPU remains in real mode until
the far jump, which changes the interpretation from segment to descriptor,
and it's 16:16 code without a prefix.

my RM->PM switches look like:
MOV eax,CR0
OR eax,1
MOV CR0,eax
push 0x20 ;prepared selectors
pop ds
push 0x20 ;20=flat data
pop es
push 0x10 ;10=restricted stack
pop ss
mov esp,0000xxxx
jmp 0018:PM16 or jmp 66 0028:PM32 or even jmp 66 0038:LM64

PM16:
...
PM32:
...
__
wolfgang

James Harris

Feb 11, 2022, 6:41:27 AM

On 10/02/2022 18:34, anti...@math.uni.wroc.pl wrote:
> James Harris <james.h...@gmail.com> wrote:

...

>> From memory, it is advised to put a jump immediately after enabling
>> Pmode so as to flush the instruction pipeline on early processors 386,
>> 486, and possibly Pentium before Pentium Pro. But is the pipeline
>> flushed before or after the jump is executed? IOW, is the jump itself
>> decoded in real mode or in PMode16 (the state the processor is put in by
>> the load of CR0)?
>
> Your question is confused. Instructions are decoded in parallel with
> execution of previous instructions. This may happen a few clocks
> before execution. So _normally_ the jump is decoded before CR0 bit 0
> is set, so it is decoded in real mode. But I can imagine a rather obscure
> situation in which the jump is decoded after CR0 bit 0 is set, so in
> protected mode.

The point is that the encoding of relative jumps appears to be the same
in Real Mode and 16-bit Protected Mode so why does it matter which mode
they are decoded in?

>
>> Perhaps all jump instructions assemble to the same bytes so it wouldn't
>> matter. Certainly a jump to the next instruction will be coded as EB00
>> in either 16-bit or 32-bit mode.
>
> That is the recommended instruction, safe under all circumstances.

So, it seems, are other jumps. And possibly most instructions.

Any idea which instructions would be coded differently in RM and PM16?

AFAICS almost none. I am beginning to suspect it's only direct memory
accesses (not relative ones such as relative jmp and call).

>
>> In fact, as the assembler (Nasm, in this case but the point is
>> presumably generally applicable) recognises only "bits 16" or "bits 32"
>> and not Rmode and Pmode perhaps instructions in PM16 decode exactly as
>> they do in Real mode.
>>
>> But then if the encodings are the same then why would flushing the
>> instruction queue be necessary?
>
> Correspondence between mnemonics and instruction bits is the same,
> so no need for an extra assembler directive. But the processor treats
> instruction bits differently depending on mode.

Can you think of an example other than direct memory accesses?

>
>> To be clear, where I believe a "bits 32" directive sits is /after/ the
>> far jump as in
>>
>> mov cr0, eax ;Set bit 0
>> ... (potentially a large number of instructions)
>> jmp seg:offset
>> bits 32
>> offset:
>>
>> It seems a bit of a conundrum and leads to the obvious question: exactly
>> what differences are there between instruction decoding in real mode and
>> in PM16 (the mode immediately after setting CR0 bit 0)?
>
> Note: the "canonical" sequence has _two_ jumps: one short jump to
> flush the decode queue and one long jump to load CS. Namely, setting
> CR0 bit 0 gives you 16-bit protected mode. You switch to 32-bit
> mode by loading a 32-bit segment descriptor.

Yes, and one can apparently mix the two modes - some segments in PM16
and some in PM32.

>
> IIRC the simpler sequence failed (on an actual 386). But I did my
> tests more than 25 years ago so I am not 100% confident in my
> memory. So, if you want to be sure or want to know what happens
> beyond simple fail/works, then get a real 386 and run enough tests...
> OTOH I would not expect detailed explanations: during the switch the
> processor is in a strange transitory mode so can exhibit
> weird behaviour. Intel documented how to avoid pitfalls,
> but they have no interest in providing a deeper explanation.

Indeed.

--
James Harris

James Harris

Feb 12, 2022, 6:08:44 AM

On 10/02/2022 22:33, wolfgang kern wrote:
> On 10/02/2022 15:47, James Harris wrote:

I think I may have come up with a clearer insight into what happens at
each step of enabling Pmode - and it's very little! See below for details.

> ...
>> It seems a bit of a conundrum and leads to the obvious question:
>> exactly what differences are there between instruction decoding in
>> real mode and in PM16 (the mode immediately after setting CR0 bit 0)?
>
>> As I say, this is all largely academic but if you happen to know the
>> answer without doing any research do say as the details look interesting.
>
> 1. this EB 00 after the write to CR0 was never required, at least not by me.

From what I've found recently it looks as though it would be rare for
anyone to need that jump. (Though it or something like it is still right
to include to cover the unusual cases.)

> 2. setting PE does nothing on its own; the CPU remains in real mode until
> the far jump, which changes the interpretation from segment to descriptor,
> and it's 16:16 code without a prefix.

I am not sure that's right, Wolfgang. I am beginning to think that once
PE is set the processor will be in 16-bit Protected Mode (PM16); in that
mode the encoding of instructions will be identical to RM; and the main
differences will be when loading segment registers. There may also be
some differences when /using/ segment registers but see below.

I have been looking at the 80286 manuals to find out more about PM16. It
turns out that it has the same addressing limitations as the 8086, i.e.

BP/BX + SI/DI + displacement

and apparently the same encoding.

(Ref. 80286 and 80287 programmer's manual 1987 section 2.3.3 and table 2-1)
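For reference, the 16-bit addressing forms can be written out as a small table keyed by the ModRM r/m field. The Python below is only a sketch of the effective-address arithmetic, with hypothetical names; it takes the displacement as an already sign-extended integer and ignores segment selection and defaults.

```python
# ModRM r/m field -> registers summed in 16-bit addressing.
RM16 = {
    0: ("bx", "si"), 1: ("bx", "di"), 2: ("bp", "si"), 3: ("bp", "di"),
    4: ("si",), 5: ("di",), 6: ("bp",), 7: ("bx",),
}

def effective_address_16(mod, rm, regs, disp=0):
    """Compute a 16-bit effective address from mod, r/m, registers and disp."""
    if mod == 3:
        raise ValueError("mod=3 selects a register operand, not memory")
    if mod == 0 and rm == 6:
        return disp & 0xFFFF          # special case: plain 16-bit displacement
    if mod == 0:
        disp = 0                      # no displacement bytes are present
    total = sum(regs[name] for name in RM16[rm]) + disp
    return total & 0xFFFF             # offsets wrap at 64k

regs = {"bx": 0x1000, "bp": 0x2000, "si": 0x0234, "di": 0x0010}
print(hex(effective_address_16(0, 0, regs)))           # 0x1234  [bx+si]
print(hex(effective_address_16(1, 6, regs, disp=-4)))  # 0x1ffc  [bp-4]
print(hex(effective_address_16(0, 6, regs, 0x1234)))   # 0x1234  [0x1234]
```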

If PM16 has the same encoding of instructions and of addressing modes as
RM then the question arises as to what that jmp to flush the decode
queue is for.

After setting PE there will be zero or more of the following bytes
already decoded. (The number will vary according to the processor, its
caching, and the dynamic flow of prior instructions.) Those encodings
and meanings will be the same in PM16 as they were in RM for most
instructions so the question is Where does it matter?

I can think of two potential cases:

1. /Loading/ a segment register.

When a segment register is loaded in PM16 the processor gets a memory
descriptor from a descriptor table much as it does in PM32. The
descriptor includes a base and a limit. AFAICS this is the first
relevant difference from RM. A segment register reloaded in PM16 would
not necessarily refer to the same address as it would in RM. Consider a
sequence such as

lmsw ax ;Enter protected mode
mov ds, bx

In that, if the MOV has already been decoded then segment register DS
would be loaded Rmode-style from the value of BX (base = BX << 4), but if
the MOV had not been decoded then DS would be loaded with the descriptor
which BX selects.

2. Using a segment register.

The second potential scenario is /using/ a segment register. Consider

lmsw ax ;Enter protected mode
mov ax, [val]

The [val] reference would use DS so what is the code likely to do?

It is possible that a processor would trigger an interrupt to indicate
an exception because DS was not valid for PM. It could even lock up.

But there is another possibility. The 80286 and later CPUs could have
been designed to use the internal PM-type descriptors at all times, even
when running in RM. They would essentially need to make sure that, when a
segment register is loaded in Real Mode, the base address, limit and
flags are set appropriately:

base = seg << 4
limit = 0xffff
flags = don't check anything

The hardware engineers could have taken either approach but I suspect
the latter. That's for three reasons. First, Pmode operation can subsume
that in Rmode allowing the difference to be only when segment registers
are loaded. Second, they did exactly that with the IVT. And third, they
would probably have had to do that for the code segment. I say that
because the CPU has to continue to retrieve bytes of code after the
switch to PM16 and as we've seen from Intel examples, such code could
continue for an arbitrary number of bytes before loading a new selector
into CS. So accesses relative to CS would /have/ to continue to use the
old interpretation of CS until the code reloaded it. It would be natural
to apply the same rules to DS and other segment registers.
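
As a rough illustration of that model, here is a minimal Python sketch (Python purely for convenience). The names SegCache, load_segment and linear_address are made up for this example, the GDT is reduced to a dict, and the Rmode branch follows the simplified rule given above rather than any documented hardware behaviour.

```python
from dataclasses import dataclass

@dataclass
class SegCache:
    """Hypothetical model of a segment register plus its hidden descriptor."""
    selector: int
    base: int
    limit: int

def load_segment(value, pe, gdt):
    """Model a segment register load.

    value: the 16-bit value written to the segment register
    pe:    state of the PE bit when the load is carried out
    gdt:   dict of descriptor index -> (base, limit), a stand-in for a real GDT
    """
    if not pe:
        # Rmode rule from the text: base = seg << 4, limit = 0xffff.
        # (A simplification; real CPUs may leave the cached limit untouched.)
        return SegCache(selector=value, base=value << 4, limit=0xFFFF)
    # Pmode rule: the value is a selector whose upper 13 bits index the table.
    base, limit = gdt[value >> 3]
    return SegCache(selector=value, base=base, limit=limit)

def linear_address(seg, offset):
    # Address formation is identical in both modes: cached base plus offset.
    return seg.base + offset
```

In this sketch the only place PE is consulted is load_segment; a register loaded before the switch keeps its old base and limit until it is reloaded.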

IOW, after enabling Pmode the JMP instruction was required to ensure
that subsequent loads of segment registers used the Pmode interpretation
rather than the Rmode one ... and possibly nothing else.

Of course, it's still right to include an immediate jump for portability
and this is all speculation but ISTM that it does give a simple and
consistent model of the likely state of the processor between enabling
Pmode and reloading segment registers: Whether in Rmode or Pmode the PE
bit primarily determines what effect loading a segment register has.

In summary:

mov cr0, eax ;Enter Pmode
... nothing changed in the CPU other than the PE bit
... instructions using same encoding & addressing as in Rmode
mov ax, 8 ;Operates normally
jmp $ + 2
... still nothing else changed in the architectural state of the
... CPU except that we have ensured that the following segment
... register access will be interpreted correctly for Pmode
mov ds, ax
... DS now has base, limit and protections as loaded from GDT

That's it. Feel free to disagree.

But if I am right then it's amazing how little changes in the CPU
between each step.

--
James Harris

wolfgang kern

Feb 12, 2022, 7:01:30 AM

On 12/02/2022 12:08, James Harris wrote:
...

>> 2. setting PE does nothing on its own, the CPU remain in real mode until
>> the far jump which changes interpretation from segment to descriptor.
>> and its a 16:16 code without prefix

> I am not sure that's right, Wolfgang. I am beginning to think that once
> PE is set the processor will be in 16-bit Protected Mode (PM16); in that
> mode the encoding of instructions will be identical to RM; and the main
> differences will be when loading segment registers. There may also be
> some differences when /using/ segment registers but see below.

...

> mov ds, ax
> ... DS now has base, limit and protections as loaded from GDT
>
> That's it. Feel free to disagree.

Yeah, you're partly right :) the CPU isn't in PM unless you alter CS,
but a write to a data segment register invokes UNREAL mode.
The only thing that differs with the PE bit set is that the interpretation
changes from segment to descriptor, but only when a segreg is written to.

you could do after setting PE:
mov ds,[variable] ;the var is still a real mode DS address
;and DS became a descriptor after this.
also:
mov esp,[cs:d16] ;uses the current CS range

> But if I am right then it's amazing how little changes in the CPU
> between each step.

I see only one beside the final jump.

As long as CS remains untouched there are no privilege checks, so it acts
like real mode for ALL "otherwise protected" instructions.
That's why I said the change occurs only on a write to CS.
OK, I forgot the UNREAL exception here even though I use that a lot.
__
wolfgang

James Harris

Feb 12, 2022, 11:35:52 AM

On 12/02/2022 12:01, wolfgang kern wrote:
> On 12/02/2022 12:08, James Harris wrote:
> ...
>>> 2. setting PE does nothing on its own, the CPU remain in real mode until
>>>     the far jump which changes interpretation from segment to
>>> descriptor.
>>>     and its a 16:16 code without prefix
>
>> I am not sure that's right, Wolfgang. I am beginning to think that
>> once PE is set the processor will be in 16-bit Protected Mode (PM16);
>> in that mode the encoding of instructions will be identical to RM; and
>> the main differences will be when loading segment registers. There may
>> also be some differences when /using/ segment registers but see below.
> ...
>>    mov ds, ax
>>    ... DS now has base, limit and protections as loaded from GDT
>>
>> That's it. Feel free to disagree.
>
> Yeah you're partly right :) the CPU isn't in PM unless you alter CS.

Why could it not be in PM running a 16-bit code segment?

Remember the isolated nibble in the IA32 descriptor just after
Base 31:24. Its bits can be seen as

GWZA

where
G = Granularity
W = Wide (0 means a 16-bit segment - yes, even in Pmode)
Z = Zero
A = Available for system programmer to use

The W bit must be zero on the 286. Even on 386+, however, with W = 0 you
would still have a 16-bit segment. The default size of addresses and
operations would be 16-bit and I gather it would operate just as Real
Mode does except when segment registers are written to.
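
To make the bit positions concrete, here is a small Python sketch that pulls that nibble out of an 8-byte descriptor. The function name and the two sample descriptors are made up for illustration, and the letters G, W, Z, A are the informal labels used above (W is the bit Intel calls D/B).

```python
def decode_flags_nibble(descriptor):
    """Decode the GWZA nibble of a descriptor.

    descriptor: the 8-byte descriptor read as one little-endian 64-bit integer.
    """
    # The nibble is the high half of byte 6, i.e. bits 52..55,
    # sitting just below Base 31:24 (bits 56..63).
    nibble = (descriptor >> 52) & 0xF
    return {
        "G": (nibble >> 3) & 1,  # granularity
        "W": (nibble >> 2) & 1,  # 0 = 16-bit segment, 1 = 32-bit segment
        "Z": (nibble >> 1) & 1,  # zero
        "A": nibble & 1,         # available to the system programmer
    }

def default_size(descriptor):
    # Default operand and address size implied by the W bit.
    return 32 if decode_flags_nibble(descriptor)["W"] else 16

# Example descriptors (assumed values): a flat 32-bit code segment and a
# 16-bit code segment with a 64K limit, both based at 0.
FLAT_CODE32 = 0x00CF9A000000FFFF
CODE16 = 0x00009A000000FFFF
```

With those sample values default_size gives 32 for the first descriptor and 16 for the second.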

> but write to a data segment register invokes UNREAL mode.

OK.

> the only thing which is different with a set PE-bit is interpretation
> changes from segment to descriptor but only when segreg is written to.

Yes, that's what I was suggesting. And it would only matter when a
segreg is loaded.

>
> you could do after setting PE:
>    mov ds,[variable]      ;the var is still a real mode DS address
>                           ;and DS became a descriptor after this.
> also:
>    mov esp,[cs:d16]       ;uses the current CS range
>
>> But if I am right then it's amazing how little changes in the CPU
>> between each step.
>
> I see only one beside the final jump.
>
> As long CS remain untouched there are no privilege checks, so it acts
> like in real mode for ALL "otherwise protected" instructions.
> That's why I said the change occur only on write CS.
> OK I forgot the UNREAL exception here even I use that a lot.

Even with no change to CS wouldn't there be protection checks on data
accesses which have had their segment registers loaded? For example,
consider loading just ES as in

lmsw ax ;Switch to Pmode
jmp $ + 2
mov ax, 16
mov es, ax

After that I'd suggest that even though CS has not been reloaded

mov al, [es: val]

would include protection and range checks.

Furthermore, you could consider that accesses off DS would also include
checks but that the internal descriptor would have the limit set to
0xffff so nothing would be out of range.
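
A toy Python sketch of that idea follows. The names are hypothetical and the check is reduced to a plain upper bound on an expand-up segment; it is not a statement of what any particular CPU does.

```python
class LimitViolation(Exception):
    """Stand-in for whatever fault a real CPU would raise."""

def check_access(cached_limit, offset, size=1):
    """Check an access of `size` bytes at `offset` against a cached limit."""
    last_byte = offset + size - 1
    if last_byte > cached_limit:
        raise LimitViolation(
            f"offset {offset:#x} (+{size}) exceeds limit {cached_limit:#x}"
        )
    return True
```

With cached_limit = 0xffff no byte access through a 16-bit offset can fail, so the check could always be active without Rmode code noticing it.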

Put another way, PM16 would look very like RM but it would not be RM
because loads of segment registers would have a different meaning; and
the only difference between RM and PM16 would be in how loads of segment
registers are carried out.

You say the CPU would still be operating in Rmode. I am putting forward
a suggestion that it would, in fact, be operating in Pm16 but that the
two are so similar that a lot of code would operate as though the CPU
were still in Rmode.

It makes sense! :-)

--
James Harris

wolfgang kern

Feb 12, 2022, 4:31:23 PM

I could not see any protection on data ranges while in UnReal mode.
Addresses beyond the limit just wrap around (as they do in RM).
My Unreal DS is flat 4GB, but my stack is only 64K to match the RM stack.

>> but write to a data segment register invokes UNREAL mode.
> OK.
>> the only thing which is different with a set PE-bit is interpretation
>> changes from segment to descriptor but only when segreg is written to.
> Yes, that's what I was suggesting. And it would only matter when a
> segreg is loaded.

>> As long CS remain untouched there are no privilege checks, so it acts
>> like in real mode for ALL "otherwise protected" instructions.
>> That's why I said the change occur only on write CS.
>> OK I forgot the UNREAL exception here even I use that a lot.
>
> Even with no change to CS wouldn't there be protection checks on data
> accesses which have had their segment registers loaded? For example,
> consider loading just ES as in
>
> lmsw ax ;Switch to Pmode
> jmp $ + 2
> mov ax, 16
> mov es, ax
>
> After that I'd suggest that even though CS has not been reloaded
>
> mov al, [es: val]
>
> would include protection and range checks.

I can't confirm this, but to be honest I never tried it intentionally,
and I only noticed a typo of DS==0x2B instead of 0x28 after many years,
during which it never caused any access restrictions.

> Furthermore, you could consider that accesses off DS would also include
> checks but that the internal descriptor would have the limit set to
> 0xffff so nothing would be out of range.

You could setup smaller than 64K limits on Unreal Data segments.
this might raise a real mode exception because still in RM :)

> Put another way, PM16 would look very like RM but it would not be RM
> because loads of segment registers would have a different meaning; and
> the only difference between RM and PM16 would be in how loads of segment
> registers are carried out.
>
> You say the CPU would still be operating in Rmode. I am putting forward
> a suggestion that it would, in fact, be operating in Pm16 but that the
> two are so similar that a lot of code would operate as though the CPU
> were still in Rmode.

try forcing exceptions in Unreal mode to see which handlers react.

> It makes sense! :-)

OK, some very old stuff may work different...
__
wolfgang

James Harris

Feb 13, 2022, 10:50:44 AM

On 12/02/2022 21:31, wolfgang kern wrote:
> On 12/02/2022 17:35, James Harris wrote:
>> On 12/02/2022 12:01, wolfgang kern wrote:
>>> On 12/02/2022 12:08, James Harris wrote:

...

>> Why could it not be in PM running a 16-bit code segment?

...

>>    G = Granularity
>>    W = Wide (0 means a 16-bit segment - yes, even in Pmode)
>>    Z = Zero
>>    A = Available for system programmer to use
>>
>> That bit must be zero on the 286. Even on 386+, however, with W = 0
>> you would still have a 16-bit segment. The default size of addresses
>> and operations would be 16-bit and I gather it would operate just as
>> Real Mode does except when segment registers are written to.
>
> I could not see any protection on Data-ranges while in UnReal mode.
> address beyond limits just wrap around (as it does within RM).
> My Unreal DS is flat 4GB, but my stack is only 64K to match RM stack.

Not sure what you mean but AIUI the B bit (big bit) of the SS descriptor
selects the size of stack pointer (32-bit ESP or 16-bit SP) used for
implicit stack references.

Rather than having all segments 32-bit or all segments 16-bit it is
looking more and more likely that a programmer could use any arbitrary
mix of 16-bit and 32-bit segments - even on current processors - so
having a 'big' code segment would make operands and addresses default to
32-bit while simultaneously having a 'small' stack segment would make
implicit stack references use SP rather than ESP.

Further, loading CS with a selector for a 32-bit ('big') descriptor will
only affect the code segment. One or more data segments could still be
16-bit.

All this would make Real mode little more than a subset of Protected
mode. Or, put another way, one could say that Real mode *is* Protected
mode with:

1. certain values in the segment descriptors
2. different rules as to what it means to load a segment register

and very little else.

...

>>    lmsw ax   ;Switch to Pmode
>>    jmp $ + 2
>>    mov ax, 16
>>    mov es, ax
>>
>> After that I'd suggest that even though CS has not been reloaded
>>
>>    mov al, [es: val]
>>
>> would include protection and range checks.
>
> I can't confirm this, but to be honest I never tried by intention,
> and I figured a typo with DS==0x2B instead of 0x28 after many years
> w/o causing any access restrictions.

OK. Using 2B rather than 28 (and I can see why they might be confused
visually!) would set the bottom two bits which I think would give that
selector user privilege rather than supervisor privilege.
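
For reference, a short Python sketch of how a selector splits into its fields (the function name is made up for this example):

```python
def decode_selector(selector):
    """Split a 16-bit selector into descriptor index, table indicator and RPL."""
    return {
        "index": selector >> 3,     # upper 13 bits: entry number in the table
        "ti": (selector >> 2) & 1,  # 0 = GDT, 1 = LDT
        "rpl": selector & 3,        # requested privilege level
    }
```

Both 0x28 and 0x2B pick entry 5 of the GDT; they differ only in the bottom two bits, which are 0 for 0x28 and 3 for 0x2B.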

>
>> Furthermore, you could consider that accesses off DS would also
>> include checks but that the internal descriptor would have the limit
>> set to 0xffff so nothing would be out of range.
>
> You could setup smaller than 64K limits on Unreal Data segments.
> this might raise a real mode exception because still in RM :)

Or PM16. :-)

Whoever wrote the Wikipedia article on Unreal mode seems to back up my
supposition. After saying that Unreal mode is not really a separate
addressing mode it says:

... the 80286 and all later x86 processors use the base address, size
and other attributes stored in their internal segment descriptor cache
whenever computing effective memory addresses, even in real mode.

https://en.wikipedia.org/wiki/Unreal_mode

--
James Harris

Scott Lurndal

Feb 13, 2022, 12:17:30 PM

James Harris <james.h...@gmail.com> writes:
>On 10/02/2022 22:33, wolfgang kern wrote:
>> On 10/02/2022 15:47, James Harris wrote:
>
>I think I may have come up with a clearer insight into what happens at
>each step of enabling Pmode - and it's very little! See below for details.
>
>> ...
>>> It seems a bit of a conundrum and leads to the obvious question:
>>> exactly what differences are there between instruction decoding in
>>> real mode and in PM16 (the mode immediately after setting CR0 bit 0?
>>
>>> As I say, this is all largely academic but if you happen to know the
>>> answer without doing any research do say as the details look interesting.
>>
>> 1. this EB 00 after write CR0 were never required, at least not by me.
>
> From what I've found recently it looks as though it would be rare for
>anyone to need that jump. (Though it or something like it is still right
>to include to cover the unusual cases.)
>
>> 2. setting PE does nothing on its own, the CPU remain in real mode until
>> the far jump which changes interpretation from segment to descriptor.
>> and its a 16:16 code without prefix
>
>I am not sure that's right, Wolfgang. I am beginning to think that once
>PE is set the processor will be in 16-bit Protected Mode (PM16); in that
>mode the encoding of instructions will be identical to RM; and the main
>differences will be when loading segment registers. There may also be
>some differences when /using/ segment registers but see below.

I suspect that this behavior varies based on the processor generation.

The original 80286 in 1982 (design started in the late 70's) is neither
pipelined nor superscalar.

While it is likely that the 8086 (1978) used a PLA
to drive the instruction decode and execution, much like the 6502 (1975),
that's definitely not the case for the 80286.

https://www.pagetable.com/?p=39 (the 6502 block diagram here is blurry, a
better version is available via google images).

The 80286 broke the chip up into unit blocks (Instruction, Bus, Execution
and Address), where the instruction unit produced a queue (FIFO in hardware
terms) of decoded instructions passed to the execution unit.

This meant that there is a 'depth-of-decode-queue' window between
decoding and executing an instruction; those instructions whose
decode stage involved knowledge of the PM flag, but which were decoded
before the instruction to set the flag was executed, will be
executing in an indeterminate state. Unless the programmer knows
the absolute depth of the queue under all circumstances, the
programmer cannot make any assumptions about the environment of
any instructions between setting the PM flag and loading the CS
register via a jump instruction.
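
A toy Python model of that window is sketched below. It is only an illustration under stated assumptions: the function name and the instruction strings are invented, the decoder is assumed to run exactly queue_depth instructions ahead, and "jmp" stands for a jump to the very next instruction that discards everything already decoded beyond it.

```python
def pe_seen_at_decode(program, queue_depth):
    """Return, per instruction, the PE value in force when it was last decoded."""
    pe = 0
    seen = []  # seen[i] is the PE value when instruction i was decoded
    for executing, instr in enumerate(program):
        # The decoder runs up to queue_depth instructions ahead of execution.
        while len(seen) < len(program) and len(seen) <= executing + queue_depth:
            seen.append(pe)
        if instr == "set_pe":
            pe = 1
        elif instr == "jmp":
            # The jump flushes the queue, so later instructions are decoded again.
            del seen[executing + 1:]
    return seen
```

With a depth of 2, the sequence set_pe, mov ds,bx, nop has all three instructions decoded while PE is still 0, whereas inserting a jmp after set_pe means the following instruction is decoded with PE = 1.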

https://electronicsdesk.com/80286-microprocessor.html

wolfgang kern

Feb 13, 2022, 4:14:18 PM

On 13/02/2022 16:50, James Harris wrote:
...
>>> Why could it not be in PM running a 16-bit code segment?

at which address would you see such PM16 code?
while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
PM16 blocks can reside anywhere within 4GB.
...

>> I could not see any protection on Data-ranges while in UnReal mode.
>> address beyond limits just wrap around (as it does within RM).
>> My Unreal DS is flat 4GB, but my stack is only 64K to match RM stack.

> Not sure what you mean but AIUI the B bit (big bit) of the SS descriptor
> selects the size of stack pointer (32-bit ESP or 16-bit SP) used for
> implicit stack references.

I use esp with upper half zeroed and 64k limit in the HMA for both RM
and PM32 because all my code in all modes share one single stack.

> Rather than having all segments 32-bit or all segments 16-bit it is
> looking more and more likely that a programmer could use any arbitrary
> mix of 16-bit and 32-bit segments - even on current processors - so
> having a 'big' code segment would make operands and addresses default to
> 32-bit while simultaneously having a 'small' stack segment would make
> implicit stack references use SP rather than ESP.

I have 16 bit RM code alive beside PM16, PM32 and LM64.
*true RM for BIOS calls and fastest hardware needs.
*UnReal during startup and mode switches
*PM16 as intermediate links and one protected core
*PM32 all OS specific
*LM64 except for the switches only used for user data.

> Further, loading CS with a selector for a 32-bit ('big') descriptor will
> only affect the code segment. One or more data segments could still be
> 16-bit.

been there :) it crashed if you leave data selectors RM styled.

> All this would make Real mode little more than a subset of Protected
> mode. Or, put another way, one could say that Real mode *is* Protected
> mode with:

> 1. certain values in the segment descriptors
> 2. different rules as to what it means to load a segment register

just a point of view matter ?
....

>>> mov al, [es: val]
>>> would include protection and range checks.

>> I can't confirm this, but to be honest I never tried by intention,
>> and I figured a typo with DS==0x2B instead of 0x28 after many years
>> w/o causing any access restrictions.

> OK. Using 2B rather than 28 (and I can see why they might be confused
> visually!) would set the bottom two bits which I think would give that
> selector user privilege rather than supervisor privilege.

>>> Furthermore, you could consider that accesses off DS would also
>>> include checks but that the internal descriptor would have the limit
>>> set to 0xffff so nothing would be out of range.

>> You could setup smaller than 64K limits on Unreal Data segments.
>> this might raise a real mode exception because still in RM :)

> Or PM16. :-)

:) look at the AMD pages which list RM<>PM instruction differences,
and then check on a few to see if it is in RM or PM16.
hint: some instructions are privileged and a few aren't allowed.

> Whoever wrote the Wikipedia article on Unreal mode seems to back up my
> supposition. After saying that Unreal mode is not really a separate
> addressing mode it says:

> ... the 80286 and all later x86 processors use the base address, size
> and other attributes stored in their internal segment descriptor cache
> whenever computing effective memory addresses, even in real mode.
>
> https://en.wikipedia.org/wiki/Unreal_mode

sure, these internal descriptors default to RM limits.
But too many gadgeteers and wannabees made wiki no more trustworthy,
all is pretty dated there anyway.
__
wolfgang

wolfgang kern

Feb 14, 2022, 4:56:45 AM

On 13/02/2022 16:50, James Harris wrote:

[about stack...]

> Not sure what you mean but AIUI the B bit (big bit) of the SS descriptor
> selects the size of stack pointer (32-bit ESP or 16-bit SP) used for
> implicit stack references.

> Rather than having all segments 32-bit or all segments 16-bit it is
> looking more and more likely that a programmer could use any arbitrary
> mix of 16-bit and 32-bit segments - even on current processors - so
> having a 'big' code segment would make operands and addresses default to
> 32-bit while simultaneously having a 'small' stack segment would make
> implicit stack references use SP rather than ESP.

you mean implicit stack references (all push pop call return)?

BUT how about
PM32:
8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
RM:
67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

and I'm not sure yet if my mixed code CALL/RET work on SP only due to
my 16 bit stack. OK I use 66 c3 and 66 E8xxxxxxxx here and there and my
esp is always in 16 bit range (initially decided to fit BIOS calls).
So I never noticed it's using only SP.
__
wolfgang

James Harris

Feb 14, 2022, 10:55:47 AM

On 14/02/2022 09:56, wolfgang kern wrote:
> On 13/02/2022 16:50, James Harris wrote:
>
> [about stack...]

I have previously only ever had to think about Real Mode (which is
always 16-bit) or the form of Protected Mode which is entirely 32-bit,
i.e. where all segments are 32-bit. The idea of having PMode where some
segments are 16-bit and others are 32-bit is entirely new to me but I
think it is yielding insights into how the processor works.

>> Not sure what you mean but AIUI the B bit (big bit) of the SS
>> descriptor selects the size of stack pointer (32-bit ESP or 16-bit SP)
>> used for implicit stack references.
>
>> Rather than having all segments 32-bit or all segments 16-bit it is
>> looking more and more likely that a programmer could use any arbitrary
>> mix of 16-bit and 32-bit segments - even on current processors - so
>> having a 'big' code segment would make operands and addresses default
>> to 32-bit while simultaneously having a 'small' stack segment would
>> make implicit stack references use SP rather than ESP.
>
> you mean implicit stack references (all push pop call return)?

Yes.

When working with a mix of 16-bit and 32-bit segments it seems there are
at least THREE sizes we need to be aware of.

Address Size
Operand Size
Stack Address Size

The following definition of PUSH depends on the latter two. Note that
whether to use SP or ESP depends on StackAddrSize:

IF StackAddrSize = 16
THEN
   IF OperandSize = 16 THEN
      SP := SP - 2;
      (SS:SP) := (SOURCE); (* word assignment *)
   ELSE
      SP := SP - 4;
      (SS:SP) := (SOURCE); (* dword assignment *)
   FI;
ELSE (* StackAddrSize = 32 *)
   IF OperandSize = 16
   THEN
      ESP := ESP - 2;
      (SS:ESP) := (SOURCE); (* word assignment *)
   ELSE
      ESP := ESP - 4;
      (SS:ESP) := (SOURCE); (* dword assignment *)
   FI;
FI;

Ref: http://www.scs.stanford.edu/05au-cs240c/lab/i386/PUSH.htm
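
The same definition can be turned into a small runnable Python model. This is only a sketch: the function name, the regs dict and the stack dict (offset within SS -> byte) are assumptions made for the example.

```python
def push(regs, stack, value, stack_addr_size, operand_size):
    """Model PUSH for a given StackAddrSize (16 or 32) and OperandSize (16 or 32).

    regs:  dict holding the full 32-bit register under the key "esp"
    stack: dict mapping offsets within SS to byte values
    """
    nbytes = operand_size // 8
    mask = 0xFFFF if stack_addr_size == 16 else 0xFFFFFFFF
    # With StackAddrSize = 16 only SP, the low half of ESP, is decremented.
    new_ptr = (regs["esp"] - nbytes) & mask
    regs["esp"] = (regs["esp"] & ~mask & 0xFFFFFFFF) | new_ptr
    # Store the operand little-endian at SS:SP or SS:ESP.
    for i in range(nbytes):
        stack[(new_ptr + i) & mask] = (value >> (8 * i)) & 0xFF
    return new_ptr
```

Starting from ESP = 0x00010000, a 16-bit push with StackAddrSize = 16 wraps SP round to 0xFFFE and leaves the upper half of ESP alone, whereas with StackAddrSize = 32 the whole of ESP drops to 0x0000FFFE.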

So where are such sizes defined?

AddressSize and OperandSize are defined by the code segment's D bit
possibly overridden with an AdSiz or OpSiz prefix. (I think that bit
also determines how instruction bytes are interpreted. The 16-bit RM and
PM16 have essentially the same encodings but the encodings for 32-bit
code are different.)

StackAddrSize is more interesting - at least to me. Where is it defined?
It turns out that: "The stack address-size attribute is controlled by
the B-bit of the data-segment descriptor in the SS register."

Ref: http://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_01.htm

So that's the key: whether to use SP or ESP depends on the B bit in the
SS descriptor.

Notably, while the B bit sets only StackAddrSize the D bit sets the
defaults for both AddressSize and OperandSize.

BTW, the D bit is what Intel calls the B bit when it's on a code
descriptor. I don't know why they didn't use the same name.
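
Putting the three sizes together, here is a minimal Python sketch of how they could be derived. The function names are invented for the example; the prefixes are given as a collection of byte values, with 0x66 as the OpSiz prefix and 0x67 as the AdSiz prefix.

```python
def effective_sizes(d_bit, prefixes=()):
    """Return (OperandSize, AddressSize) for one instruction.

    d_bit:    D bit of the current code segment descriptor (0 or 1)
    prefixes: prefix bytes present on the instruction
    """
    default = 32 if d_bit else 16
    other = 16 if d_bit else 32
    # Each prefix switches its attribute to the non-default size.
    operand = other if 0x66 in prefixes else default
    address = other if 0x67 in prefixes else default
    return operand, address

def stack_pointer(b_bit):
    # The B bit of the SS descriptor picks the register for implicit stack use.
    return "ESP" if b_bit else "SP"
```

For example, in a 16-bit code segment an instruction carrying only the 0x67 prefix gets 16-bit operands with 32-bit addressing, independently of what the SS descriptor says about SP or ESP.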

>
> BUT how about
> PM32:
> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
> RM:
> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

How do you interpret those?

BTW, what happens when referring to BP or EBP as in

mov eax, [ebp + 4]
sub ebp, 8

Does such code use the SS descriptor's B bit?

>
> and I'm not sure yet if my mixed code CALL/RET work on SP only due to
> my 16 bit stack. OK I use 66 c3 and 66 E8xxxxxxxx here and there and my
> esp is always in 16 bit range (initially decided to fit BIOS calls).
> So I never noticed it's using only SP.

Does the above help?

--
James Harris

James Harris

Feb 14, 2022, 11:10:10 AM

On 13/02/2022 21:14, wolfgang kern wrote:
> On 13/02/2022 16:50, James Harris wrote:

>>>> Why could it not be in PM running a 16-bit code segment?
>
> at which address would you see such PM16 code?

Within range 0 to 64k. See below.

> while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
> PM16 blocks can reside anywhere within 4GB.

I don't think so: PM16 is limited to 64k code segments. The following is
from the 80286 documentation.

The programmer views the virtual address space on the 80286 as a
collection of up to sixteen thousand linear subspaces, each with a
specified size or length. Each of these linear address spaces is called
a segment. A segment is a logical unit of contiguous memory. Segment
sizes may range from one byte up to 64K (65,536) bytes.

...

>> Further, loading CS with a selector for a 32-bit ('big') descriptor
>> will only affect the code segment. One or more data segments could
>> still be 16-bit.
>
> been there :) it crashed if you leave data selectors RM styled.

It should work, AFAICS.

>
>> All this would make Real mode little more than a subset of Protected
>> mode. Or, put another way, one could say that Real mode *is* Protected
>> mode with:
>
>> 1. certain values in the segment descriptors
>> 2. different rules as to what it means to load a segment register
>
> just a point of view matter ?

To me this is more about gaining an insight into what is likely
happening inside the processor, and thereby making it easier to understand.

...

>>>> Furthermore, you could consider that accesses off DS would also
>>>> include checks but that the internal descriptor would have the limit
>>>> set to 0xffff so nothing would be out of range.
>
>>> You could setup smaller than 64K limits on Unreal Data segments.
>>> this might raise a real mode exception because still in RM :)
>
>> Or PM16. :-)
>
> :) look at the AMD pages which lists RM<>PM instruction differences,
> and then check on a few to see if it is in RM or PM16.
> hint: some instructions are privileged and a few aren't allowed.

What about RM code? ISTM that a lot of RM code could work unchanged in PM16.

--
James Harris

James Harris

Feb 14, 2022, 11:23:03 AM

On 13/02/2022 17:17, Scott Lurndal wrote:
> James Harris <james.h...@gmail.com> writes:
>> On 10/02/2022 22:33, wolfgang kern wrote:

...

>>> 1. this EB 00 after write CR0 were never required, at least not by me.
>>
>> From what I've found recently it looks as though it would be rare for
>> anyone to need that jump. (Though it or something like it is still right
>> to include to cover the unusual cases.)

...

> I suspect that this behavior varies based on the processor generation.

Yes, if you are talking about the length of the decode queue then I
agree. It will depend on the specific processor and it's
non-architectural so cannot be predetermined.

...

> This meant that there is a 'depth-of-decode-queue' window between
> decoding and executing an instruction; for those instructions whose
> decode stage involved knowledge of the PM flag but were decoded
> before the instruction to set the flag was executed, they'll be
> executing in an indeterminate state (unless the programmer knows
> the absolute depth of the queue under all circumstances, the
> programmer cannot make any assumptions about the environment of
> any instructions between setting the PM flag and loading the CS
> register via a jump instruction.

Yes, it's possible that all instructions would decode under the wrong
assumptions in certain processors; one would have to run tests to find
out for sure. But it's interesting that PM16 uses the /same/ instruction
encodings and addressing limitations as RM, and that the main semantic
differences are in the interpretation of loading segment registers.

--
James Harris

wolfgang kern

Feb 15, 2022, 9:07:12 PM

On 14/02/2022 16:55, James Harris wrote:

>> [about stack...]

> I have previously only ever had to think about Real Mode (which is
> always 16-bit) or the form of Protected Mode which is entirely 32-bit,
> i.e. where all segments are 32-bit. The idea of having PMode where some
> segments are 16-bit and others are 32-bit is entirely new to me but I
> think it is yielding insights into how the processor works.

I played a lot around with these options and finally decided for a mix.

[the Big bit...]

>> you mean implicit stack references (all push pop call return)?
> Yes.

> When working with a mix of 16-bit and 32-bit segments it seems there are
> at least THREE sizes we need to be aware of.

...

> So where are such sizes defined?

...
I knew all that :) just didn't remember because there were no problems.

>> BUT how about
>> PM32:
>> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
>> RM:
>> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

> How do you interpret those?

my disassembler does this for me.

>
> BTW, what happens when referring to BP or EBP as in
>
> mov eax, [ebp + 4]
> sub ebp, 8
>
> Does such code use the SS descriptor's B bit?

Yes, at least on CPUs which still support the B bit.

>> and I'm not sure yet if my mixed code CALL/RET work on SP only due to
>> my 16 bit stack. OK I use 66 c3 and 66 E8xxxxxxxx here and there and
>> my esp is always in 16 bit range (initially decided to fit BIOS calls).
>> So I never noticed it's using only SP.

> Does the above help?

:) thanks it helped to remember.
__
wolfgang

wolfgang kern

Feb 15, 2022, 10:01:40 PM

On 14/02/2022 17:10, James Harris wrote:

>>>>> Why could it not be in PM running a 16-bit code segment?

>> at which address would you see such PM16 code?

> Within range 0 to 64k. See below.

this would mean starting from 0000:0000
but what value will the CS descriptor have then ?

>> while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
>> PM16 blocks can reside anywhere within 4GB.

> I don't think so: PM16 is limited to 64k code segments. The following is
> from the 80286 documentation.

> The programmer views the virtual address space on the 80286 as a
> collection of up to sixteen thousand linear subspaces, each with a
> specified size or length. Each of these linear address spaces is called
> a segment. A segment is a logical unit of contiguous memory. Segment
> sizes may range from one byte up to 64K (65,536) bytes.

I too own ye olde 286 manuals.
but later stuff said:

Protected Mode.
In this mode, the processor supports virtual-memory and physical-memory
spaces of 4 Gbytes and operand sizes of 16 or 32 bits. All segment
translation, segment protection, and hardware multitasking functions are
available. System software can use segmentation to relocate effective
addresses in virtual-address space. If paging is not enabled, virtual
addresses are equal to physical addresses. Paging can be optionally
enabled to allow translation of virtual addresses to physical addresses
and to use the page-based memory-protection mechanisms.

I once tested setting the base of a PM16 code descriptor in the 3rd GB.
It worked, but there was not much sense in doing that.
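
For illustration, a Python sketch of what such a descriptor could look like. The helper names and the chosen values (base 0x80000000, limit 0xFFFF, access byte 0x9A, flags nibble 0) are assumptions for the example, not the values actually used in that test.

```python
def make_descriptor(base, limit, access, flags):
    """Pack a segment descriptor into one little-endian 64-bit integer."""
    d = limit & 0xFFFF                  # Limit 15:0
    d |= (base & 0xFFFFFF) << 16        # Base 23:0
    d |= (access & 0xFF) << 40          # access byte (type, DPL, present)
    d |= ((limit >> 16) & 0xF) << 48    # Limit 19:16
    d |= (flags & 0xF) << 52            # flags nibble (G, D/B, 0, AVL)
    d |= ((base >> 24) & 0xFF) << 56    # Base 31:24
    return d

def descriptor_base(d):
    """Recover the 32-bit base from a packed descriptor."""
    return ((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24)

# A 16-bit code segment (flags nibble 0, so D = 0) based at the start of
# the 3rd GB, with a 64K limit.
PM16_CODE_HIGH = make_descriptor(0x80000000, 0xFFFF, 0x9A, 0x0)
```

Code in that segment still uses 16-bit offsets; offset 0x1234 simply ends up at linear address descriptor_base(PM16_CODE_HIGH) + 0x1234.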

>>> Further, loading CS with a selector for a 32-bit ('big') descriptor
>>> will only affect the code segment. One or more data segments could
>>> still be 16-bit.

>> been there :) it crashed if you leave data selectors RM styled.

> It should work, AFAICS.

This BIG REAL mode (tried by a few who were brave enough) suffers a lot.
It can't use INT, IRQ and EXC by normal means; the workaround is a PITA.

>>> All this would make Real mode little more than a subset of Protected
>>> mode. Or, put another way, one could say that Real mode *is*
>>> Protected mode with:

>>> 1. certain values in the segment descriptors
>>> 2. different rules as to what it means to load a segment register

>> just a point of view matter ?

I see it the other way: PM came a long while after RM.

> To me this is more about gaining an insight into what is likely
> happening inside the processor, and thereby making it easier to understand.

But not much to learn if you're stuck with 286 ...:)

>>>>> Furthermore, you could consider that accesses off DS would also
>>>>> include checks but that the internal descriptor would have the
>>>>> limit set to 0xffff so nothing would be out of range.

>>>> You could setup smaller than 64K limits on Unreal Data segments.
>>>> this might raise a real mode exception because still in RM :)

>>> Or PM16. :-)

>> :) look at the AMD pages which lists RM<>PM instruction differences,
>> and then check on a few to see if it is in RM or PM16.
>> hint: some instructions are privileged and a few aren't allowed.

> What about RM code? ISTM that a lot of RM code could work unchanged in
> PM16.

again: look at privileged instructions, work in RM but may fail in PM.
__
wolfgang

James Harris

Feb 21, 2022, 2:04:00 AM

On 16/02/2022 03:01, wolfgang kern wrote:
> On 14/02/2022 17:10, James Harris wrote:
>
>>>>>> Why could it not be in PM running a 16-bit code segment?
>
>>> at which address would you see such PM16 code?
>
>> Within range 0 to 64k. See below.
>
> this would mean starting from 0000:0000
> but what value will the CS descriptor have then ?

Your question may be rhetorical but if not then I'd say that the CS
descriptor would have the D bit (the B bit by any other name) as zero.

>
>>> while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
>>> PM16 blocks can reside anywhere within 4GB.
>
>> I don't think so: PM16 is limited to 64k code segments. The following
>> is from the 80286 documentation.
>
>> The programmer views the virtual address space on the 80286 as a
>> collection of up to sixteen thousand linear subspaces, each with a
>> specified size or length. Each of these linear address spaces is
>> called a segment. A segment is a logical unit of contiguous memory.
>> Segment sizes may range from one byte up to 64K (65,536) bytes.
>
> I too own ye olde 286 manuals.

If you want to find definitions of ye olde PM16 then ye olde manuals are
the place to look. :)

> but later stuff said:
>
> Protected Mode.
> In this mode, the processor supports virtual-memory and physical-memory
> spaces of 4 Gbytes and operand sizes of 16 or 32 bits. All segment
> translation, segment protection, and hardware multitasking functions are
> available. System software can use segmentation to relocate effective
> addresses in virtual-address space. If paging is not enabled, virtual
> addresses are equal to physical addresses. Paging can be optionally
> enabled to allow translation of virtual addresses to physical addresses
> and to use the page-based memory-protection mechanisms.

I can't see the relevance of that. If it's not clear, the point in what
I posted from the 80286 manual was:

"Segment sizes may range from one byte up to 64K (65,536) bytes."

IOW PM16 segments cannot be larger than 64k so when you say that PM16
blocks can be anywhere in 4GB then I don't think that can be right. It
is true of what's called Unreal Mode (386 and above) but not of PM16
(286 and above).

If you mean PM16 "segments" then I think that still cannot be right.
PM16 as defined for the 286 is limited to 24-bit base addresses (16 MB).

...

>>>> All this would make Real mode little more than a subset of Protected
>>>> mode. Or, put another way, one could say that Real mode *is*
>>>> Protected mode with:
>
>>>> 1. certain values in the segment descriptors
>>>> 2. different rules as to what it means to load a segment register
>
>>> just a point of view matter ?
>
> I see it the other way: PM came a long while after RM.
>
>> To me this is more about gaining an insight into what is likely
>> happening inside the processor, and thereby making it easier to
>> understand.
>
> But not much to learn if you're stuck with 286 ...:)

Well, don't modern CPUs still support the 80286's PM16, and isn't PM16
the mode that they are switched in to when CR0.PE is set before any
segment register is loaded?

>
>>>>>> Furthermore, you could consider that accesses off DS would also
>>>>>> include checks but that the internal descriptor would have the
>>>>>> limit set to 0xffff so nothing would be out of range.
>
>>>>> You could setup smaller than 64K limits on Unreal Data segments.
>>>>> this might raise a real mode exception because still in RM :)
>
>>>> Or PM16. :-)
>
>>> :) look at the AMD pages which lists RM<>PM instruction differences,
>>> and then check on a few to see if it is in RM or PM16.
>>> hint: some instructions are privileged and a few aren't allowed.
>
>> What about RM code? ISTM that a lot of RM code could work unchanged in
>> PM16.
> again: look at privileged instructions, work in RM but may fail in PM.

RM would have to work very hard to have privileged instructions. ;-)

Everything I've found recently suggests that after CR0.PE is set (and
the prefetch queue flushed if necessary) the processor will be in
Protected Mode whereas you, IIUC, are sure it would remain in Real Mode
until CS is reloaded.

If so, can you think of an instruction or sequence which would
distinguish between the two?

--
James Harris

James Harris

Feb 21, 2022, 7:14:01 AM

On 16/02/2022 02:07, wolfgang kern wrote:
> On 14/02/2022 16:55, James Harris wrote:

...

>>> BUT how about
>>> PM32:
>>> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
>>> RM:
>>> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack
>
>> How do you interpret those?
>
> my disassembler do this for me.

:) I was asking what you thought they meant (in the context of
different descriptor settings).

I've been looking in to this a bit more and will have a go at it. See if
you think I've got it right or not.

Your first example:

PM32:
8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?

AIUI:

* that will NOT use the SS segment's 'B' bit (which selects SP or ESP
but does so only for /implicit/ references to the stack)

* it will use the CS segment's D bit being set to 1 in two ways:
1) it will have adsiz as 32-bit so recognising ESP rather than SP
2) it will have opsiz as 32-bit so recognising EAX rather than AX

* the accessible range will depend on DS.limit, not SS.limit

* it will access ESP relative to DS.base, not relative to SS.base -
which could be a source of significant confusion if they don't match.

Your second example:

RM:
67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

* again, won't use anything about the SS segment

* will use CS.D=0 to recognise AX

* will use CS.D=0 and adsiz to recognise ESP

* addressable range will depend on DS.limit, not SS.limit

* linear address will depend on DS.base, not SS.base.

Note that it appears that the Big bit on DS (i.e. DS.B) is ignored even
though the instructions access DS. Also, the base and limit and
everything else from SS are ignored for those instructions.

The findings may be unexpected (they certainly surprised me) but see the
section entitled Segment Descriptors in CHAPTER 3 of PROTECTED-MODE
MEMORY MANAGEMENT in Intel manuals.

>>
>> BTW, what happens when referring to BP or EBP as in
>>
>> mov eax, [ebp + 4]
>> sub ebp, 8
>>
>> Does such code use the SS descriptor's B bit?
>
> Yes, at least on CPUs which still support the B bit.

Any advance on "Yes"? :) As above, the Segment Descriptors section of
the Intel manual suggests otherwise.

--
James Harris

wolfgang kern

Feb 21, 2022, 9:30:48 AM

On 21/02/2022 13:13, James Harris wrote:

>>>> BUT how about
>>>> PM32:
>>>> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
>>>> RM:
>>>> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

>>> How do you interpret those?
>> my disassembler do this for me.
> :) I was asking what you thought they meant (in the context of
> different descriptor settings).

> I've been looking in to this a bit more and will have a go at it. See if
> you think I've got it right or not.

can't give you an A here ...
both of my and your example instructions use SS by default.

> Your first example:
> PM32:
> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
> AIUI:
> * that will NOT use the SS segment's 'B' bit (which selects SP or ESP
> but does so only for /implicit/ references to the stack)

> * it will use the CS segment's D bit being set to 1 in two ways:
> 1) it will have adsiz as 32-bit so recognising ESP rather than SP
> 2) it will have opsiz as 32-bit so recognising EAX rather than AX

> * the accessible range will depend on DS.limit, not SS.limit
>
> * it will access ESP relative to DS.base, not relative to SS.base -
> which could be a source of significant confusion if they don't match.

My SS use a PM16 data-descriptor and my DS a Flat PM32 one.
No confusion detected :)

> Your second example:
> RM:
> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

> * again, won't use anything about the SS segment
> * will use CS.D=0 to recognise AX

yeah, RM uses 16 bit for AX by default, but 67 allow SIB (uses SS)

> * will use CS.D=0 and adsiz to recognise ESP

there aren't any [SP] addressing modes in RM nor in PM except
PUSH/POP/CALL/RET.

> * addressable range will depend on DS.limit, not SS.limit
> * linear address will depend on DS.base, not SS.base.

this is wrong!

> Note that it appears that the Big bit on DS (i.e. DS.B) is ignored even
> though the instructions access DS. Also, the base and limit and
> everything else from SS are ignored for those instructions.

> The findings may be unexpected (they certainly surprised me) but see the
> section entitled Segment Descriptors in CHAPTER 3 of PROTECTED-MODE
> MEMORY MANAGEMENT in Intel manuals.

such false statements are really printed in Intel docs ?
perhaps things changed meanwhile (40 years later) 80186/80286 books may
not apply to 80386++.

My AMD docs seem to follow the truth and I can confirm that all
[eBP/eSP]-based instructions including SIB-styled use SS.

>>> BTW, what happens when referring to BP or EBP as in
>>>
>>> mov eax, [ebp + 4]
>>> sub ebp, 8
>>>
>>> Does such code use the SS descriptor's B bit?

>> Yes, at least on CPUs which still support the B bit.

> Any advance on "Yes"? :) As above, the Segment Descriptors section of
> the Intel manual suggests otherwise.

__
wolfgang

wolfgang kern

Feb 21, 2022, 9:33:58 AM

On 21/02/2022 08:03, James Harris wrote:
...

>>> What about RM code? ISTM that a lot of RM code could work unchanged
>>> in PM16.
>> again: look at privileged instructions, work in RM but may fail in PM.
>
> RM would have to work very hard to have privileged instructions. ;-)
>
> Everything I've found recently suggests that after CR0.PE is set (and
> the prefetch queue flushed if necessary) the processor will be in
> Protected Mode whereas you, IIUC, are sure it would remain in Real Mode
> until CS is reloaded.

> If so, can you think of an instruction or sequence which would
> distinguish between the two?

give me some time to check in deep detail, I'll be back on this soon.
__
wolfgang

Scott Lurndal

Feb 21, 2022, 10:48:14 AM

wolfgang kern <now...@nevernet.at> writes:
>On 21/02/2022 13:13, James Harris wrote:
>

>> The findings may be unexpected (they certainly surprised me) but see the
>> section entitled Segment Descriptors in CHAPTER 3 of PROTECTED-MODE
>> MEMORY MANAGEMENT in Intel manuals.
>
>such false statements are really printed in Intel docs ?
>perhaps things changed meanwhile (40 years later) 80186/80286 books may
>not apply to 80386++.
>
>My AMD docs seem to follow the truth and I can confirm that all
>[eBP/eSP]-based instructions including SIB-styled use SS.

The QEMU source base is a useful reference for these types of
questions.

wolfgang kern

Feb 21, 2022, 12:18:23 PM

On 21/02/2022 15:33, wolfgang kern wrote:

>> If so, can you think of an instruction or sequence which would
>> distinguish between the two?

> give me some time to check in deep detail, I'll be back on this soon.

here we go:
this works in RM as long as it points to a far RETURN:
you can test this yourself
9A xxxx 0000 call far 0000:xxxx ;raise exception if PM

PM only (raise invalid opcode exception in RM):
63 xx ARPL (seems obsolete)
0F 00 /1 STR an easy test possibility ie:
0F 00 c8 STR ax |eax |rax
0F 00 08 xx STR [mem16] ; all except RM

0F 00 /0 SLDT r16/r32/r64/m16
0F 00 /2 LLDT rm16

0F 00 /3 LTR
0F 02 /r LAR
0F 03 /r,[m] LSL

these work in UNreal after correct loaded
0F 00 /4 VERR
0F 00 /5 VERW
__
wolfgang

James Harris

Feb 21, 2022, 1:02:29 PM

On 21/02/2022 14:30, wolfgang kern wrote:
> On 21/02/2022 13:13, James Harris wrote:
>
>>>>> BUT how about
>>>>> PM32:
>>>>> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
>>>>> RM:
>>>>> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack
>
>>>> How do you interpret those?
>>> my disassembler do this for me.
>> :) I was asking what you thought they meant (in the context of
>> different descriptor settings).
>
>> I've been looking in to this a bit more and will have a go at it. See
>> if you think I've got it right or not.
>
> can't give you an A here ...
> both of my and your example instructions use SS by default.

You are right about using SS. Since the earlier post I've found in the
Intel manual under Default Segment Selection Rules the text for uses of
SS and it mentions ESP as well as EBP: "Any memory reference which uses
the ESP or EBP register as a base register." I'd missed off ESP. I will
correct my comments, below, though please don't get diverted by that
because the selection of base register is a side issue compared with the
significance(s) of the D/B bit.
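
The default segment rule quoted above can be sketched as a small Python model. This is only an illustration of the rule as worded in the manual: the function name and the string register names are made up for the example, and string-instruction destinations (which use ES) are left out.

```python
# Illustrative model only, not taken from any manual or assembler.
def default_segment(base_reg, override=None):
    """Return the segment register used for a memory operand."""
    if override is not None:
        # An explicit segment-override prefix (e.g. "es:") always wins.
        return override
    # Any memory reference that uses ESP or EBP (or BP in 16-bit
    # addressing) as its base register defaults to SS.
    if base_reg is not None and base_reg.lower() in ("esp", "ebp", "bp"):
        return "ss"
    # All other data references default to DS.
    return "ds"


print(default_segment("esp"))                  # the [esp-04] examples
print(default_segment("ebx"))
print(default_segment("ebp", override="ds"))
```

On this model both of the `[esp-04]` examples in the thread go through SS, which is the point being conceded here.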

I should say that IME assemblers have no directives for RM or PM32 but
only for "16-bit" and "32-bit" and AISI they are correct in that because
for both of the 16-bit modes (RM and PM16) the instruction encodings are
identical.

Therefore, AISI your second example (which you labelled "RM") cannot
really be RM as it refers to ESP so I presume you mean PM16 (which uses
16-bit encodings but can refer to 32-bit registers).

BTW, to be clear, you referred to /my/ examples but I just copied yours
so there are only two examples, not four.

>
>> Your first example:
>> PM32:
>> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
>> AIUI:
>> * that will NOT use the SS segment's 'B' bit (which selects SP or ESP
>> but does so only for /implicit/ references to the stack)

I stand by that part of the assessment (until told otherwise!). The B
bit is only used for implicit stack operations and is ignored for
explicit ones.

>
>> * it will use the CS segment's D bit being set to 1 in two ways:
>> 1) it will have adsiz as 32-bit so recognising ESP rather than SP
>> 2) it will have opsiz as 32-bit so recognising EAX rather than AX

I stand by that, too.

>
>> * the accessible range will depend on DS.limit, not SS.limit
>>
>> * it will access ESP relative to DS.base, not relative to SS.base -
>> which could be a source of significant confusion if they don't match.
>
> My SS use a PM16 data-descriptor and my DS a Flat PM32 one.
> No confusion detected :)

Agreed. If MOV EAX,[ESP - 4] uses ESP as a base register then it /will/
use SS.limit and SS.base - although for that instruction the CPU will
still ignore SS.B. Agreed?

>
>> Your second example:
>> RM:
>> 67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack
>
>> * again, won't use anything about the SS segment
>> * will use CS.D=0 to recognise AX
>
> yeah, RM uses 16 bit for AX by default, but 67 allow SIB (uses SS)

Agreed.

>
>> * will use CS.D=0 and adsiz to recognise ESP
>
> there aren't any [SP] addressing modes in RM nor in PM except
> PUSH/POP/CALL/RET.
>
>> * addressable range will depend on DS.limit, not SS.limit
>> * linear address will depend on DS.base, not SS.base.
>
> this is wrong!

Agreed. Issue as above. SS.limit and SS.base will be used.

>
>> Note that it appears that the Big bit on DS (i.e. DS.B) is ignored
>> even though the instructions access DS. Also, the base and limit and
>> everything else from SS are ignored for those instructions.
>
>> The findings may be unexpected (they certainly surprised me) but see
>> the section entitled Segment Descriptors in CHAPTER 3 of
>> PROTECTED-MODE MEMORY MANAGEMENT in Intel manuals.
>
> such false statements are really printed in Intel docs ?

No, what Intel says is as follows. Remember this is about the B bit (aka
D bit on a code seg) which is supposed to be 0 for 16-bit and 1 for
32-bit operation - and about which I think we disagree.

D/B (default operation size/default stack pointer size and/or upper
bound) flag

Performs different functions depending on whether the segment descriptor
is an executable code segment, an expand-down data segment, or a stack
segment. (This flag should always be set to 1 for 32-bit code and data
segments and to 0 for 16-bit code and data segments.)

• Executable code segment. The flag is called the D flag and it
indicates the default length for effective addresses and operands
referenced by instructions in the segment. If the flag is set, 32-bit
addresses and 32-bit or 8-bit operands are assumed; if it is clear,
16-bit addresses and 16-bit or 8-bit operands are assumed. The
instruction prefix 66H can be used to select an operand size other than
the default, and the prefix 67H can be used select an address size other
than the default.

• Stack segment (data segment pointed to by the SS register). The flag
is called the B (big) flag and it specifies the size of the stack
pointer used for implicit stack operations (such as pushes, pops, and
calls). If the flag is set, a 32-bit stack pointer is used, which is
stored in the 32-bit ESP register; if the flag is clear, a 16-bit stack
pointer is used, which is stored in the 16-bit SP register. If the stack
segment is set up to be an expand-down data segment (described in the
next paragraph), the B flag also specifies the upper bound of the stack
segment.

• Expand-down data segment. The flag is called the B flag and it
specifies the upper bound of the segment. If the flag is set, the upper
bound is FFFFFFFFH (4 GBytes); if the flag is clear, the upper bound is
FFFFH (64 KBytes).

That's all the Intel docs say in that section about the D/B flag. Some
points of note:

* The bit is in the last 16 bits of a descriptor so will be zero for
PM16 for which all those bits are expected to be zero.

* Although Intel say the bit should be set to 1 on all 32-bit segments,
according to the description Intel gives, above, the CPU doesn't check
the bit on any data segments other than SS.

* Even on the stack segment the bit still doesn't influence address size
or operand size of any explicit references to that segment.

IOW even if the B bit were to be clear on DS, ES etc you could still
access them as normal as 32-bit segments.

Going a little further, the B bit is distinct from the G (granularity)
bit so you could define a limit in terms of 4k chunks and access all 4G
with the B bit still being zero.

Surprised? I was.
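
To make the G-versus-B point concrete, here is a rough Python sketch that decodes a code/data descriptor given as a 64-bit integer. The function name and the dictionary keys are invented for illustration, and expand-down segments (where the limit is interpreted differently) are deliberately ignored.

```python
# Illustrative decoder for an 8-byte code/data segment descriptor.
def decode_descriptor(raw):
    """Split a descriptor (as a 64-bit int) into its main fields."""
    # Limit bits 0..15 are in bits 0..15, limit bits 16..19 in bits 48..51.
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    # Base bits 0..23 are in bits 16..39, base bits 24..31 in bits 56..63.
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF
    db = (raw >> 54) & 1          # D/B flag
    g = (raw >> 55) & 1           # granularity flag
    if g:
        # With G=1 the 20-bit limit counts 4 KiB units.
        limit = (limit << 12) | 0xFFF
    return {"base": base, "limit": limit, "access": access, "db": db, "g": g}


# A data descriptor with G=1 but B=0: byte 6 is 0x8F rather than 0xCF.
d = decode_descriptor(0x008F92000000FFFF)
print(hex(d["base"]), hex(d["limit"]), d["db"], d["g"])
```

The sample value has the B bit clear yet decodes to a limit covering the full 4 GB, which is the combination described above.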

>>>> BTW, what happens when referring to BP or EBP as in
>>>>
>>>> mov eax, [ebp + 4]
>>>> sub ebp, 8
>>>>
>>>> Does such code use the SS descriptor's B bit?
>
>>> Yes, at least on CPUs which still support the B bit.
>
>> Any advance on "Yes"? :) As above, the Segment Descriptors section of
>> the Intel manual suggests otherwise.

I stand by that, too. For the reasons mentioned above it seems that your
MOV-SUB pair would _not_ take any notice of the SS.B bit.

As ever, further corrections welcome. :)

--
James Harris

James Harris

Feb 21, 2022, 1:35:25 PM

Sorry, I wasn't clear enough. What I was thinking about was a legitimate
instruction or sequence thereof which would work differently in RM and
PM16. ARPL and similar are not RM instructions.

For example, here's an instruction which is legitimate in both modes but
executes differently:

mov ds, ax

In PM16 it loads the hidden part of DS from the appropriate descriptor
table. In RM it doesn't.

What I am trying to prove is that once CR0.PE is set (and prefetch
queues flushed) the processor will be in PM, not RM, even _before_ CS is
reloaded.

AISI the different behaviour of MOV DS, AX shows that the processor /is/
in Protected Mode, not Real Mode, even if it's PM16 rather than PM32.

IIRC you thought the processor would still be in RM. If I'm wrong,
therefore, perhaps you can think of another instruction (a legitimate
one) or sequence which could be executed after setting CR0.PE but before
loading CS which would tell whether the CPU was in RM or PM.

--
James Harris

wolfgang kern

Feb 21, 2022, 3:42:06 PM

assume or make your RM-CS 07c0 before the switch
and try this after it:
push cs
pop ax ;ax show the RM-segment and nothing else

or try self-modify and check where the change happens:
mov word [cs:00FE],31c8 ;or whatsoever. might crash if PM
__
wolfgang

wolfgang kern

Feb 21, 2022, 4:27:45 PM

On 21/02/2022 19:02, James Harris wrote:
>....

> Therefore, AISI your second example (which you labelled "RM") cannot
> really be RM as it refers to ESP so I presume you mean PM16 (which uses
> 16-bit encodings but can refer to 32-bit registers).

NO, the 67 override alters only the addressing modes.
RM example: 67 8B 24 24 mov SP,[esp] ;JFI but not much sense.

>>> Your first example:
>>> PM32:
>>> 8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
>>> AIUI:
>>> * that will NOT use the SS segment's 'B' bit (which selects SP or ESP
>>> but does so only for /implicit/ references to the stack)

> I stand by that part of the assessment (until told otherwise!). The B
> bit is only used for implicit stack operations and is ignored for
> explicit ones.

might be worth to check on this again, I made my ESP 0000xxxx anyway,
perhaps BIOS calls weren't my only reason 30 years ago :)

>>> * it will use the CS segment's D bit being set to 1 in two ways:
>>> 1) it will have adsiz as 32-bit so recognising ESP rather than SP
>>> 2) it will have opsiz as 32-bit so recognising EAX rather than AX
> I stand by that, too.

Yeah, this is default of PM32 :)
...

>> My SS use a PM16 data-descriptor and my DS a Flat PM32 one.
>> No confusion detected :)
> Agreed. If MOV EAX,[ESP - 4] uses ESP as a base register then it /will/
> use SS.limit and SS.base - although for that instruction the CPU will
> still ignore SS.B. Agreed?

Fine if it uses ESP, but it won't ignore the limits.
My stack wraps around at 64K (if not misaligned)... so as if SP in use.
...Could be that my point of view differs because my stack resides on a
"normal" 64K data segment.
__
wolfgang

James Harris

Feb 22, 2022, 5:12:22 AM

On 21/02/2022 20:42, wolfgang kern wrote:
> On 21/02/2022 19:35, James Harris wrote:

...

>> What I am trying to prove is that once CR0.PE is set (and prefetch
>> queues flushed) the processor will be in PM, not RM, even _before_ CS
>> is reloaded.

...

> assume or make your RM-CS 07c0 before the switch
> and try this after it:
> push cs
> pop ax ;ax show the RM-segment and nothing else

If the /user-visible/ part of CS is 07c0 then wouldn't that end up in AX
in either mode?

>
> or try self-modify and check where the change happens:
> mov word [cs:00FE],31c8 ;or whatsoever. might crash if PM

I can't see how modifying the instruction stream would do anything. RM
encodings are valid in PM16!

Consider an instruction such as

mov ds, ax

In Protected Mode that would do

DS.base = from descriptor
DS.limit = from descriptor
DS.access_rights = from descriptor

Wouldn't it make sense for the same instruction in Real Mode to do as
follows?

DS.base = AX shl 4
DS.limit = 0xffff
DS.access_rights = unrestricted

Then the same architectural parts (the hidden parts) could be used in
either RM or PM. That would keep the hardware design simpler and more
consistent than having two entirely separate modes.

In fact, surely the so-called Unreal Mode only works because the CPU
uses the hidden parts of the segment registers at all times - even when
in Real Mode (PE=0).

--
James Harris

wolfgang kern

Feb 24, 2022, 12:51:46 PM

On 22/02/2022 11:12, James Harris wrote:
> On 21/02/2022 20:42, wolfgang kern wrote:
>> On 21/02/2022 19:35, James Harris wrote:
>
> ...
>
>>> What I am trying to prove is that once CR0.PE is set (and prefetch
>>> queues flushed) the processor will be in PM, not RM, even _before_ CS
>>> is reloaded.
>
> ...
>
>> assume or make your RM-CS 07c0 before the switch
>> and try this after it:
>> push cs
>> pop ax ;ax show the RM-segment and nothing else
>
> If the /user-visible/ part of CS is 07c0 then wouldn't that end up in AX
> in either mode?

Only in RM but not within PM, here AX would show a descriptor value.

>> or try self-modify and check where the change happens:
>> mov word [cs:00FE],31c8 ;or whatsoever. might crash if PM

> I can't see how modifying the instruction stream would do anything. RM
> encodings are valid in PM16!

OK this wasn't a good example, I meant it as a crash test because
exceptions work quite different.

> Consider an instruction such as

> mov ds, ax
>
> In Protected Mode that would do
>
>     DS.base = from descriptor
>     DS.limit = from descriptor
>     DS.access_rights = from descriptor
>
> Wouldn't it make sense for the same instruction in Real Mode to do as
> follows?
>
>     DS.base = AX shl 4
>     DS.limit = 0xffff
>     DS.access_rights = unrestricted
>
> Then the same architectural parts (the hidden parts) could be used in
> either RM or PM. That would keep the hardware design simpler and more
> consistent than having two entirely separate modes.

X86 grew up in large steps, so we see historical remains here and there.

> In fact, surely the so-called Unreal Mode only works because the CPU
> uses the hidden parts of the segment registers at all times - even when
> in Real Mode (PE=0).

yes, Unreal may not be designed by intention, but it became handy :)
__
wolfgang

James Harris

Feb 25, 2022, 10:31:03 AM

On 24/02/2022 17:51, wolfgang kern wrote:
> On 22/02/2022 11:12, James Harris wrote:
>> On 21/02/2022 20:42, wolfgang kern wrote:
>>> On 21/02/2022 19:35, James Harris wrote:
>>
>> ...
>>
>>>> What I am trying to prove is that once CR0.PE is set (and prefetch
>>>> queues flushed) the processor will be in PM, not RM, even _before_
>>>> CS is reloaded.
>>
>> ...
>>
>>> assume or make your RM-CS 07c0 before the switch
>>> and try this after it:
>>> push cs
>>> pop ax ;ax show the RM-segment and nothing else
>>
>> If the /user-visible/ part of CS is 07c0 then wouldn't that end up in
>> AX in either mode?
>
> Only in RM but not within PM, here AX would show a descriptor value.

Wouldn't AX hold the selector - 16 bits? Descriptors are too wide for AX.

In fact, there's a diagram for early CPUs which I've not seen for later
ones. It sets out the /internal/ structure of segment registers as they
were at the time. It shows CS, DS, etc as 64-bit where the top 16 bits
are the visible selector. Full contents:

* 16 bits: selector <-- the bits we see in the segreg
* 8 bits: access rights
* 24 bits: base address (24 being all that was needed in days of yore)
* 16 bits: segment size (by which I think they mean 'limit')

That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
PROGRAMMER'S REFERENCE MANUAL 1987.

These days they are a little wider but a "MOV AX,DS" instruction will
still copy just the 16-bit selector field to AX.

The converse "MOV DS,AX" instruction will copy AX to the DS selector
field and, in PM, load the other fields from the descriptor table
whereas in RM it will load the base with 'selector shl 4' (see below).

...

>> Consider an instruction such as
>
>>    mov ds, ax
>>
>> In Protected Mode that would do
>>
>>      DS.base = from descriptor
>>      DS.limit = from descriptor
>>      DS.access_rights = from descriptor
>>
>> Wouldn't it make sense for the same instruction in Real Mode to do as
>> follows?
>>
>>      DS.base = AX shl 4
>>      DS.limit = 0xffff
>>      DS.access_rights = unrestricted

It turns out to be even simpler. According to the document below, in
Real Mode Intel CPUs load only the base, leaving limit and access_rights
unchanged.

To quote: "In real mode, when a segment register is loaded, only the
base field is changed, in particular the value placed into the base is
selector*16."
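
That behaviour is easy to model. The following Python sketch is only a toy: the class, the function names and the pre-decoded table of (base, limit, access) tuples are all assumptions for the example, and none of the checks a real CPU performs on a PM load are included.

```python
from dataclasses import dataclass


@dataclass
class SegReg:
    """Visible selector plus the hidden (cached) descriptor fields."""
    selector: int = 0
    base: int = 0
    limit: int = 0xFFFF
    access: int = 0x93


def load_real(seg, value):
    """Real-mode load: only selector and base change."""
    seg.selector = value & 0xFFFF
    seg.base = seg.selector << 4
    # limit and access are left exactly as they were


def load_protected(seg, value, table):
    """Protected-mode load: all hidden fields come from the table entry."""
    base, limit, access = table[(value & 0xFFFF) >> 3]
    seg.selector = value & 0xFFFF
    seg.base, seg.limit, seg.access = base, limit, access


# Hypothetical table: entry 0 is the null entry, entry 1 a flat data segment.
gdt = [None, (0, 0xFFFFFFFF, 0x93)]

ds = SegReg()
load_protected(ds, 0x08, gdt)   # done while PE=1
load_real(ds, 0x1234)           # done after PE has been cleared again
print(hex(ds.base), hex(ds.limit))
```

After the second load the base is the usual real-mode value but the 4 GB limit from the earlier PM load is still cached, which is all that Unreal Mode amounts to in this model.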

>>
>> Then the same architectural parts (the hidden parts) could be used in
>> either RM or PM. That would keep the hardware design simpler and more
>> consistent than having two entirely separate modes.
>
> X86 grew up in large steps, so we see historical remains here and there.
>
>> In fact, surely the so-called Unreal Mode only works because the CPU
>> uses the hidden parts of the segment registers at all times - even
>> when in Real Mode (PE=0).
>
> yes, Unreal may not be designed by intention, but it became handy :)

Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".

I found a great writeup of Unreal Mode at

http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/

It includes sources which make clear that in Real Mode only the base
address can be changed so, as it says, "8086-compatible attributes must
be loaded into the data segment descriptor registers before switching to
real mode".

--
James Harris

James Harris

Feb 25, 2022, 11:30:17 AM

On 25/02/2022 15:31, James Harris wrote:

...

> Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
> even when they are running in what we call "Real Mode".

This has been a very revealing foray into what has to go on inside a CPU
in each step of enabling PMode. The surprising thing is how simple each
step is.

Here's a summary of where I've got to:

lgdt

That loads one register, the GDTR, and so tells the CPU where to look
for the Global Descriptor Table when we load a segment register in PMode.

cli

That sets a single bit in EFLAGS. It's recommended because we are about
to change /how interrupts are handled/ and so don't want an interrupt to
fire asynchronously mid change.

mov eax, cr0
or eax, 1
mov cr0, eax

That sets a single bit in CR0 (the PE bit). Setting that bit _fully_
enables PMode. What magic does it do? What else does the CPU do at this
point other than set the bit? AFAICS, absolutely nothing! Zilch! Nada!

What that bit being set does, however, is affect future behaviour. Once
the bit is set the processor will respond differently in certain cases
such as loading a segment register or responding to an interrupt. But
nothing other than the bit changes immediately, and that bit being set
means the CPU /is/ in PMode.

jmp flushed

That jump (typically to the next instruction) changes no architectural
state and it's not even needed on modern CPUs. All it does is flush the
prefetch queue on early processors which don't do so automatically. Most
decodes are the same in either mode so it would likely only matter if a
segment register were to be loaded in the next few instructions. (But a
jump is, of course, still right to include in all cases for portability.)

As said, we will already be in PMode but it will be PM16 and we
typically want to use 32-bit accesses.

mov ds, bx

That would load DS with whatever base, limit and control bits we have
put in the selected descriptor - typically changing the limit field of
the segment from 0xffff to 0xffff_ffff although if we update DS before
CS then accesses to the segment would still, by default, use 16-bit
offsets: it doesn't make a lot of sense to load DS only but I put it
first, here, to emphasise that it is possible.

jmp codeseg:pm32start

That would load CS with whatever base, limit and control bits we have
put in the selected descriptor. (IOW essentially the same as loading a
data segment register.)

bits 32
pm32start:

I add those just to emphasise that although we would be in PMode as soon
as PE is set it would be PM16 and it's only from the label that the
instruction stream would be 32-bit PM32.

While in PMode we can switch in either direction between 16-bit and
32-bit code (both still in PMode, i.e. we can switch between PM16 and
PM32) simply by loading CS (using jump, call or task switch) with a
descriptor which has the D bit clear or set, as required.

CS.D sets the default address and operand sizes. SS.B sets whether SP or
ESP is used in stack operations such as pushes, calls, returns and pops.
And the B bit on other segment registers (such as DS.B) is ignored.
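
As a rough check of those rules, here is a small Python model of how CS.D and the 66/67 prefixes combine, and of what SS.B selects. The function names are invented for the example and the model covers only legacy (non-64-bit) operation.

```python
# Illustrative model of default sizes; not derived from any emulator source.
def effective_sizes(cs_d, prefixes=()):
    """Return (operand_size, address_size) in bits for one instruction."""
    default = 32 if cs_d else 16
    other = 16 if cs_d else 32
    # Prefix 66h toggles the operand size, 67h toggles the address size.
    opsize = other if 0x66 in prefixes else default
    adsize = other if 0x67 in prefixes else default
    return opsize, adsize


def implicit_stack_pointer(ss_b):
    """Register used by implicit stack operations such as PUSH and CALL."""
    return "esp" if ss_b else "sp"


# The two encodings discussed earlier in the thread:
print(effective_sizes(1))             # 8B 44 24 FC in a 32-bit code segment
print(effective_sizes(0, (0x67,)))    # 67 8B 44 24 FC in a 16-bit code segment
print(implicit_stack_pointer(0))
```

In this model the first encoding gives 32-bit operand and address sizes (EAX, ESP) and the second gives a 16-bit operand with a 32-bit address (AX, ESP), with SS.B playing no part in either result.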

The above all comes with disclaimers but it's simple and flexible,
consistent with documentation, and it makes sense because as per Occam's
Razor there's no need to make things any more complex. So I offer it as,
if nothing else, a potential way to understand the internals of the
transition to PM32.

--
James Harris

wolfgang kern

Feb 25, 2022, 1:10:31 PM

On 25/02/2022 16:31, James Harris wrote:
> On 24/02/2022 17:51, wolfgang kern wrote:
...
>>>> assume or make your RM-CS 07c0 before the switch
>>>> and try this after it:
>>>> push cs
>>>> pop ax ;ax show the RM-segment and nothing else

>>> If the /user-visible/ part of CS is 07c0 then wouldn't that end up in
>>> AX in either mode?

>> Only in RM but not within PM, here AX would show a descriptor value.

> Wouldn't AX hold the selector - 16 bits? Descriptors are too wide for AX.

the selector for a descriptor, so yes.

> In fact, there's a diagram for early CPUs which I've not seen for later
> ones. It sets out the /internal/ structure of segment registers as they
> were at the time. It shows CS, DS, etc as 64-bit where the top 16 bits
> are the visible selector. Full contents:

> * 16 bits: selector <-- the bits we see in the segreg
> * 8 bits: access rights
> * 24 bits: base address (24 being all that was needed in days of yore)
> * 16 bits: segment size (by which I think they mean 'limit')

> That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
> PROGRAMMER'S REFERENCE MANUAL 1987.

now try to read more recent stuff:
a selector is 16 bit wide but lowest 3 bits aren't used for selecting.
and as zero isn't allowed, 08 is the first valid value.
a descriptor (is a GDT-entry) and contain 64 byte (let aside LM64 yet):

So a selector addresses a GDT entry just by its value (ANDed F8).
data and code descriptors are defined as:

00 limit 0..15
02 base 0..23
05 |P|DPL|10|E|W|A| for data |P|DPL|11|C|R|A| for code
other types have this different
06 |G|B|0|X| limit 16..19
07 base 24..31
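
To make the byte offsets above concrete, here is a minimal sketch in Python (used purely as a model, not as anything that would run at boot time) which packs and unpacks a legacy 8-byte code/data descriptor. The function names and the dictionary keys are made up for illustration; the field positions follow the table above.

```python
import struct

def pack_descriptor(base, limit, access, flags):
    """Pack a legacy 8-byte code/data descriptor (hypothetical helper).

    base   : 32-bit linear base address
    limit  : 20-bit limit
    access : the byte at offset 05 (P, DPL, S, type bits)
    flags  : the high nibble of the byte at offset 06 (G, D/B, 0, AVL)
    """
    assert 0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF
    assert 0 <= access <= 0xFF and 0 <= flags <= 0xF
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                        # 00: limit 0..15
        base & 0xFFFF,                         # 02: base 0..15
        (base >> 16) & 0xFF,                   # 04: base 16..23
        access,                                # 05: access byte
        (flags << 4) | ((limit >> 16) & 0xF),  # 06: flags and limit 16..19
        (base >> 24) & 0xFF,                   # 07: base 24..31
    )

def unpack_descriptor(raw):
    """Split 8 raw descriptor bytes back into their fields."""
    lim_lo, base_lo, base_mid, access, b6, base_hi = struct.unpack("<HHBBBB", raw)
    return {
        "base": base_lo | (base_mid << 16) | (base_hi << 24),
        "limit": lim_lo | ((b6 & 0xF) << 16),
        "access": access,
        "flags": b6 >> 4,
    }

# A flat 4G code segment: base 0, limit 0xFFFFF with G=1 and D=1,
# access 0x9A (present, DPL 0, code, readable).
flat_code = pack_descriptor(0, 0xFFFFF, 0x9A, 0xC)
print(flat_code.hex())
```

The flat code descriptor chosen as the example is only one common choice; any base, limit and access values that fit the field widths can be packed the same way.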

I once posted my descriptor page in CLAX, lost it on defective PC.

> These days they are a little wider but a "MOV AX,DS" instruction will
> still copy just the 16-bit selector field to AX.

Oh, yes.
...

>> X86 grew up in large steps, so we see historical remains here and there.

>>> In fact, surely the so-called Unreal Mode only works because the CPU
>>> uses the hidden parts of the segment registers at all times - even
>>> when in Real Mode (PE=0).

>> yes, Unreal may not be designed by intention, but it became handy :)

> Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
> even when they are running in what we call "Real Mode".

If it would be in PM then all the PM instructions I listed earlier would
not crash or raise exceptions. Go figure :)

> I found a great writeup of Unreal Mode at

> http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/

> It includes sources which make clear that in Real Mode only the base
> address can be changed so, as it says, "8086-compatible attributes must
> be loaded into the data segment descriptor registers before switching to
> real mode".

Yes, my switches from any PM to RM need to reset RM limits only before
calling BIOS-functions, all my other functions wouldn't care.
__
wolfgang

wolfgang kern

Feb 25, 2022, 5:32:27 PM

> a descriptor (is a GDT-entry) and contain 64 bit(let aside LM64 yet):

corrected byte>>bits

James Harris

Feb 26, 2022, 12:04:59 PM

On 25/02/2022 18:10, wolfgang kern wrote:
> On 25/02/2022 16:31, James Harris wrote:
>> On 24/02/2022 17:51, wolfgang kern wrote:

...

>> * 16 bits: selector <-- the bits we see in the segreg
>> * 8 bits: access rights
>> * 24 bits: base address (24 being all that was needed in days of yore)
>> * 16 bits: segment size (by which I think they mean 'limit')
>
>> That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
>> PROGRAMMER'S REFERENCE MANUAL 1987.
>
> now try to read more recent stuff:
> a selector is 16 bit wide but lowest 3 bits aren't used for selecting.

The parts of a selector today are

<index> <TI> <RPL>

As it happens, they were the same in olden times 286 days. :)
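
As a minimal sketch of that split (Python used only as a model; the function name and dictionary keys are made up for illustration), a 16-bit selector breaks down as a 13-bit index, one TI bit and a 2-bit RPL:

```python
def decode_selector(sel):
    """Split a 16-bit selector into its three fields (hypothetical helper)."""
    sel &= 0xFFFF
    return {
        "index": sel >> 3,        # upper 13 bits: descriptor table entry number
        "ti": (sel >> 2) & 1,     # table indicator: 0 = GDT, 1 = LDT
        "rpl": sel & 3,           # requested privilege level
        "table_offset": sel & 0xFFF8,  # byte offset of the entry in its table
    }

print(decode_selector(0x08))  # first usable GDT entry
print(decode_selector(0x07))  # index 0, but TI = 1 and RPL = 3
```

Seen this way, the value 7 is not "GDT entry 0.875": it decodes as index 0 with TI set and RPL 3.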

> and as zero isn't allowed, 08 is the first valid value.

For sure, 8 is the lowest selector which can be loaded into the register
in PMode although if one wanted to hack around unnecessarily ... then
one could probably get

mov ax, ds

to put a number lower than 8 (e.g. 7 in the code below) into AX even in
PMode.

I say that based on the understanding that a segment register has these
parts

<selector> <permissions> <base> <limit>

and that a load in Real Mode sets two of them: selector and base.

For example, starting in real mode

mov ax, 7
mov es, ax <--- ok in real mode, sets ES /selector and base/

mov eax, cr0
or eax, 1
mov cr0, eax <--- enter protected mode
jmp $ + 2

mov ax, es

then that should put a 7 into AX even though the CPU is in Pmode.

Just for fun. :)

> a descriptor (is a GDT-entry) and contain 64 byte (let aside LM64 yet):
>
> So a selector addresses a GDT entry just by its value (ANDed F8).
> data and code descriptors are defined as:
>
> 00 limit 0..15
> 02 base 0..23
> 05 |P|DPL|10|E|W|A| for data |P|DPL|11|C|R|A| for code
> other types have this different
> 06 |G|B|0|X| limit 16..19
> 07 base 24..31

Yes, the descriptors I've been looking at are 64-bit and my comments are
only about RM and PM. As far as this discussion is concerned I've not
looked at LM at all.

...

>> Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
>> even when they are running in what we call "Real Mode".
>
> If it would be in PM then all the PM instructions I listed earlier would
> not crash or raise exceptions. Go figure :)

AIUI - at least on Intel - the instructions you listed should all
execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
happens on your hardware?

--
James Harris

wolfgang kern

Feb 26, 2022, 5:13:21 PM

On 26/02/2022 18:04, James Harris wrote:
...

> For example, starting in real mode
>
> mov ax, 7
> mov es, ax <--- ok in real mode, sets ES /selector and base/
>
> mov eax, cr0
> or eax, 1
> mov cr0, eax <--- enter protected mode
> jmp $ + 2
>
> mov ax, es
> then that should put a 7 into AX even though the CPU is in Pmode.
> Just for fun. :)

yes of course, even a first attempt to access with ES would crash.
I ask again what you assume to be: base limit and PL of this pre-PM.

if the base would be 0000 or 07C0 then none of my switches would work.
and why should it be 07c0 ? my switches aren't in this region.

>>> Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
>>> even when they are running in what we call "Real Mode".
>>
>> If it would be in PM then all the PM instructions I listed earlier
>> would not crash or raise exceptions. Go figure :)
>
> AIUI - at least on Intel - the instructions you listed should all
> execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
> happens on your hardware?

not quite accurate, AMD-docs tell explicit that privileged instructions
work only in PM and not in UNREAL mode.
I didn't try by intention, but I remember some hard crashes during my
first attempts (>30 years ago) on making mixed modes work.

So between setting PE and loading CS there is the UNREAL story where
the CPU behave partly as PM (seg-regs) but wont execute PM only code.
__
wolfgang

James Harris

Mar 5, 2022, 10:41:43 AM

On 26/02/2022 22:13, wolfgang kern wrote:
> On 26/02/2022 18:04, James Harris wrote:
> ...
>> For example, starting in real mode
>>
>>    mov ax, 7
>>    mov es, ax <--- ok in real mode, sets ES /selector and base/
>>
>>    mov eax, cr0
>>    or eax, 1
>>    mov cr0, eax <--- enter protected mode
>>    jmp $ + 2
>>
>>    mov ax, es
>> then that should put a 7 into AX even though the CPU is in Pmode.
>> Just for fun. :)
>
> yes of course, even a first attempt to access with ES would crash.
> I ask again what you assume to be: base limit and PL of this pre-PM.

Well, I wouldn't /recommend/ writing code with such an access. It would
probably go against the specs and might be inconsistent between
processors. But if one did then here's what I think the basic mechanism
would be - as may be implemented by the CPU's engineers.

The thing to bear in mind is that segment registers have multiple fields,
most of which are hidden in Rmode. One could consider ES as having these
parts:

ES.selector
ES.base
ES.limit
ES.attributes

The only part we normally see is the selector.

Given that model of a segment register, the code

mov ax, 7
mov es, ax

when run in Rmode would have the following effect

ES.selector = 7
ES.base = 112 (7 times 16)

and, importantly, _nothing else_. It would only change the selector and
the base. It would not alter the other parts of ES. Hence they would
retain the values they had before, i.e.

ES.limit = whatever was there from before
ES.attributes = whatever was there from before

Therefore, if ES.limit was large enough and ES.attributes allowed the
access then

mov ax, [es:0]

would, in the model, load AX with the contents of address 112 (the
base). But you say it would crash. Could that be because the limit
and/or the attributes did not permit the access?

Remember that although the CPU powers up in Real mode and is typically
in Real mode when we get control it has probably taken a trip to Pmode
and back since being booted. If nothing else, the BIOS has to change to
Pmode (set PE = 1) so it can check the RAM.

In Pmode, the BIOS will have been able to control what is loaded into
base, limit and attributes of the segment registers. We don't know what
that will have been but we can guess at what's likely. When it changes
back to RM to boot our code it will probably have set

limit = 0xffff (i.e. a 64k segment)
attributes = writable, present, expand up, byte granularity, etc

With those values the segment register's hidden parts will be correctly
set for Rmode operation. In Rmode, loads of segment registers will only
have to change selector and base.

>
> if the base would be 0000 or 07C0 then none of my switches would work.
> and why should it be 07c0 ? my switches aren't in this region.

In Rmode moving 7 into ES would set the base to 112, AIUI, not anything
like 0x7c0.

By contrast, in Real mode loading ES with 0x07c0 would set the base to
0x7c00 - exactly as required for RM operation.

>
>>>> Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
>>>> even when they are running in what we call "Real Mode".
>>>
>>> If it would be in PM then all the PM instructions I listed earlier
>>> would not crash or raise exceptions. Go figure :)
>>
>> AIUI - at least on Intel - the instructions you listed should all
>> execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
>> happens on your hardware?
>
> not quite accurate, AMD-docs tell explicit that privileged instructions
> work only in PM and not in UNREAL mode.

I'd like to see that. I've only been looking at Intel so far. Do you
have a reference (manual and section)?

> I didn't try by intention, but I remember some hard crashes during my
> first attempts (>30 years ago) on making mixed modes work.
>
> So between setting PE and loading CS there is the UNREAL story where
> the CPU behave partly as PM (seg-regs) but wont execute PM only code.

Well, AIUI what's called "unreal" is the opposite: being in Real mode
(PE=0) but with unusual settings for the segments - typically a 4G limit
rather than 64k.

What specifically do you mean by 'PM-only code' that the CPU won't execute?

I could see a chip manufacturer /adding/ limitations on what can be done
in each mode (as Intel have done, e.g. in detecting invalid opcodes) but
they would have had to add those on top of the basic mechanisms.

--
James Harris

wolfgang kern

Mar 7, 2022, 4:57:33 AM

On 05/03/2022 16:41, James Harris wrote:
...
>>> For example, starting in real mode
>>>
>>>    mov ax, 7
>>>    mov es, ax <--- ok in real mode, sets ES /selector and base/
>>>
>>>    mov eax, cr0
>>>    or eax, 1
>>>    mov cr0, eax <--- enter protected mode
>>>    jmp $ + 2

try this here:

0F 00 c8 STR ax

You win this dispute if it doesn't crash :)

>>> mov ax, es
>>> then that should put a 7 into AX even though the CPU is in Pmode.
>>> Just for fun. :)
>>
>> yes of course, even a first attempt to access with ES would crash.
>> I ask again what you assume to be: base limit and PL of this pre-PM.
>
> Well, I wouldn't /recommend/ writing code with such an access. It would
> probably go against the specs and might be inconsistent between
> processors. But if one did then here's what I think the basic mechanism
> would be - as may be implemented by the CPU's engineers.

A selector value 7 can't be accessed in PM (will raise an exception).
my question was about what you think is CS RM or PM limited ?

> The thing to bear in mind that segment registers have multiple fields,
> most of which are hidden in Rmode. One could consider ES as having these
> parts:
>
> ES.selector

a PM selector is just a pointer into the GDT (ANDed FFF8)
stored in seg-reg but not elsewhere.

> ES.base
> ES.limit
> ES.attributes
>
> The only part we normally see is the selector.

normally hidden? I see everything in a hex-dump of the GDT :)

> Given that model of a segment register, the code
>
> mov ax, 7
> mov es, ax
>
> when run in Rmode would have the following effect
>
> ES.selector = 7
> ES.base = 112 (7 times 16)
>
> and, importantly, _nothing else_. It would only change the selector and
> the base. It would not alter the other parts of ES. Hence they would
> retain the values they had before, i.e.
>
> ES.limit = whatever was there from before
> ES.attributes = whatever was there from before
>
> Therefore, if ES.limit was large enough and ES.attributes allowed the
> access then
>
> mov ax, [es:0]
>
> would, in the model, load AX with the contents of address 112 (the
> base). But you say it would crash. Could that be because the limit
> and/or the attributes did not permit the access?

it would crash for sure if run in PM (0007 isn't valid).

> Remember that although the CPU powers up in Real mode and is typically
> in Real mode when we get control it has probably taken a trip to Pmode
> and back since being booted. If nothing else, the BIOS has to change to
> Pmode (set PE = 1) so it can check the RAM.

> In Pmode, the BIOS will have been able to control what is loaded into
> base, limit and attributes of the segment registers. We don't know what
> that will have been but we can guess at what's likely. When it changes
> back to RM to boot our code it will probably have set

Yes, find tombstones in RAM of what the BIOS left behind after power up.
I once copied first 128MB RAM to disk before loading all of my OS...

> limit = 64k
> attributes = writable, present, expand up, byte granularity, etc
> With those values the segment register's hidden parts will be correctly
> set for Rmode operation. In Rmode, loads of segment registers will only
> have to change selector and base.

yes. even I'd recommend to use the term "selector" for PM only.

>> if the base would be 0000 or 07C0 then none of my switches would work.
>> and why should it be 07c0 ? my switches aren't in this region.

> In Rmode moving 7 into ES would set the base to 112, AIUI, not anything
> like 0x7c0.

> By contrast, in Real mode loading ES with 0x07c0 would set the base to
> 0x7c00 - exactly as required for RM operation.

you said that for CS earlier in this thread...
and that's what I see unanswered.

>>>>> Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL
>>>>> TIMES, even when they are running in what we call "Real Mode".

OR al,1
MOV CR0,eax
LDS si,[cs:0400] ;my 1st switch block is position independent

>>>> If it would be in PM then all the PM instructions I listed earlier
>>>> would not crash or raise exceptions. Go figure :)

>>> AIUI - at least on Intel - the instructions you listed should all
>>> execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
>>> happens on your hardware?

>> not quite accurate, AMD-docs tell explicit that privileged instructions
>> work only in PM and not in UNREAL mode.

> I'd like to see that. I've only been looking at Intel so far. Do you
> have a reference (manual and section)?

I remember it well but forgot where I read it,
.. started to scan my docs ... >600 PDFs may take a while...
You might find this sentence in Intel-docs as well (not older than 486+)

>> I didn't try by intention, but I remember some hard crashes during my
>> first attempts (>30 years ago) on making mixed modes work.

>> So between setting PE and loading CS there is the UNREAL story where
>> the CPU behave partly as PM (seg-regs) but wont execute PM only code.

> Well, AIUI what's called "unreal" is the opposite: being in Real mode
> (PE=0) but with unusual settings for the segments - typically a 4G limit
> rather than 64k.

> What specifically do you mean by 'PM-only code' that the CPU won't execute?

again:

PM only (raise invalid opcode exception in RM):

63 xx ARPL (seems obsolete now)

0F 00 /1 STR an easy test possibility ie:
0F 00 c8 STR ax |eax |rax
0F 00 08 xx STR [mem16] ; all except RM

0F 00 /0 SLDT r16/r32/r64/m16
0F 00 /2 LLDT rm16
0F 00 /3 LTR
0F 02 /r LAR
0F 03 /r,[m] LSL
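
For anyone checking the encodings in that list by hand, here is a small sketch in Python (a model only; the names are made up for illustration) that splits a ModRM byte into its mod, reg and rm fields. For the 0F 00 opcode the reg field is the "/digit" that picks the instruction.

```python
def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a ModRM byte."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7

# The reg field selects the instruction for opcode 0F 00.
# /0../3 are from the list above; /4 and /5 (VERR, VERW) are added for completeness.
GROUP_0F00 = {0: "SLDT", 1: "STR", 2: "LLDT", 3: "LTR", 4: "VERR", 5: "VERW"}

def name_0f00(modrm):
    """Name the 0F 00 instruction chosen by a ModRM byte, or None if undefined."""
    _, reg, _ = split_modrm(modrm)
    return GROUP_0F00.get(reg)

print(split_modrm(0xC8), name_0f00(0xC8))  # mod=3 (register), reg=1, rm=0 (AX)
print(split_modrm(0x08), name_0f00(0x08))  # mod=0 (memory), reg=1
```

So 0F 00 C8 is STR with a register operand (rm = 0, i.e. AX), and 0F 00 08 is STR with a memory operand.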

> I could see a chip manufacturer /adding/ limitations on what can be done
> in each mode (as Intel have done, e.g. in detecting invalid opcodes) but
> they would have had to add those on top of the basic mechanisms.

I think it started with 286, added features vs. backwards compatibility.
__
wolfgang

James Harris

Mar 11, 2022, 6:45:28 AM

On 07/03/2022 09:57, wolfgang kern wrote:
> On 05/03/2022 16:41, James Harris wrote:
> ...
>>>> For example, starting in real mode
>>>>
>>>>    mov ax, 7
>>>>    mov es, ax <--- ok in real mode, sets ES /selector and base/
>>>>
>>>>    mov eax, cr0
>>>>    or eax, 1
>>>>    mov cr0, eax <--- enter protected mode
>>>>    jmp $ + 2
>
> try this here:
> 0F 00 c8     STR ax
> You win this dispute if it doesn't crash :)

That's kind of you, Wolfgang, ... since it doesn't crash! :)

When I tested it, it set AX to 0, which is what one would expect to be in
TR's selector at reset.

Why did you think it would crash?

>
>>>> mov ax, es
>>>> then that should put a 7 into AX even though the CPU is in Pmode.
>>>> Just for fun. :)
>>>
>>> yes of course, even a first attempt to access with ES would crash.
>>> I ask again what you assume to be: base limit and PL of this pre-PM.
>>
>> Well, I wouldn't /recommend/ writing code with such an access. It
>> would probably go against the specs and might be inconsistent between
>> processors. But if one did then here's what I think the basic
>> mechanism would be - as may be implemented by the CPU's engineers.
>
> A selector value 7 can't be accessed in PM (will raise an exception).

Be careful not to confuse loading with storing or using. A selector
value of 7 cannot be /loaded/ when in PM but it can be loaded before
switching to Pmode. Take this code:

mov ax, 7
mov fs, ax

mov eax, cr0
or al, 1
mov cr0, eax
jmp $ + 2

mov dx, fs
mov cx, [fs:0]

I tried that, too, and it works. The value copied from FS to DX is 7,
FS's selector which was the segment value loaded in RM and which cannot
be loaded in PM.

Furthermore, the access to FS:0 also works without crashing. No
exception is generated.

Try it. What do you get?

> my question was about what you think is CS RM or PM limited ?

If I understood the question I'd try to answer it but I cannot parse it. :(

...

>> mov ax, [es:0]
>>
>> would, in the model, load AX with the contents of address 112 (the
>> base). But you say it would crash. Could that be because the limit
>> and/or the attributes did not permit the access?
>
> it would crash for sure if run in PM (0007 isn't valid).

Why would it crash? (As above, it doesn't.) The /selector/ is not used
to access memory. The selector is only used to load the important parts
of a descriptor (BASE, LIMIT and ATTRIBUTES) into the CPU. Once they are
loaded future memory accesses can proceed without referring to the
selector at all.

The memory address accessed is BASE + OFFSET where BASE is in the hidden
portion of the segment register and OFFSET is in the instruction (0 in
this case).

In Protected mode a load sets BASE and the other fields to whatever
values are read from the descriptor table.

In Real mode a load sets BASE to 16 * the segment and leaves the other
fields alone.

But once the hidden fields of a segment register have been set, memory
accesses can be carried out in the same way in either mode.

This is a remarkably simple model. And it's clean. There's no need to
make it complicated. Segreg loads operate differently in RM and PM but
once the load has taken place the segment register's hidden part has all
the info it needs to handle future memory accesses without using the
segment/selector value as part of the access.
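
The model can be sketched in a few lines of Python. This is only an illustration of the idea as described in this post, not a claim about how any particular CPU is built. The class name, the method names and the `table` dictionary are all made up, and the sketch ignores the TI bit, privilege checks and attribute checks.

```python
from dataclasses import dataclass

@dataclass
class SegReg:
    """Toy segment register: a visible selector plus hidden fields."""
    selector: int = 0
    base: int = 0
    limit: int = 0xFFFF                # reset-style 64k limit
    attrs: str = "data, writable, present"

    def load_real(self, value):
        # Real-mode load: only selector and base change.
        self.selector = value & 0xFFFF
        self.base = self.selector << 4

    def load_protected(self, value, table):
        # Protected-mode load: every hidden field comes from the table.
        # `table` is a hypothetical dict: table offset -> (base, limit, attrs).
        entry = table.get(value & 0xFFF8)
        if entry is None:
            raise ValueError("#GP: no usable descriptor for selector")
        self.selector = value & 0xFFFF
        self.base, self.limit, self.attrs = entry

    def linear(self, offset):
        # A memory access uses only the hidden fields, never the selector.
        if offset > self.limit:
            raise ValueError("#GP: offset beyond limit")
        return self.base + offset

fs = SegReg()
fs.load_real(7)            # as in "mov ax, 7 / mov fs, ax" in real mode
# ... PE gets set here; nothing in fs changes ...
print(fs.selector, fs.linear(0))   # the old selector, and base 7 * 16

gdt = {0x08: (0, 0xFFFFFFFF, "data, writable, present")}
fs.load_protected(0x08, gdt)
print(fs.selector, fs.linear(0x100000))
```

In this sketch `linear` never looks at `selector`, which is the whole point: an odd selector left over from real mode does no harm until something tries to reload the register.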

...

> Yes, find tombstones in RAM of what the BIOS left behind after power up.
> I once copied first 128MB RAM to disk before loading all of my OS...

That's the kind of thing I would do. :)

...

>> In Rmode moving 7 into ES would set the base to 112, AIUI, not
>> anything like 0x7c0.
>
>> By contrast, in Real mode loading ES with 0x07c0 would set the base to
>> 0x7c00 - exactly as required for RM operation.
>
> you said that for CS earlier in this thread...
> and that's what I see unanswered.

I'll answer if I can but what is the question?!

...

>>> not quite accurate, AMD-docs tell explicit that privileged instructions
>>> work only in PM and not in UNREAL mode.
>
>> I'd like to see that. I've only been looking at Intel so far. Do you
>> have a reference (manual and section)?
>
> I remember it well but forgot where I read it,
> .. started to scan my docs ... >600 PDFs may take a while...
> You might find this sentence in Intel-docs as well (not older than 486+)

The word "unreal" doesn't appear anywhere in my full 7-volume set of
Intel manuals. That's unsurprising as it's not really a separate mode,
just Real mode with unusual limits in the segment descriptors. Maybe AMD
is the same.

To be clear, though, what's called "unreal mode" is really Real mode
(PE=0) with unusual segment settings. No PM instructions should work.

But the state after setting PE and before reloading CS is not unreal
mode. I suggest it's Protected mode but with D=0 hence 16-bit Pmode.

>
>>> I didn't try by intention, but I remember some hard crashes during
>>> my first attempts (>30 years ago) on making mixed modes work.

From what I've seen, interrupts, in particular, could be 'challenging'
with mixed segreg sizes.

>
>>> So between setting PE and loading CS there is the UNREAL story where
>>> the CPU behave partly as PM (seg-regs) but wont execute PM only code.
>
>> Well, AIUI what's called "unreal" is the opposite: being in Real mode
>> (PE=0) but with unusual settings for the segments - typically a 4G
>> limit rather than 64k.
>
>> What specifically do you mean by 'PM-only code' that the CPU won't
>> execute?
>
> again:
> PM only (raise invalid opcode exception in RM):
> 63 xx        ARPL (seems obsolete now)
>
> 0F 00 /1     STR   an easy test possibility ie:
> 0F 00 c8     STR ax |eax |rax
> 0F 00 08 xx STR [mem16] ; all except RM
>
> 0F 00 /0     SLDT r16/r32/r64/m16
> 0F 00 /2     LLDT rm16
> 0F 00 /3     LTR
> 0F 02 /r     LAR
> 0F 03 /r,[m] LSL

OK. I haven't tested them all but as above, STR executes. Further, so
does LSL. And Intel is very clear:

"The LSL instruction is not recognized in real-address mode"

Again, that backs up my thesis: PM is defined by PE=1 and nothing else.
Nothing else had been changed.

Are you expecting those instructions to execute differently in PM16
compared with what they would do after setting PE=1 and before reloading CS?

--
James Harris

wolfgang kern

Mar 13, 2022, 4:18:25 PM

On 11/03/2022 12:45, James Harris wrote:

>>>>> For example, starting in real mode
>>>>>
>>>>>    mov ax, 7
>>>>>    mov es, ax <--- ok in real mode, sets ES /selector and base/
>>>>>
>>>>>    mov eax, cr0
>>>>>    or eax, 1
>>>>>    mov cr0, eax <--- enter protected mode
>>>>>    jmp $ + 2

>> try this here:
>> 0F 00 c8     STR ax
>> You win this dispute if it doesn't crash :)

> That's kind of you, Wolfgang, ... since it doesn't crash! :)
>
> When I tested it it set AX to 0 which is what one would expect to be in
> TR's selector at reset.
>
> Why did you think it would crash?

It would crash if in RM. So you're right and it is already in PM then.
Interesting, which CPU have you tried on ?
my notes for exactly this code on AMD 486 say: EXC_06 due to RM (~1994).

[about data selector 07]
...

> mov cx, [fs:0]
>
> I tried that, too, and it works. The value copied from FS to DX is 7,
> FS's selector which was the segment value loaded in RM and which cannot
> be loaded in PM.

you can load a data selector with 0000 also in PM
but better never use it for any access.

> Furthermore, the access to FS:0 also works without crashing. No
> exception is generated.

yes because it is still not modified after set PE.

> Try it. What do you get?

no need, I know that. why I said it will crash:
try your line after the final transition to PM.

mov cx, [fs:0] ; if FS is 0007 then it will crash

...

>> it would crash for sure if run in PM (0007 isn't valid).

> Why would it crash? (As above, it doesn't.) The /selector/ is not used
> to access memory. The selector is only used to load the important parts
> of a descriptor (BASE, LIMIT and ATTRIBUTES) into the CPU. Once they are
> loaded future memory accesses can proceed without referring to the
> selector at all.

OK, not much sense to do it except to use FS as a temporary scratch pad
...

>> you said that for CS earlier in this thread...
>> and that's what I see unanswered.

> I'll answer if I can but what is the question?!

:) it was about: If it's in PM after setting PE then CS remain RM ?
I can answer this now myself:
all segment registers keep their base/limit as long not altered
regardless of the PE-bit status.

> ...

>>>> not quite accurate, AMD-docs tell explicit that privileged instructions
>>>> work only in PM and not in UNREAL mode.
>>
>>> I'd like to see that. I've only been looking at Intel so far. Do you
>>> have a reference (manual and section)?
>>
>> I remember it well but forgot where I read it,
>> .. started to scan my docs ... >600 PDFs may take a while...
>> You might find this sentence in Intel-docs as well (not older than 486+)
>
> The word "unreal" doesn't appear anywhere in my full 7-volume set of
> Intel manuals. That's unsurprising as it's not really a separate mode,
> just Real mode with unusual limits in the segment descriptors. Maybe AMD
> is the same.
>
> To be clear, though, what's called "unreal mode" is really Real mode
> (PE=0) with unusual segment settings. No PM instructions should work.

Yes, UNreal starts with reset of PE and a far jump. So what I remembered
to read was right and privileged PM code wont run in (un)RealMode.

> But the state after setting PE and before reloading CS is not unreal
> mode. I suggest it's Protected mode but with D=0 hence 16-bit Pmode.

>>>> I didn't try by intention, but I remember some hard crashes during
>>>> my first attempts (>30 years ago) on making mixed modes work.

> From what I've seen, interrupts, in particular, could be 'challenging'
> with mixed segreg sizes.

Not that much of a challenge if you use my way:
all INT and IRQ use the same stacks the same variables and do exactly
the same in all of RM, Unreal, PM16, PM32 and LM.
also all Exceptions from all modes were linked to a single debugger.

[...]

> Again, that backs up my thesis: PM is defined by PE=1 and nothing else.
> Nothing else had been changed.

YOU WON. It took a while to convince myself to see it different yet :)

> Are you expecting those instructions to execute differently in PM16
> compared with what they would do after setting PE=1 and before reloading
> CS?

I had some bad experience with featured switches back then, so finally
every switch became less smart aka simple and was easier to handle too.
Seems an old wiki note was burned into my brain:
"the transition from RM to PM occur only on the final far jump"
__
wolfgang

Scott Lurndal

Mar 14, 2022, 12:26:04 PM

Chapter 8 80286 Compatibility (80386 System Software Writers Guide - 1987):

"In protected mode, the processor interprets an instruction according
to the content of the descriptors that are in effect at the time the
instruction is executed. For example, suppose a JMP instruction target
is 100,000 bytes from the beginning of the code segment. The instruction
faults if the code segment was produced by a 80286 translator because
the code limit is 64 Kbytes. The same instruction doesn't fault if the
descriptor specifies a larger limit (i.e. if it was created by an 80386
translator). Thus code segments are self-identifying; they establish
either a 80286 or 80386 "execution environment" for each instruction.
Note, however, that the 80386 does not trap an attempt by a 80286
code segment to execute an 80386 instruction that is undefined for a
80286."

Chapter 9 8086 Compatibility

"A debugged 8086 binary should not contain 80286 or 80386 instructions
because such an instruction causes undefined behavior if executed by
an 8086. Nevertheless, if such instructions are present in an 8086
program that is executed in real or V86 mode, they can be executed
as shown in table 9-1. Note that as described in the next section,
the 80386 32-bit instruction and operand addressing extensions can be
executed in real and V86 modes but are subject to 64 Kbyte limit
checking. For example, an attempt to JMP or CALL to an offset
greater than 64k results in a #GP, as does an attempt to address an
operand located at an offset higher than 64K.

...
"However, descriptors do not exist in real mode and V86 mode; therefore,
the contents of the descriptor registers in these modes are called
pseudodescriptors. ... The processor initializes some pseudodescriptor
values when it is reset (see Chapter 6) and loads others when it
switches from protected mode to real mode.

"Software running in real mode or V86 mode can change the base address
in a descriptor register. In these modes, the 80386 interprets a
selector operand as a 16-bit address. When the processor loads a
descriptor register in real mode or V86 mode, it shifts the selector
value left by four and loads the resulting base address into the associated
descriptor register.

"The pseudodescriptor-based addressing used by the 80386 in real and
V86 modes differs from 8086 addressing (but is identical to 80286
addressing) at the one-megabyte 8086 address space boundary. (ed. A20)
Under the same conditions, the 80386 running in real or V86 mode
generates a linear address while the 8086 would have wrapped. The
maximum linear address in real mode is 0x10ffef.
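
The 0x10ffef figure follows directly from segment * 16 + offset with both at their 16-bit maximum. A minimal sketch of the arithmetic in Python (the function name and the `a20_enabled` parameter are made up for illustration; with it set to False the result wraps at one megabyte as on the 8086):

```python
def real_mode_linear(segment, offset, a20_enabled=True):
    """Linear address for a real-mode segment:offset pair."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    # Without the 21st address line the address wraps at 1 MB.
    return linear if a20_enabled else linear & 0xFFFFF

print(hex(real_mode_linear(0xFFFF, 0xFFFF)))                     # highest reachable address
print(hex(real_mode_linear(0xFFFF, 0xFFFF, a20_enabled=False)))  # 8086-style wrap
print(hex(real_mode_linear(0x07C0, 0x0000)))                     # the usual boot sector address
```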

As an aside, the iAPX 86/88 manual indicates that the BIU (Bus
Interface Unit) prefetches up to six instruction bytes on the 8086 and
four instruction bytes on the 8088.

wolfgang kern

Mar 15, 2022, 3:45:38 AM

On 13/03/2022 21:18, wolfgang kern wrote:
> On 11/03/2022 12:45, James Harris wrote:

[...]

> YOU WON. It took a while to convince myself to see it different yet :)

I had to give my workstation to an employer, so I can't check myself
one more instruction may be worth to check between set PE and load CS:

2E 89 06 xx xx mov [cs:xxxx],ax ;use any harmless address here

would a set PE cause CS not writable ?
the RM default wont even test that.
__
wolfgang

James Harris

Mar 15, 2022, 11:54:23 AM

On 15/03/2022 07:45, wolfgang kern wrote:
> On 13/03/2022 21:18, wolfgang kern wrote:
>> On 11/03/2022 12:45, James Harris wrote:
> [...]
>> YOU WON. It took a while to convince myself to see it different yet :)
>
> I had to give my workstation to an employer, so I can't check myself

I had a similar logistics problem. My current development is on FAT16
and runs in a VM. To test it on real hardware I'd have to write to a
partition on another machine's HD. That was too much work to set up
straight away so I pulled out some old code which has a FAT12 loader.

It was a bit of a nuisance as I had to build/mount/write/unmount,
transfer a floppy and reboot for each test but it was adequate as long
as there weren't too many experiments to run.

> one more instruction may be worth to check between set PE and load CS:
>
> 2E 89 06 xx xx mov [cs:xxxx],ax ;use any harmless address here
>
> would a set PE cause CS not writable ?
> the RM default wont even test that.

That's a good test but I haven't got time to run it ATM and I'm going to
be away for a few days. Will reply to posts when I get back.

We should try to predict what will happen. What would you expect? I
don't know yet but I'll have a think about it.

--
James Harris

James Harris

Mar 23, 2022, 1:50:25 PM

On 13/03/2022 20:18, wolfgang kern wrote:
> On 11/03/2022 12:45, James Harris wrote:
>
>>>>>> For example, starting in real mode
>>>>>>
>>>>>>    mov ax, 7
>>>>>>    mov es, ax <--- ok in real mode, sets ES /selector and base/
>>>>>>
>>>>>>    mov eax, cr0
>>>>>>    or eax, 1
>>>>>>    mov cr0, eax <--- enter protected mode
>>>>>>    jmp $ + 2
>
>>> try this here:
>>> 0F 00 c8     STR ax
>>> You win this dispute if it doesn't crash :)
>
>> That's kind of you, Wolfgang, ... since it doesn't crash! :)
>>
>> When I tested it it set AX to 0 which is what one would expect to be
>> in TR's selector at reset.
>>
>> Why did you think it would crash?
>
> It would crash if in RM. So you're right and it is already in PM then.
> Interesting, which CPU have you tried on ?

I tried in VirtualBox first but as that's emulated I dug out an old test
machine which has an Intel Atom. Same behaviour on both.

> my notes for exactly this code on AMD 486 say: EXC_06 due to RM (~1994).
>
> [about data selector 07]
> ...
>> mov cx, [fs:0]
>>
>> I tried that, too, and it works. The value copied from FS to DX is 7,
>> FS's selector which was the segment value loaded in RM and which
>> cannot be loaded in PM.
>
> you can load a data selector with 0000 also in PM
> but better never use it for any access.

I realised after posting that 7 was a poor choice of value in one way
but it turned out to be good in another.

It was a poor choice as it's a valid selector in Pmode. I should have
chosen 0 to 3 which are always invalid (in the sense of being inaccessible).

On the other hand it was a good choice as in Pmode it refers to the LDT.
There was no LDT so access via it would have failed in Pmode.
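
To make the point about the value 7 concrete, here is a small sketch that splits a selector into its fields. The function names are invented for illustration; the field layout (index in the upper 13 bits, TI in bit 2, RPL in bits 0 and 1) is the standard one.

```python
def decode_selector(sel: int) -> dict:
    """Split a 16-bit protected-mode selector into index, TI and RPL."""
    sel &= 0xFFFF
    return {
        "index": sel >> 3,        # descriptor number within the table
        "ti": (sel >> 2) & 1,     # 0 = GDT, 1 = LDT
        "rpl": sel & 3,           # requested privilege level
    }


def is_null_selector(sel: int) -> bool:
    """Selectors 0 to 3 name GDT entry 0, the null descriptor."""
    return (sel & 0xFFFC) == 0


print(decode_selector(7))                      # index 0, TI 1 (LDT), RPL 3
print([is_null_selector(s) for s in range(5)]) # 0..3 are null, 4 is not
```

So 7 is not a null selector: it asks for entry 0 of the LDT at RPL 3, whereas 0 to 3 all refer to the null entry of the GDT.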

Either way, it shows that the processor was still using the segreg
settings (Base, Limit, Attributes) carried over from Rmode.

But I think we need to go a little further and say that once PE is set
the CPU will use the settings the BIOS left in the descriptor of that
segreg. Remember the model that in Pmode all four parts of the segreg
are loaded (selector, base, limit, attributes) whereas in Rmode only
selector and base are changed. If that's the case then when we get
control the hidden parts of the descriptor will not necessarily be
pristine as they would have been on reset but they will be whatever the
BIOS left there. All the more reason, then, to reload all segment
registers after setting PE=1.

>
>> Furthermore, the access to FS:0 also works without crashing. No
>> exception is generated.
>
> yes because it is still not modified after set PE.

Yes.

>
>> Try it. What do you get?
>
> no need, I know that. why I said it will crash:
> try your line after the final transition to PM.
>
> mov cx, [fs:0] ; if FS is 0007 then it will crash

You seem to agree elsewhere in your post that the CPU is in Pmode when
PE=1 so by "the final transition to PM" I presume you mean loading
segment registers to switch to /32-bit/ PM.

Primarily:
CS.B (aka "D") says whether the PM mode is 16-bit or 32-bit
SS.B says whether to use SP or ESP

The B bits in the other segment registers don't seem to matter much
unless the segment is expand-down. It still makes sense to fully reload
all segment registers, if only to ensure we are starting from a clean
state with nothing left over from the BIOS.
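
As a sketch of where those bits live, the following reads the D/B flag and the expand-down type from an 8-byte descriptor. The helper names and the three sample descriptors are illustrative only; the bit positions are those of the standard descriptor layout (D/B is bit 6 of byte 6, the access byte is byte 5).

```python
def db_bit(desc: bytes) -> int:
    """D/B flag: bit 6 of descriptor byte 6."""
    return (desc[6] >> 6) & 1


def default_size(desc: bytes) -> int:
    """16 or 32, as selected by D (for CS) or B (for SS)."""
    return 32 if db_bit(desc) else 16


def is_expand_down_data(desc: bytes) -> bool:
    """True for a data segment (S=1, not executable) with the E bit set."""
    access = desc[5]
    is_data = (access & 0x18) == 0x10
    return is_data and bool(access & 0x04)


# Sample descriptors made up for this example (base 0, limit field FFFF).
CODE32 = bytes.fromhex("ffff0000009acf00")      # code, D=1
CODE16 = bytes.fromhex("ffff0000009a0000")      # code, D=0 (PM16)
STACK_DOWN = bytes.fromhex("ffff00000096cf00")  # expand-down data, B=1

print(default_size(CODE32), default_size(CODE16))
print(is_expand_down_data(STACK_DOWN), is_expand_down_data(CODE32))
```

A code descriptor like CODE16 is what leaves the CPU in 16-bit protected mode after the far jump; CODE32 switches the default operand and address size to 32 bits.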

...

>>> you said that for CS earlier in this thread...
>>> and that's what I see unanswered.
>
>> I'll answer if I can but what is the question?!
>
> :) it was about: If it's in PM after setting PE then CS remain RM ?
> I can answer this now myself:
> all segment registers keep their base/limit as long not altered
> regardless of the PE-bit status.

Yes, I think that's the key to understanding this. For each segment
register the CPU uses Base, Limit and Attributes at all times and a
segreg load does:

  if PE == 0 (* real mode *)
    selector = segment value
    base = segment value * 16
  else if PE == 1 (* protected mode *)
    selector = selector value
    base, limit, attributes = from the descriptor
  endif

To reiterate, AIUI in Real mode loading a segreg sets only Selector and
Base whereas in Protected mode it also sets Limit and Attributes.
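
The model above can be turned into a runnable sketch. Everything here is a simplification for illustration: the class and function names are made up, the descriptor table is a plain dict of already-decoded (base, limit, attributes) tuples, the reset values for limit and attributes are assumptions, and TI, null selectors and all protection checks are ignored.

```python
from dataclasses import dataclass


@dataclass
class SegReg:
    selector: int = 0
    base: int = 0
    limit: int = 0xFFFF   # assumed value left by reset
    attrs: int = 0x93     # assumed: present, read/write data, accessed


def load_segreg(reg: SegReg, value: int, pe: bool, table=None) -> None:
    """Load a segment register according to the current state of CR0.PE."""
    value &= 0xFFFF
    if not pe:
        # Real mode: only selector and base change; limit/attrs stay as they were.
        reg.selector = value
        reg.base = value << 4
    else:
        # Protected mode: all hidden parts come from the descriptor.
        base, limit, attrs = table[value >> 3]
        reg.selector = value
        reg.base, reg.limit, reg.attrs = base, limit, attrs


fs = SegReg()
load_segreg(fs, 7, pe=False)
print(fs)   # selector 7, base 0x70, limit and attrs untouched

# Setting PE by itself reloads nothing, so fs is still as printed above.
hypothetical_gdt = {2: (0x00000000, 0xFFFFF, 0xC93)}
load_segreg(fs, 0x10, pe=True, table=hypothetical_gdt)
print(fs)   # now everything comes from the (hypothetical) descriptor
```

The first print shows the state the tests in this thread relied on: a selector of 7 that could not be loaded in Pmode, with a base of 0x70 that keeps working until the register is reloaded.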

For Intel and probably AMD it looks as though one can get the CPU to
report the Limit with an LSL instruction and the Attributes with LAR but
I don't know a way to get the base.

On Cyrix there's SVDC to write the entire Descriptor to ten bytes of memory.

...

>> Again, that backs up my thesis: PM is defined by PE=1 and nothing
>> else. Nothing else had been changed.
>
> YOU WON. It took a while to convince myself to see it different yet :)

Thanks, though it wasn't a competition. Just an exploration of the topic.

>
>> Are you expecting those instructions to execute differently in PM16
>> compared with what they would do after setting PE=1 and before
>> reloading CS?
>
> I had some bad experience with featured switches back then, so finally
> every switch became less smart aka simple and was easier to handle too.
> Seems an old wiki note were burned into my brain:
> "the transition from RM to PM occur only on the final far jump"

I've heard similar. Perhaps whoever wrote that didn't know about PM16 -
which I also didn't until a few weeks ago.

--
James Harris

wolfgang kern

Mar 25, 2022, 3:33:49 AM

On 23/03/2022 18:50, James Harris wrote:
[agreed..]

> For Intel and probably AMD it looks as though one can get the CPU to
> report the Limit with an LSL instruction and the Attributes with LAR but
> I don't know a way to get the base.

get_base: ;selector in eax, assume DS=flat
MOV esi,anybuffer
SGDT [ds:esi] ;stores the 16-bit limit, then the 32-bit GDT base
MOV esi,[esi+2] ;esi = linear base of the GDT
AND eax,0xFFF8 ;drop RPL and TI; what remains is already index*8
ADD esi,eax ;esi -> the 8-byte descriptor
MOV ecx,[esi+2] ;base bits 0..23 plus the access byte
AND ecx,0x00FFFFFF ;keep the low 24 bits of base
MOV bl,[esi+7] ;base bits 24..31
SHL ebx,24 ;decimal 24; moves bl into bits 24..31
OR ecx,ebx ;ecx now holds the 32-bit base for the selector in eax

it works on both code and data descriptors.
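
For anyone who wants to check the field positions without booting anything, the same lookup can be modelled on a byte array. This is a sketch under stated assumptions: the function name and the three-entry GDT are invented for the example, and the selector is treated as a GDT selector whose value with the low three bits cleared is the byte offset of its descriptor.

```python
def descriptor_base(gdt: bytes, selector: int) -> int:
    """Return the 32-bit base stored in the GDT entry named by selector."""
    offset = selector & 0xFFF8            # index * 8; RPL and TI dropped
    desc = gdt[offset:offset + 8]
    if len(desc) < 8:
        raise ValueError("selector points beyond the end of the table")
    # Base bits 0..23 are in bytes 2..4, bits 24..31 in byte 7.
    return desc[2] | (desc[3] << 8) | (desc[4] << 16) | (desc[7] << 24)


# Hypothetical table: null entry, flat code, data with base 0x12345678.
gdt = (
    bytes(8)
    + bytes.fromhex("ffff0000009acf00")
    + bytes.fromhex("ffff78563492cf12")
)

print(hex(descriptor_base(gdt, 0x08)))   # 0x0
print(hex(descriptor_base(gdt, 0x10)))   # 0x12345678
```

Note that this reads what is in the table, which is not necessarily what is cached in a segment register at that moment.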

> On Cyrix there's SVDC to write the entire Descriptor to ten bytes of
> memory.

I never used one and won't recommend relying on any exotic CPU.
...

>> Seems an old wiki note were burned into my brain:
>> "the transition from RM to PM occur only on the final far jump"

> I've heard similar. Perhaps whoever wrote that didn't know about PM16 -
> which I also didn't until a few weeks ago.

I too learned something new, even if it won't affect any of my code :)
__
wolfgang

James Harris

Mar 25, 2022, 5:50:14 AM

On 25/03/2022 07:33, wolfgang kern wrote:
> On 23/03/2022 18:50, James Harris wrote:
> [agreed..]
>> For Intel and probably AMD it looks as though one can get the CPU to
>> report the Limit with an LSL instruction and the Attributes with LAR
>> but I don't know a way to get the base.
>
> get_base:        ;selector in eax, assume DS=flat
> MOV esi,anybuffer
> SGDT [ds:esi]   ;stores the 16-bit limit, then the 32-bit GDT base
> MOV esi,[esi+2] ;esi = linear base of the GDT
> AND eax,0xFFF8  ;drop RPL and TI; what remains is already index*8
> ADD esi,eax     ;esi -> the 8-byte descriptor
> MOV ecx,[esi+2]       ;base bits 0..23 plus the access byte
> AND ecx,0x00FFFFFF    ;keep the low 24 bits of base
> MOV bl,[esi+7]        ;base bits 24..31
> SHL ebx,24            ;decimal 24; moves bl into bits 24..31
> OR ecx,ebx            ;ecx now holds the 32-bit base for the selector in eax
>
> it works on both code and data descriptors.

What if the segreg (and, hence, its Base) had been loaded before
switching to Pmode?

>
>> On Cyrix there's SVDC to write the entire Descriptor to ten bytes of
>> memory.
>
> I never used and wont recommend any exotic CPU to rely on.

Nor me. I want my code to be maximally portable. That means either
programming for the lowest common denominator (what you call "the
museum") or having ways to adapt to earlier CPUs.

--
James Harris

wolfgang kern

Mar 25, 2022, 3:46:15 PM

On 25/03/2022 10:50, James Harris wrote:
> On 25/03/2022 07:33, wolfgang kern wrote:
>> On 23/03/2022 18:50, James Harris wrote:
>> [agreed..]
>>> For Intel and probably AMD it looks as though one can get the CPU to
>>> report the Limit with an LSL instruction and the Attributes with LAR
>>> but I don't know a way to get the base.
>>
>> get_base:        ;selector in eax, assume DS=flat
>>    MOV esi,anybuffer
>>    SGDT [ds:esi]   ;stores the 16-bit limit, then the 32-bit GDT base
>>    MOV esi,[esi+2] ;esi = linear base of the GDT
>>    AND eax,0xFFF8  ;drop RPL and TI; what remains is already index*8
>>    ADD esi,eax     ;esi -> the 8-byte descriptor
>>    MOV ecx,[esi+2]       ;base bits 0..23 plus the access byte
>>    AND ecx,0x00FFFFFF    ;keep the low 24 bits of base
>>    MOV bl,[esi+7]        ;base bits 24..31
>>    SHL ebx,24            ;decimal 24; moves bl into bits 24..31
>>    OR ecx,ebx            ;ecx now holds the 32-bit base for the selector in eax
>>
>> it works on both code and data descriptors.
>
> What if the segreg (and, hence, its Base) had been loaded before
> switching to Pmode?

As long as a GDT is already installed, this also works in RM with 66 and
67 overrides.
__
wolfgang

James Harris

Mar 26, 2022, 2:52:41 AM

That's not what I mean. Rather, say the selector and base had been
loaded in Real Mode and you then switched to Protected Mode. Your SGDT
wouldn't tell you anything useful in that case because the base
wouldn't have been loaded from the GDT.

In short, LSL will report the limit but nothing (AFAIK) will report the
base. It's not a problem, BTW, just an observation that there's no
equivalent of LSL for the base.

--
James Harris

wolfgang kern

Mar 26, 2022, 2:31:18 PM

True after power-up or hard RESET. If the BIOS has played with it
then there must be valid GDT entries at least for SS and DS.

But you can't switch to PM without valid entries in the GDT :)
I wouldn't recommend trying this test between setting PE and loading CS.

> In short, LSL will report the limit but nothing (AFAIK) will report the
> base. It's not a problem, BTW, just an observation that there's no
> equivalent of LSL for the base.

But right, while in RM you can't be sure whether a GDT exists or not.
Only after RESET can you rely on the default setting.

Finally I never needed that test because I decided what's where and how
all by myself :)
__
wolfgang