The EA jump immediately after enabling protected mode by setting PE in CR0

James Harris

Apr 26, 2015, 2:54:07 PM
You know that, per Intel's directions, after setting the CR0 PE flag
with

mov eax, cr0
or al, 1
mov cr0, eax

we are expected to have something like

jmp seg:pmode_running

I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.

1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.

2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.

I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.

3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as

EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)

where the Ss are the selector and the Os are the offset as hex bytes.

4. Depending on the mode the CPU thinks it is operating in at the time
that it hits the jump instruction the 16-bit and 32-bit forms may need
to be encoded with a leading 0x66 so they appear as

66 EA oo oo ss ss (16-bit form in 32-bit mode)
66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)

5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?

6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.

If the executing code is in the low 64k of a descriptor's space then we
can encode the simple

EA oo oo ss ss

because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.

66 EA oo oo oo oo ss ss

To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?

Solutions?

Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as

EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to

66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)

When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
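As a rough sketch (Python; the helper name is hypothetical, and a selector of 0x08 is an assumed stand-in for ss ss), the byte sequences in the two solutions can be generated mechanically. The offset field is 16 or 32 bits. A 0x66 prefix is added when that size differs from the default operand size of the code executing the jump. An offset that does not fit is rejected rather than truncated.

```python
import struct


def encode_jmp_far(selector: int, offset: int, operand_size: int, code_bits: int) -> bytes:
    """Encode a direct far jump (opcode EA): 'jmp ptr16:16' or 'jmp ptr16:32'.

    operand_size -- size of the offset field, 16 or 32
    code_bits    -- default operand size of the code executing the jump, 16 or 32
    """
    if operand_size not in (16, 32) or code_bits not in (16, 32):
        raise ValueError("sizes must be 16 or 32")
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    if not 0 <= offset < (1 << operand_size):
        # Refuse instead of silently dropping the high bits of the offset.
        raise ValueError(f"offset {offset:#x} does not fit in {operand_size} bits")
    # 0x66 switches the operand size away from the code's default.
    prefix = b"" if operand_size == code_bits else b"\x66"
    # Memory order: offset first (little endian), then the selector.
    layout = "<HH" if operand_size == 16 else "<IH"
    return prefix + b"\xEA" + struct.pack(layout, offset, selector)


# Solution 1: temporary descriptor based at 0x10000, so the offset is 0x2345.
print(encode_jmp_far(0x08, 0x2345, 16, 16).hex(" "))   # ea 45 23 08 00
# Solution 2: flat descriptor, 32-bit offset from 16-bit code.
print(encode_jmp_far(0x08, 0x12345, 32, 16).hex(" "))  # 66 ea 45 23 01 00 08 00
```

Asking for the 16-bit form with the flat descriptor, `encode_jmp_far(0x08, 0x12345, 16, 16)`, raises `ValueError`. That is exactly the case point 6 warns about.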

I should say that the above is just as written after working out what I
think was going on and may contain errors for which I would welcome your
corrections.

Interesting subtlety, no?

Any thoughts/comments?

James


wolfgang kern

Apr 26, 2015, 5:43:28 PM

James Harris wrote:

> You know that, per Intel's directions, after setting the CR0 PE flag
> with
>
> mov eax, cr0
> or al, 1
> mov cr0, eax
>
> we are expected to have something like
>
> jmp seg:pmode_running
>
> I had taken that jump instruction for granted but the recent Qemu/GDT
> thread has brought up some issues about the jump, as follows.

> 1. The jump appears to be necessary in order to put the correct pmode
> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
> and also to set the low bits of CS so that they contain the CPL and TI,
> all of which should be zero.

The value in a PM seg-reg is nothing other than the offset within
the GDT (lower bits are ignored/used elsewhere, so the GDT must
be 8 byte aligned). Nothing is shifted here.
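For reference, here is a small Python sketch of the selector layout from the quoted point 1 (the helper names are hypothetical). The descriptor index sits in bits 15..3, TI in bit 2 and RPL in bits 1..0. Because descriptors are 8 bytes long, index << 3 is also the byte offset of the descriptor within the table. That is why the two descriptions agree when TI and RPL are zero.

```python
def make_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Build a segment selector from a descriptor index, table indicator and RPL."""
    if not 0 <= index < 8192:
        raise ValueError("index must fit in 13 bits")
    if ti not in (0, 1) or not 0 <= rpl <= 3:
        raise ValueError("TI is one bit and RPL is two bits")
    return (index << 3) | (ti << 2) | rpl


def descriptor_offset(selector: int) -> int:
    """Byte offset of the descriptor in its table: TI and RPL are masked off."""
    return selector & 0xFFF8


print(hex(make_selector(1)))                        # 0x8, first entry after the null descriptor
print(hex(make_selector(1, rpl=3)))                 # 0xb
print(descriptor_offset(make_selector(1, rpl=3)))   # 8
```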

> 2. On the 386 any kind of jump was needed immediately following the MOV
> to CR0 - even a near jump - in order to flush the prefetch queue. On
> Pentium Pro and later (and maybe even on the Pentium 1) there is no need
> to flush the queue but the far jump keeps things compatible as it will
> flush the prefetch queue on early CPUs as well as load the Pmode CS on
> all of them.

?? Isn't this far jmp 'the switch point'?

> I knew the above but the following points are of particular interest
> just now as I had not considered them before - or if I had then I had
> forgotten the subtleties of the problem.

> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
> encoded in hex as
>
> EA oo oo ss ss (16-bit form)
> EA oo oo oo oo ss ss (32-bit form)
>
> where the Ss are the selector and the Os are the offset as hex bytes.

> 4. Depending on the mode the CPU thinks it is operating in at the time
> that it hits the jump instruction the 16-bit and 32-bit forms may need
> to be encoded with a leading 0x66 so they appear as

> 66 EA oo oo ss ss (16-bit form in 32-bit mode)
> 66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)

> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
> still in 16-bit mode. Right?

Right, the far jmp is 'the switch'.

> 6. Now, where it gets interesting is that the offset field of the EA
> jump instruction seems to be an offset not from the jump instruction but
> from the start of the segment. Is that correct? If so then we have to be
> careful which jump form is encoded, as follows.

> If the executing code is in the low 64k of a descriptor's space then we
> can encode the simple
>
> EA oo oo ss ss

> because the offset can fit in 16 bits.

Yes, and it also works if the new PM-CS uses RM-CS*16 as its base;
because the offsets are then equal, it may help stupid asm-tools too.
If a boot sequence uses 0:7c00, the PM base can be flat (zero) as well.
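Here is a hedged Python sketch of the descriptor described above. The helper name and the real-mode CS value of 0x1000 are illustrative assumptions. It packs an 8-byte GDT code descriptor whose base is RM-CS*16, so label offsets assembled for real mode stay valid after the far jump. The flags nibble holds G, D/B, L and AVL. Leaving D clear gives a 16-bit code segment; setting it (together with G) gives the usual flat 32-bit one.

```python
import struct


def make_descriptor(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack an 8-byte segment descriptor.

    base   -- 32-bit linear base address
    limit  -- 20-bit limit field (in bytes, or 4 KiB units if G is set)
    access -- access byte, e.g. 0x9A = present, DPL 0, code, execute/read
    flags  -- high nibble of byte 6: G (8), D/B (4), L (2), AVL (1)
    """
    if not 0 <= base <= 0xFFFFFFFF or not 0 <= limit <= 0xFFFFF:
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                                # limit 15..0
        base & 0xFFFF,                                 # base 15..0
        (base >> 16) & 0xFF,                           # base 23..16
        access & 0xFF,                                 # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF),  # flags + limit 19..16
        (base >> 24) & 0xFF,                           # base 31..24
    )


rm_cs = 0x1000  # assumed real-mode CS of the code doing the switch
# 16-bit code segment whose base matches RM-CS*16: real-mode offsets still apply.
print(make_descriptor(rm_cs * 16, 0xFFFF, 0x9A, 0x0).hex(" "))  # ff ff 00 00 01 9a 00 00
# Flat 4 GiB 32-bit code segment (G=1, D=1), base zero.
print(make_descriptor(0, 0xFFFFF, 0x9A, 0xC).hex(" "))          # ff ff 00 00 00 9a cf 00
```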

> But if the executing code is above the 64k mark relative to the
> start of the segment then we need to encode the 32-bit form for
> 16-bit mode, i.e.

> 66 EA oo oo oo oo ss ss

Sure, that's the only way if it won't fit into 16 bits.

> To make an example, say that the code that will enable Pmode is located
> in memory so that the jump target is at physical address 0x12345. If the
> GDT entry for privileged code has been set to describe all of memory,
> i.e. from address 0 to address ff....fff, then it will be impossible to
> use the 16-bit form of the EA jump instruction. Correct?

> Solutions?

> Solution 1. Set up a temporary GDT entry to point to the place in memory
> where the code is running. In the above case, the GDT entry could point
> at 0x10000 and then the jump offset would be 0x2345, leading to the jump
> instruction being encoded as

> EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

> Solution 2. Modify the jump instruction so that the code's location
> relative to the start of the privileged code segment does not matter.
> That leads to
>
> 66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)

> When I did this before I used a temporary GDT entry to point to the
> executing code, i.e. solution 1, but solution 2 also has merits.

> I should say that the above is just as written after working out what I
> think was going on and may contain errors for which I would welcome your
> corrections.
>
> Interesting subtlety, no?

> Any thoughts/comments?

My OS switches back and forth between modes, so there are both variants:
the shorter one for PM16<->RM and the 8-byte form for PM16/RM<->PM32.

__
wolfgang





Rod Pemberton

Apr 27, 2015, 12:13:22 AM
On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
<james.h...@gmail.com> wrote:

[snip]
> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
> encoded in hex as
>
> EA oo oo ss ss (16-bit form)
> EA oo oo oo oo ss ss (32-bit form)
>
> where the Ss are the selector and the Os are the offset as hex bytes.
>

Of course, the mixed mode code, i.e., using o32 (66h) and a32 (67h) for
NASM, will only work on the later processors, i.e., probably the 386 and
up. I.e., you may need the o32 override to use the 32-bit form in 16-bit
code, but that won't work on the oldest processors.

> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
> still in 16-bit mode. Right?

I'm not clear on all the details here ...

My understanding is that CR0 enables PM, but the code segment is still
functioning as 16-bit RM until the CS segment descriptor cache is updated
for PM which is done via a far jump setting the CS selector. The PM
selector determines a PM 16-bit or PM 32-bit code segment.

> 6. Now, where it gets interesting is that the offset field of the EA
> jump instruction seems to be an offset not from the jump instruction but
> from the start of the segment. Is that correct?

Seems so ...

If you use the offset generated by the assembler label, it's likely to
be wrong since your PM base address for the code selector is likely to
be different from the address for the RM code segment in CS. E.g., RM
CS might be 07C0h while PM CS base address might be zero. So, the
assembler offset is relative to 07C0h, while the instruction offset
would need to be relative to zero.
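The arithmetic can be sketched in a few lines of Python. The function name and the label offset 0x0120 are hypothetical. The sketch converts the real-mode CS:offset to a linear address, then subtracts the base of the protected-mode code descriptor.

```python
def pm_offset(rm_cs: int, rm_offset: int, pm_base: int) -> int:
    """Offset to put in the far jump for code now running at rm_cs:rm_offset."""
    linear = (rm_cs << 4) + rm_offset  # real-mode linear address
    if linear < pm_base:
        raise ValueError("target lies below the descriptor base")
    return linear - pm_base


label = 0x0120  # hypothetical offset the assembler computed relative to CS = 07C0h
print(hex(pm_offset(0x07C0, label, 0)))       # 0x7d20: the assembler value is off by 0x7C00
print(hex(pm_offset(0x07C0, label, 0x7C00)))  # 0x120: base matches RM CS*16
print(hex(pm_offset(0x1000, 0x2345, 0)))      # 0x12345: too big for the 16-bit form
```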

This is probably why I've found using far returns and indirect far jumps
to be easier. You can compute the offset at runtime to match your code's
actual location relative to the base address for the descriptor.
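For the indirect variant, the code builds a far pointer in memory at run time and then jumps (or calls) through it. Below is a minimal Python sketch of that memory image, with a hypothetical helper name and an assumed selector of 0x08. The offset comes first and the selector last: 4 bytes for m16:16 and 6 bytes for m16:32. From 16-bit code, the 6-byte form is the one that needs the 0x66 operand-size prefix on the indirect jump.

```python
import struct


def far_pointer(selector: int, offset: int, offset_bits: int) -> bytes:
    """Memory image of an m16:16 or m16:32 far pointer (offset, then selector)."""
    if offset_bits == 16:
        return struct.pack("<HH", offset, selector)
    if offset_bits == 32:
        return struct.pack("<IH", offset, selector)
    raise ValueError("offset_bits must be 16 or 32")


# Offset computed at run time, e.g. linear target minus descriptor base.
print(far_pointer(0x08, 0x12345, 32).hex(" "))  # 45 23 01 00 08 00
print(far_pointer(0x08, 0x2345, 16).hex(" "))   # 45 23 08 00
```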

> If so then we have to be
> careful which jump form is encoded, as follows.
>
> If the executing code is in the low 64k of a descriptor's space then we
> can encode the simple
>
> EA oo oo ss ss
>
> because the offset can fit in 16 bits. But if the executing code is
> above the 64k mark relative to the start of the segment then we need to
> encode the 32-bit form for 16-bit mode, i.e.
>
> 66 EA oo oo oo oo ss ss

As noted above, overrides may only work on the 386 or later.
IIRC, you're wanting your code to work on an 8086.

> To make an example, say that the code that will enable Pmode is located
> in memory so that the jump target is at physical address 0x12345. If the
> GDT entry for privileged code has been set to describe all of memory,
> i.e. from address 0 to address ff....fff, then it will be impossible to
> use the 16-bit form of the EA jump instruction. Correct?
>
> Solutions?
>

1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
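To make option 1 concrete, here is a Python model of what a far return expects on the stack. The helpers are hypothetical and the selector 0x08 is an assumed value. The selector is pushed first and the offset second, so the offset sits at the lower address and is popped first. With a 32-bit operand size, each slot is 4 bytes and only the low 16 bits of the second slot are loaded into CS. This models only the stack layout, not which control transfers Intel recommends for the switch.

```python
import struct
from typing import Tuple


def retf_stack_image(selector: int, offset: int, operand_bits: int) -> bytes:
    """Bytes at SS:SP (lowest address first) after pushing selector, then offset."""
    if operand_bits == 16:
        return struct.pack("<HH", offset, selector)
    if operand_bits == 32:
        # Each push is 4 bytes; this model writes the selector zero-extended.
        return struct.pack("<II", offset, selector)
    raise ValueError("operand_bits must be 16 or 32")


def simulate_retf(stack: bytes, operand_bits: int) -> Tuple[int, int]:
    """Return (cs, ip) as a far return of the given operand size would pop them."""
    if operand_bits == 16:
        ip, cs = struct.unpack_from("<HH", stack)
    elif operand_bits == 32:
        ip, cs_slot = struct.unpack_from("<II", stack)
        cs = cs_slot & 0xFFFF  # only the low word of the second slot is used
    else:
        raise ValueError("operand_bits must be 16 or 32")
    return cs, ip


image = retf_stack_image(0x08, 0x12345, 32)
print(image.hex(" "))                  # 45 23 01 00 08 00 00 00
cs, ip = simulate_retf(image, 32)
print(hex(cs), hex(ip))                # 0x8 0x12345
```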


Rod Pemberton

--
Cars kill more people than guns in the U.S.
Yet, no one is trying to take away your car.

wolfgang kern

Apr 27, 2015, 2:04:35 AM

Rod Pemberton mentioned:

...
> As noted above, overrides may only work on the 386 or later.
> IIRC, you're wanting your code to work on an 8086.

Set PM on an 8086?
James would need to try much harder then :)
... now at least there is VM86.
__
wolfgang

James Harris

Apr 27, 2015, 4:58:58 AM
"wolfgang kern" <now...@never.at> wrote in message
news:mhjm5r$1hh$1...@speranza.aioe.org...
> James Harris wrote:

...

>> 1. The jump appears to be necessary in order to put the correct pmode
>> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3
>> bits) and also to set the low bits of CS so that they contain the CPL
>> and TI, all of which should be zero.
>
> the value in a PM seg-reg is nothing else than the offset within the
> GDT (lower bits are ignored/used elsewhere, so the GDT must be 8 byte
> aligned). Nothing is shifted here.

Did you see somewhere that the GDT must be so aligned? For sure it is a
good idea but is it needed?

>> 2. On the 386 any kind of jump was needed immediately following the
>> MOV to CR0 - even a near jump - in order to flush the prefetch queue.
>> On Pentium Pro and later (and maybe even on the Pentium 1) there is
>> no need to flush the queue but the far jump keeps things compatible
>> as it will flush the prefetch queue on early CPUs as well as load the
>> Pmode CS on all of them.
>
> ?? isn't this far jmp 'the switch point'.

Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
processing in Pmode before doing a far jump. That post-PE processing
could even include the LGDT instruction!

It seems the 486 was similar to the 386 in what was required after
setting PE. Link below but is a large download:

https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

...

>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>> still in 16-bit mode. Right?
>
> Right, the far jmp is 'the switch'.

Not on the 386 or 486. Two more that you would classify as for the
museum...?

James


James Harris

Apr 27, 2015, 9:02:30 AM
"Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message
news:op.xxqgoy2nwa0t4m@localhost...
> On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
> <james.h...@gmail.com> wrote:

...

>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>> still in 16-bit mode. Right?
>
> I'm not clear on all the details here ...
>
> My understanding is that CR0 enables PM, but the code segment is still
> functioning as 16-bit RM until the CS segment descriptor cache is
> updated for PM which is done via a far jump setting the CS selector.
> The PM selector determines a PM 16-bit or PM 32-bit code segment.

Those specifics seem to depend on which processor you are using. The
Pentium's requirements are different from those for 386 and 486, for
example. Fortunately Intel suggest a sequence that will work on anything
from a 386 upwards so we don't need to think about the differences.

>> 6. Now, where it gets interesting is that the offset field of the EA
>> jump instruction seems to be an offset not from the jump instruction
>> but from the start of the segment. Is that correct?
>
> Seems so ...
>
> If you use the offset generated by the assembler label, it's likely to
> be wrong since your PM base address for the code selector is likely to
> be different from the address for the RM code segment in CS. E.g., RM
> CS might be 07C0h while PM CS base address might be zero. So, the
> assembler offset is relative to 07C0h, while the instruction offset
> would need to be relative to zero.
>
> This is probably why I've found using far returns and indirect far
> jumps to be easier. You can compute the offset at runtime to match
> your code's actual location relative to the base address for the
> descriptor.

Yes, as long as you use the 32-bit version of the indirect jump - i.e.
with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
allows the dword keyword for that.)

...

>> To make an example, say that the code that will enable Pmode is
>> located in memory so that the jump target is at physical address
>> 0x12345. If the GDT entry for privileged code has been set to
>> describe all of memory, i.e. from address 0 to address ff....fff,
>> then it will be impossible to use the 16-bit form of the EA jump
>> instruction. Correct?
>>
>> Solutions?
>>
>
> 1) use a far return
> 2) use an indirect far call
> 3) set the PM descriptor base address for CS to match the RM segment
> for CS, but only temporarily. After updating the base address to
> its final value, you may need to reload the selectors.

I like the idea of the indirect far call or, at least, an indirect far
jump because it makes the values easy to calculate. I am not so sure
about the far return. It may work in some or all cases but doesn't seem
to be guaranteed to work.

Your (3) sounds like one I suggested. In fact the whole GDT can be
temporary. In my case I would rather allocate space for the real GDT
once memory management is working so a temporary GDT is good just to get
Pmode working.

James


wolfgang kern

Apr 27, 2015, 9:21:25 AM

James Harris wrote:

> ...

>>> 1. The jump appears to be necessary in order to put the correct pmode
>>> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
>>> and also to set the low bits of CS so that they contain the CPL and TI,
>>> all of which should be zero.

>> the value in a PM seg-reg is nothing else than the offset within the GDT
>> (lower bits are ignored/used elsewhere, so the GDT must be 8 byte
>> aligned). Nothing is shifted here.

> Did you see somewhere that the GDT must be so aligned?
> For sure it is a good idea but is it needed?

Yes! It seems all manuals from the 286 onward say so.
Yes! It will raise an exception on the first access otherwise.

>>> 2. On the 386 any kind of jump was needed immediately following the MOV
>>> to CR0 - even a near jump - in order to flush the prefetch queue. On
>>> Pentium Pro and later (and maybe even on the Pentium 1) there is no need
>>> to flush the queue but the far jump keeps things compatible as it will
>>> flush the prefetch queue on early CPUs as well as load the Pmode CS on
>>> all of them.

>> ?? isn't this far jmp 'the switch point'.

> Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
> processing in Pmode before doing a far jump. That post-PE processing
> could even include the LGDT instruction!

> It seems the 486 was similar to the 386 in what was required after
> setting PE. Link below but is a large download:

https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

I have all these expensive old Intel books on my shelf,
but much time has passed since I read them.

>>>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>>>> still in 16-bit mode. Right?

>>> Right, the far jmp is 'the switch'.

>> Not on the 386 or 486. Two more that you would classify as for the
>> museum...?

:) yes, almost. I haven't seen a working 386 in a long time,
and I never had a 386 or any Intel CPU after the 80286.

I would have to look at these dusty old books to confirm your statement,
but what I once tried on my old AMD 486 seemed to work the same way as
modern CPUs in this matter: behavior is strange if instructions are placed
between the CR0 write and the far jmp, because this 'intermediate mode'
may not work as expected.
Even if we know of a few people brave enough to treat such a mode as a
special case, there is no guarantee that it works on another CPU too.
__
wolfgang

James Harris

Apr 27, 2015, 10:19:27 AM
"wolfgang kern" <now...@never.at> wrote in message
news:mhld4g$avp$1...@speranza.aioe.org...

...

>>> so the GDT must be 8 byte aligned ....

>> Did you see somewhere that the GDT must be so aligned?
>> For sure it is a good idea but is it needed?
>
> Yes! It seems all manuals from the 286 onward say so.
> Yes! It will raise an exception on the first access otherwise.

Could it have been the case for the 286 and possibly some but not all
later processors? I ask because I have found a comment in a PPro manual,
as follows: "The base addresses of the GDT should be aligned on an
eight-byte boundary to yield the best processor performance."

That seems to say that it is advisable, and hence not mandatory, on the
PPro. The manual is the PPro Family Developer's Manual Vol 3: OS
Writer's Guide.

After a search I also found "The base addresses of the GDT and IDT
should be aligned on an eight-byte boundary to maximize performance of
cache line fills." in a PPro manual vol 3.

Again, "should" not "must"!

...


>>>>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU
>>>>> is still in 16-bit mode. Right?
>
>>>> Right, the far jmp is 'the switch'.
>
>>> Not on the 386 or 486. Two more that you would classify as for the
>>> museum...?
>
> :) yes, almost. I haven't seen a working 386 in a long time,
> and I never had a 386 or any Intel CPU after the 80286.
>
> I would have to look at these dusty old books to confirm your
> statement, but what I once tried on my old AMD 486 seemed to work the
> same way as modern CPUs in this matter: behavior is strange if
> instructions are placed between the CR0 write and the far jmp, because
> this 'intermediate mode' may not work as expected.
> Even if we know of a few people brave enough to treat such a mode as a
> special case, there is no guarantee that it works on another CPU too.

Yes, this is unimportant as long as we follow the standard sequence -
until someone asks why a certain piece of code is not working, of
course.

FYI

http://css.csail.mit.edu/6.858/2014/readings/i386/s10_03.htm

Here's a quote from a 486 manual dated 1990:

"Protected mode is entered by setting the PE bit in the CR0 register.
Either an LMSW or MOV CR0 instruction may be used to set this bit (the
MSW register is part of the CR0 register). Because the processor
overlaps the interpretation of several instructions, it is necessary to
discard the instructions which already have been read into the
processor. A JMP instruction immediately after the LMSW instruction
changes the flow of execution, so it has the effect of emptying the
processor of instructions which have been fetched or decoded.

"After entering protected mode, the segment registers continue to hold
the contents they had in real address mode. Software should reload all
the segment registers. Execution in protected mode begins with a CPL of
0."

So it seems that for 386 and 486 only a jmp is needed. I found a Pentium
manual which seems to say that no jump is needed for the Pentium but
should be included for compatibility with the 386 and 486.

Of course, this is just for enabling Pmode ... enabling paging has its
own stipulations for forward and backward compatibility. (The Pentium
manual is helpful here but off topic for this thread.)

James


Rod Pemberton

Apr 27, 2015, 6:46:14 PM
On Mon, 27 Apr 2015 09:02:28 -0400, James Harris
<james.h...@gmail.com> wrote:
> "Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message
> news:op.xxqgoy2nwa0t4m@localhost...
>> On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
>> <james.h...@gmail.com> wrote:

>>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>>> still in 16-bit mode. Right?
>>
>> I'm not clear on all the details here ...
>>
>> My understanding is that CR0 enables PM, but the code segment is still
>> functioning as 16-bit RM until the CS segment descriptor cache is
>> updated for PM which is done via a far jump setting the CS selector.
>> The PM selector determines a PM 16-bit or PM 32-bit code segment.
>
> Those specifics seem to depend on which processor you are using. The
> Pentium's requirements are different from those for 386 and 486, for
> example. Fortunately Intel suggest a sequence that will work on anything
> from a 386 upwards so we don't need to think about the differences.

True, the manuals vary on the process needed. AIR, we've discussed some
aspects of that in the past, which were many and varied.

You deviated from my point that was in response to you saying "... the
CPU is still in 16-bit mode. Right?" AIUI, you're not fully in PM simply
by setting CR0.PE. The far jump also is required to activate PM for the
executing code segment.

>>> 6. Now, where it gets interesting is that the offset field of the EA
>>> jump instruction seems to be an offset not from the jump instruction
>>> but from the start of the segment. Is that correct?
>>
>> Seems so ...
>>
>> If you use the offset generated by the assembler label, it's likely to
>> be wrong since your PM base address for the code selector is likely to
>> be different from the address for the RM code segment in CS. E.g., RM
>> CS might be 07C0h while PM CS base address might be zero. So, the
>> assembler offset is relative to 07C0h, while the instruction offset
>> would need to be relative to zero.
>>
>> This is probably why I've found using far returns and indirect far
>> jumps to be easier. You can compute the offset at runtime to match
>> your code's actual location relative to the base address for the
>> descriptor.
>
> Yes, as long as you use the 32-bit version of the indirect jump - i.e.
> with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
> allows the dword keyword for that.)

Why would you need the 32-bit version when the code is to be loaded
below 1MB? ...

>>> To make an example, say that the code that will enable Pmode is
>>> located in memory so that the jump target is at physical address
>>> 0x12345. If the GDT entry for privileged code has been set to describe
>>> all of memory, i.e. from address 0 to address ff....fff, then it will
>>> be impossible to use the 16-bit form of the EA jump instruction.
>>> Correct?
>>>
>>> Solutions?
>>>
>>
>> 1) use a far return
>> 2) use an indirect far call
>> 3) set the PM descriptor base address for CS to match the RM segment
>> for CS, but only temporarily. After updating the base address to
>> its final value, you may need to reload the selectors.
>
> I like the idea of the indirect far call or, at least, an indirect far
> jump because it makes the values easy to calculate.

...

> I am not so sure about the far return. It may work in some or all
> cases but doesn't seem to be guaranteed to work.

Why do you say it doesn't seem to be guaranteed to work? ...
I don't recall ever seeing anything in the manuals to indicate that.
Shouldn't any control transfer which sets CS to a PM selector work?

> Your (3) sounds like one I suggested. In fact the whole GDT can be
> temporary. In my case I would rather allocate space for the real GDT
> once memory management is working so a temporary GDT is good just to get
> Pmode working.

I took your comments to mean not re-using the same descriptor,
whereas mine was to re-use the same descriptor.

James Harris

Apr 28, 2015, 4:08:56 PM
"Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message
news:op.xxrv7rmmwa0t4m@localhost...
I didn't mean to deviate but I can see I wasn't explicit. I meant that,
perhaps, for the 386 and 486 you really are in Pmode once the prefetch
queue has drained. I know that the segment registers will not have been
loaded with Pmode selectors but the early manuals seem to say that you
are nonetheless in Pmode. For example, I expect that data accesses will
be 32-bit by default at that point.
Why 1MB Rod? Wouldn't you need the 32-bit version above 64k?
Specifically, I think it would be needed any time the jump target was
more than 64k beyond the new CS.

Notably, even if the calculated address is above 64k, the assembler may
*silently* truncate the jump offset to 16 bits, so it's something to be
aware of. Some innocuous change elsewhere in the code could make the code
bigger and thereby push the jump target over the threshold and cause the
code to fail. So it may be a good idea to code the dword version anyway.
That would make the code more resilient against changes.
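Here is a tiny Python illustration of that failure mode. The function names are hypothetical; base 0 and target 0x12345 are the example values from the start of the thread. If the offset is cut to 16 bits, the jump still assembles but lands 64k short. A checked helper turns that into an error instead.

```python
def landing_address(base: int, target: int, offset_bits: int) -> int:
    """Linear address a far jump reaches if its offset is cut to offset_bits bits."""
    offset = target - base
    return base + (offset & ((1 << offset_bits) - 1))


def checked_offset(base: int, target: int, offset_bits: int) -> int:
    """Offset for the jump, refusing to truncate it silently."""
    offset = target - base
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError(f"offset {offset:#x} needs more than {offset_bits} bits")
    return offset


print(hex(landing_address(0, 0x12345, 16)))  # 0x2345, not the intended 0x12345
print(hex(landing_address(0, 0x12345, 32)))  # 0x12345
```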

>>>> To make an example, say that the code that will enable Pmode is
>>>> located in memory so that the jump target is at physical address
>>>> 0x12345. If the GDT entry for privileged code has been set to
>>>> describe all of memory, i.e. from address 0 to address ff....fff,
>>>> then it will be impossible to use the 16-bit form of the EA jump
>>>> instruction. Correct?
>>>>
>>>> Solutions?
>>>>
>>>
>>> 1) use a far return
>>> 2) use an indirect far call
>>> 3) set the PM descriptor base address for CS to match the RM segment
>>> for CS, but only temporarily. After updating the base address to
>>> its final value, you may need to reload the selectors.
>>
>> I like the idea of the indirect far call or, at least, an indirect
>> far jump because it makes the values easy to calculate.
>
> ...
>
>> I am not so sure about the far return. It may work in some or all
>> cases but doesn't seem to be guaranteed to work.
>
> Why do you say it doesn't seem to be guaranteed to work? ...
> I don't recall ever seeing anything in the manuals to indicate that.
> Shouldn't any control transfer which sets CS to a PM selector work?

Because Intel say that for backwards and forwards compatibility to do X
and X includes a far jump and a far call but not a far return.

Of course, Intel is not the only manufacturer. At one time there were
many. Hopefully they work with Intel's suggested initialisation
sequence.

>> Your (3) sounds like one I suggested. In fact the whole GDT can be
>> temporary. In my case I would rather allocate space for the real GDT
>> once memory management is working so a temporary GDT is good just to
>> get Pmode working.
>
> I took your comments to mean not re-using the same descriptor,
> whereas mine was to re-use the same descriptor.

OK.

James


Rod Pemberton

Apr 28, 2015, 8:05:51 PM
On Mon, 27 Apr 2015 00:13:36 -0400, Rod Pemberton <bo...@lllljunkqwer.cpm>
wrote:

>> To make an example, say that the code that will enable Pmode is located
>> in memory so that the jump target is at physical address 0x12345. If the
>> GDT entry for privileged code has been set to describe all of memory,
>> i.e. from address 0 to address ff....fff, then it will be impossible to
>> use the 16-bit form of the EA jump instruction. Correct?
>>
>> Solutions?
>>
>
> 1) use a far return
> 2) use an indirect far call
> 3) set the PM descriptor base address for CS to match the RM segment
> for CS, but only temporarily. After updating the base address to
> its final value, you may need to reload the selectors.
>

Of course, the easiest solution is a PM descriptor base address of zero,
and matching RM CS equal to zero. But, CS == 0 brings up the entire
CS=0,IP=7C00h versus CS=07C0h,IP=0 debate, again, which we've gone
over countless times now.

James Harris

Apr 29, 2015, 4:44:24 AM
"Rod Pemberton" <bo...@lllljunkqwer.cpm> wrote in message
news:op.xxtukfnbwa0t4m@localhost...
> On Mon, 27 Apr 2015 00:13:36 -0400, Rod Pemberton <bo...@lllljunkqwer.cpm>
> wrote:
>
>>> To make an example, say that the code that will enable Pmode is
>>> located in memory so that the jump target is at physical address
>>> 0x12345. If the GDT entry for privileged code has been set to
>>> describe all of memory, i.e. from address 0 to address ff....fff,
>>> then it will be impossible to use the 16-bit form of the EA jump
>>> instruction. Correct?
>>>
>>> Solutions?
>>>
>>
>> 1) use a far return
>> 2) use an indirect far call
>> 3) set the PM descriptor base address for CS to match the RM segment
>> for CS, but only temporarily. After updating the base address to
>> its final value, you may need to reload the selectors.
>>
>
> Of course, the easiest solution is a PM descriptor base address of
> zero, and matching RM CS equal to zero.

You would still need the "dword" or 0x66 variant to get to 0x12345,
wouldn't you? I've come to think that having the dword variant in at all
times is probably a good idea because it is more robust, albeit that it
is three bytes longer.

Put another way, if the dword variant was the default then when omitted
someone could ask "why did you omit dword there?" and get the
answer "because we are guaranteed that the branch target will be within
64k of the Pmode CS" or similar. Having it omitted by default
(especially in sample code) leads to it not being considered and
therefore code which is either broken or fragile.

I may also use a temporary GDT entry (or a temporary GDT) which has a
privileged CS descriptor set to match the RM CS=0 point. That makes the
transition code simpler.
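The temporary-descriptor idea comes down to simple arithmetic. Here is a sketch that assumes the transition code runs at real-mode CS=0x1000. That value is hypothetical, chosen so the target lands on the 0x12345 example used in this thread, and no 1 MiB wraparound is modelled. With a flat descriptor the jump offset is the full linear address. With a descriptor whose base is RM CS * 16, the offset equals the real-mode IP.

```python
def rm_linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset (no A20 wrap modelled)."""
    return (segment << 4) + offset


def pm_offset(linear: int, descriptor_base: int) -> int:
    """Offset a protected-mode far jump must carry to reach `linear`."""
    return linear - descriptor_base


# Hypothetical layout: jump target at RM CS=0x1000, IP=0x2345.
target = rm_linear(0x1000, 0x2345)               # 0x12345
flat = pm_offset(target, 0x00000)                # flat descriptor, base 0
temp = pm_offset(target, rm_linear(0x1000, 0))   # temporary base = RM CS * 16

for name, off in (("flat", flat), ("temp", temp)):
    form = "16-bit offset OK" if off <= 0xFFFF else "needs dword (66) form"
    print(f"{name}: offset {off:#x} -> {form}")
```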

Another good option is the indirect jump you mentioned. That, too,
allows the target address to be calculated.

At the end of the day this has been an interesting thread for me because
I have realised that it can be too easy simply not to think about what
that jump does. Wolfgang had sussed it out but it is something I had
never thought about in enough detail because sample code just makes it
look like a simple jump - often a jump to the next instruction. Easy,
right? No! As we have discussed, it is only that simple under certain
conditions, not always.

Another interesting finding was that the immediate jump naturally has
16-bit operands, not 32-bit ones. So in assembly source it's right to
place the "bits 32" directive *after* the jump and not before it.
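Decoding the same bytes at both default sizes shows why the position of "bits 32" matters. This is a sketch: the function name and the selector 0x08 are illustrative assumptions. The bytes are what a "bits 32" assembly of a jump to 0x08:0x12345 produces. A CPU still running 16-bit code reads them differently.

```python
def decode_far_jmp(code_bits, code):
    """Decode a direct far JMP as a CPU running `code_bits` code would.

    Returns (selector, offset, length_in_bytes).
    """
    i = 0
    offset_bits = code_bits
    if code[i] == 0x66:                      # operand-size override flips 16 <-> 32
        offset_bits = 32 if code_bits == 16 else 16
        i += 1
    if code[i] != 0xEA:
        raise ValueError("not a direct far JMP")
    i += 1
    n = offset_bits // 8
    offset = int.from_bytes(code[i:i + n], "little")
    i += n
    selector = int.from_bytes(code[i:i + 2], "little")
    return selector, offset, i + 2


# Bytes emitted for "jmp 0x08:0x12345" under "bits 32".
bits32_bytes = bytes([0xEA, 0x45, 0x23, 0x01, 0x00, 0x08, 0x00])

print(decode_far_jmp(32, bits32_bytes))  # (8, 74565, 7): what was intended
print(decode_far_jmp(16, bits32_bytes))  # (1, 9029, 5): what a 16-bit CPU sees
```

In the 16-bit reading, 01 00 becomes the selector. That is index 0 (the null descriptor) with RPL 1, so the jump would be expected to fault rather than reach the target.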

> But, CS == 0 brings up the entire
> CS=0,IP=7C00h versus CS=07C0h,IP=0 debate, again, which we've gone
> over countless times now.

AIUI that is (or was) a different issue. Although the boot sector code
(if any) will be loaded at 0x7c00 there is no guarantee that it will
move itself out of the way and load any following code there. So the
subsequent code may be somewhere completely different.

Even with CS=0, a jump to the suggested jump-target address of 0x12345
would need the dword variant, wouldn't it, because the target is
beyond the 64k point?

James


James Harris

Jun 12, 2021, 3:06:12 PM

On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:
> You know that, per Intel's directions, after setting the CR0 PE flag
> with
>
> mov eax, cr0
> or al, 1
> mov cr0, eax
>
> we are expected to have something like
>
> jmp seg:pmode_running
>
> I had taken that jump instruction for granted but the recent Qemu/GDT
> thread has brought up some issues about the jump, as follows.
>
> 1. The jump appears to be necessary in order to put the correct pmode
> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
> and also to set the low bits of CS so that they contain the CPL and TI,
> all of which should be zero.
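The CS layout described in point 1 can be written out directly. This is a minimal sketch with hypothetical function names. The descriptor index occupies bits 15..3, TI is bit 2, and the RPL is bits 1..0. For CS after the jump, those two low bits hold the CPL.

```python
def make_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Build a selector: index in bits 15..3, TI in bit 2, RPL in bits 1..0."""
    if not (0 <= index < 8192 and ti in (0, 1) and 0 <= rpl <= 3):
        raise ValueError("field out of range")
    return (index << 3) | (ti << 2) | rpl


def split_selector(sel: int):
    """Return (index, TI, RPL) for a selector value."""
    return sel >> 3, (sel >> 2) & 1, sel & 3


print([hex(make_selector(i)) for i in range(4)])  # ['0x0', '0x8', '0x10', '0x18']
print(split_selector(0x18))                       # (3, 0, 0)
```

Because each GDT entry is 8 bytes, a GDT-relative byte offset such as FLAT_DES-Temp_GDT is already a valid selector with TI=0 and RPL=0.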


Going back to this old thread as I have some more information.

The Sybex book Programming the 80386 says on the back that its authors are one of the 80386's logic designers and the 80386's (and thus x86-32's) chief architect, so they ought to know a thing or two about the design! On page 605 the book says that after setting the PE bit the processor enters "16-bit protected mode" - which I didn't even know existed (but see below).

The processor apparently stays in that mode (Protected Mode but 16-bit) for as long as we want, and what changes it to 32-bit PMode is an instruction which loads CS with the selector for a 32-bit descriptor.

Correspondingly, it would appear to be possible to change /back/ to 16-bit PMode by loading CS with the selector of a 16-bit descriptor.

As for what determines whether a descriptor is 32-bit or 16-bit, remember that apart from Base and Limit a descriptor has one byte and one nibble for other fields. I thought the difference might be in the descriptor type code (which is in the byte) but it's actually in the D/B bit (called D in code-segment descriptors), which is in the nibble. See

https://en.wikipedia.org/wiki/Segment_descriptor#Structure

Basically, B=0 means 16-bit and B=1 means 32-bit.

I knew about the bit, of course, but never really understood it; and I didn't realise that B=0 was the mode the processor executed in after enabling PMode prior to reloading CS.
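To make the bit concrete: in the 8-byte descriptor the flags nibble occupies bits 55..52 (G, D/B, L, AVL), so D/B is bit 54. The sketch below packs a descriptor and reads that bit back. The two descriptors are common illustrative values, not taken from the book: a flat 4 GiB code segment and a 64 KiB code segment based at 0x10000.

```python
def make_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a legacy 8-byte segment descriptor into a 64-bit integer.

    flags is the high nibble: G (bit 3), D/B (bit 2), L (bit 1), AVL (bit 0).
    """
    d = limit & 0xFFFF                    # limit 15..0
    d |= (base & 0xFFFFFF) << 16          # base 23..0
    d |= (access & 0xFF) << 40            # P, DPL, S, type
    d |= ((limit >> 16) & 0xF) << 48      # limit 19..16
    d |= (flags & 0xF) << 52              # G, D/B, L, AVL
    d |= ((base >> 24) & 0xFF) << 56      # base 31..24
    return d


def code_size(desc: int) -> int:
    """Default operand/address size selected by the D/B bit (bit 54)."""
    return 32 if (desc >> 54) & 1 else 16


flat32 = make_descriptor(0, 0xFFFFF, 0x9A, 0xC)       # G=1, D=1
temp16 = make_descriptor(0x10000, 0xFFFF, 0x9A, 0x0)  # G=0, D=0
print(hex(flat32), code_size(flat32))  # 0xcf9a000000ffff 32
print(hex(temp16), code_size(temp16))  # 0x9a010000ffff 16
```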


>
> 2. On the 386 any kind of jump was needed immediately following the MOV
> to CR0 - even a near jump - in order to flush the prefetch queue. On
> Pentium Pro and later (and maybe even on the Pentium 1) there is no need
> to flush the queue but the far jump keeps things compatible as it will
> flush the prefetch queue on early CPUs as well as load the Pmode CS on
> all of them.

The different stages of enabling PMode are now possibly easier to guess at.

1. After enabling CR0.PE

The processor will be in 16-bit PM, but early processors (including 386 and 486) did not automatically flush the prefetch queue, so they could have already decoded some of the following bytes as RM. I would guess that many such decodings would be different but that some could be the same. If that's right then some instructions could be validly executed here even without flushing the prefetch queue. There's no value in doing so but ISTM informative to see exactly what is likely to change and when.

2. After flushing the prefetch queue

This will apparently be true 16-bit PM with a 64k limit on code addresses - and probably the same for data addresses.

From this point it turns out that contrary to normal practice one could load selectors for 32-bit /data/ descriptors while still running the code in 16-bit mode. I say that because the code in

https://archive.org/stream/bitsavers_intel80386ammersReferenceManual1986_27457025/230985-001_80386_Programmers_Reference_Manual_1986_djvu.txt

includes the following:

LGDT tGDT__pword

; switch to protected mode
MOV EAX,CR0
MOV EAX,1
MOV CR0,EAX

; clear prefetch queue
JMP SHORT flush
flush:

; set DS,ES,SS to address flat linear space (0 ... 4GB)
MOV BX,FLAT_DES-Temp_GDT
MOV DS,BX
MOV ES,BX
MOV SS,BX

Note the data selectors being loaded before the code selector (which the code changes much later) - and the botched update of CR0 which appeared in many Intel sources of the time.
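Putting a value in CR0 shows the difference between the botched MOV EAX,1 and a read-modify-write. This sketch assumes the power-up CR0 value Intel documents for P6-family and later processors: 0x60000010, with CD, NW and ET set. A 386 starts from a different value, but the point is the same. Loading 1 discards every other bit that was set.

```python
CR0_PE = 1 << 0


def intel_1986_sequence(cr0: int) -> int:
    eax = cr0       # MOV EAX,CR0
    eax = 1         # MOV EAX,1   (overwrites the value just read)
    return eax      # MOV CR0,EAX


def read_modify_write(cr0: int) -> int:
    eax = cr0       # MOV EAX,CR0
    eax |= CR0_PE   # OR EAX,1 (or OR AL,1)
    return eax      # MOV CR0,EAX


reset_cr0 = 0x60000010
print(hex(intel_1986_sequence(reset_cr0)))  # 0x1: CD, NW and ET lost
print(hex(read_modify_write(reset_cr0)))    # 0x60000011
```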

FWIW, the code also goes on to do a bunch of other stuff such as

; initialize stack pointer to some (arbitrary) RAM location
MOV ESP, OFFSET end_Temp_GDT

; copy eprom GDT to RAM
MOV ESI, DWORD PTR GDT_eprom +2 ; get base of eprom GDT
MOV EDI,GDTbase                 ; point ES:EDI to GDT base in RAM.
MOV CX,WORD PTR gdt_eprom +0    ; limit of eprom GDT
INC CX
SHR CX,1                        ; easier to move words
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

;copy eprom IDT to RAM
MOV ESI, DWORD PTR IDT_eprom +2 ; get base of eprom IDT
MOV EDI,IDTbase
MOV CX,WORD PTR idt_eprom +0
INC CX
SHR CX,1
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

etc, all before setting CS to a 32-bit descriptor.
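The table-copy arithmetic relies on a descriptor-table limit being the table size minus one. INC CX turns the limit into a byte count, and SHR CX,1 turns that into a word count. The sketch below models only that arithmetic, treating CX as a 16-bit register; the function name is hypothetical. It also exposes one edge case. With a maximal limit of 0xFFFF, INC CX would wrap to zero, so the count would not cover the table.

```python
def words_to_copy(limit: int) -> int:
    """Mimic MOV CX,limit / INC CX / SHR CX,1 with CX as a 16-bit register."""
    cx = limit & 0xFFFF
    cx = (cx + 1) & 0xFFFF   # INC CX wraps at 16 bits
    cx >>= 1                 # SHR CX,1: bytes -> words
    return cx


print(words_to_copy(0x17))    # 12: limit 23 -> 24 bytes -> 3 descriptors
print(words_to_copy(0xFFFF))  # 0: CX wrapped to zero
```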


3. After loading CS (via a far jump or far call or by a TSS switch) to refer to a 32-bit descriptor.

The code will finally be in PM32.

>
> I knew the above but the following points are of particular interest
> just now as I had not considered them before - or if I had then I had
> forgotten the subtleties of the problem.


>
> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
> encoded in hex as
>
> EA oo oo ss ss (16-bit form)
> EA oo oo oo oo ss ss (32-bit form)
>
> where the Ss are the selector and the Os are the offset as hex bytes.

The above info implies that it's the /shorter/, 16-bit-offset form of the jump (EA oo oo ss ss) which is required.

...

> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
> still in 16-bit mode. Right?


That turns out to be right.
--
James Harris

James Harris

Jun 12, 2021, 3:23:37 PM

On 12/06/2021 20:06, James Harris wrote:
> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:
>> You know that, per Intel's directions, after setting the CR0 PE flag
>> with
>>
>> mov eax, cr0
>> or al, 1
>> mov cr0, eax
>>
>> we are expected to have something like
>>
>> jmp seg:pmode_running
>>
>> I had taken that jump instruction for granted but the recent Qemu/GDT
>> thread has brought up some issues about the jump, as follows.

...

>>
>> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
>> encoded in hex as
>>
>> EA oo oo ss ss (16-bit form)
>> EA oo oo oo oo ss ss (32-bit form)
>>
>> where the Ss are the selector and the Os are the offset as hex bytes.
>
> The above info implies that it's the /short/ version of the jump which is required.

I found some code from AMD which confirms it. Note the two dws in the
following code fragment.

mov eax, 000000011h
mov cr0, eax

;Execute a far jump to turn protected mode on.
;code16_sel must point to the previously-established 16-bit
;code descriptor located in the GDT (for the code currently
;being executed).
db 0eah ;Far jump...
dw offset now_in_prot ; to offset...
dw code16_sel ; in current code segment


That's from the AMD64 Architecture Programmer’s Manual Volume 2: System
Programming, publication number 24593 revision 3.11 December 2005.


That's a good, clear confirmation.

BTW, it's unusual to see CR0's bit 4 (ET) set at the same time....
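For reference, 0x11 sets CR0 bits 0 (PE) and 4 (ET). ET is the 386/486 coprocessor-type flag and is hardwired to 1 on P6-family and later processors. That is presumably why the AMD example simply writes it. Below is a small decoder of the architecturally defined CR0 bits. It is a sketch; the table and function are illustrative and do not come from the AMD manual.

```python
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}


def cr0_flags(value: int) -> list:
    """Names of the defined CR0 bits that are set in value, lowest first."""
    return [name for bit, name in sorted(CR0_BITS.items()) if (value >> bit) & 1]


print(cr0_flags(0x11))        # ['PE', 'ET']
print(cr0_flags(0x60000010))  # ['ET', 'NW', 'CD']
```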



--
James Harris

Rod Pemberton

Jun 12, 2021, 6:29:58 PM

On Sat, 12 Jun 2021 12:06:11 -0700 (PDT)
James Harris <james.h...@gmail.com> wrote:

> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:

> The Sybex book Programming the 80386 says on the back that its
> authors are one of the 80386's logic designers and the 80386's (and
> thus x86-32's) chief architect, so they ought to know a thing or two
> about the design! On page 605 the book says that after setting the PE
> bit the processor enters "16-bit protected mode" - which I didn't
> know even existed (but see below).
>

I didn't fully re-read the 2015 thread, but I think my perspective
hasn't changed any. I.e., after enabling CR0.PE, it's the reloading of
the CS selector which causes the processor to switch into protected
mode, be it 16-bit PM or 32-bit PM. Without the jump to re-load CS, CS
is still set to a 16-bit RM segment. I.e., the processor is still
executing 16-bit RM code after CR0.PE is set, but prior to a far jump
re-loading CS with a valid selector for PM. In other words, simply
enabling CR0.PE doesn't activate PM, the code is still 16-bit RM code,
but perhaps some of the PM features have been enabled by setting CR0.PE.

When CR0.PE is enabled, CS isn't a selector but a segment, and so CS
doesn't point to a descriptor. No descriptor means no CS.ar.D bit
which determines a 16-bit PM or 32-bit PM code segment. CS won't hold a
selector for a valid descriptor (and so won't have a CS.ar.D bit) until a
far jump, executed after CR0.PE is enabled, re-loads CS.
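One way to picture both positions is the hidden descriptor cache that Intel documents for each segment register. Setting CR0.PE does not reload that cache. CS therefore keeps its real-mode base, 64 KiB limit and 16-bit default size until a far transfer loads a selector. The toy model below is only a sketch: all names are hypothetical, and privilege checks and A20 are not modelled.

```python
from dataclasses import dataclass


@dataclass
class SegReg:
    visible: int   # the 16-bit value software sees in CS
    base: int      # hidden descriptor-cache fields follow
    limit: int
    big: bool      # D/B: True means 32-bit default operand/address size


@dataclass
class Cpu:
    pe: bool
    cs: SegReg


def real_mode_cs(value: int) -> SegReg:
    """What a real-mode load of CS leaves in the cache: base = value * 16."""
    return SegReg(value, value << 4, 0xFFFF, False)


def set_pe(cpu: Cpu) -> None:
    """MOV CR0,EAX with PE=1: only the mode flag changes; CS is untouched."""
    cpu.pe = True


def far_jmp(cpu: Cpu, selector: int, base: int, limit: int, big: bool) -> None:
    """A far JMP in protected mode reloads CS from a GDT descriptor."""
    cpu.cs = SegReg(selector, base, limit, big)


cpu = Cpu(pe=False, cs=real_mode_cs(0x1000))
set_pe(cpu)
after_pe = (cpu.pe, hex(cpu.cs.base), cpu.cs.big)
far_jmp(cpu, 0x08, 0x0, 0xFFFFFFFF, True)
after_jmp = (cpu.pe, hex(cpu.cs.base), cpu.cs.big)
print(after_pe)   # (True, '0x10000', False)
print(after_jmp)  # (True, '0x0', True)
```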

> The different stages of enabling PMode are now possibly easier to
> guess at.
>
> 1. After enabling CR0.PE
>
> The processor will be in 16-bit PM but early processors (including
> 386 and 486) did not automatically flush the prefetch queue so they
> could have already decoded some of the following bytes as RM. I would
> guess that many such decodings would be different but that some could
> be the same. If that's right then some instructions could be validly
> executed here even without flushing the prefetch queue. There's no
> value in doing so but ISTM informative to see exactly what it likely
> to change and when.

I'm still of the mind that after CR0.PE is set, a valid 16-bit PM
selector must be loaded into CS via a far jump to enable 16-bit PM or
32-bit PM. Otherwise, CS still contains a 16-bit RM segment, and the
processor is still executing 16-bit RM code, perhaps with some PM
features enabled due to CR0.PE being set.

--
What is hidden in the ground, when found, is hidden there again?

wolfgang kern

Jun 13, 2021, 9:59:02 AM

On 12.06.2021 21:06, James Harris wrote:
> On Sunday, 26 April 2015 at 19:54:07 UTC+1, James Harris wrote:
>> You know that, per Intel's directions, after setting the CR0 PE flag
>> with
>>
>> mov eax, cr0
>> or al, 1
>> mov cr0, eax
>>
>> we are expected to have something like
>>
>> jmp seg:pmode_running
>>
>> I had taken that jump instruction for granted but the recent Qemu/GDT
>> thread has brought up some issues about the jump, as follows.
>>
>> 1. The jump appears to be necessary in order to put the correct pmode
>> GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
>> and also to set the low bits of CS so that they contain the CPL and TI,
>> all of which should be zero.
>
>
> Going back to this old thread as I have some more information.
>
> The Sybex book Programming the 80386 says on the back that its authors are one of the 80386's logic designers and the 80386's (and thus x86-32's) chief architect, so they ought to know a thing or two about the design! On page 605 the book says that after setting the PE bit the processor enters "16-bit protected mode" - which I didn't know even existed (but see below).

Old (and some newer) docs are often written with doubtful wording.
I remember my early attempts to enter PM (on a 486): it doesn't work that
way. PM16 or PM32 starts exactly at the point where CS is altered.

> The processor apparently stays in that mode (Protected Mode but 16-bit) for as long as we want, and what changes it to 32-bit PMode is an instruction which loads CS with the selector for a 32-bit descriptor.

PM16 and PM32 differ by just one bit (D) in the code segment descriptor.

> Correspondingly, it would appear to be possible to change /back/ to 16-bit PMode by loading CS with the selector of a 16-bit descriptor.

Yes, my mode switch from PM32 to RM16 needs one step to PM16 in between
but the switches from RM16 to PM32 or LM don't need a PM16 step.

...
> https://en.wikipedia.org/wiki/Segment_descriptor#Structure
> Basically, B=0 means 16-bit and B=1 means 32-bit.
> I knew about the bit, of course, but never really understood it; and I didn't realise that B=0 was the mode the processor executed in after enabling PMode prior to reloading CS.

yeah, it wasn't too well documented :)

> The different stages of enabling PMode are now possibly easier to guess at.

> 1. After enabling CR0.PE

> The processor will be in 16-bit PM

NO!

> but early processors (including 386 and 486) did not automatically flush the prefetch queue so they could have already decoded some of the following bytes as RM. I would guess that many such decodings would be different but that some could be the same. If that's right then some instructions could be validly executed here even without flushing the prefetch queue. There's no value in doing so but ISTM informative to see exactly what it likely to change and when.
>
> 2. After flushing the prefetch queue
>
> This will apparently be true 16-bit PM with a 64k limit on code addresses - and probably the same for data addresses.
>
> From this point it turns out that contrary to normal practice one could load selectors for 32-bit /data/ descriptors while still running the code in 16-bit mode. I say that because the code in
>
> https://archive.org/stream/bitsavers_intel80386ammersReferenceManual1986_27457025/230985-001_80386_Programmers_Reference_Manual_1986_djvu.txt

it's a bit dated :)

> includes the following:

> LGDT tGDT__pword

missing: CLI
and the GDT should already contain valid entries here.

> ; switch to protected mode

correction: prepare to switch

> MOV EAX,CR0
> MOV EAX,1
> MOV CR0,EAX

redundant:
;> ; clear prefetch queue
;> JMP SHORT flush
;> flush:

> ; set DS,ES,SS to address flat linear space (0 ... 4GB)
> MOV BX,FLAT_DES-Temp_GDT
> MOV DS,BX
> MOV ES,BX
> MOV SS,BX

> Note the data selectors being loaded before the code selector (which the code changes much later) - and the botched update of CR0 which appeared in many Intel sources of the time.

it doesn't matter where data/stack selectors are initialized as long as
they aren't used. I set DS and SS:ESP before, and ES, FS, GS after the far jump.

> FWIW, the code also goes on to do a bunch of other stuff such as

> ; initialize stack pointer to some (arbitrary) RAM location
> MOV ESP, OFFSET end_Temp_GDT

wherever you want it to be :)

> ; copy eprom GDT to RAM
> MOV ESI, DWORD PTR GDT_eprom +2 ; get base of eprom GDT
> MOV EDI,GDTbase                 ; point ES:EDI to GDT base in RAM.
> MOV CX,WORD PTR gdt_eprom +0    ; limit of eprom GDT
> INC CX
> SHR CX,1                        ; easier to move words
> CLD
> REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]
>
> ;copy eprom IDT to RAM
> MOV ESI, DWORD PTR IDT_eprom +2 ; get base of eprom IDT
> MOV EDI,IDTbase
> MOV CX,WORD PTR idt_eprom +0
> INC CX
> SHR CX,1
> CLD
> REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

this ROM GDT may point to an already wiped RAM area !!!
modern (now already old) BIOSes use PM32 and LM only temporarily.

> etc, all before setting CS to a 32-bit descriptor.

> 3. After loading CS (via a far jump or far call or by a TSS switch) to refer to a 32-bit descriptor.

I wouldn't use a far CALL, nor (even worse) a task switch, here.
If you're brave you could use what I do for mode switches, but only after
the initial PM and stack setup:

PUSH EFL ;needs 66 if within 16 bit code
CLI
PUSH new_descriptor ;I use immediate constants here
PUSH new_offset ;
IRET
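The trick works because a same-privilege 32-bit IRET pops EIP, CS and EFLAGS in that order, which is the reverse of the three pushes. The flags are pushed before the CLI, so IRET also restores the caller's original IF. The sketch below models only the stack. The values are hypothetical: 0x18 and 0x102345 are borrowed from the example at the end of this message, and 0x202 is an EFLAGS with IF set. In 16-bit code the pushes and the IRET would need the 0x66 prefix, as noted above.

```python
def iret(stack):
    """32-bit IRET without a privilege change: pop EIP, then CS, then EFLAGS."""
    eip = stack.pop()
    cs = stack.pop()
    eflags = stack.pop()
    return cs, eip, eflags


stack = []
stack.append(0x202)      # PUSH EFL (saved before CLI, so IF=1 is kept)
# CLI clears IF in the live EFLAGS only, not in the saved copy
stack.append(0x18)       # PUSH new_descriptor (selector)
stack.append(0x102345)   # PUSH new_offset
result = iret(stack)
print([hex(v) for v in result])  # ['0x18', '0x102345', '0x202']
```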

> The code will finally be in PM32.

could be in PM16 as well.

>> I knew the above but the following points are of particular interest
>> just now as I had not considered them before - or if I had then I had
>> forgotten the subtleties of the problem.

OK, nothing new on the matter to me.

>> 3. That jump instruction has a 16-bit form and a 32-bit form. It is
>> encoded in hex as

>> EA oo oo ss ss (16-bit form)
>> EA oo oo oo oo ss ss (32-bit form)

>> where the Ss are the selector and the Os are the offset as hex bytes.
> The above info implies that it's the /short/ version of the jump which is required.

yes of course because this jump is still RM16 [NOT PM16] code.

>> 5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
>> still in 16-bit mode. Right?
> That turns out to be right.

Yes still in 16 bit Realmode, until CS is loaded.

>> 6. Now, where it gets interesting is that the offset field of the EA
>> jump instruction seems to be an offset not from the jump instruction but
>> from the start of the segment. Is that correct? If so then we have to be
>> careful which jump form is encoded, as follows.

>> If the executing code is in the low 64k of a descriptor's space then we
>> can encode the simple

>> EA oo oo ss ss
>>
>> because the offset can fit in 16 bits. But if the executing code is
>> above the 64k mark relative to the start of the segment then we need to
>> encode the 32-bit form for 16-bit mode, i.e.
>>
>> 66 EA oo oo oo oo ss ss

only if there is already code loaded there (guess how to do that before PM...).

>> Solution 1. Set up a temporary GDT entry to point to the place in memory
>> where the code is running. In the above case, the GDT entry could point
>> at 0x10000 and then the jump offset would be 0x2345, leading to the jump
>> instruction being encoded as

>> EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

0000:0600 66 EA 45 23 10 00 18 00 jmp far 0018:00102345 ;assume FLAT CS
FFFF:2355 ;aka 0010:2345 PM32 code
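These addresses presumably rely on the HMA. With A20 enabled, FFFF:2355 reaches linear 0x102345 from real mode, which is one way to have code above 1 MiB in place before entering PM. The sketch below shows that arithmetic and the byte order a 16-bit-code "dword" jump to 0018:00102345 needs: offset first, then selector, matching the EA oo oo oo oo ss ss layout quoted earlier in the thread. The function name is hypothetical.

```python
def rm_linear(segment: int, offset: int, a20: bool) -> int:
    """Real-mode linear address; without A20 the result wraps at 1 MiB."""
    linear = (segment << 4) + offset
    return linear if a20 else linear & 0xFFFFF


print(hex(rm_linear(0xFFFF, 0x2355, a20=True)))   # 0x102345
print(hex(rm_linear(0xFFFF, 0x2355, a20=False)))  # 0x2345

# 66 EA + 32-bit offset + 16-bit selector, all little-endian.
jmp = b"\x66\xEA" + (0x102345).to_bytes(4, "little") + (0x18).to_bytes(2, "little")
print(jmp.hex(" "))  # 66 ea 45 23 10 00 18 00
```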