PDOS/86

muta...@gmail.com

Jul 9, 2021, 9:44:39 AM
I think PDOS/86 should probably only be compatible
with MSDOS at the PosWriteFile() etc level, just as
PDOS/386 is.

In particular, for allocating memory, I would like the
"allocate memory" call to return a seg:offs rather than
just a seg. In a flexible segment shift environment,
this would allow multiple requests to share a single
64k page, even if the segment shift is 16 bits.

But this introduces another issue. Callers have
traditionally cared about being properly aligned, but
segmentation introduces a requirement for the
caller to be able to request that the entire block
be within a single segment. Huge memory model
callers don't care about that, but everyone else
does, so some sort of indicator, or different
INT 21H function code would seem to be
appropriate.
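
To make the two requirements concrete, here is a minimal sketch in C. The function names are made up for illustration and are not part of PDOS/86 or MSDOS; the only assumption is that a segment value is shifted left by a configurable number of bits before the 16-bit offset is added.

```c
#include <stdint.h>

/* Hypothetical helpers, not an existing PDOS/86 interface. */

/* Linear address of seg:off when the segment is shifted left by `shift`
   bits (4 on a real 8086; up to 16 in the "flexible shift" idea). */
uint32_t linear_address(uint16_t seg, uint16_t off, unsigned shift)
{
    return ((uint32_t)seg << shift) + off;
}

/* Nonzero if `size` bytes starting at offset `off` stay inside a single
   64 KiB segment, i.e. the block does not run past offset 0xFFFF. */
int fits_in_segment(uint16_t off, uint32_t size)
{
    return size <= 0x10000UL - (uint32_t)off;
}
```

With a 16-bit shift, 0001:0000 and 0001:8000 are both in the same 64k page, which is why returning an offset as well as a segment lets two requests share it.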

BFN. Paul.

muta...@gmail.com

Jul 11, 2021, 1:23:49 AM
On Friday, July 9, 2021 at 11:44:39 PM UTC+10, muta...@gmail.com wrote:

> But this introduces another issue. Callers have
> traditionally cared about being properly aligned, but
> segmentation introduces a requirement for the
> caller to be able to request that the entire block
> be within a single segment. Huge memory model
> callers don't care about that, but everyone else
> does, so some sort of indicator, or different
> INT 21H function code would seem to be
> appropriate.

I think we have the following use cases:

1. Less than 64k memory requested, caller expecting
an offset of 0.

2. More than 64k memory requested, caller expecting
an offset of 0.

3. Less than 64k memory requested, caller can handle
a non-zero offset, but the memory block must be fully
contained within the segment returned.

4. Less than 64k memory requested, caller can handle
a block spanning a segment, but the offset must be 0.

5. More than 64k memory requested, caller can handle
both a non-zero offset and the memory spanning more
than 1 segment.

I think implementing 4 should be delayed until there is
an actual use case though.

5 is what huge memory model programs should be requesting,
to be nice.

3 is useful for non-huge memory model programs when
segment shifts are increased to 16 bits or whatever.

1 and 2 are the only thing MSDOS provides, and are
distinguished by the number of pages requested.
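
For reference, the MSDOS allocate call (INT 21H, AH=48H) takes its size in BX as a count of 16-byte paragraphs, so cases 1 and 2 differ only in whether that count exceeds 0x1000. A small sketch of the conversion, with hypothetical function names:

```c
#include <stdint.h>

/* Round a byte count up to 16-byte paragraphs, the unit MSDOS expects
   in BX for INT 21H AH=48H. Written to avoid overflow near 2^32. */
uint32_t bytes_to_paragraphs(uint32_t bytes)
{
    return bytes / 16UL + (bytes % 16UL != 0);
}

/* Nonzero if the request is larger than one 64 KiB segment (case 2). */
int needs_more_than_one_segment(uint32_t bytes)
{
    return bytes_to_paragraphs(bytes) > 0x1000UL;
}
```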

BFN. Paul.

muta...@gmail.com

Jul 11, 2021, 1:27:35 AM
On Sunday, July 11, 2021 at 3:23:49 PM UTC+10, muta...@gmail.com wrote:

> 4. Less than 64k memory requested, caller can handle
> a block spanning a segment, but the offset must be 0.

> I think implementing 4 should be delayed until there is
> an actual use case though.

Strike number 4. The memory won't span a segment if
the offset is 0, so it's the same as 1.

BFN. Paul.

muta...@gmail.com

Jul 11, 2021, 2:05:00 AM
On Sunday, July 11, 2021 at 3:23:49 PM UTC+10, muta...@gmail.com wrote:

> 5. More than 64k memory requested, caller can handle
> both a non-zero offset and the memory spanning more
> than 1 segment.

More than 64k of memory spans a segment by definition,
so that can be reduced to:

5. More than 64k memory requested, caller can handle
a non-zero offset.

Therefore the only thing required is whether you can handle
a non-zero offset.

We just need another allocate-memory call that takes a
32-bit size and a flag to say whether it needs a 0 offset
(ie be on a segment boundary).
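
A rough sketch of what such a call might do internally, written as a trivial bump allocator in C. Everything here (struct names, the arena idea, the return convention) is an assumption for illustration; it is not the PDOS/86 memory manager. It also assumes the arena is small enough that the segment value fits in 16 bits.

```c
#include <stdint.h>

struct farptr { uint16_t seg; uint16_t off; };

/* Hypothetical allocator state. */
struct arena {
    uint32_t next;   /* next free linear address */
    uint32_t end;    /* one past the last usable linear address */
    unsigned shift;  /* segment shift: 4 on an 8086, up to 16 */
};

/* Allocate `size` bytes. If need_zero_offset is set, the block starts on
   a segment boundary. Otherwise a non-zero offset is allowed, but a block
   of 64 KiB or less is still kept inside one segment.
   Returns 0 on success, -1 if the arena is exhausted. */
int arena_alloc(struct arena *a, uint32_t size, int need_zero_offset,
                struct farptr *out)
{
    uint32_t unit = (uint32_t)1 << a->shift;
    uint32_t start = a->next;
    uint32_t off = start & (unit - 1);

    if (off != 0 && (need_zero_offset ||
                     (size <= 0x10000UL && off + size > 0x10000UL))) {
        start += unit - off;   /* skip ahead to the next segment boundary */
        off = 0;
    }
    if (start > a->end || size > a->end - start)
        return -1;
    out->seg = (uint16_t)(start >> a->shift);
    out->off = (uint16_t)off;
    a->next = start + size;
    return 0;
}
```

With a 16-bit shift, two small requests without the flag come back as the same segment with different offsets, while a request with the flag set is pushed to the next 64k boundary.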

BFN. Paul.

wolfgang kern

Jul 11, 2021, 2:57:45 AM
On 11.07.2021 08:04, muta...@gmail.com wrote:
...
> We just need another allocate-memory call that takes a
> 32-bit size and a flag to say whether it needs a 0 offset
> (ie be on a segment boundary).

while you have your own OS (even a fictive one yet) then
you can use just one single GETMEM variant like I use in
my OS:

getmem:

called by RM16 or PM32 entry points or INT7F for both

input:   AL  = mode (0 = seg:offset, 1 = 32bit flat)
         ECX = requested size in bytes

returns: Carry+Z   size not available
         Cy+NZ     no more handles
         No Carry  means no error, then
                   BX = handle
                   mode0: EDI = flat start
                   mode1: ES:DI = start

and yes, the flat mode works for RM16(unreal) as well.
__
wolfgang

muta...@gmail.com

Jul 11, 2021, 3:38:45 AM
On Sunday, July 11, 2021 at 4:57:45 PM UTC+10, wolfgang kern wrote:
> On 11.07.2021 08:04, muta...@gmail.com wrote:
> ...
> > We just need another allocate-memory call that takes a
> > 32-bit size and a flag to say whether it needs a 0 offset
> > (ie be on a segment boundary).

> while you have your own OS (even a fictive one yet) then

PDOS/86 is not fictional. It's been available for decades
and predates PDOS/386.

> you can use just one single GETMEM variant like I use in
> my OS:
>
> getmem:
>
> called by RM16 or PM32 entry points or INT7F for both
>
> input:AL=mode 0=seg:offset 1=32bit flat
> ECX= requested size in bytes
>
> returns:
> Carry+Z size not available, Cy+NZ no more handles
> No Carry: means no error, then
> BX=handle mode0 EDI=flat start
> mode1 ES:DI = start
>
> and yes, the flat mode works for RM16(unreal) as well.

I'm using normal RM16, not unreal.

I want a solution that works on the 8086.

BFN. Paul.

wolfgang kern

Jul 11, 2021, 10:13:48 AM
On 11.07.2021 09:38, muta...@gmail.com wrote:
...
> I'm using normal RM16, not unreal.
> I want a solution that works on the 8086.

you posted an alloc option >64KB. even possible with
two merged 16 bit registers you'll break your nails
on the required overhead for consecutive blocks fit.

OK I see, you want an 8086 with 32 bit addressing :)
__
wolfgang

Branimir Maksimovic

Jul 11, 2021, 10:28:48 AM
Heh my T27 emulator used 64kb for code and 64kb for data.
it is full featured Unisys terminal, with features like
web browsers nowadays :P
> __
> wolfgang


--
bmaxa now listens Drowned in Fear by Graveworm from Engraved in Black

Scott Lurndal

Jul 11, 2021, 1:32:04 PM
Branimir Maksimovic <branimir....@gmail.com> writes:
>On 2021-07-11, wolfgang kern <now...@never.at> wrote:
>> On 11.07.2021 09:38, muta...@gmail.com wrote:
>> ...
>>> I'm using normal RM16, not unreal.
>>> I want a solution that works on the 8086.
>>
>> you posted an alloc option >64KB. even possible with
>> two merged 16 bit registers you'll break your nails
>> on the required overhead for consecutive blocks fit.
>>
>> OK I see, you want an 8086 with 32 bit addressing :)
>
>Heh my T27 emulator used 64kb for code and 64kb for data.
>it is full featured Unisys terminal, with features like
>web browsers nowadays :P

I have written a T27 emulator in C++/X11 for my V-series
emulator. I don't know that including a Web browser
in the emulator rather than using a dedicated browser
on the same host running the emulator is the most optimal solution.

What do you connect yours to? MCP Express?

Branimir Maksimovic

Jul 11, 2021, 2:34:28 PM
This was written for DOS to replace a real T27 connected
to a Unisys machine via serial port :P
Unisys price for it was 800$ :P


--
bmaxa now listens Satanic Propaganda (S.N.T.F.) by Diabolos Rising from Blood Vampirism And Sadism

muta...@gmail.com

Jul 11, 2021, 9:30:40 PM
On Monday, July 12, 2021 at 12:13:48 AM UTC+10, wolfgang kern wrote:
> On 11.07.2021 09:38, muta...@gmail.com wrote:
> ...
> > I'm using normal RM16, not unreal.
> > I want a solution that works on the 8086.

> you posted an alloc option >64KB. even possible with

Note that MSDOS already provides this, and things like
executables require that, since executables can be
greater than 64k.

> two merged 16 bit registers you'll break your nails
> on the required overhead for consecutive blocks fit.

Can you elaborate on this? PDOS/86 huge memory
model is still only theoretical. I am determined to
make this code:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/__memmgr.c

which was a general-purpose memory-manager for
use on the original MVS/380 (where the OS wouldn't
manage memory above 16 MiB for you - the app had
to do that itself), work as the memory manager for
PDOS/86, no matter how many gerbils need to die.

> OK I see, you want an 8086 with 32 bit addressing :)

I want an 8086 with 20-bit addressing, an 80286 with
24-bit addressing, and a theoretical x86 processor with
32-bit addressing. Come to think of it, it might be possible
to use an actual 80386 to do effective 16-bit segment
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.

BFN. Paul.

muta...@gmail.com

Jul 11, 2021, 10:05:18 PM
On Monday, July 12, 2021 at 11:30:40 AM UTC+10, muta...@gmail.com wrote:

> shifts. Or surely I can at least match the 80286 and do
> (effective) 8-bit shifts. That would be a load of fun.
> I guess it depends how many selectors I can define on
> the 80386. I'll run everything in supervisor mode, so I
> can use both GDT and LDT if that helps.

And if I can get the 8086 to trap and ignore db'66' and
db'67' I will be able to have 16-bit executables that work
on either shift value.

And since I want the OS to return to real mode to do
BIOS calls, the 80386 is probably much better than the
80286 for this.

But I believe the 80286 can do it anyway, with sufficient
hardware support, and that would be a fun thing to do
as well.

BFN. Paul.

wolfgang kern

Jul 12, 2021, 4:53:07 AM
On 12.07.2021 04:05, muta...@gmail.com wrote:

>> shifts. Or surely I can at least match the 80286 and do
>> (effective) 8-bit shifts. That would be a load of fun.
>> I guess it depends how many selectors I can define on
>> the 80386. I'll run everything in supervisor mode, so I
>> can use both GDT and LDT if that helps.

so why don't you stick to 386 and forget no more existing
old crap ?
> And if I can get the 8086 to trap and ignore db'66' and
> db'67' I will be able to have 16-bit executables that work
> on either shift value.

Now that's a really bad idea. compiled code may look like:

66 b8 44 33 22 11          MOV eax,imm32
67 03 84 11 55 44 33 22    ADD ax,[ecx+edx+d32]

so guess what's left w/o these prefixes:

b8 44 33                   MOV ax,imm16
22 11                      AND dl,[bx+di]

03 84 11 55                ADD ax,[si+d16]
44                         INC sp
33 22                      XOR sp,[bp+si]

But IIRC 66 and 67 were ignored anyway on real 86/88.
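
To reproduce the "what's left" byte strings above, a tiny C helper is enough. This is only a sketch with a made-up name: it drops leading 0x66/0x67 bytes from one instruction's encoding and does not decode anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy one instruction's bytes to `out`, dropping any leading 0x66/0x67
   prefixes. `out` must have room for n bytes. Returns the number of
   bytes written. Hypothetical helper, for illustration only. */
size_t drop_size_prefixes(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, j = 0;

    while (i < n && (in[i] == 0x66 || in[i] == 0x67))
        i++;                    /* skip the prefix bytes */
    while (i < n)
        out[j++] = in[i++];     /* keep everything else unchanged */
    return j;
}
```

Feeding it 66 b8 44 33 22 11 leaves five bytes, of which a 16-bit decoder consumes only the first three as MOV ax,imm16; the trailing 22 11 then gets decoded as a separate instruction.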

> And since I want the OS to return to real mode to do
> BIOS calls, the 80386 is probably much better than the
> 80286 for this.

yes.

> But I believe the 80286 can do it anyway, with sufficient
> hardware support, and that would be a fun thing to do
> as well.

no. I wont recommend to use a solder iron on CMOS CPUs.
__
wolfgang

muta...@gmail.com

Jul 12, 2021, 7:15:50 AM
On Monday, July 12, 2021 at 6:53:07 PM UTC+10, wolfgang kern wrote:

> >> shifts. Or surely I can at least match the 80286 and do
> >> (effective) 8-bit shifts. That would be a load of fun.
> >> I guess it depends how many selectors I can define on
> >> the 80386. I'll run everything in supervisor mode, so I
> >> can use both GDT and LDT if that helps.

> so why don't you stick to 386 and forget no more existing
> old crap ?

I would like to have a solution to the 80286 too.

> > And if I can get the 8086 to trap and ignore db'66' and
> > db'67' I will be able to have 16-bit executables that work
> > on either shift value.

> Now that's a really bad idea. compiled code may look like:
>
> 66 b8 44 33 22 11 MOV eax,imm32

No, I won't use 32-bit instructions.

> 67 03 84 11 55 44 33 22 ADD ax,[ecx+edx+d32]
>
> so guess what's left w/o these prefixes:
>
> b8 44 33 MOV ax,imm16
> 22 11 AND dl,[bx+di]

If I just have bb 44 33, will it work on both the 8086
and the 80386?

> But IIRC 66 and 67 were ignored anyway on real 86/88.

Well that's fantastic then.

Unless I'm missing something. Something that the
compiler can't work around.

BFN. Paul.

muta...@gmail.com

Jul 12, 2021, 9:41:39 PM
On Monday, July 12, 2021 at 11:30:40 AM UTC+10, muta...@gmail.com wrote:

> 24-bit addressing, and a theoretical x86 processor with
> 32-bit addressing. Come to think of it, it might be possible
> to use an actual 80386 to do effective 16-bit segment
> shifts. Or surely I can at least match the 80286 and do
> (effective) 8-bit shifts. That would be a load of fun.
> I guess it depends how many selectors I can define on
> the 80386. I'll run everything in supervisor mode, so I
> can use both GDT and LDT if that helps.

I looked up Wikipedia:

https://en.wikipedia.org/wiki/Global_Descriptor_Table

and there are 8192*2 selectors available, meaning that
16-bit programs can access 1 GiB maximum.

4-bit segment shifts make sense when there is 1 MiB of
memory available.

5-bit when there is 2 MiB.

6-bit when there is 4 MiB.

So the pattern is:

C:\devel\bochs>zcalc 65536*2**14/1024/1024
Calculated Value is 1024.000000
Thank you for using the calculator

13 bit shifts are the last purposeful one, to give 512 MiB,
and after that, you may as well just use 16 bit shifts.

Given the restrictions of the 80386.

BFN. Paul.

wolfgang kern

Jul 12, 2021, 11:05:23 PM
On 12.07.2021 13:15, muta...@gmail.com wrote:

>> so why don't you stick to 386 and forget no more existing
>> old crap ?
> I would like to have a solution to the 80286 too.

you wont find a working 286 anymore, not even in the museum.
and return from PM needs hardware to react on forced crash.

>>> And if I can get the 8086 to trap and ignore db'66' and
>>> db'67' I will be able to have 16-bit executables that work
>>> on either shift value.

>> Now that's a really bad idea. compiled code may look like:
>> 66 b8 44 33 22 11 MOV eax,imm32

> No, I won't use 32-bit instructions.

with the prefix it is still a valid (386) 16 bit instruction!
and many applications may use overrides, BIOS also do it.

> If I just have bb 44 33, will it work on both the 8086
> and the 80386?

Yes as long 386 is in RM or PM16, but it become a five byte
opcode in PM32:
bb 44 33 22 11 MOV ebx,imm32
or a four byte:
66 bb 44 33 MOV bx,imm16
__
wolfgang

muta...@gmail.com

Jul 12, 2021, 11:28:18 PM
On Tuesday, July 13, 2021 at 1:05:23 PM UTC+10, wolfgang kern wrote:

> >> so why don't you stick to 386 and forget no more existing
> >> old crap ?

> > I would like to have a solution to the 80286 too.

> you wont find a working 286 anymore, not even in the museum.

On a phone call at work about 2 weeks ago, someone
was bragging that they had an AT.

On discord about 3 weeks ago, someone said they had
brushed off an old XT and wanted to do OS programming
on it and thus wanted a 16-bit compiler, at least ideally.

> and return from PM needs hardware to react on forced crash.

Ok.

> >>> And if I can get the 8086 to trap and ignore db'66' and
> >>> db'67' I will be able to have 16-bit executables that work
> >>> on either shift value.
>
> >> Now that's a really bad idea. compiled code may look like:
> >> 66 b8 44 33 22 11 MOV eax,imm32
>
> > No, I won't use 32-bit instructions.

> with the prefix it is still a valid (386) 16 bit instruction!

Ok, I'm after instructions that work on both an 80386 in
protected mode (adding a x'66' is fine) and an 8086
(with or without the ignored x'66').

Basically the intention is to get the C compiler to generate
16-bit instructions that work in either environment, adding
an x'66' when it knows it needs to.

Is there enough instructions that will work this way?

> and many applications may use overrides, BIOS also do it.

Ok, the BIOS can be dedicated 80386 or dedicated 8086,
I don't care about that.

For now I only care about applications.

> > If I just have bb 44 33, will it work on both the 8086
> > and the 80386?

> Yes as long 386 is in RM or PM16,

No, I don't want that. I want PM32.

> but it become a five byte
> opcode in PM32:
> bb 44 33 22 11 MOV ebx,imm32
> or a four byte:
> 66 bb 44 33 MOV bx,imm16

This 4 byte one looks like it will work on both PM32
and RM16.

That's exactly what I'm after.

It will all work, right?

Thanks. Paul.

wolfgang kern

Jul 12, 2021, 11:31:29 PM
On 13.07.2021 03:41, muta...@gmail.com wrote:
> On Monday, July 12, 2021 at 11:30:40 AM UTC+10, muta...@gmail.com wrote:
>
>> 24-bit addressing, and a theoretical x86 processor with
>> 32-bit addressing. Come to think of it, it might be possible
>> to use an actual 80386 to do effective 16-bit segment
>> shifts. Or surely I can at least match the 80286 and do
>> (effective) 8-bit shifts. That would be a load of fun.
>> I guess it depends how many selectors I can define on
>> the 80386. I'll run everything in supervisor mode, so I
>> can use both GDT and LDT if that helps.
>
> I looked up Wikipedia:
>
> https://en.wikipedia.org/wiki/Global_Descriptor_Table
>
> and there are 8192*2 selectors available, meaning that
> 16-bit programs can access 1 GiB maximum.

> 4-bit segment shifts make sense when there is 1 MiB of
> memory available.

you totally misunderstood how descriptors work.
1. GDT/LDT are only in effect in PM and VM
2. a GDT entry can span a 4GB memory range.
3. a descriptor can be set for either 16 or 32 bit limit.
4. there are several descriptor types [code/data/TSS].
5. the count of entries got nothing to do with accessible range.
6. segment descriptors ARE NOT extended RM-segment registers,
they work complete different.
__
wolfgang

wolfgang kern

Jul 12, 2021, 11:39:35 PM
On 13.07.2021 05:28, muta...@gmail.com wrote:
...
>>> No, I won't use 32-bit instructions.

>> with the prefix it is still a valid (386) 16 bit instruction!

> Ok, I'm after instructions that work on both an 80386 in
> protected mode (adding a x'66' is fine) and an 8086
> (with or without the ignored x'66').

> Basically the intention is to get the C compiler to generate
> 16-bit instructions that work in either environment, adding
> an x'66' when it knows it needs to.
>
> Is there enough instructions that will work this way?

Not much.

>>> If I just have bb 44 33, will it work on both the 8086
>>> and the 80386?

>> Yes as long 386 is in RM or PM16,

> No, I don't want that. I want PM32.

>> but it become a five byte
>> opcode in PM32:
>> bb 44 33 22 11 MOV ebx,imm32
>> or a four byte:
>> 66 bb 44 33 MOV bx,imm16

> This 4 byte one looks like it will work on both PM32
> and RM16.
>
> That's exactly what I'm after.

> It will all work, right?

No, the 66 override reverses the default operand size.
so it makes 16 bit within PM32 and 32 bit within PM16/RM.
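
The toggle can be written down as a small C sketch (function names are made up; this covers the 386 and later, since the 8086 has no operand-size prefix at all). It also gives the encoded length of a MOV reg,imm (opcodes B8..BF), which matches the byte counts quoted in this thread.

```c
/* Operand size in bits for an instruction, given the code segment's
   default size (16 or 32) and whether a 0x66 prefix is present.
   The prefix selects the non-default size. */
unsigned effective_operand_bits(unsigned default_bits, int has_66_prefix)
{
    if (!has_66_prefix)
        return default_bits;
    return default_bits == 16 ? 32u : 16u;
}

/* Encoded length in bytes of MOV reg,imm (opcode B8+r):
   optional prefix + one opcode byte + the immediate. */
unsigned mov_reg_imm_length(unsigned default_bits, int has_66_prefix)
{
    unsigned bits = effective_operand_bits(default_bits, has_66_prefix);

    return (has_66_prefix ? 1u : 0u) + 1u + bits / 8u;
}
```

So 66 bb 44 33 is a complete 4-byte instruction in PM32, but in RM or PM16 on a 386 the same prefix makes the CPU expect a 4-byte immediate.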
__
wolfgang

muta...@gmail.com

Jul 13, 2021, 12:13:33 AM
You misunderstand my intentions.

I will set up 16384 selectors, each pointing to consecutive
64k buffers. If the user has 1 GiB of memory, then we are
using EFFECTIVE 16-bit shifts. What that means is that
select 0 points to the first 64k. The second selector points
to address x'10000' for a length of 64k. The third selector
points to address x'20000' for a length of 64k.

So x'10000' * 16384 will eventually get us up to 1 GiB.

If we only had 512 MiB installed we would instead use
15 bit shifts to give better granularity.

So selector 0 points to address 0. Selector 1 points to
address x'8000' for a length of 64k. Selector 2 points to
address x'10000' for a length of 64k.

x'8000' * 16384 will eventually get us up to 512 MiB.
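
The arithmetic in this scheme, as a short C sketch. The names are hypothetical, and the count of 16384 is simply the figure used above, before any selectors are set aside for null entries or code/data pairs.

```c
#include <stdint.h>

#define SELECTOR_COUNT 16384UL   /* 8192 GDT + 8192 LDT, as assumed above */

/* Base address given to the i-th selector (counting from 0) when the
   selectors are spaced to imitate a `shift`-bit segment shift. */
uint32_t selector_base(uint32_t i, unsigned shift)
{
    return i << shift;
}

/* Total range covered by the selector bases for a given shift. */
uint32_t covered_bytes(unsigned shift)
{
    return SELECTOR_COUNT << shift;
}
```

covered_bytes(16) comes to 1 GiB and covered_bytes(15) to 512 MiB, matching the two cases described.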

BFN. Paul.

Frank Kotler

Jul 13, 2021, 12:22:06 AM
On 07/12/2021 11:39 PM, wolfgang kern wrote:
> On 13.07.2021 05:28, muta...@gmail.com wrote:
...
>> It will all work, right?
>
> No, the 66 override reverses the default operand size.
> so it makes 16 bit within PM32 and 32 bit within PM16/RM.
> __
> wolfgang

Hi Wolfgang,

Hope you are well! I am good. Old and tired, but I feel pretty
good.

You may remember John Fine, who used to write for the assembly language
groups. He wrote up differences between real and protected mode and how
it affected segmentation... I think it might help Paul understand why
what he wants won't work.

Found it!

http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect

Best,
Frank

muta...@gmail.com

Jul 13, 2021, 12:34:22 AM
On Tuesday, July 13, 2021 at 1:39:35 PM UTC+10, wolfgang kern wrote:

> >> but it become a five byte
> >> opcode in PM32:
> >> bb 44 33 22 11 MOV ebx,imm32
> >> or a four byte:
> >> 66 bb 44 33 MOV bx,imm16
>
> > This 4 byte one looks like it will work on both PM32
> > and RM16.
> >
> > That's exactly what I'm after.
>
> > It will all work, right?

> No, the 66 override reverses the default operand size.
> so it makes 16 bit within PM32 and 32 bit within PM16/RM.

No. On a real 8086. The x'66' will be ignored. So a real
8086 and PM32 will both work, right?

BFN. Paul.

Frank Kotler

Jul 13, 2021, 12:35:00 AM
On 07/13/2021 12:11 AM, Frank Kotler wrote:
> http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect

Shift! 404

http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect

I think that's the same. Search for "john fine segment"

Best,
Frank

wolfgang kern

Jul 13, 2021, 1:16:03 AM
On 13.07.2021 06:34, muta...@gmail.com wrote:

>>>> but it become a five byte
>>>> opcode in PM32:
>>>> bb 44 33 22 11 MOV ebx,imm32
>>>> or a four byte:
>>>> 66 bb 44 33 MOV bx,imm16
>>
>>> This 4 byte one looks like it will work on both PM32
>>> and RM16.
>>>
>>> That's exactly what I'm after.
>>
>>> It will all work, right?
>
>> No, the 66 override reverses the default operand size.
>> so it makes 16 bit within PM32 and 32 bit within PM16/RM.
>
> No. On a real 8086. The x'66' will be ignored. So a real
> 8086 and PM32 will both work, right?

Not sure why you want a 16 bit operand in PM32. the 66 "may"
be ignored by 8086/88, could be an alias for other as well.

If you want to always use 16 bit code then use PM16.
only but all segment registers are different in PM16.

AND with PM16 you can use 64KB blocks anywhere within 4GB.
__
wolfgang

wolfgang kern

Jul 13, 2021, 1:25:36 AM
On 13.07.2021 06:11, Frank Kotler wrote:

> Hi Wolfgang,

> Hope you are well! I am good. Old and tired, but I feel pretty
> good.

good to see you're in good shape, 77 isn't that old :)
I'm almost fine thanks to my oxygen generator...
and I will get a new set of eyes, already scheduled for next week.
hope I can avoid the white crane after scientist gadget around on me.
__
wolfgang

wolfgang kern

Jul 13, 2021, 1:56:37 AM
your calculation sucks :)

and any valid selector can start at any address within 4GB.
(but any means aligned to bounds here) the selector number
got nothing to do with the address it is pointing to !

selector 0 is invalid
first usable selector number is 8 which is also the offset
in the GDTable (numbers 9..15 are same but restricted PL)
next usable is 0x10 and so on

but you may need at least two selectors per 64K block, one
for code and one for data (some folks have third for stack)

So even if you fill the GDT with max possible entries:
last will be 0xFFF8, that means (65536-8)/16 = 4095 pairs

hope you realize that your GDT use a full 64K block by itself.
my GDT solution has only 32 entries, including 6 VBIOS selectors.
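
For anyone following along, the selector numbering can be sketched in C like this. The helper names are made up; the layout used is the standard one where the table index sits in bits 3..15 and the low three bits carry the table indicator and requested privilege level.

```c
#include <stdint.h>

/* Build a selector value from a descriptor table index (0..8191),
   a table indicator (0 = GDT, 1 = LDT) and a requested privilege level. */
uint16_t make_selector(uint16_t index, unsigned ti, unsigned rpl)
{
    return (uint16_t)(((uint32_t)index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

/* Table index encoded in a selector; also byte offset / 8 in the table. */
uint16_t selector_index(uint16_t selector)
{
    return (uint16_t)(selector >> 3);
}

/* Code+data pairs that fit in one table when entry 0 is unusable. */
unsigned max_code_data_pairs(void)
{
    return (8192u - 1u) / 2u;
}
```

Index 1 gives selector 8, index 2 gives 0x10, index 8191 gives 0xFFF8, and the pair count comes out at 4095.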
__
wolfgang

muta...@gmail.com

Jul 13, 2021, 2:48:19 AM
On Tuesday, July 13, 2021 at 11:41:39 AM UTC+10, muta...@gmail.com wrote:

> 13 bit shifts are the last purposeful one, to give 512 MiB,
> and after that, you may as well just use 16 bit shifts.
>
> Given the restrictions of the 80386.

Actually, we always want to max out the selectors. It is
the granularity that needs to give.

Until we're down to 4-bit shifts as there is no point
going below that. So we can reduce the number of
selectors if we're down to that much memory. Which
happens when we have 256k of memory.

So only if we have less than 256k of memory would
we be in a realistic position to reduce the number of
selectors.

BFN. Paul.

muta...@gmail.com

Jul 13, 2021, 3:48:00 AM
On Tuesday, July 13, 2021 at 3:56:37 PM UTC+10, wolfgang kern wrote:

> and any valid selector can start at any address within 4GB.
> (but any means aligned to bounds here) the selector number
> got nothing to do with the address it is pointing to !

There is no point going above 1 GB when I am
doing 16-bit programming.

> selector 0 is invalid
> first usable selector number is 8 which is also the offset
> in the GDTable (numbers 9..15 are same but restricted PL)
> next usable is 0x10 and so on

Oh, ok. That's fine. I can live with that. I just need to
make the rules for huge pointers that when you
cross the 64k boundary, the segment gets incremented
by 0x10 or whatever.

> but you may need at least two selectors per 64K block, one
> for code and one for data (some folks have third for stack)

Is there a reason I can't just have one selector for each
64k block?

> So even if you fill the GDT with max possible entries:
> last will be 0xFFF8, that means (65536-8)/16 = 4095 pairs

Ok.

> hope you realize that your GDT use a full 64K block by itself.

That's fine. That's peanuts for what I'm doing.

BFN. Paul.

muta...@gmail.com

Jul 13, 2021, 4:05:18 AM
On Tuesday, July 13, 2021 at 3:16:03 PM UTC+10, wolfgang kern wrote:

> > No. On a real 8086. The x'66' will be ignored. So a real
> > 8086 and PM32 will both work, right?

> Not sure why you want a 16 bit operand in PM32.

I'm trying to do 16-bit programming with a bigger
EFFECTIVE shift than 4 bits.

> the 66 "may"
> be ignored by 8086/88, could be an alias for other as well.

How can it be an alias on an 8086?

> If you want to always use 16 bit code then use PM16.

You know what? That's a great idea! I didn't think of the
murky PM16 world.

> only but all segment registers are different in PM16.

They are selectors. But I can make them look like
segment registers with a particular shift.

> AND with PM16 you can use 64KB blocks anywhere within 4GB.

I assume I still have 16384 selectors, so I will still be limited
to a total of 1 GiB.

BFN. Paul.

wolfgang kern

Jul 13, 2021, 11:30:08 AM
On 13.07.2021 10:05, muta...@gmail.com wrote:
> On Tuesday, July 13, 2021 at 3:16:03 PM UTC+10, wolfgang kern wrote:
>
>>> No. On a real 8086. The x'66' will be ignored. So a real
>>> 8086 and PM32 will both work, right?
>
>> Not sure why you want a 16 bit operand in PM32.
>
> I'm trying to do 16-bit programming with a bigger
> EFFECTIVE shift than 4 bits.
>
>> the 66 "may"
>> be ignored by 8086/88, could be an alias for other as well.
>
> How can it be an alias on an 8086?

would need to remove several layers of dust from my collection
of Intel books to check if it is mentioned at all.
the code 0x60..group could be alias for 0x70.. or 0x50..
similar to 0x82 which is (was until 2000) an alias for 0x80

>> If you want to always use 16 bit code then use PM16.

> You know what? That's a great idea! I didn't think of the
> murky PM16 world.

>> only but all segment registers are different in PM16.

> They are selectors. But I can make them look like
> segment registers with a particular shift.

Shift wont do what you want, look at the layout of descriptors.

>> AND with PM16 you can use 64KB blocks anywhere within 4GB.

> I assume I still have 16384 selectors, so I will still be limited
> to a total of 1 GiB.

No you can have maximal 8191 selectors and you'll need CS-DS pairs.
so 4095*64KB ~ 512MB.
But try what older EMM did: have only a few variable selectors
and put your address bits extension right into the start field.

you could even define this start-addresses as C-variables..
oh, did I say this yet? :)
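
A sketch of what "putting the address bits into the start field" looks like, assuming the 8-byte 386 descriptor layout (the 286 format only has a 24-bit base and keeps the last two bytes zero). The function name is made up for illustration.

```c
#include <stdint.h>

/* Fill an 8-byte 386 segment descriptor.
   base   : 32-bit start address of the segment
   limit  : 20-bit limit (0xFFFF for a 64 KiB segment, byte granular)
   access : access byte, e.g. 0x92 = present ring-0 writable data,
            0x9A = present ring-0 readable code
   flags  : high nibble of byte 6 (G, D/B, ...); 0 for a 16-bit segment */
void encode_descriptor(uint32_t base, uint32_t limit, uint8_t access,
                       uint8_t flags, uint8_t out[8])
{
    out[0] = (uint8_t)(limit & 0xFF);
    out[1] = (uint8_t)((limit >> 8) & 0xFF);
    out[2] = (uint8_t)(base & 0xFF);
    out[3] = (uint8_t)((base >> 8) & 0xFF);
    out[4] = (uint8_t)((base >> 16) & 0xFF);
    out[5] = access;
    out[6] = (uint8_t)(((limit >> 16) & 0x0F) | ((flags & 0x0F) << 4));
    out[7] = (uint8_t)((base >> 24) & 0xFF);
}
```

Re-pointing a selector at a different 64K block is then just a matter of rewriting bytes 2, 3, 4 and 7 of its descriptor and reloading the segment register.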
__
wolfgang

wolfgang kern

Jul 13, 2021, 11:36:08 AM
On 13.07.2021 09:47, muta...@gmail.com wrote:
...
>> but you may need at least two selectors per 64K block, one
>> for code and one for data (some folks have third for stack)

> Is there a reason I can't just have one selector for each
> 64k block?

you can have that, but either 64K code only or 64K data only.
but application code usually need both within one 64 K block.
__
wolfgang

muta...@gmail.com

Jul 13, 2021, 6:04:02 PM
On Wednesday, July 14, 2021 at 1:36:08 AM UTC+10, wolfgang kern wrote:

> >> but you may need at least two selectors per 64K block, one
> >> for code and one for data (some folks have third for stack)
>
> > Is there a reason I can't just have one selector for each
> > 64k block?

> you can have that, but either 64K code only or 64K data only.
> but application code usually need both within one 64 K block.

I see. That's not very nice. But given that the selectors
don't start from a neat 0 anyway, I guess pairs are OK.

The assembler generated from C code in the huge memory
model will not care. It just needs to know how much to
increment the "segment" (selector) whenever it crosses
a 64k boundary.

BFN. Paul.

muta...@gmail.com

Jul 13, 2021, 6:32:53 PM
On Wednesday, July 14, 2021 at 1:30:08 AM UTC+10, wolfgang kern wrote:

> > You know what? That's a great idea! I didn't think of the
> > murky PM16 world.
>
> >> only but all segment registers are different in PM16.
>
> > They are selectors. But I can make them look like
> > segment registers with a particular shift.

> Shift wont do what you want, look at the layout of descriptors.

I don't know what you are talking about.

Let me just restate my goal.

With a fixed GDT/LDT maxed out, and (for argument's
sake) a machine with 1 GiB of memory, I want the
equivalent of an 8086 (4-bit shift) except I want
EFFECTIVE 16-bit shifts. What that means is that
every time you do the appropriate adjustment of
the "segment" register, you get to the next 64k
boundary. As opposed to when you do the appropriate
adjustment of the segment register you get to the
next 16-byte boundary.

On the 8086, the adjustment you do on the segment
register to get to the next boundary is to add 1.
Note that applications on the 8086 do not normally
adjust the segment register in that way at all, at
least not in C-generated code. The only place where
C-generated code does this is when using the rare
huge memory model. Or the equivalent of using
huge pointers.

I have looked at the code generated by Watcom's C
compiler for the huge memory model, and when you
add a 32-bit value to a pointer in the huge memory
model, it calls a routine to do the addition.

I have my own C library. I don't use Watcom's. I only
use their compiler. So currently I don't actually support
huge memory model. I need to write that routine.

How do I write that routine?

Let's say that I have an address of 4000:0000 and I
have been asked to add 64k to it. I know I need to
jump up to the next 64k boundary. Which would
mean adding x'1000' to the above address to get
5000:0000.

I could hardcode that value x'1000'. Actually the
pointers will all be normalized, so I am looking for
multiples of 16, ie 1. I could hardcode both of those
values - 16 and 1. ie divide by 16 and add 1 times
that amount.

Rather than hardcode those numbers, I want to get
them from global variables in my C program.

And I want those global variables to be set by doing a
PDOS/86 INT 21H call.

On an 80286 or 80386, if I linearly map the memory
with selectors, the above 16-bit application will continue
to run, completely unchanged. The only thing that needs
to change is the 16 and 1. On an 80386 the appropriate
selectors are all spaced 0x20 apart, so the 1 will become
0x20. Regardless of whether I am using a 16-bit shift or
a 15-bit shift etc. The only thing that changes is the
divisor. In the case of a 16-bit shift, the divisor will be
0x10000. In the case of a 15-bit shift, the divisor will be
0x8000. ie for every multiple of 0x8000 that you need
to increase the boundary that the segment is pointing
to, adjust the segment register (selector) by a standard
amount (0x20).
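
Here is a minimal C sketch of that routine. The struct, the function name and the two globals are hypothetical; the globals stand in for the values that would be fetched once from the proposed PDOS/86 INT 21H call. It assumes the incoming pointer is normalized (offset smaller than seg_unit_bytes) and that off + delta does not overflow 32 bits.

```c
#include <stdint.h>

struct hugeptr { uint16_t seg; uint16_t off; };

/* Hypothetical globals, set at startup from the OS. */
uint32_t seg_unit_bytes = 16; /* the divisor: 16 on an 8086, 0x10000 or
                                 0x8000 for 16-bit or 15-bit shifts */
uint16_t seg_step = 1;        /* added to the segment per unit: 1 on an
                                 8086, 0x20 with paired selectors */

/* Add a 32-bit byte count to a huge pointer, keeping it normalized. */
struct hugeptr huge_add(struct hugeptr p, uint32_t delta)
{
    uint32_t total = (uint32_t)p.off + delta;
    uint32_t units = total / seg_unit_bytes;

    p.seg = (uint16_t)(p.seg + units * seg_step);
    p.off = (uint16_t)(total % seg_unit_bytes);
    return p;
}
```

With the 8086 values, 4000:0000 plus 64k comes out as 5000:0000. With a divisor of 0x10000 and a step of 0x20, adding 64k moves the selector from 0x40 to 0x60 and leaves the offset alone.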

> >> AND with PM16 you can use 64KB blocks anywhere within 4GB.
>
> > I assume I still have 16384 selectors, so I will still be limited
> > to a total of 1 GiB.

> No you can have maximal 8191 selectors and you'll need CS-DS pairs.
> so 4095*64KB ~ 512MB.

That's 256 MiB.
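
The arithmetic can be checked with a few lines of C. This is only a sketch: mappable_bytes is a made-up name, and it assumes one 64 KiB block per code/data selector pair as described in the quoted text:

```c
#include <stdint.h>

#define BLOCK_SIZE 0x10000UL   /* one 64 KiB block per selector pair */

/* Bytes reachable when every block needs a code + data selector pair. */
uint32_t mappable_bytes(uint32_t usable_selectors)
{
    uint32_t pairs = usable_selectors / 2;
    return pairs * (uint32_t)BLOCK_SIZE;
}
```

With 8191 selectors that is 4095 pairs, or 4095 * 64 KiB, which is just under 256 MiB. With 16384 selectors it is 8192 pairs, or exactly 512 MiB.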

And that's a reason to get x'66' working on the 80386 and
8086, so that I can have 1 GiB instead. Although the x'66'
may add say 5% to the program's footprint. And impact
the 8086 too.

> But try what older EMM did: have only a few variable selectors
> and put your address bits extension right into the start field.
>
> you could even define this start-addresses as C-variables..
> oh, did I say this yet? :)

I don't understand this proposal, but will it allow 8086
(with ignored x'66') huge memory model programs to
run unchanged on the 80386?

Thanks. Paul.

Rod Pemberton

Jul 13, 2021, 10:45:08 PM
On Tue, 13 Jul 2021 17:30:05 +0200
wolfgang kern <now...@never.at> wrote:

> On 13.07.2021 10:05, muta...@gmail.com wrote:

> > How can it be an alias on an 8086?
>
> would need to remove several layers of dust from my collection
> of Intel books to check if it is mentioned at all.
> the code 0x60..group could be alias for 0x70.. or 0x50..
> similar to 0x82 which is (was until 2000) an alias for 0x80
>

That would be interesting to know for sure, as I've not noticed that
set of aliases mentioned anywhere.

--
The Chinese have such difficulty with English ... The word is not
"reunification" but "revenge".

wolfgang kern

Jul 14, 2021, 1:17:24 AM
On 14.07.2021 05:46, Rod Pemberton wrote:

>>> How can it be an alias on an 8086?

>> would need to remove several layers of dust from my collection
>> of Intel books to check if it is mentioned at all.
>> the code 0x60..group could be alias for 0x70.. or 0x50..
>> similar to 0x82 which is (was until 2000) an alias for 0x80

> That would be interesting to know for sure, as I've not noticed that
> set of aliases mentioned anywhere.

They were never documented by the vendors, but back then (1995) when I
wrote my value tracking code analyzer as part of disassembler and
debugger I tested all code variants and found _some_ alias on +486.
Sorry, I lost my handwritten notes on it due to a water disaster.
Meanwhile sandpile.org shows new usage of many opcodes, but there are
still a lot of duplicates left undocumented, especially in the 0Fxx groups.
__
wolfgang

wolfgang kern

Jul 14, 2021, 1:28:54 AM
On 14.07.2021 00:32, muta...@gmail.com wrote:

>>> I assume I still have 16384 selectors, so I will still be limited
>>> to a total of 1 GiB.
>> No you can have maximal 8191 selectors and you'll need CS-DS pairs.

>> so 4095*64KB ~ 512MB.
> That's 256 MiB.

Yeah, test passed.

> And that's a reason to get x'66' working on the 80386 and
> 8086, so that I can have 1 GiB instead. Although the x'66'
> may add say 5% to the program's footprint. And impact
> the 8086 too.
>
>> But try what older EMM did: have only a few variable selectors
>> and put your address bits extension right into the start field.
>>
>> you could even define this start-addresses as C-variables..
>> oh, did I say this yet? :)
>
> I don't understand this proposal, but will it allow 8086
> (with ignored x'66') huge memory model programs to
> run unchanged on the 80386?

I try yet to show you how descriptors look like:
(copy as text, undo linewrap, rename to HTML and watch)
__
wolfgang

<!--
DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"
"http://www.w3.org/TR/html4/strict.dtd"
-->
<html>
<head><title>x86descriptors</title></head>

<!--
translated from page 366 of the "Holy Book of KESYS" Jan.1999,
added x86-64 descriptors 2004.
Author: Wolfgang Kern, Vienna Austria (LEOC, KESYS-development)
-->

<body bgcolor="#c0c0c0" text="#000000">
<basefont face="Lucida Console">

<p style="position:absolute; left:70pt; top:05pt; font-size:12pt;">
<u><b>x86 PM16/32 Descriptors</b></u><p>

<table style="position:absolute; left:70pt; top:40pt; font-size:8pt;"
border="2"; frame="box"; rules="all"; bgcolor="#FFFFFF"; height="200";
width="380"; cellspacing="0"; cellpadding="0"; bordercolor="#000000"; >
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr align="center"; style="font-size:7pt;" > <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>

<td style="font-size:10pt"; align="left" valign="top" rowspan="9"> <b><u>
DATA [GDT,LDT]</b></u><pre>
G 4Kb granular limit
B 32-bit stack
0 MBZ
P present
E expand down [stack]
W writable
A accessed
[93]</pre></td></tr>

<tr align="center"><td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center"><td>6</td>
<td><b>G</td><td><b>B</td><td><b>0</td><td>x</td>
<td colspan="4">LIM 16..19</td></tr>
<tr align="center"><td>5</td><td>P</td><td colspan="2">DPL</td>
<td colspan="2" style="border-width:medium; border-color:#000000;
border-style:double;">
<b>1 0</td> <td><b>E</td> <td><b>W</td> <td><b>A</td></tr>
<tr align="center"><td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td></tr>
<tr align="center"><td>3</td></tr>
<tr align="center"><td>2</td></tr>
<tr align="center"><td>1</td> <td rowspan="2"; colspan="8">LIMIT
0..15</td></tr>
<tr align="center"><td>0</td></tr>
</table>


<table style="position:absolute; left:380pt; top:40pt;
font-size:8pt;"border="2"; frame="box"; rules="all"; bgcolor="#FFFFFF";
height="200"; width="380"; cellspacing="0"; cellpadding="0";
bordercolor="#808080"; >
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
CODE [GDT,LDT]</u></b><pre>
G 4Kb granular
B 32-bit
0 MBZ
P present
C conforming
R readable
A accessed
[9b]</pre></td></tr>
<tr align="center"> <td>7</td> <td colspan="8" align="center">BASE
24..31</td> </tr>
<tr align="center"> <td>6</td> <td><b>G</td> <td><b>B</td> <td><b>0</td>
<td>x</td>
<td nowrap colspan="4">LIM 16..19</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td> <td
colspan="2"
style="border-width:medium;border-color:#000000; border-style:double;
padding:0px;">
<b>1 1</td> <td><b>C</td> <td><b>R</td> <td><b>A</td> </tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td> </tr>
<tr align="center"> <td>3</td></tr>
<tr align="center"> <td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td> </tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:70pt; top:200pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
INT-GATE [IDT]</b></u><pre>
T 32-bit
disables IRQ,
TRAP and NT cleared
until IRET
[86/8e]</pre></td></tr>
<tr align="center"> <td>7</td>
<td colspan="8" rowspan="2" align="center" valign="middle">Offset
16..31</td></tr>
<tr align="center"> <td>6</td></tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double"><b>0</td>
<td><b>T</td>
<td colspan="3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 1 0</td></tr>
<tr align="center"> <td>4</td> <td colspan="8">reserved</td> </tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td> </tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td> <td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">Offset
0..15</td> </tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:380pt; top:200pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16"span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt;"align="center";>
<td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
INT-TRAP [IDT]</b></u><pre>
T 32-bit
IRQ-status unchanged,
TRAP and NT cleared
until IRET
[87/8f]</pre></td></tr>
<tr align="center"> <td>7</td> <td rowspan="2" colspan="8">Offset
16..31</td></tr>
<tr align="center"> <td>6</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double"><b>0</td>
<td><b>T</td>
<td colspan="3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 1 1</td></tr>
<tr align="center"> <td>4</td> <td colspan="8">reserved</td></tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td></tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td><td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">Offset
0..15</td> </tr>
<tr align="center"> <td>0</td></tr> </table>


<table style="position:absolute; left:70pt; top:360pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
TASK-switch [GDT]</u></b><pre>
G 4Kb granular limit
P present
T 32-bit
BS task is busy
[81/89]</pre></td></tr>
<tr align="center"> <td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center">
<td>6</td><td><b>G</td><td><b>0</td><td><b>0</td><td>x</td>
<td nowrap colspan="4">LIM 16..19</td></tr>
<tr align="center"><td>5</td><td>P</td><td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double"><b>0</td>
<td><b>T</td>
<td colspan= "3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 BS 1</td></tr>
<tr align="center"><td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td> </tr>
<tr align="center"><td>3</td></tr>
<tr align="center"><td>2</td></tr>
<tr align="center"><td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td> </tr>
<tr align="center"><td>0</td></tr></table>


<table style="position:absolute; left:380pt; top:360pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
LDT [GDT]</b></u> <pre>
0 and reserved: MBZ
[82]</pre></td></tr>
<tr align="center"> <td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center"> <td>6</td> <td><b>G</td> <td><b>x</td>
<td><b>x</td> <td>x</td>
<td colspan="4">LIM 16..19</td></tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5" style="border-width:medium; border-color:#000000;
border-style:double;">
<b>0 0 0 1 0</td></tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td></tr>
<tr> <td>3</td></tr><tr><td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td></tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:70pt; top:520pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" align="middle" span=8>
<col width="200" align="left" valign="top">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
Call-GATE [GDT,LDT]</b></u><pre>
T 32-bit
L LDT (else GDT)
Dwords copied from
callers stack.
[84/8c]</pre></td></tr>
<tr align="center"> <td>7</td> <td rowspan="2" colspan="8">Offset
16..31</td></tr>
<tr align="center"> <td>6</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td style="border-color:#000000;border-width:medium;border-style:double">
<b>0</td> <td><b>T</td>
<td colspan= "3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 0 0</td></tr>
<tr align="center"> <td>4</td> <td colspan="4">0 0 0 0</td>
<td colspan="4">Dwords </td> </tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td> </tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td> <td>L</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">Offset
0..15</td></tr>
<tr align="center"><td>0</td></tr></table>


<table style="position:absolute; left:380pt; top:520pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" align="middle" span=8>
<col width="200" align="left" valign="top">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
TASK-GATE [GDT,IDT,LDT]</b></u><pre>
T: 32-bit
[85/8d]</pre></td></tr>
<tr align="center"> <td>7</td> <td rowspan="2" colspan="8">reserved</td> </tr>
<tr align="center"> <td>6</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double">
<b>0</td>
<td><b>T</td>
<td colspan= "3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 0 1</td> </tr>

<tr align="center"> <td>4</td> <td colspan="8">reserved</td> </tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td> </tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td> <td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">reserved</td>
</tr>
<tr align="center"> <td>0</td></tr></table>

<pre><p style="position:absolute; left:70pt; top:700pt; font-size:10pt;"><b>
comments after uppercase character mean "if bit SET"
* and x mean don't care (can be used by OS as flags)
MBZ Must Be Zero
reserved bits should be written with zeros.
</b></p></pre>




<p style="position:absolute; left:70pt; top:770pt; font-size:12pt;"><b><u>
64-bit descriptors</u></b> (IDT entries, LDT and TSS descriptors are
expanded to 128 bits)</p>

<table style="position:absolute; left:70pt; top:800pt;
font-size:8pt;"border="2"; frame="box" rules="all" bgcolor="#FFFFFF"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"> <b><u>
DATA [GDT,LDT]</b></u>[93]
<pre> G 4Kb granular limit
B 32-bit stack
P present
E expand down [stack]
W writable
A accessed

ALL except P,DPL and "10"
is ignored in LONG Mode
but see FS and GS.</pre></td></tr>
<tr align="center"> <td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center">
<td>6</td><td><b>G</td><td><b>B</td><td><b>0</td><td>x</td>
<td colspan="4">LIM 16..19</td></tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="2" style="border-width:medium; border-color:#000000;
border-style:double;">
<b>1 0</td> <td><b>E</td> <td><b>W</td> <td><b>A</td></tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td></tr>
<tr> <td>3</td></tr><tr><td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td></tr>
<tr> <td>0</td></tr></table>


<table style="position:absolute; left:380pt; top:800pt; font-size:8pt;"
border="2" frame="box" rules="all" bgcolor="#FFFFFF" height="200"
width="380" cellspacing="0" cellpadding="0" bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
CODE [GDT,LDT]</u></b>[9b/98]
<pre> G 4Kb granular *
B 32-bit/16bit *
L 64-bit/compatible mode
P present
C conforming
R readable *
A accessed *

base,limit and "*"
ignored in 64bit mode</pre></td></tr>

<tr align="center"> <td>7</td> <td colspan="8" align="center">BASE
24..31</td> </tr>
<tr align="center"> <td>6</td> <td><b>G</td> <td><b>B</td> <td><b>L</td>
<td>x</td>
<td nowrap colspan="4">LIM 16..19</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td> <td
colspan="2"
style="border-width:medium;border-color:#000000; border-style:double;
padding:0px;">
<b>1 1</td> <td><b>C</td> <td><b>R</td> <td><b>A</td> </tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td> </tr>
<tr align="center"> <td>3</td></tr>
<tr align="center"> <td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td> </tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:70pt; top:960pt;
font-size:8pt;"border="4" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="550" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="16">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>

<td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
INT-GATE/INT-TRAP [IDT]</b></u><pre>
[0e/0f]
V GATE/trap

(GATE disables IRQ.
TRAP-bit and NT cleared
until IRET on both)

IST: Interrupt Stack Table index
</pre></td></tr>
<tr align="center"> <td>E</td> <td colspan="16"
rowspan="2">ignored</td></tr>
<tr align="center"> <td>C</td> </tr>
<tr align="center"> <td>A</td> <td colspan="16" rowspan="2"> Offset
32..63</td></tr>
<tr align="center"> <td>8</td> </tr>
<tr align="center"> <td>6</td> <td colspan="16">Offset 16..31</td> </tr>
<tr align="center"> <td>4</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 0 1 1 V</td> <td colspan="5">reserved</td><td
colspan="3">IST</td></tr>
<tr align="center"><td>2</td> <td colspan="13">SEGMENT-SELECTOR</td>
<td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"><td>0</td> <td colspan="16">Offset 0..15</td></tr>
</table>


<table style="position:absolute; left:70pt; top:1120pt; font-size:8pt;"
border="4" frame="box" rules="all" bgcolor="#ffffff" height="200"
width="550" cellspacing="0" cellpadding="0" bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="16">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td>#</td>

<td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
CALLGATE -----------[GDT/LDT]</b></u>[0C]<pre>
L LDT (else GDT)

</pre></td></tr>

<tr align="center"> <td>E</td> <td colspan="16">reserved</td></tr>
<tr align="center"> <td>C</td> <td colspan="3">* * *</td> <td
colspan="5">0 0 0 0 0</td>
<td colspan="8">reserved</td></tr>
<tr align="center"> <td>A</td> <td colspan="16" rowspan="2"> Offset
32..63</td></tr>
<tr align="center"> <td>8</td> </tr>
<tr align="center"> <td>6</td> <td colspan="16">Offset 16..31</td> </tr>
<tr align="center"> <td>4</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 1 1 0 0</td> <td colspan="8">reserved</td></tr>
<tr align="center"> <td>2</td> <td colspan="13">SEGMENT-SELECTOR</td>
<td><b>L</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>0</td> <td colspan="16">Offset 0..15</td></tr>
</table>



<table style="position:absolute; left:70pt; top:1280pt;
font-size:8pt;"border="4" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="550" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="16">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>

<td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
LDT/TSS---------[GDT]</b></u>[02/09]<pre>
TYPE:
0010 LDT
1001 TSS
1011 busy TSS
G: 4K granular limit
</pre></td></tr>
<tr align="center"> <td>E</td> <td colspan="16">reserved</td></tr>
<tr align="center"> <td>C</td> <td colspan="3"> * * * </td><td
colspan="5">0 0 0 0 0</td>
<td colspan="8">reserved</td></tr>
<tr align="center"> <td>A</td> <td colspan="16" rowspan="2"> BASE
32..63</td></tr>
<tr align="center"> <td>8</td> </tr>
<tr align="center"> <td>6</td> <td colspan="8">BASE 24..31</td>
<td colspan="4"><b> G x x * </td> <td colspan="4">LIM 16..19</td> </tr>
<tr align="center"> <td>4</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 T Y P E</td> <td colspan="8">BASE 16..23</td></tr>
<tr align="center"><td>2</td> <td colspan="16">BASE 0..15</td> </tr>
<tr align="center"><td>0</td> <td colspan="16">LIMIT 0..15</td></tr></table>

<p style="position:absolute; left:40pt; top:1500pt;">eop

</body>
</html>
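
For readers who would rather not render the HTML, the 8-byte DATA/CODE descriptor layout in the first two tables can be sketched as a C packing routine. The function name pack_descriptor and its parameter split (a separate access byte and a 4-bit flags nibble for G, B, 0, x) are choices made for this illustration only:

```c
#include <stdint.h>

/* Pack a PM16/32 code or data descriptor into its 8-byte form.
   access = byte 5 (P, DPL, type bits), e.g. 0x93 data, 0x9b code.
   flags  = upper nibble of byte 6 (G, B, 0, x), given as 0..15. */
void pack_descriptor(uint8_t out[8], uint32_t base, uint32_t limit,
                     uint8_t access, uint8_t flags)
{
    out[0] = (uint8_t)(limit & 0xFF);           /* LIMIT 0..7   */
    out[1] = (uint8_t)((limit >> 8) & 0xFF);    /* LIMIT 8..15  */
    out[2] = (uint8_t)(base & 0xFF);            /* BASE 0..7    */
    out[3] = (uint8_t)((base >> 8) & 0xFF);     /* BASE 8..15   */
    out[4] = (uint8_t)((base >> 16) & 0xFF);    /* BASE 16..23  */
    out[5] = access;                            /* P, DPL, type */
    out[6] = (uint8_t)(((flags & 0x0F) << 4) |  /* G, B, 0, x   */
                       ((limit >> 16) & 0x0F)); /* LIM 16..19   */
    out[7] = (uint8_t)((base >> 24) & 0xFF);    /* BASE 24..31  */
}
```

Note how the base is split: bits 0..23 sit in bytes 2..4 and bits 24..31 in byte 7, with the access byte and the flags/limit byte in between.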

muta...@gmail.com

Jul 14, 2021, 6:59:14 AM
On Wednesday, July 14, 2021 at 3:28:54 PM UTC+10, wolfgang kern wrote:

> > That's 256 MiB.

> > And that's a reason to get x'66' working on the 80386 and
> > 8086, so that I can have 1 GiB instead. Although the x'66'

Sorry, I meant 512 MiB because as you said, I need
separate cs and ds. But it's still double, and running
in PM32 instead of PM16 is cool regardless.

> >> But try what older EMM did: have only a few variable selectors
> >> and put your address bits extension right into the start field.
> >>
> >> you could even define this start-addresses as C-variables..
> >> oh, did I say this yet? :)
> >
> > I don't understand this proposal, but will it allow 8086
> > (with ignored x'66') huge memory model programs to
> > run unchanged on the 80386?

> I try yet to show you how descriptors look like:
> (copy as text, undo linewrap, rename to HTML and watch)

I have no idea what point you are trying to make. If you
write in German, I'll cut and paste into google translate
and hopefully see what you are talking about.

BFN. Paul.

wolfgang kern

Jul 14, 2021, 9:28:58 AM
On 14.07.2021 12:59, muta...@gmail.com wrote:
...
>> I try yet to show you how descriptors look like:
>> (copy as text, undo linewrap, rename to HTML and watch)

> I have no idea what point you are trying to make. If you
> write in German, I'll cut and paste into google translate
> and hopefully see what you are talking about.

I appended HTML-source (from my teachers help pages) to show you the
layout of available segment-selectors, no German sentence in there.

So you can see that bit shifts aren't a good choice for your task.
__
wolfgang





muta...@gmail.com

Jul 14, 2021, 2:18:59 PM
On Wednesday, July 14, 2021 at 11:28:58 PM UTC+10, wolfgang kern wrote:

> >> I try yet to show you how descriptors look like:
> >> (copy as text, undo linewrap, rename to HTML and watch)
>
> > I have no idea what point you are trying to make. If you
> > write in German, I'll cut and paste into google translate
> > and hopefully see what you are talking about.

> I appended HTML-source (from my teachers help pages) to show you the
> layout of available segment-selectors, no German sentence in there.

I didn't say there was German in there. I said that I would
be happy for you to make your point in German and I will
translate it.

> So you can see that bit shifts aren't a good choice for your task.

Again, I have no idea what you are talking about.

At this point I don't even know if you understand my proposal.

However, I have two more pieces of information:

1. I mentioned doing a divide followed by a multiply. Actually
that can be combined into a single divide so long as the
boundary is a multiple of 2 * GDT size. Which may tie you
down to a particular processor implementation. Probably
better to keep it separate.

2. Since you aren't sure whether the x'66' is a no-op on the
8086, can you suggest a bit of assembler that can be
entered into debug.com on MSDOS so that I can ask the
guy with an XT to try it out and provide the result? I don't
want to recompile all of my C programs with x'66' littered
throughout them only to find out that x'66' makes the
8086 explode because it is an alias of HCF.

Thanks. Paul.

wolfgang kern

Jul 14, 2021, 4:03:56 PM
On 14.07.2021 20:18, muta...@gmail.com wrote:

[about...]
my English is good enough to understand what you try to achieve.
but it seems you haven't got how segment descriptors are built
and how segment selectors interact with the GDT.

you sure can map many 64KB physical consecutive blocks to also
consecutive segment descriptors i.e.:

0008 base 0000_0000 code limit ffff
0010 base 0000_0000 data
0018 base 0001_0000
0020 base 0001_0000

but the selector must be calculated, simple shift won't help here.
so your idea to make it an easy solution fell flat :)
__
wolfgang
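
The four-entry mapping listed above follows a regular pattern, which can be written down as a small function. This is a sketch under the assumption that the pattern simply continues (code/data pairs, each pair covering the next 64KB block); the name base_for_selector is invented here, and the table-indicator and RPL bits are just discarded:

```c
#include <stdint.h>

/* Base address implied by the layout above:
     0008 code, 0010 data -> base 0000_0000
     0018 code, 0020 data -> base 0001_0000
   and so on, one code/data pair per 64 KiB block. */
uint32_t base_for_selector(uint16_t selector)
{
    uint16_t index = (uint16_t)(selector >> 3); /* drop TI and RPL bits */

    if (index == 0) {
        return 0;                               /* null selector */
    }
    return (uint32_t)((index - 1) / 2) * 0x10000UL;
}
```

Going the other way, from an address to a selector, is the calculation that has to be done whenever a pointer crosses a 64KB boundary.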

muta...@gmail.com

Jul 14, 2021, 6:18:41 PM
On Thursday, July 15, 2021 at 6:03:56 AM UTC+10, wolfgang kern wrote:

> my English is good enough to understand what you try to achieve.

It's not a question of good English skills. Native English
speakers often don't understand what I am proposing
either.

> but it seems you haven't got how segment descriptors are built
> and how segment selectors interact with the GDT.

Note that I have written an 80386 operating system, so
at some point I do get things, at least partially, but then
I forget them again (perhaps some mental deficiency),
which is why you had to make multiple corrections.

> you sure can map many 64KB physical consecutive blocks to also
> consecutive segment descriptors i.e.:
>
> 0008 base 0000_0000 code limit ffff
> 0010 base 0000_0000 data
> 0018 base 0001_0000
> 0020 base 0001_0000
>
> but the selector must be calculated, simple shift won't help here.
> so your idea to make it an easy solution fell flat :)

I have told you at least twice that my calculation will be
a division and a multiplication. Not a shift.

And the calculation doesn't start from a random value,
it starts from an existing segment/selector.

If I currently have a data (ie not cs) seg:off of 3000:0000,
then on the 80386 with effective 16-bit shifts the selector
would be linear x'30000' / 64k = 3, times x'10' (distance
between data selectors), + x'10' (starting point of data
selectors, not code selectors) = x'40'. If the program then
adds 64k to this pointer, so the new target is 4000:0000,
there will be a non-hardcoded divide and multiply. Because
we're doing 16-bit effective shifts, the divisor is 64k, and
because we're using an 80386, not some theoretical 99986
processor (with a distance of say 5 bytes), the distance
between data selectors is x'10', so the multiplier is x'10'.
Then you have:

x'40' + 1 * x'10' = x'50'

which will correctly get you to the next selector.
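
The same calculation as a sketch in C, using the numbers from this post (64k blocks, data selectors x'10' apart, first data selector at x'10'). The names data_selector and advance_selector are invented for the example, and in the real scheme the constants would come from the OS rather than being hardcoded:

```c
#include <stdint.h>

#define BLOCK_SIZE     0x10000UL  /* divisor for an effective 16-bit shift */
#define DATA_SEL_STEP  0x10       /* distance between data selectors */
#define FIRST_DATA_SEL 0x10       /* selector of the first data block */

/* Data selector covering a given linear address. */
uint16_t data_selector(uint32_t linear)
{
    return (uint16_t)(FIRST_DATA_SEL + (linear / BLOCK_SIZE) * DATA_SEL_STEP);
}

/* Move an existing data selector forward by `bytes` (divide, then multiply). */
uint16_t advance_selector(uint16_t selector, uint32_t bytes)
{
    return (uint16_t)(selector + (bytes / BLOCK_SIZE) * DATA_SEL_STEP);
}
```

Here data_selector(0x30000) is x'40', and advancing x'40' by 64k gives x'50', the same value data_selector(0x40000) returns.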

On an 8086 the divisor (provided by the OS to the application
at startup time, exactly the same way when running on an
80386) will always be 16 and the multiplier will always
be 1, so to jump 64k starting at 3000:0000 you will have

64k / 16 = 4096 * 1 = 4096 = x'1000' added to x'3000' and
you have x'4000' so you end up at 4000:0000.

Which bit won't work?

I'm sure plenty of people won't like the design, but will it
work or not?

If it works then the next question is - does the 8086 ignore
x'66' and advance to the next byte? Or can x'66' be trapped
and ignored? Or if it is an alias, what effect does it have,
and can memory be structured to keep that x'66' from
harming anything important?

If the answer to the above is "ok it will work, but the trapping
will make it 5% slower", that is within my 10% flexibility, so
the next question is:

What 16-bit instructions are available on the 80386 if I code
x'66' or x'67' wherever necessary?

And finally, with the above instructions, can a Turing
machine be constructed on the 8086 and if so, how
much slower would a C program be compared to if
I had access to the full range of 8086 instructions?

Since this is my own C applications we are talking
about, I probably don't care (even if the year is 1988)
if my program runs 3 times slower than the machine
code generated by the same C compiler if it has
access to the full range of 8086 instructions rather
than just the subset that work on 80386. I'm more
interested in producing a future-proofed 16-bit
executable than being "first and best" in 1988.

BFN. Paul.

muta...@gmail.com

Jul 14, 2021, 8:36:30 PM
On Thursday, July 15, 2021 at 8:18:41 AM UTC+10, muta...@gmail.com wrote:

> If it works then the next question is - does the 8086 ignore
> x'66' and advance to the next byte? Or can x'66' be trapped
> and ignored? Or if it is an alias, what effect does it have,
> and can memory be structured to keep that x'66' from
> harming anything important?

And same question for a real 80286. There was no
32-bit "other" to worry about.

BFN. Paul.

wolfgang kern

Jul 15, 2021, 3:02:54 AM
On 15.07.2021 02:36, muta...@gmail.com wrote:
> On Thursday, July 15, 2021 at 8:18:41 AM UTC+10, muta...@gmail.com wrote:
>
>> If it works then the next question is - does the 8086 ignore
>> x'66' and advance to the next byte? Or can x'66' be trapped
>> and ignored? Or if it is an alias, what effect does it have,
>> and can memory be structured to keep that x'66' from
>> harming anything important?

trap seems impossible (would mean all code run single step debug),
and who knows if it acts as a NOP on all 8086 ?

> And same question for a real 80286. There was no
> 32-bit "other" to worry about.

some brands may ignore it while others may have it as an alias,
so you better forget about using 66 in both worlds.

[your segment descriptor calculation...]
you still haven't realized that base bits 0..23 and 24..31 are
NOT adjacent. There are two bytes in between.
You would have known this if you already wrote a 386 OS.
__
wolfgang

muta...@gmail.com

Jul 15, 2021, 4:45:04 AM
On Thursday, July 15, 2021 at 5:02:54 PM UTC+10, wolfgang kern wrote:

> >> If it works then the next question is - does the 8086 ignore
> >> x'66' and advance to the next byte? Or can x'66' be trapped
> >> and ignored? Or if it is an alias, what effect does it have,
> >> and can memory be structured to keep that x'66' from
> >> harming anything important?

> trap seems impossible (would mean all code run single step debug),
> and who knows if it acts as a NOP on all 8086 ?

There are different types of 8086???

> [your segment descriptor calculation...]
> you still haven't realized that base bits 0..23 and 24..31 are
> NOT adjacent. There are two bytes in between.
> You would have known this if you already wrote a 386 OS.

Why do I care if the base bits are not adjacent?

BFN. Paul.

muta...@gmail.com

Jul 15, 2021, 6:40:31 AM
On Thursday, July 15, 2021 at 6:45:04 PM UTC+10, muta...@gmail.com wrote:

> > you still haven't realized that base bits 0..23 and 24..31 are
> > NOT adjacent. There are two bytes in between.
> > You would have known this if you had already written a 386 OS.

> Why do I care if the base bits are not adjacent?

Just to be clear.

When PDOS/86 is running on an 80386, at startup time, it will
set all 16384 selectors to map the first 512 MiB of memory,
and then never ever change them.

When PDOS/86 is running on an 8086, at startup time, it will
do nothing.

When PDOS/86 is running in those two different environments,
when it loads a 16-bit APPLICATION into memory, it will adjust all
the segments (a normal part of the load process), but the
adjustment will be completely different for these 2 different CPUs.

Tiny, small, medium, compact and large memory model programs
never manipulate segment registers; they only ever load
them (at least the applications supported by PDOS/86).

Only huge memory model programs manipulate the segment
registers. The PDOS/86 rule for huge memory model programs
is this: when a value of e.g. 64k is added to a pointer, the
segment needs to be adjusted by whatever the representation
of 64k is, and that adjustment is done with a divide followed
by a multiply. Both the value to divide by and the value to
multiply by must be obtained from the operating system when
the huge memory model application begins execution. The
values from PDOS/86 when running on an 8086 will always be
the same. The values from PDOS/86 when running on an 80386
will always be very different from the values on an 8086.

Two separate PDOS/86 systems, both running on an
80386 with 512 MiB memory will have the exact same
values for the divide and multiply.

But if one PDOS/86 is running on an 80386 with 256 MiB
of memory, it will have a different divisor. The smaller
divisor allows finer granularity, for the same reason that
Intel made the 8086 shift 4 bits instead of 16.

The multiplier on the 80386 will always be 0x10, because
that is the distance between two GDT data (same thing
applies to code) entries.
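
A rough sketch in C of the divide-then-multiply adjustment described above. The struct, the function name and the 80386 divisor are made up for illustration. Only the 0x10 multiplier and the 8086 behaviour (adding 64k moves the segment value by 0x1000, because of the 4-bit shift) come from the discussion.

```c
#include <stdint.h>

/* Values the OS is assumed to hand to a huge-model program at startup.
   The struct and its field names are hypothetical. */
struct seg_scale {
    uint32_t divisor;     /* bytes covered by one segment "step" */
    uint16_t multiplier;  /* how far the segment value moves per step */
};

/* 8086: 4-bit shift, so one step is 16 bytes and moves the segment by 1. */
static const struct seg_scale SCALE_8086 = { 16, 1 };

/* 80386 example. The multiplier 0x10 is the figure quoted above; the
   divisor is a placeholder, since it depends on installed memory. */
static const struct seg_scale SCALE_386_EXAMPLE = { 0x10000UL, 0x10 };

/* Advance a segment value by byte_delta bytes.
   byte_delta is assumed to be a multiple of s.divisor. */
uint16_t seg_advance(uint16_t seg, uint32_t byte_delta, struct seg_scale s)
{
    return (uint16_t)(seg + (byte_delta / s.divisor) * s.multiplier);
}
```

With SCALE_8086, seg_advance(0x2000, 0x10000, SCALE_8086) gives 0x3000, which is the familiar "add 0x1000 to the segment" step. The application code is the same on both CPUs; only the two numbers differ.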

As far as I can tell, based on your mention of base values
not being adjacent in the GDT entry, you do not understand
my proposal.

I'm not saying that's your fault. I'm not very good at
explaining things, and it's only when I have working code
and can say "look at that", that people understand.

Although sometimes I actually show it working and people
still say it won't work.

BFN. Paul.

Joe Monk

Jul 15, 2021, 8:39:27 AM

> When PDOS/86 is running on an 80386, at startup time, it will
> set all 16384 selectors to map the first 512 MiB of memory,
> and then never ever change them.

First off, there aren't 16384 selectors on an 80386.

The index field of a selector is only 13 bits wide. 2^13 = 8192.

Bits 0-1 of a selector are the privilege level, and bit 2 specifies which descriptor table (GDT or LDT) you are indexing into.

https://pdos.csail.mit.edu/6.828/2005/readings/i386/c05.htm
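
The field layout above can be sketched in C as follows. The struct and function names are invented for this example; the bit positions are the ones just described (bits 0-1 privilege level, bit 2 table indicator, bits 3-15 index).

```c
#include <stdint.h>

/* Decoded fields of a 16-bit protected-mode selector. */
struct selector_fields {
    unsigned rpl;    /* bits 0-1: requested privilege level */
    unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT */
    unsigned index;  /* bits 3-15: descriptor index, 0..8191 */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.rpl = sel & 0x3u;
    f.ti = (sel >> 2) & 0x1u;
    f.index = (unsigned)sel >> 3;
    return f;
}
```

For example, selector 0x000F decodes to index 1 in the LDT with privilege level 3.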

Joe

Joe Monk

Jul 15, 2021, 9:21:27 AM

> When PDOS/86 is running on an 80386, at startup time, it will
> set all 16384 selectors to map the first 512 MiB of memory,
> and then never ever change them.

If you're talking about running in real mode, then you should take note:

The 80386 provides a one Mbyte + 64 Kbyte memory space for an 8086 program. Segment relocation is performed as in the 8086: the 16-bit value in a segment selector is shifted left by four bits to form the base address of a segment. The effective address is extended with four high order zeros and added to the base to form a linear address as Figure 14-1 illustrates. (The linear address is equivalent to the physical address, because paging is not used in real-address mode.) Unlike the 8086, the resulting linear address may have up to 21 significant bits. There is a possibility of a carry when the base address is added to the effective address. On the 8086, the carried bit is truncated, whereas on the 80386 the carried bit is stored in bit position 20 of the linear address.

https://pdos.csail.mit.edu/6.828/2005/readings/i386/s14_01.htm
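
As a small illustration of the quoted paragraph, here is the real-mode address calculation in C. The function names are made up; the arithmetic (segment shifted left by four, plus offset, with the carry kept on the 80386 and dropped on the 8086) is what the manual text describes.

```c
#include <stdint.h>

/* 80386 real mode: the carry out of bit 19 is kept, so the result
   can have up to 21 significant bits. */
uint32_t linear_386(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* 8086: only 20 address lines, so the carry is truncated and the
   address wraps around to low memory. */
uint32_t linear_8086(uint16_t seg, uint16_t off)
{
    return linear_386(seg, off) & 0xFFFFFUL;
}
```

FFFF:0010 is the classic case: it is linear 0x100000 on the 80386 but wraps to 0 on the 8086.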

Joe

Joe Monk

Jul 15, 2021, 9:25:49 AM
And finally, just a little light reading on the differences between 8086 real mode and 80386 real mode.

https://pdos.csail.mit.edu/6.828/2005/readings/i386/s14_07.htm

Joe

anti...@math.uni.wroc.pl

Jul 15, 2021, 11:11:09 AM
Depends very much on what "works" means. Running unmodified
86 binaries: AFAICS no. You are creating a virtual machine
which with a little care (and assuming divisor 16) will run the
same binary on 86 and 386.

> If it works then the next question is - does the 8086 ignore
> x'66' and advance to the next byte? Or can x'66' be trapped
> and ignored? Or if it is an alias, what effect does it have,
> and can memory be structured to keep that x'66' from
> harming anything important?

There were several 86 processors: the original 86, the 88 (in the original PC),
"compatible" processors by AMD and Harris, and the NEC variant (which
had an 8080 emulation mode, so certainly used a different mask). Some machines
used the 186 (which at the lowest level was incompatible, but some
manufacturers pushed "PC-s" based on them), and there is the 286. There
were various steppings and speed grades. Steppings fixed bugs
and were supposed to make no change to documented instructions,
but it is hard to tell what could happen to undocumented stuff.

> If the answer to the above is "ok it will work, but the trapping
> will make it 5% slower", that is within my 10% flexibility, so
> the next question is:

Hmm. On 86 multiply and divide take time comparable to tens
of simpler instructions. You also have problem of finding
place to keep your divisor. If you keep divisor in register,
that alone will blow up your 10% budget (losing a register on
86 is closer to 20% drop in performance) and introduce serious
incompatibility with traditional 86 code. If you keep divisor in
memory there will be extra segment manipulations to fetch
divisor.

> What 16-bit instructions are available on the 80386 if I code
> x'66' or x'67' wherever necessary?
>
> And finally, with the above instructions, can a Turing
> machine be constructed on the 8086 and if so, how
> much slower would a C program be compared to if
> I had access to the full range of 8086 instructions?
>
> Since this is my own C applications we are talking
> about, I probably don't care (even if the year is 1988)
> if my program runs 3 times slower than the machine
> code generated by the same C compiler if it has
> access to the full range of 8086 instructions rather
> than just the subset that work on 80386. I'm more
> interested in producing a future-proofed 16-bit
> executable than being "first and best" in 1988.

Well, if you have C source, then there is obvious way
to get good speed: recompile for newer machine.
Note that fancy "binary translation" schemes claim
speed comparable to native speed. Even less fancy
schemes should be not much worse than 3 times slower
compared to native. Your scheme for huge model
programs is likely to lead to much more significant
slowdown.

BTW: Forth threaded code scheme leads to speed loss
2-4 times compared to machine code, is quite easy to
implement and "interpreter" is tiny.
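
To give a feel for how small such an interpreter can be, here is a toy threaded-code loop in C. It is not a real Forth and every name in it is invented for this example: the "program" is just an array of pointers to primitives (plus inline literals), and the inner interpreter is a two-line loop.

```c
typedef struct vm vm;

/* A cell of threaded code is either a pointer to a primitive or an
   inline literal consumed by the preceding primitive. */
typedef union cell {
    void (*fn)(vm *);
    long n;
} cell;

struct vm {
    const cell *ip;   /* next cell to execute */
    long stack[16];   /* data stack (no overflow checks in this sketch) */
    int sp;           /* number of items on the stack */
    int running;
};

static void op_lit(vm *m)  { m->stack[m->sp++] = (m->ip++)->n; }
static void op_add(vm *m)  { m->sp--; m->stack[m->sp - 1] += m->stack[m->sp]; }
static void op_mul(vm *m)  { m->sp--; m->stack[m->sp - 1] *= m->stack[m->sp]; }
static void op_halt(vm *m) { m->running = 0; }

/* The whole "inner interpreter": fetch the next cell and call it. */
long run_thread(const cell *thread)
{
    vm m = { thread, {0}, 0, 1 };
    while (m.running)
        (m.ip++)->fn(&m);
    return m.sp > 0 ? m.stack[m.sp - 1] : 0;
}

/* (2 + 3) * 4 written as a thread. */
static const cell demo_thread[] = {
    { .fn = op_lit }, { .n = 2 },
    { .fn = op_lit }, { .n = 3 },
    { .fn = op_add },
    { .fn = op_lit }, { .n = 4 },
    { .fn = op_mul },
    { .fn = op_halt }
};
```

Every primitive costs an indirect call on top of its real work, which is roughly where a slowdown of a few times relative to straight machine code comes from.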

OTOH politicians and religious leaders understand quite
well that ideological purity is more important than real
world performance...

--
Waldek Hebisch

Scott Lurndal

Jul 15, 2021, 11:47:19 AM
anti...@math.uni.wroc.pl writes:
>muta...@gmail.com <muta...@gmail.com> wrote:
>> On Thursday, July 15, 2021 at 6:03:56 AM UTC+10, wolfgang kern wrote:

>> If the answer to the above is "ok it will work, but the trapping
>> will make it 5% slower", that is within my 10% flexibility, so
>> the next question is:
>
>Hmm. On 86 multiply and divide take time comparable to tens
>of simpler instructions.
Perhaps in 1985.

2008:

IMUL has a latency of 10 cycles (i.e. it completes in 10 cycles).
It has a throughput of 1 cycle (i.e. there must be one cycle between
successive imuls).

IDIV has circa 70 cycle latency, with 30 cycle throughput.

FMUL has 5 cycle latency and 2 cycle throughput.

(These numbers were as of December 2008. I'm sure they've gotten
better since, so I downloaded the latest copy of 64-ia-32-architectures-optimization-manual.pdf.)

2021:

IMUL is down to 3 cycle latency with 1 cycle throughput.
IDIV latencies aren't listed, but throughput for 32-bit integer division is about 20 cycles
and 90 cycles for 64-bit division.
MUL r64 has 4-5 cycle latency and 1 cycle throughput.


8. The throughput of "DIV/IDIV r64" varies with the number of significant digits in the input RDX:RAX.
The throughput is significantly higher if RDX input is 0, similar to those of "DIV/IDIV r32". If RDX is
not zero, the throughput is significantly lower, as shown in the range. The throughput decreases
(increasing numerical value in cycles) with increasing number of significant bits in the input RDX:RAX
(relative to the number of significant bits of the divisor) or the output quotient. The latency of
"DIV/IDIV r64" also varies with the significant bits of input values. For a given set of input values, the
latency is about the same as the throughput in cycles.

9. The throughput of "DIV/IDIV r32" varies with the number of significant digits in the input EDX:EAX
and/or of the quotient of the division for a given size of significant bits in the divisor r32. The
throughput decreases (increasing numerical value in cycles) with increasing number of significant
bits in the input EDX:EAX or the output quotient. The latency of "DIV/IDIV r32" also varies with the
significant bits of the input values. For a given set of input values, the latency is about the same as
the throughput in cycles.

>You also have problem of finding
>place to keep your divisor. If you keep divisor in register,

Good thing they added 8 more registers to x86_64 :-).

Not to mention ARMv8 with 32 registers.

anti...@math.uni.wroc.pl

Jul 15, 2021, 12:04:25 PM
Scott Lurndal <sc...@slp53.sl.home> wrote:
> anti...@math.uni.wroc.pl writes:
> >muta...@gmail.com <muta...@gmail.com> wrote:
> >> On Thursday, July 15, 2021 at 6:03:56 AM UTC+10, wolfgang kern wrote:
>
> >> If the answer to the above is "ok it will work, but the trapping
> >> will make it 5% slower", that is within my 10% flexibility, so
> >> the next question is:
> >
> >Hmm. On 86 multiply and divide take time comparable to tens
> >of simpler instructions.
> Perhaps in 1985.

Yes. Paul wants to run on real 86 (or rather 88) from 1982.

> 2008:
>
> IMUL has a latency of 10 cycles (i.e. it completes in 10 cycles).
> It has a throughput of 1 cycle (i.e. there must be one cycle between
> successive imuls).
>
> IDIV has circa 70 cycle latency, with 30 cycle throughput.

Note that this is not so great: peak is 4 instructions per cycle,
average between 1 and 2 per cycle, so this throughput is comparable
to tens of normal instructions. It seems that division goes
in parallel with other instructions, so as long as the number of
divisions is moderate the overall impact is limited. The figures that
I saw were better than what you give above. But Paul wants to put
multiply and divide in the data fetch path. It is likely that his
performance will be limited by latency, so quite bad...

> 2021:
>
> IMUL is down to 3 cycle latency with 1 cycle throughput.
> IDIV latencies aren't listed, but throughput for 32-bit integer division is about 20 cycles
> and 90 cycles for 64-bit division.
> MUL r64 has 4-5 cycle latency and 1 cycle throughput.

Yes, better than older figures but still not so great (especially
concerning division).

> >You also have problem of finding
> >place to keep your divisor. If you keep divisor in register,
>
> Good thing they added 8 more registers to x86_64 :-).
>
> Not to mention ARMv8 with 32 registers.

But Paul insists on code that runs on the original 86, so he does
not want to use extra registers.

--
Waldek Hebisch

Scott Lurndal

Jul 15, 2021, 1:10:41 PM
anti...@math.uni.wroc.pl writes:
>Scott Lurndal <sc...@slp53.sl.home> wrote:
>> anti...@math.uni.wroc.pl writes:

>> >Hmm. On 86 multiply and divide take time comparable to tens
>> >of simpler instructions.
>> Perhaps in 1985.
>
>Yes. Paul wants to run on real 86 (or rather 88) from 1982.
>

>> IMUL is down to 3 cycle latency with 1 cycle throughput.
>> IDIV latencies aren't listed, but throughput for 32-bit integer division is about 20 cycles
>> and 90 cycles for 64-bit division.
>> MUL r64 has 4-5 cycle latency and 1 cycle throughput.
>
>Yes, better than older figures but still not so great (especially
>concerning division).

Although if the operations he's contemplating are
by powers of two, then bit shifts are single-cycle high-throughput
operations (ARMv8 actually includes bit shifts in some operands automatically).

muta...@gmail.com

Jul 15, 2021, 3:53:13 PM
On Thursday, July 15, 2021 at 10:39:27 PM UTC+10, Joe Monk wrote:

> > When PDOS/86 is running on an 80386, at startup time, it will
> > set all 16384 selectors to map the first 512 MiB of memory,
> > and then never ever change them.

> First off, there aren't 16384 selectors on an 80386.
>
> The index field of a selector is only 13 bits wide. 2^13 = 8192.
>
> Bits 0-1 of a selector are the privilege level, and bit 2 specifies which descriptor table (GDT or LDT) you are indexing into.

In my scheme, as far as I can tell, I can line up the 8192
LDT selectors after the 8192 GDT selectors for a total
of 16384 selectors.
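
One way to picture "LDT after GDT" is a slot number from 0 to 16383 that gets turned into a selector value. This is only a sketch of that idea: the slot numbering and the function name are assumptions made for the example, not something settled in this thread. Note also that GDT index 0 is the null descriptor, so slot 0 could not actually be used to access memory.

```c
#include <stdint.h>

/* Map a slot number 0..16383 to a selector value with RPL 0.
   Slots 0..8191 are assumed to index the GDT (TI = 0) and slots
   8192..16383 the LDT (TI = 1). */
uint16_t selector_from_slot(unsigned slot)
{
    unsigned ti = (slot >> 13) & 1u;     /* which table */
    unsigned index = slot & 0x1FFFu;     /* 13-bit index within it */
    return (uint16_t)((index << 3) | (ti << 2));
}
```

With this numbering, slot 8191 is selector 0xFFF8 (last GDT entry) and slot 8192 is selector 0x0004 (first LDT entry), so the selector values are not numerically contiguous across the two tables.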

I'm trying to run 8086 programs.

Selective 8086 programs.

Specifically programs that deliberately or accidentally
followed rules that I'm making up right now. Once I know
what the rules are, I'll make Open Watcom or some other
compiler conform to the rules. And then I'll recompile
all my programs, now that I know how to "program
properly".

Note that when I created AM32 I found that PDPCLIB
had forced AM31 in some places (in mvssupa) requiring
me to fix that, and recompile all my applications.

BFN. Paul.

muta...@gmail.com

Jul 15, 2021, 3:55:11 PM
On Thursday, July 15, 2021 at 11:21:27 PM UTC+10, Joe Monk wrote:

> > When PDOS/86 is running on an 80386, at startup time, it will
> > set all 16384 selectors to map the first 512 MiB of memory,
> > and then never ever change them.

> If you're talking about running in real mode, then you should take note:

No, I'm running 8086 programs in PM32. That's why
there is discussion about x'66' - I'm trying to see if I
can get some extra 8086 instructions available. They
may or may not be required for a Turing machine, I
am not familiar with that.

BFN. Paul.

muta...@gmail.com

Jul 15, 2021, 4:08:27 PM
On Friday, July 16, 2021 at 1:11:09 AM UTC+10, anti...@math.uni.wroc.pl wrote:

> Depends very much on what "works" means. Running unmodified
> 86 binaries: AFAICS no.

Do you mean running ALL unmodified 8086 binaries?

If so, you are correct.

But all C-generated code, other than huge memory
model, doesn't manipulate the segment registers
as far as hardcoding a 4-bit shift.

So. It depends. Some 8086 binaries will work, some
won't.

I can ensure that all of my C90-compliant programs
work, since even in huge memory model I have control
of the C library with PDPCLIB.

> You are creating a virtual machine

Why are you calling it a virtual machine??? By what
definition? Because it doesn't use the full capability
of the 80386?

> which with a little care (and assuming divisor 16) will run the
> same binary on 86 and 386.

Why am I assuming a divisor of 16?

> Hmm. On 86 multiply and divide take time comparable to tens
> of simpler instructions. You also have problem of finding
> place to keep your divisor. If you keep divisor in register,
> that alone will blow up your 10% budget (losing a register on
> 86 is closer to 20% drop in performance) and introduce serious
> incompatibility with traditional 86 code. If you keep divisor in
> memory there will be extra segment manipulations to fetch
> divisor.

I'm not worried at all about performance of huge memory
model programs, which are the only ones that need to do
the divide and multiply.

That's the price you pay when you exceed 64k in a single
data structure on a machine that was only designed to
run 16-bit programs. There comes a point when you're
supposed to be recompiling as 32-bit. If you insist on
maintaining 8086 compatibility, then you will take a
performance hit.

All other memory models will run full speed, other than
any x'66' that are being skipped.

> Well, if you have C source, then there is obvious way
> to get good speed: recompile for newer machine.

Sure. But I'm trying to construct the best 16-bit
machine at the moment, using existing tools.

Ideally I can have EFFECTIVE 16-bit segment shifts
and address an entire 4 GiB, but they didn't provide
enough selectors for that, and I can only get 512
MiB, so the equivalent of an 8086 with a 13-bit
segment shift instead of a 4-bit segment shift.
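
The arithmetic behind those figures, as a tiny C sketch (the function name is invented): a 16-bit segment value has 65536 possible values, and each step of the segment moves the base by 2^shift bytes, so the bases span 65536 << shift bytes.

```c
#include <stdint.h>

/* Bytes spanned by the segment bases when a 16-bit segment value
   is shifted left by `shift` bits to form the base address. */
uint64_t segment_reach(unsigned shift)
{
    return (uint64_t)65536 << shift;
}
```

A shift of 4 gives 1 MiB (the 8086), 13 gives 512 MiB, and 16 gives the full 4 GiB.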

> Note that fancy "binary translation" schemes claim
> speed comparable to native speed. Even less fancy
> schemes should be not much worse than 3 times slower
> compared to native. Your scheme for huge model
> programs is likely to lead to much more significant
> slowdown.

Huge memory model is rare - I'm not worried about that.
Even if it is 3 times slower (the whole application). It
would be good if other memory models ran at full speed
though, but I don't know how many of the instructions
disappear in PM32.

BFN. Paul.

muta...@gmail.com

Jul 15, 2021, 4:14:26 PM
On Friday, July 16, 2021 at 3:10:41 AM UTC+10, Scott Lurndal wrote:

> Although if the operations he's contemplating are
> by powers of two, then bit shifts are single-cycle high-throughput
> operations (ARMv8 actually includes bit shifts in some operands automatically).

They are in fact always going to be powers of two, and
I could in fact get a right-shift and a left-shift value from
the OS instead of divide and multiply values.
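
A quick C sketch of why the two forms are interchangeable when both values are powers of two. The function names and parameters are hypothetical; the point is only that a divide by 2^r followed by a multiply by 2^l gives the same segment adjustment as a right shift by r followed by a left shift by l.

```c
#include <stdint.h>

/* Divide/multiply form: the OS supplies a divisor and a multiplier. */
uint16_t seg_adjust_divmul(uint16_t seg, uint32_t byte_delta,
                           uint32_t divisor, uint32_t multiplier)
{
    return (uint16_t)(seg + (byte_delta / divisor) * multiplier);
}

/* Shift form: the OS supplies shift counts instead. */
uint16_t seg_adjust_shift(uint16_t seg, uint32_t byte_delta,
                          unsigned rshift, unsigned lshift)
{
    return (uint16_t)(seg + ((byte_delta >> rshift) << lshift));
}
```

For the 8086 case, divisor 16 and multiplier 1 correspond to a right shift of 4 and a left shift of 0, and both forms turn segment 0x2000 plus 64k into 0x3000.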

That's a great idea. It does rely on the distance between
selectors being a power of 2 though. Is it wise to mandate
such a processor?

Also, it is sounding like this will work in PM32. Will it also
work in long mode? I'm not familiar with their selectors
or anything else.

Thanks. Paul.

Scott Lurndal

Jul 15, 2021, 4:27:05 PM
Long mode doesn't use selectors. There is one flat physical address
space (2^40 to 2^64 bytes in size depending on processor implementation).
Once paging is enabled, the virtual address is a full 64-bits.

Segment registers (e.g. %fs, %gs) can be loaded with an address that will
be summed with the computed operand address when the segment override is
used. There are special instructions to load and swap %fs values. Linux
uses a segment register to point to per-core operating system data.

This is all documented in the Intel reference manuals, which you
should be intimately familiar with if you desire to write an operating
system for the intel processor family.

muta...@gmail.com

Jul 15, 2021, 4:53:28 PM
On Friday, July 16, 2021 at 6:27:05 AM UTC+10, Scott Lurndal wrote:

> >Also, it is sounding like this will work in PM32. Will it also
> >work in long mode? I'm not familiar with their selectors
> >or anything else.

> Long mode doesn't use selectors. There is one flat physical address
> space (2^40 to 2^64 bytes in size depending on processor implementation).

Ok, thanks. In that case, my huge memory model programs,
on a machine with 4 GiB of memory, should be able to do
a divide by 64k followed by a multiply by 1, right?

Do I lose the ability to use some 16-bit instructions,
even compared to PM32?

Scratch that. I just realized - my 8086 programs need
to manipulate segment registers. PM32 with 512 MiB
addressability is as high as I can go unless some other
processor provides more selectors. Or instead of
selectors, flexible shifts instead of a hardcoded 4.
Well, a hardcoded 16 would be fine these days too,
if such a processor (or mode) is made.

> Once paging is enabled, the virtual address is a full 64-bits.

I'm not interested in paging except if it fixes a problem
with what I wanted to do un-paged, e.g. fill holes caused
by the A20 line not being enabled, or blocking NULL pointer
assignment, or protecting low memory so the OS can
protect itself. But I don't want to consider that for the
moment. I'm still trying to write 8086 programs "properly".

> Segment registers (e.g. %fs, %gs) can be loaded with an address that will
> be summed with the computed operand address when the segment override is
> used. There are special instructions to load and swap %fs values. Linux
> uses a segment register to point to per-core operating system data.

Ok, I don't need that at the moment. I'm just trying to
get the equivalent of MSDOS to work. Single core.

> This is all documented in the Intel reference manuals, which you
> should be intimately familiar with if you desire to write an operating
> system for the intel processor family.

If I had done that, I would have been stuck in a rut.

It took more than 10 years I think for "separate memory"
to be invented (by somitcw) to solve the problem of
multiple ATL address spaces, despite the solution being
very obvious and conceptually simple once verbalized.
Despite plenty of good brains being thrown at the
problem. Absolutely everyone, including him, was stuck
in a rut. Even then he didn't create it in a vacuum, but as
part of yet another discussion on how to solve the problem.

I just need a little bit of specific knowledge to help me on
my way for the ultimate 8086+ processor. If that matches
what Intel/AMD came up with, cool.

BFN. Paul.

Joe Monk

Jul 15, 2021, 5:30:44 PM