
Call-gate-like mechanism


Glen Herrmannsfeldt
Nov 4, 2003, 12:13:43 AM

"Robert Wessel" <robert...@yahoo.com> wrote in message
news:bea2590e.03110...@posting.google.com...
> "Glen Herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
news:<wdZob.82551$e01.268843@attbi_s02>...

> > I have wondered before about segment descriptors, and which processors
would
> > supply a cache for them, speeding up the loading of segment selectors.
I
> > have asked before and never received an unambiguous answer. Such a
cache
> > should help the context switch time for this case, as well as ordinary
> > segment selector loads.

> To the best of my knowledge, no x86s implement any kind of special
> descriptor cache (except the required architectural one). OTOH, all
> x86s do cache descriptor table entries in the normal caches (at least
> those x86s that have caches).

I never found anything in intel documentation describing it, but someone in
comp.sys.intel, I thought from intel, seemed to say there was. Though maybe
they meant the regular cache. Somehow I never thought about the effect of
the regular cache. Mostly I was doing this on a 486, but then it had a
cache, too.

Now this reminds me of the question that has come up at various times
related to PAE and addressing more than 4G. I had thought it would require
OS support, trapping on loading segment selectors describing memory space
not described in the page tables and modifying the page tables. I still
think that is what would be required, but an intel web page seems to say
that loading segment selectors would be enough. In any case, I don't know
of any OS that implements it.

-- glen


Terje Mathisen
Nov 4, 2003, 2:03:34 AM

Glen Herrmannsfeldt wrote:
> "Robert Wessel" <robert...@yahoo.com> wrote in message
>>To the best of my knowledge, no x86s implement any kind of special
>>descriptor cache (except the required architectural one). OTOH, all
>>x86s do cache descriptor table entries in the normal caches (at least
>>those x86s that have caches).

I seem to remember that AMD had at least one chip which had some form of
caching/virtualization of segment entries, specifically to speed up
16-bit code, but I might be wrong.


>
> I never found anything in intel documentation describing it, but someone in
> comp.sys.intel, I thought from intel, seemed to say there was. Though maybe
> they meant the regular cache. Somehow I never thought about the effect of
> the regular cache. Mostly I was doing this on a 486, but then it had a
> cache, too.
>
> Now this reminds me of the question that has come up at various times
> related to PAE and addressing more than 4G. I had thought it would require
> OS support, trapping on loading segment selectors describing memory space
> not described in the page tables and modifying the page tables. I still
> think that is what would be required, but an intel web page seems to say
> that loading segment selectors would be enough. In any case, I don't know
> of any OS that implements it.

The application would only need to load segment selectors, but the OS
would have to be involved as soon as you did any load or store operation
pointing to currently unmapped regions.

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Robert Wessel
Nov 4, 2003, 1:57:46 PM

"Glen Herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message news:<bAGpb.99717$HS4.841226@attbi_s01>...

> Now this reminds me of the question that has come up at various times
> related to PAE and addressing more than 4G. I had thought it would require
> OS support, trapping on loading segment selectors describing memory space
> not described in the page tables and modifying the page tables. I still
> think that is what would be required, but an intel web page seems to say
> that loading segment selectors would be enough. In any case, I don't know
> of any OS that implements it.


On all 32 bit x86s, the linear address space into which segments are
mapped is 32 bits (4GB) in size, PAE or no PAE. You can define 16K
segments, but all would point into the one 4GB linear address space.

PAE allows you to have more physical pages than fit in a single
address space. At any time a single process can only address 4GB
worth of memory, but several independent processes could (together)
use rather more.
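
For concreteness, a small illustrative sketch (not from the original post) of how PAE splits a 32-bit linear address with 4KB pages: the linear address stays 32 bits; only the 64-bit table entries can name physical pages above 4GB.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: split a 32-bit linear address the way PAE paging
 * does with 4KB pages -- 2 bits of PDPT index, 9 bits of page-directory
 * index, 9 bits of page-table index, 12 bits of page offset.  The table
 * entries themselves are 64 bits wide, which is where physical addresses
 * above 4GB go; the linear address never gets any wider. */
int main(void)
{
    uint32_t lin = 0xC0801234u;              /* arbitrary example address */

    uint32_t pdpt_idx = (lin >> 30) & 0x3;   /* 1 of 4 PDPT entries */
    uint32_t pd_idx   = (lin >> 21) & 0x1FF; /* 1 of 512 page-directory entries */
    uint32_t pt_idx   = (lin >> 12) & 0x1FF; /* 1 of 512 page-table entries */
    uint32_t offset   =  lin        & 0xFFF; /* byte within the 4KB page */

    printf("PDPT=%u PD=%u PT=%u offset=0x%03x\n",
           pdpt_idx, pd_idx, pt_idx, offset);
    return 0;
}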

An OS *could* simulate a larger linear address space by playing games
with the page and segment tables every time a segment register was
reloaded (the OS would mark all segments not currently mapped into the
"real" 4GB space as not present, and then rearrange things as needed
on a seg fault). The OS would need to limit segment sizes in order to
pull this off, since there may be 6 different segments addressable at
any one time, and the OS would need to be able to map an arbitrary
collection of six segments into a single 4GB space (less any OS
requirement and whatnot). Performance would be awful though for any
segment register loads that needed a page table remapping operation.
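
A purely hypothetical sketch of that scheme, treating the 4GB linear space as a set of windows. Every type and helper name below is invented for illustration; this is not code from any real OS.

#include <stdint.h>

#define NO_WINDOW 0xFFFFFFFFu

/* Hypothetical bookkeeping for one "big" segment. */
struct big_seg {
    uint64_t phys_base;    /* where its pages live; may be above 4GB under PAE */
    uint32_t byte_size;    /* descriptor limit + 1 */
    uint32_t linear_base;  /* current window in the 4GB linear space, or NO_WINDOW */
};

/* Invented helpers, declared only so the sketch is self-contained. */
struct big_seg *pick_victim_window(void);
void mark_descriptor_not_present(struct big_seg *s);
void set_descriptor_base_and_present(struct big_seg *s, uint32_t linear_base);
void unmap_linear_range(uint32_t linear_base, uint32_t bytes);
void map_linear_range(uint32_t linear_base, uint64_t phys_base, uint32_t bytes);

/* Called from the segment-not-present (#NP) fault handler. */
void handle_not_present_segment(struct big_seg *seg)
{
    if (seg->linear_base == NO_WINDOW) {
        /* Steal a window from some other segment... */
        struct big_seg *victim = pick_victim_window();
        uint32_t window = victim->linear_base;
        mark_descriptor_not_present(victim);
        unmap_linear_range(window, victim->byte_size);
        victim->linear_base = NO_WINDOW;

        /* ...and point its linear range at this segment's physical pages.
         * The 64-bit PAE page-table entries are what allow phys_base > 4GB. */
        seg->linear_base = window;
        map_linear_range(window, seg->phys_base, seg->byte_size);
    }
    set_descriptor_base_and_present(seg, seg->linear_base);
    /* Returning from the fault restarts the selector load, which now succeeds. */
}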

It wouldn't be a huge architectural change in x86 to allow linear
address spaces larger than 4GB, and to allow segments to use the
larger space, thus providing a bigger address space (something on the
order of 46 bits), albeit with segmentation. In fact I posted a
speculation over on RWT that such a change might be the increased
addressing support rumored for Prescott.

Glen Herrmannsfeldt
Nov 4, 2003, 3:06:20 PM

"Terje Mathisen" <terje.m...@hda.hydro.com> wrote in message
news:bo7j07$8pu$1...@osl016lin.hda.hydro.com...

> Glen Herrmannsfeldt wrote:
> > "Robert Wessel" <robert...@yahoo.com> wrote in message
> >>To the best of my knowledge, no x86s implement any kind of special
> >>descriptor cache (except the required architectural one). OTOH, all
> >>x86s do cache descriptor table entries in the normal caches (at least
> >>those x86s that have caches).
>
> I seem to remember that AMD had at least one chip which had some form of
> caching/virtualization of segment entries, specifically to speed up
> 16-bit code, but I might be wrong.

I used to like far pointers for the protection against addressing where I
wasn't supposed to. I had one program that used an array of far pointers,
and I had it allocate a segment for each one. I would then get a hardware
trap on read or write outside of the allocated space. Very good for
debugging programs with complicated array addressing.
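
The same trick can still be played in a 32-bit Linux process with modify_ldt(2): build a data descriptor whose limit is exactly the allocation size, and any access through that selector past the end takes a protection fault. A rough sketch follows; the specific field settings and the 32-bit (-m32) build are assumptions, and 64-bit mode would not enforce the limit.

#include <asm/ldt.h>        /* struct user_desc */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    size_t size = 4096;
    unsigned char *buf = malloc(size);
    memset(buf, 42, size);

    /* Build an LDT data descriptor covering exactly this allocation. */
    struct user_desc d;
    memset(&d, 0, sizeof d);
    d.entry_number   = 0;                    /* first LDT slot */
    d.base_addr      = (uintptr_t)buf;       /* segment base = the allocation */
    d.limit          = size - 1;             /* byte-granular limit */
    d.seg_32bit      = 1;
    d.contents       = 0;                    /* expand-up data segment */
    d.read_exec_only = 0;                    /* allow read and write */
    d.useable        = 1;

    if (syscall(SYS_modify_ldt, 1, &d, sizeof d) != 0) {
        perror("modify_ldt");
        return 1;
    }

    /* Selector: index 0, TI=1 (LDT), RPL=3  ->  (0 << 3) | 4 | 3 = 7 */
    uint16_t sel = (uint16_t)((d.entry_number << 3) | 7);
    asm volatile("mov %0, %%fs" : : "r"(sel));

    unsigned char value;
    asm volatile("movb %%fs:(%1), %0" : "=q"(value) : "r"((uintptr_t)0));
    printf("in-bounds read through the new segment: %u\n", value);

    /* One byte past the limit -- uncommenting this line faults (#GP/SIGSEGV)
     * instead of quietly reading whatever follows the buffer:
     * asm volatile("movb %%fs:(%1), %0" : "=q"(value) : "r"((uintptr_t)size));
     */
    return 0;
}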

> > Now this reminds me of the question that has come up at various times
> > related to PAE and addressing more than 4G. I had thought it would
> > require OS support, trapping on loading segment selectors describing
> > memory space not described in the page tables and modifying the page
> > tables. I still think that is what would be required, but an intel web
> > page seems to say that loading segment selectors would be enough. In any
> > case, I don't know of any OS that implements it.
>
> The application would only need to load segment selectors, but the OS
> would have to be involved as soon as you did any load or store operation
> pointing to currently unmapped regions.


Yes. In the IA-32 Intel(R) Architecture Software Developer's Manual, Volume
1, section 3.3.3, they state:

"A program can switch between linear address spaces within this 64GB
physical address space by changing segment selectors in the segment
registers."

They then reference Volume 3 where PAE and PSE are fully described. Only
when you get to Volume 3 do they say that OS intervention is required.
Still, it shouldn't be all that hard to do in an OS, yet I still don't know
of any that do it. Solaris-x86 is supposed to have some support for PAE,
but as far as I know it doesn't support segment selectors for selecting
address spaces.

Well, it may not be much longer before the AMD processors with 64 bit
addressing are cheaper than PAE processors. There are times when it is nice
to have a reasonably affordable machine with a large address space.

-- glen


Glen Herrmannsfeldt
Nov 4, 2003, 5:44:24 PM

"Robert Wessel" <robert...@yahoo.com> wrote in message
news:bea2590e.0311...@posting.google.com...

(snip of PAE question)

> On all 32 bit x86s, the linear address space into which segments are
> mapped is 32 bits (4GB) in size, PAE or no PAE. You can define 16K
> segments, but all would point into the one 4GB linear address space.

> PAE allows you to have more physical pages than fit in a single
> address space. At any time a single process can only address 4GB
> worth of memory, but several independent processes could (together)
> use rather more.

> An OS *could* simulate a larger linear address space by playing games
> with the page and segment tables every time a segment register was
> reloaded (the OS would mark all segments not currently mapped into the
> "real" 4GB space as not present, and then rearrange things as needed
> on a seg fault). The OS would need to limit segment sizes in order to
> pull this off, since there may be 6 different segments addressable at
> any one time, and the OS would need to be able to map an arbitrary
> collection of six segments into a single 4GB space (less any OS
> requirement and whatnot). Performance would be awful though for any
> segment register loads that needed a page table remapping operation.

It should still be a lot faster than paging it in from disk. Also, a lot
faster than any tricks the program might use to limit its memory usage.

> It wouldn't be a huge architectural change in x86 to allow linear
> address spaces larger than 4GB, and to allow segments to use the
> larger space, thus providing a bigger address space (something on the
> order of 46 bits), albeit with segmentation. In fact I posted a
> speculation over on RWT that such a change might be the increased
> addressing support rumored for Prescott.

When I first was reading about PAE I had thought that what it did was to
bypass the PMMU when the high bits of the segment origin + offset were not
zero. I am not sure which problems that would help, and which ones it
wouldn't.

As far as I know, the Watcom 32 bit compilers can generate large model (48
bit pointers) code. It might be that OS/2 can run them, but maybe not any
other OS. With the 64 bit processors coming out now, though, there is
probably no reason for any extensions to IA-32.

-- glen


Robert Wessel
Nov 5, 2003, 12:18:33 AM

"Glen Herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message news:<cZVpb.80772$ao4.230091@attbi_s51>...

> "Robert Wessel" <robert...@yahoo.com> wrote in message
> news:bea2590e.0311...@posting.google.com...
> > An OS *could* simulate a larger linear address space by playing games
> > with the page and segment tables every time a segment register was
> > reloaded (the OS would mark all segments not currently mapped into the
> > "real" 4GB space as not present, and then rearrange things as needed
> > on a seg fault). The OS would need to limit segment sizes in order to
> > pull this off, since there may be 6 different segments addressable at
> > any one time, and the OS would need to be able to map an arbitrary
> > collection of six segments into a single 4GB space (less any OS
> > requirement and whatnot). Performance would be awful though for any
> > segment register loads that needed a page table remapping operation.
>
> It should still be a lot faster than paging it in from disk. Also, a lot
> faster than any tricks the program might use to limit its memory usage.


Yes, but there are other ways to do the same thing. Windows defines a
set of APIs called "Address Windowing Extensions" (AWE), which allows
the application to allocate additional pages of storage, and then swap
those in and out of the address space. Linux has similar functions.
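
For instance, a rough sketch of the AWE calls on Windows; error handling and the required "Lock Pages in Memory" privilege setup are left out, and the page count is arbitrary.

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    ULONG_PTR num_pages = 256;                       /* 256 pages (1MB with 4KB pages) */
    SIZE_T window_bytes = (SIZE_T)num_pages * si.dwPageSize;

    /* Physical pages, not yet visible at any virtual address; on a PAE
     * machine these can come from memory above 4GB. */
    ULONG_PTR *pfns = (ULONG_PTR *)HeapAlloc(GetProcessHeap(), 0,
                                             num_pages * sizeof(ULONG_PTR));
    AllocateUserPhysicalPages(GetCurrentProcess(), &num_pages, pfns);

    /* A window of ordinary address space to map them into. */
    void *window = VirtualAlloc(NULL, window_bytes,
                                MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

    /* Swap the physical pages into the window, touch them, swap them out. */
    MapUserPhysicalPages(window, num_pages, pfns);
    memset(window, 0xAB, window_bytes);
    MapUserPhysicalPages(window, num_pages, NULL);   /* NULL page array unmaps */

    FreeUserPhysicalPages(GetCurrentProcess(), &num_pages, pfns);
    printf("mapped and unmapped %lu pages\n", (unsigned long)num_pages);
    return 0;
}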

> > It wouldn't be a huge architectural change in x86 to allow linear
> > address spaces larger than 4GB, and to allow segments to use the
> > larger space, thus providing a bigger address space (something on the
> > order of 46 bits), albeit with segmentation. In fact I posted a
> > speculation over on RWT that such a change might be the increased
> > addressing support rumored for Prescott.
>
> When I first was reading about PAE I had thought that what it did was to
> bypass the PMMU when the high bits of the segment origin + offset were not
> zero. I am not sure which problems that would help, and which ones it
> wouldn't.
>
> As far as I know, the Watcom 32 bit compilers can generate large model (48
> bit pointers) code. It might be that OS/2 can run them, but maybe not any
> other OS. With the 64 bit processors coming out now, though, there is
> probably no reason for any extensions to IA-32.


I've seen several compilers that can generate 16:32 pointers. And
there have certainly been OS's that can handle 48 bit far pointers,
although none of them (as far as I know) allow the total address space
for a process to exceed 4GB (excepting AWE-like tricks). So you can
have many segments, but the total address space of all of your
segments still cannot exceed 4GB.

FWIW, OS/2 32 bit was pretty much flat model, except for some device
driver and 16 bit support related stuff. You could have some
additional selectors, but they always mapped a subset of the basic
linear address space. Linux x86 provides a similar mechanism (a user
process can pretty much define its own LDT via a set of APIs - but
there is still only one 4GB address space).
