Virtual Memory Management

Eli Gottlieb

Jan 28, 2007, 7:53:56 PM

Here's one for the record...

Writing functions handling the mechanisms of paging, and allowing the
kernel to write page tables when it needs to write them, isn't that
hard. Each context gets a page directory with some tables in it that map
physical pages. OK.

Individual pages should be available (as their own abstraction) for
memory sharing, swapping and copy-on-write purposes.

However, I don't want users to see individual pages and I certainly
don't want to handle user programs as individual pages. So I want the
kernel to present the abstraction of sections or segments -- regions of
pages that start at an arbitrary page-aligned address, continue for an
arbitrary length of pages, and share protection properties.

So the job of the virtual memory system is to present the abstractions
of contexts (address spaces) and segments to user-level programs and the
abstraction of individual pages within a segment and context to
operating-system services.
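
A minimal sketch of the segment abstraction described above, in C. The names `struct segment`, `segment_make`, `segment_end`, `segment_contains` and the `SEG_*` bits are hypothetical and only illustrate "page-aligned start, length in pages, shared protection"; they are not taken from any kernel in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical protection bits shared by every page in a segment. */
enum { SEG_READ = 1u, SEG_WRITE = 2u, SEG_EXEC = 4u };

/* Hypothetical segment: a run of pages with common protection. */
struct segment {
    uint32_t base;        /* page-aligned start address */
    uint32_t page_count;  /* length in pages */
    uint32_t prot;        /* SEG_* bits */
};

/* Build a segment, insisting on a page-aligned base and a non-zero length. */
static struct segment segment_make(uint32_t base, uint32_t page_count,
                                   uint32_t prot)
{
    assert(base % PAGE_SIZE == 0);
    assert(page_count > 0);
    struct segment s = { base, page_count, prot };
    return s;
}

/* One past the last byte (64-bit so a segment ending at 4 GB does not wrap). */
static uint64_t segment_end(const struct segment *s)
{
    return (uint64_t)s->base + (uint64_t)s->page_count * PAGE_SIZE;
}

/* True if addr falls inside the segment. */
static bool segment_contains(const struct segment *s, uint32_t addr)
{
    return addr >= s->base && (uint64_t)addr < segment_end(s);
}
```

User-level code would only ever see whole segments like this; the per-page state stays behind the kernel interface.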

What data structures should the VMM use? When should it update the
actual page tables and directory: when its data structures change
(immediate impact) or at an appointed update time (more efficient for
large numbers of changes)? What algorithms should it use to perform that
update?

To update page tables I use a sandbox page table. When I want to update
a page table, I pass its physical address to a function, which maps
that table in somewhere in the upper 4MB of memory and returns the new
virtual address of the page. Lots of people map page tables and the
directory directly into themselves, making each context's page tables
visible to the kernel at a specified address in that context, but I
don't want a stray pointer bug to corrupt my page tables.
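
The sandbox idea can be sketched roughly as follows. This is a user-space simulation, not kernel code: `window_pt` stands in for the single page table that backs the window, `WINDOW_BASE` assumes the window is the top 4 MB of a 32-bit address space, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE     4096u
#define PTE_PRESENT   0x1u
#define PTE_WRITABLE  0x2u
#define WINDOW_BASE   0xFFC00000u   /* assumed: start of the top 4 MB */
#define WINDOW_SLOTS  1024u

/* Stand-in for the one page table that backs the sandbox window. */
static uint32_t window_pt[WINDOW_SLOTS];

/* Map the frame at `phys` into a free slot and return its virtual address.
 * Returns 0 if every slot is taken. */
static uint32_t sandbox_map(uint32_t phys)
{
    assert(phys % PAGE_SIZE == 0);
    for (uint32_t i = 0; i < WINDOW_SLOTS; i++) {
        if (!(window_pt[i] & PTE_PRESENT)) {
            window_pt[i] = phys | PTE_PRESENT | PTE_WRITABLE;
            return WINDOW_BASE + i * PAGE_SIZE;
        }
    }
    return 0;
}

/* Release a slot once the caller has finished editing the table. */
static void sandbox_unmap(uint32_t virt)
{
    assert(virt >= WINDOW_BASE && virt % PAGE_SIZE == 0);
    window_pt[(virt - WINDOW_BASE) / PAGE_SIZE] = 0;
    /* A real kernel would also invalidate the TLB entry for virt here. */
}
```

The point of the design is that a page table is only writable between `sandbox_map` and `sandbox_unmap`, so a stray pointer has a much smaller window in which to hit it.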

I've tried binary-search trees of segments mapping themselves directly
to page tables, and I've tried sorted, double-linked lists of segments
that map themselves onto page table objects, which belong to a page
directory object, that build the actual tables when necessary just
before a context switch.

I've tried dividing segments into Mapping Regions of all the pages
belonging to that segment within a single page table, and using the
mapping regions to alter the real page tables. I've also tried using
modulo math and an array of pointers to real page tables to give the
illusion of updating consecutive entries of a single, gigantic page table.
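
The "modulo math" variant might look something like this. It is only a sketch: `struct flat_view` and `flat_entry` are hypothetical names, and the tables are ordinary arrays rather than real page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 1024u
#define TABLES_PER_DIR    1024u

/* Hypothetical view: one pointer per directory slot, NULL where no page
 * table has been allocated yet. */
struct flat_view {
    uint32_t *tables[TABLES_PER_DIR];
};

/* Return a pointer to entry n of the imaginary 1M-entry page table,
 * or NULL if the real table that would hold it does not exist. */
static uint32_t *flat_entry(struct flat_view *v, uint32_t n)
{
    assert(n < ENTRIES_PER_TABLE * TABLES_PER_DIR);
    uint32_t *table = v->tables[n / ENTRIES_PER_TABLE];   /* which table */
    return table ? &table[n % ENTRIES_PER_TABLE] : NULL;  /* which entry */
}
```

Entries 1023 and 1024 are then "consecutive" to the caller even though they live in two different tables.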

But I've never had a VMM that took less than 400 lines of code and
didn't mysteriously crash under some situation or other. Anyone know
what I could do better?

Ciaran Keating

Jan 28, 2007, 11:19:45 PM

On Mon, 29 Jan 2007 11:53:56 +1100, Eli Gottlieb <eligo...@gmail.com>
wrote:

> To update page tables I use a sandbox page table. When I want to update
> a page table, I pass its physical address to a function, which maps
> that table in somewhere in the upper 4MB of memory and returns the new
> virtual address of the page. Lots of people map page tables and the
> directory directly into themselves, making each context's page tables
> visible to the kernel at a specified address in that context, but I
> don't want a stray pointer bug to corrupt my page tables.

Hi Eli,

Over the weekend (it was a long weekend here, for Australia Day on Friday)
I ran into the age-old problem of how to bootstrap dynamically-allocated
page table pages - how to use the page's linear address in order to work
with it, but before the page has been fully mapped in. Haven't figured it
out yet - I know people have talked about mapping pages onto themselves,
but until now I hadn't really understood why - and indeed, I still don't,
because I haven't got it working!

Anyhow, your idea of a sandbox seems inspired. When I get back to my OS
work I'll think about doing it that way. Thanks :-)

> But I've never had a VMM that took less than 400 lines of code and
> didn't mysteriously crash under some situation or other. Anyone know
> what I could do better?

Sorry to just take, take, take... I can't offer any suggestions because
you're ahead of me in this game. Sorry!

--
Ciaran Keating
Amadán Technologies Pty Ltd

Alexei A. Frounze

Jan 29, 2007, 4:17:39 AM

Eli Gottlieb wrote:
...

> To update page tables I use a sandbox page table. When I want to
> update a page table, I pass its physical address to a function, which
> maps that table in somewhere in the upper 4MB of memory and returns the new
> virtual address of the page. Lots of people map page tables and the
> directory directly into themselves, making each context's page tables
> visible to the kernel at a specified address in that context, but I
> don't want a stray pointer bug to corrupt my page tables.

If you can't use pointers properly, either don't use them or learn to. Mind you, if you
corrupt some kernel structure or data other than page tables, you won't be
any better off.

...

> But I've never had a VMM that took less than 400 lines of code

And you probably never will. My last implementation consisted of:
- 273 lines for page mapping/unmapping & top level wrapper functions
(dealing with mutexes)
- 375 lines for simple general-purpose memory manager that manages the
address space and calls page map/unmap functions
- 539 lines for the red-black tree (used to organize and search the address
space ranges during allocation)
- a few lines of ASM code for INVLPG, MOV CR3
- some substantial code elsewhere (in the loader) that determines the amount
of memory and creates a stack of free page frames
And the above doesn't include anything related to the address space changing
in the scheduler nor does it include all the necessary header files.

> and
> didn't mysteriously crash under some situation or other.

There can only be one reason for that: a code bug. It may be of a different
nature, either something ordinary or something to do with the TLB.

> Anyone know
> what I could do better?

At least debugging and testing. I actually wrote a test for the above,
another 200 lines of code and tested the general-purpose memory manager and
red-black tree that it's using and it survived 10**5 random allocations and
deallocations. I don't remember if I ran this same test under the system or
not, but I'm pretty sure there're no surprises. Page allocation/deallocation
and mapping/unmapping are trivial and tested otherwise.
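
A randomized test of that kind could be sketched as below. This is not Alex's test: it exercises a toy stack-based frame allocator (`frame_alloc`/`frame_free`, both made up here) and checks it against a shadow record of which frames are held. The same pattern applies to a real allocator by swapping in its alloc/free calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFRAMES 256

/* Toy allocator under test: a stack of free frame numbers. */
static int free_stack[NFRAMES];
static int free_top;

static void frames_init(void)
{
    free_top = 0;
    for (int i = NFRAMES - 1; i >= 0; i--)
        free_stack[free_top++] = i;
}

static int frame_alloc(void) { return free_top ? free_stack[--free_top] : -1; }

static void frame_free(int f)
{
    assert(free_top < NFRAMES);
    free_stack[free_top++] = f;
}

/* Small deterministic PRNG so that a failing run can be replayed. */
static uint32_t rng_state = 12345u;
static uint32_t rng_next(void)
{
    rng_state = rng_state * 1664525u + 1013904223u;
    return rng_state >> 8;
}

/* Run `ops` random allocations and frees; false on the first inconsistency. */
static bool stress(int ops)
{
    static bool in_use[NFRAMES];   /* shadow record of held frames */
    static int held[NFRAMES];
    int nheld = 0;

    frames_init();
    for (int i = 0; i < NFRAMES; i++)
        in_use[i] = false;

    for (int i = 0; i < ops; i++) {
        if (nheld == 0 || (rng_next() & 1)) {
            int f = frame_alloc();
            if (f < 0) {                    /* exhausted: we must hold all */
                if (nheld != NFRAMES) return false;
                continue;
            }
            if (f >= NFRAMES || in_use[f])  /* out of range or handed out twice */
                return false;
            in_use[f] = true;
            held[nheld++] = f;
        } else {
            int k = (int)(rng_next() % (uint32_t)nheld);
            int f = held[k];
            held[k] = held[--nheld];        /* drop it from the held set */
            in_use[f] = false;
            frame_free(f);
        }
    }
    return true;
}
```

Calling `stress(100000)` gives the same order of magnitude as the 10**5 operations mentioned above.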

Alex

Alexei A. Frounze

Jan 29, 2007, 4:31:22 AM

Ciaran Keating wrote:
> On Mon, 29 Jan 2007 11:53:56 +1100, Eli Gottlieb
> <eligo...@gmail.com> wrote:
>
>> To update page tables I use a sandbox page table. When I want to
>> update a page table, I pass its physical address to a function,
>> which maps that table in somewhere in the upper 4MB of memory and
>> returns the new virtual address of the page. Lots of people map
>> page tables and the directory directly into themselves, making each
>> context's page tables visible to the kernel at a specified address
>> in that context, but I don't want a stray pointer bug to corrupt my
>> page tables.
>
> Hi Eli,
>
> Over the weekend (it was a long weekend here, for Australia Day on
> Friday) I ran into the age-old problem of how to bootstrap
> dynamically-allocated page table pages - how to use the page's linear
> address in order to work with it, but before the page has been fully
> mapped in. Haven't figured it out yet - I know people have talked
> about mapping pages onto themselves, but until now I hadn't really
> understood why - and indeed, I still don't, because I haven't got it
> working!

First you have to understand the gist of it. Just draw the complete page
table hierarchy as in the Intel manuals. Then next to that draw one where a
PDE points to the PD. Expand/work the diagram out for the case when a linear
address involves that PDE in the translation. By looking at the two pics (or
maybe just one) you'll see it all.

> Anyhow, your idea of a sandbox seems inspired. When I get back to my
> OS work I'll think about doing it that way. Thanks :-)

At first I had that one and it even worked, but I didn't like it for various
reasons (ease of use, performance, how the code looked). So, when I figured
out that I can get away without this special sandbox for page table
management, I ditched it.

Alex

Ciaran Keating

Jan 29, 2007, 6:00:58 AM

On Mon, 29 Jan 2007 20:31:22 +1100, Alexei A. Frounze <ale...@chat.ru>
wrote:

>> Over the weekend (it was a long weekend here, for Australia Day on
>> Friday) I ran into the age-old problem of how to bootstrap
>> dynamically-allocated page table pages - how to use the page's linear
>> address in order to work with it, but before the page has been fully
>> mapped in. Haven't figured it out yet - I know people have talked
>> about mapping pages onto themselves, but until now I hadn't really
>> understood why - and indeed, I still don't, because I haven't got it
>> working!
>
> First you have to understand the gist of it. Just draw the complete page
> table hierarchy as in the Intel manuals. Then next to that draw one
> where a PDE points to the PD. Expand/work the diagram out for the case
> when a linear address involves that PDE in the translation. By looking
> at the two pics (or maybe just one) you'll see it all.

Yeah, I've been doing that. I suppose I didn't really say what I was
thinking. I can see the need for having a page table refer to itself, but
I can't see how to fill in that first self-referential PTE. In order to
say something like "table[0] = table", table has to be already mapped. So
when I said "I still don't", I suppose I meant that I could see why it
makes sense but I couldn't see how it solved anything - because I couldn't
get the last bit working. Any hints?

>> Anyhow, your idea of a sandbox seems inspired. When I get back to my
>> OS work I'll think about doing it that way. Thanks :-)
>
> At first I had that one and it even worked, but I didn't like it for
> various reasons (ease of use, performance, how the code looked). So,
> when I figured out that I can get away without this special sandbox for
> page table management, I ditched it.

Well, I'm not really concerned about performance at this time. And ease of
use shouldn't be an issue because it's all wrapped up in a MapPhysicalPage
function. I've got it working with the special sandbox page - well, it
probably needs a little tidying up, but the first page table and the first
page have worked out OK. Then I had dinner :-)

Alexei A. Frounze

Jan 29, 2007, 11:00:41 AM

Ciaran Keating wrote:
> On Mon, 29 Jan 2007 20:31:22 +1100, Alexei A. Frounze
> <ale...@chat.ru> wrote:
>
>>> Over the weekend (it was a long weekend here, for Australia Day on
>>> Friday) I ran into the age-old problem of how to bootstrap
>>> dynamically-allocated page table pages - how to use the page's
>>> linear address in order to work with it, but before the page has
>>> been fully mapped in. Haven't figured it out yet - I know people
>>> have talked about mapping pages onto themselves, but until now I
>>> hadn't really understood why - and indeed, I still don't, because I
>>> haven't got it working!
>>
>> First you have to understand the gist of it. Just draw the complete
>> page table hierarchy as in the Intel manuals. Then next to that draw
>> one where a PDE points to the PD. Expand/work the diagram out for
>> the case when a linear address involves that PDE in the translation.
>> By looking at the two pics (or maybe just one) you'll see it all.
>
> Yeah, I've been doing that. I suppose I didn't really say what I was
> thinking. I can see the need for having a page table refer to itself,
> but I can't see how to fill in that first self-referential PTE. In
> order to say something like "table[0] = table", table has to be
> already mapped. So when I said "I still don't", I suppose I meant
> that I could see why it makes sense but I couldn't see how it solved
> anything - because I couldn't get the last bit working. Any hints?

You begin with page translation already enabled and working. You may have,
for example, the first 4 MB identity mapped (linear = physical).

The CPU starts address translation from the PD, right? So suppose the top 10
(of 32) address bits select the PDE that points back to the PD. Now that
the CPU has what it thinks is the address of a PT (but is really the PD's
address), it does the second lookup in that, using the next 10 address
bits. From that lookup it gets what it thinks is the address of a page frame.
That is really either the address of the PD (if bits 12-21 of the linear
address are equal to bits 22-31, so both lookups select the back-pointing PDE)
or the address of some PT that one of the PDEs points to (if bits 22-31 select
the back-pointing PDE but bits 12-21 differ from them).

So, to change something in the PD, you modify it at a linear address
that goes through the back-pointing PDE twice. That is how you map more
PTs into the PD. To change something in a PT, you modify it at the
corresponding linear address that goes through the back-pointing PDE
only once. That is how you map a page into the address space.
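
The address arithmetic behind "once" and "twice" can be written out as a few helper functions. This is a sketch for 32-bit, non-PAE paging with 4 KB pages; the `selfmap_*` names and the `slot` parameter (the index of the back-pointing PDE) are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of the page table that maps `va`: the top 10 bits select
 * the back-pointing PDE (one pass), the middle 10 bits select the table. */
static uint32_t selfmap_pt_vaddr(uint32_t slot, uint32_t va)
{
    assert(slot < 1024);
    return (slot << 22) | ((va >> 22) << 12);
}

/* Linear address of the PTE for `va` inside that page table. */
static uint32_t selfmap_pte_vaddr(uint32_t slot, uint32_t va)
{
    return selfmap_pt_vaddr(slot, va) + ((va >> 12) & 0x3FFu) * 4u;
}

/* Linear address of the page directory itself: both 10-bit fields select
 * the back-pointing PDE (two passes). */
static uint32_t selfmap_pd_vaddr(uint32_t slot)
{
    assert(slot < 1024);
    return (slot << 22) | (slot << 12);
}

/* Linear address of the PDE for `va` inside the page directory. */
static uint32_t selfmap_pde_vaddr(uint32_t slot, uint32_t va)
{
    return selfmap_pd_vaddr(slot) + (va >> 22) * 4u;
}
```

With the back-pointing PDE in the last slot (1023), for example, the page tables appear from 0xFFC00000 up and the directory itself at 0xFFFFF000.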

Alex

Maxim S. Shatskih

Jan 29, 2007, 11:27:22 AM

> What data structures should the VMM use?

a) Address Space Descriptors, called Virtual Address Descriptors (VADs) in
Windows and Virtual Memory Areas (VMAs) in Linux.

b) A table of physical pages (a structure per physical page).

c) Some structures to support the shared pages - "prototype PTE tables" aka
"segments" in Windows, also used in Linux for SysV shm_xxx syscalls
implementation - see the source.

The difference here is that Windows uses the same tables for memory-mapped
files too, while Linux uses another - and IMHO better - implementation for
them.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
ma...@storagecraft.com
http://www.storagecraft.com

Maxim S. Shatskih

Jan 29, 2007, 11:32:59 AM

> out yet - I know people have talked about mapping pages onto themselves,
> but until now I hadn't really understood why - and indeed, I still don't,
> because I haven't got it working!

Set PDE #768 to point to the PD page itself.

This means the PD is also a PT for the 4MB range from 0xc0000000 up, which in
turn means the PTs are mapped as data pages from 0xc0000000 up.

Simple and useful.
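
To see why this works, here is a small user-space simulation of the two-level walk. Everything in it is an illustration: `phys_mem` is a fake physical memory of eight frames, `translate` mimics what the MMU does, and `install_selfmap` writes PDE #768 through the directory's physical address, which is what the kernel can do before paging is on or while the directory is still identity mapped.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x1u
#define NFRAMES     8u

/* Simulated physical memory: frame n holds physical n*4096 .. n*4096+4095. */
static uint32_t phys_mem[NFRAMES][PAGE_SIZE / 4];

/* Two-level walk as the MMU would do it; returns 0xFFFFFFFF on a fault. */
static uint32_t translate(uint32_t pd_phys, uint32_t va)
{
    uint32_t pde = phys_mem[pd_phys / PAGE_SIZE][va >> 22];
    if (!(pde & PTE_PRESENT))
        return 0xFFFFFFFFu;
    uint32_t pte = phys_mem[(pde & ~0xFFFu) / PAGE_SIZE][(va >> 12) & 0x3FFu];
    if (!(pte & PTE_PRESENT))
        return 0xFFFFFFFFu;
    return (pte & ~0xFFFu) | (va & 0xFFFu);
}

/* Install the self-map: PDE #768 points at the directory's own frame. */
static void install_selfmap(uint32_t pd_phys)
{
    assert(pd_phys % PAGE_SIZE == 0 && pd_phys / PAGE_SIZE < NFRAMES);
    phys_mem[pd_phys / PAGE_SIZE][768] = pd_phys | PTE_PRESENT;
}
```

If the directory lives in frame 2 and PDE #1 points to a page table in frame 5, then `translate` maps 0xC0300000 to the directory's frame and 0xC0001000 to the page table's frame, with no extra mappings needed.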

Maxim S. Shatskih

Jan 29, 2007, 11:35:21 AM

> PTs into the PD. Then to change something in a PT you modify it at a

Note: any updates to _present_ PTE - including, but not limited to, making it
non-present - require INVLPG.

Updates to non-present PTE, including, but not limited to, making it present -
do not require INVLPG.
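
The rule as stated here could be wrapped in one PTE-update helper along these lines. It is a sketch that runs in user space: `flush_page` only counts calls, whereas a real kernel would execute INVLPG there (the inline-assembly line in the comment is the usual GCC form). `pte_set` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u

/* Hook that a real kernel would implement with the INVLPG instruction, e.g.
 *   __asm__ volatile("invlpg (%0)" : : "r"(va) : "memory");
 * Here it only counts calls so the logic can be exercised in user space. */
static unsigned flush_count;
static void flush_page(uint32_t va)
{
    (void)va;
    flush_count++;
}

/* Write a PTE and flush the TLB entry only if the OLD entry was present,
 * following the rule given in this post. */
static void pte_set(uint32_t *pte, uint32_t va, uint32_t new_value)
{
    assert(pte != 0);
    uint32_t old = *pte;
    *pte = new_value;
    if (old & PTE_PRESENT)
        flush_page(va);
}
```

Routing every PTE write through one function like this makes a forgotten flush less likely than scattering INVLPG calls around the mapping code.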

Zeljko Vrba

Jan 29, 2007, 2:18:17 PM

On 2007-01-29, Maxim S. Shatskih <ma...@storagecraft.com> wrote:
>
> Updates to non-present PTE, including, but not limited to, making it present -
> do not require INVLPG.
>

Hmm, where did you get this info from? I have read Intel and AMD manuals
regarding TLBs, and nothing in there contradicts the possibility of
negative caching (i.e. the TLB also caching not-present entries).

Yes, it'd be foolish to waste TLB entries in this manner, but I'd be more
comfortable if your statement were "officially" sanctioned in some manual.

Eli Gottlieb

Jan 29, 2007, 2:24:12 PM

Alexei A. Frounze wrote:
> If you can't use pointers properly, either don't use them or learn to. Mind you, if
> you corrupt some kernel structure or data other than page tables, you
> won't be any better off.

If I have my page tables mapped into virtual address space *all the
time* and I corrupt them by a stray pointer error, I get a triple fault
when the system tries to use those page tables. And the TLB only makes
the problem worse by delaying the effect until a TLB reload.

If I keep the page tables mapped out most of the time, a stray pointer
can corrupt a kernel data structure but still, most of the time, leave
enough intact to run an exception handler.

> And you probably never will. My last implementation consisted of:
> - 273 lines for page mapping/unmapping & top level wrapper functions
> (dealing with mutexes)
> - 375 lines for simple general-purpose memory manager that manages the
> address space and calls page map/unmap functions
> - 539 lines for the red-black tree (used to organize and search the
> address space ranges during allocation)
> - a few lines of ASM code for INVLPG, MOV CR3
> - some substantial code elsewhere (in the loader) that determines the
> amount of memory and creates a stack of free page frames
> And the above doesn't include anything related to the address space
> changing in the scheduler nor does it include all the necessary header
> files.

When I say 400 lines, I'm not including my page frame allocator or
low-level paging functions. But thanks for the discouragement from
trying to write a simpler, more understandable VMM with fewer bugs!

>
>> and
>> didn't mysteriously crash under some situation or other.
>
> There can only be one reason for that: a code bug. It may be of a
> different nature, either something ordinary or something to do with the TLB.

Wow, really? Now why do we get code bugs? Because the human programmer
does not fully understand his code. And why does that happen? Because
the code is too complex.
--
The science of economics is the cleverest proof of free will yet
constructed.

Eli Gottlieb

Jan 29, 2007, 2:25:22 PM

Maxim S. Shatskih wrote:
>> What data structures should the VMM use?
>
> a) Address Space Descriptors, called Virtual Address Descriptors (VADs) in
> Windows and Virtual Memory Areas (VMAs) in Linux.

And what, precisely, are these? I haven't found any documentation of
what they actually are and how they're actually implemented.

--
The science of economics is the cleverest proof of free will yet
constructed.

Maxim S. Shatskih

Jan 29, 2007, 2:40:37 PM

> Hmm, where did you get this info from?

From reverse-engineered code of Windows (NT4).

Maxim S. Shatskih

Jan 29, 2007, 2:41:52 PM

> >> What data structures should the VMM use?
> >
> > a) Address Space Descriptors, called Virtual Address Descriptors (VADs) in
> > Windows and Virtual Memory Areas (VMAs) in Linux.
>
> And what, precisely, are these? I haven't found any documentation of
> what they actually are and how they're actually implemented.

These structures describe ranges in the virtual address space, and how to handle
page faults in these ranges.
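
As a rough illustration of "a range plus how to handle faults in it", such a descriptor might be sketched like this. It is not the Windows VAD or the Linux VMA layout; `struct vm_area`, `vm_find`, `vm_handle_fault` and the example handler are hypothetical, and a sorted singly linked list stands in for whatever tree a real kernel would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vm_area;
typedef int (*fault_fn)(struct vm_area *area, uint32_t fault_addr);

/* Hypothetical descriptor: a half-open range [start, end) and its policy. */
struct vm_area {
    uint32_t start, end;
    uint32_t prot;
    fault_fn on_fault;       /* how to service a fault inside the range */
    struct vm_area *next;    /* list kept sorted by start address */
};

/* Find the area containing addr, or NULL if the address is unmapped. */
static struct vm_area *vm_find(struct vm_area *head, uint32_t addr)
{
    for (struct vm_area *a = head; a != NULL && a->start <= addr; a = a->next)
        if (addr < a->end)
            return a;
    return NULL;
}

/* Page-fault entry point: dispatch to the owning area, or return -1 for an
 * access outside every area. */
static int vm_handle_fault(struct vm_area *head, uint32_t addr)
{
    struct vm_area *a = vm_find(head, addr);
    if (a == NULL)
        return -1;
    assert(a->on_fault != NULL);
    return a->on_fault(a, addr);
}

/* Example policy: pretend to supply a zero-filled page and count requests. */
static unsigned zero_fill_count;
static int zero_fill_fault(struct vm_area *area, uint32_t fault_addr)
{
    (void)area;
    (void)fault_addr;
    zero_fill_count++;
    return 0;
}
```

The page tables then become a cache of decisions that can always be rebuilt from these descriptors.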

Ciaran Keating

Jan 29, 2007, 4:36:16 PM

On Tue, 30 Jan 2007 03:00:41 +1100, Alexei A. Frounze <ale...@chat.ru>
wrote:

> The CPU starts address translation from the PD, right? So suppose the
> top 10 (of 32) address bits select that PDE that points back to the PD.

Ah, that's something that hadn't occurred to me - to have the PD point
back to itself. I have the PTs pointing back to themselves, but not the PD.

> Now that the CPU's got what it thinks is the address of a PT (but really
> is the PD's address), it can do the 2nd look up in that using those next
> 10 address bits. So, it uses those other 10 bits and gets what it thinks
> is the address of a page frame, which really is either the address of
> the PD (if the linear address is such that bits 12-21 equal to bits
> 22-31 and those bits 22-31 select the PDE that contains PD's address) or
> address of some PT to which one of PDEs points (if bits 22-31 still
> select that PDE that points back to the PD, but those bits aren't the
> same as bits 12-21).
> So, in order to change something in the PD you modify it at a linear
> address that selects/goes through the back-pointing PDE twice. Hence you
> map more PTs into the PD. Then to change something in a PT you modify it
> at a corresponding linear address that selects/goes through the
> back-pointing PDE only once. And hence you map a page into the address
> space.

Plenty there to think about. Now that I've got the mapping working with
the help of a sacrificial page (inspired by Eli's sandbox idea) I'm going
to stop working on my OS again for a little while and get back to earning
a living. But in a week or two when I have some more time I'll look into
this in detail. Thanks for the tips :-) And Max, too - you mentioned the
same thing in another reply.

Cheers,
Ciaran

Alexei A. Frounze

Jan 30, 2007, 12:12:00 AM

Eli Gottlieb wrote:
> Alexei A. Frounze wrote:
>> If you can't use pointers properly, either don't use them or learn to. Mind you,
>> if you corrupt some kernel structure or data other than page tables,
>> you won't be any better off.
>
> If I have my page tables mapped into virtual address space *all the
> time* and I corrupt them by a stray pointer error, I get a triple
> fault when the system tries to use those page tables.

You may not get it immediately.

> And the TLB
> only makes the problem worse by delaying the effect until a TLB
> reload.

Right.

> If I keep the page tables mapped out most of the time, a stray pointer
> can corrupt a kernel data structure but still, most of the time, leave
> enough intact to run an exception handler.

That is all true if you can always handle some unexpected exception in a
meaningful way, which I doubt (it's hard to check everywhere and everything
and be ready for that everything), and if what's damaged is unimportant,
which you can't know before it happened. What if your stray pointer damages
the disk buffer? You lose data on the disk, which can be any data or
application, including the OS itself. Now, what if it damages the file
system control structures, not just file internals? It's probably worse. And
if the file system code is implemented without enough checks, those errors
may easily lead to further damage of the file system.

>> And you probably never will. My last implementation consisted of:
>> - 273 lines for page mapping/unmapping & top level wrapper functions
>> (dealing with mutexes)
>> - 375 lines for simple general-purpose memory manager that manages
>> the address space and calls page map/unmap functions
>> - 539 lines for the red-black tree (used to organize and search the
>> address space ranges during allocation)
>> - a few lines of ASM code for INVLPG, MOV CR3
>> - some substantial code elsewhere (in the loader) that determines the
>> amount of memory and creates a stack of free page frames
>> And the above doesn't include anything related to the address space
>> changing in the scheduler nor does it include all the necessary
>> header files.
>
> When I say 400 lines, I'm not including my page frame allocator or
> low-level paging functions. But thanks for the discouragement from
> trying to write a simpler, more understandable VMM with fewer bugs!
>
>>
>>> and
>>> didn't mysteriously crash under some situation or other.
>>
>> There can only be one reason for that: a code bug. It may be of a
>> different nature, either something ordinary or something to do with the
>> TLB.
>
> Wow, really? Now why do we get code bugs? Because the human
> programmer does not fully understand his code. And why does that
> happen? Because the code is too complex.

Sure, and it's not only about misunderstanding code, but hardware as well. It's
just that some things are obvious and don't require in-depth, broad
understanding of what's involved, and some are subtle and do. Caching of any
kind is usually among those subtleties. They're rarely taught or understood
quickly and fully.

Alex

Alexei A. Frounze

Jan 30, 2007, 12:16:21 AM

Maxim S. Shatskih wrote:
>> PTs into the PD. Then to change something in a PT you modify it at a
>
> Note: any updates to _present_ PTE - including, but not limited to,
> making it non-present - require INVLPG.
>
> Updates to non-present PTE, including, but not limited to, making it
> present - do not require INVLPG.

Of course.

Alex

Marven Lee

Jan 31, 2007, 2:56:48 PM

Eli Gottlieb wrote:
> What data structures should the VMM use?

I use several arrays of structures to manage physical
memory, ranges of virtual address space and pagetables.
As I don't use copy-on-write at the moment it makes
things simpler.

My kernel resides at the bottom of the virtual address
space. When initialising the memory manager the arrays
of structures are all allocated contiguously in physical
memory before they are identity mapped when paging
is enabled.

The sizes of these arrays depend mostly on the
amount of memory the PC has. Each array size is calculated
from the amount of physical memory using a scaling factor,
then kept between a lower and an upper limit.
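
The sizing rule might be sketched as a single clamp. The function name, the parameters and the numbers in the usage note are made up for illustration and are not the values this kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries for a table: memory size divided by a scaling factor,
 * clamped to the range [lower, upper]. */
static uint32_t table_entries(uint64_t mem_bytes, uint32_t bytes_per_entry,
                              uint32_t lower, uint32_t upper)
{
    assert(bytes_per_entry > 0 && lower <= upper);
    uint64_t n = mem_bytes / bytes_per_entry;
    if (n < lower)
        return lower;
    if (n > upper)
        return upper;
    return (uint32_t)n;
}
```

With one entry per 64 KB of RAM and limits of 64 and 4096, a 64 MB machine gets 1024 entries, a 1 MB machine is raised to 64, and a 4 GB machine is capped at 4096.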

Once paging is enabled, the kernel memory map and
physical memory map look a bit like this, depending
on the amount of memory. I'll describe the structures
later.

0x00230000
    Other preallocated data
0x00220000
    Pagetable pool (pool of page tables)
0x00220000
    DMA MemRegion table
0x00210000
    Pmap Descriptor table (for allocating page tables)
0x00200000
    Pageframe table (memory map)
0x00140000
    MemRegion table (pool of structs that describe memory ranges)
0x00130000
    Process table
0x00120000
    DMA memory pool
0x00110000
    Kernel (and embedded temporary stack)
0x00100000
    Lower memory
0x00001000
    Marked as not present
0x00000000

The arrays dedicated to memory management are the
Pageframe table, MemRegion table, Pmap Descriptor table
and Pagetable pool. Processes contain an AddressSpace
structure to maintain lists of MemRegions, Pageframes
and pagetables belonging to the process.

** Pageframe Table **

One Pageframe structure is used to describe each page of
normal physical memory. It holds the physical address of the
page. If the page is in use, it is attached to a MemRegion
structure and its virtual address is set.

The memregion_entry is used to attach it to a list as part
of the memregion structure. If the memory is unused/free
then it is attached to a global unused pageframe list. Having
2 list entries is a bit redundant as a Pageframe can only
be on one or the other.

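struct Pageframe
{
    vm_addr physical_addr;                    /* physical address of the page */
    vm_addr virtual_addr;                     /* set when the page is in use */
    struct MemRegion *mr;                     /* MemRegion the page is attached to */
    LIST_ENTRY (Pageframe) unused_entry;      /* link in the global unused list */
    LIST_ENTRY (Pageframe) memregion_entry;   /* link in the MemRegion's list */
    uint32 state;                             /* Reserved, Free, In-Use */
};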

** MemRegion Table **

The array acts as a pool of free MemRegion structures. Each one
describes an area of a virtual address space: the range of
virtual addresses it occupies, its properties and
a list of the Pageframes that map into the
area. Even areas of the virtual address space that
are not mapped have a MemRegion. When an address
space is created, it has one MemRegion that spans
the entire address space.

MemRegions in my OS are what Maxim called
Virtual Address Descriptors in Windows and
Virtual Memory Areas in Linux.

struct MemRegion
{
    vm_addr base_addr;                        /* start of the virtual range */
    vm_addr ceiling_addr;                     /* end of the virtual range */
    LIST_ENTRY (MemRegion) sorted_entry;
    LIST_ENTRY (MemRegion) free_entry;
    LIST_ENTRY (MemRegion) unused_entry;
    struct AddressSpace *as;                  /* address space it belongs to */
    uint32 type;
    uint32 prot;
    uint32 flags;
    vm_addr phys_base_addr;
    struct Pageframe *pageframe_hint;
    LIST (Pageframe) pageframe_list;          /* Pageframes mapped into this area */
};

Splitting a "free" memregion normally requires allocating
one or two MemRegion structures from the unused MemRegion
pool.
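
The "one or two" comes from how many pieces are left over. A sketch of the arithmetic, assuming the existing structure is reused for the carved-out part; `struct range` and `range_split` are simplified stand-ins for illustration, not the MemRegion code itself.

```c
#include <assert.h>
#include <stdint.h>

struct range { uint32_t base, ceiling; };   /* half-open [base, ceiling) */

/* Carve [base, ceiling) out of a free range. The carved part is written to
 * *mid; any space left below or above it is written to extra[]. The return
 * value (0, 1 or 2) is the number of leftover pieces, i.e. how many extra
 * structures would have to come from the unused pool. */
static int range_split(struct range free_r, uint32_t base, uint32_t ceiling,
                       struct range *mid, struct range extra[2])
{
    assert(free_r.base <= base && base < ceiling && ceiling <= free_r.ceiling);
    int n = 0;
    if (free_r.base < base)                      /* space left below */
        extra[n++] = (struct range){ free_r.base, base };
    if (ceiling < free_r.ceiling)                /* space left above */
        extra[n++] = (struct range){ ceiling, free_r.ceiling };
    *mid = (struct range){ base, ceiling };
    return n;
}
```

Carving from the middle of a free region needs two new structures, carving from either end needs one, and an exact fit needs none.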

MemRegions are attached to AddressSpace
structures that are part of each Process's structure.
There is also one special AddressSpace on its own
and that is for managing the kernel memory.

struct AddressSpace
{
    LIST (MemRegion) sorted_memregion_list;
    LIST (MemRegion) free_memregion_list;
    struct MemRegion *hint;
    struct Pmap pmap;
};

An AddressSpace can be thought of as a combination
of a machine-independent description and a machine-dependent
description of an address space.

The machine-independent part is composed of a list of MemRegion
structures and each MemRegion containing a list of Pageframes.
This forms a complete description of an address space.

Here's a little diagram of what a small address space might look
like.

AddressSpace (range 0x0000 to 0x10000) ---> Pmap (pagetables)
|
\|/
MemRegion7 (0x0000 to 0x1000, mapped) -> pf1
|
MemRegion3 (0x1000 to 0x3000, unmapped)
|
MemRegion1 (0x3000 to 0x6000, mapped) -> pf2 - pf5 - pf7
|
MemRegion4 (0x6000 to 0x8000, mapped) -> pf6 - pf3
|
MemRegion2 (0x8000 to 0x10000, unmapped)

The machine-dependent part is called the Pmap and contains
the page directory and page tables of an address space. It also
contains a list of Pmap Descriptor structures which are used to
keep track of what pagetables have been allocated from the page
table pool.

struct PmapDesc
{
    LIST_ENTRY (PmapDesc) pmapdesc_entry;
    uint32 *addr;           /* Address of Page table/directory */
    struct Pmap *pmap;      /* Pmap it belongs to */
    int reference_cnt;
};

A pool of memory is allocated during initialization for the page tables
and an array is allocated for Pmap descriptors. There is one
Pmap descriptor for each Page that shall be used as a page table
or directory.

The PmapDesc structures are used to maintain a free pool of
pagetables. To allocate a page table, take the first PmapDesc off the
list; the virtual/physical address of the corresponding
pagetable is in the "addr" field. Another way of
calculating the address is to take the descriptor's index in the
PmapDesc array and apply it to the array of page table pages.
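
The index-based calculation might look something like this. `struct pmap_desc`, `desc_table`, `pool_base` and the pool address are simplified stand-ins chosen for the example, not the real PmapDesc layout or addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NTABLES   16u

struct pmap_desc { int in_use; };            /* simplified stand-in */

static struct pmap_desc desc_table[NTABLES];
static uint32_t pool_base = 0x00400000u;     /* assumed pool address */

/* Address of the page table that descriptor d stands for: the descriptor's
 * index in its array, scaled by the page size, offset from the pool base. */
static uint32_t desc_to_table(const struct pmap_desc *d)
{
    ptrdiff_t i = d - desc_table;
    assert(i >= 0 && (size_t)i < NTABLES);
    return pool_base + (uint32_t)i * PAGE_SIZE;
}

/* The inverse: which descriptor tracks the page table at table_addr. */
static struct pmap_desc *table_to_desc(uint32_t table_addr)
{
    assert(table_addr >= pool_base &&
           (table_addr - pool_base) % PAGE_SIZE == 0);
    uint32_t i = (table_addr - pool_base) / PAGE_SIZE;
    assert(i < NTABLES);
    return &desc_table[i];
}
```

Because the two arrays run in parallel, the mapping works in both directions without storing a pointer in either structure.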

As page tables are identity mapped along with all of the other
arrays and tables, their physical and virtual addresses are the same.
This means there is no need to self-map a page directory
into itself. It also means that all pagetables of all processes are
accessible and can be modified by other processes.

Eventually there may be no pagetables free, in which case
an immediate memory allocation can fail. What I intend to do
is make the page tables discardable: free a pagetable from
another process and insert it into the page-faulting process.
This is possible because it is the MemRegion and Pageframe structures
that manage and keep a record of the virtual address space,
not the pagetables. I don't have discardable pagetables
implemented yet.

My memory manager doesn't implement copy-on-write yet,
though I'm sure I'll eventually have it. My current fork()
implementation copies the entire address space which is
quite slow, but at least it works.

There are quite a few docs on Mach, FreeBSD, Linux and Solaris
VMM systems on the web. Unfortunately a lot of the documents
seem to move to new places or disappear altogether.

Here are a few on FreeBSD/Mach that I found.

http://people.freebsd.org/~nik/article.html-text

http://docs.freebsd.org/44doc/papers/newvm.html

http://eloas.net/~webs/madchat/?cat=./sysadm/kern/kern.bsd/FreeBSDVM.txt

http://216.239.59.104/search?q=cache:-ppbZYPCEFAJ:www.celabo.org/freebsd/john1.txt+%22FreeBSD+MACH-based+VM+system&hl=en&gl=uk&ct=clnk&cd=3

There is a large document on the NetBSD UVM memory manager by
Chuck Cranor that you might find on his website or on Citeseer.

--
Marv

Eli Gottlieb

Jan 31, 2007, 3:26:04 PM

Alexei A. Frounze wrote:
> That is all true if you can always handle some unexpected exception in a
> meaningful way, which I doubt (it's hard to check everywhere and
> everything and be ready for that everything), and if what's damaged is
> unimportant, which you can't know before it happened.

It's not always possible to handle exceptions meaningfully and recover,
but an exception-handler that dumps the registers and panics is always
far more helpful in debugging than a triple-fault.

> What if your stray pointer damages the disk buffer? You lose data on the disk, which
> can be any data or application, including the OS itself. Now, what if it
> damages the file system control structures, not just file internals?
> It's probably worse. And if the file system code is implemented without
> enough checks, those errors may easily lead to further damage of the
> file system.

Which is why I'm writing a micro-kernel that will put the file-system
safely away from such an unsubtle thing as a stray pointer bug in the
kernel. It takes a smart bug to corrupt data structures of a user
process from the kernel while leaving the rest of the process intact,
and smart bugs (ie: bugs that use a lot of complicated processing to
screw things up in a very small, specific way that propagates without
detection) don't occur nearly as frequently as stupid ones (ie: bugs
that can be detected early and often).

In conclusion, proper coding turns smart bugs into stupid ones by
checking everything with enough frequency to stop bad data or bad code
before it runs too far.

--
The science of economics is the cleverest proof of free will yet
constructed.

Alexei A. Frounze

Feb 2, 2007, 12:19:15 AM

Maxim S. Shatskih wrote:
>> PTs into the PD. Then to change something in a PT you modify it at a
>
> Note: any updates to _present_ PTE - including, but not limited to,
> making it non-present - require INVLPG.
>
> Updates to non-present PTE, including, but not limited to, making it
> present - do not require INVLPG.

Btw, a bug caused by a missing INVLPG may be extremely hard to find, and it may
or may not show up on different CPUs. It depends on the code being
executed, on the TLB size and on the TLB implementation. A missing INVLPG may
even appear to be OK because other memory accesses happen to evict, in time,
what the INVLPG should have evicted. And you don't know you have a
bug because it doesn't show up.

Alex