smilinggoomba wrote:
> Is this thread dead...
> I didn't mean to make it sound like I had my answer when I said:
I'm not sure if it will be of help to you, but I'll explain the basics
of my VM initialization code.
What I do is split my kernel code into two parts. The initialization
routine that enables paging is linked so that both its physical and
virtual addresses are 0x00100000. The main part of the kernel
has a physical address of 0x00400000 but a virtual address of
0xF0000000.
This is done with a linker script, using the AT directive
to set the physical (load) address. It looks something like this:
OUTPUT_FORMAT("elf32-i386")
OUTPUT_ARCH(i386)
SEARCH_DIR("/cross/lib/gcc-lib/i386-elf/3.3.3")
ENTRY(Entry)
SECTIONS
{
. = 0x00100000; /* Physical and virtual address of Init code */
.inittext :
{
_init_stext = .;
kernel/i386/init/entry.o(.text)
kernel/i386/init/*.o(.text)
}
.initrodata :
{
kernel/i386/init/*.o(.rodata)
kernel/i386/init/*.o(.rodata.*)
kernel/i386/init/*.o(.rodata1)
_init_etext = .;
}
.initdata ALIGN (0x1000) :
{
_init_sdata = .;
kernel/i386/init/*.o(.data)
_init_edata = .;
}
.initbss :
{
_init_sbss = .;
kernel/i386/init/*.o(.bss)
kernel/i386/init/*.o(COMMON)
_init_ebss = .;
}
. = 0xF0000000; /* Virtual address of kernel */
.text : AT (_stext - (0xF0000000 - 0x00400000)) /* Kernel loaded at 4MB physical addr */
{
_stext = .;
_stext_phys = _stext - (0xF0000000 - 0x00400000);
*(.text)
}
.rodata : AT (_srodata - (0xF0000000 - 0x00400000))
{
_srodata = .;
*(.rodata)
*(.rodata.*)
*(.rodata1)
_etext = .;
_etext_phys = _etext - (0xF0000000 - 0x00400000);
}
.data ALIGN (0x1000) : AT (_sdata - (0xF0000000 - 0x00400000))
{
_sdata = .;
_sdata_phys = _sdata - (0xF0000000 - 0x00400000);
*(.data)
_edata = .;
}
.bss : AT (_sbss - (0xF0000000 - 0x00400000))
{
_sbss = .;
_sbss_phys = _sbss - (0xF0000000 - 0x00400000);
*(.bss)
*(COMMON)
_ebss = .;
_ebss_phys = _ebss - (0xF0000000 - 0x00400000);
}
}
So the init part gets loaded by GRUB at 0x00100000 and the
main kernel part gets loaded at 0x00400000. One of the first
functions called is InitVM(), which sets up all of the memory
management data structures and enables paging, identity mapping
the init code at 0x00100000 but remapping the kernel to
0xF0000000.
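The split works because the two regions land in different page directory slots. A minimal sketch of the index arithmetic, assuming the standard 32-bit x86 two-level layout with 4 KB pages and no PAE; the macro names mirror the ones used in MapMem() further down, but their values are my assumption since the post doesn't list them:

```c
#include <stdint.h>

/* Assumed 32-bit x86 (non-PAE) layout: bits 31..22 select the page
 * directory entry, bits 21..12 the page table entry, and bits 11..0
 * are the byte offset within the 4 KB page. */
#define PAGE_SIZE  0x1000u
#define PDE_SHIFT  22
#define PTE_SHIFT  12
#define PDE_MASK   0x3FFu
#define PTE_MASK   0x3FFu

static uint32_t pde_index(uint32_t va)   { return (va >> PDE_SHIFT) & PDE_MASK; }
static uint32_t pte_index(uint32_t va)   { return (va >> PTE_SHIFT) & PTE_MASK; }
static uint32_t page_offset(uint32_t va) { return va & (PAGE_SIZE - 1); }
```

For example, 0x00100000 gives PDE 0, PTE 256, while 0xF0000000 gives PDE 960 (0x3C0), PTE 0. Each page table covers 4 MB (1024 entries of 4 KB), so the identity-mapped init code and the high kernel never share a page table.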
A simple heap allocator is used to allocate kernel data tables
and structures above the end of the kernel's data. These
structures include those for managing free pages, managing
free areas of a process's memory and, importantly, a fixed-size
pool of page tables, allocated as a proportion of the
total amount of physical memory (I think I use about 1/32,
with a minimum limit). So the physical address space
looks a bit like this:
-- end of kernel heap
pagetable_pool
pmap_desc_array - used to allocate pagetables
pageframe_array
memregion_array
process_array
-- beginning of kernel heap
&_ebss_phys
kernel .data
kernel .text
&_stext_phys
0x00400000
...
Init .data
Init .text
0x00100000
The kernel and the heap above it get mapped to 0xF0000000 when
paging is enabled.
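Since that heap only ever grows during initialization, a bump allocator is enough. A minimal sketch, assuming the heap starts page-aligned just past the physical end of the kernel's .bss (the _ebss_phys symbol from the linker script); heap_init(), heap_alloc(), pagetable_pool_size() and the MIN_PAGETABLE_POOL value are hypothetical, and the 1/32 ratio is the one quoted above:

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical floor for the page-table pool (64 page tables). */
#define MIN_PAGETABLE_POOL (64u * PAGE_SIZE)

/* Round x up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t x, uint32_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Next free physical address; nothing is ever freed. */
static uint32_t heap_top;

static void heap_init(uint32_t kernel_end)
{
    heap_top = align_up(kernel_end, PAGE_SIZE);
}

static uint32_t heap_alloc(uint32_t size, uint32_t align)
{
    uint32_t addr = align_up(heap_top, align);
    heap_top = addr + size;
    return addr;
}

/* Page-table pool: 1/32 of physical memory, at least the floor,
 * rounded up to whole pages. */
static uint32_t pagetable_pool_size(uint32_t mem_bytes)
{
    uint32_t size = mem_bytes / 32;
    if (size < MIN_PAGETABLE_POOL)
        size = MIN_PAGETABLE_POOL;
    return align_up(size, PAGE_SIZE);
}
```

With 128 MB of RAM the pool comes to 4 MB, i.e. 1024 page tables. Each array in the diagram would be carved out with one heap_alloc() call, using PAGE_SIZE alignment for the page-table pool.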
A function, MapMem(), is used to initialize the page tables. Once the
kernel is initialized, a separate set of functions that manipulate the
page tables is used instead: PmapInit(), PmapEnter() and PmapRemove().
MapMem() is called several times: for the 0-1MB area, the init code, the
init data, the kernel code and the kernel data. vbase and vceiling are the
virtual addresses to map, pbase is the physical address of the base of the
region, and pte_bits sets the attributes of the page table entries.
root_pagedirectory is the first page of the page table pool; it is the
page directory of the root process. current_pagetable is advanced by
PAGE_SIZE whenever a new page table needs to be allocated.
Initially:
current_pagetable = pagetable_pool;
root_pagedirectory = current_pagetable;
current_pagetable = (uint32 *)((uint8 *)current_pagetable + PAGE_SIZE);
void MapMem (vm_addr vbase, vm_addr vceiling, vm_addr pbase, uint32 pte_bits)
{
    vm_addr pa, va;
    uint32 *pd, *pt;
    uint32 pde_idx, pte_idx;

    /* Global pages need CPU support (CR4.PGE); drop the bit otherwise */
    if (CPUSupportsGlobalPaging() == FALSE)
        pte_bits &= ~PG_GLOBAL;

    for (pa = pbase, va = vbase; va < vceiling; pa += PAGE_SIZE, va += PAGE_SIZE)
    {
        pde_idx = (va >> PDE_SHIFT) & PDE_MASK;
        pte_idx = (va >> PTE_SHIFT) & PTE_MASK;
        pd = root_pagedirectory;

        /* Allocate a page table if needed and get the page table pointer.
           Paging is still off, so pool addresses are physical and can go
           straight into the PDE. This relies on the pool having been
           zero-filled beforehand. */
        if ((pd[pde_idx] & PG_PRESENT) == 0)
        {
            pt = current_pagetable;
            current_pagetable = (uint32 *)((uint8 *)current_pagetable + PAGE_SIZE);
            pd[pde_idx] = ((uint32)pt & PDE_ADDR_MASK) | PG_USER | PG_READWRITE |
                          PG_PRESENT;
        }
        else
        {
            pt = (uint32 *)(pd[pde_idx] & PDE_ADDR_MASK);
        }

        pt[pte_idx] = (pa & PTE_ADDR_MASK) | pte_bits | PG_PRESENT;
    }
}
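MapMem() can be exercised outside the kernel by pointing it at an ordinary array. The following user-space sketch assumes the standard non-PAE x86 values for the PG_* flags and address masks (the post doesn't list them). map_init(), map_mem() and translate() are hypothetical names, and "physical addresses" are byte offsets into the array. translate() walks the tables the way the MMU would:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      0x1000u
#define PDE_SHIFT      22
#define PTE_SHIFT      12
#define PDE_MASK       0x3FFu
#define PTE_MASK       0x3FFu
#define PDE_ADDR_MASK  0xFFFFF000u
#define PTE_ADDR_MASK  0xFFFFF000u
#define PG_PRESENT     0x001u
#define PG_READWRITE   0x002u
#define PG_USER        0x004u

/* Simulated page-table pool of 8 pages; addresses are byte offsets. */
#define POOL_PAGES 8
static uint32_t pool[POOL_PAGES * PAGE_SIZE / 4];
static uint32_t root_pd;    /* "physical" address of the page directory */
static uint32_t next_free;  /* next unused pool page (like current_pagetable) */

static uint32_t *entry(uint32_t table_pa, uint32_t idx)
{
    return &pool[table_pa / 4 + idx];
}

static void map_init(void)
{
    memset(pool, 0, sizeof pool);  /* directory and tables start empty */
    root_pd = 0;                   /* first pool page is the directory */
    next_free = PAGE_SIZE;
}

static void map_mem(uint32_t vbase, uint32_t vceiling, uint32_t pbase,
                    uint32_t pte_bits)
{
    for (uint32_t va = vbase, pa = pbase; va < vceiling;
         va += PAGE_SIZE, pa += PAGE_SIZE) {
        uint32_t *pde = entry(root_pd, (va >> PDE_SHIFT) & PDE_MASK);
        if ((*pde & PG_PRESENT) == 0) {
            /* take the next pool page as a new table (no bounds check) */
            *pde = (next_free & PDE_ADDR_MASK) | PG_USER | PG_READWRITE | PG_PRESENT;
            next_free += PAGE_SIZE;
        }
        uint32_t *pte = entry(*pde & PDE_ADDR_MASK, (va >> PTE_SHIFT) & PTE_MASK);
        *pte = (pa & PTE_ADDR_MASK) | pte_bits | PG_PRESENT;
    }
}

/* Returns 1 and stores the physical address if va is mapped, else 0. */
static int translate(uint32_t va, uint32_t *pa_out)
{
    uint32_t pde = *entry(root_pd, (va >> PDE_SHIFT) & PDE_MASK);
    if ((pde & PG_PRESENT) == 0)
        return 0;
    uint32_t pte = *entry(pde & PDE_ADDR_MASK, (va >> PTE_SHIFT) & PTE_MASK);
    if ((pte & PG_PRESENT) == 0)
        return 0;
    *pa_out = (pte & PTE_ADDR_MASK) | (va & (PAGE_SIZE - 1));
    return 1;
}
```

Mapping two pages of init code at 0x00100000 and three kernel pages at 0xF0000000 uses three pool pages: the directory, plus one table per 4 MB region. Stale bits in an unzeroed pool page would read as PG_PRESENT, so the zero-fill matters.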
Once the areas are mapped using this function, paging is enabled. Then all of
the other data structures are initialized.
A proper allocator is later used to allocate and free the page tables for the
rest of the system. Basically, a separate array is used to maintain a list of
free page tables; I'm sure a bitmap could be used just as well.
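Here is a minimal sketch of the bitmap variant mentioned above. It tracks pool slots by index, with a hypothetical pool of 256 page tables. pt_alloc() and pt_free() are made-up names, and turning an index into an address would be pool base + idx * PAGE_SIZE:

```c
#include <stdint.h>

#define POOL_PAGES 256   /* hypothetical pool size, in page tables */
#define BITMAP_WORDS ((POOL_PAGES + 31) / 32)

/* One bit per page table in the pool: 1 = in use, 0 = free. */
static uint32_t pt_bitmap[BITMAP_WORDS];

/* Returns the index of a free page table and marks it used,
 * or -1 if the pool is exhausted. */
static int pt_alloc(void)
{
    for (int w = 0; w < BITMAP_WORDS; w++) {
        if (pt_bitmap[w] == 0xFFFFFFFFu)
            continue;                    /* word full: skip 32 slots at once */
        for (int b = 0; b < 32; b++) {
            int idx = w * 32 + b;
            if (idx >= POOL_PAGES)
                return -1;               /* past the end of a partial word */
            if ((pt_bitmap[w] & (1u << b)) == 0) {
                pt_bitmap[w] |= 1u << b;
                return idx;
            }
        }
    }
    return -1;
}

/* Marks page table idx as free again. */
static void pt_free(int idx)
{
    pt_bitmap[idx / 32] &= ~(1u << (idx % 32));
}
```

The trade-off: an array used as a free list gives constant-time allocation, while the bitmap costs only one bit per page table but needs a scan.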
Also note that once paging is enabled it is easy enough to convert between
the virtual and physical addresses of page tables and page directories
using
phys_addr = virt_addr - (0xF0000000 - 0x00400000)
virt_addr = phys_addr + (0xF0000000 - 0x00400000)
CR3 and the page directory and page table entries need physical addresses,
but to access the tables themselves you use their virtual addresses.
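The two formulas wrap naturally into helpers. kvirt_to_phys() and kphys_to_virt() are hypothetical names. PmapToPhys() and PmapToVirt() in the PmapEnter() code below presumably do something similar, but that's an assumption:

```c
#include <stdint.h>

/* Offset between the kernel's virtual base and its physical load
 * address, matching the constants in the linker script. */
#define KERNEL_VBASE   0xF0000000u
#define KERNEL_PBASE   0x00400000u
#define KERNEL_OFFSET  (KERNEL_VBASE - KERNEL_PBASE)   /* 0xEFC00000 */

/* Only valid for the kernel image and its heap, i.e. memory that is
 * mapped linearly starting at KERNEL_VBASE. */
static uint32_t kvirt_to_phys(uint32_t va) { return va - KERNEL_OFFSET; }
static uint32_t kphys_to_virt(uint32_t pa) { return pa + KERNEL_OFFSET; }
```

The typical use is installing a page table: the PDE gets kvirt_to_phys(pt), while the kernel keeps reading and writing through pt itself.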
So that's basically the initialization. There's a lot more to it, though:
the memory management is divided into two parts, a high-level
description of address spaces and a hardware-specific portion.
The high-level portion consists of AddressSpace, MemRegion
and Pageframe structures that describe a process's address space.
A CPU-specific part called the Pmap is used to manage the
actual page tables.
An old post describes the scheme, except that back then the kernel
was identity mapped, with the init code at 0x00100000.
http://groups.google.com/group/alt.os.development/msg/862869c76fdfd5cf
I don't think I've described the Pmap page table routines anywhere,
so I'll briefly mention them here. Basically, Pmap is a structure that
points to a page directory and is part of my AddressSpace structure.
Whenever a page table entry needs to be changed in the kernel or in user
processes, the following calls are used:
bool PmapEnter (struct Pmap *pmap, vm_offset va, vm_offset pa, uint32 prot);
bool PmapProtect (struct Pmap *pmap, vm_offset va, uint32 prot);
bool PmapRemove (struct Pmap *pmap, vm_offset va);
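These are similar to MapMem() above but change only a single page table
entry at a time. Once the page tables have been updated, the TLBs need to be
flushed by reloading CR3. Note that reloading CR3 does not flush TLB entries
for pages marked PG_GLOBAL; those need INVLPG (which also flushes a single
page more cheaply) or toggling CR4.PGE.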
I've posted the PmapEnter() code below to give you an idea of what they
look like:
bool PmapEnter (struct Pmap *pmap, vm_offset va, vm_offset pa, uint32 prot)
{
    uint32 *pd, *pt;
    uint32 pde_idx, pte_idx;
    struct PmapDesc *pt_desc, *pd_desc;
    uint32 pte_bits;

    SpinLock (&pmap_slock);

    pde_idx = (va >> PDE_SHIFT) & PDE_MASK;
    pte_idx = (va >> PTE_SHIFT) & PTE_MASK;
    pd = pmap->page_directory;          /* virtual address of the directory */

    if ((pd[pde_idx] & PG_PRESENT) == 0)
    {
        /* No page table covers this 4MB region yet: allocate one */
        if ((pt = PmapAllocPagetable (pmap, PMAP_THROWAWAY)) == NULL)
        {
            SpinUnlock (&pmap_slock);
            return FALSE;
        }

        /* The PDE must hold the table's physical address */
        pd[pde_idx] = (uint32)(PmapToPhys((vm_addr)pt) & PDE_ADDR_MASK) |
                      PG_USER | PG_READWRITE | PG_PRESENT;

        /* pmapdesc[] has one descriptor per pool page, indexed by the
           page's offset from the pool base; the directory now has one
           more page table in use */
        pd_desc = pmapdesc + (((vm_addr)pd - (vm_addr)pagetable) / PAGE_SIZE);
        pd_desc->reference_cnt++;

        /* Record where the new table hangs off its directory */
        pt_desc = pmapdesc + (((vm_addr)pt - (vm_addr)pagetable) / PAGE_SIZE);
        pt_desc->pde_idx = pde_idx;
        pt_desc->parent_pdesc = pd_desc;
    }
    else
    {
        /* Existing table: convert its physical address back to virtual */
        pt = (uint32 *)PmapToVirt((vm_addr)(pd[pde_idx] & PDE_ADDR_MASK));
        pt_desc = pmapdesc + (((vm_addr)pt - (vm_addr)pagetable) / PAGE_SIZE);
    }

    pt_desc->reference_cnt++;

    /* Translate VM protection into PTE bits: always present and
       user-accessible, writable only if VM_PROT_WRITE is requested */
    pte_bits = PG_USER | PG_PRESENT;
    if (prot & VM_PROT_WRITE)
        pte_bits |= PG_READWRITE;

    pt[pte_idx] = (pa & PTE_ADDR_MASK) | pte_bits;

    SpinUnlock (&pmap_slock);
    return TRUE;
}
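PmapRemove() is listed without code. Here is a minimal sketch of how the reference counts maintained on the enter side could be used on removal, so that a page table goes back to the pool once its last entry is cleared. This is a user-space simulation, not the author's code. The pool is a plain array, page 0 plays the page directory, and sim_enter(), sim_remove(), in_use[] and this cut-down PmapDesc are all hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    0x1000u
#define PDE_SHIFT    22
#define PTE_SHIFT    12
#define IDX_MASK     0x3FFu
#define ADDR_MASK    0xFFFFF000u
#define PG_PRESENT   0x001u
#define PG_READWRITE 0x002u

/* Simulated pool: page 0 is the directory; a table's "address" is its
 * pool page number times PAGE_SIZE. */
#define POOL_PAGES 4
static uint32_t pool[POOL_PAGES][1024];

struct PmapDesc {          /* hypothetical, modelled on the post */
    int reference_cnt;     /* valid entries in this table */
    uint32_t pde_idx;      /* slot in the parent directory */
};
static struct PmapDesc desc[POOL_PAGES];
static int in_use[POOL_PAGES] = { 1 };   /* page 0 (directory) is taken */

static int alloc_table(void)
{
    for (int i = 1; i < POOL_PAGES; i++)
        if (!in_use[i]) {
            in_use[i] = 1;
            memset(pool[i], 0, sizeof pool[i]);
            desc[i].reference_cnt = 0;
            return i;
        }
    return -1;
}

static int sim_enter(uint32_t va, uint32_t pa)
{
    uint32_t *pd = pool[0];
    uint32_t pde_idx = (va >> PDE_SHIFT) & IDX_MASK;
    if ((pd[pde_idx] & PG_PRESENT) == 0) {
        int nt = alloc_table();
        if (nt < 0)
            return 0;
        pd[pde_idx] = ((uint32_t)nt * PAGE_SIZE) | PG_READWRITE | PG_PRESENT;
        desc[nt].pde_idx = pde_idx;
        desc[0].reference_cnt++;             /* directory gained a table */
    }
    int t = (int)((pd[pde_idx] & ADDR_MASK) / PAGE_SIZE);
    uint32_t *pte = &pool[t][(va >> PTE_SHIFT) & IDX_MASK];
    if ((*pte & PG_PRESENT) == 0)
        desc[t].reference_cnt++;             /* count only new mappings */
    *pte = (pa & ADDR_MASK) | PG_READWRITE | PG_PRESENT;
    return 1;
}

static int sim_remove(uint32_t va)
{
    uint32_t *pd = pool[0];
    uint32_t pde_idx = (va >> PDE_SHIFT) & IDX_MASK;
    if ((pd[pde_idx] & PG_PRESENT) == 0)
        return 0;
    int t = (int)((pd[pde_idx] & ADDR_MASK) / PAGE_SIZE);
    uint32_t *pte = &pool[t][(va >> PTE_SHIFT) & IDX_MASK];
    if ((*pte & PG_PRESENT) == 0)
        return 0;
    *pte = 0;
    if (--desc[t].reference_cnt == 0) {      /* table empty: release it */
        pd[desc[t].pde_idx] = 0;
        desc[0].reference_cnt--;
        in_use[t] = 0;
    }
    return 1;   /* a real kernel must still flush the TLB entry for va */
}
```

One design point: sim_enter() increments the count only when the PTE was not already present. PmapEnter() as posted increments on every call, which would over-count if the same address were entered twice without a PmapRemove() in between, unless callers never do that.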
The Pmap idea is a simplification of the Mach VM system. Mach was developed
at Carnegie Mellon, and its VM design was adopted by 4.4BSD and so inherited
by FreeBSD, where John Dyson later did a lot of work on it.
VM is quite complicated, especially in mainstream OSes. I hope my
brief description of my page table code above helps, or at least gives
you some ideas.
I've actually been thinking of simplifying my memory management, or at
least the AddressSpace and MemRegion parts described in the post I linked to.
--
Marv