MINIX 3 and PAE

Jean-Baptiste Boric

Apr 28, 2017, 10:23:53 AM
to minix3
Hi all,

I've started to investigate what it would take to make MINIX 3 use PAE on i386. While it would not make much of a difference in the short term compared to PSE (2 MiB instead of 4 MiB large pages, plus the NX bit), it will eventually be a hard requirement for x86_64.

I've tried to make a proof-of-concept the cheapest way possible (https://github.com/boricj/minix/tree/pae_hack). So far I'm alternating between clean VM panics and triple faults after each fix, but it does run a little bit of user-land (namely, parts of VM initialization).

Here are my findings:
  • VM and the micro-kernel assume a 2-level page layout throughout their code, and switching to another layout would require TONS of rewrites;
  • VM owns the entire page tables and CR3, so in practice it has kernel-level privileges;
  • VM and the micro-kernel are rather crufty (paging code assumes page table entries are u32_t, assumptions about 32-bit wide pointers, large amounts of duplicated code and define spaghetti between ports...).
The 2-level layout can be worked around with folding and tricks inside the micro-kernel, but that won't work for x86_64. VM owning the page tables and CR3 is really scary for a user-land process, since any bug inside it could lead to kernel pwning. As far as I can tell, every other micro-kernel operating system keeps the page tables out of reach of user-land precisely because of this. It also outright excludes the MINIX micro-kernel from any serious isolation or hypervisor role.
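
To illustrate the u32_t point: PAE page table entries are 64 bits wide and the NX bit sits at bit 63, so it cannot survive in a 32-bit variable. Below is a minimal sketch in C. The type, macro and function names are made up for illustration and are not MINIX identifiers, and the address mask assumes the architectural maximum of 52 physical address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not taken from the MINIX tree. */
typedef uint32_t legacy_pte_t;  /* non-PAE entry: 4 bytes */
typedef uint64_t pae_pte_t;     /* PAE entry: 8 bytes */

static_assert(sizeof(legacy_pte_t) == 4, "non-PAE entries are 32 bits wide");
static_assert(sizeof(pae_pte_t) == 8, "PAE entries are 64 bits wide");

#define PTE_PRESENT   ((pae_pte_t)1 << 0)
#define PTE_WRITABLE  ((pae_pte_t)1 << 1)
#define PTE_USER      ((pae_pte_t)1 << 2)
#define PAE_PTE_NX    ((pae_pte_t)1 << 63)    /* only honoured when EFER.NXE is set */
#define PAE_ADDR_MASK 0x000FFFFFFFFFF000ULL   /* bits 51:12, assuming 52 address bits */

/* Build a PAE entry for a 4 KiB page located at physical address `phys`. */
static pae_pte_t pae_make_pte(uint64_t phys, pae_pte_t flags)
{
    return (phys & PAE_ADDR_MASK) | flags;
}

/* What code that still stores entries as u32_t would keep:
 * the NX bit and every physical address bit above 4 GiB are dropped. */
static uint32_t truncate_to_u32(pae_pte_t entry)
{
    return (uint32_t)entry;
}
```

For example, an entry for physical address 0x123456000 with the present and NX flags set keeps only 0x23456001 once truncated, which silently points at a different frame and loses the NX bit.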

I'm wondering how to proceed right now. The best plan I can come up with is:
  • Clean up enough code to make PAE possible;
  • Move page tables, CR3 handling and enough paging logic inside the micro-kernel to "tame" VM;
  • Keep VM's 2-level page layout as an abstraction, but use the real deal inside the micro-kernel.
That should be enough for i386 PAE, but those are rather big and possibly controversial changes... Is that plan acceptable? I'd rather not end up creating a MINIX fork by accident a few months down the road if I end up doing this.
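
The idea of keeping a 2-level view while using the real PAE layout underneath can be sketched as plain index arithmetic. This is only an illustration of the bit layout (10+10+12 bits without PAE, 2+9+9+12 bits with PAE); the struct and function names are hypothetical and nothing here is taken from the MINIX sources or the pae_hack branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types for illustration only. */
struct legacy_idx { uint32_t pde, pte; };       /* 10-bit + 10-bit indices */
struct pae_idx    { uint32_t pdpt, pd, pt; };   /* 2-bit + 9-bit + 9-bit indices */

/* Split a 32-bit virtual address the way non-PAE paging does. */
static struct legacy_idx legacy_split(uint32_t va)
{
    struct legacy_idx i = { va >> 22, (va >> 12) & 0x3FF };
    return i;
}

/* Split a 32-bit virtual address the way PAE paging does. */
static struct pae_idx pae_split(uint32_t va)
{
    struct pae_idx i = { va >> 30, (va >> 21) & 0x1FF, (va >> 12) & 0x1FF };
    return i;
}

/* Fold a 2-level (pde, pte) pair into the matching PAE indices.
 * One legacy page table (1024 entries, 4 MiB) spans two PAE page
 * tables (512 entries, 2 MiB each). */
static struct pae_idx fold_legacy_to_pae(struct legacy_idx l)
{
    struct pae_idx p;
    assert(l.pde < 1024 && l.pte < 1024);
    p.pdpt = l.pde >> 8;                            /* va bits 31:30 */
    p.pd   = ((l.pde & 0xFF) << 1) | (l.pte >> 9);  /* va bits 29:21 */
    p.pt   = l.pte & 0x1FF;                         /* va bits 20:12 */
    return p;
}
```

For any address, `fold_legacy_to_pae(legacy_split(va))` yields the same indices as `pae_split(va)`. The index mapping is the easy part; the entry width and the meaning of CR3 still differ between the two modes.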

Just to be clear, I'm not advocating for putting all paging policy inside the micro-kernel, merely that we align ourselves with the common model in other micro-kernels.

David van Moolenbroek

May 2, 2017, 8:01:16 AM
to minix3
Hey,


On Friday, April 28, 2017 at 4:23:53 PM UTC+2, Jean-Baptiste Boric wrote:
That should be enough for i386 PAE, but those are rather big and possibly controversial changes... Is that plan acceptable? I'd rather not end up creating a MINIX fork by accident a few months down the road if I end up doing this.

I'd say go for it. I for one have never seriously looked into the VM service or the page table management, so I cannot properly evaluate the full consequences of your plan, but we are definitely not very attached to the current design. I think it's fair to say that the original reasons for the current VM/kernel separation were along the lines of "because we can," "it should keep the microkernel more lightweight," and "the more code we run without privileges, the more likely it is that we still stop *some* bad things". Later this also evolved into "we can do live updates on the VM service", but as I described all over the live update documentation [1], live updating the VM service comes with so many restrictions that it is very unlikely to be useful in practice the way it is. As long as your approach does not significantly complicate the system (and the microkernel in particular) and does not break anything else, I think the rest is not so important, and this sounds like a very welcome kind of change.

Regards,
David

[1] http://wiki.minix3.org/doku.php?id=developersguide:liveupdate

bertbr...@googlemail.com

May 9, 2017, 5:24:08 AM
to minix3
Hi Jean-Baptiste,

PAE is also on "our" roadmap. From my POV, we should go for making it work with three-level addressing, even if it means a lot of rewriting. Mapping two layers onto three is pretty easy, but we will be "forced" to have three layers all the time (even on ARM). Maybe we can also discuss by PM how to split the work? After having the three-layer approach working, I would go for the other steps.

brgds,
Bert


On Friday, April 28, 2017 at 4:23:53 PM UTC+2, Jean-Baptiste Boric wrote:

Jean-Baptiste Boric

May 9, 2017, 8:18:48 AM
to minix3
 
PAE is also on "our" roadmap. From my POV, we should go for making it work with three-level addressing, even if it means a lot of rewriting. Mapping two layers onto three is pretty easy, but we will be "forced" to have three layers all the time (even on ARM). Maybe we can also discuss by PM how to split the work? After having the three-layer approach working, I would go for the other steps.

The big problem with folding from 3 to 2 layers for PAE is the CR3 register.

With normal paging, it's a pointer to the page directory. With PAE, it's a pointer to the page directory pointer table. VM assumes the former, so even if the micro-kernel lies to VM about CR3's real value, it's not enough to make that work. So that leaves us with two options: either properly fix VM's 2-layer assumption by rewriting large portions of it, or move page handling inside the micro-kernel and gut VM (we still need it for caching, page fault handling and as a middle-man).
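
The CR3 difference can be made concrete with a small sketch. Without PAE, CR3 bits 31:12 hold the base of a 4 KiB page directory (1024 entries of 4 bytes); with PAE, bits 31:5 hold the base of a 32-byte page directory pointer table (4 entries of 8 bytes). The macro and function names below are hypothetical and only meant to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not MINIX identifiers. */
#define CR3_PD_BASE_MASK   0xFFFFF000u  /* non-PAE: bits 31:12, 4 KiB aligned */
#define CR3_PDPT_BASE_MASK 0xFFFFFFE0u  /* PAE: bits 31:5, 32-byte aligned */

enum { LEGACY_PD_ENTRIES = 1024, PAE_PDPT_ENTRIES = 4 };

static_assert(LEGACY_PD_ENTRIES * sizeof(uint32_t) == 4096,
              "a non-PAE page directory fills one 4 KiB page");
static_assert(PAE_PDPT_ENTRIES * sizeof(uint64_t) == 32,
              "a PAE page directory pointer table is only 32 bytes");

/* Physical base address of the top-level paging structure named by CR3. */
static uint32_t cr3_root_base(uint32_t cr3, int pae_enabled)
{
    return cr3 & (pae_enabled ? CR3_PDPT_BASE_MASK : CR3_PD_BASE_MASK);
}

/* Index into that top-level structure for a given virtual address. */
static unsigned root_index(uint32_t va, int pae_enabled)
{
    return pae_enabled ? (va >> 30) : (va >> 22);
}
```

The same CR3 value therefore names a different structure, with a different size, alignment and entry width, depending on the paging mode. Code that indexes the CR3 target as a 1024-entry array of 32-bit entries breaks as soon as PAE is switched on.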

The second option is the sanest one, since we'll align ourselves with the common model used by every other micro-kernel. By hiding page table details from user-space, VM won't need to know whether the hardware has a 2- or 3-layer layout. Also, it's the only option that does not require a single humongous commit to keep both Gerrit and Jenkins happy. On the downside, that will introduce dynamic memory allocation to the micro-kernel, but I'm perfectly fine with that.

Once that work is done, moving from PSE to PAE should be fairly easy, since only the micro-kernel should be impacted.

About splitting the work: obviously, this is a lot of work for just one person. However, right now I don't see how to split it up so that several people can work on it at the same time without tight synchronization. I need to come up with a proper action plan first and see if some tasks can be done in parallel; I'll post it here once I have it.

Jean-Baptiste Boric

May 9, 2017, 2:57:38 PM
to minix3
I've quickly checked VM's source code, and here are the big tasks:
  1. Move physical memory allocation into the micro-kernel (vm/alloc.c), kinda easy
  2. Move page table stuff into the micro-kernel (vm/pagetable.c), not that easy
  3. Check if other stuff needs to be moved into the micro-kernel, unknown
  4. Add PAE support to micro-kernel, should be easy
My objective is to remove enough power from VM so that it can't hose the micro-kernel; that's why item #3 is a big unknown for now. Nevertheless, once both physical memory allocation and page tables are moved into the micro-kernel, PAE becomes possible.

Unless there are more things to do than the first two points, I don't see a way to split that work meaningfully since item #2 depends on item #1.

