On 5/24/2023 1:12 PM, Dan Cross wrote:
> In article <u4lhjo$31dht$1...@dont-email.me>, BGB <cr8...@gmail.com> wrote:
>> On 5/24/2023 6:33 AM, Dan Cross wrote:
>>>> I think a lot of this is making a big fuss over nothing, FWIW.
>>>
>>> You can think that all you want, but sadly, that doesn't mean
>>> that these aren't actual problems for real-world systems.
>>>
>>>> But, in any case, SuperH (along with PA-RISC, MIPS, SPARC, etc) got
>>>> along reasonably well with software-managed TLB.
>>>
>>> In a very different time, with very different demands on the
>>> architecture.
>>
>> Depends on what one wants.
>>
>> I am mostly imagining an architecture for embedded-systems style
>> use-cases (but, more DSP-like than microcontroller-like).
>
> This perhaps explains why you seem to be discounting the use
> cases others are telling you are important in other application
> domains.
>
Possibly; I am not trying to design a CPU for desktop PCs or servers...
Granted, I had considered trying to use it for a CNC controller, but
this use-case is served reasonably well by something like an ARM-based
microcontroller (and having a full OS, or virtual memory, on a CNC
controller actually makes things worse).
This is where ASIDs come in: they allow multiple address spaces to
coexist in the TLB at the same time, while remaining mutually invisible
to each other.
So, say:
0123_456789AB with ASID=1234
And:
0123_456789AB with ASID=5678
Can both exist in the TLB at the same time.
This works provided one can give each process or thread its own ASID,
which is potentially limiting in that only 65536 ASIDs can be in use at
a time.
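Roughly, the TLB matching rule above can be sketched in C (field widths
and struct layout here are made up for illustration; the real TLB is in
hardware):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical TLB entry: the ASID is stored alongside the virtual
 * page number, so entries from different address spaces can coexist
 * in the TLB without conflicting. */
typedef struct {
    uint64_t vpn;    /* virtual page number (VA >> 12, 4K pages) */
    uint64_t ppn;    /* physical page number */
    uint16_t asid;   /* address-space ID, 16 bits -> 65536 spaces */
    bool     valid;
} TlbEntry;

/* An entry only hits if both the VPN and the current ASID match;
 * entries tagged with some other ASID are simply ignored. */
bool tlb_hit(const TlbEntry *e, uint64_t va, uint16_t cur_asid)
{
    return e->valid && e->vpn == (va >> 12) && e->asid == cur_asid;
}
```

So two entries with the same VPN but ASID=0x1234 vs ASID=0x5678 sit in
the TLB side by side, and only the one matching the current ASID is
visible.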
As soon as one changes the value in TTB(63:48), then whatever was in
the TLB before (belonging to the other address space) is now ignored.
So, when one switches to the guest OS, they can switch TTB over to the
"guest page table" (possibly actually just a virtual TLB), with its own
ASID. When control moves back to the host, the host loads TTB with its
host page-table.
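Since the ASID lives in TTB(63:48), the host/guest switch is just
loading TTB with the other page-table base plus its ASID; stale entries
for the other space are ignored rather than flushed. A sketch (function
name and the assumption that the low 48 bits hold the table base are
mine):

```c
#include <stdint.h>

#define TTB_ASID_SHIFT 48

/* Compose a TTB value: page-table base in the low 48 bits (assumed),
 * ASID in bits 63:48.  Writing this register effectively switches
 * address spaces without touching the TLB contents. */
uint64_t make_ttb(uint64_t table_base, uint16_t asid)
{
    return (table_base & ((1ULL << TTB_ASID_SHIFT) - 1))
         | ((uint64_t)asid << TTB_ASID_SHIFT);
}
```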
I can note that I am already using a mechanism like this to implement
system calls:
The user process triggers a system call;
The SYSCALL ISR performs a task switch to the SYSCALL handler task;
The handler does its thing;
It invokes the SYSCALL ISR again, which transfers control back to the
caller task.
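As a minimal simulation of that round trip (task IDs and function names
are hypothetical; the real switch swaps full register state):

```c
/* Toy model of the syscall round trip: the ISR itself does no work,
 * it only flips between the caller task and the supervisor-mode
 * handler task. */
enum { TASK_USER = 0, TASK_SYSCALL = 1 };
static int current_task = TASK_USER;

static void task_switch(int task) { current_task = task; }

/* Invoked once to enter the handler task, and again (by the handler)
 * to transfer control back to the caller. */
static void syscall_isr(void)
{
    task_switch(current_task == TASK_USER ? TASK_SYSCALL : TASK_USER);
}
```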
In this case, the user program can run in user-mode, whereas the syscall
handler task runs in supervisor mode.
It is not handled directly by the ISR, mostly because the ISRs run in a
special mode which has the MMU disabled and which can't handle
interrupts (and a fault here will cause the CPU to lock up until a
RESET signal is received, such as from pressing an external reset
button). It is possible a flag could be added to auto-reboot the CPU
though.
Translating things in 96-bit space is another option, but fully
generalizing this would likely require adding an additional translation
layer. But, this could potentially be used to sidestep the "only 64K
unique ASIDs" limitation.
Then one could have effectively a space of up 2^64 possible 48-bit
address spaces...
For now though, the number of PIDs+threads in my use-cases is small
enough that the 64K ASID limit isn't too much of an issue.
As-is, I would run out of both RAM and pagefile space well before
running out of ASIDs, if I were using it this way.
Granted, for most normal tasks, I am currently running them in a shared
address space, and instead the idea is that ACL checks would be used to
keep one process from stomping on another process's memory (which,
ironically, effectively creates sub-rings within User Mode).
> There are several answers here, btw; the obvious one is trap and
> emulate the entirety of the guest's access to this region of the
> virtual address space, but that's a) complex, and b) expensive.
>
And, unnecessary...
> Another is to make the hypervisor relocatable, and only trap into
> a small, position-independent trampoline stub that can, say, set
> a base register and jump somewhere else. This will break down
> if the guest uses too much of the virtual address space.
>
> Yet another, since you control the hardware, is to have a
> separate "guest" hardware TLB and an execution mode that uses
> it, but that adds complexity to the hardware.
>
And, is also unnecessary, given one can have multiple address spaces
present in the same TLB at the same time without them conflicting with
each other (provided each has a different ASID).
> Another option is to locate the hypervisor somewhere random in
> the virtual address space that is unlikely to conflict with a
> guest and simply declare it off-limits by convention, but guests
> don't necessarily need to obey convention.
>
> None of these are particularly great options.
>
The latter is also possible.
Within the high 48 bits, there is plenty of space...
Theoretically, one could generate a good 48-bit random number and have
space that is shared between the host and guest, if needed...
>> The rate of guest TLB misses could be reduced by giving it a bigger TLB
>> than on the actual hardware, say, 1024x 4-way, as this part is
>> "basically invisible" to the guest OS (apart from reducing the number of
>> TLB misses).
>
>>>> Or, stated another way, the entire "physical address" space for the
>>>> guest would itself be a virtual memory space running in user-mode.
>>>
>>> Sure, but this isn't about the physical address space; it's
>>> about management of the virtual address space.
>>
>> The guest's physical space would be the virtual address space for the host.
>
> Cool. So what provides the guest's virtual address space?
>
Giving it its own ASIDs...
Probably using a Guest -> Host ASID remapping table.
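Something like the following, say (the table size, the reserved host
ASID range, and all names here are assumptions on my part): the
hypervisor hands each guest-visible ASID a distinct host ASID, so guest
address spaces land in the shared TLB without clashing with host
entries.

```c
#include <stdint.h>

/* Hypothetical guest->host ASID remapping table.  Host ASID 0 is
 * treated as "unassigned"; guests are allocated out of a range the
 * host reserves for them. */
#define N_GUEST_ASIDS 256
static uint16_t remap[N_GUEST_ASIDS];
static uint16_t next_host_asid = 0x4000;  /* assumed guest-reserved range */

uint16_t host_asid_for(uint16_t guest_asid)
{
    uint16_t g = guest_asid % N_GUEST_ASIDS;
    if (!remap[g])                 /* first use: allocate a host ASID */
        remap[g] = next_host_asid++;
    return remap[g];
}
```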
>> Hardware page walking and plain page tables are overly limiting and
>> inflexible though.
>>
>> [snip]
>
> I see no evidence for that, and plenty of disconfirming
> evidence.
>
How many hardware page-walker implementations:
Support layouts other than an N-level page table?
Support adding per-thread access permissions to pages within a single
address space? (Say, similar to access permissions in a Unix-style
filesystem, or file-access permissions in NTFS?)
...
Or, if one were going to do so (without TLB Miss and ACL Miss
interrupts), how would they do so?...
Or, say, if you wanted to fake 32-bit x86 segmentation on top of the
MMU, how would one do so?
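With a software TLB-miss handler, that last one is at least plausible:
the miss path can apply the base+limit check itself and then map the
resulting linear address. A sketch of the segment step (field names
hypothetical, and eliding the descriptor-table and privilege checks
real x86 does):

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified x86-style segment: base and limit, byte-granular. */
typedef struct {
    uint32_t base;
    uint32_t limit;   /* highest valid offset */
} Segment;

/* Translate a segment-relative offset to a linear address, as a
 * software TLB-miss handler might before filling the soft-TLB.
 * Returns false where real x86 would raise #GP. */
bool seg_translate(const Segment *s, uint32_t off, uint32_t *linear)
{
    if (off > s->limit)
        return false;
    *linear = s->base + off;   /* handler then maps 'linear' as usual */
    return true;
}
```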
...
If you do it in hardware, every possibility also needs to be supported
in hardware, so you either have fewer possibilities, or the hardware
becomes needlessly complex.
It is like the whole x86 TSS thing...
x86 handles this mechanism in hardware, but it doesn't need hardware...
One could instead have the ISR go through an arcane ritual to try to get
all of the CPU registers saved to and restored from memory without
(accidentally) changing the value of any of the registers in the process.
Granted, it is a PITA to figure out how to save and restore all of the
registers with no spare registers to use as scratch-pad, but it is
possible...
Originally, SuperH banked out some of the registers, but I got rid of
this as it was cheaper for the FPGA to not have any banked registers
(albeit slower and more awkward for the ISR handlers).
So, usually the first priority is to get a few of the scratch registers
saved off and freed up so that it can use these to set up the ISR
stack-frame and get all the other registers saved off.
Though, there is a designated area of scratchpad RAM located around
0000_0000C000 .. 0000_0000DFFF (in the physical map) to help with some
of this (also used by the Boot ROM before switching over to external DRAM).
The actual mechanism itself being essentially: Update a few special
registers, change some processor mode flags, and perform a computed
branch relative to a special register.
Unlike either the 8086 or SH-2 mechanism, the ISR mechanism does not
need to access memory.
Ironically, RISC-V went to the other extreme, having effectively 3
copies of the register space (User, Supervisor, Machine). Rather than
just a single register space...
But, this means that a BJX2 core, despite having 64 GPRs (rather than
32), ends up having a smaller register space in practice than RISC-V's
"Privileged ISA" spec (for RV64IMAFD or similar).
...