Unfortunately, the ObjectHeader references can't be turned into cursors. ObjectHeaders appear at the "front" of each durable object type (exception: pages, where they are a side table). Each durable object type lives in a type-specific array. Sometimes more than one, because system physical memory isn't actually contiguous on real machines, and that can't be cleaned up in the virtual map.
Note also that we have two logically distinct kinds of mutations going on to the target object. One is for the purpose of changing the object's metadata (to mark it dirty). That operation requires a "fence" on the object, but we do not want it to invalidate outstanding references for read purposes. The other (which doesn't always arise) is for modifying the object's data. The Rust reference taking mechanism doesn't recognize that these are logically distinct.
At the moment, I'm not seeing a Rust path through this example without resorting to raw pointers. Not because the pointers are invalid, but because the borrow checker's concurrency guards are so thoroughly in opposition to the Coyotos concurrency guards and object graph. The ObjectHeader* pointers appear so pervasively that I think this amounts to abandoning the baked-in Rust concurrency guards for all cross-CPU purposes.
What do others see here in terms of "how to convert to Rust" that I am failing to see?
> Unfortunately, the ObjectHeader references can't be turned into cursors. ObjectHeaders appear at the "front" of each durable object type (exception: pages, where they are a side table). Each durable object type lives in a type-specific array. Sometimes more than one, because system physical memory isn't actually contiguous on real machines, and that can't be cleaned up in the virtual map.

Is this still true on 64-bit systems? I get that it can be a pain in the neck to embed the assumption that the object tables are contiguous into the system, especially with hot-swappable RAM and CXL.
On Thu, Feb 19, 2026 at 3:03 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:

> Note also that we have two logically distinct kinds of mutations going on to the target object. One is for the purpose of changing the object's metadata (to mark it dirty). That operation requires a "fence" on the object, but we do not want it to invalidate outstanding references for read purposes. The other (which doesn't always arise) is for modifying the object's data. The Rust reference taking mechanism doesn't recognize that these are logically distinct.

Distinct in what sense? Do you mean that modifying the metadata “doesn’t count” in some sense? In that case, consider whether you can store the metadata in atomic types or (if atomic types cannot express what you need) UnsafeCell, which indicates that the contents of the cell may be modified (unsafely) even when an & reference to the cell exists, and leaves it up to you to avoid data races.
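Kevin's atomic-metadata suggestion can be sketched as follows. This is a minimal illustration under assumed names (a hypothetical `ObjectHeader` with a single dirty bit; not actual Coyotos code): the metadata mutation goes through a shared reference, so it cannot invalidate outstanding read references.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical header: the "dirty" metadata is an atomic, so marking the
// object dirty needs only a shared reference and does not invalidate any
// outstanding &-references held by readers.
struct ObjectHeader {
    dirty: AtomicBool,
    // ... other metadata ...
}

impl ObjectHeader {
    // Interior mutability: mutation through &self, no &mut required.
    fn mark_dirty(&self) {
        self.dirty.store(true, Ordering::Release);
    }

    fn is_dirty(&self) -> bool {
        self.dirty.load(Ordering::Acquire)
    }
}

fn main() {
    let hdr = ObjectHeader { dirty: AtomicBool::new(false) };
    let reader = &hdr;          // outstanding shared reference
    hdr.mark_dirty();           // metadata mutation through another &hdr
    assert!(reader.is_dirty()); // the reader's reference stayed valid
}
```

The data mutation (the second kind Jonathan describes) would still need a separate mechanism, which is where UnsafeCell or a lock comes in.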
let ro_ref = &target_object;
let mut mut_ref = &mut ro_ref;
> At the moment, I'm not seeing a Rust path through this example without resorting to raw pointers. Not because the pointers are invalid, but because the borrow checker's concurrency guards are so thoroughly in opposition to the Coyotos concurrency guards and object graph. The ObjectHeader* pointers appear so pervasively that I think this amounts to abandoning the baked-in Rust concurrency guards for all cross-CPU purposes.

I would say that, in general, "writing a garbage collector in Rust" typically requires at least some raw pointers and unsafe code — outside of “toy virtual machine” cases where the memory you’re garbage collecting is an explicitly accessed array, of course. For example, in the Rust standard library, Arc provides memory management that the borrow checker does not understand, using raw pointers internally, and yet also cooperates with the borrow checker.
> What do others see here in terms of "how to convert to Rust" that I am failing to see?

It’s hard to say more without more concrete details — actual data structures and function signatures that could be improved.
The OTEntry references here could hypothetically be replaced by cursors, but the offset computation introduced to turn those cursors into array entry references would actually be noticeable at the margin. Computing that offset is a dependent and depended-on computation. Every cycle we lose in the kernel is multiplied everywhere.
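For concreteness, the cursor idea can be sketched like this (hypothetical types; the real OTEntry layout is not shown in this thread). A cursor is essentially an index, and converting it to an entry reference puts a base-plus-scaled-index computation on the dependent load path, which is the cost being weighed above.

```rust
// Hypothetical object-table entry; the real OTEntry holds more than this.
#[derive(Clone, Copy)]
struct OtEntry {
    oid: u64,
}

// A cursor is just an index into the table. Turning it into a reference
// costs base + index * size_of::<OtEntry>() before the entry can be read,
// and that arithmetic sits on the dependent (pointer-chasing) path.
fn entry(table: &[OtEntry], cursor: u32) -> &OtEntry {
    &table[cursor as usize]
}

fn main() {
    let table = [OtEntry { oid: 10 }, OtEntry { oid: 11 }, OtEntry { oid: 12 }];
    let e = entry(&table, 2);
    assert_eq!(e.oid, 12);
}
```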
Also: thank you for the question, because it shows I said something that conveyed a badly wrong impression. It's not that the metadata update to "dirty" doesn't "count". It's that being dirty does not, in and of itself, mean that anything has yet been mutated.

But it definitely counts! If the target object has not been marked dirty, it is an ERROR to have an outstanding mutable reference to that object. By which I mean:

let ro_ref = &target_object;
let mut mut_ref = &mut ro_ref;

is a static error, because the mutable ref cannot safely be taken until we know the target object has been marked dirty. Not "it's immutable to code outside the same crate", but "taking the &mut reference is just not okay at all until the target object is dirty". This may be a use case for a read-only wrapper - I need to go look at that.

Mind you, we didn't actually have that pointer-level permission guard in Coyotos. It's just that &mut T as implemented in Rust is a form of privilege escalation. The fact that the mut_ref binding above is possible tells us that &T does not mean & readonly T.
let mut ro_ref: &T = &target_object;
let mut_ref: &mut &T = &mut ro_ref;
On Fri, 20 Feb 2026 at 09:03, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:

> The OTEntry references here could hypothetically be replaced by cursors, but the offset computation introduced to turn those cursors into array entry references would actually be noticeable at the margin. Computing that offset is a dependent and depended-on computation. Every cycle we lose in the kernel is multiplied everywhere.

I should measure this, but in the spirit of "the processor is hurtling head-long into its next cache miss" I think we have plenty of time waiting for dependent pointer lookups. In the IPC path, for example, we have the chain from the entry cap to the endpoint, to the process it targets, to the object table entry on that process. A little address calculation in there might be completely free.
In Rust, creating an &mut T that points to the same T as a &T is prohibited by safe Rust and undefined behavior in unsafe Rust (with one possible exception around UnsafeCell which is not yet settled). The privilege escalation you describe does not exist.
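The type-level distinction can be seen directly in a tiny example (names here are illustrative): `&mut ro_ref` where `ro_ref: &T` yields a `&mut &T`, a mutable reference to the *binding*. It can re-point the binding at a different `T`, but it can never write through to the underlying `T`, so no `&mut T` aliasing a `&T` is ever created.

```rust
fn main() {
    let a = 1;
    let b = 2;

    let mut ro_ref: &i32 = &a;            // shared reference to a
    let mut_ref: &mut &i32 = &mut ro_ref; // mutable reference to the BINDING

    // We can re-point ro_ref at something else...
    *mut_ref = &b;
    assert_eq!(*ro_ref, 2);

    // ...but we cannot write through to the i32 itself:
    // **mut_ref = 3; // error[E0594]: cannot assign to `**mut_ref`
}
```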
But the idea that the lock is released as a consequence of the guard object going out of scope may be a problem. We work like crazy to avoid unwinding the stack because it is mostly cold cache, so those unwinds would depend on somehow knowing when to release when you reach the "bottom" of the call stack and drop down to assembly code. But the cost of keeping a table of guards that need unwinding is really high - that was our original implementation for PrepLock release in EROS before it occurred to me to just bump the transaction counter.

I suppose one could hack up the Rust runtime to use a modified mutex structure that works the way the Coyotos mutex works. I just haven't gotten deep enough into the bowels of Rust to understand how many parts of the compiler stack think they know something about the internals of a mutex. It's one of those things that's going to either be really easy or really hard.
There is no “Rust runtime” except in the sense of the code that manages specific things like the global allocator and panicking. std::sync::Mutex is not integrated with the compiler at all. You can freely use any mutex implementation you want and Rust-the-language does not mind. If you want to explore uses of mutexes that don't follow the exact std::sync::MutexGuard pattern, I recommend checking out lock_api. Its main goal is to provide a similar Mutex<T> type built on pluggable implementations, but you can also directly use any of the many third-party implementations of its RawMutex types yourself in order to lock and unlock without using RAII-style guards.

The basic primitive behind either Mutex, and anything else which provides mutation behind &, is UnsafeCell (which is special to the compiler); a Mutex<T> is a pair of a raw mutex and an UnsafeCell<T> containing the data, with an API that always checks the raw mutex before granting access to the UnsafeCell.
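To make the "raw lock word plus UnsafeCell" pairing concrete, here is a minimal sketch with explicit lock/unlock rather than an RAII guard. This is illustrative only: a toy spinlock, not how a kernel lock would actually handle contention or preemption.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Sketch: a mutex is "raw lock word + UnsafeCell<T>", with the API checking
// the lock word before touching the cell. No RAII guard; unlock is explicit.
pub struct RawMutex<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// Safe to share across threads as long as the lock discipline is followed.
unsafe impl<T: Send> Sync for RawMutex<T> {}

impl<T> RawMutex<T> {
    pub const fn new(value: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(value) }
    }

    pub fn lock(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    /// # Safety
    /// Caller must hold the lock, and must not let the reference outlive
    /// the locked region.
    pub unsafe fn data_mut(&self) -> &mut T {
        &mut *self.data.get()
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

fn main() {
    let m = RawMutex::new(0u64);
    m.lock();
    unsafe { *m.data_mut() += 1; }
    m.unlock();

    m.lock();
    let v = unsafe { *m.data_mut() };
    m.unlock();
    assert_eq!(v, 1);
}
```

Because unlock is an explicit call rather than a Drop impl, nothing here depends on stack unwinding, which seems closer to what the no-unwind constraint above requires.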
--
You received this message because you are subscribed to the Google Groups "cap-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cap-talk+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/cap-talk/CAAP%3D3QPMkB1WY_PT-eft__pkhVvVBiW%3DmAEsAQL%2BwieyVLR7Ew%40mail.gmail.com.
To view this discussion visit https://groups.google.com/d/msgid/cap-talk/CAK5yZYhVLBM4uyvVmzXDdLqY1WvjfoM8GA4wKJ9CLoZMO%3DSxvQ%40mail.gmail.com.
On Tue, Feb 24, 2026 at 3:40 PM Kevin Reid <kpr...@switchb.org> wrote:

> There is no “Rust runtime” except in the sense of the code that manages specific things like the global allocator and panicking. std::sync::Mutex is not integrated with the compiler at all. You can freely use any mutex implementation you want and Rust-the-language does not mind. If you want to explore uses of mutexes that don't follow the exact std::sync::MutexGuard pattern, I recommend checking out lock_api. Its main goal is to provide a similar Mutex<T> type built on pluggable implementations, but you can also directly use any of the many third-party implementations of its RawMutex types yourself in order to lock and unlock without using RAII-style guards.
>
> The basic primitive behind either Mutex, and anything else which provides mutation behind &, is UnsafeCell (which is special to the compiler); a Mutex<T> is a pair of a raw mutex and an UnsafeCell<T> containing the data, with an API that always checks the raw mutex before granting access to the UnsafeCell.

I'd have to go look at some things, but I suspect we will want the mutex to be a field of the object it guards, also not at the front of it. Which would seem to put the mutex inside the UnsafeCell containing the object that it guards. Given that mutex accesses are atomic, I imagine we can make something unsavory like that work, but it seems like a "hold your nose" sort of design pattern in the eyes of Rustaceans :-)
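One possibly useful observation (a sketch under my own assumptions about the layout, not a claim about what Coyotos must do): because atomics themselves have interior mutability, the lock word can stay an ordinary field in the middle of the object, and only the guarded payload needs to live inside the UnsafeCell. The mutex does not have to go inside the cell it guards. All field names below are hypothetical.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical object layout: identity first, lock word mid-struct,
// and only the mutable payload inside the UnsafeCell.
struct Frame {
    oid: u64,                  // freely readable, never behind the lock
    lock: AtomicBool,          // the guard, an ordinary field
    payload: UnsafeCell<u64>,  // guarded mutable state
}

// Sound only because `payload` is accessed exclusively under `lock`.
unsafe impl Sync for Frame {}

impl Frame {
    fn new(oid: u64) -> Self {
        Frame { oid, lock: AtomicBool::new(false), payload: UnsafeCell::new(0) }
    }

    // Run `f` with the payload locked; toy spin-wait for simplicity.
    fn with_locked(&self, f: impl FnOnce(&mut u64)) {
        while self
            .lock
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // Safety: we hold the lock, so no other mutable access exists.
        unsafe { f(&mut *self.payload.get()) }
        self.lock.store(false, Ordering::Release);
    }
}

fn main() {
    let frame = Frame::new(42);
    frame.with_locked(|p| *p = 7);
    let mut seen = 0;
    frame.with_locked(|p| seen = *p);
    assert_eq!((frame.oid, seen), (42, 7));
}
```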
On Thu, Feb 19, 2026 at 6:26 PM William ML Leslie <william.l...@gmail.com> wrote:

> > Unfortunately, the ObjectHeader references can't be turned into cursors.
>
> Is this still true on 64-bit systems? I get that it can be a pain in the neck to embed the assumption that the object tables are contiguous into the system, especially with hot-swappable RAM and CXL.

I think so, because a 64-bit reference won't fit in the current capability structure and I'm not thrilled about the disk impact of increasing the size of a capability.
The current 64-bit OID values get us to plenty of pages, but the limit of a 32-bit swizzled capability reference to those pages is more restrictive. Not for pages, because we only map those transiently, but for other objects.
If you actually manage to use a large physical memory, I'm guessing you'll end up at something like 8 pages per GPT. If so, then 128GB of physical memory needs (128G/16/4096) * 300 bytes per GPT ~= 630MB for GPT space within the kernel virtual space. So by using 32-bit offsets from KVA we're probably still OK to 256GB of physical RAM. GPT virtual space is contiguous, so we can stretch that to 32TB of physical RAM using an index into that array in the prepared capability. But past that we would need to extend kernel virtual space beyond 4GB or come up with a way to map GPTs transiently.
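For what it's worth, the back-of-envelope numbers above check out. A quick sanity-check using the text's own constants (300 bytes per in-kernel GPT, the 128G/16/4096 factoring):

```rust
fn main() {
    let phys: u64 = 128 << 30;    // 128 GiB of physical memory
    let page: u64 = 4096;         // page size in bytes
    let per_gpt: u64 = 16;        // the /16 factor from the estimate
    let bytes_per_gpt: u64 = 300; // ~300 bytes per in-kernel GPT

    let gpts = phys / page / per_gpt;  // 2_097_152 GPTs
    let total = gpts * bytes_per_gpt;  // bytes of kernel virtual space

    assert_eq!(gpts, 2_097_152);
    assert_eq!(total, 629_145_600);    // ~= 630 MB, matching the estimate
}
```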
Transient GPTs seem tricky because of the mutual dependency tracking between GPTs and page tables. Part of the answer is that I need to implement large pages, which we need in any case.
The root problem is the byte ratio of GPTs to pages mapped, which is currently at 300 : 65536. And I haven't considered the depend table space yet. So it would be really nice to shrink the capability size rather than grow it, but I'm not seeing a path to get four bytes out of the capability format right now. Oh. Hmm. Maybe I do, but it would need a complete re-think of how OIDs work on the store, and also a re-think of the OTEntry trick.
On Sun, 1 Mar 2026 at 08:34, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:

> If you actually manage to use a large physical memory, I'm guessing you'll end up at something like 8 pages per GPT. If so, then 128GB of physical memory needs (128G/16/4096) * 300 bytes per GPT ~= 630MB for GPT space within the kernel virtual space. So by using 32-bit offsets from KVA we're probably still OK to 256GB of physical RAM. GPT virtual space is contiguous, so we can stretch that to 32TB of physical RAM using an index into that array in the prepared capability. But past that we would need to extend kernel virtual space beyond 4GB or come up with a way to map GPTs transiently.

+1 for more kernel virtual space on 64-bit systems.
FWIW there is 256MiB of kernel virtual space on 32-bit. The rest of the 4GiB is the current process.
Yes. I have some hacks, but they don't work with the transmap or when huge pages get aged out. Thinking I'll do different OID ranges for regions of different sizes, like 6.2 did for device ranges.
> The root problem is the byte ratio of GPTs to pages mapped, which is currently at 300 : 65536. And I haven't considered the depend table space yet. So it would be really nice to shrink the capability size rather than grow it, but I'm not seeing a path to get four bytes out of the capability format right now. Oh. Hmm. Maybe I do, but it would need a complete re-think of how OIDs work on the store, and also a re-think of the OTEntry trick.

Asymptotic 0.5% is not so scary, but we also need to account for the RevMap and Depends infrastructure, both not persistent, and on old crusty CPU architectures you also need the hardware mapping structures. All I can hope is that not all GPTs need to be resident once we have the hardware Mappings built.
At current DRAM provisioning levels (e.g. my laptop has 12 GB), we seem to be within sight of exhausting a 4GB kernel virtual region. The real problem is that this is true because we need that much physical RAM for metadata. So I think we need to think about how to thin the current data structures. The corresponding kernel space overheads in Linux aren't free, but the Coyotos overheads are quite a bit higher.
The reason I held to 256MiB was that this was compatible with then-existing application-level address space assumptions for Linux applications. I figured binary compatibility might be helpful if we ever got around to building a Linux subsystem. On 64-bit systems that same argument gives us the upper half. If we manage to arrive at a memory overhead rate of 1:1, I'll be perversely impressed.
New crusty CPU architectures also have hardware mapping tables. :-)
If you take the GPTs out of residence you no longer know what's in them, so I think you have to whack the caches that depend on their state. Though if you do take them out, you pretty well know they aren't changing, so maybe that's worth re-thinking. Strictly speaking it's not the residence of the GPT that invalidates the cached information, but the modification of the GPT that does so.
On Mon, 2 Mar 2026 at 04:23, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:

> New crusty CPU architectures also have hardware mapping tables. :-)

RISC-V: Boring enough to succeed (TM :-)
> If you take the GPTs out of residence you no longer know what's in them, so I think you have to whack the caches that depend on their state. Though if you do take them out, you pretty well know they aren't changing, so maybe that's worth re-thinking. Strictly speaking it's not the residence of the GPT that invalidates the cached information, but the modification of the GPT that does so.

Right now if you remove a reference to a GPT from another, we walk the entire affected tree looking for memory handlers and wake up everything waiting on them. I don't know that this is strictly necessary, but the bigger problem with that is its time complexity, so I think it was always something we wanted to address.
> Nice call. We seem to be accumulating some interesting architectural evolution questions here.

The reference implementation is something you built with zero help and a looming deadline, and yet you can build real systems on it. I always assumed there were corners we were going to want to revisit.
> > If you take the GPTs out of residence you no longer know what's in them, so I think you have to whack the caches that depend on their state. Though if you do take them out, you pretty well know they aren't changing, so maybe that's worth re-thinking. Strictly speaking it's not the residence of the GPT that invalidates the cached information, but the modification of the GPT that does so.
>
> Right now if you remove a reference to a GPT from another, we walk the entire affected tree looking for memory handlers and wake up everything waiting on them. I don't know that this is strictly necessary, but the bigger problem with that is its time complexity, so I think it was always something we wanted to address.

Hmm. Can you identify where that happens? I'd like to refresh myself on that code and figure out why it is doing that. There are odd correctness conditions all over the place, but that sounds like a bug.
> > Nice call. We seem to be accumulating some interesting architectural evolution questions here.
>
> The reference implementation is something you built with zero help and a looming deadline, and yet you can build real systems on it. I always assumed there were corners we were going to want to revisit.

I actually had quite a lot of help from Jonathan Adams, who is superb.
Looking back now, I think I was conflating two code paths. The one from GPT.setSlot goes via depend_invalidate_slot, which calls depend_entry_invalidate on anything that depends on it. But that's just clearing entries in the page table. The other is from obhdr_invalidate() -> memhdr_invalidate_cached_state() -> depend_invalidate() on each slot -> depend_entry_invalidate() -> pte_invalidate().

So we _only_ wake up anything sending to a memory handler if the GPT handler slot specifically is cleared. cap_handlerBeingOverwritten is called in two different places but neither of their callers visit any more GPTs.