Coyotos update/question: pointers in cyclic graphs vs. safe Rust

15 views
Skip to first unread message

Jonathan S. Shapiro

unread,
Feb 19, 2026, 6:03:30 PM (yesterday) Feb 19
to cap-talk
As I said in my note from an hour ago, I've been digging pretty thoroughly into some of the low-level coyotos data structures and how they get translated into Rust. I've hit an issue that I don't know how to deal with, so I want to lay it out and see what advice people may have.

Let me preface this by saying that I've now been working with the borrow checker long enough to pretty much have my head around it. I definitely see benefits for concurrent applications. I also see the benefits of the linear typing that emerges as an (intended) side effect. That said, I now feel a fair bit of confidence that the Rust borrow checker is categorically incompatible with the way Coyotos guards concurrency, and I don't see a path from where Coyotos is to a performant rewrite in safe Rust.

Before getting into the crucial example, a quick reminder: the Coyotos kernel doesn't ever want to deallocate memory in the sense of free(). Literally all of the references being thrown around are &'static T, so borrowing (or, more precisely, dropping) doesn't actually make any sense for them for memory release purposes. The remaining use is to implement the &mut T vs &T exclusion for concurrency purposes, which, as I'm about to describe, is also a problem for the Coyotos kernel.

To illustrate, let's talk about how Coyotos builds [explicitly guarded] cyclic reference graphs pervasively. The most obvious example of this is the troika of Capability, ObjectHeader, and OTEntry:

    -----> OtEntry <--------
   /          |             \
Cap           |               Cap2, ... CapN
   \          v             /
    ----->ObjectHeader<-----

The purpose of the OTEntry is to serve as a canary when target objects are removed from memory. It allows us to unswizzle or invalidate capabilities that didn't get the memo about the object's removal and GC them back to unswizzled form later. Note that there are no back-pointers from objects to their capabilities. That's a change from EROS and KeyKOS, and it took a lot of re-work (and was comprehensively dictated by cache impacts).

There are both invariants and mutexes that keep this troika concurrency-safe. The first is that the Cap necessarily lives inside an object that we already hold a mutex over. The other has to do with the checks required to extract the ObjectHeader*, which can only be distracted by calling (in Rust terms) a method on the capability that performs the following checks on the capability:
  1. Capability is swizzled and valid (according to OTEntry). Lock the target object for the duration of the system call, fetch the ObjectHeader* from the capability, and proceed.
  2. Capability is swizzled but not valid (stale - would eventually be GC'd).
    1. If the OTEntry says target was destroyed, unswizzle to void capability and re-start the system call.
    2. Otherwise unswizzle the capability to its valid-but-unswizzled form and re-start the system call.
  3. The capability is unswizzled. Look up the target object by (type, OID). Sub-cases:
    1. Not found in memory. Initiate object fetch, queue on completion, and abandon current system call.
    2. Allocation counts do not match. Rewrite capability as void and restart system call.
    3. Allocation counts match. Rewrite capability in swizzled form and proceed.
In Rust terms, the sequence above is (effectively) the implementation of a borrow on the object named by a capability.

Side notes:
  1. Changing a capability from swizzled to unswizzled (or the other way) requires marking its containing object dirty, which can trigger a system call re-start as well. That's OK. Actually, it's exactly what we want in this circumstance. Doesn't matter that the borrow hasn't (yet) succeeded, because we'll abandon the control flow if we can't mark the container dirty.
  2. At the end of this operation, we haven't established that it's OK to write to the target object. That requires a further step that exists mainly to ensure the target object is marked dirty before we write it.
  3. "Lock" means "to the currently executing CPU and system call." It's completely OK to do that more than once on the same object.

The OTEntry references here could hypothetically be replaced by cursors, but the offset computation introduced to turn those cursors into array entry references would actually be noticeable at the margin. Computing that offset is a dependent and depended-on computation. Every cycle we lose in the kernel is multiplied everywhere.

Unfortunately, the ObjectHeader references can't be turned into cursors. ObjectHeaders appear at the "front" of each durable object type (exception: pages, where they are a side table). Each durable object type lives in a type specific array. Sometimes more than one, because system physical memory isn't actually contiguous on real machines, and that can't be cleaned up in the virtual map. This means that translating a cursor into a reference is both type-dependent and complicated. And not really necessary from a safety perspective, since the type field of the capability serves as a type tag telling us what the containing object type is for this particular ObjectHeader structure and we already have the concurrency guard logic that I've described above. There are a (very) few places where we use this tag to justify a type up-cast to the containing type.

Increasing the size of these references is not an option. Adding a word to every capability because of a fat reference could very well turn into terabytes of added storage requirement on the disk.


I can see a case where we have the prepare() call return a target object &T reference (at least in the case where it returns). Taking mutable reference to the target requires some work that the Rust compiler doesn't know about, because we have to mark the target object dirty along the way. I'm actually not sure if it's possible to intercede in the middle of a reference's mutability upgrade in Rust. Haven't looked yet.

Note that we can have multiple capabilities referencing the same object, and in perverse cases we can end up with two or more being passed during the same system call. This is not an error.

Note also that we have two logically distinct kinds of mutations going on to the target object. One is for the purpose of changing the object's metadata (to mark it dirty). That operation requires a "fence" on the object, but we do not want it to invalidate outstanding references for read purposes. The other (which doesn't always arise) is for modifying the object's data. The Rust reference taking mechanism doesn't recognize that these are logically distinct.


At the moment, I'm not seeing a Rust path through this example without resorting to raw pointers. Not because the pointers are invalid, but because the borrow checker's concurrency guards are so thoroughly in opposition to the Coyotos concurrency guards and object graph. The ObjectHeader* pointers appear so pervasively that I think this amounts to abandoning the baked-in Rust concurrency guards for all cross-CPU purposes.


I'm totally willing to keep going for a bit to find out what the impact really is. Even if it turns out to be a negative result, it's an interesting negative result.


What do others see here in terms of "how to convert to Rust" that I am failing to see?


Jonathan

William ML Leslie

unread,
Feb 19, 2026, 9:26:51 PM (22 hours ago) Feb 19
to cap-...@googlegroups.com
On Fri, 20 Feb 2026 at 09:03, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
Unfortunately, the ObjectHeader references can't be turned into cursors. ObjectHeaders appear at the "front" of each durable object type (exception: pages, where they are a side table). Each durable object type lives in a type specific array. Sometimes more than one, because system physical memory isn't actually contiguous on real machines, and that can't be cleaned up in the virtual map.


Is this still true on 64-bit systems?  I get that it can be a pain in the neck to embed the assumption that the object tables are contiguous into the system, especially with hot-swappable RAM and CXL.
 
--
William ML Leslie
A tool for making incorrect guesses and generating large volumes of plausible-looking nonsense.  Who is this very useful tool for?

Kevin Reid

unread,
Feb 19, 2026, 10:19:42 PM (21 hours ago) Feb 19
to cap-...@googlegroups.com
On Thu, Feb 19, 2026 at 3:03 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
Note also that we have two logically distinct kinds of mutations going on to the target object. One is for the purpose of changing the object's metadata (to mark it dirty). That operation requires a "fence" on the object, but we do not want it to invalidate outstanding references for read purposes. The other (which doesn't always arise) is for modifying the object's data. The Rust reference taking mechanism doesn't recognize that these are logically distinct.

Distinct in what sense? Do you mean that modifying the metadata “doesn’t count” in some sense? In that case, consider whether you can store the metadata in atomic types or (if atomic types cannot express what you need) UnsafeCell, which indicates that the contents of the cell may be modified (unsafely) even when an & reference to the cell exists, and leaves it up to you to avoid data races.

At the moment, I'm not seeing a Rust path through this example without resorting to raw pointers. Not because the pointers are invalid, but because the borrow checker's concurrency guards are so thoroughly in opposition to the Coyotos concurrency guards and object graph. The ObjectHeader* pointers appear so pervasively that I think this amounts to abandoning the baked-in Rust concurrency guards for all cross-CPU purposes.

I would say that, in general, "writing a garbage collector in Rust" typically requires at least some raw pointers and unsafe code — outside of “toy virtual machine” cases where the memory you’re garbage collecting is an explicitly accessed array, of course. For example, in the Rust standard library, Arc provides memory management that the borrow checker does not understand, using raw pointers internally, and yet also cooperates with the borrow checker.
 
What do others see here in terms of "how to convert to Rust" that I am failing to see?

It’s hard to say more without more concrete details — actual data structures and function signatures that could be improved.

Jonathan S. Shapiro

unread,
Feb 19, 2026, 10:57:58 PM (20 hours ago) Feb 19
to cap-...@googlegroups.com
On Thu, Feb 19, 2026 at 6:26 PM William ML Leslie <william.l...@gmail.com> wrote:
On Fri, 20 Feb 2026 at 09:03, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
Unfortunately, the ObjectHeader references can't be turned into cursors. ObjectHeaders appear at the "front" of each durable object type (exception: pages, where they are a side table). Each durable object type lives in a type specific array. Sometimes more than one, because system physical memory isn't actually contiguous on real machines, and that can't be cleaned up in the virtual map.


Is this still true on 64-bit systems?  I get that it can be a pain in the neck to embed the assumption that the object tables are contiguous into the system, especially with hot-swappable RAM and CXL.

Sadly yes, it's still true. Regardless, a lot of deeply embedded systems will continue to be 32-bit for quite some time. There's no compelling reason to make a smart light switch 64-bit.


Jonathan

Jonathan S. Shapiro

unread,
Feb 19, 2026, 11:20:22 PM (20 hours ago) Feb 19
to cap-...@googlegroups.com
On Thu, Feb 19, 2026 at 7:19 PM Kevin Reid <kpr...@switchb.org> wrote:
On Thu, Feb 19, 2026 at 3:03 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
Note also that we have two logically distinct kinds of mutations going on to the target object. One is for the purpose of changing the object's metadata (to mark it dirty). That operation requires a "fence" on the object, but we do not want it to invalidate outstanding references for read purposes. The other (which doesn't always arise) is for modifying the object's data. The Rust reference taking mechanism doesn't recognize that these are logically distinct.

Distinct in what sense? Do you mean that modifying the metadata “doesn’t count” in some sense? In that case, consider whether you can store the metadata in atomic types or (if atomic types cannot express what you need) UnsafeCell, which indicates that the contents of the cell may be modified (unsafely) even when an & reference to the cell exists, and leaves it up to you to avoid data races.

That's a very useful thing to point out. Thank you. Obviously I'm wandering around in a part of Rust that's not the "standard problem", so thank you!

Also: thank you for the question, because it shows I said something that conveyed a badly wrong impression. It's not that the metadata update to "dirty" doesn't "count". It's that being dirty does not, in and of itself, mean that anything has yet been mutated.

But it definitely counts! If the target object has not been marked dirty, it is an ERROR to have an outstanding mutable reference to that object. By which I mean:

let ro_ref = &target_object;
let mut mut_ref = &mut ro_ref;

is a static error, because the mutable ref cannot safely be taken until we know the target object has been marked dirty. Not "it's immutable to code outside the same crate", but "taking the &mut reference is just not okay at all until the target object is dirty". This may be a use case for a read-only wrapper - I need to go look at that.

Mind you, we didn't actually have that pointer-level permission guard in Coyotos. It's just that &mut T as implemented in Rust is a form of privilege escalation. The fact that the mut_ref binding above is possible tells us that &T does not mean & readonly T.
 
At the moment, I'm not seeing a Rust path through this example without resorting to raw pointers. Not because the pointers are invalid, but because the borrow checker's concurrency guards are so thoroughly in opposition to the Coyotos concurrency guards and object graph. The ObjectHeader* pointers appear so pervasively that I think this amounts to abandoning the baked-in Rust concurrency guards for all cross-CPU purposes.

I would say that, in general, "writing a garbage collector in Rust" typically requires at least some raw pointers and unsafe code — outside of “toy virtual machine” cases where the memory you’re garbage collecting is an explicitly accessed array, of course. For example, in the Rust standard library, Arc provides memory management that the borrow checker does not understand, using raw pointers internally, and yet also cooperates with the borrow checker.

Because it's all pre-allocated arrays with a free list, the garbage collector won't actually be the problem. The big mess is going to be with the capability troika that I discussed.
 
 
What do others see here in terms of "how to convert to Rust" that I am failing to see?

It’s hard to say more without more concrete details — actual data structures and function signatures that could be improved.

Well, that's certainly fair. Unfortunately every last one of those data structures contains capabilities first.


Jonathan 

William ML Leslie

unread,
2:28 AM (17 hours ago) 2:28 AM
to cap-...@googlegroups.com
On Fri, 20 Feb 2026 at 09:03, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
The OTEntry references here could hypothetically be replaced by cursors, but the offset computation introduced to turn those cursors into array entry references would actually be noticeable at the margin. Computing that offset is a dependent and depended-on computation. Every cycle we lose in the kernel is multiplied everywhere.

I should measure this, but in the spirit of "the processor is hurtling head-long into its next cache miss" I think we have plenty of time waiting for dependent pointer lookups.  In the IPC path, for example, we have the chain from the entry cap to the endpoint, to the process it targets, to the object table entry on that process.  A little address calculation in there might be completely free.

Matt Rice

unread,
11:45 AM (7 hours ago) 11:45 AM
to cap-...@googlegroups.com
On Thu, Feb 19, 2026 at 11:03 PM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> Before getting into the crucial example, a quick reminder: the Coyotos kernel doesn't ever want to deallocate memory in the sense of free().
>

One of the things this reminds me of are the various arenas which
don't deallocate until the arena is deallocated
which seems slightly more similar to the coyotos usage than the usual
scope based per allocation dropping mechanism. But as Kevin said, it's
pretty hard without a pesky level of detail.

Kevin Reid

unread,
11:46 AM (7 hours ago) 11:46 AM
to cap-...@googlegroups.com
On Thu, Feb 19, 2026 at 8:20 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
Also: thank you for the question, because it shows I said something that conveyed a badly wrong impression. It's not that the metadata update to "dirty" doesn't "count". It's that being dirty does not, in and of itself, mean that anything has yet been mutated.

But it definitely counts! If the target object has not been marked dirty, it is an ERROR to have an outstanding mutable reference to that object. By which I mean:

let ro_ref = &target_object;
let mut mut_ref = &mut ro_ref;

is a static error, because the mutable ref cannot safely be taken until we know the target object has been marked dirty. Not "it's immutable to code outside the same crate", but "taking the &mut reference is just not okay at all until the target object is dirty". This may be a use case for a read-only wrapper - I need to go look at that.

Mind you, we didn't actually have that pointer-level permission guard in Coyotos. It's just that &mut T as implemented in Rust is a form of privilege escalation. The fact that the mut_ref binding above is possible tells us that &T does not mean & readonly T.

The Rust code you have quoted does not do what you say it does. It doesn’t compile, but supposing we fix the mut annotations so it does, and add type annotations for illustration, we get:
 
let mut ro_ref: &T = &target_object;
let mut_ref: &mut &T = &mut ro_ref;

That is, mut_ref does not allow mutation of *ro_ref. Rather, it allows mutation of ro_ref, i.e. changing ro_ref to point to something else.

In Rust, creating an &mut T that points to the same T as a &T is prohibited by safe Rust and undefined behavior in unsafe Rust (with one possible exception around UnsafeCell which is not yet settled). The privilege escalation you describe does not exist.
Reply all
Reply to author
Forward
0 new messages