Opinions sought: Capability OS Book


Jonathan Shapiro

Aug 25, 2025, 9:09:17 PM
to friam
One of the things Norm and I were contemplating in 1992 was a book that would talk through how KeyKOS/EROS worked, more or less from the ground up. Listening to the tapes as I digitized them has rekindled my interest in doing this, but I'm wondering at this point whether either of those systems should be the basis.

A number of changes were made on the way from KeyKOS to EROS. Some were basically clean-ups and terminology updates. Initial program load was completely re-worked. A few very dramatic speedups were identified through hardcore performance work. A number of experiments were done that (at least to my mind) validated some of the concerns I'd had about KeyKOS inter-process communication and some of its scheduling implications. And there was cross-fertilization in the mid 2000s between the Coyotos work and the L4 community. A log-structured checkpoint system was designed, but never implemented.

Many of the insights from those efforts got unleashed in Coyotos, where major changes to the system architecture were introduced:
  • Coyotos fully supports symmetric multiprocessing - one of those things that sounds simple until you actually do it. :-) Doing this drove us to rework most of the internals of the prior systems, and in consequence re-think a bunch of the correctness invariants.
  • The persistence architecture was reworked to allow more than two object types. This greatly simplified the rest of the architecture.
  • Processes became first class, retiring the "three nodes make a process" pun and its associated complexity.
  • Memory mapping objects became first class as well, borrowing ideas from Liedtke's guarded page tables (there's a sketch of the idea just after this list). Holy Hannah did the mapping logic and its invariants get simpler!
  • Capability pages were added. I'd toyed with them in EROS but hadn't done anything serious with them.
  • The "single instruction un-pin" trick from EROS was recycled for various other things, notably SMP lock release.
  • IPC receive was revised to use scheduler activations, making it electively non-blocking and enabling first-class user-mode threads.
  • With the exception of the boot console and the primary interrupt handling logic, driver and device support was migrated to non-kernel code.
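To give a flavor of the guarded-page-table idea mentioned above, here is a rough sketch. The names and sizes are invented for illustration, not the actual Coyotos declarations: each mapping node spans a power-of-two region, translation peels address bits off to select a slot, and a slot holds either a page capability or a further mapping node.

  #include <stdint.h>

  #define MAP_SLOTS 16u                   /* slots per mapping node (made up) */

  typedef struct capability capability_t; /* opaque for this sketch */

  typedef struct mapping_node {
    uint8_t       l2v;                    /* log2(bytes spanned by each slot) */
    capability_t *slot[MAP_SLOTS];        /* page caps or further mapping nodes */
  } mapping_node_t;

  /* Select the slot an address falls into at this level of the tree. A real
   * guarded page table also carries a "guard": address bits above this
   * node's span that must match before the slot lookup is valid. */
  static inline unsigned mapping_slot(const mapping_node_t *m, uint64_t addr) {
    return (unsigned)((addr >> m->l2v) & (MAP_SLOTS - 1u));
  }

The point of making the mapping object first class is roughly that the invariants become local to this one structure, rather than rules about which node slots are secretly playing which role.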
Jonathan Adams and I built implementations for both i386 (SMP) and Coldfire (soft-managed TLB).

Which leaves me a bit torn. Hearing Norm's voice again makes me want to finish the book we never quite started, capturing the path from GNOSIS to Coyotos. If I don't do that, I suspect nobody will. The problem is: it triples or quadruples the scale of the effort and would probably never be completed.

The other approach would be to do ports of Coyotos to amd64 and ARMv8 or ARMv9 so that it runs on modern hardware, and document how Coyotos works, with occasional digressions on how this or that mechanism evolved from the originals. I guess my thought is that it's better to have a solid description of something that actually runs.

What's the sense of the community on this?


Thanks!


Jonathan


Kris Kowal

Aug 25, 2025, 10:06:45 PM
to friam
I feel a much smaller endeavor would be of great value: a “how to design an object capability OS” that lays out the common architectural components in the abstract, then identifies the concrete design tensions and rules weakly one way or the other, or identifies concrete examples and their consequences. The balance on some of those design tensions undoubtedly shifts as the compute-to-memory-to-storage economics change, or with the appearance of hypervisors on commodity hardware.

I did a similar exercise after absorbing what I could from MarkM and Tyler’s work on Waterken’s Q, which I think may’ve been timely for the advent of promises in JavaScript. https://kriskowal.com/promise-design/. Before we could arrive at Promises as part of the language, we had to pass through a period where everybody was making their own promise libraries, as an exercise in growing understanding. Having this primer helped herd folks toward a design that looked much farther ahead.

We may be at a similar moment for object capability operating systems, where a critical mass of folks who understand the architecture well enough to go off and recapitulate the mistakes might emerge, and at the end of this epoch, we’ll have a few mainstream open source options.

Kris Kowal

Jonathan S. Shapiro

Aug 25, 2025, 10:50:10 PM
to fr...@googlegroups.com
On Mon, Aug 25, 2025 at 7:06 PM Kris Kowal <cowber...@gmail.com> wrote:
I feel a much smaller endeavor would be of great value: a “how to design an object capability OS” that lays out the common architectural components in the abstract, then identifies the concrete design tensions and rules weakly one way or the other, or identifies concrete examples and the consequences. The balance on some of those design tensions undoubtedly shift as the compute to memory to storage economics change, or just the appearance of hypervisors on commodity hardware.

First, thank you. I think you're making a valuable suggestion.

Up to a point, I don't think I've found that increased memory changes the principles. What it does do is reduce the need for contortions to make everything fit, and that can be greatly simplifying.

What really does change things is the memory coherency approach and the CPU clustering approach. Coherency for obvious reasons. "Symmetric" multiprocessors today aren't all that symmetric. First off, there's the cores vs threads distinction, but at some point you have enough cores that the core-to-core communication layer develops hierarchical structure that in turn makes for non-uniform latencies, which has implications for both compute placement and data placement throughout a system. At some point this requires that the notions of processor and processor cluster are exposed to the scheduling subsystem in a first-class way.

Putting that another way, there comes a point where you can no longer abstract something like that away and pretend your solution is still adequate.
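To make that concrete, here is roughly the shape of what I mean by exposing the hierarchy to the scheduler. This is an illustrative sketch only; the names are invented and none of it is lifted from Coyotos:

  #include <stddef.h>
  #include <stdint.h>

  /* A node in the processor topology: hardware thread -> core -> cluster
   * -> socket -> machine. Higher level means farther apart. */
  struct sched_domain {
    struct sched_domain  *parent;        /* NULL at the root */
    struct sched_domain **children;
    size_t                nchildren;
    uint32_t              level;         /* 0 = hardware thread, increasing upward */
    uint64_t              approx_hop_ns; /* rough cross-domain latency at this level */
  };

  /* Crude distance metric: the level of the nearest common ancestor. A
   * placement policy can use this to keep communicating processes, and the
   * data they touch, close together. */
  static uint32_t sched_distance(const struct sched_domain *a,
                                 const struct sched_domain *b) {
    while (a != b) {
      if (a->level <= b->level) a = a->parent;
      else                      b = b->parent;
    }
    return a->level;
  }

The point is not this particular structure; it's that once latencies are non-uniform, something like it has to be visible to the scheduler and the placement policy, rather than hidden behind a flat "N identical CPUs" abstraction.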

We may be at a similar moment for object capability operating systems, where a critical mass of folks who understand the architecture well enough to go off and recapitulate the mistakes might emerge, and at the end of this epoch, we’ll have a few mainstream open source options.

That's an interesting statement. Do you have any particular groups in mind? The CHERI work looks really promising, but I suspect people will be enhancing L4 variants for some time to come. And it's interesting to me that some of the architectural principles selected for L4 are so different from the ones in KeyKOS, EROS, and Coyotos.


Jonathan

Bakul Shah

Aug 25, 2025, 11:33:42 PM
to fr...@googlegroups.com
To me a historical account would be far more interesting & valuable. What changes were made and why, options considered, what was planned (but not implemented), what you think of those decisions in hindsight, etc. 

Bakul Shah


Matt Rice

Aug 26, 2025, 12:56:15 AM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 2:50 AM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> On Mon, Aug 25, 2025 at 7:06 PM Kris Kowal <cowber...@gmail.com> wrote:
>>
>> I feel a much smaller endeavor would be of great value: a “how to design an object capability OS” that lays out the common architectural components in the abstract, then identifies the concrete design tensions and rules weakly one way or the other, or identifies concrete examples and the consequences. The balance on some of those design tensions undoubtedly shift as the compute to memory to storage economics change, or just the appearance of hypervisors on commodity hardware.
>
>
> First, thank you. I think you're making a valuable suggestion.
>
> Up to a point, I don't think I've found that increased memory changes the principles. What it does do is reduce the need for contortions to make everything fit, and that can be greatly simplifying.
>
> What really does change things is the memory coherency approach and the CPU clustering approach. Coherency for obvious reasons. "Symmetric" multiprocessors today aren't all that symmetric. First off, there's the cores vs threads distinction, but at some point you have enough cores that the core-to-core communication layer develops hierarchical structure that in turn makes for non-uniform latencies, which has implications for both compute placement and data placement throughout a system. At some point this requires that the notions of processor and processor cluster are exposed to the scheduling subsystem in a first-class way.
>
> Putting that another way, there comes a point where you can no longer abstract something like that away and pretend your solution is still adequate.

*shrug* A lot of the seL4 proofs are built on an abstract model written in Haskell, which is then related to the kernel implemented on hardware that people actually run. I feel like they have shown there is a certain amount of room for this abstract-model approach to work in practice. But then again I'm fond of "literate formal verification", so of course I'd love to see an abstract model that was useful not only in book form, but for proof purposes on a substrate more "ideal" than the hardware of today.

At the very least I feel like there are a *lot* of details (both things that need to be explained and properties that can be proven) for which an implementation on metal isn't strictly necessary, and for those it may merely serve to complicate things. For example, I don't even know where to start proving process confinement on a hardware model, whereas I can think of a few approaches to proving it on an abstract model. It may be that, as with the seL4 abstract model and the kernel implementation on actual hardware, the two approaches turn out to be complementary in the book case as well... I don't know.

>> We may be at a similar moment for object capability operating systems, where a critical mass of folks who understand the architecture well enough to go off and recapitulate the mistakes might emerge, and at the end of this epoch, we’ll have a few mainstream open source options.
>
>
> That's an interesting statement. Do you have any particular groups in mind? The CHERI work looks really promising, but I suspect people will be enhancing L4 variants for some time to come. And it's interesting to me that some of the architectural principals selected for L4 are so different from the ones in KeyKOS, EROS, and Coyotos.
>
>
> Jonathan
>

Jonathan S. Shapiro

Aug 26, 2025, 2:04:03 AM
to fr...@googlegroups.com
On Mon, Aug 25, 2025 at 8:33 PM 'Bakul Shah' via friam <fr...@googlegroups.com> wrote:
To me a historical account would be far more interesting & valuable. What changes were made and why, options considered, what was planned (but not implemented), what you think of those decisions in hindsight, etc. 

Bakul Shah

Thank you, Bakul.

Here is an example that may help to frame the challenge.

One of the primordial decisions in GNOSIS and KeyKOS was that the universe consisted of nodes (which hold capabilities) and pages (which hold data). Every system construct in KeyKOS is either primitive (you don't need a node or a page to halt the system), or is constructed as an arrangement of nodes and pages. Ultimately, the motivation for this was that the disk layout was partitioned into node spaces and page spaces at installation time and remained unchanged thereafter. In the 1970s, 100MB was a very big disk drive, so every megabyte wasted was significant and every new object type carried a risk that the balance of object types in the partition would be wrong. Because of this, processes and memory maps in GNOSIS/KeyKOS/EROS were a "pun" on nodes. A significant amount of code and complexity existed to preserve and guard the invariants that support that pun.
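To make the pun concrete, here is the flavor of it. The sizes and field names below are invented for illustration, not the actual GNOSIS/KeyKOS/EROS declarations:

  #include <stdint.h>

  #define NODE_SLOTS 16
  #define PAGE_BYTES 4096

  typedef struct capability { uint64_t type, oid; } capability_t;  /* stand-in */

  /* The only two persistent object types. */
  typedef struct node { capability_t slot[NODE_SLOTS]; } node_t;   /* holds capabilities */
  typedef struct page { uint8_t data[PAGE_BYTES]; } page_t;        /* holds data */

  /* The "pun": a process is not a kernel object, it is a convention laid
   * over several nodes - one interpreted as the register set, one as the
   * capability "registers", one as the root of the address space. The
   * kernel carries a surprising amount of code to check and preserve the
   * invariants that keep this arrangement behaving like a process. */
  typedef struct process_view {
    node_t *general_regs;   /* register values encoded into node slots */
    node_t *cap_regs;       /* capability "registers" */
    node_t *address_space;  /* root of the memory map, itself built from nodes */
  } process_view_t;

In Coyotos the process and the mapping structures are first-class kernel objects, so both the pun and the invariant-guarding code go away.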

At least in the US, I can buy a 4TB SSD today for a very small fraction of the cost of that 1970s hard drive. Completely wasting 100MB of disk space to support a simpler set of on-disk and in-memory data structures is a no-brainer.

If the goal is for people to understand the result, maybe it's better to avoid getting tangled up in the complexity of constructions that are no longer relevant 30 years later. I'm not saying that as a "this is it" position, but as food for thought.

And holy cr*p, the notion that we might even talk about continued 30 year relevance in the field of computing is kind of mind blowing!

I also think this depends on what the existing papers describe. I wrote The KeyKOS Nanokernel Architecture paper 33 years ago. I should probably have a look at it to refresh my memory about what it says. :-)


Jonathan 

Jonathan S. Shapiro

Aug 26, 2025, 2:24:54 AM
to fr...@googlegroups.com
On Mon, Aug 25, 2025 at 9:56 PM Matt Rice <rat...@gmail.com> wrote:
... a lot of seL4 proofs are built on an abstract model written in

haskell, which then has a relation between the model, and the kernel
implemented on hardware that people actually run.

Yes. The most important question is whether the model faithfully represents the behavior of the system. In my subjective and biased opinion, the correspondence of Sam Weber's model is pretty direct. The correspondence of Scott Doerrie's model in his mechanical verification would be more obvious if I could browbeat him into publishing his work in something more contained than a thesis. But I have to say that the correspondence that Gerwin Klein established for OKL4 is very damned good. Once you have that, there's a problem of stating the goal correctly. That was perhaps simpler for EROS than it was for OKL4.
 
I feel like they have shown there is a certain amount of room for this
abstract model approach to work in practice.  But then again i'm fond
of "literate formal verification", so of course I'd love to see
an abstract model that was useful not only for a book form, but for
proof purposes on a substrate more "ideal" than the hardware of today.

I think it's fair to say that abstraction in the model is one of the major risks in verification. If you don't abstract, verification becomes intractable. If you do abstract, the scope of the verification impact is broader, but you then need to show that your concrete system falls within the abstracted model. One of the beauties of Sam Weber's model is that it's so abstract that it covers almost all imaginable cases that work.
 
At the very least I feel like there are a *lot* of details and (both
things that need to be explained, and properties that can be proven)
for which an implementation in metal isn't totally necessary...

I agree. But I'm not (for now) thinking about a text covering verification. I'm thinking about a text that covers (a) how an ocap operating system is structured as an abstract machine related to a concrete [hardware] machine, (b) what the kernel invariants are to ensure correctness and how to maintain them, and (c) how to think about building a system on top of this.

But I'm thrilled to see you arguing about documenting the verification. When I started this, the prevailing wisdom was that the verification was not, in principle, possible, because "capability systems cannot enforce the confinement property" (and by implication other forms of isolation).
 
for those things may merely serve to complicate things. Like I don't
even know where to start proving process confinement on a hardware
model....

Agreed, though I could set forth some fairly clear conditions that simplify that. But it's not strictly necessary. The other approach is to show that there is a bi-directional correspondence between the hardware process and the modelled process, which is what most of the kernel invariants are about.

That said, I don't want to get overly hung up in verification in this text. I'm more interested in putting together something that would help a newcomer think about building such a system in practice. One step at a time.
 

Jonathan

Matt Rice

Aug 26, 2025, 3:13:10 AM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 6:24 AM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> On Mon, Aug 25, 2025 at 9:56 PM Matt Rice <rat...@gmail.com> wrote:
>>
>> ... a lot of seL4 proofs are built on an abstract model written in
>> haskell, which then has a relation between the model, and the kernel
>> implemented on hardware that people actually run.
>
>
> Yes. The most important question is whether the model faithfully represents the behavior of the system. In my subjective and biased opinion, the correspondence of Sam Weber's model is pretty direct. The correspondence of Scott Doerrie's model in his mechanical verification would be more obvious if I could browbeat him into publishing his work in something more contained than a thesis. But I have to say that the correspondence that Gerwin Klein established for OKL4 is very damned good. Once you have that, there's a problem of stating the goal correctly. That was perhaps simpler for EROS than it was for OKL4.
>
>>
>> I feel like they have shown there is a certain amount of room for this
>> abstract model approach to work in practice. But then again i'm fond
>> of "literate formal verification", so of course I'd love to see
>> an abstract model that was useful not only for a book form, but for
>> proof purposes on a substrate more "ideal" than the hardware of today.
>
>
> I think it's fair to say that abstraction in the model is one of the major risks in verification. If you don't abstract, verification becomes intractable. If you do abstract, the scope of the verification impact is broader but you then need to show that your concrete system falls within the abstracted model. One of the beauties of Sam Weber's model is that it's so abstract that it carries almost all imaginable cases that work.
>
>>
>> At the very least I feel like there are a *lot* of details and (both
>> things that need to be explained, and properties that can be proven)
>> for which an implementation in metal isn't totally necessary...
>
>
> I agree. But I'm not (for now) thinking about a text covering verification. I'm thinking about a text that covers (a) how an ocap operating system is structured as an abstract machine related to a concrete [hardware] machine, (b) what the kernel invariants are to ensure correctness and how to maintain them, and (c) how to think about building a system on top of this.
>
> But I'm thrilled to see you arguing about documenting the verification. When I started this, the prevailing wisdom was that the verification was not, in principle, possible, because "capability systems cannot enforce the confinement property" (and by implication other forms of isolation).
>

FWIW, I think this is fair, understandable, and totally reasonable. A formal verification project alone is difficult; a book alone is difficult. I'm certain a combined effort only adds overhead to both. I just wanted to convey that this blueprint (a model, verification of properties, and a high-level overview of the verification in book form, with a "roughly 75% of the excruciating details removed" goal, in a literate programming style) is the one I took in my own attempt at a capability book.

I don't want to misrepresent the amount of progress I actually achieved, but part of that was due to also writing all of the proof language, trying to make the whole toolchain self-hosting, improving the verification to cover more cross-userspace properties, and covering various trust relationships for networks of kernels. That is to say, in the model I used, the kernel-to-kernel boundary looks a lot like the cross-userspace process boundaries, and it contains a model of ambient authority upon which to show how properties are not provable. There is no doubt I bit off more than I could chew, though.

>> for those things may merely serve to complicate things. Like I don't
>> even know where to start proving process confinement on a hardware
>> model....
>
>
> Agreed, though I could set forth some fairly clear conditions that simplify that. But it's not strictly necessary. The other approach is to show that there is a bi-directional correspondence between the hardware process and the modelled process. Which is what the most of the kernel invariants are about.
>
> That said, I don't want to get overly hung up in verification in this text. I'm more interested in putting together something that would help a newcomer think about building such a system in practice. One step at a time.
>
>
> Jonathan
>

Jonathan S. Shapiro

Aug 26, 2025, 3:33:42 AM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 12:13 AM Matt Rice <rat...@gmail.com> wrote:
>> But then again i'm fond of "literate formal verification"...

I'm totally making up in my head what that means. I'll note only that as bad as it is to state requirements in natural language, and as necessary as that is for human consumption, at least the mathematical formulations are rigorous. Almost universally wrong for the first 20 or so drafts, but at least it's possible [for a formal methodist] to understand what they said. I mean no disrespect to formal atheists, formal protestants, formal catholics, and other religious followings who couldn't read biblical Hebrew. :-) In this small respect, even rabbinical dropouts probably enjoy a small advantage.

> But I'm thrilled to see you arguing about documenting the verification. When I started this, the prevailing wisdom was that the verification was not, in principle, possible, because "capability systems cannot enforce the confinement property" (and by implication other forms of isolation).

FWIW, I think this is fair, understandable and totally reasonable, a
formal verification project alone is difficult, a book alone is
difficult.

Yes. But given a choice between explaining this stuff to thousands of people or tens of people...

I don't want to misrepresent the amount of progress I actually
achieved, but part of this was due to also writing all the proof
language...

I'd really love to see what you managed. In my turn, I don't want to misrepresent my ability to comprehend it. But I'd like to give it a try.

>> Like I don't
>> even know where to start proving process confinement on a hardware
>> model....

Sure. That's why it's done on an abstract model and a separate correspondence argument is crafted.
 

Jonathan

Matt Rice

Aug 26, 2025, 4:19:25 AM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 7:33 AM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> On Tue, Aug 26, 2025 at 12:13 AM Matt Rice <rat...@gmail.com> wrote:
>>
>> >> But then again i'm fond of "literate formal verification"...
>
>
> I'm totally making up in my head what that means.

Literate formal verification is essentially just Knuth's literate programming applied to formal verification: basically, all the documentation-generation tools of modern programming languages, applied to proof languages.

I guess it can be as simple as extracting a high-level textual/hand-written proof from the sources of the machine proof. Commonly I work in reverse: throw a bunch of LaTeX for a hand-written proof into a comment, then in the source text start translating it into a machine proof.

Additionally, I worked on editor components so you could render the doc comments in the editor; you would have to tab in order to edit the doc comment. In many ways modern proof engines are also like how programmers think of debuggers, allowing you to step through proofs. I also attempted to bring the rendering of summary text into this.

Some of my initial attempts inspired the Lean documentation tools... They ended up wanting to go with a fully web-based workflow using KaTeX or something like it, as opposed to the full LaTeX engine I was using.

https://leanprover-community.github.io/mathlib4_docs/Mathlib/CategoryTheory/Functor/Basic.html#CategoryTheory.Functor
https://github.com/leanprover-community/mathlib4/blob/bd6f86a791af552c50e0a22e99ddb6e7a8bde223/Mathlib/CategoryTheory/Functor/Basic.lean#L30-L46

Anyhow, basically, to me literate verification is combining a hand-written proof intended for human consumption with a machine-readable proof. Ideally the two would be complementary in their development and tooling, to help ensure that the ground truth is recognizable from the high-level description.
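As a toy of the flavor (a made-up Lean 4 example, not taken from any of the projects above): the doc comment carries the hand-written argument, the term under it is the machine-checked proof, and the documentation tooling renders the two side by side.

  /-- Hand-written argument: by the definition of addition on the naturals
      (recursion on the second argument), `n + 0` reduces to `n`, so the two
      sides are definitionally equal and `rfl` closes the goal. -/
  theorem add_zero_example (n : Nat) : n + 0 = n := rfl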

I'd be glad to discuss the whole thing sometime, but I'm not sure it's
of much use except perhaps as a cautionary tale?

William ML Leslie

Aug 26, 2025, 4:43:52 AM
to fr...@googlegroups.com
On Tue, 26 Aug 2025 at 18:19, Matt Rice <rat...@gmail.com> wrote:
On Tue, Aug 26, 2025 at 7:33 AM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> On Tue, Aug 26, 2025 at 12:13 AM Matt Rice <rat...@gmail.com> wrote:
>>
>> >> But then again i'm fond of "literate formal verification"...
>
>
> I'm totally making up in my head what that means.

Literate formal verification is essentially just Knuths literate
programming applied to formal verification.
Basically including all the documentation generation tools of modern
programming languages applied to
proof languages.

It's a shame the proofs aren't literate since the model is entirely Literate Haskell.

Am I alone in finding Isabelle hard to read?  I wonder if it'd make sense in Agda.

--
William ML Leslie

Pierre Thierry

Aug 26, 2025, 9:20:49 AM
to fr...@googlegroups.com
On 26/08/2025 at 04:49, Jonathan S. Shapiro wrote:
I suspect people will be enhancing L4 variants for some time to come. And it's interesting to me that some of the architectural principals selected for L4 are so different from the ones in KeyKOS, EROS, and Coyotos.
Now that immediately piqued my interest.

Curiously,
Pierre Thierry
--
pie...@nothos.net
0xD9D50D8A

Mark S. Miller

Aug 26, 2025, 4:17:34 PM
to fr...@googlegroups.com
Given that, today, seL4 is the natural attractor, I have wondered to what degree we could consolidate our interests in KeyKOS-lineage OSes (KeyKOS, CapROS, EROS, Coyotos) into efforts to improve seL4 as needed so we're not losing any of the *significant* differential value of the KeyKOS line. So let's start with a simpler question: What if we just dropped the KeyKOS line entirely and contributed all of that effort and interest in the seL4 system? What would we be missing? Of that, what is hard to live without?





--
  Cheers,
  --MarkM

Mark S. Miller

Aug 26, 2025, 4:26:25 PM
to fr...@googlegroups.com
Perfect security tech only makes the world safer if it gets adopted.

IMO, both the KeyKOS line and seL4 suffer in adoption from the difficulty of programming only with explicitly allocated memory. Most language-based and protocol-based ocap systems adopt the conventional memory-safe language approach of implicit allocation.
- As a local language-based system, HardenedJS/Endo implicitly allocates and gc's memory. As a protocol, OCapN (the latest CapTP) assumes implicit allocation and provides for distributed acyclic gc. SwingSet is our KeyKOS-inspired OS kernel for running multiple vats on a blockchain communicating via OCapN semantics, both to other vats on the same chain/platform and to vats running on other mutually suspicious platforms. We do "perfectly" protect integrity. But because of implicit allocation, we can at most deter rather than prevent attacks on availability. Much greater memory headroom makes such deterrence better than we old timers may have expected. But still, it is necessarily much worse than actual protection.
- Rust is in the middle of this spectrum for a language. Cap'n Proto (a statically typed CapTP) is in the middle for a protocol (and for most of its respective language bindings). These middles are much harder to program to, but still don't provide the defense of availability possible in the KeyKOS or seL4 lines.
- I am not aware of any safe language that treats memory resources at the language level with the same explicitness as the KeyKOS or seL4 lines. This is largely because the usability problems are so obvious in any language design that language designers give up before they get very far. But this is symptomatic of the usability (and teaching) problems that arise even at the OS level.

Btw, personally, I love what I remember of the KeyKOS design decisions driven by explicit memory. But I rapidly became disillusioned with trying to teach this style of programming. It infects everything.



--
  Cheers,
  --MarkM

๏̯͡๏ Jasvir Nagra

Aug 26, 2025, 4:57:17 PM
to fr...@googlegroups.com

Jasvir Nagra


On Tue, Aug 26, 2025 at 1:26 PM Mark S. Miller <eri...@gmail.com> wrote:
Perfect security tech only makes the world safer if it gets adopted.

IMO, Both the KeyKOS line and seL4 adoption suffer from the difficulty of programming only with explicitly allocated memory. Most language-based and protocol-based ocap systems adopt the conventional memory-safe language approach of implicit allocation. 
- As a local language-based system, HardenedJS/Endo implicitly allocate and gc memory. As a protocol, OCapN (the latest CapTP) assumes implicit allocation and provides for distributed acyclic gc. SwingSet is our KeyKOS inspired OS kernel for running multiple vats on a blockchain communicating via OCapN semantics, both to other vats on the same chain/platform and to vats running on other mutually suspicious platforms. We do "perfectly" protect integrity. But because of implicit allocation, can at most deter rather than prevent attacks on availability. Much greater memory headroom makes such deterrence better than us old timers may have expected. But still, it is necessarily much worse than actual protection.
- Rust is in the middle of this spectrum for a language. Cap'n Proto (a statically typed CapTP) is in the middle for a protocol (and for most of its respective language bindings). These middles are much harder to program to, but still don't provide the defense of availability possible in the KeyKOS or seL4 lines.
- I am not aware of any safe language that treats memory resources at the language level with the same explicitness as the KeyKOS or seL4 lines. This is largely because the usability problems are so obvious in any language design that language designers give up before they get very far. But this is symptomatic of the usability (and teaching) problems that arise even at the OS level.

Btw, personally, I love what I remember of the KeyKOS design decisions driven by explicit memory. But I rapidly became disillusioned to trying to teach this style of programming. It infects everything.

I realize this might be forking the thread, but "it infects everything" really resonates with me, in two different ways that both come out poorly for good security design. When I have argued for capability-like designs in the past, where a system was just being created, the argument was "that's too complicated - I don't need delegation - this is just a simple system to do X. I prefer simplicity" (this is the most infuriating thing people say when they argue this!). This, apparently, at least in the minds of the people I worked with designing such systems, meant ACLs. It did what they wanted well enough. But those systems would then evolve, and they would need delegation and attenuation, but by then it was too late and the design choices were entrenched - it infects everything that comes after it.

In other cases I have seen, designers recognize and anticipate how much a capability design might "infect everything", and the argument I hear then is "yes, yes, it might be better, but once we go down this path, we'll have to change everything else to fit it. We can't do that - there are third parties and legacy and all those other things which you, naive little engineer, have no understanding of. So no, we won't choose that design, because it will infect everything."

I realize I might be embellishing the extent of it, but I have seen these two responses often enough that "it infects everything" is a trigger for me.

Matt Rice

Aug 26, 2025, 6:44:03 PM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 8:26 PM Mark S. Miller <eri...@gmail.com> wrote:
>
> Perfect security tech only makes the world safer if it gets adopted.
>
> IMO, Both the KeyKOS line and seL4 adoption suffer from the difficulty of programming only with explicitly allocated memory.

I'd also argue that both the KeyKOS line and seL4 suffer in adoption from the lack of self-hosting. Developers develop tools for themselves to ease their own pain and suffering; I'd argue there are more developer tools than developers, probably by some multiple. If all these tools are being written on another OS, and developers themselves are using another OS and cross-building to the target system, or building the target system under emulation, then none of those tools are generally useful on the target system, nor are the developers experiencing it in their everyday practice. Certainly the difficulty of explicit allocation doesn't help, but I feel there is also a certain amount of not experiencing the daily pain such that one builds tools to cope. At the very least I've always felt that this was just as big a factor in the lack of adoption.

Jonathan S. Shapiro

Aug 27, 2025, 3:17:09 PM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 1:17 PM Mark S. Miller <eri...@gmail.com> wrote:
Given that, today, seL4 is the natural attractor, I have wondered to what degree we could consolidate our interests in KeyKOS-lineage OSes (KeyKOS, CapROS, EROS, Coyotos) into efforts to improve seL4 as needed so we're not losing any of the *significant* differential value of the KeyKOS line. So let's start with a simpler question: What if we just dropped the KeyKOS line entirely and contributed all of that effort and interest in the seL4 system? What would we be missing? Of that, what is hard to live without?

Not clear if you'd want to build on seL4 or OKL4 - there's a lot to be said for building on a system with billions of deployments.

It's a fair question, and it's related to my comment about how different the two are. There is something I've been feeling cautious about saying because I don't want to be misheard:

It appears to me that *L4 has succeeded as a microvisor that provides isolation, not as an operating system platform.

If you look at the applications over those billions of deployments essentially none of them are general purpose application platform deployments. They are either enhanced hypervisors or the only application of interest is some variant of UNIX, and UNIX is what the user sees as the operating system. An isolation platform is quite a lot more helpful than it sounds, but if the goal is to show how ocap operating systems can actually solve the more general problem, these platforms haven't done it. In fairness, that's because it's a multi-billion dollar problem to create an app ecosystem. Meanwhile, both Microsoft and Apple have gotten a lot better at security than they once were.

The alternative we really need to be paying attention to is the CHERI-derived work out of Cambridge, which is a small mod to conventional RISC processors that back-provides capability protections. It's mostly done at the L1 cache bus. Prototypes have already been done for multiple hardware architectures, and ARM was looking at making it a standard part of the architecture going forward.

The question we've actually been up against all along is needing to demonstrate how an object-based OS gets used to build applications, and how defense in depth is actually designed and implemented. The L4 team and the CHERI team haven't really tackled the protected modularity problem.


Jonathan

Jonathan S. Shapiro

Aug 27, 2025, 3:51:12 PM
to fr...@googlegroups.com
On Tue, Aug 26, 2025 at 1:26 PM Mark S. Miller <eri...@gmail.com> wrote:
Perfect security tech only makes the world safer if it gets adopted.

IMO, Both the KeyKOS line and seL4 adoption suffer from the difficulty of programming only with explicitly allocated memory. Most language-based and protocol-based ocap systems adopt the conventional memory-safe language approach of implicit allocation.

I'm a very strong fan of GC, but I deeply disagree. And I think the strongly rising popularity of Rust (for all of its complexity) can't be ignored as an alternative story. If this is what you actually believe, why didn't somebody implement a GC-based language to try out the alternative? Or am I misunderstanding the statement entirely?

Regarding GC, there are two problems. The first is in two parts: (1) real-time collection hasn't taken hold, (2) therefore, latency problems at unpredictable moments continue to plague managed memory runtimes. The second is that managed memory runtimes require 5x to 10x the amount of application memory as manually managed solutions. This translates directly to a corresponding hardware cost if you want stuff to run well. Then we can talk about the impacts on system-wide memory pressure and OS management algorithms for that.

And when you hit these issues, they are very hard to debug.

In short: GC doesn't scale in the economic sense. I'm actually very sad about that, but the numbers are what they are. I spent quite a long time looking for ways to address this during the BitC effort.

As a protocol, OCapN (the latest CapTP) assumes implicit allocation and provides for distributed acyclic gc.

I'm very sad to hear that, because it means OCapN is a dead end for the overwhelming majority of systems in the world. IoT systems don't have the resources to support this. They exist in an environment where fractions of a penny in device cost matter and every erg of power consumed counts. Which is a polite way of saying they aren't going to have memory to spare any time soon, nor CPU to spare for collection (though the marginal cost of that could probably be engineered out given some work).
 
- I am not aware of any safe language that treats memory resources at the language level with the same explicitness as the KeyKOS or seL4 lines. This is largely because the usability problems are so obvious in any language design that language designers give up before they get very far...

This statement doesn't sound like my experience even slightly, which makes me wonder what assumptions are being made about who is managing the application address space. Mainly because it was never a design goal (or a requirement) for applications to manage their own memory in a fine-grained way. That's what address space keepers are for. So I'm puzzled.

There were four places in the EROS/Coyotos line where applications dealt with address space construction explicitly:
  1. The space bank (for obvious reductio reasons).
  2. Some obscure bootstrapping code.
  3. Address space keepers. Though there was talk in the KeyKOS discussions about the idea of building custom keepers, we found in practice that a small portfolio of these keepers was enough for almost every application we could think of. (There's a sketch of the keeper idea a little further below.)
  4. Shared memory across suspicious collaborators. Which actually isn't a problem with fine-grain memory management; it's a problem with revocation. This issue came up in the high-speed networking work, where two mutually distrusting parties are obliged to share a memory region and one can destroy that region out from under the other. The problem here wasn't constructing the memory region. It was the fact that there were sections of code that had to survive having memory regions disappear out from under them - something that no high-level language can really handle and that is difficult to fold into an exception-like recovery model.
The first two, and (for the most part) the third seem outside of the concerns you are raising. The fourth is a different issue entirely. I'm clearly missing something and I'd love to understand what that might be.
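For anyone who hasn't lived with keepers, the shape of an address space keeper is roughly the loop below. Every name here is hypothetical; this is not the EROS or Coyotos interface, just the structure of the idea: faults are reflected to a user-level process holding the authority to repair the address space, and the application never manages its memory map directly.

  #include <stdint.h>
  #include <stdbool.h>

  typedef struct fault_msg { uint64_t addr; uint32_t access; } fault_msg_t;

  /* Hypothetical stand-ins for capability invocations. */
  extern fault_msg_t keeper_wait_for_fault(void);      /* block on the keeper endpoint */
  extern bool        policy_allows(uint64_t addr, uint32_t access);
  extern bool        install_mapping(uint64_t addr);   /* e.g. buy a page from the space bank */
  extern void        resume_faulter(bool ok);

  void keeper_main(void) {
    for (;;) {
      fault_msg_t f = keeper_wait_for_fault();
      bool ok = policy_allows(f.addr, f.access) && install_mapping(f.addr);
      resume_faulter(ok);   /* restart the faulting process, or report the fault */
    }
  }

The "small portfolio" mentioned above is, roughly, a handful of variations on this loop that differ only in policy.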

Btw, personally, I love what I remember of the KeyKOS design decisions driven by explicit memory. But I rapidly became disillusioned to trying to teach this style of programming. It infects everything.

That really wasn't our experience, and I'm not aware of any argument for why an application should take on this complexity without external support.


Jonathan

Jonathan S. Shapiro

Aug 27, 2025, 4:09:01 PM
to fr...@googlegroups.com
Before I wander off from the thread for the day, I have realized that I owe Charlie an apology. I had not intended to, but the argument I made about focusing on Coyotos for the book kind of implied that KeyKOS and EROS and CapROS were no longer relevant. That would be a crazy thing to say, if only because of the number of things we learned from them.

I think my reluctance to try to document backwards is coming from two places: the number of large steps forward in Coyotos, and the number of pedagogical problems that arise from "everything is a node or a page" in the older systems. That pun was motivated when disks were smaller, but it is neither motivated nor necessary today. The amount of kernel complexity, the number of kernel invariants, and the amount of kernel code it introduces are surprisingly large. Explaining all of it isn't quite a book in itself, but it's not trivial either.

So I apologize if I offended because I wrote without thinking. Nonetheless, I think clarity is more important than delving into details of systems that will probably never be seen again (KeyKOS, EROS). That may turn out to be true for Coyotos as well, of course, but given a clear exposition who knows what may become possible.


Jonathan

Matt Rice

Aug 27, 2025, 5:52:03 PM
to fr...@googlegroups.com
On Wed, Aug 27, 2025 at 7:51 PM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> On Tue, Aug 26, 2025 at 1:26 PM Mark S. Miller <eri...@gmail.com> wrote:
>>
>> - I am not aware of any safe language that treats memory resources at the language level with the same explicitness as the KeyKOS or seL4 lines. This is largely because the usability problems are so obvious in any language design that language designers give up before they get very far...
>
>
> This statement doesn't sound like my experience even slightly.

Not that I feel like anyone here, or anywhere else, is actually clamoring for such a language, but I did at least experiment with such a memory model, using power-of-2 page allocations and arbitrary bounded pointer arithmetic within the page (a rough sketch is below). It didn't give "exactly" the same safety properties, depending on whether or not you count use-after-free when there is no free besides page zapping.

I figured no one cares, and it's not like bounded pointer arithmetic
was free either.
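A minimal sketch of what I mean by the bounding (names are made up, and as I said, the masking obviously isn't free):

  #include <stdint.h>
  #include <stddef.h>

  #define PAGE_SHIFT 12u
  #define PAGE_SIZE  (1u << PAGE_SHIFT)
  #define PAGE_MASK  (PAGE_SIZE - 1u)

  /* A "pointer" that can never escape the power-of-2 page it was derived
   * from: arbitrary arithmetic is allowed, but the offset is always
   * reduced modulo the page size. */
  typedef struct {
    uint8_t *base;    /* page-aligned base address */
    uint32_t offset;  /* kept in [0, PAGE_SIZE) */
  } bounded_ptr;

  static inline bounded_ptr bp_add(bounded_ptr p, ptrdiff_t delta) {
    p.offset = (uint32_t)(p.offset + (uint32_t)delta) & PAGE_MASK;
    return p;
  }

  static inline uint8_t bp_load(bounded_ptr p) {
    return p.base[p.offset];   /* always within the page */
  }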

Chip Morningstar

Aug 27, 2025, 6:27:27 PM
to fr...@googlegroups.com


> On Aug 27, 2025, at 12:50 PM, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
>
> Regarding GC, there are two problems. The first is in two parts: (1) real-time collection hasn't taken hold, (2) therefore, latency problems at unpredictable moments continue to plague managed memory runtimes. The second is that managed memory runtimes require 5x to 10x the amount of application memory as manually managed solutions. This translates directly to a corresponding hardware cost if you want stuff to run well. Then we can talk about the impacts on system-wide memory pressure and OS management algorithms for that.
>
> And when you hit these issues, they are very hard to debug.
>
> In short: GC doesn't scale in the economic sense. I'm actually very sad about that, but the numbers are what they are. I spent quite a long time looking for ways to address this during the BitC effort.

How then do you account for the ginormous popularity of JavaScript running in NodeJS as the server platform of choice for new web applications?
I think Node has about a 5% market share for server stuff overall, but the 95% includes all the legacy stuff that runs the world. For new projects I wouldn’t say nobody uses anything else (Go seems to have its enthusiasts for example), but in the circles I travel in that comes close.

This feels to me a lot like the debates 40 years ago about whether we could afford the performance and memory overhead of using C for performance sensitive code. Despite all the hand wringing about how impractical it was, the huge improved productivity benefit pushed developers over the edge and then in fairly short order Moore’s Law made up for the cost hit.

Chip


Bakul Shah

Aug 27, 2025, 6:47:15 PM
to fr...@googlegroups.com
On Aug 25, 2025, at 11:03 PM, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:

On Mon, Aug 25, 2025 at 8:33 PM 'Bakul Shah' via friam <fr...@googlegroups.com> wrote:
To me a historical account would be far more interesting & valuable. What changes were made and why, options considered, what was planned (but not implemented), what you think of those decisions in hindsight, etc. 

Bakul Shah

Thank you, Bakul.

Here is an example that may help to frame the challenge.

One of the primordial decisions in GNOSIS and KeyKOS was that the universe consisted of nodes (which held capabilities) and pages (which hold data). Every system construct in KeyKOS is either primitive (you don't need a node or a page to halt the system), or is constructed as an arrangement of nodes and pages. Ultimately, the motivation for this was that the disk layout was partitioned into node spaces and page spaces at installation time and unchanged thereafter. In the 1970s, 100MB was a very big disk drive, so every megabyte wasted was significant and every new object type carried a risk that the balance of object types in the partition would be wrong. Because of this, processes and memory maps in GNOSIS/KeyKOS/EROS were a "pun" on nodes. A significant amount of code and complexity existed to preserve and guard invariants that support that pun.

At least in the US, I can buy a 4TB SSD today for a very small fraction of the cost of that 1970s hard drive. Completely wasting 100MB of disk space to support a simpler set of on-disk and in-memory data structures is a no-brainer.

So much has changed (large fast storage, multicore, slow/fast cores, NUMA, 64-bit word size, large address space, potentially thousands of connections) that your Coyotos port might end up being a major rewrite and bring in its own complexities! For instance, SSDs have write limits (a 4TB SSD might be rated for 600 TBW), which might influence checkpointing!

If the goal is for people to understand the result, maybe it's better to avoid getting tangled up in the complexity of constructions that are no longer relevant 30 years later. I'm not saying that as a "this is it" position, but as food for thought.

Such complexity might be considered machine dependent. One of the things a port can help with is discovering what is general purpose vs machine dependent. That is, how much does a new species differ from an older one? For that, one has to study them both and tease out the common factors.

And holy cr*p, the notion that we might even talk about continued 30 year relevance in the field of computing is kind of mind blowing!

We should be building structures that are resilient enough to last and be useful for far longer. [This is why I don't have much faith in proofs -- what happens when requirements change?] Can we build malleable (operating) systems with capabilities?

I also think this depends on what the existing papers describe. I wrote The KeyKOS Nanokernel Architecture paper 33 years ago. I should probably have a look at it to refresh my memory about what it says. :-)

Thanks for your response!

Bakul



Jonathan 

John Kemp

Aug 27, 2025, 6:50:14 PM
to fr...@googlegroups.com
On 08/27/25 at 15:50, Jonathan S. Shapiro wrote:
>
> In short: GC doesn't scale in the economic sense. I'm actually very sad
> about that, but the numbers are what they are. I spent quite a long time
> looking for ways to address this during the BitC effort.

I guess I have competing (within myself) points of view on this:

1. Garbage collection (* in Erlang) scales nicely (in my experience) and
doesn't have the occasional latency across the VM, because memory
management can be done on the process level, and there is no shared
memory across the VM, only message-passing.

2. In _traditional_ (not cryptocurrency) financial applications with
which I am familiar, even the web servers are hand-written in C/C++
(sometimes, yes, with inline assembly). They have not transitioned much
towards any other language environment. Rust is a possible game-changer
there since it can eliminate whole classes of memory-related bugs with
the borrow-checker memory management model and maintain C-like
performance. So if any language may eventually replace C/C++ for
latency-sensitive applications in those environments, it would be Rust,
in my opinion.

- johnk

--
Independent Security Architect
t: +1.413.645.4169
e: stable.p...@gmail.com

https://www.linkedin.com/in/johnk-am9obmsk/
https://github.com/frumioj

Matt Rice

Aug 27, 2025, 6:52:05 PM
to fr...@googlegroups.com
I'd also be curious how gc impacts power usage as a metric.
I haven't seen any numbers (only those comparing the power usage
differences between different gc algorithms/implementations), but it
seems like another aspect that may impact the choice of gc for some
specific purposes besides the usual suspects.

Jonathan S. Shapiro

Aug 28, 2025, 11:55:02 AM
to fr...@googlegroups.com
On Wed, Aug 27, 2025 at 3:27 PM Chip Morningstar <ch...@fudco.com> wrote:
 
> In short: GC doesn't scale in the economic sense. I'm actually very sad about that, but the numbers are what they are. I spent quite a long time looking for ways to address this during the BitC effort.

How then do you account for the ginormous popularity of JavaScript running in NodeJS as the server platform of choice for new web applications?

There is a long conversation here, though a modicum of mercy suggests we should hold it under a different subject line. For here, I'll say two things and leave it:
  1. Popularity and economic scalability are unrelated metrics.
  2. The absence of a remotely viable second choice - a state of affairs that I expect will change soon - is not an argument that the sole choice you have is a good one.
This feels to me a lot like the debates 40 years ago about whether we could afford the performance and memory overhead of using C for performance sensitive code.

I think those arguments were largely retired in the early 1970s. You have an off-by-ten error. :-)
 
Despite all the hand wringing about how impractical it was, the huge improved productivity benefit pushed developers over the edge and then in fairly short order Moore’s Law made up for the cost hit.

I agree about the productivity improvement and also about Moore's law. I think we shouldn't neglect compiler improvements. But I think the main reason is actually Amdahl's law. It is still the case that assembler is faster than C for certain algorithms under certain conditions, and that we actually use assembly for many of those algorithms. But the cases where this is the right answer are well off into the dim corners of programming problems today.


Jonathan

Jonathan S. Shapiro

Aug 28, 2025, 12:19:28 PM
to fr...@googlegroups.com
On Wed, Aug 27, 2025 at 3:47 PM 'Bakul Shah' via friam <fr...@googlegroups.com> wrote:
On Aug 25, 2025, at 11:03 PM, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
If the goal is for people to understand the result, maybe it's better to avoid getting tangled up in the complexity of constructions that are no longer relevant 30 years later. I'm not saying that as a "this is it" position, but as food for thought.
Such complexity might be considered machine dependent. One of the things a port can help with is discovering what is general purpose vs machine dependent. That is, how much does a new species differ from an older one. For that one has to study them both and discover out common factors.

In most operating systems this is true. In most systems these hardware-dependent differences are fairly well isolated these days, living mainly in interrupt and context switch logic, drivers, very high performance algorithms, and concurrency primitives. For the most part, none of this touches the core of the kernel or of applications very much.

For some programs, that changes hard and fast when we start thinking about clusters or distributed systems, because the latency relationships these programs rely on change by an order of magnitude or more. You're not going to see an impact in your word processor, but you'll sure see an impact in search back end structure.

A lot of this is because most applications hit a ceiling where more memory or more compute resource stops being helpful. In that word processor, the bottleneck is the human. But for the rest, there's a bit of a chicken and egg problem. We mostly haven't seen good languages or system structures to support distributed programming. MarkM's work and its successors seem promising. Go and goroutines may turn out to be interesting as well. async/await is not.

And holy cr*p, the notion that we might even talk about continued 30 year relevance in the field of computing is kind of mind blowing!
We should be building structures that are resilient enough to last and be useful for far longer.

When the technical foundations change weekly or monthly (as they are currently doing in AI), that isn't really possible. As the technical foundations stabilize, the need for an economic equilibrium means that we're never going to build this way. Putting my company executive hat on, development time is a capital investment. The right way to think of it is that you take out a loan at high interest and opportunity cost to pay to build a system. Many of those systems never get to market. Eventually that loan has to be paid off from product revenues.

You can make different choices about amortizing that cost. Japanese practices favor longer term amortization in order to support investment in strong software foundations. US investment patterns give us a three year horizon on payoffs (in essence: half a venture cycle), which is very short-term and drives us to the disaster that is the practice of Agile. Other economies put things somewhere in the middle.

[This is why I don't have much faith in proofs -- what happens when requirements change?] 

The kinds of invariants (not requirements)  one is concerned with in software verification rarely change. What changes are the programs over which you discharge the proofs. The evidence to date is that these updates do not delay product shipments, mainly because the co-development of code and proof removes false paths in the development process. What they do tend to delay is interim internal deliverables.

Can we build malleable (operating) systems with capabilities?

Assuming we can build general systems on oCap foundations at all, the answer is yes. The key to malleability is modularity, and that's what oCap systems are about.

There's a pullback happening in containerization right now that's going to put a hitch in that. Long and short, container management and devops are much more expensive than was initially appreciated, and the complexity of asynchronous error handling across containers and unwinding intermediate results from failed end-to-end operations is pretty damned bad. It's becoming pretty convincing that we scaled these systems the wrong way and ended up paying for container scalability with container management and error handling complexity.

The need for dynamic scaling hasn't gone away, of course.
 

Jonathan

Jonathan S. Shapiro

unread,
Aug 28, 2025, 12:48:37 PM (12 days ago) Aug 28
to fr...@googlegroups.com
On Wed, Aug 27, 2025 at 3:50 PM John Kemp <stable.p...@gmail.com> wrote:
I guess I have competing (within myself) points of view on this:

1. Garbage collection (* in erlang) scales nicely (in my experience) and
doesn't have the occasional latency across the VM, because memory
management can be done on the process level, and there is no shared
memory across the VM, only message-passing.

Yes. At least in part. The other contributor is that many Erlang VMs run smaller programs. The combination of smaller heaps and GC-isolated VMs is exactly the case I identified where GC works very well.

As VMs grow larger, GC-isolation across VMs does not result in GC isolation in the underlying operating system. Full-heap GC algorithms, by their nature, visit a substantial fraction of the heap, and tend to do so in an unpredictable memory order.* This doesn't interact well with either the memory hierarchy or the paging subsystem. When enough large VMs run simultaneously, their paging behaviors compete at the kernel level, and GC performance for all of them degrades exponentially. You can see real-world confirmation of this by opening too many browser windows on your desktop.

I don't say this as an argument against GC. I do offer it as one of the essential arguments for why GC doesn't scale. Those same programs written in Rust run in heaps that are 4x-10x smaller and do not require memory walks in unpredictable order. Think about the financial implications at the scale of a Google or Amazon data center.

It's tempting to argue that we should build programs from more components that have smaller heaps, and this can help. But there remain parts of programs where a lot of data has to be manipulated by a single algorithm, the most obvious one being search.
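To make the memory-walk point concrete, here is a toy sketch in Go (purely illustrative - not a real benchmark, and the sizes are made up). The collector has to chase every pointer in the linked version, in whatever order the cells happen to sit in memory, while the flat, pointer-free allocation never needs to be walked at all:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// One cell of a pointer-dense heap: the collector must chase every
// link, in whatever order the cells landed in memory.
type node struct {
	next  *node
	value int64
}

func timedGC(label string) {
	start := time.Now()
	runtime.GC() // blocks until a full collection completes
	fmt.Println(label, time.Since(start))
}

func main() {
	const n = 10_000_000

	// Pointer-dense: ten million individually allocated cells.
	var head *node
	for i := 0; i < n; i++ {
		head = &node{next: head, value: int64(i)}
	}

	// Roughly the same payload as one flat, pointer-free allocation,
	// which the collector can skip entirely when tracing.
	flat := make([]int64, 2*n)

	timedGC("GC with the linked heap live:")
	runtime.KeepAlive(head)

	head = nil   // drop the list...
	runtime.GC() // ...and let one cycle reclaim it

	timedGC("GC with only the flat heap live:")
	runtime.KeepAlive(flat)
}

The gap between the two numbers is the tracing cost, and once the live heap stops fitting in the cache hierarchy (or in RAM), that cost is dominated by exactly the unpredictable memory walk described above.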

2. In _traditional_ (not cryptocurrency) financial applications with
which I am familiar, even the web servers are hand-written in C/C++
(sometimes, yes, with inline assembly). They have not transitioned much
towards any other language environment.

So far as I'm aware, the very best examples of this type are written by Jane Street, primarily in a version of OCaml that they have enhanced over the years. Key factors in this decision were the need to do very rapid development and the fact that C, C++, and asm are so error prone that program trading in these languages is a fast path to bankruptcy.

J.P. Morgan Chase for a while had competing systems for their mortgage-backed securities operations that ran (respectively) in C++ and Smalltalk. I was brought in to review the efforts because they were hoping to eliminate redundancy. I recommended that they shut down the C++ version.
 
Rust is a possible game-changer
there since it can eliminate whole classes of memory-related bugs with
the borrow-checker memory management model and maintain C-like
performance.

These things are important, but for financial applications they are "check the box" requirements. Given that the half-life of many financial instruments, from the perspective of the bank, is roughly 24 hours, development speed becomes extremely important. It depends, of course, on what financial instruments we're talking about.


Jonathan

Jonathan S. Shapiro

unread,
Aug 28, 2025, 12:56:07 PM (12 days ago) Aug 28
to fr...@googlegroups.com
On Wed, Aug 27, 2025 at 3:52 PM Matt Rice <rat...@gmail.com> wrote:
 
I'd also be curious how gc impacts power usage as a metric.
I haven't seen any numbers (only those comparing the power usage
differences between different gc algorithms/implementations), but it
seems like another aspect that may impact the choice of gc for some
specific purposes besides the usual suspects.

Both the choice of GC algorithms and the choice of whether GC is affordable.

Gathering good numbers for this is hard. For small heaps and short-run programs, the memory management power consumption isn't enough that anybody should care, and may well be lower than with manually managed approaches. Why? Because allocation is so much simpler, and release simply doesn't happen in the GC case.

For mid-size heaps, the answers have a lot more to do with program behavior than with GC.

For large heaps, the answers depend on the specific GC. Multi-generational schemes help a lot, but not perfectly.

For very large heaps, the performance cost of GC is heavily influenced by cross-program interactions that GC copes with poorly. We can see that these interactions happen by looking at the patterns in the page fault events. The problem here is that the system effects are much larger than the GC algorithm effects, which makes isolated measurement very difficult.


Jonathan

Chip Morningstar

unread,
Aug 28, 2025, 7:41:55 PM (12 days ago) Aug 28
to fr...@googlegroups.com


On Aug 28, 2025, at 8:54 AM, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:

This feels to me a lot like the debates 40 years ago about whether we could afford the performance and memory overhead of using C for performance sensitive code.

I think those arguments were largely retired in the early 1970s. You have an off-by-ten error. :-)

Heh.

In the mid 1980s, when I was doing games at Lucasfilm, nobody would dare program an Atari 800 or a Commodore 64 or an Apple II (all 6502 machines) in anything other than assembly language (see https://github.com/Museum-of-Art-and-Digital-Entertainment/macross for some amusing legacy software for those of you who are amused by that sort of thing).
We escaped this misery when the games market shifted in the late 1980s from 6502 home computers to x86 PCs.
I think a goodly portion of the world of microcontrollers and embedded systems stuck with assembler for a few years beyond even that.

Chip

William ML Leslie

unread,
Aug 28, 2025, 8:18:41 PM (12 days ago) Aug 28
to fr...@googlegroups.com
On Fri, 29 Aug 2025 at 02:19, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
On Wed, Aug 27, 2025 at 3:47 PM 'Bakul Shah' via friam <fr...@googlegroups.com> wrote:

A lot of this is because most applications hit a ceiling where more memory or more compute resource stops being helpful. In that word processor, the bottleneck is the human. But for the rest, there's a bit of a chicken and egg problem. We mostly haven't seen good languages or system structures to support distributed programming. MarkM's work and its successors seem promising. Go and goroutines may turn out to be interesting as well. async/await is not.

At the risk of taking a slightly off-topic thread and letting it roam: having suffered Go professionally for about five years, its concurrency model is exactly the sort of accidental-deadlock bugfarm that MarkM's thesis predicts.

I had a fairly small (~25ksloc) service that co-ordinated about a hundred users and a handful of services, and the original architect had the foresight to have one goroutine for its high-level state management.  This goroutine listened to a couple of channels, including one that was fed commands by other goroutines on the system.  This internal channel was mostly user requests and regular timers.  It really didn't handle a lot of events, maybe 5-10 per second, and with a channel size of 10 that should have been generous.  One day, after some changes from a senior developer on my team, the system deadlocked, leaving all of those users and the customers they were talking to completely stranded.  It turned out they had introduced a code path where the state-management thread tried to post a message to that system channel, and in this case, the channel was full, so it just hung there, waiting for room on the channel.

So this became a thing that I would regularly have to audit for.  Are we making sure that we never post from the main thread?  Eventually this became: functions then are either "main thread safe" or not, and these are mutually exclusive, since you have to be on the main thread in order to read or write any of the shared state.

The flippant response to this is, "that's bad application design".  I'll counter with: concurrency safety had been my bag for over a decade, and I still didn't catch it in code review.

If you find yourself having to deal with Go, the first job should be building a sensible concurrency abstraction on top of channels.
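For concreteness, here is a boiled-down reproduction of the failure mode (hypothetical code - nothing like the actual service, but the same shape):

package main

import "fmt"

// A state-management goroutine that posts to the very channel it alone
// drains. With a bigger buffer this "works" - right up until the buffer
// fills, at which point the send blocks forever and nothing is left to
// make room.
func main() {
	commands := make(chan string, 2) // deliberately tiny buffer

	go func() {
		for cmd := range commands {
			fmt.Println("handling", cmd)
			// The fatal code path: follow-up work is enqueued onto the
			// channel this goroutine is responsible for draining.
			commands <- cmd + "+a"
			commands <- cmd + "+b"
		}
	}()

	commands <- "user-request"
	select {} // park main; the runtime soon reports the deadlock
}

Nothing in the language stops the draining goroutine from blocking on its own channel; you only find out when the buffer finally fills in production.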

--
William ML Leslie

Mark S. Miller

unread,
Aug 28, 2025, 9:26:33 PM (12 days ago) Aug 28
to fr...@googlegroups.com
On Thu, Aug 28, 2025 at 5:18 PM William ML Leslie <william.l...@gmail.com> wrote:
On Fri, 29 Aug 2025 at 02:19, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
On Wed, Aug 27, 2025 at 3:47 PM 'Bakul Shah' via friam <fr...@googlegroups.com> wrote:

A lot of this is because most applications hit a ceiling where more memory or more compute resource stops being helpful. In that word processor, the bottleneck is the human. But for the rest, there's a bit of a chicken and egg problem. We mostly haven't seen good languages or system structures to support distributed programming. MarkM's work and its successors seem promising. Go and goroutines may turn out to be interesting as well. async/await is not.

At the risk of taking a slightly off-topic thread and letting it roam: having suffered Go professionally for about five years, its concurrency model is exactly the sort of accidental-deadlock bugfarm that MarkM's thesis predicts.

I had a fairly small (~25ksloc) service that co-ordinated about a hundred users and a handful of services, and the original architect had the foresight to have one goroutine for its high-level state management.  This goroutine listened to a couple of channels, including one that was fed commands by other goroutines on the system.  This internal channel was mostly user requests and regular timers.  It really didn't handle a lot of events, maybe 5-10 per second, and with a channel size of 10 that should have been generous.  One day, after some changes from a senior developer on my team, the system deadlocked, leaving all of those users and the customers they were talking to completely stranded.  It turned out they had introduced a code path where the state-management thread tried to post a message to that system channel, and in this case, the channel was full, so it just hung there, waiting for room on the channel.

First, to both of you, thanks for the kind words!

Somewhere on erights.org (but probably not in my thesis) I gathered a list of "lost progress bugs". This one specifically I call "gridlock". The distinction I make from deadlock is: Had you had more memory (e.g., larger buffers), you wouldn't have lost progress at that moment, though you may still lose it later for the same reason.

Off the top of my head, the lost progress bugs are:
- Deadlock
- Livelock
- Gridlock
- Lost signal

(I think there was another but do not remember)
 

So this became a thing that I would regularly have to audit for.  Are we making sure that we never post from the main thread?  Eventually this became: functions then are either "main thread safe" or not, and these are mutually exclusive, since you have to be on the main thread in order to read or write any of the shared state.

The flippant response to this is, "that's bad application design".  I'll counter with: concurrency safety had been my bag for over a decade, and I still didn't catch it in code review.

For me, looking for systems that can be massively adopted and used *successfully*, this is a definitive counter-argument.
 

If you find yourself having to deal with Go, the first job should be building a sensible concurrency abstraction on top of channels.

--
William ML Leslie


Jonathan S. Shapiro

unread,
Aug 28, 2025, 11:24:00 PM (11 days ago) Aug 28
to fr...@googlegroups.com
On Thu, Aug 28, 2025 at 5:18 PM William ML Leslie <william.l...@gmail.com> wrote:
At the risk of taking a slightly off-topic thread and letting it roam: having suffered Go professionally for about five years, its concurrency model is exactly the sort of accidental-deadlock bugfarm that MarkM's thesis predicts.

Accidental bugfarm isn't fair. There's nothing accidental about it.

I've had some questions about eventual consistency that I've never had time to explore:
  1. What well-known performance patterns exist that can't be expressed this way? Are there alternative expressions?
  2. What ideas exist about computation placement and deadlines? The do-across pattern is only as good as its slowest leg, which leads me to wonder how that is handled.

Jonathan
 
I had a fairly small (~25ksloc) service ...  One day, after some changes from a senior developer on my team, the system deadlocked, leaving all of those users and the customers they were talking to completely stranded.  It turned out they had introduced a code path where the state-management thread tried to post a message to that system channel, and in this case, the channel was full, so it just hung there, waiting for room on the channel.

It sounds like queueing opacity has a lot to do with the problem?

Async/await has this on steroids. When it works, it works great. When it goes off a cliff, good luck. And it's tied in to the call/return logic in a way that doesn't seem (at least to me) to leave any room for mitigation.


Jonathan

William ML Leslie

unread,
Aug 29, 2025, 1:33:44 AM (11 days ago) Aug 29
to fr...@googlegroups.com
On Fri, 29 Aug 2025 at 11:26, Mark S. Miller <eri...@gmail.com> wrote:
First, to both of you, thanks for the kind words!

Somewhere on erights.org (but probably not in my thesis) I gathered a list of "lost progress bugs". This one specifically I call "gridlock". The distinction I make from deadlock is: Had you had more memory (e.g., larger buffers), you wouldn't have lost progress at that moment, though you may still lose it later for the same reason.

Off the top of my head, the lost progress bugs are:
- Deadlock
- Livelock
- Gridlock
- Lost signal

(I think there was another but do not remember)

Ah yes.

These are fun to explore together with the broader context of unexpected concurrency (finalisers, signals), stale stack frames, and inconsistent event order.

--
William ML Leslie

William ML Leslie

unread,
Aug 29, 2025, 2:00:07 AM (11 days ago) Aug 29
to fr...@googlegroups.com
On Fri, 29 Aug 2025 at 13:24, Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
On Thu, Aug 28, 2025 at 5:18 PM William ML Leslie <william.l...@gmail.com> wrote:
At the risk of taking a slightly off-topic thread and letting it roam: having suffered Go professionally for about five years, its concurrency model is exactly the sort of accidental-deadlock bugfarm that MarkM's thesis predicts.

Accidental bugfarm isn't fair. There's nothing accidental about it.

I've had some questions about eventual consistency that I've never had time to explore:
  1. What well-known performance patterns exist that can't be expressed this way? Are there alternative expressions?
  2. What ideas exist about computation placement and deadlines? The do-across pattern is only as good as its slowest leg, which leads me to wonder how that is handled.
You and I both, especially since it looks like a compiler problem: the idea that you can build a set of CSP processes that operates efficiently and safely.  I think I spend more of my performance worries on the cost of dependent pointer lookups.  For example, Coyotos IPC involves looking up the Endpoint, then its Object Table Entry and target Process, then the process OTE.  If you can line up your cache misses, there's about 400 cycles spent doing nothing much.  That's a kernel written by somebody who cares deeply about this stuff; imagine what the average Java application is doing.  Maybe we should go back to the Alto with its 32 instruction pointers.

Ensuring all the data you need arrives just in time for your computation is a fun abstract problem but did you have something more specific in mind?  Or have I misunderstood?


I had a fairly small (~25ksloc) service ...  One day, after some changes from a senior developer on my team, the system deadlocked, leaving all of those users and the customers they were talking to completely stranded.  It turned out they had introduced a code path where the state-management thread tried to post a message to that system channel, and in this case, the channel was full, so it just hung there, waiting for room on the channel.

It sounds like queueing opacity has a lot to do with the problem?

There was no in-language mechanism to prevent performing a blocking send to the current process, or any process that publishes to the current process.  Is that a queueing opacity problem?
 
Async/await has this on steroids. When it works, it works great. When it goes off a cliff, good luck. And it's tied in to the call/return logic in a way that doesn't seem (at least to me) to leave any room for mitigation.

Async/await is not really distributed, so I was wondering how it fits.

I was doing some native Android development back in 2019 and built a set of promise-like abstractions that had, out of necessity, the fun property of allowing you to specify what executor to put your callback on.  I think this would solve the most common performance issue with async/await that we see in the wild, where node.js will start handling new requests when it has data on old requests it could pipe out right away.  Scheduling is definitely a fun problem.
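Roughly the shape of it, transliterated into Go purely as a sketch (the names are made up, and the real thing was on Android, not Go):

package main

import "fmt"

// Executor runs queued callbacks on a goroutine it controls - the point
// being that completion work lands on the loop that owns the relevant
// state, rather than on whatever goroutine happened to resolve the promise.
type Executor chan func()

func (e Executor) Run() {
	for task := range e {
		task()
	}
}

// Promise resolves exactly once and dispatches each callback to the
// executor its caller named.
type Promise[T any] struct {
	done  chan struct{}
	value T
}

func NewPromise[T any]() *Promise[T] {
	return &Promise[T]{done: make(chan struct{})}
}

func (p *Promise[T]) Resolve(v T) {
	p.value = v
	close(p.done) // publish: receivers observe value after <-p.done
}

// Then registers cb to run on exec once the promise resolves.
func (p *Promise[T]) Then(exec Executor, cb func(T)) {
	go func() {
		<-p.done
		exec <- func() { cb(p.value) }
	}()
}

func main() {
	stateLoop := make(Executor, 16) // the goroutine that owns the state
	go stateLoop.Run()

	finished := make(chan struct{})
	p := NewPromise[string]()
	p.Then(stateLoop, func(v string) {
		fmt.Println("ran on the state loop:", v)
		close(finished)
	})
	p.Resolve("done")
	<-finished
}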

The one that I see a bit more often is people awaiting things for no reason.  Let the asynchronous computation be free :)

--
William ML Leslie

Jonathan S. Shapiro

unread,
Aug 29, 2025, 3:15:41 AM (11 days ago) Aug 29
to fr...@googlegroups.com
Apologies that my last review of your thesis is coming up on 20 years old, but I think this framing remains sound.

On Fri, 29 Aug 2025 at 11:26, Mark S. Miller <eri...@gmail.com> wrote:
Off the top of my head, the lost progress bugs are:
- Deadlock
- Livelock
- Gridlock
- Lost signal

When these are defined behavior, they are not bugs. Defined and consistent misbehavior is unequivocally better than undefined misbehavior.

Contracts around these behaviors are hierarchical. Speaking loosely, a lower-level handling of these concerns can make it possible for a higher level to handle them constructively, but a higher level cannot recover from mishandling by a lower level.

Implemented correctly, deadlock and livelock should be an exclusively application-level phenomenon. Our in extremis testing of EROS revealed exactly one livelock bug, triggerable only when (a) a perverse arrangement of nodes in a domain was contrived, and (b) the entire system was running with 2N in-memory nodes, where N was the number of nodes required to define a domain/process. I suspect this bug was latent in the original KeyKOS implementation.

I apologize that I don't (yet) have an established mental model for gridlock in spite of your email earlier this evening.

If I am inferring "lost signal" correctly, there are two cases:
  1. Schemes in which repeated notifications can be consolidated without loss of information. UNIX signals are supposed to be an example of this. Such schemes, correctly implemented, should NEVER incur lost delivery (a minimal sketch of the pattern follows below).
  2. Schemes in which signal delivery takes the form of a message that may be temporarily deferred. In such a system, barring system-wide crash, "lost" signals are invariably application-level bugs.
I think it is best to first evaluate these sorts of issues without regard to system-wide crash and [optional] recovery.
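For case 1, the consolidating pattern is tiny. A sketch of the shape I mean, in Go (my own illustration, not anything from KeyKOS, EROS, or Coyotos):

package main

import (
	"fmt"
	"time"
)

// Notifier coalesces repeated "kicks" into a single pending bit, so
// delivery is never lost and the sender never blocks.
type Notifier struct {
	pending chan struct{} // capacity 1: "at least one kick is pending"
}

func NewNotifier() *Notifier {
	return &Notifier{pending: make(chan struct{}, 1)}
}

// Kick records that the condition occurred. Duplicate kicks coalesce.
func (n *Notifier) Kick() {
	select {
	case n.pending <- struct{}{}:
	default: // already pending; nothing more to record
	}
}

// Wait blocks until at least one kick has occurred since the last Wait.
func (n *Notifier) Wait() { <-n.pending }

func main() {
	n := NewNotifier()

	go func() {
		for i := 0; i < 1000; i++ {
			n.Kick() // a burst of notifications coalesces safely
		}
	}()

	time.Sleep(10 * time.Millisecond) // let the burst land (demo only)
	n.Wait()
	fmt.Println("saw at least one notification; handler rescans the state")
}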


Jonathan 

Mark S. Miller

unread,
Aug 29, 2025, 3:49:28 AM (11 days ago) Aug 29
to fr...@googlegroups.com
It's late, so briefly:

> When these are defined behavior, they are not bugs. 

By bugs, I usually mean application bugs, which can of course happen on a perfectly correct platform. But even a perfectly correct and deterministic platform can make it more or less likely for application developers to accidentally write such bugs. To the extent that the platform makes it more likely, I say the platform has a hazard that users will create such bugs.

> Defined and consistent misbehavior is unequivocally better than undefined misbehavior.

Totally agree. Undefined behavior is the worst. Taken literally, it includes destroying the universe.




--
  Cheers,
  --MarkM

Dale Schumacher

unread,
Aug 29, 2025, 11:49:55 AM (11 days ago) Aug 29
to fr...@googlegroups.com
Some of the problems cited seem to come down to "concurrency is hard" and contemporary programmers don't know how to deal with it properly. I'm not going to claim that it's easy. In fact, it may require a dreaded "paradigm shift" to escape the single-threaded sequential-imperative mindset. I think the most important shift is the elimination of shared mutable state.

Perhaps this points to the need for another book (or several) to discuss ways to mitigate these issues. I've been collecting "actor patterns" for over 15 years now. Producer/consumer rate mismatches have several potential mitigations that can be easily expressed with actors. Mechanisms like back-pressure in bounded buffers and policies for shedding excess work are inspired by machinery that interacts with the physical world. TANSTAAFL. Our systems are finite, so limits will always exist.
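To give the flavor of two of those mitigations in conventional code - a sketch in Go rather than uFork, with made-up names - back-pressure slows the producer to the consumer's rate, while shedding drops work when the bounded buffer is full:

package main

import (
	"fmt"
	"time"
)

func submitWithBackpressure(work chan<- int, job int) {
	work <- job // blocks, slowing the producer to the consumer's rate
}

func submitOrShed(work chan<- int, job int) bool {
	select {
	case work <- job:
		return true
	default:
		return false // buffer full: shed the job (count it, log it, ...)
	}
}

func main() {
	work := make(chan int, 4) // the bounded buffer

	// A deliberately slow consumer.
	go func() {
		for job := range work {
			time.Sleep(5 * time.Millisecond)
			fmt.Println("finished job", job)
		}
	}()

	shed := 0
	for job := 0; job < 20; job++ {
		if !submitOrShed(work, job) {
			shed++
		}
	}
	fmt.Println("shed", shed, "jobs under load")

	submitWithBackpressure(work, 99) // this one waits for room instead
	close(work)
	time.Sleep(100 * time.Millisecond) // crude drain for the demo
}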

The uFork processor architecture attempts to stake out a relatively unexplored area of the design space. By implementing the "classical" actor model at the lowest possible level, uFork provides an intrinsically memory-safe machine language. Instruction-level interleaving (aka "context switching") means that all activities make progress concurrently. Private mutable state (inside actors) and shared immutable data (in events) prevent entire categories of potential errors. Sponsors provide processor-enforced quotas on memory, processing, and communication. These can be used to manage and mitigate resource-exhaustion attacks (or accidents). Since uFork is an actor processor, and actor addresses are ocaps, all our capability-security patterns apply directly.

It has been suggested to me that perhaps a good place to begin applying this model is at the "edge" of the network. A place where there is less legacy pressure and where there are significantly more resource constraints. The simplicity of the uFork processor allows it to operate within extremely small memory and power budgets. Actor programming could help bring asynchronous thinking and capability discipline to the masses.


Marc Stiegler

unread,
Sep 2, 2025, 1:06:20 PM (7 days ago) Sep 2
to fr...@googlegroups.com
Since Jonathan asked for help, when I finally had the time this morning I read all the material to see if I could contribute anything. Whoa, what a thread.

Since you could write everything I know about the KeyKos OS family on a postcard, the closest to help I can offer is a meta-look at this discussion. Here are points I think should be reinforced:

Kris Kowal is right to focus on a smaller endeavor, perhaps the one he describes, perhaps another. But keep it small.

None of us ever remember strongly enough markm's comment, "Perfect security tech only makes the world safer if it gets adopted." I am as guilty as anyone of not beating this into my thick head hard enough. Jonathan implicitly acknowledges that L4 is more successful than the KeyKos family. My suggestion: Examine the question, "Why is L4 only successful as a hypervisor?", and if you can identify reasons it only works for that niche, ask the question, "how can we enhance it to fit more niches?" Traditionally, in a field with entrenched winners, you succeed by starting with a tiny niche and expanding with new features into more niches. Unix derivatives couldn't blow Windows off the desktop. But they could conquer the tiny early smartphone niche, which then grew till it was larger than the desktop market.

Chip correctly observes that NodeJS is extremely successful despite its memory model, for a host of applications. These apps are not the place to start an ocap OS revolution. Jonathan observes (well, I'm going to twist his observation and put some words in his mouth here) that IoT is a fantastic new application space that has a huge number of characteristics that ought to favor KeyKos-style operating systems. Focusing on this niche, which will someday be larger than smartphones, seems like a terrific opportunity. (and it meets Dale's suggestion to start with "the edge of the network, with less legacy pressure").

Combining this observation with Kris's observation, and combining them with the proposal to begin with something that already has some success, a more strategic possibility arises: one could engage in a small, powerful project to document a plan for transforming L4 into the best-in-class OS for the IoT world.

I now end with an even bigger meta-point for you all to consider before rejecting it. There is an even bigger reason to keep projects small at this time: We may be in the middle of a whopper of a transition.

I have been working with AIs for over a year on a nearly daily basis, using them as assistants for writing novels. The AIs suck. They save me no time, and the quality of my output is different but not really better. Why do I still use them? Because it is clear that this is the way high quality books will be written in the future. So I suffer now in preparation for the day this changes. And it is changing. Tiny increment by tiny increment, you can see that it is better now than it was a year ago.

As nearly as I can tell, AI assisted software engineering is in the same boat. You couldn't use it to write a piece of a secure OS today. But it is getting incrementally better. A big project started now could easily be overrun by future tech. We may be in the humorous position of asking the classic question, "You want to get a ship to Alpha Centauri ASAP. The technology is advancing so the speed of a spaceship is doubling every two years. When should you build and launch a ship?" The answer is not "today".

Keep it small.

--marcs


Jonathan S. Shapiro

unread,
Sep 3, 2025, 1:41:23 AM (6 days ago) Sep 3
to fr...@googlegroups.com
Marc:

Thanks for your thoughts. I agree with everything you say in spirit. I'm less certain that some of the assumptions hold up, but they are definitely worth considering carefully.

On Tue, Sep 2, 2025 at 10:06 AM Marc Stiegler <ma...@skyhunter.com> wrote:
Kris Kowal is right to focus on a smaller endeavor, perhaps the one he describes, perhaps another. But keep it small.

I think this depends somewhat on the objectives, but I agree that (a) one shouldn't do more work than is actually necessary, and (b) the tighter and more focused you can keep it, the more likely success will be.
 
None of us ever remember strongly enough markm's comment, "Perfect security tech only makes the world safer if it gets adopted."
 
Jonathan implicitly acknowledges that L4 is more successful than the KeyKos family.

It's hard to beat 1B+ deployments, but I think this may be an apples and oranges thing. L4 serves its problem domain very, very well. Its main use in cell phones is to protect the secure boot enclave. Coyotos could do that just as well, but there's no point going after a space that is already well served and well established. That said, the secure boot enclave isn't exactly a general purpose use. I think it's relevant that the "killer apps" for L4 are mainly built around things that (a) can be [mostly] statically preconfigured, (b) do not need to pass capabilities between subsystems, and (c) have embarrassingly simple trust relationships. This pattern does not generalize to general purpose subsystems, or even to IoT applications that have non-trivial security or fault isolation concerns. I've been playing around with the HomeKit and Matter families of home automation products recently. Matter is complex enough that I would not want to implement it as a monolith, and implementations need some interesting information sharing.
 
My suggestion: Examine the question, "Why is L4 only successful as a hypervisor?", and if you can identify reasons it only works for that niche, ask the question, "how can we enhance it to fit more niches?"

It's a reasonable proposition. Short answer: L4 is good as a microvisor or enclave mainly because it has excellent mechanisms for low-level mapping and IPC. What it does not have is the concept that the users should be able to introduce new objects, that capabilities should be proxyable, or that resource pools should be first class (for example, there's nothing analogous to a space bank). Mutual isolation of storage allocation - and more important storage teardown - has repeatedly proven essential for isolated subsystems in KeyKOS, EROS, and Coyotos. L4 also has nothing analogous to a constructor - there's no straightforward way to build a generic instantiation mechanism for new subsystems that I can see, and no way to certify that the resulting subsystems are confined.

I hasten to add that I'm not as up to date on L4 as I should be, and I really need to go look at where things are at today.

Could L4 be enhanced to have these? Yes. In fact, I argued strongly for doing so at the summit meeting where L4 adopted capabilities. The L4 folks who were there didn't see why this was important, or perhaps I was not clear enough or persuasive enough. While it was early on, the team in Australia was already focused on the microvisor/enclave problem space, and they may (correctly) have viewed this type of system re-architecture as a diversion. The problem with these enhancements now is that they would require major changes, which would entail re-doing all of the verification work that underpins their current customer contracts, and the financial risk for them would be quite high. For all of these reasons, I think the work would be unlikely to be adopted back into the L4 base.

To the extent that I have that analysis right, we're looking at a new system without a user base anyway, and if that is the case, I think we're better off with what we know works.
 
Traditionally, in a field with entrenched winners, you succeed by starting with a tiny niche and expanding with new features into more niches.

Yes. This is Clayton Christensen's argument. But it doesn't apply in this comparison, because the entrenched thing is in a completely separate solution space and the new thing would not be looking to displace it.
 
Unix derivatives couldn't blow Windows off the desktop. But they could conquer the tiny early smartphone niche...

UNIX derivatives currently hold more than 20% of global desktop market share, and more than 50% of server market share. I'd be a little cautious with that line of argument. There's a case to be made that Apple's recent enhancements to SwiftUI are at least partly about defeating platform-neutral client code on handheld and tablet devices.
 
Jonathan observes (well, I'm going to twist his observation and put some words in his mouth here) that IoT is a fantastic new application space that has a huge number of characteristics that ought to favor KeyKos-style operating systems.

Close enough. Primarily the fact that the absence of users on these devices means that capability-based authorization is adequate. Which, conversely, means we're no closer to a desktop solution than we were 30 years ago.

Also, the IoT and robotics spaces are badly fragmented and the implementations are operationally dicey. Both spaces are ripe to be overthrown for that reason as well. Might as well overthrow them with something vaguely defensible.
 
... and it meets Dale's suggestion to start with "the edge of the network, with less legacy pressure"

I had missed that comment, so thanks for drawing me back to it. It's a good point.
 
Combining this observation with Kris's observation, and combining them with the proposal to begin with something that already has some success, a more strategic possibility arises: one could engage in a small, powerful project to document a plan for transforming L4 into the best-in-class OS for the IoT world.

If the assumptions about L4 relevance and feasibility were well-founded, I'd agree. Unfortunately I don't think the path described would be viable in the market.

I now end with an even bigger meta-point for you all to consider before rejecting it. There is an even bigger reason to keep projects small at this time: We may be in the middle of a whopper of a transition.

I've also been using them, and Darcy has become quite facile with them. I'll make three comments, the first factual and then two opinions:
  1. If you aren't using the paid versions, you don't really have a handle on what these systems can actually do. The difference in the level of capability exposed is staggering.
  2. The chat interfaces, interesting though they certainly are, don't give you enough control of the internal process to manage it successfully - there's a list of issues they don't adequately address. You definitely can get a fair bit done with the chat interfaces, but it's a night and day difference, both on the capabilities and the price tag.
  3. We're in the "General Magic" stage of AI. I believe there will be at least one, and probably two, additional technology cycles before this stuff is anywhere close to living up to current expectations.
I don't say any of this in any grouchy sense. I'm using the tools and I'm quite impressed with their potential. I'm mainly saying that they are early stage.

I'd also note that their force multiplier is inversely related to the expertise of the programmer. There's some rote stuff they help on, but if you actually know what you're trying to accomplish the added benefit is small, and smaller when the need to guard against them f*cking everything over or hallucinating or introducing bugs is taken into account. It's very easy to spend more time keeping the LLMs honest than you get back in productivity - something that a lot of companies have started to notice.

At this time, I don't think there's any benefit to be had in low-level systems work, both because it's high-expertise work to begin with and because the sample set that the LLM model building process has to work with is so limited.

I have been working with AIs for over a year on a nearly daily basis, using them as assistants for writing novels. The AIs suck. They save me no time, and the quality of my output is different but not really better.

That's a pretty fair synopsis of my experience as well. Prompt engineering is critical and difficult, and it only takes you so far. For tasks like software construction, you end up having to gather a farm of LLMs, one doing the main work and the other three or four to keep the first one honest.

A big project started now could easily be overrun by future tech.

Yes and no. Right now, there is dramatically more benefit for new start projects than for use on old work. The Anthropic family is making steady progress on this, and others will follow. By the end of next year, I think it will be pretty easy to be enhancing an established well structured project using AI. So I'm not sure the concern about being overrun is as big as people have been assuming. In consequence, I don't buy in to "keep it small" because of AIs. I do buy in to keeping it structured and modular - which many people don't do but is hardly new.

The more interesting issue is that the cost of new starts has been so dramatically reduced, and their rate of development progress so improved, that we may be returning to the wild west of software competition for a while. That could be interesting.

Finally, I'd note that the nature of the input that goes into the model building necessarily imposes an intrinsic bias for mediocrity and another intrinsic bias for "more of the same". Which is to say that one of the "honesty keeper" tools you want in a real working setup is a linter and style checker. But also: LLMs aren't nearly as good on things for which a large sample set isn't available. Which unfortunately includes capability-based application code...


Jonathan

Pierre Thierry

unread,
Sep 4, 2025, 11:28:41 AM (5 days ago) Sep 4
to fr...@googlegroups.com
Le 29/08/2025 à 17:49, Dale Schumacher a écrit :
Some of the problems cited seem to come down to "concurrency is hard" and contemporary programmers don't know how to deal with it properly. I'm not going to claim that it's easy. In fact, it may require a dreaded "paradigm shift" to escape the single-threaded sequential-imperative mindset. I think the most important shift is the elimination of shared mutable state.

One of these possible shifts that has yielded exceptional results but little adoption is Software Transactional Memory. In languages like Haskell, Purescript or Unison, it's both fast and safe, because the type system prevents transaction code from containing side effects. I've heard it claimed that the possibility of side effects in STM code in Scala is usually not a problem in practice.

But I think that when such paradigm shifts are necessary for a technique to work, or for the technique to perform well (whether in speed or safety), you either enforce it strongly or wide adoption means wide misuse. React seems a good example. Most programmers don't understand that they would create safer and possibly faster applications if they didn't see the requirements around side-effects and immutability as hurdles to bypass.

I suspect that the lesson to learn from that is that if we want people to use ocaps in a way that makes applications safer, they need to be presented with a great DX.

Curiously,
Pierre Thierry
--
pie...@nothos.net
0xD9D50D8A

Pierre Thierry

unread,
Sep 4, 2025, 11:38:05 AM (5 days ago) Sep 4
to fr...@googlegroups.com
Le 28/08/2025 à 18:48, Jonathan S. Shapiro a écrit :
I don't say this as an argument against GC. I do offer it as one of the essential arguments for why GC doesn't scale. Those same programs written in Rust run in heaps that are 4x-10x smaller and do not require memory walks in unpredictable order. Think about the financial implications at the scale of a Google or Amazon data center.
I wonder if this is not an essential limit of GC but an accidental limit, caused by most GCs being pretty rudimentary. There has been a lot of work on GC to make it collect concurrently or be suitable for soft real-time. Some work has been done to drastically reduce its latency. I think it was Standard Chartered Bank that financed the work to bring a low-latency GC to Haskell. While that effort built on relatively recent scientific results about GC, many language runtimes seem to lag 30 to 40 years behind the latest results in what they implement.

Raoul Duke

unread,
Sep 4, 2025, 1:01:30 PM (5 days ago) Sep 4
to fr...@googlegroups.com
Never used STM much myself, but I am under the impression it is a leaky abstraction, including with respect to performance, due at least to the hidden retries?

Jonathan S. Shapiro

unread,
Sep 4, 2025, 2:25:17 PM (5 days ago) Sep 4
to fr...@googlegroups.com
On Thu, Sep 4, 2025 at 8:38 AM Pierre Thierry <pie...@nothos.net> wrote:
Le 28/08/2025 à 18:48, Jonathan S. Shapiro a écrit :
I don't say this as an argument against GC. I do offer it as one of the essential arguments for why GC doesn't scale. Those same programs written in Rust run in heaps that are 4x-10x smaller and do not require memory walks in unpredictable order. Think about the financial implications at the scale of a Google or Amazon data center.
I wonder if this is not an essential limit of GC but an accidental limit, caused by most GCs being pretty rudimentary. There has been a lot of work on GC to make it collect concurrently or be suitable for soft real-time.

A huge amount of work. A lot of it was initially driven by Cliff Click's team at Azul, and then a bunch of other work came from others. That work is how you get down to a 4x heap size. The advances on incremental concurrent GC since Azul have been pretty amazing.

But as you say, runtimes don't adopt this stuff. Go, for example, decided to do something really simple, and made a bunch of pretty ignorant arguments about why theirs was "better" (it isn't). They got laughed at and derided by just about everybody who actually knows anything about modern GC, and with reason. But the flip side of that is that the more sophisticated GCs take a lot of effort to get right, require some very delicate and machine-specific code generation, and are really effing hard to port when a new platform needs to be supported. If Go's goal was "good enough for our problem space, relatively easy to maintain, and fairly easy to port", they made the right choice.

In particular, they very clearly decided that their target space was containers running on servers, where hard real time is not a requirement and small pauses are tolerable.

The place where they missed is that a fair number of applications see intermittent large pauses, and (as is true with every GC) the causes are hard to isolate and debug.
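If you do have to chase those pauses down, the runtime at least exposes enough to timestamp them. A sketch of the kind of watchdog I mean (illustrative only; the threshold and the stand-in workload are made up):

package main

import (
	"fmt"
	"runtime"
	"time"
)

// watchGCPauses samples the runtime's GC statistics and reports any
// pause above a threshold, so the intermittent stalls can at least be
// correlated with what the application was doing at the time.
func watchGCPauses(threshold time.Duration) {
	var last uint32
	for {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		// PauseNs is a circular buffer of recent pause durations;
		// PauseEnd holds the matching end times (unix nanoseconds).
		for n := last; n < ms.NumGC; n++ {
			i := n % uint32(len(ms.PauseNs))
			pause := time.Duration(ms.PauseNs[i])
			if pause >= threshold {
				end := time.Unix(0, int64(ms.PauseEnd[i]))
				fmt.Printf("GC %d paused %v (ended %s)\n",
					n+1, pause, end.Format(time.RFC3339Nano))
			}
		}
		last = ms.NumGC
		time.Sleep(time.Second)
	}
}

func main() {
	go watchGCPauses(2 * time.Millisecond)

	// Stand-in workload: churn enough garbage to trigger collections.
	var keep [][]byte
	for {
		keep = append(keep, make([]byte, 1<<20))
		if len(keep) > 64 {
			keep[0] = nil
			keep = keep[1:]
		}
		time.Sleep(10 * time.Millisecond)
	}
}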


Jonathan 