Greetings!
I am a weirdo who has, since 2005 and without knowing Inferno OS existed, been conducting what I will summarize as a thought experiment, one which, as far as I can tell (I am just beginning to learn about Inferno), overlaps with your work at a good 70+% level. I kept my thinking private for a long time, and have recently been trying to come out of that shell.
I am reluctant to either ramble on about my ideas apropos of nothing, or blindly link others to the small blog where I have written about my project. That blog, written in ignorance of all but my own thinking, may seem a bit… “Golly gee whiz” about some things. Ideally, I would share places where I believe my thought experiment has provided insights that may be useful to you. Perhaps I will after a bit more reading, if it is welcome. I'd be happy to talk theory about a lot of different parts of the system.
That said… I mean, it's not as though Inferno is in active development, and I understand that. Even if I offered suggestions… abstract ideas aren't helpful without the work to back them up. And I… I don't think I'll be able to do that work myself, especially not alone. I have a Bachelor's in CS, but… my programming skills are not impressive, and I've never done kernel work. And I may be losing my mind, a bit.
That having been said, I have otherwise been unable to find like-minded people to talk to, and I hope some people here will be willing to indulge me. I say with some conviction that a distributed operating system like Inferno could become the future of computing, if done correctly.
Of course, I literally call my project MAD, so take that with an appropriately sized grain of salt.
- Vincent
So here's an example of what I'm talking about, of the kind of theoretical nonsense I spend entirely too much time thinking about:
Remote procedure calls.
Fundamentally, in any distributed system we will very frequently be doing things that fall into the category of remote procedure calls, so it's important that those mechanisms be well thought through. If people can't have confidence in your RPC system, your distributed system is on shaky ground. I haven't dived into its details yet, but at first glance Styx is not a confidence-inspiring mechanism.
Now, that's unfair, but only kind of. Styx is either a peer of, or a step ahead of, the Internet Protocol as an RPC mechanism. (See also this blog post deriding IP as an RPC mechanism.) The weak part of both is that they are fundamentally application entry points that accept strings (or data streams) as arguments, which is how applications have been called since applications were first developed. We have grown used to the fact that that's how it works. I completely understand using a mature metaphor to minimize the amount of new technology being introduced or developed, and applying it to files to create multiple entry points for controlling a remote application isn't a bad addition. (In fact, reusing the filesystem metaphor to map a distributed system, and adding more entry points to a program, are both points on which we agree.) As a procedure call mechanism, though, it's… limited.
Now, my proposed solution is probably on the entire other side of the map, too complicated instead of too simple, requiring dev work I don't want to do… so instead of talking about that, let's just talk about the high-level theory.
My guiding philosophy is that an operating system should not force programmers to implement or reimplement core systems, even if it permits it; wheel-reinvention should be kept to a minimum, a hobby rather than a job. In that context, typeless data streams are… basically the worst. They oblige programmers either to invent a complicated mechanism for handling the various valid and invalid data coming through the stream, or to naively hope and pray that what comes in follows the plan. What do you do when you start getting bad data in a stream? Kill it and start over? What if it's in the middle of a complex client-server interaction? Networking stacks have gotten good at minimizing events where the network itself corrupts data channels, but there are other potential sources of errors, including active malice. Programmers are still required to invent ways to detect and handle these errors, or to pull in a third-party library that does the same.
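To make that burden concrete, here is a toy Python sketch (the function and its conventions are mine, not anything from Inferno) of what a receiver of an untyped byte stream ends up hand-rolling:

```python
import struct

def read_temperature(stream: bytes) -> float:
    """Ad-hoc validation a programmer must hand-roll for an untyped stream.
    Assumes, by out-of-band convention only, that a 4-byte big-endian
    float is what arrives next."""
    if len(stream) < 4:
        raise ValueError("short read: is the sender done, or just slow?")
    (value,) = struct.unpack(">f", stream[:4])
    # Nothing on the channel says these bytes were ever a float;
    # all we can do is sanity-check the decoded value by hand.
    if not (-100.0 <= value <= 100.0):
        raise ValueError(f"implausible temperature {value}: garbage or malice?")
    return value

print(read_temperature(struct.pack(">f", 21.5)))  # → 21.5
```

Every check in that function is invented, per-application policy; the stream itself offers no help, which is exactly the wheel-reinvention being objected to.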
Message-based data streams are a bit better, in that there are defined intervals at which the sender's and receiver's expectations are reset to a normalized state. Instead of potentially seeing an unending bitstream through a channel, due to a malfunctioning or malicious sender, you can only see a potentially unending stream of individual packets, and those bad packets can be mixed in with legitimate traffic that may still be handled correctly. Common message streams, however, still carry typeless data or text, which brings us to our next problem, or rather, my next bit of guiding philosophy:
If a feature is common in programming languages, that feature should probably exist for RPCs, data channels, application entry points, and any middleware that ties them together (such as the OS shell). Most directly, I'm looking at type checking (in particular for advanced objects, but also the basics), though you could argue that preconditions and postconditions, or something similar, fit the same bill. On the one hand, formalized type checking and/or custom condition checking at data ingress means the programmer doesn't need to reinvent that particular wheel. This is helpful, among other reasons, because programmers have proven over and over that they will do a terrible job reinventing it. We have all half-assed an “Is this what I think it is” check, or two, or ten, or many, many more. When we aren't half-assing those checks, we're overdoing them, because we don't trust any part of the code we aren't looking at right now. So who knows, maybe something weird slipped through out of nowhere, in the absolute middle of a workflow that honestly should be bulletproof. So why do these errors keep coming up?
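As a sketch of what formalized condition checking at ingress could look like, here is a hypothetical Python decorator (my own construction, not an existing OS facility) that runs declared preconditions before the function body ever sees the arguments:

```python
from functools import wraps

def requires(*conditions):
    """Hypothetical precondition mechanism: evaluate each predicate
    against the incoming arguments, and reject the call before the
    function body runs if any predicate fails."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for check in conditions:
                if not check(*args, **kwargs):
                    raise ValueError(f"{fn.__name__}: precondition failed")
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@requires(lambda a, b: b != 0)
def divide(a: float, b: float) -> float:
    return a / b

print(divide(10, 4))  # → 2.5
```

The point is where the check lives: declared once at the boundary, rather than half-assed (or over-assed) at every call site.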
On the other hand, formalized typing for entry points and remote procedures is, if not self-documenting, a world ahead of “shout strings down a pipe” syntax. In that context, I dislike Inferno/Styx's bare Unix file-access methodology. The fact that control files are named is helpful, but that's essentially sanity-checking only the first argument to a function, the one that selects which internal function the rest of your arguments go to. Without access to the documentation, the files that act as data sources and control functions can seem just as arbitrary and opaque as any other undocumented mass of files. It is therefore still incumbent upon the programmer to offer feedback (of the --help type) so that someone exploring or debugging the system doesn't have to constantly refer back to a separate window to look up the interface details. (This is still the case with formalized types, but the fact of type checking means that type information is stored in the executable, so some machine-generated documentation is plausible; likewise, if you had explicit pre/postcondition blocks, you might machine-generate some basic documentation from them, or just output the condition block's source for the user to peruse.)
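Since typed entry points carry their signatures with them, a shell could derive basic --help output mechanically. A Python sketch of that idea, using the standard inspect module (the example function is invented for illustration):

```python
import inspect

def set_speed(device: str, rpm: int) -> bool:
    """Set a fan's rotation speed."""
    return True

def auto_help(fn) -> str:
    """Derive a --help style summary purely from type annotations
    and the docstring, with no hand-written interface docs."""
    sig = inspect.signature(fn)
    params = ", ".join(
        f"{name}: {p.annotation.__name__}"
        for name, p in sig.parameters.items()
    )
    doc = inspect.getdoc(fn) or ""
    return f"{fn.__name__}({params}) -> {sig.return_annotation.__name__}\n  {doc}"

print(auto_help(set_speed))
# set_speed(device: str, rpm: int) -> bool
#   Set a fan's rotation speed.
```

A named control file gives you none of this; the argument types live only in a man page somewhere else.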
All of that said, the idea of type checking in a distributed system creates loads of headaches, immediately and persistently. It requires that the system and its core tools (shell, etc.) be aware of types and typed objects. You need something more like a Python or JavaScript REPL for your shell, one capable of implicitly and explicitly capturing objects and passing them as arguments to functions. Any type that gets passed into or out of a public function needs to be universal (e.g. JSON/pythonic data objects) and/or public. Implementing all that would not be fast, easy, or cheap. On top of that, programs need to be working from compatible implementations of standard data types, or else your public type infrastructure breaks down immediately.
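A minimal sketch of what server-side ingress checking against universal (JSON-ish) types might look like; the export registry and wire format here are my inventions, purely for illustration:

```python
import json

# Hypothetical registry mapping exported entry points to typed signatures.
EXPORTS = {
    "add": {"params": {"a": float, "b": float}, "fn": lambda a, b: a + b},
}

def dispatch(wire_msg: str):
    """Decode a JSON-encoded call, type-check every argument at the
    ingress boundary, and only then invoke the target function."""
    msg = json.loads(wire_msg)
    entry = EXPORTS[msg["call"]]
    for name, expected in entry["params"].items():
        value = msg["args"][name]
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, "
                            f"got {type(value).__name__}")
    return entry["fn"](**msg["args"])

print(dispatch('{"call": "add", "args": {"a": 1.5, "b": 2.0}}'))  # → 3.5
```

Even this toy version shows the cost: both sides must agree on what "float" means on the wire, which is exactly why compatible standard-type implementations matter.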
In that context, my theoretical Project MAD has a global type directory, where you ask for an API and get an implementation object that can be dynamically linked into your executable. But… that's a complex topic for another time, or you could read this. Either way, this seems like a suitable stopping point for this topic.
In short: What you really want from a distributed system is for a programmer to treat a function on a remote machine as though it were part of their own application. That means passing arguments back and forth through a system as powerful as a full programming language, complete with type checking and similar features.
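The classic way to get that illusion is a client-side stub: a proxy that type-checks and serializes the call, ships it off, and hands back the reply. A toy loopback sketch (the transport here is just a function call standing in for a network channel; all names are hypothetical):

```python
import json

def remote_add(a: float, b: float) -> float:
    """Stands in for a function living on another machine."""
    return a + b

class Proxy:
    """Makes a remote function callable like a local one. The transport
    here is a plain function call; a real system would ship the
    serialized payload over a network channel instead."""
    def __init__(self, fn, param_types):
        self.fn, self.param_types = fn, param_types

    def __call__(self, *args):
        for value, expected in zip(args, self.param_types):
            if not isinstance(value, expected):
                raise TypeError(f"expected {expected.__name__}")
        payload = json.dumps(args)               # serialize, as if for the wire
        return self.fn(*json.loads(payload))     # "remote" invocation

add = Proxy(remote_add, (float, float))
print(add(1.5, 2.25))  # → 3.75
```

To the caller, `add` looks like any local function; all the RPC machinery, including the type checking, hides behind the call syntax.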
My argument that Inferno/Styx doesn't go far enough is not about Inferno at all. IP, sockets, Unix IPC, and the like are all doing what I will characterize as “the wrong thing”. Granted, doing what I consider “the right thing” would introduce a generational change in how programs operate, and like… I'm just some guy with an idea. I'm not actually so arrogant as to think I haven't missed anything. At absolute best, it needs more thinking, and it's very likely that I'm being too naive.
But that's why I enjoy thinking about the high-level theory. I'm not setting myself or anyone else up for heartbreak by spending, or inducing the spending of, millions of dollars and thousands of man-hours on an implementation that may end up being wrong. If someone in this group, or someone discovering my blog, finds a logical hole and blows my theory out of the water, well, nothing is lost. And if someone takes some of my ideas and makes good use of them in ways I don't expect and can't take credit for… well, I'll still count that as a win. More broadly… it's just fun, holding an entire design in my head and turning it around like a puzzle to see how it works and what shapes I can bend it into.
I don't plan to keep rambling on here, absent an actual conversation. Goodness knows I have my blog for that. But I thought it'd be worth offering an example of what's in my head.
-V