Hello,
I'm implementing CapTP for the first time ever in my life, though I've
been reading about it for a long time. Mark has told me that a nice
phrase they use at Agoric is "don't squander your ignorance". To that
end, I wonder if it might be a good idea for me to work through some of
my thinking live on a thread here. If people find this too noisy,
please tell me and I'll move it off-list.
(In case it's useful and to prevent doubts of copywrongs, I waive all
copyright into the public domain under CC0 1.0. Feel free to use this
if it results in anything resembling useful documentation... though it
probably won't.)
EDIT BEFORE SENDING: this message has gotten huge and I apologize
in advance unless you think it's great in which case you're welcome.
What is CapTP?
==============
(Feel free to skip this, and maybe the next section, if you're familiar
with CapTP already yourself. Or skip everything, I'm not the boss of
you!)
I recently described CapTP and VatTP to a colleague. Here is my attempt
to mirror those definitions, in short.
- VatTP: How you get secure connections between vats (or maybe
machines; that's up for debate). But it's really more about setting
up a way to get messages securely *between* these vats/machines,
rather than what is done with those messages.
- CapTP: This is the "what happens when you get a message" protocol and
is really what allows objects to talk to each other in a fully
distributed environment. It takes care of a bunch of things,
including linking objects across the vats, informing of promise
resolution, cooperative distributed (acyclic) garbage collection, etc
etc.
There have been a bunch of "CapTP"-like things over time. They aren't
all necessarily compatible. In the meanwhile, this makes CapTP more
like an abstract pattern, like "Lisp", rather than a specific standard,
like "R5RS Scheme". Like lisps, you'll likely find many common ideas
between dialects. It may be desirable to find The Great Unification at
one point but that hasn't happened yet. Maybe soon!
How do I learn more?
====================
Here are the main resources I am using to learn about CapTP:
- The Erights CapTP pages:
http://erights.org/elib/distrib/captp/index.html
- MarkM's thesis:
http://www.erights.org/talks/thesis/
- Cap'n Proto's docs:
https://capnproto.org/rpc.html
- More importantly, rpc.capnp, which is a beautiful document:
https://github.com/capnproto/capnproto/blob/master/c++/src/capnp/rpc.capnp
- This extremely wide aspect ratio video by MarkM:
https://www.youtube.com/watch?v=YXUqfgdDbr8
- Agoric's captp.js:
https://github.com/Agoric/agoric-sdk/blob/master/packages/captp/lib/captp.js
- The following Agoric SwingSet docs:
https://github.com/Agoric/agoric-sdk/blob/master/packages/SwingSet/docs/delivery.md
https://github.com/Agoric/agoric-sdk/blob/master/packages/SwingSet/docs/networking.md
(ok that latter one is really for VatTP)
If you are somehow completely new to ocaps (ok, nobody on this list is,
you can tell I'm over-engineering this email for the future), I
personally recommend "A Security Kernel Based on the Lambda Calculus":
http://mumble.net/~jar/pubs/secureos/secureos.html
The core idea is that (object) capabilities are just object references.
There's no need to layer a complex (and more insecure) security
architecture on top of our programming languages because the way
programmers already program, if we take it seriously, already *is* our
security model: passing references around to functions (or objects)
is basically the full idea. If you don't have it in scope, you can't
use it. More advanced patterns flow from this core idea; we're not
covering them here, but the "Ode to the Granovetter Diagram" writeup
explains many of them in brief:
http://erights.org/elib/capability/ode/index.html
Core structures
===============
I'm going to borrow some slides from a talk I gave on Goblins recently:
https://gitlab.com/dustyweb/talks/-/blob/master/spritely/friam-2020/goblins-talk.org
Jump to "Goblins Architecture", though really *most* of this is common
across other ocap'y systems like E, Agoric's stuff, Cap'n Proto, blah
blah blah.
Here are the abstractions, as used in Goblins, layered. Moving from
innermost part outward:
(machine (vat (actormap {refr: (mactor object)})))
- object: Some sort of encapsulated thing that can be talked to via
call-and-return invocation and asynchronous passing of messages.
Manages its own state, though another way to say that is "decides
after handling one message/invocation how it will respond to the next
message/invocation". In fact, in Goblins this is usually just a
procedure representing the current message handler. These things
often support multiple methods.
In Goblins, objects resemble "classic actors", although that term may
be subject to bikeshedding.
- mactor (Goblins-only?): Stands for "meta-actor"; there are actually
a few core kinds of these and they wrap the object (eg this is where
promises vs non-promises are distinguished in Goblins).
- refr: Also known as "ref" elsewhere in the ocap community; this is
the reference that is used to communicate with the object. If you
have it, you can communicate, if not, you can't.
- actormap: Mapping of refrs to the objects they represent. In
Goblins, if an object specifies it would like to "become" a new
message handler, this is updated (in a transactional way).
Can be used on its own, but only for non-distributed programs.
- vat: An event loop. Wraps the actormap datastructure, handles
passing messages to it. Handles messages one "turn" at a time.
Objects may send asynchronous messages to objects in any vat they
can establish a connection to, and may also make immediate
(synchronous) calls to objects in the same vat ("near" each other).
- Machine: Some sort of abstract machine or OS process that may have
one or multiple vats in it. (Agoric does fancy things so that they
can even treat blockchains and other such things as abstract
"machines"... we're not doing anything so fancy in Goblins yet.)
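To make the layering a bit more concrete, here's a toy Python sketch of the actormap level: refrs map to mactors, which wrap the current message handler, and a handler can "become" a new one. All names here are hypothetical (this is not Goblins' actual API), and unlike Goblins the update here is not transactional.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical toy model of (actormap {refr: (mactor object)}); not Goblins' API.

@dataclass(frozen=True)
class Refr:
    """Handle used to talk to an object; if you don't have it, you can't."""
    id: int

@dataclass
class Mactor:
    """'Meta-actor' wrapping the object, i.e. its current message handler."""
    handler: Callable[..., Any]
    kind: str = "object"  # Goblins also distinguishes e.g. promises at this level

@dataclass(frozen=True)
class Become:
    """Returned by a handler to swap in a new handler and also return a value."""
    new_handler: Callable[..., Any]
    value: Any

class Actormap:
    def __init__(self) -> None:
        self._table: Dict[Refr, Mactor] = {}
        self._ids = itertools.count()

    def spawn(self, handler: Callable[..., Any]) -> Refr:
        refr = Refr(next(self._ids))
        self._table[refr] = Mactor(handler)
        return refr

    def call(self, refr: Refr, *args: Any) -> Any:
        """Synchronous call-and-return (only meaningful within one vat)."""
        result = self._table[refr].handler(*args)
        if isinstance(result, Become):
            # Simplification: Goblins performs this update transactionally.
            self._table[refr] = Mactor(result.new_handler)
            return result.value
        return result

def make_counter(n: int = 0) -> Callable[[], Become]:
    """A counter that 'becomes' a counter one higher on every call."""
    def handler() -> Become:
        return Become(make_counter(n + 1), n + 1)
    return handler

am = Actormap()
counter = am.spawn(make_counter())
first = am.call(counter)   # 1
second = am.call(counter)  # 2
```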
Zooming in on the Vat
=====================
Looking just on the vat-and-deeper levels, this looks something like the
following, borrowed and adjusted a bit from MarkM's dissertation:
.-----------------------.
|Internal Vat Schematics|
'======================='
stack heap
($) (actormap)
.-------.----------------------. -.
| | | |
| | .-. | |
| | (obj) .-. | |
| | '-' (obj) | |
| __ | '-' | |
| |__>* | .-. | |- actormap
| __ | (obj) | | territory
| |__>* | '-' | |
| __ | | |
| |__>* | | |
:-------'----------------------: -'
queue | __ __ __ | -.
(<-) | |__>* |__>* |__>* | |- event loop
'------------------------------' -' territory
The upper-right box is, abstractly, the actormap datastructure
representing references pointing to objects. If we just do synchronous
programming, we can add in the left-hand column which resembles a call
stack for call-and-return behavior. However these calls can only be
done between objects in this same actormap/vat. Adding in the bottom
row we see messages queued to be handled. Each message is handled,
one "turn" at a time, from this queue, kicking off a call stack (again,
the left hand column) starting with the message invoking some object in
the actormap/heap with some arguments. In general, when a "turn" is
complete, if a promise waiting to be fulfilled is attached to this
message, that promise is resolved with the return value of this first call.
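Here's a rough sketch of that turn-based loop, again in Python with made-up names: `send` only enqueues, and each turn pops one message, runs it to completion, and resolves the promise attached to it with the return value. Error handling (breaking the promise if the handler raises) is left out.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple

class Promise:
    """Toy promise: resolved at most once."""
    def __init__(self) -> None:
        self.resolved = False
        self.value: Any = None

    def resolve(self, value: Any) -> None:
        if not self.resolved:
            self.resolved, self.value = True, value

class Vat:
    """Hypothetical vat: a table of objects plus a queue of pending messages."""
    def __init__(self) -> None:
        self.objects: Dict[int, Callable[..., Any]] = {}
        self.queue: Deque[Tuple[int, Tuple[Any, ...], Promise]] = deque()

    def spawn(self, handler: Callable[..., Any]) -> int:
        ref = len(self.objects)
        self.objects[ref] = handler
        return ref

    def call(self, ref: int, *args: Any) -> Any:
        """Immediate call: only for objects in this same vat."""
        return self.objects[ref](*args)

    def send(self, ref: int, *args: Any) -> Promise:
        """Asynchronous send: enqueue now, handle in some later turn."""
        promise = Promise()
        self.queue.append((ref, args, promise))
        return promise

    def run_one_turn(self) -> bool:
        if not self.queue:
            return False
        ref, args, promise = self.queue.popleft()
        # The turn's call stack starts with this call; any nested self.call()s
        # made by the handler complete before the turn ends.
        promise.resolve(self.call(ref, *args))
        return True

    def run(self) -> None:
        while self.run_one_turn():
            pass

vat = Vat()
doubler = vat.spawn(lambda x: x * 2)
p = vat.send(doubler, 21)
resolved_before_turn = p.resolved  # False: nothing runs until a turn is taken
vat.run()                          # now p.value == 42
```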
Crossing vat and machine boundaries (hello CapTP)
=================================================
Of course, vats aren't just limited to speaking to just themselves.
We want to speak to other vats, including on other machines!
Visually, this looks something like the following (sorry, might render
better in the link to my talk above):
.----------------------------------. .----------------------.
| Machine 1 | | Machine 2 |
| ========= | | ========= |
| | | |
| .--------------. .---------. .-. .-. |
| | Vat A | | Vat B | | \______| \_ .------------. |
| | .---. | | .-. | .-| / | / | | Vat C | |
| | (Alice)----------->(Bob)----' '-' '-' | | .---. | |
| | '---' | | '-' | | | '--->(Carol) | |
| | \ | '----^----' | | | '---' | |
| | V | | | | | | |
| | .----. | | .-. .-. | .------. | |
| | (Alfred) | '-------/ |______/ |____---( Carlos ) | |
| | '----' | \ | \ | | '------' | |
| | | '-' '-' '------------' |
| '--------------' | | |
| | | |
'----------------------------------' '----------------------'
Here we see, with nested bulleted points representing "what contains
what":
- Machine 1, with a connection to Machine 2
- Vat A
- The object Alice (holding references to: Alfred, Bob)
- The object Alfred
- Vat B
- The object Bob (holding a reference to: Carol)
- Machine 2, with a connection to Machine 1
- Vat C
- The object Carol
- The object Carlos (holding a reference to: Bob)
At the boundaries of the vats are triangle-looking things. These, in
theory, represent tables of "live references".
- The top connection between the triangle-looking things represents
references Machine 1 has to objects in Machine 2.
- The top left triangle-looking thing is Machine 1's imports
(from Machine 2)
- The top right triangle-looking thing is Machine 2's exports
(to Machine 1)
- The bottom connection represents the reverse, Machine 2's references
to objects in Machine 1
- Bottom left being Machine 1 exporting to Machine 2
- Bottom right being Machine 2 importing from Machine 1
These tables are keyed by numerical indices. For example, Machine1+VatB's
reference to Carol in Machine2+VatC may look like the following:
- Machine1Imports: {<remote-carol-ref>: 3}
- Machine2Exports: {3: <carol-ref>}
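A minimal sketch of a pair of such tables, assuming one export table on the exporting side and one import table on the importing side per connection (all names hypothetical). The importer only ever holds the small integer; the actual object never leaves Machine 2.

```python
import itertools
from typing import Any, Dict

class ExportTable:
    """Hypothetical per-connection exports: small integer -> local object."""
    def __init__(self) -> None:
        self._by_index: Dict[int, Any] = {}
        self._by_obj: Dict[int, int] = {}  # id(obj) -> index, so re-exports reuse a slot
        self._next = itertools.count()

    def export(self, obj: Any) -> int:
        key = id(obj)
        if key not in self._by_obj:
            index = next(self._next)
            self._by_obj[key] = index
            self._by_index[index] = obj  # holding obj here also keeps id(obj) valid
        return self._by_obj[key]

    def lookup(self, index: int) -> Any:
        return self._by_index[index]

class RemoteRef:
    """What the importing side holds: just the connection-local index."""
    def __init__(self, index: int) -> None:
        self.index = index

class ImportTable:
    """Hypothetical per-connection imports: small integer -> local proxy."""
    def __init__(self) -> None:
        self._by_index: Dict[int, RemoteRef] = {}

    def import_(self, index: int) -> RemoteRef:
        # The same index always maps to the same proxy object.
        if index not in self._by_index:
            self._by_index[index] = RemoteRef(index)
        return self._by_index[index]

carol = object()
machine2_exports = ExportTable()
machine1_imports = ImportTable()
index = machine2_exports.export(carol)          # Machine 2 allocates a slot for Carol
remote_carol = machine1_imports.import_(index)  # Machine 1 wraps the index in a proxy
```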
(Side note, when I hear "import" and "export" and then look at the
arrows, I get confused, because I think of "import", arrow-wise, "being
shipped to" the side that is importing, such as packages "ship to" an
importer in a trading situation and "ship from" an exporter. The arrows
are being tricky though, because we're importing a reference "to" an
object that never leaves its location.)
But this either isn't a complete picture, or doesn't represent other
CapTP implementations (remember, Goblins hasn't fully implemented it
yet), so caveats from "really existing" CapTP systems:
- Traditionally, imports/exports have been on the vat level, rather
than on the machine level
- Also usually there are two other tables in addition to
imports/exports: questions/answers. These correspond to "future
resolutions to promises" (as well as some "promise pipelining" stuff
but we'll talk about that later). A way to think of this is: if Bob
sends a message to Carol using the <- operator in Goblins, Bob should
get back a response that will eventually be resolved with Carol's
response... but when we cross the network divide and Machine2+VatC
gets that message saying "hey, call Carol... and when you're done,
fulfill this thing", we need some way to refer to that.
*SCREECH!* Let's slam the brakes on that last statement for a second.
Because there could be another solution (ignoring promise pipelining for
a second) that gets rid of questions/answers: Machine1+VatB could set up
an export for Machine2+VatC that refers to the promise-resolver that
will fulfill Bob's promise. We could just say in the message to Carol:
"and once you have an answer, fulfill the promise with
<this-resolver-i-just-exported-for-you>"!
Well that seems to solve that just fine and dandy so why bloat our
protocol with these extra two question/answer tables? Shouldn't
import/export be fine?
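Here's a rough sketch of that alternative, with no questions/answers tables at all: Machine1+VatB exports a resolver for Bob's promise and names its slot in the message, and Machine2+VatC replies by messaging that export. The message format and names are made up for illustration, and a plain list stands in for Bob's promise.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional

class Side:
    """One end of a hypothetical connection that only has an exports table."""
    def __init__(self) -> None:
        self.exports: Dict[int, Callable[..., Any]] = {}
        self._ids = itertools.count()
        self.outbox: List[dict] = []

    def export(self, obj: Callable[..., Any]) -> int:
        index = next(self._ids)
        self.exports[index] = obj
        return index

    def receive(self, msg: dict) -> None:
        result = self.exports[msg["target"]](*msg["args"])
        resolver: Optional[int] = msg.get("resolve_me")
        if resolver is not None:
            # Reply by messaging the peer's exported resolver.
            self.outbox.append({"target": resolver, "args": (result,)})

machine1, machine2 = Side(), Side()
carol_index = machine2.export(lambda name: f"hi {name}, love Carol")

bob_answer: List[Any] = []                           # stands in for Bob's promise
resolver_index = machine1.export(bob_answer.append)  # its resolver, exported to Machine 2

machine2.receive({"target": carol_index, "args": ("Bob",), "resolve_me": resolver_index})
for reply in machine2.outbox:  # "over the network" back to Machine 1
    machine1.receive(reply)
# bob_answer is now ["hi Bob, love Carol"]
```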
The desiderata of promise pipelining
====================================
Well, we said we'd get back to "promise pipelining" at some point, so
here we go.
We could send a message to the remote car factory and say "make me a
car!" and hold onto that promise. We could wait for it to resolve to
get a reference to that car, but *only then* would we be able to tell it
to drive.
So this is:
.-- Ask car factory for car
|
| .--- Made the car, sending back the reference
| |
| | .--- Got the reference, tell the car I want to turn it on!
| | |
| | | .--- Turned on the car, telling you it makes a
| | | | "vroom vroom" noise
| | | |
| | | | .--- Finally I have heard my car go "vroom vroom"
| | | | |
V V V V V
B => A => B => A => B
That's a lot of round trips when we knew that we wanted to drive the car
immediately. What if we could instead say, "make me a car, and then
I want to turn it on immediately!" That would instead look like:
.--- Ask car factory for car, and once you have that car, turn it on
|
| .--- Okay I made the car.
| | Okay, now I will turn on that car, telling you it makes a
| | "vroom vroom" noise
| |
| | .--- Now I get to hear my car go "vroom vroom" already!
| | |
V V V
B => A => B
This is nice for a few reasons: we can start talking about what we'd
like to do immediately instead of waiting for it to be a possibility.
But most importantly, it reduces the round trips of our system, which
are often the most expensive part in a networked environment:
"Machines grow faster and memories grow larger.
But the speed of light is constant and New York is not getting any
closer to Tokyo."
-- MarkM in Robust Composition: Towards a Unified Approach
to Access Control and Concurrency Control
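A toy simulation of the car example, counting round trips (names hypothetical; local objects stand in for remote references). The pipelined version addresses its second message to the not-yet-known result of the first by an answer id, which the receiving vat looks up locally; this is roughly the role the questions/answers tables play in E-style CapTP.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

class Car:
    def start(self) -> str:
        return "vroom vroom"

class CarFactory:
    def make_car(self) -> Car:
        return Car()

@dataclass(frozen=True)
class Answer:
    """Refers to the result of an earlier message (by answer id)."""
    id: int

class RemoteVat:
    """Hypothetical remote vat; each deliver() call is one network round trip."""
    def __init__(self) -> None:
        self.round_trips = 0
        self.answers: Dict[int, Any] = {}

    def deliver(self, batch: List[Tuple[int, Any, str]]) -> Dict[int, Any]:
        self.round_trips += 1
        results: Dict[int, Any] = {}
        for answer_id, target, method in batch:
            if isinstance(target, Answer):
                # Pipelined: the target is the result of an earlier message,
                # which already lives in this vat. No trip back needed.
                target = self.answers[target.id]
            self.answers[answer_id] = results[answer_id] = getattr(target, method)()
        return results

factory = CarFactory()

# Without pipelining: wait for the car, then tell it to start.
plain = RemoteVat()
car = plain.deliver([(0, factory, "make_car")])[0]
noise = plain.deliver([(1, car, "start")])[1]

# With pipelining: "make me a car, and then start it", in one go.
piped = RemoteVat()
piped_noise = piped.deliver([(0, factory, "make_car"), (1, Answer(0), "start")])[1]
```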
So does that mean we need questions/answers too?
================================================
So now that brings us back to: do we need questions/answers in addition
to these imports/exports? And... actually I'm not so sure.
It strikes me that, when exporting, Machine1+VatB could say "I'm
allocating this object reference for you, and it's a resolver type".
Then when importing, Machine2+VatC could make note of that, and
when it resolves that resolver, immediately record the resolution in
its imports table. Thus when Carol has that answer ready for Bob, Vat C
already knows locally what the answer is, and can still deliver any
pipelined messages that were sent to that answer.
I still don't see the need for separate questions/answers tables.
Maybe I'm missing something. Maybe it's obvious in practice.
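To poke at the idea, here's a rough Python sketch (all names and message shapes hypothetical) of Machine2+VatC tracking resolvers it was handed: when it resolves one, it records the result locally, so a pipelined message addressed to "whatever resolver N gets resolved to" can be delivered right there, even if it arrived early. Whether this removes the need for a separate answers table, or just moves the same bookkeeping into the `resolved` dict, is the open question.

```python
from typing import Any, Dict, List, Optional, Tuple

Target = Tuple[str, int]  # ("export", n) or ("resolver", n)

class Car:
    def start(self) -> str:
        return "vroom vroom"

class CarFactory:
    def make_car(self) -> Car:
        return Car()

class VatC:
    """Hypothetical receiving side that remembers how it resolved imported resolvers."""
    def __init__(self) -> None:
        self.exports: Dict[int, Any] = {}
        self.resolved: Dict[int, Any] = {}  # peer's resolver index -> local result
        self.pending: Dict[int, List[Tuple[str, Optional[int]]]] = {}
        self.to_peer: List[Tuple[str, int, Any]] = []

    def handle(self, target: Target, method: str, resolver: Optional[int] = None) -> None:
        # Methods take no arguments in this toy, to keep it short.
        kind, index = target
        if kind == "resolver" and index not in self.resolved:
            # Pipelined message arrived before its target exists; park it.
            self.pending.setdefault(index, []).append((method, resolver))
            return
        obj = self.exports[index] if kind == "export" else self.resolved[index]
        result = getattr(obj, method)()
        if resolver is not None:
            self.resolved[resolver] = result
            # A real implementation would export `result` as a reference, not ship it.
            self.to_peer.append(("resolve", resolver, result))
            for parked_method, parked_resolver in self.pending.pop(resolver, []):
                self.handle(("resolver", resolver), parked_method, parked_resolver)

vat_c = VatC()
vat_c.exports[0] = CarFactory()
# Bob's side exported resolver 7 ("the car") and resolver 8 ("the noise").
# Here the pipelined message happens to arrive first:
vat_c.handle(("resolver", 7), "start", resolver=8)
vat_c.handle(("export", 0), "make_car", resolver=7)
```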
Cooperative acyclic garbage collection
======================================
Maybe Bob doesn't need the reference to Carol anymore, or maybe Bob
doesn't even exist anymore. If nobody in Vat B is holding onto a
reference to Carol anymore, then Vat B should let Vat C know so that Vat
C can remove Carol's entry from its exports table. That should allow
Vat C to garbage collect Carol if nobody else is holding onto a
reference either.
If Bob and Carol both hold references to each other, neither might ever
GC. Let's hope that doesn't happen!
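Here's a sketch of one way to do the "let Vat C know" part without races, in the spirit of Cap'n Proto's `Release` message (which carries a reference count): the exporter counts how many times it has sent a reference, the importer counts how many it has received, and the drop message carries the importer's count, so a copy still in flight keeps the export alive. Names are hypothetical, and this does nothing for the Bob-and-Carol cycle case.

```python
import itertools
from typing import Any, Dict, List, Tuple

class Exporter:
    """Hypothetical export side (e.g. Vat C's exports to Vat B)."""
    def __init__(self) -> None:
        self.table: Dict[int, List[Any]] = {}  # index -> [obj, sent minus dropped]
        self._by_obj: Dict[int, int] = {}      # id(obj) -> index
        self._next = itertools.count()

    def send_ref(self, obj: Any) -> int:
        """Called each time a reference to obj goes over the wire."""
        key = id(obj)
        if key not in self._by_obj:
            index = next(self._next)
            self._by_obj[key] = index
            self.table[index] = [obj, 0]
        index = self._by_obj[key]
        self.table[index][1] += 1
        return index

    def on_drop(self, index: int, count: int) -> None:
        entry = self.table[index]
        entry[1] -= count
        if entry[1] == 0:
            # No outstanding copies: forget the export so local GC may reclaim obj.
            del self._by_obj[id(entry[0])]
            del self.table[index]

class Importer:
    """Hypothetical import side (e.g. Vat B's imports from Vat C)."""
    def __init__(self) -> None:
        self.received: Dict[int, int] = {}

    def on_ref(self, index: int) -> None:
        self.received[index] = self.received.get(index, 0) + 1

    def drop(self, index: int) -> Tuple[int, int]:
        """Called once nothing local uses the proxy; returns the drop message."""
        return index, self.received.pop(index)

carol = object()
vat_c, vat_b = Exporter(), Importer()
i = vat_c.send_ref(carol)
vat_b.on_ref(i)                  # first copy arrives
vat_c.send_ref(carol)            # a second copy is still in flight...
vat_c.on_drop(*vat_b.drop(i))    # ...when Vat B drops the one it has seen
still_exported = i in vat_c.table  # True: the in-flight copy keeps Carol exported
vat_b.on_ref(i)                  # the second copy lands
vat_c.on_drop(*vat_b.drop(i))    # and is dropped too; now the export is gone
```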
3-party introductions
=====================
Once you've implemented all that, what happens if Alice wants to send a
message about Carol to Dave, but Dave is on Machine 3 in Vat D (not
pictured above)? Well, what the heck do we do now?
There's a whole thing about handoff tables. I feel like this is a
complicated subject, and one I want to write an entirely separate email
about because it still hurts my brain a little. The MarkM talk I linked
to above explains it reasonably though. So ok, we'll just say we can do
that. (There also has been something called "vines" used historically
though I've never been completely clear on what a "vine" is... maybe it
doesn't matter anymore.)
SturdyRefs, or handoff-only, or certificates?
=============================================
So far we've talked about capabilities using double-ended,
network-spanning c-lists (the import/export tables, numerically
indexed). Great... but how do you bootstrap a connection at all? Let's
say Bob in Vat B wants to talk to Carol in Vat C, but they don't have a
connection "yet". Well if Bob was *introduced* to Carol through someone
else (using those handoff tables or whatever) that seems fine. But this
is a weird bootstrapping problem. When Machine1+VatB+Bob has never
connected to *anyone* on *any other* machine, how on earth does Bob get
an entry point into the system at all?
This is one, but only one, justification for SturdyRefs. SturdyRefs are
long-lived network addresses, which we can think of like:
<object-id>@<vat-id>[.<machine-id>]
We could think that vat-id could be a public/(verification+encryption)
key fingerprint, and that immediately gives us a path to thinking about
how to send messages securely to <vat-id>. (<machine-id> is thus more
of a "hint" of how to get there... "oh yeah, I'm on this IP address or
whatevs".) <object-id> is what's called a swissNum, a sufficiently
unguessable random number / blob of randomly generated junk that we
shouldn't be able to brute force.
This is something we could put in a web hyperlink (indeed, "capability
URLs" are technically such a thing) or print on a business card as a QR
code, etc etc. Now we don't have to be born in the network to get
access to it. Once we set up a connection, we can, from there, start
setting up live references using our imports/exports tables between
vats/machines.
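A small sketch of minting and parsing the `<object-id>@<vat-id>[.<machine-id>]` form, assuming the vat-id is a SHA-256 fingerprint of the vat's public key and the swissNum is 256 bits from Python's `secrets` module. The random bytes standing in for a public key are a placeholder; a real vat would have an actual keypair (e.g. Ed25519), which the standard library doesn't provide.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SturdyRef:
    swiss_num: str                      # unguessable object id
    vat_id: str                         # fingerprint of the vat's public key
    machine_hint: Optional[str] = None  # "how to get there"; not an authority

    def __str__(self) -> str:
        base = f"{self.swiss_num}@{self.vat_id}"
        return f"{base}.{self.machine_hint}" if self.machine_hint else base

    @classmethod
    def parse(cls, text: str) -> "SturdyRef":
        swiss_num, rest = text.split("@", 1)
        # The fingerprint is hex, so it has no dots; everything after the
        # first dot is the hint (which may itself contain dots, e.g. an IP).
        vat_id, _, hint = rest.partition(".")
        return cls(swiss_num, vat_id, hint or None)

def vat_fingerprint(public_key: bytes) -> str:
    return hashlib.sha256(public_key).hexdigest()

def new_swiss_num() -> str:
    # 32 random bytes, base64url-encoded (no "@" or "." in the alphabet).
    return secrets.token_urlsafe(32)

fake_public_key = secrets.token_bytes(32)  # placeholder for a real public key
ref = SturdyRef(new_swiss_num(), vat_fingerprint(fake_public_key), "203.0.113.7")
text = str(ref)
```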
Sturdyrefs have some challenges:
- When do you make them expire / need to be renewed?
- They're easy to leak, and it can make re-establishing relationships
difficult if intrusion occurs. Not only do you give away all your
outgoing authority, this resembles a "we were broken into so now all
our users have to reset their passwords" type problem... but arguably
it's much worse because ocap systems may be constructed such that
users don't really know where all the sturdyrefs they rely on exist
in their machine.
- They don't work with systems like blockchains, which can't hold
secrets (though such systems necessarily rely on secrets held externally).
So I guess we have a couple of other options:
- Apparently Agoric's stuff is rolling out without them but I'm not
really sure how. I know "bootstrap objects" exist but I think of
those as "system-level objects" and really only there (if they need
to be at all) as a special plumbing object, and aren't sufficient if
everyone has access to them. My best guess is that what you would do
is have Carol in Vat C *anticipate* Bob in Vat B's arrival... "When
Vat B gets here, let's pre-allocate a reference to Carol for them".
Is that how it works?
- Or we could use ocap certificate chains, eg like zcap-ld or CapCert
or the Zebra Copy stuff, etc etc. This adds some structural overhead
but pleasantly removes a large portion of the leaking risks;
recovering from an intrusion may leak some private data, but you
don't need to ask users something resembling "we were broken into so
please reset your passwords" (but potentially much worse, because
you're now asking your users to debug their running object capability
systems).
Both of those are nice, but neither of them covers the "here's a
blog/social networking post where I mention something interesting" use
case. Particularly, consider if a post is encoded in some document
structure that is stored in something like tahoe-lafs (or datashards or
etc)... how do you encode it as a link in this offline-stored data?
So I'd like to get around the need for sturdyrefs, but I see use cases
for them where they're still desirable. And they're just such a dang
easy way to bootstrap connectivity in a system.
Store imports/exports tables in vats or machines?
=================================================
If multiple vats are on the same machine, who is responsible for the
imports/exports tables? Does each vat provide them? Or should it be on
the machine level?
I think there are a lot of tradeoffs here... admittedly this is one of
the things I'm struggling with most. I'll have more to say in a
forthcoming email maybe.
Store and forward vs break-on-disconnect
========================================
Assuming we have "live references" at all, we are left with some
decisions on what to do about connections. Maybe this is more of a
VatTP thing, I'm unsure, but there seem to be CapTP considerations.
Let's contrast two approaches:
- Live connections which break on disconnect: this was the E approach,
and you use a sturdyref (though maybe we could use a certificate
chain or whatever) to start the connection between vats/machines,
from which you bootstrap your access to live references. On
connection severance (for whatever reason), all live references
break and throw relevant errors.
This is an extremely sensible choice for a distributed video game
like Electric Communities, and seems extremely sensible for my own
use case likewise. If I disconnect from a real-time game, I want my
interface to reflect that.
- Store and forward networks where undelivered messages are always
"waiting in transit". This is really nice for peer to peer systems
where users may go offline a lot, and thus it's an appealing
direction to me for social networks. It could even be very appealing
for turn-based games. This is the direction Agoric is going, but
their motivation appears to be primarily "how do we collaborate with
blockchains" oriented.
I feel like there are strong desiderata for both of these cases. I
will probably start with the former but I'd like to support the latter.
Is supporting both really feasible?
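On the "is supporting both feasible?" question, here's a very rough sketch of one connection object with a policy switch (names hypothetical, the network faked with a list). In this toy version the difference is mostly what happens to queued messages and live references on disconnect; the harder parts (breaking pending promises, re-establishing via sturdyrefs, persistence) are left out.

```python
from collections import deque
from enum import Enum
from typing import Any, Deque, List

class Policy(Enum):
    BREAK_ON_DISCONNECT = "break"  # E-style live connections
    STORE_AND_FORWARD = "store"    # messages wait in transit

class BrokenReference(Exception):
    pass

class Connection:
    """Hypothetical connection between two vats/machines; `wire` fakes the network."""
    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.connected = True
        self.broken = False
        self.outbox: Deque[Any] = deque()
        self.wire: List[Any] = []

    def send(self, msg: Any) -> None:
        if self.broken:
            raise BrokenReference("live reference severed by disconnect")
        self.outbox.append(msg)
        self.flush()

    def flush(self) -> None:
        while self.connected and self.outbox:
            self.wire.append(self.outbox.popleft())

    def disconnect(self) -> None:
        self.connected = False
        if self.policy is Policy.BREAK_ON_DISCONNECT:
            # All live references over this connection break; a real system
            # would also break pending promises with an error.
            self.broken = True
            self.outbox.clear()

    def reconnect(self) -> None:
        # Only store-and-forward resumes; E-style would start a fresh
        # connection (e.g. via a sturdyref) with fresh live references.
        if self.policy is Policy.STORE_AND_FORWARD:
            self.connected = True
            self.flush()

live = Connection(Policy.BREAK_ON_DISCONNECT)
live.send("hello")
live.disconnect()
try:
    live.send("anyone there?")
    live_send_failed = False
except BrokenReference:
    live_send_failed = True

stored = Connection(Policy.STORE_AND_FORWARD)
stored.send("hello")
stored.disconnect()
stored.send("still there?")  # queued, not lost
wire_while_offline = list(stored.wire)
stored.reconnect()           # the queued message goes out now
```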
Procedures vs objects-with-methods as first class?
==================================================
Guy L. Steele nicely broke down how both objects are a poor man's
procedures, and procedures are a poor man's objects.
https://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html
This is a bit vague, because what "object" means is a bit vague:
http://www.mumble.net/~jar/articles/oo.html
So let's clarify that we're talking about "objects with methods" vs
"procedures", where, as many know, once you have one, you can build
the other out of it:
- E chose to make objects-with-methods first-and-foremost: a procedure
is just a special kind of object that merely has a .run() method.
- Goblins goes down the classic Scheme route:
https://dl.acm.org/doi/10.1145/62678.62720
Procedures are first class, and some (but not all) procedures take a
first argument, which is used for method dispatch.
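To make the two routes concrete, here's a tiny Python sketch of each. The `methods` helper and `Procedure` class are made up for illustration (Goblins has its own method-dispatch helper, which this does not reproduce): in the Scheme/Goblins style the object is a procedure whose first argument picks the method, and in the E style a procedure is an object with a `run` method.

```python
from typing import Any, Callable

def methods(**table: Callable[..., Any]) -> Callable[..., Any]:
    """Scheme/Goblins style (hypothetical helper): dispatch on the first argument."""
    def dispatch(method: str, *args: Any) -> Any:
        return table[method](*args)
    return dispatch

def make_counter() -> Callable[..., Any]:
    count = 0
    def inc(by: int = 1) -> int:
        nonlocal count
        count += by
        return count
    # `get` closes over the same `count` variable that `inc` updates.
    return methods(inc=inc, get=lambda: count)

class Procedure:
    """E style: a 'procedure' is just an object that has a run() method."""
    def __init__(self, fn: Callable[..., Any]) -> None:
        self._fn = fn

    def run(self, *args: Any) -> Any:
        return self._fn(*args)

counter = make_counter()
counter("inc")
counter("inc", 5)
add_one = Procedure(lambda x: x + 1)
```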
If we want to make Goblins and Agoric's SwingSet inter-compatible, how
do we do it? Which one "wins"?
Looking at the TC39 Eventual Send proposal suggests another path:
https://github.com/tc39/proposal-eventual-send
I personally find EventualGet to be highly disturbing, so let's ignore
it. Use getters, not attributes! :)
So, with EventualGet ignored, this proposal gives EventualApply and
EventualSend. These correspond to procedures and method invocation
respectively. (Note that I don't like calling method invocation "send";
for me, "send" is something we support in Goblins and really means
sending an asynchronous message as opposed to call-and-return
invocation. That seems more correct anyway, and "send" thus resembles
dropping something off at the post office. I have never liked "send" as
a way to refer to "invoke a method" for this reason.)
Separating these out into two different ways of *calling things* (as
opposed to two different ways of *constructing things*) is
interesting, and actually seems justifiable to me. An object can itself
provide both "a way to be invoked as a procedure" and "a way to be
invoked with methods". I could support this (and it may even simplify
some "meta methods" headaches I was considering surrounding supporting
the interfaces grant we're working on... more on that later).
It also removes the need to "make a decision" of whether or not to make
procedures and objects-with-methods two separate things or to build it
out of one thing's abstraction. As long as we support both ways of
invoking/sending, we are good.
(But seriously, fire EventualGet into the sun.)
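Here's a rough sketch of what supporting both might look like at the message level: two message kinds, and one object that can handle either. The `Apply`/`Send` classes and `deliver` function are made up for illustration and are not the proposal's API.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Apply:
    """Invoke the target as a procedure (cf. EventualApply)."""
    args: Tuple[Any, ...] = ()

@dataclass(frozen=True)
class Send:
    """Invoke a named method on the target (cf. EventualSend)."""
    method: str
    args: Tuple[Any, ...] = ()

def deliver(target: Any, msg: Any) -> Any:
    if isinstance(msg, Apply):
        return target(*msg.args)
    if isinstance(msg, Send):
        # Don't let remote callers reach private attributes; a real system
        # would likely want an explicit list of exposed methods instead.
        if msg.method.startswith("_"):
            raise AttributeError(f"no such method: {msg.method}")
        return getattr(target, msg.method)(*msg.args)
    raise TypeError(f"unknown message: {msg!r}")

class Greeter:
    """One object that supports both ways of being invoked."""
    def __call__(self, name: str) -> str:
        return f"hello, {name}"

    def shout(self, name: str) -> str:
        return f"HELLO, {name.upper()}!"

g = Greeter()
```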
Message ordering
================
I don't understand E-Order and it kind of intimidates me. I'm just
being honest. It would be great to rectify this.
I think as a first step I'm just going to do a roughly-FIFO type thing
that doesn't do too much in terms of message ordering across vat
boundaries. I know there are reasons expressed by MarkM and especially
apparently Dean, but I'm actually unsure: is the complexity worth it?
And how hard is it to do, really?
Maybe in the future I'll Get It (TM) but I'll probably start out without it.
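For what it's worth, the "roughly-FIFO" starting point can be as simple as per-connection sequence numbers: the receiver delivers messages from a given sender in the order they were sent, buffering any that arrive early. This is a sketch with hypothetical names and is deliberately not E-order; over a transport that already delivers in order (like TCP) the buffering never kicks in.

```python
from typing import Any, Dict, List, Tuple

class FifoSender:
    def __init__(self) -> None:
        self._seq = 0

    def stamp(self, msg: Any) -> Tuple[int, Any]:
        stamped = (self._seq, msg)
        self._seq += 1
        return stamped

class FifoReceiver:
    def __init__(self) -> None:
        self._next = 0
        self._early: Dict[int, Any] = {}
        self.delivered: List[Any] = []

    def receive(self, seq: int, msg: Any) -> None:
        self._early[seq] = msg
        # Deliver everything that is now contiguous with what we've delivered.
        while self._next in self._early:
            self.delivered.append(self._early.pop(self._next))
            self._next += 1

sender, receiver = FifoSender(), FifoReceiver()
a, b, c = sender.stamp("a"), sender.stamp("b"), sender.stamp("c")
for seq, msg in (c, a, b):  # the network reorders them
    receiver.receive(seq, msg)
```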
BONUS: Distributed cyclic GC
============================
Obviously I am not going to do this anytime soon but it breaks my brain
that MarkM told me something like "Original E had distributed cyclic GC
support". Apparently it is complicated, involves some hooks into the
local garbage collector, and is rarely needed and tricky enough that
Mark said he doesn't bother to ask for it anymore, but I'm just gonna
say that it both blows my mind and completely befuddles me that some
version of E ever had something like this.
What huh what huh what? And was it ever documented how it works so that
sages in the future can help us sweep out our networks from unneeded
junk? Or is it an idea lost to the sands of time?
I find it personally fairly mystifying.
BONUS: What happens when multiple vats/machines resolve the same resolver?
==========================================================================
This is kind of a side note so I left it for the end.
It strikes me that if promise-resolver pairs are first class, one could
do some really goofy things and hand the same resolver to two different
vats to resolve... first one wins. If Vat C thinks it just successfully
resolved the resolver, it can immediately move forward with pipelined
messages waiting on the result, while Vat D thinks the same, and moves
forward with messages waiting on its own conflicting result. Both of
them may think they can move forward with plans when it actually isn't
safe.
The right answer seems to be, "promise pipelining is something only
set up at the CapTP layer and isn't exposed as something first class
so users shouldn't be able to create messed up situations like this
because we never gave them first-class ability to do it... and there's
only so much you can trust stuff moving across the network layer
in opaque actor type systems anyway."
Which seems true enough for our live-actor'y things. I bet you could
create some sneaky vulnerabilities in a cross-blockchain system that
wasn't anticipating this and took advantage of promise pipelining, and
wasn't engineered to handle this scenario. Dunno.
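A tiny sketch of the "first one wins" rule (hypothetical names). Locally, a resolver can at least report whether a given call actually won; across the network, the losing vat would only learn that later, after it might already have acted on its own answer, which is the hazard described above.

```python
from typing import Any

class Resolver:
    """Settles at most once; later resolutions are ignored."""
    def __init__(self) -> None:
        self.settled = False
        self.value: Any = None

    def resolve(self, value: Any) -> bool:
        """Returns True only if this call is the one that settled it."""
        if self.settled:
            return False
        self.settled, self.value = True, value
        return True

shared = Resolver()                 # the same resolver handed to two vats
c_won = shared.resolve("blue car")  # Vat C resolves first
d_won = shared.resolve("red car")   # Vat D's resolution is silently dropped
```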
What's next, assuming I keep sending messages about these
=========================================================
Hi, do you hate this thread yet?
I've left some things I'm uncertain about above. Thoughts welcome.
I'll also share more as I implement, assuming people are open to me
continuing to do so on this list.
I think in an upcoming email I may try to break down some of the common
message types sent in CapTP... that might help me think about what *I*
should implement, too.
But I guess we'll see... what do you think? Is this, and subsequent
similar, message(s) worthwhile/welcome?
- Chris