Communicating Event Loops and transactional turns

Mark Miller

Aug 12, 2018, 12:16:27 AM
to Discussion of E and other capability languages, fr...@googlegroups.com, cap-...@googlegroups.com
[cc'ing friam and cap-talk because e-lang has been so idle lately. But e-lang is the right place for this, so further discussion on this topic should continue on e-lang.]

In E, if code in a turn went into an infinite loop, that vat was hosed. 

Separately, within our model of persistence, if a vat crashed, then it was restarted from the last checkpointed state, which was ideally the beginning of the current turn. When it restarted, it might retry the same event. If it always retries that event, and if that event always causes a crash, then the vat would also be hosed.

Previously we've always said that an ocap vat defends only integrity at object granularity, but that a vat is the minimal unit for defending availability. If Alice is to protect her availability from Bob going into an infinite loop, then Bob must be in a separate vat, and Alice and Bob may interact only asynchronously.

We are now rebuilding the Communicating Event Loops model to run on blockchains. On blockchains, all computation is resource constrained. It must be paid for in a finite amount of some unit, now universally called "gas". Thus, infinite loops are reliably turned into transaction abort. For blockchains, the bookkeeping needed to rewind to the previous turn boundary is not optional; nor is it expensive when compared to the rest of the platform. For non-blockchains where we still do this bookkeeping, we can use a non-deterministic watchdog timer instead of gas.
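A minimal Python sketch of how gas turns an infinite loop into an abort, assuming every unit of work is charged against a hypothetical `GasMeter` (the names are illustrative, not any platform's API). Because exhaustion depends only on the work done, the runaway aborts after exactly `budget` charges on every run.

```python
class OutOfGas(Exception):
    """Raised when a computation has spent its entire gas budget."""


class GasMeter:
    def __init__(self, budget: int) -> None:
        self.remaining = budget
        self.used = 0

    def charge(self, amount: int = 1) -> None:
        # Refuse the work up front if it would overdraw the budget.
        if amount > self.remaining:
            raise OutOfGas(f"needed {amount}, had {self.remaining}")
        self.remaining -= amount
        self.used += amount


def runaway(meter: GasMeter) -> None:
    # An infinite loop that pays one unit of gas per iteration.
    while True:
        meter.charge()


def bounded(meter: GasMeter) -> None:
    # A well-behaved computation that finishes within its budget.
    for _ in range(10):
        meter.charge()


def run_metered(task, budget: int) -> tuple:
    meter = GasMeter(budget)
    try:
        task(meter)
        return "committed", meter.used
    except OutOfGas:
        return "aborted", meter.used


print(run_metered(bounded, 1_000))  # ('committed', 10)
print(run_metered(runaway, 1_000))  # ('aborted', 1000)
```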

Such transactional rewind gives us new degrees of freedom. 

Imagine that Alice wishes to use Bob within the same vat, invoking Bob only asynchronously with some kind of enforced budget, to protect Alice's availability. But Alice passes some of her own objects to Bob, so that Bob can invoke Alice's objects synchronously during such a turn. The constraint is that Bob's stack frames may only appear when the frame at the bottom of the call stack is a Bob frame, and that everything that happens during that turn draws on the limited budget Alice set up. Within such a turn, Alice and Bob can call back and forth synchronously, since there's still a Bob frame at the bottom of that stack.

If that turn exhausts the budget Alice allocated for that turn, then the turn aborts and the vat goes back to the state right before Bob's turn started. 
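A sketch of the rewind itself, assuming a toy vat whose entire heap is one Python dict and a hypothetical `run_turn(turn, budget)`. Bob's turn sits at the bottom of the stack and calls one of Alice's objects synchronously; when the budget runs out, the vat restores the checkpoint taken at the turn boundary, so Alice's counter is untouched. Deep-copying the heap is only for illustration; a real vat would presumably use an undo log or copy-on-write.

```python
import copy


class OutOfGas(Exception):
    pass


class Vat:
    """Toy single-threaded vat whose heap is one dict shared by all its objects."""

    def __init__(self) -> None:
        self.heap = {"alice_counter": 0}

    def run_turn(self, turn, budget: int) -> bool:
        checkpoint = copy.deepcopy(self.heap)  # state at the turn boundary
        remaining = [budget]

        def charge(amount: int = 1) -> None:
            if amount > remaining[0]:
                raise OutOfGas()
            remaining[0] -= amount

        try:
            turn(self.heap, charge)
            return True                        # commit: keep the new state
        except OutOfGas:
            self.heap = checkpoint             # abort: rewind to turn start
            return False


def alice_increment(heap, charge) -> None:
    # One of Alice's objects, which Bob may call synchronously during his turn.
    charge()
    heap["alice_counter"] += 1


def bob_turn(heap, charge) -> None:
    # Bob is at the bottom of the stack and calls back into Alice forever.
    while True:
        alice_increment(heap, charge)


vat = Vat()
committed = vat.run_turn(bob_turn, budget=50)
print(committed, vat.heap)  # False {'alice_counter': 0}
```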

Bob's turn might also send asynchronous messages, but we adopt the Waterken invariant that no messages are ever released from uncommitted turns. If the turn aborts, then it also did not send any messages. If the turn commits, then we need to think about what budget the turns caused by those messages draw on. But let's worry about that later.
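The Waterken invariant can be sketched by buffering every message a turn sends and releasing the buffer only when the turn commits. `Vat`, `send`, and `run_turn` are hypothetical names; the sketch ignores the open question of which budget the resulting turns draw on.

```python
class OutOfGas(Exception):
    pass


class Vat:
    def __init__(self) -> None:
        self.released = []  # messages that have actually left this vat

    def run_turn(self, turn, budget: int) -> bool:
        pending = []        # messages sent during this turn, not yet visible
        remaining = [budget]

        def charge(amount: int = 1) -> None:
            if amount > remaining[0]:
                raise OutOfGas()
            remaining[0] -= amount

        def send(target: str, message: str) -> None:
            charge()
            pending.append((target, message))  # buffered, not released

        try:
            turn(send, charge)
        except OutOfGas:
            return False                       # abort: pending is discarded
        self.released.extend(pending)          # commit: release them all
        return True


def polite(send, charge) -> None:
    send("carol", "hello")
    send("dave", "hi")


def chatty_runaway(send, charge) -> None:
    while True:
        send("carol", "spam")


vat = Vat()
vat.run_turn(polite, budget=10)
vat.run_turn(chatty_runaway, budget=10)
print(vat.released)  # [('carol', 'hello'), ('dave', 'hi')]
```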

Once Bob's turn itself aborts, say, from exhausting its budget, what should happen next? I propose that Bob's turn acts as if this first invocation throws an exception instead of doing anything else. IOW, rather than starting Bob's turn, the promise for the result of the turn becomes a rejected (broken) promise, reducing this case to the normal asynchronous exception handling coordination.
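A sketch of the proposed reduction to ordinary promise rejection, using `asyncio.Future` as a stand-in for an E promise. `send_metered` is a hypothetical helper, and state rewind is left out to keep the focus on what Alice observes: a fulfilled promise for a turn that committed, a rejected one for a turn that ran out of budget.

```python
import asyncio


class OutOfGas(Exception):
    pass


def send_metered(loop, turn, budget: int) -> asyncio.Future:
    """Queue `turn` as its own metered turn and return a promise for its result."""
    promise = loop.create_future()

    def run() -> None:
        remaining = [budget]

        def charge(amount: int = 1) -> None:
            if amount > remaining[0]:
                raise OutOfGas("budget exhausted")
            remaining[0] -= amount

        try:
            value = turn(charge)
        except Exception as err:
            # (State rewind omitted.) To the caller, an aborted turn looks like
            # an invocation that threw: the promise becomes rejected (broken).
            promise.set_exception(err)
        else:
            promise.set_result(value)

    loop.call_soon(run)  # runs later, in a separate turn
    return promise


def bob_ok(charge) -> str:
    charge(5)
    return "done"


def bob_runaway(charge) -> None:
    while True:
        charge()


async def alice() -> tuple:
    loop = asyncio.get_running_loop()
    ok = await send_metered(loop, bob_ok, budget=10)
    try:
        await send_metered(loop, bob_runaway, budget=10)
        outcome = "fulfilled"
    except OutOfGas as err:
        outcome = f"broken: {err}"
    return ok, outcome


print(asyncio.run(alice()))  # ('done', 'broken: budget exhausted')
```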

With all that mechanism, we could support transactional abort of a turn for other purposes. We could provide an abort(error) operation that, in general, aborts the turn, reverts to the state before the turn starts, and immediately rejects the promise for the turn's result with that error. Alice can let Bob call her objects synchronously while still protecting her availability from Bob's profligacy.
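One possible shape for `abort(error)`, as a sketch: `abort`, `Vat.send`, and the deep-copy checkpoint are assumptions, and `asyncio.Future` stands in for an E promise. The design choice shown is that `abort` reverts everything the turn did before rejecting the promise, whereas an ordinary exception rejects it without undoing earlier mutations.

```python
import asyncio
import copy


class _TurnAborted(Exception):
    """Internal signal that unwinds the stack back to the turn boundary."""

    def __init__(self, error: Exception) -> None:
        super().__init__(error)
        self.error = error


def abort(error: Exception) -> None:
    # Hypothetical operation from the proposal: abandon the whole current turn.
    raise _TurnAborted(error)


class Vat:
    def __init__(self, loop) -> None:
        self.loop = loop
        self.heap = {"balance": 100}

    def send(self, turn) -> asyncio.Future:
        promise = self.loop.create_future()

        def run() -> None:
            checkpoint = copy.deepcopy(self.heap)
            try:
                promise.set_result(turn(self.heap))
            except _TurnAborted as aborted:
                self.heap = checkpoint                # revert to turn start
                promise.set_exception(aborted.error)  # reject with that error
            except Exception as err:
                promise.set_exception(err)            # ordinary throw: no rewind

        self.loop.call_soon(run)
        return promise


def overdraw(heap) -> int:
    heap["balance"] -= 150                       # mutate first...
    if heap["balance"] < 0:
        abort(ValueError("insufficient funds"))  # ...then abandon the turn
    return heap["balance"]


async def main() -> tuple:
    vat = Vat(asyncio.get_running_loop())
    try:
        await vat.send(overdraw)
        outcome = "fulfilled"
    except ValueError as err:
        outcome = f"rejected: {err}"
    return outcome, vat.heap["balance"]


print(asyncio.run(main()))  # ('rejected: insufficient funds', 100)
```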

Does this seem like a good idea?

--
  Cheers,
  --MarkM

Kevin Reid

Aug 12, 2018, 10:35:14 AM
to e-l...@googlegroups.com
On Sat, Aug 11, 2018 at 9:16 PM Mark Miller <eri...@gmail.com> wrote:
> Once Bob's turn itself aborts, say, from exhausting its budget, what should happen next? I propose that Bob's turn acts as if this first invocation throws an exception instead of doing anything else. IOW, rather than starting Bob's turn, the promise for the result of the turn becomes a rejected (broken) promise, reducing this case to the normal asynchronous exception handling coordination.

Given this behavior, we can say that it is not true that no messages are released from exhausted turns. Rather, exactly one message is released: the exception. This means that Alice can attack Bob in the sense of causing Bob to reveal information but not remember having done so. Even if you make the exception carry no payload, you still have a strong instance of the 'anthropic side channel': Alice can leak one bit at a time by deciding whether or not to abort, then restart Bob with different arguments.

This is not a problem if the system has no private state, as is true of a blockchain system not also using homomorphic encryption. But if you adopt the same principle in a conventional computing platform with private state, it could be quite surprising.
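Kevin's one-bit-per-probe leak can be made concrete with a small sketch (names and values are illustrative). Bob's gas cost incidentally depends on one secret bit; Alice picks a budget between the cheap and expensive costs, observes only whether each turn commits or aborts, and re-invokes Bob with a different argument each time. Because aborted turns are rewound, Bob keeps no record of having been probed.

```python
class OutOfGas(Exception):
    pass


BOB_SECRET = 0b1011_0010  # Bob's private state (illustrative value)


def bob(bit_index: int, charge) -> str:
    # Bob's cost incidentally depends on one secret bit: expensive if set.
    charge(100 if (BOB_SECRET >> bit_index) & 1 else 1)
    return "ok"


def run_turn(turn, budget: int) -> bool:
    remaining = [budget]

    def charge(amount: int) -> None:
        if amount > remaining[0]:
            raise OutOfGas()
        remaining[0] -= amount

    try:
        turn(charge)
        return True   # committed
    except OutOfGas:
        return False  # aborted and rewound: Bob retains no memory of the probe


def alice_probe(bits: int = 8, budget: int = 50) -> int:
    recovered = 0
    for i in range(bits):
        # Each probe is a fresh turn; only commit-or-abort is observed.
        if not run_turn(lambda charge: bob(i, charge), budget):
            recovered |= 1 << i
    return recovered


print(bin(alice_probe()))  # 0b10110010
```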

Mark Miller

Aug 12, 2018, 11:27:01 AM
to Ben Laurie, Ben Laurie, Discussion of E and other capability languages
Hi Ben, please subscribe at https://groups.google.com/forum/#!forum/e-lang and resend. Thanks.


On Sun, Aug 12, 2018 at 6:18 AM 'Ben Laurie' via friam <fr...@googlegroups.com> wrote:
BTW, I appear not to be able to post to e-lang.

On Sun, 12 Aug 2018 at 14:13, Ben Laurie <be...@google.com> wrote:
Adding blockchains and calling it gas doesn't really alter the fact that you're using timeouts, which is not exactly a new idea.

The argument against timeouts, AFAIK, is that it's hard to set them right. But clearly they are widely used, so I guess it's not impossible.

To answer your question, yes, the timeout alternative I mention as

> For non-blockchains where we still do this bookkeeping, we can use a non-deterministic watchdog timer instead of gas.

is indeed similar but non-deterministic. The only difference between this and gas is that gas exhausts deterministically. Outside of blockchains this determinism rarely matters. The determinism of gas does not make it much easier to think about (though it is nice that gas exhaustion is reproducible under debugging).
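The one difference named above can be illustrated by running the same runaway loop under two hypothetical limiters: a step-counting gas limit and a wall-clock watchdog based on `time.monotonic()`. Under gas the abort lands on the same step every time, which is what makes it reproducible under a debugger; under the watchdog it depends on machine speed and load.

```python
import time


class Exhausted(Exception):
    pass


def gas_check(budget: int):
    remaining = [budget]

    def check() -> None:
        # Deterministic: fires after exactly `budget` units of work.
        if remaining[0] == 0:
            raise Exhausted("out of gas")
        remaining[0] -= 1

    return check


def watchdog_check(seconds: float):
    deadline = time.monotonic() + seconds

    def check() -> None:
        # Non-deterministic: fires whenever wall-clock time runs out,
        # which depends on machine speed, load, and scheduling.
        if time.monotonic() >= deadline:
            raise Exhausted("watchdog fired")

    return check


def steps_before_abort(check) -> int:
    steps = 0
    try:
        while True:  # a runaway loop that consults its limiter every step
            check()
            steps += 1
    except Exhausted:
        return steps


print(steps_before_abort(gas_check(10_000)))     # 10000 on every run
print(steps_before_abort(watchdog_check(0.01)))  # varies from run to run
```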

How can we build highly available systems (like for example blockchains) without resource limits and visible termination of agents that have exhausted their budgets? Is there any adequate means of resource-limit-and-visible-termination that is less problematic than timeouts or gas?
 




--
  Cheers,
  --MarkM

Mark Miller

Aug 12, 2018, 12:04:09 PM
to Discussion of E and other capability languages
Any system with high availability under mutual suspicion must treat computational resources as finite and terminate runaways (Bob) that exhaust their budget. Most practical defenses of high availability will also enable Bob's counter-parties (Alice) to sense Bob's termination and somehow recover from Bob's unavailability, which introduces the leak. Previously, I had always considered the unit for such budgeting and preemptive termination to be a vat as a whole, which is very Erlang-like. Waterken masks vat crash-and-restart, so this does not leak. Waterken admits visible permanent vat death, which if done on purpose by the terminated vat (Bob) also does not inadvertently leak info to Alice. However, I'm not sure what Waterken's perspective is on Bob's genuine resource exhaustion. Is this visible to counter-parties (Alice in another vat) so they can somehow react/recover?

Are there any practical approaches to high availability under mutual suspicion that terminate runaways, enable counter-parties to somehow continue, and do not threaten confidentiality by this termination channel? (I say "practical" because there are info-flow systems that prevent leakage by this termination channel.) How do KeyKOS and/or seL4 deal with this?

The new observation is that, given the bookkeeping needed for easy rewind to the previous turn boundary, we can treat the turn as the unit of resource exhaustion, visible termination, and recovery of counter-parties. This does not threaten integrity; it allows Alice to defend her availability from a runaway Bob that she still interacts with synchronously; but it does indeed make the anthropic side channel an inter-turn intra-vat issue, where previously it was only a coarser-grain inter-vat issue.


Finally, a terminology nit: When Alice observes whether Bob is alive or not and infers Bob's secrets, that is the _termination side channel_. When Alice observes that she herself is alive and infers Bob's secrets, that is the _anthropic side channel_. In our previous architecture where only a vat as a whole is the unit of budgeting and preemptive termination, when Alice and Bob are in one vat, Alice can still sense Bob's secrets by the anthropic side channel. The proposed shift would instead enable Alice to also sense Bob's secrets by a termination side channel, which is indeed worse.



--
  Cheers,
  --MarkM

Mark Miller

Aug 12, 2018, 12:09:26 PM
to Discussion of E and other capability languages
On Sun, Aug 12, 2018 at 9:03 AM Mark Miller <eri...@gmail.com> wrote:


> The new observation is that, given the bookkeeping needed for easy rewind to the previous turn boundary, we can treat the turn as the unit of resource exhaustion, visible termination, and recovery of counter-parties. This does not threaten integrity; it allows Alice to defend her availability from a runaway Bob that she still interacts with synchronously; but it does indeed make the anthropic side channel an inter-turn intra-vat issue, where previously it was only a coarser-grain inter-vat issue.

Above, I violate exactly the terminology distinction I then try to clarify below. The side channel immediately above is a termination side channel, not an anthropic side channel.




> Finally, a terminology nit: When Alice observes whether Bob is alive or not and infers Bob's secrets, that is the _termination side channel_. When Alice observes that she herself is alive and infers Bob's secrets, that is the _anthropic side channel_. In our previous architecture where only a vat as a whole is the unit of budgeting and preemptive termination, when Alice and Bob are in one vat, Alice can still sense Bob's secrets by the anthropic side channel. The proposed shift would instead enable Alice to also sense Bob's secrets by a termination side channel, which is indeed worse.


--
  Cheers,
  --MarkM

Alan Karp

Aug 12, 2018, 3:11:58 PM
to cap-...@googlegroups.com, e-l...@googlegroups.com, <friam@googlegroups.com>
I'm a bit confused about the conditions.  You say, "everything that happens during that turn draws on the limited budget Alice set up."  How is that enforced?  Is Bob an object to which Alice passes some budget on invocation?  Is Bob an object with a budget of its own?  Does Bob have another source of budget, perhaps something left over from Carol's invocation?  

Ben Laurie's comment about timeouts is on the mark.  A timeout is a heuristic, as is the amount of budget Alice sets up.  It appears that Alice has an incentive to over-provision Bob, because she loses everything she allocated to Bob if he doesn't finish the task.  Might Bob use that extra budget to do his own work at Alice's expense?  

The problem always is setting the right value of a heuristic.  In this example, Bob might be within epsilon of having enough budget to complete the task.  Would Alice be willing to provide it?  Could Alice provide an object Bob can invoke synchronously to get a bit more budget?  If so, that might be a better mechanism.  Say that Alice gives Bob a small amount of budget to get started.  Bob can then ask for more, perhaps proving progress toward completing Alice's task.  Alice's object could then allocate a bit more budget to Bob, and so on.  To me, this approach feels like the way Digital Silk Road bootstrapped trust.
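Alan's incremental scheme might be sketched as follows, assuming a hypothetical `BudgetGrantor` that Alice hands to Bob. Bob starts with a small budget and, when it runs out, asks for a top-up; Alice grants one only if Bob reports more progress than last time and her overall cap has not been reached. Progress is self-reported here; actually proving progress is outside the scope of the sketch.

```python
class OutOfGas(Exception):
    pass


class Meter:
    def __init__(self, budget: int) -> None:
        self.remaining = budget

    def charge(self, amount: int = 1) -> None:
        if amount > self.remaining:
            raise OutOfGas()
        self.remaining -= amount


class BudgetGrantor:
    """Hypothetical object Alice gives Bob: small top-ups while Bob reports progress."""

    def __init__(self, meter: Meter, increment: int, cap: int) -> None:
        self.meter = meter
        self.increment = increment
        self.cap = cap            # most extra budget Alice will ever grant
        self.granted = 0
        self.last_progress = -1

    def request_more(self, progress: int) -> bool:
        if progress <= self.last_progress:
            return False          # no new progress reported since last grant
        if self.granted + self.increment > self.cap:
            return False          # Alice's overall limit reached
        self.last_progress = progress
        self.granted += self.increment
        self.meter.remaining += self.increment
        return True


def bob_task(meter: Meter, grantor: BudgetGrantor, work_items: int) -> int:
    done = 0
    while done < work_items:
        if meter.remaining < 1 and not grantor.request_more(done):
            raise OutOfGas()
        meter.charge()
        done += 1
    return done


def attempt(work_items: int, start: int = 10, increment: int = 10, cap: int = 100):
    meter = Meter(start)
    try:
        return bob_task(meter, BudgetGrantor(meter, increment, cap), work_items)
    except OutOfGas:
        return None


print(attempt(50))   # 50: finished after four top-ups
print(attempt(200))  # None: the cap of 100 extra units ran out first
```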

--------------
Alan Karp



Baldur Jóhannsson

Aug 14, 2018, 4:39:36 PM
to e-lang
On Sunday, 12 August 2018 04:16:27 UTC, Mark Miller wrote:
> We are now rebuilding the Communicating Event Loops model to run on blockchains. On blockchains, all computation is resource constrained. It must be paid for in a finite amount of some unit, now universally called "gas". Thus, infinite loops are reliably turned into transaction abort. For blockchains, the bookkeeping needed to rewind to the previous turn boundary is not optional; nor is it expensive when compared to the rest of the platform. For non-blockchains where we still do this bookkeeping, we can use a non-deterministic watchdog timer instead of gas.

There are some issues that arise from using non-deterministic watchdog timers. They stem basically from their non-deterministic nature, so I won't go further into it here. Some MCU platforms (I think a variation of Mecrisp) allow deterministic watchdog limits that count down execution cycles. This means that such limits are neither that hard nor that tedious to implement.

 

> Once Bob's turn itself aborts, say, from exhausting its budget, what should happen next? I propose that Bob's turn acts as if this first invocation throws an exception instead of doing anything else. IOW, rather than starting Bob's turn, the promise for the result of the turn becomes a rejected (broken) promise, reducing this case to the normal asynchronous exception handling coordination.

Hmm... I think some people would be tempted to write a retry membrane that slowly increases the gas budget until this kind of rejection no longer happens.
I am not sure whether that is for ill or for good, though.
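A retry membrane of the kind described above might look like this sketch; `retry_membrane` and the doubling policy are illustrative choices, not a known API. Since aborted turns are rewound, each attempt starts from the same pre-turn state. The ceiling is what stops a genuine runaway from being retried forever, and every failed attempt still consumes its budget (here 16 + 32 + 64 units before the 128-unit attempt succeeds).

```python
class OutOfGas(Exception):
    pass


def run_with_budget(turn, budget: int):
    remaining = [budget]

    def charge(amount: int = 1) -> None:
        if amount > remaining[0]:
            raise OutOfGas()
        remaining[0] -= amount

    return turn(charge)


def retry_membrane(turn, start: int = 16, factor: int = 2, ceiling: int = 4096):
    """Re-run an aborted turn with a growing budget until it commits.

    Each retry starts from the same pre-turn state, since aborted turns are
    rewound. Returns (result, budget that finally succeeded).
    """
    budget = start
    while True:
        try:
            return run_with_budget(turn, budget), budget
        except OutOfGas:
            if budget >= ceiling:
                raise  # give up: treat as a genuine runaway
            budget = min(budget * factor, ceiling)


def needs_100(charge) -> str:
    for _ in range(100):
        charge()
    return "done"


print(retry_membrane(needs_100))  # ('done', 128)
```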

> With all that mechanism, we could support transactional abort of a turn for other purposes. We could provide an abort(error) operation that, in general, aborts the turn, reverts to the state before the turn starts, and immediately rejects the promise for the turn's result with that error. Alice can let Bob call her objects synchronously while still protecting her availability from Bob's profligacy.
>
> Does this seem like a good idea?

It is the start of one, yes. But I am curious about the case where Bob is in a separate vat from Alice, yet Alice controls the 'gas budget' that Bob will run under when handling her message. In veiled or private runtime environments (not blockchains with publicly visible state), this could give Alice some idea of the complexity of Bob's implementation and possibly leak information about some private state.
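This concern can be sketched concretely: if Alice sets the budget for Bob's turn and can see whether it commits, she can binary-search the smallest budget that lets it commit, which tells her exactly how much gas Bob spent. In this illustrative example Bob scans a private list linearly, so his gas cost reveals where Alice's query sits in that list. All names and data are hypothetical.

```python
class OutOfGas(Exception):
    pass


PRIVATE_LIST = [17, 42, 8, 99, 23]  # Bob's private state (illustrative)


def bob_handle(query: int, charge) -> bool:
    # Linear scan: gas spent reveals how far into the private list Bob looked.
    for item in PRIVATE_LIST:
        charge(1)
        if item == query:
            return True
    return False


def commits(query: int, budget: int) -> bool:
    """Run Bob's turn under a budget Alice chose; report only commit vs abort."""
    remaining = [budget]

    def charge(amount: int) -> None:
        if amount > remaining[0]:
            raise OutOfGas()
        remaining[0] -= amount

    try:
        bob_handle(query, charge)
        return True
    except OutOfGas:
        return False


def minimal_budget(query: int, upper: int = 1024) -> int:
    # Alice binary-searches the smallest budget under which Bob's turn commits.
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if commits(query, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(minimal_budget(99))    # 4: 99 sits at index 3 of Bob's list
print(minimal_budget(1000))  # 5: not present, so Bob scanned everything
```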



Dan Connolly

Aug 14, 2018, 7:14:07 PM
to e-l...@googlegroups.com
On Sun, Aug 12, 2018 at 11:03 AM, Mark Miller <eri...@gmail.com> wrote:
> Are there any practical approaches to high availability under mutual suspicion that terminate runaways, enable counter-parties to somehow continue, and does not threaten confidentiality by this termination channel? (I say "practical" because there are info-flow systems that prevent leakage by this termination channel.) How do KeyKOS and/or seL4 deal with this?

seL4 reifies access to the CPU and RAM as capabilities. But maybe
that's not much of an answer.

Looking more closely, I suppose seL4 doesn't terminate runaways
(though it seems to suspend them indefinitely if they throw an
exception that they are not set up to handle). It uses a preemptive
scheduler to enable counter-parties to continue.


For reference: excerpts from

seL4 Reference Manual Version 10.0.0 May 2018
https://sel4.systems/Info/Docs/seL4-manual-latest.pdf


THREADS AND EXECUTION

6.1.3 Scheduling
"seL4 uses a preemptive round-robin scheduler with 256 priority levels."

6.1.4 Exceptions
"if the lookup fails then no exception message is delivered and the thread is suspended indefinitely."
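The point that preemption lets counter-parties keep running can be illustrated with a toy model of a preemptive round-robin scheduler with 256 priority levels. This is a sketch of the general policy, not seL4's implementation or API: the highest-priority runnable thread gets each time slice, and a preempted thread goes to the back of its level's queue, so a runaway `bob` at the same priority as `alice` cannot starve her.

```python
from collections import deque

NUM_PRIORITIES = 256


class Thread:
    def __init__(self, name: str, work=None) -> None:
        self.name = name
        self.work = work  # None models a runaway that never finishes

    def step(self) -> bool:
        """Run for one time slice; return True if the thread has finished."""
        if self.work is None:
            return False
        self.work -= 1
        return self.work == 0


class Scheduler:
    def __init__(self) -> None:
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def make_runnable(self, thread: Thread, priority: int) -> None:
        self.queues[priority].append(thread)

    def pick_next(self):
        # Highest-priority non-empty queue wins; FIFO within a priority level.
        for priority in range(NUM_PRIORITIES - 1, -1, -1):
            if self.queues[priority]:
                return priority, self.queues[priority].popleft()
        return None

    def run(self, slices: int) -> list:
        trace = []
        for _ in range(slices):
            picked = self.pick_next()
            if picked is None:
                break
            priority, thread = picked
            trace.append(thread.name)
            if not thread.step():
                # Preempted at the end of its slice: back of the same queue.
                self.queues[priority].append(thread)
        return trace


sched = Scheduler()
sched.make_runnable(Thread("bob"), priority=100)            # runaway
sched.make_runnable(Thread("alice", work=3), priority=100)  # finite job
print(sched.run(slices=6))  # ['bob', 'alice', 'bob', 'alice', 'bob', 'alice']
```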

--
Dan Connolly
http://www.madmode.com/