Chip Morningstar writes:
>> On Dec 16, 2020, at 12:30 PM, Christopher Lemmer Webber <cwe...@dustycloud.org> wrote:
>>
>> 1) How does SwingSet currently handle a failure of a vat turn?
>> Are state changes, such as the decrement of an integer value stored
>> under some variable, automatically reset?
>> I would think the right answer would be "yes", but I haven't been
>> able to imagine how it would be done with the present architecture I
>> understand Agoric to be using without either:
>>
>> a) doing a full snapshot each turn, from the VM level
>>
>> b) doing a full snapshot each turn, jhu-paper style
>>
>> c) starting with initial conditions and replaying all messages to
>> return to current state, which would mean that failures would
>> result in quadratic replay behavior, so I cannot imagine this is
>> the case (or if it is, it must be a temporary thing)
>
> It’s a little unclear to me from the context what you mean by “failure
> of a vat turn”.
>
> If you mean “what do you do if a vat fails?”, the answer is: we kill
> the vat. Since vats are deterministic, if it fails once it will
> always fail and so there’s no point to trying to recover; the vat is
> considered to have broken the rules and is terminated as if the turn
> had never happened.
Ah... well, let me be clearer. What I mean is: what happens if an
exception is thrown by code running within the turn?
My impression is that in E, an uncaught exception would not result in
termination of the vat, but all promises which would be resolved with
the return-value of said turn would be broken. Perhaps I am wrong, but
I interpreted a contract/guard/dynamic-type-check error in the makeMint
example in the ode to be more or less throwing an uncaught exception,
and any relevant promises would be broken but the vat would march
onward.
http://erights.org/elib/capability/ode/ode-capabilities.html
It seems that aside from the type annotations, a failure at the
unsealer.unseal(...) part of the code would mean that the turn would be
terminated, but the vat would not be terminated in such a way as to mean
"would not receive future messages". (Otherwise, this code appears to
make the mint very fragile!)
I assumed that the same approach was being taken with JavaScript here.
Is that wrong?
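
To make the question concrete, here is a minimal sketch in plain JavaScript. It uses no SwingSet or E APIs, and `makePurse` is a made-up name for illustration. It shows only what the language itself does: a throw inside a promise reaction breaks the derived promise, the object keeps running, and a mutation made before the throw is not undone.

```javascript
// Plain JavaScript only; makePurse is a hypothetical example, not Agoric code.
function makePurse(initialBalance) {
  let balance = initialBalance; // closed-over state
  return {
    getBalance: () => balance,
    withdraw(amount) {
      balance -= amount; // the mutation happens first...
      if (balance < 0) {
        throw new Error('insufficient funds'); // ...then the turn fails
      }
      return amount;
    },
  };
}

const purse = makePurse(10);

// The .then callback runs in its own turn; a throw there rejects only the
// promise derived from that callback.
const failed = Promise.resolve().then(() => purse.withdraw(25));

// The rejection handler observes the broken promise and the purse state.
const outcome = failed.then(
  () => 'fulfilled',
  (err) => `broken: ${err.message}; balance is now ${purse.getBalance()}`,
);

outcome.then((msg) => console.log(msg));
```

In this sketch the decrement survives the failed turn, which is the behavior I would expect to need a snapshot or replay to avoid.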
> If you mean “what do you do if the hosting environment fails somehow?”
> (e.g., the process dies due to a hardware glitch, etc.), the answer is
> (c). As you speculate, this is intended as a temporary thing, but
> perhaps not as temporary as you might hope. The longer term plan is
> (a), but given that our principal target environment is blockchain, we
> need to have the snapshot be part of the consensus state, which
> requires much more careful specification and implementation than if
> all you needed to do was something more like a heap dump. The good
> news is that to a very rough first approximation, such failures don’t
> happen, so the primary use case is spinning up a new validator, which
> typically can afford to spend some time. Also, because of the way we
> log the kernel state and transaction history, vats can be replayed in
> parallel (though this optimization wrinkle is not currently
> implemented because we haven’t needed it yet).
Not what I was asking, but interesting to know.
> Also, a bit of terminology that may be clarifying, especially if you
> are looking at the code: we use the word “turn” to refer to one pass
> through the underlying JavaScript engine’s event loop (what a lot of
> engine implementors call a “microtask”). We use the word “crank” to
> refer to one pass through our higher level event loop, i.e., all of
> the activity that happens in a vat as a consequence of a message
> delivery into that vat, which typically encompasses multiple turns;
> basically, we let the vat run to quiescence, at which point it no
> longer has agency and control returns to the kernel. We use the word
> “block” to refer to a series of cranks that are treated as a unit for
> purposes of resource management and consensus.
That's interesting. I'm not sure I understand what multiple turns in a
crank would look like in practice. Do you have an example?
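
As a guess at what this could look like, here is a sketch using plain promises as a stand-in for a vat. The names `deliver` and `runCrank` are hypothetical, and using a timer to detect quiescence is an assumption of the sketch, not a statement about how the kernel does it. One delivered message triggers a chain of promise reactions, each running in its own turn, and the "crank" ends once the promise job queue has drained.

```javascript
// Hypothetical names; plain promises stand in for a vat's activity.
const log = [];

// Pretend this is the vat's handler for one delivered message.
function deliver(message) {
  log.push(`turn 1: received ${message}`);
  Promise.resolve(message)
    .then((m) => {
      log.push('turn 2: first reaction');
      return m.toUpperCase();
    })
    .then((m) => {
      log.push(`turn 3: second reaction, result ${m}`);
    });
}

// One "crank": deliver a single message, then wait for quiescence.
// Assumption: a zero-delay timer fires only after all pending promise
// jobs have run, so it approximates "the vat has nothing left to do".
function runCrank(message) {
  deliver(message);
  return new Promise((resolve) => setTimeout(resolve, 0)).then(() => log);
}

const crankDone = runCrank('hello');
crankDone.then((entries) => console.log(entries.join('\n')));
```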
> The kernel state is persistently committed at crank boundaries (though
> a later, more advanced implementation may choose to actually perform
> the commit at block boundaries to amortize the overhead, with replay
> used to fill in the gaps if there’s a failure, kind of the way
> sophisticated databases use a mixture of roll-forward and roll-back
> strategies) and contains everything needed to reconstruct the swingset
> state as of that moment in time; however, we can’t capture
> idiosyncratic memory state inside the vats (consider, for example,
> closed over variables) without the complicity of the underlying
> JavaScript engine, hence the persistence and failure story above.
I see. Yes, it was the closed-over variables bit that I was especially
confused about.
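
For my own understanding, a sketch of why replay works where inspection does not. All names here (`makeCounterVat`, `deliverAndLog`, `replay`) are hypothetical, and the transcript format is invented for illustration. The variable `count` cannot be read or serialized from outside the closure, but because the vat is deterministic, re-running the logged deliveries against a fresh instance arrives at the same state.

```javascript
// Hypothetical toy vat; `count` is reachable only through the closure.
function makeCounterVat() {
  let count = 0;
  return {
    deliver(msg) {
      if (msg.method === 'increment') count += msg.by;
      if (msg.method === 'decrement') count -= msg.by;
      return count;
    },
  };
}

// Assumed transcript format: an in-order list of delivered messages.
const transcript = [];
function deliverAndLog(vat, msg) {
  transcript.push(msg);
  return vat.deliver(msg);
}

const live = makeCounterVat();
deliverAndLog(live, { method: 'increment', by: 5 });
const liveResult = deliverAndLog(live, { method: 'decrement', by: 2 });

// Rebuild the state by replaying every logged message from initial conditions.
function replay(makeVat, messages) {
  const vat = makeVat();
  let last;
  for (const msg of messages) {
    last = vat.deliver(msg);
  }
  return { vat, last };
}

const restored = replay(makeCounterVat, transcript);
console.log(liveResult, restored.last);
```

The cost is the one from option (c) above: replay time grows with the length of the transcript.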
Does the kernel run several vat cranks simultaneously, or is it
one-vat-crank-at-a-time?
Let's say I do a wavy-dot-send or something... that ends up in the
queue. Ok. But if multiple vats are running simultaneously, somewhere
there must be something that binds multiple messages to be queued up for
this particular turn/crank, if they are indeed released all at once at
the end. I don't understand how that's being done without some
context-sensitive information if it's multiple-cranks-at-a-time... you
would at least need to make that some kind of "thread-exclusive data",
I'd think. But I might just be thinking wrong.
But even the fact that wavy-dot stuff ends up in the queue somewhere
means that somewhere in the system, there is something that is routing
that wavy-dot-sending to a globalish queue I'd think...
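
Here is a toy version of the one-crank-at-a-time case, to show what I am picturing. This is not a claim about how SwingSet is built: `makeToyKernel` and everything inside it are invented for the sketch, the handlers are synchronous for simplicity, and dropping the buffered sends when a crank throws is my assumption. With only one crank in progress, a single "current crank" buffer is enough and no thread-exclusive data is needed.

```javascript
// Invented toy kernel; synchronous handlers only, one crank at a time.
function makeToyKernel() {
  const runQueue = []; // global-ish queue of pending deliveries
  const vats = new Map(); // vat name -> handler(body, send)
  let pendingSends = null; // sends made by the crank in progress

  const send = (target, body) => {
    if (pendingSends === null) throw new Error('send outside a crank');
    pendingSends.push({ target, body });
  };

  return {
    addVat(name, handler) {
      vats.set(name, handler);
    },
    enqueue(target, body) {
      runQueue.push({ target, body });
    },
    run() {
      const trace = [];
      while (runQueue.length > 0) {
        const { target, body } = runQueue.shift();
        pendingSends = [];
        try {
          vats.get(target)(body, send);
          runQueue.push(...pendingSends); // release all sends at crank end
          trace.push(`${target} ok: ${body}`);
        } catch (err) {
          // Assumption: a failed crank's buffered sends are discarded.
          trace.push(`${target} failed: ${body}`);
        } finally {
          pendingSends = null;
        }
      }
      return trace;
    },
  };
}

const kernel = makeToyKernel();
kernel.addVat('alice', (body, send) => {
  send('bob', `from alice: ${body}`);
  if (body === 'bad') throw new Error('boom');
});
kernel.addVat('bob', () => {});
kernel.enqueue('alice', 'hi');
kernel.enqueue('alice', 'bad');
const trace = kernel.run();
console.log(trace);
```

If several cranks ran at once, `pendingSends` would have to become per-crank context, which is the part I cannot picture.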
>> 3) For that matter, is there a description of what the swingset
>> nanokernel is, how it is booted up and what the available system
>> calls are? I'm curious to compare.
>
> Obviously you’ve already found the `docs` directory. The stuff in
> there describes a lot of this, though some of it may be slightly out
> of date (as documentation often is, alas) and of course it’s organized
> in a somewhat piecemeal fashion rather than as an overall Principles
> Of Operation style presentation (the latter is something we very much
> should do, but it’s just not at the front of our priority queue right
> now). The other resource, of course, is the code itself. With
> respect to system calls, a good place to start would be the file
> `kernelSyscall.js`
Thanks, will take a look.
>> 4) Is it true that "async/await" are currently discouraged but not yet
>> banned? I was surprised to see them in the ERTP implementation given
>> all the warnings MarkM has levied against re-entrancy attacks with
>> coroutines? (Okay I guess this is a more general Agoric question,
>> less SwingSet.)
>
> These are discouraged, but as far as I know there are no plans to ban
> them. It’s also the case that we regard them as more problematic in
> kernel and contract code and much less so in tooling (in particular,
> there are a lot of unit test things that become extremely challenging
> to implement without them, given the test framework we’re using
> (Ava)). It’s more of a code hygiene thing rather than a question of
> fundamental semantics. After much internal debate, we’ve decided that
> an `await` at the top level of a function (i.e., not inside a loop or
> conditional) is usually OK, and then we have lint rules that complain
> about any await that is not at the top level.
Got it.
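
As an illustration of the distinction (a sketch, not taken from the Agoric lint rules; all function names are hypothetical): with a top-level `await`, the code after it always runs in a later turn. With an `await` inside a conditional, whether the rest of the function runs in the same turn or a later one depends on the data, which is one plausible reading of why the nested form is the one flagged.

```javascript
const events = [];
const getPrice = async () => 42; // hypothetical async lookup

// await at the top level: the function always suspends here.
async function topLevelAwait() {
  const price = await getPrice();
  events.push(`top-level: ${price}`);
  return price;
}

// await inside a conditional: the function only sometimes suspends.
async function nestedAwait(cached) {
  let price = cached;
  if (price === undefined) {
    price = await getPrice();
  }
  events.push(`nested: ${price}`);
  return price;
}

const done = (async () => {
  const p0 = topLevelAwait();
  events.push('after top-level call'); // runs before the function resumes

  const p1 = nestedAwait(7); // cached: body finishes in the caller's turn
  events.push('after nested (cached) call');

  const p2 = nestedAwait(undefined); // uncached: body finishes in a later turn
  events.push('after nested (uncached) call');

  await Promise.all([p0, p1, p2]);
  return events;
})();

done.then((list) => console.log(list.join('\n')));
```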
> Hope this was helpful, but feel free to keep tossing questions at us
> in any case.
>
> — Chip
Very helpful, thank you for taking the time... I know you all are very
busy.
- Chris