On Thu, Jun 27, 2013 at 2:50 PM, Kenton Varda <temp...@gmail.com> wrote:
> https://gist.github.com/kentonv/5880714
32-bit callIds may not be enough without some wrapping support.
What does releasing a capability mean? Is it guaranteed that a new
call cannot target a released capability? Can the endpoint that
created a capability destroy it asynchronously?
Cancel may be problematic -- the callee will probably have to store
the list of capabilities returned by a successful call indefinitely so
it knows to release them if the call is later canceled.
Should Fail contain an Object to encode a richer exception? (Quite
possibly not -- the runtime interface could get unnecessarily
complicated.)
--
How will you handle fault-tolerance? You mention out-of-order delivery, but what about frames that are delivered more or less than one time?
On Fri, Jun 28, 2013 at 3:15 PM, Geoffrey Romer <gro...@google.com> wrote:
> How will you handle fault-tolerance? You mention out-of-order delivery,
> but what about frames that are delivered more or less than one time?

An excellent question.

I don't think that this is something that can be done completely transparently. Two Generals' Problem, etc. At the end of the day, all RPC methods are going to have to be idempotent at the application level. At that point, the RPC system itself can implement optimizations which squelch known repeats, but these don't have to be 100% reliable. The callee app may receive a call twice, and simply has to carry it out twice. On the caller side, the RPC system could retry automatically, although my experience at Google is that most apps disabled this and implemented their own retry semantics (to be fair, I mostly worked on interactive servers), so I'm not sure how much effort should be put into that.
All that said, I do want the protocol to be amenable to UDP. This probably means we need to define retry rules for individual frames. Maybe something like...

- Call should be retried until it is ack'd by a Return or a Fail. (Do we need a separate way to ack calls that take a while?)
- Cancel is not ack'd; the caller knows it succeeded when it gets a Return or Fail.
- If a spurious Cancel (one whose call ID matches no current call) is received, a Fail should be sent back.
- Spurious Returns or Fails should be ignored.
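The retry rules above can be sketched as a small caller-side state machine. This is only an illustration of the rules as stated, not anything from the actual Cap'n Proto implementation; all names (`Caller`, `onRetryTimer`, etc.) are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class FrameType { Call, Return, Fail };

struct OutstandingCall {
  uint64_t callId;
  int sendCount = 0;  // how many times the Call frame has gone on the wire
  bool done = false;  // set once ack'd by a Return or a Fail
};

class Caller {
public:
  void startCall(uint64_t id) { calls_[id] = OutstandingCall{id}; }

  // Invoked by a retransmit timer: re-send every un-acked Call.
  void onRetryTimer() {
    for (auto& [id, call] : calls_)
      if (!call.done) ++call.sendCount;  // stands in for "put frame on wire"
  }

  // Invoked when a frame arrives from the peer.
  void onFrame(FrameType type, uint64_t callId) {
    auto it = calls_.find(callId);
    if (it == calls_.end()) return;  // spurious Return/Fail: ignore
    if (type == FrameType::Return || type == FrameType::Fail)
      it->second.done = true;        // ack'd; stop retrying
  }

  bool isDone(uint64_t id) const { return calls_.at(id).done; }

private:
  std::map<uint64_t, OutstandingCall> calls_;
};
```

Note that Cancel is intentionally absent from the ack'd set here: per the rules above, the caller treats the eventual Return or Fail as the Cancel's acknowledgment.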
Open problems:

- I think Release needs to be ack'd. This may imply that Release messages need their own IDs.
- I don't think call IDs are reusable in this scenario, at least in theory. In practice, wrap-around would probably take long enough to purge any hung calls. But perhaps we should go ahead and use 64-bit IDs anyway?
- A lower layer managing the UDP connection may want to send pings and such. Should it define its own outer frame to distinguish these from RPC packets or should we extend Frame itself?
- Anything else?
-Kenton
That doesn't match my experience, but then my experience is with backend servers. I've certainly seen systems implement their own retry semantics, but I don't recall seeing them turn off the lower-level retry.
In any event, Google is such an outlier in most respects that experience there may not be directly applicable: Cap'n Proto should certainly support Google-scale applications, but it also needs to scale down to support projects with much more constrained bandwidth, less reliable networks, and fewer engineers to throw at RPC optimization.
> - Cancel is not ack'd; the caller knows it succeeded when it gets a Return or Fail.
> - If a spurious Cancel (one whose call ID matches no current call) is received, a Fail should be sent back.

Doesn't this make Cancels non-idempotent? How should a client handle a Fail response?
> - Spurious Returns or Fails should be ignored.

I assume you mean they're not exposed to the client code; they should still at least be acked, right?
> Open problems:
> - I think Release needs to be ack'd. This may imply that Release messages need their own IDs.

You're probably going to have to recover from misbehaving clients that fail to Release properly for one reason or another anyway, and the mechanism for doing so (e.g. LRU garbage collection?) might handle dropped Releases without the need for explicit acks. OTOH it might not (e.g. if the strategy is to just detect and disconnect misbehaving clients), so maybe you do need this to maximize implementation flexibility.
> - I don't think call IDs are reusable in this scenario, at least in theory. In practice, wrap-around would probably take long enough to purge any hung calls. But perhaps we should go ahead and use 64-bit IDs anyway?

I can't think of any plausible use case where 32 bits would be insufficient (assuming of course that you skip IDs that are still live), but this is starting to feel like "64K ought to be enough for anybody". Among other things, 64 bits would give you the option of assigning IDs randomly, which could be useful in some settings.
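For concreteness, "skip IDs that are still live" could look like the following sequential allocator sketch. The class and its methods are hypothetical, purely to illustrate the idea; a real implementation would also have to worry about the duplicate-detection concern raised elsewhere in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hands out call IDs sequentially, wrapping around (implicitly, via uint64_t
// overflow) but never reissuing an ID that is still attached to a live call.
class CallIdAllocator {
public:
  uint64_t allocate() {
    while (live_.count(next_)) ++next_;  // skip over still-live IDs
    live_.insert(next_);
    return next_++;
  }

  void release(uint64_t id) { live_.erase(id); }

private:
  uint64_t next_ = 0;
  std::unordered_set<uint64_t> live_;
};
```

With 64-bit IDs the skip loop essentially never triggers in practice; with 32-bit IDs on a very long-lived connection it is the safety net against handing a hung call's ID to a new call.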
(Side note: I'm trying to avoid saying "client" and "server" here since the protocol is symmetric, but might it make sense to use those terms in the context of a single Call? Speaking of which, you use those terms in the documentation of CapDescriptor.needsRelease, but it's not clear to me what they mean in that context.)
A related question: have you thought about supporting segmented messages and/or streaming at the protocol level? Ack and keepalive messages could conceivably be implemented as initial segments of a multi-segment message or stream (although the semantics are a little different since you don't need to fill in gaps).
> Yes, I think you need a separate ack, at least as an option (probably the
> default, since it's doubtful if most clients will have access to Andrew's
> magical transport layer).

Mine's local anyway, so it doesn't help with this particular use.

> As you probably know, tail latency (latency of slow requests, relative to
> the median latency) is a critical performance characteristic of many
> distributed systems, and retries are a key tool for controlling it. However,
> retry-based strategies only work if a likely failure can be detected within
> a small fraction of the total expected latency (if you can't detect a
> failure except by noticing that the final answer is slow to arrive, you've
> doubled the failure-case latency).

I wonder if this will be necessary once TCP tail loss probing is
widely available. See:

https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6ba8a3b19e764b6a65e4030ab0999be50c291e6c

The TCP / other lower layer stack probably has a better idea of RTT
than capnp anyway.
This discussion is really interesting. Kudos to everyone.
To @Andrew Lutomirski: Could you outline the reasons for Thrift
being "dog slow"? I run Thrift in several projects and I haven't
noticed performance issues so far... but I don't run a
massive network service.
Thanks, Geoffrey, you brought up a lot of good points I hadn't really thought of. (Unfortunately, I am not an expert at RPC either... only at serialization.)
Yes, I agree that calls need to be ack'd -- even over TCP -- to keep tail latency low.

I wonder if we could cleverly separate these transport issues from the handling of the RPC layer itself. Imagine this interface...
class Transport {
public:
  class Request {
  public:
    virtual void wait(Time timeout = -1) = 0;
    // Start trying to send the request, blocking until a reply is received.
    // Throws an exception if the timeout is reached or if the transport
    // fails permanently. The transport takes care of re-sending the message
    // as appropriate depending on the reliability of the underlying medium.

    virtual void receivedReply() = 0;
    // Call when a reply to this request has been received. wait() will
    // then return.
  };

  virtual Own<Request> startRequest(const Message& message) = 0;
  // Start a new request. Call wait() on it to begin sending the message.
};
> > - Spurious Returns or Fails should be ignored.
>
> I assume you mean they're not exposed to the client code; they should still at least be acked, right?

I'm not sure they need to be ack'd, since the caller will retry if it doesn't get a response. The callee can hold on to the response for a little while in order to be able to re-send it in response to duplicate calls.

If the response contains capabilities that need release, things get a little tricky. If the callee receives a duplicate Call shortly after sending the corresponding Return, it doesn't know if the caller sent this Call just before getting the Return (in which case it successfully received and is using the returned caps) or if it failed to receive the Return altogether (in which case it will use the caps in the second Return). So, the callee better return exactly the same response with the same caps. To that end, it will have to hold on to the response message for a little while. Once it has dropped the response message, it will have to respond to any duplicate Call by releasing the caps and replying with a Fail.
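The callee-side behavior just described can be sketched as a small response cache. Everything here is hypothetical and for illustration only (the cache, its names, and the use of a plain string as the serialized response); the point is just the two-phase rule: replay the identical Return while cached, then release the caps and answer duplicates with Fail.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

enum class ReplyKind { Return, Fail };

class ResponseCache {
public:
  // Called when the original Return is sent: remember it, caps and all.
  void store(uint64_t callId, std::string response) {
    cache_[callId] = std::move(response);
  }

  // Called on a (possibly duplicate) Call for an already-answered call ID.
  ReplyKind onDuplicateCall(uint64_t callId, std::string* out) {
    auto it = cache_.find(callId);
    if (it != cache_.end()) {
      *out = it->second;        // still cached: re-send the exact same Return
      return ReplyKind::Return;
    }
    return ReplyKind::Fail;     // evicted: caps already released, so Fail
  }

  // Called when the callee decides it has held the response long enough.
  void evict(uint64_t callId) {
    // A real implementation would release any caps in the response here.
    cache_.erase(callId);
  }

private:
  std::map<uint64_t, std::string> cache_;
};
```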
> > Open problems:
> > - I think Release needs to be ack'd. This may imply that Release messages need their own IDs.
>
> You're probably going to have to recover from misbehaving clients that fail to Release properly for one reason or another anyway, and the mechanism for doing so (e.g. LRU garbage collection?) might handle dropped Releases without the need for explicit acks. OTOH it might not (e.g. if the strategy is to just detect and disconnect misbehaving clients), so maybe you do need this to maximize implementation flexibility.

All caps are implicitly released on disconnect. Disconnecting clients that allocate too many caps seems like a good idea regardless. But yes, it seems clear that Releases need to be ack'd.

> > - I don't think call IDs are reusable in this scenario, at least in theory. In practice, wrap-around would probably take long enough to purge any hung calls. But perhaps we should go ahead and use 64-bit IDs anyway?
>
> I can't think of any plausible use case where 32 bits would be insufficient (assuming of course that you skip IDs that are still live), but this is starting to feel like "64K ought to be enough for anybody". Among other things, 64 bits would give you the option of assigning IDs randomly, which could be useful in some settings.

I think it's plausible to imagine a long-running connection that performs more than 4G requests. Remotely plausible, but plausible, especially if you implemented a persistent transport layer. I don't think IDs can be reused even after the call completes, since this would make it hard for the server to detect duplicate calls and efficiently reply with the same response...
> (Side note: I'm trying to avoid saying "client" and "server" here since the protocol is symmetric, but might it make sense to use those terms in the context of a single Call? Speaking of which, you use those terms in the documentation of CapDescriptor.needsRelease, but it's not clear to me what they mean in that context.)
I try to use the terms "caller" and "callee", although I keep catching myself mixing them up. I think it's fine to say "client" and "server" instead (when talking about specific interactions where it's obvious which is which). In the case of needsRelease, I meant "client" to be the side holding the reference, and "server" to be the side implementing it; I'll try to make it clearer.
> A related question: have you thought about supporting segmented messages and/or streaming at the protocol level? Ack and keepalive messages could conceivably be implemented as initial segments of a multi-segment message or stream (although the semantics are a little different since you don't need to fill in gaps).
Messages are already arranged in segments, though the intent was to allow progressive memory allocation, not so much to help with transport.

I think using the Transport interface I specified, it would make a lot of sense for the transport itself to interleave large messages so that they don't starve smaller ones. Interestingly, we can actually allow the receiver to start operating on a message as soon as the first segment is received, since it'll make a virtual call the first time each segment is accessed, giving the transport an opportunity to block.

All that said, for a protocol involving large file transfers, the right thing to do is to create a capability representing the stream and then make multiple calls to that capability to transfer small chunks of data. This way the app has the ability to display a progress bar or do other things that require being aware of the transfer progress... and the serialization layer doesn't need to be well-optimized for ginormous messages.
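The stream-capability pattern for large transfers could look roughly like this. The interface and names (`ByteStream`, `write`, `done`, `sendFile`) are invented for the sketch and are not the real Cap'n Proto streaming API; in real use each `write` would be an RPC to a capability exported by the receiver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// The capability the receiver exports: accepts data in small chunks.
class ByteStream {
public:
  virtual ~ByteStream() = default;
  virtual void write(const std::string& chunk) = 0;  // one RPC per chunk
  virtual void done() = 0;                           // end of transfer
};

// A trivial in-memory implementation standing in for the remote side.
class FileSink : public ByteStream {
public:
  void write(const std::string& chunk) override { data_ += chunk; }
  void done() override { finished_ = true; }
  const std::string& data() const { return data_; }
  bool finished() const { return finished_; }

private:
  std::string data_;
  bool finished_ = false;
};

// Sender side: many small calls instead of one giant message. The app sees
// every chunk boundary, so it can drive a progress bar between calls.
size_t sendFile(ByteStream& sink, const std::string& file, size_t chunkSize) {
  size_t sent = 0;
  while (sent < file.size()) {
    size_t n = std::min(chunkSize, file.size() - sent);
    sink.write(file.substr(sent, n));
    sent += n;  // progress-reporting hook would go here
  }
  sink.done();
  return sent;
}
```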
- CapTP is based on one-way messages with two-way request-reply interactions built on top, whereas Cap'n Proto implements only two-way RPC. This is not necessarily a problem, but we need to think carefully about it. My current intuition is that at the application level you always want request-reply anyway.
I'm not sure how one-way messages solve that problem. A malicious client could refuse to ack the TCP packets just as easily as it could refuse to send a return message at the RPC level. How can you have guaranteed delivery without this vulnerability?
Note that RPCs can have timeouts. In the scenario you describe, I'd advise the server to protect itself by setting a reasonable timeout.
2013/7/7 Kenton Varda <temp...@gmail.com>
> I'm not sure how one-way messages solve that problem. A malicious client could refuse to ack the TCP packets just as easily as it could refuse to send a return message at the RPC level. How can you have guaranteed delivery without this vulnerability?

Packet losses should be handled at the TCP level so as not to break isolation. The RPC transport layer built on top of TCP should not care about them. Am I right?
One more example to illustrate what I'm talking about: a multiplayer game. The server keeps the current state of the in-game world and notifies clients of world updates. The rate of these updates varies and has high peaks corresponding to active in-game interactions. Although updates may be received in any order, they must be handled sequentially. The timeout for handling such a message may be rather small, because handling it doesn't involve any complex computation. But due to low network bandwidth or a high packet loss rate, a particular "world-update" could be postponed until the previous updates arrive. Reaching the timeout then leads to message retransmission and further degradation of the client's latency.
On Mon, Jul 8, 2013 at 12:57 AM, Stanislav Ivochkin <i...@extrn.org> wrote:

> 2013/7/7 Kenton Varda <temp...@gmail.com>
>
> > I'm not sure how one-way messages solve that problem. A malicious client could refuse to ack the TCP packets just as easily as it could refuse to send a return message at the RPC level. How can you have guaranteed delivery without this vulnerability?
>
> Packet losses should be handled at the TCP level so as not to break isolation. The RPC transport layer built on top of TCP should not care about them. Am I right?

That's beside the point. A malicious client can attack the TCP layer just as easily as the RPC layer. So it doesn't make sense to go out of our way to defend against attacks at the RPC layer which aren't defended at the TCP layer.
> One more example to illustrate what I'm talking about: a multiplayer game. The server keeps the current state of the in-game world and notifies clients of world updates. The rate of these updates varies and has high peaks corresponding to active in-game interactions. Although updates may be received in any order, they must be handled sequentially. The timeout for handling such a message may be rather small, because handling it doesn't involve any complex computation. But due to low network bandwidth or a high packet loss rate, a particular "world-update" could be postponed until the previous updates arrive. Reaching the timeout then leads to message retransmission and further degradation of the client's latency.
You can always write a client that just puts the message in a queue and then returns immediately, if that's what you want. I'm not preventing that.
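That enqueue-and-return-immediately approach, combined with sequence-numbered updates for the game scenario above, might look like this. All names here are invented for illustration; nothing in this sketch is part of Cap'n Proto itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// The RPC handler just buffers each update by sequence number and returns
// immediately, so the per-call timeout stays small even when earlier updates
// are still in flight. The game loop then drains updates strictly in order.
class UpdateQueue {
public:
  // Called from the RPC handler; returns as soon as the update is queued.
  void onWorldUpdate(uint64_t seq, std::string update) {
    pending_[seq] = std::move(update);
  }

  // Called from the game loop: apply whatever is ready, in sequence order,
  // stopping at the first gap (which a retransmission will eventually fill).
  std::vector<std::string> drainInOrder() {
    std::vector<std::string> ready;
    while (true) {
      auto it = pending_.find(nextSeq_);
      if (it == pending_.end()) break;  // gap: wait for the missing update
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++nextSeq_;
    }
    return ready;
  }

private:
  uint64_t nextSeq_ = 0;
  std::map<uint64_t, std::string> pending_;
};
```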