Clean Disconnect Procedure


Alex

Sep 21, 2023, 7:09:04 PM
to capn...@googlegroups.com
Hi all,

I am designing an application (in C++) where, upon invocation of a
particular RPC call, both the server and the client agree to cleanly
disconnect from one another. By "cleanly", I mean that both the server
and the client send a TCP FIN/ACK and nothing more (e.g. no RSTs).
Unfortunately, in the current design the receipt of a FIN will cause
AsyncIoMessageStream::tryReadMessage() to abort, whereupon it will
throw KJ_EXCEPTION(DISCONNECTED, "Peer disconnected.")[0]. This
exception is eventually written to the client socket, and if the client
is already gone, there will be one or more RSTs in response:

C -> S: "Goodbye" (RPC call)
C -> S: "I have nothing more to say" (TCP FIN)

(the client does not expect the server to say anything more and closes
the socket)

S -> C: "Exception! You disconnected from me" (RPC message)
C -> S: "Error: Connection reset by peer" (TCP RST)

Given that both the server and client have agreed to shut down the
connection, this is not an exceptional circumstance. Therefore, an
exception should not be thrown.

Unfortunately, there does not seem to be a way to indicate to the
RpcSystem that the DISCONNECTED exception ought to be suppressed. Is
there something I am missing? I appreciate any assistance.

Regards,
Alex

[0] https://github.com/capnproto/capnproto/blob/761aeb17563a59f43b3fe9bae93df83c6bd57d06/c%2B%2B/src/capnp/rpc.c%2B%2B#L2775

Kenton Varda

Sep 27, 2023, 3:06:20 PM
to Alex, capn...@googlegroups.com
Indeed, there isn't really a clean shutdown mechanism right now. I guess it hasn't come up as a priority because in most use cases we just haven't really cared if there's a TCP RST triggered under the hood... since we're already killing the connection, we ignore that error anyway.

I suppose what we should do is, in the case that we receive a clean EOF, inhibit sending an abort message back and instead just send EOF in return.

-Kenton


Kenton Varda

Sep 27, 2023, 3:07:46 PM
to Alex, capn...@googlegroups.com
(Happy to accept a PR. The relevant code is in `messageLoop()` and `RpcConnectionState::disconnect()` in `rpc.c++`.)

Alex

Sep 27, 2023, 5:36:04 PM
to 'Kenton Varda' via Cap'n Proto, Kenton Varda
Thank you for the info, Kenton. I've been looking closely at the design
of the RpcSystem, and I have a couple of thoughts/questions:

1. I would like to add a new RPC Message in rpc.capnp:

goodbye @14 :Void;

This message indicates to the recipient that the sender has nothing
more to say, and that it should stop read()ing the socket. In other
words, upon receipt of rpc::Message::GOODBYE, messageLoop() ends
gracefully (no exceptions are thrown). The sender then shuts down its
write() side of the connection, causing a TCP FIN to be delivered to
the recipient. Because there is no read() in progress, an exception shall
not be thrown.

The recipient performs whatever cleanup is necessary and sends a
reciprocal GOODBYE, causing the same logic described above to be
invoked on the other end.

Do you think this is a good solution?

2. In your opinion, what is the best way to expose this graceful
disconnect functionality to applications? I considered modifying the
signature of the BootstrapFactory's createFor method in this manner:

Before:

capnp::Capability::Client createFor(VatId::Reader clientId)

After:

capnp::Capability::Client createFor(VatId::Reader clientId, kj::Own<kj::PromiseFulfiller<void>> shutdown)

The PromiseFulfiller can then be passed to the constructor of the Server:

class AdderImpl final: public Adder::Server {
public:
  AdderImpl(kj::Own<kj::PromiseFulfiller<void>> shutdown) : shutdown(kj::mv(shutdown)) {}

  kj::Promise<void> add(AddContext context) override {
    auto params = context.getParams();
    context.getResults().setValue(params.getLeft() + params.getRight());
    return kj::READY_NOW;
  }

  kj::Promise<void> cleanupGracefully(CleanupGracefullyContext context) override {
    shutdown->fulfill();
    return kj::READY_NOW;
  }

private:
  kj::Own<kj::PromiseFulfiller<void>> shutdown;
};

Another approach could be to add a shutdown() method to FooContext.

On the client side, perhaps it is best to simply allow the rpcSystem to
fall out of scope, at which point the destructors can invoke the
necessary machinery to send the GOODBYE and FIN the TCP stream.

What do you think? Are these approaches going to lead to a leaky
abstraction? Do you know of an elegant way to design this?

Regards,
Alex

Kenton Varda

Sep 29, 2023, 12:26:46 PM
to Alex, 'Kenton Varda' via Cap'n Proto
On Wed, Sep 27, 2023 at 4:37 PM 'Alex' via Cap'n Proto <capn...@googlegroups.com> wrote:
> 1. I would like to add a new RPC Message in rpc.capnp:
>
> goodbye @14 :Void;
>
> This message indicates to the recipient that the sender has nothing
> more to say, and that it should stop read()ing the socket. In other
> words, upon receipt of rpc::Message::GOODBYE, messageLoop() ends
> gracefully (no exceptions are thrown). The sender then shuts down its
> write() side of the connection, causing a TCP FIN to be delivered to
> the recipient. Because there is no read() in progress, an exception shall
> not be thrown.
>
> The recipient performs whatever cleanup is necessary and sends a
> reciprocal GOODBYE, causing the same logic described above to be
> invoked on the other end.
>
> Do you think this is a good solution?

Hmm, what's the benefit of this, vs. simply sending EOF?

To extend the protocol in this way we would have to think about backwards compatibility. If a peer running an older version of capnp receives the "goodbye" message, it will respond with an "unimplemented" message, which seems like it could make things worse?
 
> 2. In your opinion, what is the best way to expose this graceful
> disconnect functionality to applications?

This is a bit tricky.

For rpc-twoparty.h I think it's straightforward, it could simply be a method on `TwoPartyClient` and `TwoPartyServer` to signal graceful disconnect. (This would have to be a method returning a promise which resolves when all buffers are flushed and such, so I don't think it can just be destructor behavior.)

But in the full many-party vision of Cap'n Proto, the application is not really intended to know what connections exist. The application could receive two capabilities from two different parties which both happened to point to the same third party, and those two capabilities end up sharing a connection, even though they came from different places. So it seems like the application has no reasonable way to express that it wants a connection to shut down, if it doesn't even know a connection exists.

I think, then, it has to be up to the RPC system to shut down connections that are idle. Probably RpcSystem could signal to the underlying VatNetwork whenever a connection has reached an idle state, meaning it has no outstanding RPCs nor capabilities. The VatNetwork could choose to close such a connection if it feels like it -- some transports may want to do this on a timeout, others may decide it's better to keep the connection open.

But I'd suggest not worrying about that for now and focusing just on rpc-twoparty, since that's what most people are using today.

-Kenton
 

Alex

Sep 29, 2023, 2:01:26 PM
to Kenton Varda, 'Kenton Varda' via Cap'n Proto
On Fri, 29 Sep 2023 11:26:05 -0500
Kenton Varda <ken...@cloudflare.com> wrote:

> On Wed, Sep 27, 2023 at 4:37 PM 'Alex' via Cap'n Proto <
> capn...@googlegroups.com> wrote:
>
> > 1. I would like to add a new RPC Message in rpc.capnp:
> >
> > goodbye @14 :Void;
> >
> > This message indicates to the recipient that the sender has nothing
> > more to say, and that it should stop read()ing the socket. In other
> > words, upon receipt of rpc::Message::GOODBYE, messageLoop() ends
> > gracefully (no exceptions are thrown). The sender then shuts down
> > its write() side of the connection, causing a TCP FIN to be
> > delivered to the recipient. Because there is no read() in progress,
> > an exception shall not be thrown.
> >
> > The recipient performs whatever cleanup is necessary and sends a
> > reciprocal GOODBYE, causing the same logic described above to be
> > invoked on the other end.
> >
> > Do you think this is a good solution?
> >
>
> Hmm, what's the benefit of this, vs. simply sending EOF?
>

Currently when an EOF occurs, there is no way to discern between an
exceptional circumstance and a normal/expected circumstance.

For example, consider a program which performs file transfer between
two machines/vats:

filectl get capnproto://192.168.1.2/tmp/movie.mp4

If my machine establishes a Cap'n Proto session with 192.168.1.2 and
successfully downloads the file, there is nothing left to do. If the
hypothetical filectl program simply exits, then the remote daemon is
going to raise an exception (e.g. "Peer disconnected"), and that
exception is going to be written to the now-dead connection in the form
of an RPC Abort message. Since the program has terminated, the socket
is closed and my machine will send a TCP RST packet in response.

If I am collecting metrics across a fleet of machines, and one of those
metrics is the number of exceptions thrown or the number of connection
resets, my charts will show a constant flow of exceptions, leaving me
unable to determine whether or not an outage is occurring.

> To extend the protocol in this way we would have to think about
> backwards compatibility. If a peer running an older version of capnp
> receives the "goodbye" message, it will respond with an
> "unimplemented" message, which seems like it could make things worse?
>

It's unclear to me how it would make things worse, since the connection
is in the process of being shut down anyway. I am not saying it
can't/wouldn't make things worse, I am only saying that it is not clear
to me how that could be so.

>
> > 2. In your opinion, what is the best way to expose this graceful
> > disconnect functionality to applications?
>
>
> This is a bit tricky.
>
> For rpc-twoparty.h I think it's straightforward, it could simply be a
> method on `TwoPartyClient` and `TwoPartyServer` to signal graceful
> disconnect. (This would have to be a method returning a promise which
> resolves when all buffers are flushed and such, so I don't think it
> can just be destructor behavior.)
>

I will take a look there.

> But in the full many-party vision of Cap'n Proto, the application is
> not really intended to know what connections exist. The application
> could receive two capabilities from two different parties which both
> happened to point to the same third party, and those two capabilities
> end up sharing a connection, even though they came from different
> places. So it seems like the application has no reasonable way to
> express that it wants a connection to shut down, if it doesn't even
> know a connection exists.
>

As I understand it, the RPC system has no notion of an underlying
network structure (a wonderful feature!). Capabilities may reside on
the same machine or on different machines, but it shouldn't matter to
the application. The application is only concerned about VatIds. In the
two-party case, there are only two possible VatIds, "client" and
"server". In the multi-party case, VatIds would likely take the form of
a public key. This leads to my next question: In the CapTP/E/Vat
paradigm, is it valid for a single RPC system to form multiple
independent connections to the same VatId? In other words, if I call:

connA = VatNetwork::connect(vatIdA);

followed by:

connB = VatNetwork::connect(vatIdB);

where vatIdA == vatIdB, should connA and connB refer to the same object
in memory -- thus only ever creating a single RpcConnectionState? Or,
should connA and connB instead be two independent objects in memory,
each with their own independent underlying connection and thus,
independently evolving RpcConnectionState?

> I think, then, it has to be up to the RPC system to shut down
> connections that are idle. Probably RpcSystem could signal to the
> underlying VatNetwork whenever a connection has reached an idle
> state, meaning it has no outstanding RPCs nor capabilities. The
> VatNetwork could choose to close such a connection if it feels like
> it -- some transports may want to do this on a timeout, others may
> decide it's better to keep the connection open.
>

I have no strong opinion on this.

> But I'd suggest not worrying about that for now and focusing just on
> rpc-twoparty, since that's what most people are using today.
>

Indeed, a PR is forthcoming.

> -Kenton
>

Alex

Kenton Varda

Sep 29, 2023, 3:13:48 PM
to Alex, 'Kenton Varda' via Cap'n Proto
On Fri, Sep 29, 2023 at 1:01 PM Alex <capn...@centromere.net> wrote:
> Currently when an EOF occurs, there is no way to discern between an
> exceptional circumstance and a normal/expected circumstance.

That may be, but all we really want to decide here is whether to send an abort message back to the peer. In the case of an EOF that was the result of an exceptional situation, it's almost certainly the case that the peer can no longer receive messages anyway, and therefore sending a message back to them is pointless. It's only when the peer carefully shut down the socket only in a single direction that it'll be able to receive a reply at all -- and in that case, it's not an error situation.

So I think we can safely say: If we receive an EOF, we might as well send an EOF.

(I kind of wish that if you closed a socket without doing shutdown(SHUT_WR) first, the recv() on the other end would fail instead of signaling EOF, but alas...)
 
> > To extend the protocol in this way we would have to think about
> > backwards compatibility. If a peer running an older version of capnp
> > receives the "goodbye" message, it will respond with an
> > "unimplemented" message, which seems like it could make things worse?
>
> It's unclear to me how it would make things worse, since the connection
> is in the process of being shut down anyway. I am not saying it
> can't/wouldn't make things worse, I am only saying that it is not clear
> to me how that could be so.

Just in that it's sending another unwanted message back to a peer that has already disconnected. But I suppose that's not that much worse compared to the status quo.
 
> In the CapTP/E/Vat
> paradigm, is it valid for a single RPC system to form multiple
> independent connections to the same VatId?

That's an interesting question.

Ideally, no more than one connection is formed between any two vats, and this is especially helpful if the application needs to be able to compare capabilities for equality. But in practice I think this gets difficult to ensure if vats cannot all directly address each other or cannot use asymmetric cryptography to authenticate each other. I think it'll be hard to answer this question definitively without a specific real-world system to talk about.

VatNetwork::connect(vatId) is currently designed to return the existing connection if there is one but I know of at least one real-world implementation where it doesn't actually work that way.

Alex

Sep 29, 2023, 4:36:53 PM
to Kenton Varda, 'Kenton Varda' via Cap'n Proto
On Fri, 29 Sep 2023 14:13:08 -0500
Kenton Varda <ken...@cloudflare.com> wrote:

> On Fri, Sep 29, 2023 at 1:01 PM Alex <capn...@centromere.net> wrote:
>
> > Currently when an EOF occurs, there is no way to discern between an
> > exceptional circumstance and a normal/expected circumstance.
> >
>
> That may be, but all we really want to decide here is whether to send
> an abort message back to the peer. In the case of an EOF that was the
> result of an exceptional situation, it's almost certainly the case
> that the peer can no longer receive messages anyway, and therefore
> sending a message back to them is pointless. It's only when the peer
> carefully shut down the socket only in a single direction that it'll
> be able to receive a reply at all -- and in that case, it's not an
> error situation.
>
> So I think we can safely say: If we receive an EOF, we might as well
> send an EOF.
>

I agree: EOFs ought to be reciprocal irrespective of whether or not they
were expected. Note that I am using "EOF" synonymously with "receipt of
a TCP FIN packet". Based on our discussion, my current understanding is
as follows:

1. EOFs may be either expected or unexpected,

2. Unexpected EOFs are exceptional and ought to trigger
immediate/ungraceful shutdown of the connection, and

3. Expected EOFs ought never to invoke exception machinery in either
the local process or the remote process. Invoking the exception
machinery for everyday normal disconnects is contrary to the KJ style
guide[0].

There were two questions:

A. How shall a peer discern between an exceptional connection closure
and a normal one?

My proposal is to add a new message, "GOODBYE", which allows the system
to signal to its remote peer an intent to close the connection. Note
that this message is communicated within the context of any
encryption/authentication layer which may be in place (such as TLS),
thus allowing application designers to discern between proper operation
and outside interference.

B. How should an application signal this intent to the RPC system? What
would the API look like?

Based on your recommendation, I will study the design of rpc-twoparty.h
and submit a PR for your consideration.

Is this a good summary, Kenton?

[0] https://github.com/capnproto/capnproto/blob/f7e8d58ac67635d7e09997bca3254ff376a568a0/style-guide.md#exceptions

> (I kind of wish that if you closed a socket without doing
> shutdown(SHUT_WR) first, the recv() on the other end would fail
> instead of signaling EOF, but alas...)
>
>
> > > To extend the protocol in this way we would have to think about
> > > backwards compatibility. If a peer running an older version of
> > > capnp receives the "goodbye" message, it will respond with an
> > > "unimplemented" message, which seems like it could make things
> > > worse?
> >
> > It's unclear to me how it would make things worse, since the
> > connection is in the process of being shut down anyway. I am not
> > saying it can't/wouldn't make things worse, I am only saying that
> > it is not clear to me how that could be so.
> >
>
> Just in that it's sending *another* unwanted message back to a peer
> that has already disconnected. But I suppose that's not that much
> worse compared to the status quo.
>

Yes.

>
> > In the CapTP/E/Vat
> > paradigm, is it valid for a single RPC system to form multiple
> > independent connections to the same VatId?
>
>
> That's an interesting question.
>
> Ideally, no more than one connection is formed between any two vats,
> and this is especially helpful if the application needs to be able to
> compare capabilities for equality. But in practice I think this gets
> difficult to ensure if vats cannot all directly address each other or
> cannot use asymmetric cryptography to authenticate each other. I
> think it'll be hard to answer this question definitively without a
> specific real-world system to talk about.
>

I am only inquiring about the ideal case, so your answer is helpful.

> VatNetwork::connect(vatId) is currently designed to return the
> existing connection if there is one but I know of at least one
> real-world implementation where it doesn't actually work that way.
>

Good to know, thank you.

Kenton Varda

Oct 2, 2023, 3:47:08 PM
to Alex, 'Kenton Varda' via Cap'n Proto
On Fri, Sep 29, 2023 at 3:36 PM Alex <capn...@centromere.net> wrote:
> A. How shall a peer discern between an exceptional connection closure
> and a normal one?
>
> My proposal is to add a new message, "GOODBYE", which allows the system
> to signal to its remote peer an intent to close the connection.

Right, I get the motivation, but what I'm saying is there is no behavior or API in Cap'n Proto which cares to distinguish between these. Are you proposing adding a new API which would allow the application to learn whether the disconnect was intentional?

I think I would prefer that we leave this up to the application instead. That is, an application can always have an RPC method which signals a clean end, without any special support from the underlying transport. This approach keeps the application decoupled from the transport layer, allowing the same code to work with local capabilities or alternative protocols.
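
For example, something like this in the application's own schema (a hypothetical interface, not part of rpc.capnp; the ordinal is arbitrary):

```capnp
interface Session {
  # ... normal application-level methods ...

  goodbye @5 ();
  # The client calls this when it is done and waits for the (empty) reply,
  # so both sides know all in-flight messages have been flushed before
  # dropping their references and closing the connection.
}
```

The empty return acts as the acknowledgement, so no transport-level change is needed.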

> Note that this message is communicated within the context of any
> encryption/authentication layer which may be in place (such as TLS),
> thus allowing application designers to discern between proper operation
> and outside interference.

TLS itself already communicates the difference between a clean shutdown and an outside interruption. In KJ, read()ing from a TLS socket will throw an exception if it is prematurely terminated.

-Kenton