flow limit related deadlock?


Ian Denhardt

Nov 23, 2021, 12:51:47 PM
to capn...@googlegroups.com
Hey all,

A few days ago one of my co-maintainers (Louis) alerted me to a deadlock
in the Go implementation. We've pinned down the cause, and while trying
to figure out how to fix it, I looked into how the C++ implementation
handles backpressure.

From what I can tell, the only way backpressure is applied is via the
flow limit, which limits the total size of arguments to in-flight
incoming calls. The portion of the quota reserved by a call is returned
to the pool when the call is removed from the questions table, which
makes sense, since this is when the memory is actually freed.
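
For concreteness, this is the limit configured via setFlowLimit() on the
RpcSystem in the C++ implementation. Here's a rough sketch of how a server
might turn it on, going from my reading of rpc.h (the connection and
bootstrap capability are just placeholders):

    #include <capnp/rpc.h>
    #include <capnp/rpc-twoparty.h>
    #include <kj/async-io.h>

    void serveWithFlowLimit(kj::AsyncIoStream& conn,
                            capnp::Capability::Client bootstrap) {
      capnp::TwoPartyVatNetwork network(conn, capnp::rpc::twoparty::Side::SERVER);
      auto rpcSystem = capnp::makeRpcServer(network, kj::mv(bootstrap));

      // Cap the total size (in words, if I'm reading rpc.h right) of call
      // messages whose calls are still outstanding. Once the cap is hit, the
      // RPC system stops reading further messages from this connection until
      // some of that quota is released.
      rpcSystem.setFlowLimit(8 * 1024 * 1024);

      // ... run the event loop / wait for disconnect as usual ...
    }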

However, I see two possible deadlocks that could occur because of this.

The one I am less concerned about is one where calls that depend on one
another go back and forth between vats, until both vats exceed their
quota and block on one another, causing a deadlock. I am less concerned
about this since it is basically the RPC equivalent of a stack overflow,
and it could be turned from a hang into a thrown exception by adding a
timeout or similar.

The one I'm more worried about comes up in the context of streaming;
the problematic scenario is as follows:

Alice in vat A is continuously streaming calls to Bob in vat B. It is
possible, and expected, that at some point Alice will cause vat B's
flow limit to be reached, at which point further calls will block
until some outstanding calls return. Good so far: this backpressure
is exactly what we want.

The problem arises after the return message arrives at vat A. Vat A
then sends a finish message, but this message is *behind other calls*,
so it will not reach vat B until vat B reads in all outstanding calls.
This will never happen, since vat B is waiting on the flow limit.

I don't know how to avoid this problem with the current protocol as
specified. One way that almost works is for vat A to just send the
finish message for each streaming call before the next call is sent,
relying on the -> stream annotations to know which calls it should do
this for. But this doesn't quite work, since vat B is allowed to
cancel an ongoing call in response to a finish message. Some
extension to the protocol to allow non-cancelling finish messages
would solve this.

Is there a solution that I haven't seen? Are there other ways
of dealing with this in the protocol?

-Ian

Kenton Varda

Nov 23, 2021, 1:03:12 PM
to Ian Denhardt, Cap'n Proto
Hmm, I think the intention was that the flow limit should be released on Return, independent of Finish. But I can totally believe I implemented it wrong. Could we just change it to be based on Return?

FWIW, by default there is no flow limit; it's only enabled in the Sandstorm supervisor to defend against an app sending excessive requests that end up queued in memory elsewhere in the system.

-Kenton


Ian Denhardt

Nov 23, 2021, 1:41:57 PM
to Kenton Varda, Cap'n Proto
Quoting Kenton Varda (2021-11-23 13:02:32)

> Hmm, I think the intention was that the flow limit should be released
> on Return, independent of Finish. But I can totally believe I
> implemented it wrong. Could we just change it to be based on Return?
> FWIW by default there is no flow limit, it's only enabled in the
> Sandstorm supervisor to defend against an app sending excessive
> requests that end up queued in memory elsewhere in the system.

Wouldn't releasing it on return allow the caller to cause runaway memory
usage by just never sending the finish? The return entry needs to be kept
around in case calls are pipelined on it, and itself might take up some
space (arguably it should count against the limit too...).

Kenton Varda

Nov 23, 2021, 2:50:58 PM
to Ian Denhardt, Cap'n Proto
On Tue, Nov 23, 2021 at 12:41 PM Ian Denhardt <i...@zenhack.net> wrote:
> Wouldn't releasing it on return allow the caller to cause runaway memory
> usage by just never sending the finish? The return entry needs to be kept
> around in case calls are pipelined on it, and itself might take up some
> space (arguably it should count against the limit too...).

In all honesty, this isn't a very good defense against resource exhaustion attacks either way. The problem that motivated it was actually badly-written apps that would stream a whole large HTTP response body without paying attention to backpressure.

Technically the callee doesn't need to keep the whole response struct in memory, only any returned capabilities and their paths. The current implementation does keep the whole struct though.

My general opinion about resource exhaustion is that it's not so much a security issue as an abuse issue. These attacks create a temporary nuisance / disruption, but no permanent damage. It's probably impossible to anticipate every attack. But, since the attacks don't actually get anything for the attacker, they are much more rare than you might think, and it's usually fine to respond reactively by revoking capabilities, banning users, blocking IPs, etc. If the problem recurs then you start investing in new safeguards against the specific scenario.

-Kenton 

Ian Denhardt

Nov 23, 2021, 4:59:52 PM
to Kenton Varda, Cap'n Proto, Louis Thibault
(Adding Louis to cc per his request)

Quoting Kenton Varda (2021-11-23 14:50:20)
> On Tue, Nov 23, 2021 at 12:41 PM Ian Denhardt <i...@zenhack.net>
> wrote:
>
>     Wouldn't releasing it on return allow the caller to cause runaway
>     memory usage by just never sending the finish? The return entry
>     needs to be kept around in case calls are pipelined on it, and
>     itself might take up some space (arguably it should count against
>     the limit too...).
>
> In all honesty, this isn't a very good defense against resource
> exhaustion attacks either way. The problem that motivated it was
> actually badly-written apps that would stream a whole large HTTP
> response body without paying attention to backpressure.

What are apps *supposed* to do here? It isn't clear to me where else the
backpressure is supposed to come from.

Most apps are using sandstorm-http-bridge anyway, so they're just acting
like normal HTTP servers -- which generally write out data to the
response stream as fast as the socket will take it. Skimming
sendRequest() in the bridge's source, it looks like it just copies that
data directly into the response stream. So I am confused as to what a
"well-written" app would do here.

> Technically the callee doesn't need to keep the whole response struct
> in memory, only any returned capabilities and their paths. The current
> implementation does keep the whole struct though.
>
> My general opinion about resource exhaustion is that it's not so much a
> security issue as an abuse issue. These attacks create a temporary
> nuisance / disruption, but no permanent damage. It's probably
> impossible to anticipate every attack. But, since the attacks don't
> actually get anything for the attacker, they are much more rare than
> you might think, and it's usually fine to respond reactively by
> revoking capabilities, banning users, blocking IPs, etc. If the problem
> recurs then you start investing in new safeguards against the specific
> scenario.

I suppose that's a reasonable perspective; you're right that it's a
different beast. (Mark Miller wrote
https://agoric.com/blog/all/taxonomy-of-security-issues/, which is
really useful for thinking about this stuff. In his terminology we are
talking about availability rather than integrity.)

Every approach I've been able to think of for dealing with availability
attacks gracefully in the capnp implementation itself does seem very
fragile. Perhaps we could treat the flow limit as a soft limit: not
bound the space taken up by returns and table entries, but track it,
and provide a way to notify the user if it goes past a harder limit,
at which point they can drop the connection if they want. If we were
doing this at the process level I'd be inclined to just lean on the
kernel's resource management, but I'd like to at least find a way to
confine these issues to a single connection. Anyway, thanks for your
input.

-Ian

Kenton Varda

Nov 23, 2021, 5:33:22 PM
to Ian Denhardt, Cap'n Proto, Louis Thibault
On Tue, Nov 23, 2021 at 3:59 PM Ian Denhardt <i...@zenhack.net> wrote:
> What are apps *supposed* to do here? It isn't clear to me where else the
> backpressure is supposed to come from.

Apps should cap the number of write()s they have in-flight at once. (`-> stream` helps a lot with this, as it'll automatically figure out how many is a good number of requests to have in flight.)
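
To illustrate the shape of it -- this is just a sketch, not code from any real app, and it assumes a ByteStream-style schema with `write @0 (bytes :Data) -> stream;` and `end @1 ();`:

    #include <kj/async.h>
    #include "byte-stream.capnp.h"  // generated from the assumed schema above

    // For a `-> stream` method, send() returns a kj::Promise<void> that
    // resolves when it's a good time to send the next call; the RPC system
    // decides internally how many calls to keep in flight. The app just
    // awaits that promise between writes. (Assumes `data` stays alive until
    // the returned promise resolves.)
    kj::Promise<void> writeAll(ByteStream::Client stream,
                               kj::ArrayPtr<const kj::byte> data) {
      if (data.size() == 0) {
        // Done streaming. `end` is an ordinary call, so wait for its Return.
        return stream.endRequest().send().ignoreResult();
      }
      size_t n = kj::min(data.size(), size_t(64 * 1024));
      auto req = stream.writeRequest();
      req.setBytes(data.slice(0, n));
      return req.send().then(
          [stream = kj::mv(stream), rest = data.slice(n, data.size())]() mutable {
        return writeAll(kj::mv(stream), rest);
      });
    }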

> Most apps are using sandstorm-http-bridge anyway, so they're just acting
> like normal HTTP servers -- which generally write out data to the
> response stream as fast as the socket will take it. Skimming
> sendRequest() in the bridge's source, it looks like it just copies that
> data directly into the response stream. So I am confused as to what a
> "well-written" app would do here.

sandstorm-http-bridge currently only does one outstanding write RPC at a time. The code is convoluted but look at pumpWrites() -- it waits for each send() to complete before performing the next one.

Historically there was a time where it didn't implement such a limit and would just pump the whole response as fast as it got it, which led to the need to do some enforcement on the supervisor side.

-Kenton

Ian Denhardt

Nov 23, 2021, 6:01:28 PM
to Kenton Varda, Cap'n Proto, Louis Thibault
Ok, I think I get it, let me know if I have this right:

The correct thing to do is to handle congestion/flow control for
multiple calls on each object individually, using something like
the mechanisms provided by the C++ implementation's streaming
construct. This is important so that calls on different objects
originating from the same vat do not cause problems with one
another (whereas from TCP's standpoint, they're all the same stream,
so it can't help). This also provides backpressure with respect to
the receiving vat; it avoids queueing too many calls on the
connection overall.

Another check of my understanding: am I correct in thinking that in a
client implementation that's just a single-threaded loop calling
methods on one object in sequence, it would not cause a problem to
skip in-process flow control and let the TCP connection deal with it,
since there is only one stream involved anyway? Assuming, of course,
there is something like the flow limit provided by the vat on the
other side of the connection.

-Ian


Kenton Varda

Nov 23, 2021, 7:21:28 PM
to Ian Denhardt, Cap'n Proto, Louis Thibault
On Tue, Nov 23, 2021 at 5:01 PM Ian Denhardt <i...@zenhack.net> wrote:
> Ok, I think I get it, let me know if I have this right:
>
> The correct thing to do is to handle congestion/flow control for
> multiple calls on each object individually, using something like
> the mechanisms provided by the C++ implementation's streaming
> construct. This is important so that calls on different objects
> originating from the same vat do not cause problems with one
> another (whereas from TCP's standpoint, they're all the same stream,
> so it can't help). This also provides backpressure with respect to
> the receiving vat; it avoids queueing too many calls on the
> connection overall.

Right.
 
> Another check of my understanding: am I correct in thinking that in a
> client implementation that's just a single-threaded loop calling
> methods on one object in sequence, it would not cause a problem to
> skip in-process flow control and let the TCP connection deal with it,
> since there is only one stream involved anyway? Assuming, of course,
> there is something like the flow limit provided by the vat on the
> other side of the connection.

Cap'n Proto doesn't provide any backpressure from the underlying TCP connection to the app, except through streaming. If you just make a ton of calls all at once without waiting for returns, you'll bloat your memory with unsent messages. And possibly worse: if the capability bounces through a proxy, and you have a fast connection (say, a unix socket) to the proxy, but a slow connection from the proxy to the eventual destination, you'll end up bloating the proxy's memory.

-Kenton

Ian Denhardt

Nov 23, 2021, 8:30:20 PM
to Kenton Varda, Cap'n Proto, Louis Thibault
Quoting Kenton Varda (2021-11-23 19:20:50)

> On Tue, Nov 23, 2021 at 5:01 PM Ian Denhardt <i...@zenhack.net>
> wrote:
>
> Ok, I think I get it, let me know if I have this right:
> [...]
>
> Right.

Ok, great. Thanks for your patience.

> Cap'n Proto doesn't provide any backpressure from the underlying TCP
> connection to the app, except through streaming. If you just make a ton
> of calls all at once without waiting for returns, you'll bloat your
> memory with unsent messages. And possibly worse: if the capability
> bounces through a proxy, and you have a fast connection (say, a unix
> socket) to the proxy, but a slow connection from the proxy to the
> eventual destination, you'll end up bloating the proxy's memory.

This is not true of the Go implementation, which currently blocks
until the message has been written to the socket (we don't currently
treat streaming specially, but presumably we could keep the blocking
API when we add that; I don't think we even need to treat the annotation
specially, we should be able to do it for all calls). So I don't think
this applies to a scenario where both sides of the connection work like
the Go implementation. But I hadn't thought about the proxy issue
(where the proxy might be using a different implementation); thank
you for pointing that out.

-Ian

Erin Shepherd

Nov 24, 2021, 7:05:34 AM
to 'Kenton Varda' via Cap'n Proto
On Wed, 24 Nov 2021, at 02:28, Ian Denhardt wrote:
> Quoting Kenton Varda (2021-11-23 19:20:50)
> > Cap'n Proto doesn't provide any backpressure from the underlying TCP
> > connection to the app, except through streaming. If you just make a ton
> > of calls all at once without waiting for returns, you'll bloat your
> > memory with unsent messages. And possibly worse: if the capability
> > bounces through a proxy, and you have a fast connection (say, a unix
> > socket) to the proxy, but a slow connection from the proxy to the
> > eventual destination, you'll end up bloating the proxy's memory.
>
> This is not true of the Go implementation, which currently blocks
> until the message has been written to the socket (we don't currently
> treat streaming specially, but presumably we could keep the blocking
> API when we add that; I don't think we even need to treat the annotation
> specially, we should be able to do it for all calls). So I don't think
> this applies to a scenario where both sides of the connection work like
> the Go implementation. But I hadn't thought about the proxy issue
> (where the proxy might be using a different implementation); thank
> you for pointing that out.

I guess this is a matter of futures implementation: callback-based futures (like kj) often do not provide such a backpressure mechanism. Other solutions (like Go fibers or Rust's poll-based futures) can easily provide such a mechanism, though.

If a concurrency limiter were provided at the capnp-server level (i.e. allow no more than X RPCs to be in flight at once), I wonder if that would help things?
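
Something of this general shape is what I have in mind -- purely hypothetical, and agnostic about which side of the connection it lives on:

    #include <kj/async.h>
    #include <vector>

    // Hypothetical helper: allow at most `limit` calls in flight at once.
    // acquire() resolves when a slot is free; the caller sends one RPC and
    // calls done() when its Return arrives.
    class CallLimiter {
    public:
      explicit CallLimiter(size_t limit): available(limit) {}

      kj::Promise<void> acquire() {
        if (available > 0) {
          --available;
          return kj::READY_NOW;
        }
        auto paf = kj::newPromiseAndFulfiller<void>();
        waiters.push_back(kj::mv(paf.fulfiller));
        return kj::mv(paf.promise);
      }

      void done() {
        if (!waiters.empty()) {
          // Hand the freed slot directly to the next waiter.
          waiters.front()->fulfill();
          waiters.erase(waiters.begin());
        } else {
          ++available;
        }
      }

    private:
      size_t available;
      std::vector<kj::Own<kj::PromiseFulfiller<void>>> waiters;
    };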

--

As to the original topic: I wonder if a message could be added which advises the peer of limits on the number of questions, and/or their maximum total size, that may be in flight? The client might then be able to throttle itself, and the sender could then reject (immediately, with an error) any calls once that limit was exceeded. This would keep the socket unblocked for finish messages.

Kenton Varda

Nov 24, 2021, 3:57:33 PM
to Erin Shepherd, 'Kenton Varda' via Cap'n Proto
It sounds like what Go is providing today would be equivalent to a KJ API that returns two promises: one which waits for backpressure, and one which waits for the RPC return value. In principle, we could provide a new version of `send()` in C++ which provides this. But if it only recognizes backpressure from the first-hop socket, this could cause more harm than good, causing people to write apps that tend to cause proxy buffer bloat. Backpressure in Cap'n Proto really needs to be based on return messages to handle proxies correctly. This is what `-> stream` provides.

(Note we definitely wouldn't want to stall the whole KJ event loop when one connection has backpressure. Applications may be doing many concurrent tasks on the same event loop.)
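
To be concrete about what that might look like -- purely hypothetical, nothing like this exists in the C++ API today:

    #include <capnp/capability.h>

    // Hypothetical variant of send() that exposes both signals: when the
    // message has been flushed to the first-hop socket (the Go-style
    // backpressure signal) and when the Return arrives.
    template <typename Results>
    struct SentCall {
      kj::Promise<void> flushed;               // message written to the socket
      capnp::RemotePromise<Results> response;  // Return received
    };
    // Waiting only on `flushed` is the trap described above: it says nothing
    // about buffers in any proxies along the path, so real flow control
    // should still key off `response` (or just use `-> stream`).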

-Kenton
