Proposal for removing per-stream flow control

Alek Storm

unread,

Jun 8, 2012, 4:39:46 AM6/8/12

to spdy...@googlegroups.com

I initially proposed this in the massive flow control thread, and Mike asked for a separate thread and some elaboration, so here goes.

I propose to:

1) Remove the WINDOW_UPDATE frame

2) Add a new RATE_UPDATE frame, with the following format:

+------------------------------------+
|1|   version     |          9       |
+------------------------------------+
| Flags (8)  |   Length (24 bits)    |
+------------------------------------+
|X| Number of stream/rate pairs (31) |
+------------------------------------+
|X|      Stream ID (31 bits)         |
+------------------------------------+
|      Consumption rate (32 bits)    |
+------------------------------------+
|          (repeats)                 |

Consumption Rate: The current average rate that the receiver can drain the specified stream's buffer, in bytes/sec.

Sending endpoints MUST associate a consumption rate with each outgoing stream. Senders MUST NOT send data on a particular stream at an average rate greater than the stream's consumption rate, unless this would prevent any stream from sending data.

Receiving endpoints MAY send this frame at any time to update the streams' associated consumption rates on the sender.

3) Change setting ID 7 in the SETTINGS frame to SETTINGS_INITIAL_CONSUMPTION_RATE: The consumption rate for the remote endpoint to associate with new streams.

Rationale: These changes allow senders to implement a wide variety of stream scheduling algorithms, so long as they do not cause HoL blocking on the receiver, unless the only data available to send is on a stream that would cause HoL blocking (senders are not required to continue at this point; they may back off). This enables data to flow continuously without being stalled by window updates, a penalty that increases with latency. If a receiving endpoint suspects that its counterpart is not respecting its declared consumption rates, it may either accept the HoL blocking, or kill the offending stream with RST_STREAM(CANCEL).

Advantages:

- Speed. Data is *always* being sent. Round trip delays are not periodically incurred, unlike per-stream window solutions.

- Simplicity. Nearly as simple as eliminating flow control altogether, but without the HoL blocking. Does not require complex algorithms to resize windows or estimate BDP, unlike per-stream window solutions.

- Cheap on resources. No additional buffering commitments beyond the TCP connection window.


What do you guys think?


Alek

Simone Bordet

unread,

Jun 8, 2012, 5:15:11 AM6/8/12

to spdy...@googlegroups.com

Hi,

My guts reaction is that it's much more complicated to send data at a
certain rate rather than following a window.

Note that with a proper implementation, the sender can already know
the consumption rate even with v3.

While I may like the idea, I am not sure about baking it into the protocol.

Simon
--
www.webtide.com
Developer advice, services and support
from the Jetty & CometD experts.
----
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Daniel Stenberg

unread,

Jun 8, 2012, 6:13:14 AM6/8/12

to spdy...@googlegroups.com

On Fri, 8 Jun 2012, Alek Storm wrote:

> Consumption Rate: The current average rate that the receiver can drain the
> specified stream's buffer, in bytes/sec.

I see very little gain with this, and lots of problems.

For this to even begin to work sensibly you would need to explain with enough
details exactly how the speed is to be calculated - in both ends.

For a situation where the consumption rate varies, it would basically result
in the same amount of control traffic having to be sent.

For a receiver, it is much easier to know how much data it can receive next
rather than how fast it can consume/handle such data. Writing software that
can blindly assume a certain network related efficiency into the future is not
always (if ever) possible. Especially when spread out over N concurrent
streams.

If a receiver gets data at X bytes/second for a particular stream and can
consume it all, how can it know how fast it can go? Won't it then basically
always have to (slowly?) increase the rate as long as can consume it? Until it
no longer can consume it and then it has to lower the rate again...

--

/ daniel.haxx.se

Alek Storm

unread,

Jun 8, 2012, 6:45:30 AM6/8/12

to spdy...@googlegroups.com

On Fri, Jun 8, 2012 at 5:13 AM, Daniel Stenberg <dan...@haxx.se> wrote:

On Fri, 8 Jun 2012, Alek Storm wrote:

Consumption Rate: The current average rate that the receiver can drain the specified stream's buffer, in bytes/sec.

I see very little gain with this, and lots of problems.

For this to even begin to work sensibly you would need to explain with enough details exactly how the speed is to be calculated - in both ends.

For a situation where the consumption rate varies, it would basically result in the same amount of control traffic having to be sent.

Control traffic itself doesn't matter, since control frames are tiny compared to data frames. Blocking data transmission on a stream while waiting for a window update, however, is very costly - one RTT each time. This proposal eliminates it.

For a receiver, it is much easier to know how much data it can receive next rather than how fast it can consume/handle such data. Writing software that can blindly assume a certain network related efficiency into the future is not always (if ever) possible. Especially when spread out over N concurrent streams.

The discussion over on the flow control thread shows this is not the case - complex algorithms are required to determine the correct window size for a given stream at any point in time.

If a receiver gets data at X bytes/second for a particular stream and can consume it all, how can it know how fast it can go? Won't it then basically always have to (slowly?) increase the rate as long as can consume it? Until it no longer can consume it and then it has to lower the rate again...

Data does not just disappear from buffers instantaneously - it is not difficult to simply time how long it takes to consume a given chunk of data. Receivers are free to use any algorithm they wish to find the optimal consumption rate; they may increase it as quickly or slowly as they like, if they find this necessary.

Alek

--

/ daniel.haxx.se

Costin Manolache

unread,

Jun 8, 2012, 10:53:31 AM6/8/12

to spdy...@googlegroups.com

To try an example, 3 uploads that are consumed at 10Mbps. Sender is on a 1Mbps link, so he sends as much as his line allows, proxy gets the data and sends it up.

Outage happens - consumer rate is 100kbps, proxy sends an update to all 'upload streams', sender start sending at 100kbps.

What happens with the buffers that are already in flight (BDP) or received for the interval between slowdown and update ? What if they don't fit in the proxy memory - a proxy may handle 10000+ streams. You are in the same situation with window per-stream control.

I disagree on 'simplicity' - figuring out the 'data rate' is quite complicated, it's not really a constant.

I would suggest at least replacing 'consumption rate' with 'buffer stats', which is far more deterministic and easy.

Receiver will send 2 ints: available buffer and used buffer ( it can be packed in 32 bits - for example 16 bit each, in kB).

Available buffer will be based on total RAM and number of active streams - can be used as a window, sender

should not send more than that.

'Used buffer' is a proxy for data rate, sender can calculate consumption rate from 2 successive 'stats' packages.

But I like the core of the proposal: provide data to the sender, let it make the decision on how much to send based

on stat data from receiver. Finding the right stats to send is tricky, I don't think 'consumption rate' is going to fly.

Costin



Alek

Mike Belshe

unread,

Jun 8, 2012, 11:41:36 AM6/8/12

to spdy...@googlegroups.com

On Fri, Jun 8, 2012 at 1:39 AM, Alek Storm <alek....@gmail.com> wrote:

I initially proposed this in the massive flow control thread, and Mike asked for a separate thread and some elaboration, so here goes.

I propose to:

1) Remove the WINDOW_UPDATE frame

2) Add a new RATE_UPDATE frame, with the following format:

+------------------------------------+
|1|   version     |          9       |
+------------------------------------+
| Flags (8)  |   Length (24 bits)    |
+------------------------------------+
|X| Number of stream/rate pairs (31) |
+------------------------------------+
|X|      Stream ID (31 bits)         |
+------------------------------------+
|      Consumption rate (32 bits)    |
+------------------------------------+
|          (repeats)                 |

Consumption Rate: The current average rate that the receiver can drain the specified stream's buffer, in bytes/sec.

OK - I see what you're talking about now. I guess you're trying to force a form of pacing.

Sending endpoints MUST associate a consumption rate with each outgoing stream. Senders MUST NOT send data on a particular stream at an average rate greater than the stream's consumption rate, unless this would prevent any stream from sending data.

Receiving endpoints MAY send this frame at any time to update the streams' associated consumption rates on the sender.

3) Change setting ID 7 in the SETTINGS frame to SETTINGS_INITIAL_CONSUMPTION_RATE: The consumption rate for the remote endpoint to associate with new streams.

Rationale: These changes allow senders to implement a wide variety of stream scheduling algorithms, so long as they do not cause HoL blocking on the receiver, unless the only data available to send is on a stream that would cause HoL blocking (senders are not required to continue at this point; they may back off). This enables data to flow continuously without being stalled by window updates, a penalty that increases with latency. If a receiving endpoint suspects that its counterpart is not respecting its declared consumption rates, it may either accept the HoL blocking, or kill the offending stream with RST_STREAM(CANCEL).

Advantages:


- Speed. Data is *always* being sent. Round trip delays are not periodically incurred, unlike per-stream window solutions.

This is only true if you know what consumption rates to provide...

From the proxy side, the buffering exposure is now dynamic, because you didn't provide an absolute limit and it will cost you a half-round-trip to close the rate to 0. e.g. If you specify too high of a rate ("I can suport 10Mbps"), by the time you realize you're buffering 1Mb, its going to take a half RT to tell the other guy to stop sending at all.



- Simplicity. Nearly as simple as eliminating flow control altogether, but without the HoL blocking. Does not require complex algorithms to resize windows or estimate BDP, unlike per-stream window solutions.

I see this as substantially more complex. With size based limits, its a pretty easy check. With rate based limits, you're now requiring all endpoints to clock IOs to certain rates, which is far more subjective than a window size. The hardest problem, which is just figuring out what the limits should be (regardless of whether specified as a rate or number of bytes) is the same in both cases.



- Cheap on resources. No additional buffering commitments beyond the TCP connection window.

This is just not true nor is it an advantage.

The proxy has to read from the socket buffer. The problem comes up when the proxy can't pass that data to the next hop (because the next hop is running slow/down/something bad) - it now has to buffer. You simply cannot leave it in the socket buffer, as that would mean that any single stream having a slow backend would block *all* streams.

Mike

Alek Storm

unread,

Jun 8, 2012, 5:54:06 PM6/8/12

to spdy...@googlegroups.com

Then don't give the sender the wrong consumption rate. If you say you can support 10Mbps, then you'd better have a 10Mb buffer that you can drain in 1s, a 1Mb buffer that you can drain in 0.1s, or something in between. If you've somehow overestimated your consumption rate, the worst-case scenario is that TCP flow control kicks in and you HoL-block.

- Simplicity. Nearly as simple as eliminating flow control altogether, but without the HoL blocking. Does not require complex algorithms to resize windows or estimate BDP, unlike per-stream window solutions.
I see this as substantially more complex. With size based limits, its a pretty easy check. With rate based limits, you're now requiring all endpoints to clock IOs to certain rates, which is far more subjective than a window size. The hardest problem, which is just figuring out what the limits should be (regardless of whether specified as a rate or number of bytes) is the same in both cases.

They're not clocking IO, they're clocking whatever process consumes the data, which I remain convinced will not be difficult. If figuring out the limits is equally complex in both schemes (which I now realize is more subjective than I'd thought), then my scheme is still superior in that it does not incur periodic RTTs while window updates are sent.

- Cheap on resources. No additional buffering commitments beyond the TCP connection window.
This is just not true nor is it an advantage.

The proxy has to read from the socket buffer. The problem comes up when the proxy can't pass that data to the next hop (because the next hop is running slow/down/something bad) - it now has to buffer. You simply cannot leave it in the socket buffer, as that would mean that any single stream having a slow backend would block *all* streams.

Proxies have the easiest job of all - they just forward consumption rates from downstream receivers to upstream senders. The only time they would ever have to substantially buffer incoming data is when a consumption rate suddenly, drastically changes - and they're still protected by the TCP window. Or have I misunderstood you?

Alek

Alek Storm

unread,

Jun 8, 2012, 6:08:02 PM6/8/12

to spdy...@googlegroups.com

If the data is not being consumed (in the case of a proxy, this means the data is being forwarded downstream), then TCP flow control will kick in. No data will be lost. In any case, outages will be relatively rare events.

I disagree on 'simplicity' - figuring out the 'data rate' is quite complicated, it's not really a constant.

I would suggest at least replacing 'consumption rate' with 'buffer stats', which is far more deterministic and easy.

Receiver will send 2 ints: available buffer and used buffer ( it can be packed in 32 bits - for example 16 bit each, in kB).
Available buffer will be based on total RAM and number of active streams - can be used as a window, sender

should not send more than that.
'Used buffer' is a proxy for data rate, sender can calculate consumption rate from 2 successive 'stats' packages.

Would the sender be allowed to send more data than the "available buffer" would allow, if they determine that the calculated consumption rate for that stream can handle it? If it's so easy for senders to calculate consumption rate, why is it difficult for receivers?

Alek

Antonio Vicente

unread,

Jun 8, 2012, 8:36:13 PM6/8/12

to spdy...@googlegroups.com

Hi Alek,

On Fri, Jun 8, 2012 at 4:39 AM, Alek Storm <alek....@gmail.com> wrote:

I initially proposed this in the massive flow control thread, and Mike asked for a separate thread and some elaboration, so here goes.

I propose to:

1) Remove the WINDOW_UPDATE frame

2) Add a new RATE_UPDATE frame, with the following format:

+------------------------------------+
|1|   version     |          9       |
+------------------------------------+
| Flags (8)  |   Length (24 bits)    |
+------------------------------------+
|X| Number of stream/rate pairs (31) |
+------------------------------------+
|X|      Stream ID (31 bits)         |
+------------------------------------+
|      Consumption rate (32 bits)    |
+------------------------------------+
|          (repeats)                 |

Consumption Rate: The current average rate that the receiver can drain the specified stream's buffer, in bytes/sec.

Sending endpoints MUST associate a consumption rate with each outgoing stream. Senders MUST NOT send data on a particular stream at an average rate greater than the stream's consumption rate, unless this would prevent any stream from sending data.

Receiving endpoints MAY send this frame at any time to update the streams' associated consumption rates on the sender.

3) Change setting ID 7 in the SETTINGS frame to SETTINGS_INITIAL_CONSUMPTION_RATE: The consumption rate for the remote endpoint to associate with new streams.

Rationale: These changes allow senders to implement a wide variety of stream scheduling algorithms, so long as they do not cause HoL blocking on the receiver, unless the only data available to send is on a stream that would cause HoL blocking (senders are not required to continue at this point; they may back off). This enables data to flow continuously without being stalled by window updates, a penalty that increases with latency. If a receiving endpoint suspects that its counterpart is not respecting its declared consumption rates, it may either accept the HoL blocking, or kill the offending stream with RST_STREAM(CANCEL).

Advantages:


- Speed. Data is *always* being sent. Round trip delays are not periodically incurred, unlike per-stream window solutions.

On what interval do we compute the average consumption rate? Is it legal for clients to send nothing for X-1 secs and then transmit at X * comsumption rate every X seconds?

Consumption rates are also much harder to enforce by the receiver; network blips could cause jitter at the receiver which would make it think that data is coming in slower or faster than the allowed rate during various periods of time.

A consumption rate doesn't prevent the receiver from needing more memory than it can afford, therefore is less useful than the current scheme. If we can't guarantee that we can read all data from the socket into buffers, HOL

blocking is unavoidable at the receiver side since the only recourse left is to stop reading from the socket.

Another problem with consumption rates is that when when consumption goes down because something goes into overload and you are forced to do more work to push back on clients.

Required work going up as you reach overload situations is dangerous, as it could lead to system collapse and failure.

- Simplicity. Nearly as simple as eliminating flow control altogether, but without the HoL blocking. Does not require complex algorithms to resize windows or estimate BDP, unlike per-stream window solutions.

- You're trading an explicit scheme for an implicit one.

- You're also make flow control violations unenforceable.

- You trade a small reduction of bytes in the wire, but you require the client and server to add pacing logic to where there was none before.

I really don't think your proposal is simpler than the current scheme.

You also mention BDP estimation. I'm not sure how much that helps: either BDP product is larger than what you can do and you fall back to saying 'what you can do' in order to not compromise the system,

or the BDP product is lower than 'what you can do' in which saying the BDP and what you can do has the same effect since the other end won't be able to reach that limit.

- Cheap on resources. No additional buffering commitments beyond the TCP connection window.

You still need to buffer as you read from the socket. You can't guarantee that you'll be able to write + flush everything as you read from the client side.

Buffering can be dynamic in both cases, so there is no real need to be wasteful with resources.

In both cases servers and clients are likely to end up over-commited at times in order to remain efficient due to large difference between average and peek resource usage rates on individual streams or connections.

The receivers last recourse is always to stop reading from sockets to protect itself, but doing that causes HOL blocking and greatly reduces the capabilities of the system. Great implementations will likely need to balance

between over-commitment and chances for HOL blocking.


What do you guys think?

I admit I haven't been able to look at the 200+ messages in the other flow control thread. I know that the flow control in SPDY is by no means perfect, but I do believe is close to the simplest thing that could possibly work.

The current scheme is very similar to other flow control algorithms in use today, like the one available in the SSH protocol, as described in section 5.2 of RFC 4252. There are of course some differences, like the notion of

a global default that applies to all created streams, which has the purpose of allowing clients to start sending POST data before SYN_REPLY + WINDOW_UPDATE. We did think about flow control systems that based on

consumption rates but decided against them due to complexity concerns, and schemes that allow receivers to discard data and request retransmits but decided against it due to the high expense we would incur if we required

senders to support for both userspace and tcp level for retransmits.

I think that the real solution to avoiding flow control stalls on large bandwidth-delay product connections is to just start with larger windows, or send updates to grow the window much more quickly as the receiver learns

about the senders immediate intentions and the receivers current capabilities. In other words, once we should consider updating windows as soon as we see a content-length in SYN_STREAM and SYN_REPLY frames

if the system can sink in significant resources to the processing of that stream. If large BDP is detected on a connection, endpoints should consider tweaking the initial window to compensate.

-antonio

Mike Belshe

unread,

Jun 8, 2012, 9:01:08 PM6/8/12

to spdy...@googlegroups.com

Just because you have a 10Mbps line doesn't mean you have a 10Mbps consumption rate.

- Simplicity. Nearly as simple as eliminating flow control altogether, but without the HoL blocking. Does not require complex algorithms to resize windows or estimate BDP, unlike per-stream window solutions.
I see this as substantially more complex. With size based limits, its a pretty easy check. With rate based limits, you're now requiring all endpoints to clock IOs to certain rates, which is far more subjective than a window size. The hardest problem, which is just figuring out what the limits should be (regardless of whether specified as a rate or number of bytes) is the same in both cases.
They're not clocking IO, they're clocking whatever process consumes the data, which I remain convinced will not be difficult.

The sender is the one enforcing the rate limit dictated by the receiver. So the sender has to clock the rate at which it sends. (BTW - if you strictly adhere to a rate, you could end up sending partial packets)

If figuring out the limits is equally complex in both schemes (which I now realize is more subjective than I'd thought), then my scheme is still superior in that it does not incur periodic RTTs while window updates are sent.
- Cheap on resources. No additional buffering commitments beyond the TCP connection window.
This is just not true nor is it an advantage.

The proxy has to read from the socket buffer. The problem comes up when the proxy can't pass that data to the next hop (because the next hop is running slow/down/something bad) - it now has to buffer. You simply cannot leave it in the socket buffer, as that would mean that any single stream having a slow backend would block *all* streams.
Proxies have the easiest job of all - they just forward consumption rates from downstream receivers to upstream senders. The only time they would ever have to substantially buffer incoming data is when a consumption rate suddenly, drastically changes - and they're still protected by the TCP window. Or have I misunderstood you?

In the case of a SPDY -to- SPDY gateway, the destination server could implement a rate throttle lower than the proxy's rate throttle; in which case the proxy would have to buffer.

I don't think I'm going to convince you, but rate throttling is really really hard. You should read some of the research around transport level pacing.

Mike

Alek Storm

unread,

Jun 11, 2012, 4:05:16 AM6/11/12

to spdy...@googlegroups.com

Fair enough. Since my proposal's met near-universal opposition, I suppose I'll have to fall back to supporting the system of stream+session windows that discussion in the other thread seems to have coalesced around. It has its flaws (as does mine, as others have rightly pointed out), but none of them are fatal. In particular, I'd love to solve the problem of maximum bandwidth utilization (point 1.1 from Greg's excellent writeup) - this proposal was an attempt at doing so.