Thinking about flow control


Roberto Peon

May 17, 2012, 4:44:36 PM
to spdy...@googlegroups.com
All--

We've been having a number of private conversations about flow control.
The flow control that has been deployed in SPDY/3 seems to work, but it has some definite flaws and suboptimalities.
Basically, it only allows for per-stream flow control.

Per-stream flow control is certainly necessary; without it we'd have head-of-line blocking or infinite-buffering requirements at proxies, which is untenable.
On its own, however, if we are still worried about infinite buffering and wish to increase the stream limit, per-stream flow control isn't enough.

So, we probably need per-connection flow-control so that we can manage the total amount of buffering necessary, while still allowing for large enough per-stream flow-control windows to ensure that we can at least fill up the TCP connection.
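For concreteness, a minimal sketch of what a sender obeying both levels might look like (a sketch only; the names are invented, and the connection-level window is hypothetical at this point):

# Sketch: a sender debits both a per-stream and a per-connection window
# before emitting a DATA frame. All names are invented.
class FlowControlledSender:
    def __init__(self, session_window, initial_stream_window):
        self.session_window = session_window      # shared across all streams
        self.initial_stream_window = initial_stream_window
        self.stream_windows = {}                  # stream_id -> bytes allowed

    def open_stream(self, stream_id):
        self.stream_windows[stream_id] = self.initial_stream_window

    def sendable(self, stream_id, size):
        # A DATA frame may go out only if BOTH windows have room.
        return min(self.stream_windows[stream_id], self.session_window) >= size

    def on_data_sent(self, stream_id, size):
        self.stream_windows[stream_id] -= size
        self.session_window -= size

    def on_window_update(self, stream_id, delta):
        if stream_id is None:                     # connection-level update
            self.session_window += delta
        else:
            self.stream_windows[stream_id] += delta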


We're hoping to get comments in over the next two weeks (at most), and then begin spec revisions and possibly experimentation with new flow control in (and for) SPDY/4.

So... comments? Ideas?
-=R

Adam Langley

May 17, 2012, 4:47:38 PM
to spdy...@googlegroups.com
On Thu, May 17, 2012 at 4:44 PM, Roberto Peon <fe...@google.com> wrote:
> So, we probably need per-connection flow-control so that we can manage the
> total amount of buffering necessary

Why is TCP's per-connection flow control insufficient?


Cheers

AGL

Patrick McManus

May 17, 2012, 4:54:33 PM
to spdy...@googlegroups.com
one reason is that spdy flow control only covers data frames.. so a server could protect itself from too much concurrent upload without impacting its ability to serve GETs etc..



Patrick McManus

May 17, 2012, 5:19:13 PM
to spdy...@googlegroups.com
On Thu, 2012-05-17 at 13:44 -0700, Roberto Peon wrote:

>
>
> We're hoping to get comments in over the next two weeks (at most), and
> then begin spec revisions and possibly experimentation with new flow
> control in (and for) SPDY/4.
>
>
> So... comments? Ideas?
> -=R

I'm certainly in favor of doing something, as the new flow control parameters used with the small windows currently on google.com definitely slow down certain scenarios. If an aggregate session window allows the server to safely use larger per-stream windows, then that's a good fix.

I'd suggest that the new global window be able to both shrink and expand (presumably via an explicit settings frame analogous to the initial window), and that per-stream window_updates be changed to allow shrinking of the window, not just expanding it.

use case for that - it's hard to know what the BDP really is, so if you want fully streamed transfer it's tempting to set the window extremely large so spdy flow control is not the choke point.. if for some reason the local data sink stops accepting data temporarily you can slam the window shut with a negative delta and only actually end up buffering ~BDP of data instead of "extremely large".

-Patrick

Roberto Peon

May 17, 2012, 5:27:40 PM
to spdy...@googlegroups.com
Agreed. In all cases where we overestimate, we may cause HOL blocking, but that is always true so long as we're on top of TCP.

One other thing I was thinking about: we could have default window sizes which differ based on stream priority.
We'd react within hopefully one RTT, so the question there is: is it worth the additional complexity, given the addition of the connection-based flow control?

-=R
 



Tatsuhiro Tsujikawa

May 17, 2012, 9:35:41 PM
to spdy...@googlegroups.com


2012/05/18 6:19 "Patrick McManus" <mcm...@ducksong.com>:

Do you mean use only per-connection flow control?
Then +1.
It achieves control of the total amount of buffering, which is the very issue flow control was introduced to address.

Patrick McManus

May 18, 2012, 7:54:30 AM
to spdy...@googlegroups.com

> Do you mean use only per-connection flow control?
> Then +1.
> It achieves control of the total amount of buffering, which is the
> very issue flow control was introduced to address.
>

No - I'm suggesting we need both per-session and per-stream flow
control.

We need per stream flow control because different data sinks in the set
of multiplexed streams may consume data at a different rate, and you
want to be able to stop buffering one without impacting the other.

On the client side this could be a plugin vs a core rendering component - or even a media component that has had the pause button pushed. A proxy is obviously connected in diverse ways (with diverse bandwidth) on the back side while multiplexing on the front, and an origin server might be multiplexing one receiver that is writing to a big data warehouse and one that is using its input for lookups in a lightweight in-memory hash.

If we add a session-wide window, then we can do this stream control with the windowing we have, or honestly we could do it with per-stream txon/txoff, and that might lead to better quality implementations.


Simone Bordet

May 18, 2012, 8:12:08 AM
to spdy...@googlegroups.com
Hi,

On Thu, May 17, 2012 at 10:44 PM, Roberto Peon <fe...@google.com> wrote:
> All--
>
> We've been having a number of private conversations about flow control.
> The flow control that has been deployed in SPDY/3 seems to work, but it has
> some definite flaws and suboptimalities.
> Basically, it only allows for per-stream flow control.
>
> Per-stream flow control is certainly necessary; without it we'd have
> head-of-line blocking or infinite-buffering requirements at proxies, which
> is untenable.

Yes.

> On its, own, however, if we are still worried about infinite buffering and
> wish to increase the stream limit, per-stream flow control isn't enough.

I am not sure I understand this paragraph. Can you expand?

The current SETTINGS for the initial window size, when received, applies to all active streams, so it's already more at the connection level than at the stream level (i.e. you cannot update the initial window size of only one stream).

Thanks,

Simon
--
http://cometd.org
http://intalio.com
http://bordet.blogspot.com
----
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless.   Victoria Livschitz

Peter Lepeska

May 18, 2012, 9:39:20 AM
to spdy...@googlegroups.com
"Essentially, the proxy needs a way to advertise the amount of room it has in its outgoing buffers, which it can't easily do with TCP's flow-control."

Yes it can. When a proxy server's buffer grows to a certain threshold, it just stops posting buffers to the receive socket connected to the web server. When this happens the TCP stack buffer fills up (since no packets are being pulled off by the user mode process), and TCP starts sending smaller receive windows in its ACKs to the web server, which slows down its send rate. This seems fine to me.
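A rough sketch of the mechanism being described, assuming a blocking-socket relay loop (the drain callback is hypothetical):

# Sketch: TCP-level backpressure. While the outgoing buffer is over a
# threshold the proxy stops calling recv(); the kernel receive buffer
# then fills and TCP advertises a smaller window to the web server.
import socket

HIGH_WATER = 256 * 1024  # invented threshold

def relay(upstream: socket.socket, drain) -> None:
    pending = bytearray()
    while True:
        pending = drain(pending)      # hypothetical: writes toward the client,
                                      # returns whatever it couldn't send yet
        if len(pending) >= HIGH_WATER:
            continue                  # skip recv(): the TCP window shrinks
        chunk = upstream.recv(16 * 1024)
        if not chunk:
            return
        pending.extend(chunk)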

I'm still not understanding what problem we are trying to solve by adding another layer of flow control to the TCP session.

Peter

On Thu, May 17, 2012 at 5:00 PM, Roberto Peon <fe...@google.com> wrote:
TCP's flow-control alone would allow HOL blocking to occur in cases where the proxy->server connections were full for only some streams because the userspace process of the server doesn't control advertisements of its ingress window.
Essentially, the proxy needs a way to advertise the amount of room it has in its outgoing buffers, which it can't easily do with TCP's flow-control.

-=R

Ryan Hamilton

May 18, 2012, 11:34:50 AM
to spdy...@googlegroups.com
On Fri, May 18, 2012 at 6:39 AM, Peter Lepeska <bizzb...@gmail.com> wrote:
"Essentially, the proxy needs a way to advertise the amount of room it has in its outgoing buffers, which it can't easily do with TCP's flow-control."

Yes it can. When a proxy server's buffer grows to a certain threshold, it just stops posting buffers to the receive socket connected to the web server. When this happens the TCP stack buffer fills up (since no packets are being pulled off by the user mode process), and TCP starts sending smaller receive windows in its ACKs to the web server, which slows down its send rate. This seems fine to me.

I'm still not understanding what problem we are trying to solve by adding another layer of flow control to the TCP session.

Imagine a browser has a SPDY session open to a proxy.  In this session, imagine that there is a large POST upload in progress.  Further, imagine that the server that the proxy needs to relay the POST data to is slower than the client.  At this point the proxy needs to either buffer indefinitely (which clearly does not scale) or it needs to pause the upload.  If the only flow control knob at the proxy's disposal is at the TCP level, it can clearly pause the entire session.  However, this will prevent any other streams from making any progress.  You might be surprised how often this situation comes up.

You can also spin the example around, as Patrick did earlier, and imagine that the server is using a stream to send streaming media to the client...  Same potential problem.

Cheers,

Ryan

Simone Bordet

May 18, 2012, 12:25:17 PM
to spdy...@googlegroups.com
Hi,

On Fri, May 18, 2012 at 5:34 PM, Ryan Hamilton <r...@google.com> wrote:
>
>
> On Fri, May 18, 2012 at 6:39 AM, Peter Lepeska <bizzb...@gmail.com> wrote:
>>
>> "Essentially, the proxy needs a way to advertise the amount of room it has
>> in its outgoing buffers, which it can't easily do with TCP's flow-control."
>>
>> Yes it can. When a proxy server's buffer grows to a certain threshold, it
>> just stops posting buffers to the receive socket connected to the web
>> server. When this happens the TCP stack buffer fills up (since no packets
>> are being pulled off by the user mode process), and TCP starts sending
>> smaller receive windows in its ACKs to the web server, which slows down its
>> send rate. This seems fine to me.
>>
>> I'm still not understanding what problem we are trying to solve by adding
>> another layer of flow control to the TCP session.
>
>
> Imagine a browser has a SPDY session open to a proxy.  In this session,
> imagine that there is a large POST upload in progress.  Further, imagine
> that the server that the proxy needs to relay the POST data to is slower
> than the client. At this point the proxy needs to either buffer
> indefinitely (which clearly does not scale) or it needs to pause the upload.
>  If the only flow control knob at the proxy's disposal is at the TCP level,
> it can clearly pause the entire session.  However, this will prevent any
> other streams from making any  progress.  You might be surprised how often
> this situation comes up.

I do not follow. The proxy should flow-control the browser.

> You can also spin the example around, as Patrick did earlier, and imagine
> that the server is using a stream to send streaming media to the client...
>  Same potential problem.

Same solution: the proxy flow-controls the server.

Back to upload example: browser sends a window-sized data frame, and
can't send more until it gets a window update.
Proxy receives the data frame, forwards to upstream server.
If the server is SPDY aware, the proxy will get a window update, and
upon receiving that, it sends the window update to the browser, which
will send another data frame.
If the server is not SPDY aware, the proxy writes to it and only when
the write is fully completed (to leverage TCP backpressure), then the
proxy sends a window update to the browser.

Similarly for server to client.
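A sketch of the relay discipline just described, with invented helper names; the key point is that credit goes back to the browser only once the upstream write has actually completed:

# Sketch: the proxy withholds WINDOW_UPDATE to the browser until the
# upstream has accepted the bytes, so upstream backpressure propagates.
def on_data_frame(stream, frame):
    if stream.upstream_is_spdy:
        stream.upstream.send_data(frame.payload)
        # Credit for the browser is issued later, when the upstream
        # server sends its own WINDOW_UPDATE (see on_upstream_update).
    else:
        stream.upstream_socket.sendall(frame.payload)  # blocks until accepted
        stream.browser.send_window_update(stream.id, len(frame.payload))

def on_upstream_update(stream, delta):
    # SPDY-aware upstream freed buffer space: pass the credit on.
    stream.browser.send_window_update(stream.id, delta)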

I feel I am missing something, but can't see it?

Ryan Hamilton

May 18, 2012, 12:33:22 PM
to spdy...@googlegroups.com
Perhaps I was confused.  I was responding to your question:

> I'm still not understanding what problem we are trying to solve by adding
> another layer of flow control to the TCP session.

To do this, I presented two situations where a browser talking to a proxy would want to stop a single stream in a SPDY session but allow other SPDY streams to progress.  TCP-only flow control does not provide this facility.  However, you replied to my message with this:

> If the server is SPDY aware, the proxy will get a window update, and
> upon receiving that, it sends the window update to the browser, which
> will send another data frame.
> If the server is not SPDY aware, the proxy writes to it and only when
> the write is fully completed (to leverage TCP backpressure), then the
> proxy sends a window update to the browser.

Both of these examples include WINDOW_UPDATE frames which are, of course, at the SPDY layer, not the TCP layer.  I think I don't understand the question you are asking.  Do you agree with the need for a particular stream in a SPDY session to be paused while others can proceed?

Cheers,

Ryan

 

Simone Bordet

May 18, 2012, 12:44:06 PM
to spdy...@googlegroups.com
Hi,

On Fri, May 18, 2012 at 6:33 PM, Ryan Hamilton <r...@google.com> wrote:
> Perhaps I was confused.  I was responding to your question:
>
>> I'm still not understanding what problem we are trying to solve by adding
>> another layer of flow control to the TCP session.

Was not me, was Peter :)

> To do this, I presented two situations where a browser talking to a proxy
> would want to stop a single stream in a SPDY session but allow other SPDY
> streams to progress.  TCP-only flow control does not provide this facility.

Sure.

>  However, you replied to my message with this:
>
>> If the server is SPDY aware, the proxy will get a window update, and
>> upon receiving that, it sends the window update to the browser, which
>> will send another data frame.
>> If the server is not SPDY aware, the proxy writes to it and only when
>> the write is fully completed (to leverage TCP backpressure), then the
>> proxy sends a window update to the browser.
>
> Both of these examples include WINDOW_UPDATE frames which are, of course, at
> the SPDY layer, not the TCP layer.  I think I don't understand the question
> you are asking?  Do you agree with the need for a particular stream in a
> SPDY session to be paused which others can proceed?

I agree, and that's what the current flow-control is for.

I do not clearly see the case for a SPDY session level flow control,
in addition to SPDY stream level flow control.
See also my other email in reply to Roberto asking for more details.

We just want to be part of the loop: we can't comment if we don't
understand the use case :)

Thanks,

Ryan Hamilton

May 18, 2012, 1:05:53 PM
to spdy...@googlegroups.com
On Fri, May 18, 2012 at 9:44 AM, Simone Bordet <sbo...@intalio.com> wrote:
Hi,

On Fri, May 18, 2012 at 6:33 PM, Ryan Hamilton <r...@google.com> wrote:
> Perhaps I was confused.  I was responding to your question:
>
>> I'm still not understanding what problem we are trying to solve by adding
>> another layer of flow control to the TCP session.

Was not me, was Peter :)

Doh!  Well, that explains why I was confused :>  Sorry about that.  I'll let the others comment further on the session-level flow control debate.

Cheers,

Ryan

Patrick McManus

May 18, 2012, 1:09:32 PM
to spdy...@googlegroups.com
On Fri, 2012-05-18 at 18:44 +0200, Simone Bordet wrote:

> I do not clearly see the case for a SPDY session level flow control,
> in addition to SPDY stream level flow control.

My understanding is that giving every stream a window equal to
session-buffers-available represents a scary overcommitment for the
server if every stream decided to utilize it at the same time. It also
creates an unwelcome incentive to minimize the number of parallel
streams.

Likewise, dividing the session buffers available into small shares for
each stream often results in some streams easily running out of window
space while other streams waste their allocations.

You'll see that google.com has been using small 12KB initial windows -
which are too small to allow full rate uploads for many common BDPs. The
flow control proposal arises out of the need to fix that bottleneck
while still informing the client of the server's available bufferspace
(which is presumably > 12KB).

Having 2 levels of flow control lets you separate the concerns: the
session value is about total buffers available, the per-stream value is
about letting different streams proceed at different rates (and that's
why I think it can be done with xon/xoff in the presence of a session
window).

-Patrick

Tatsuhiro Tsujikawa

May 18, 2012, 1:14:25 PM
to spdy...@googlegroups.com
Thank you for answering.
I understand that stream sinks may have different consumption rates.
Then we need a way to tell the other endpoint the initial window size for each stream, no?
In the SPDY/3 spec, the SETTINGS frame sets the initial window size for all streams in a session.
One good point of introducing per-session flow control is that we can use a larger initial window size per stream, so that flow control does not slow down the transfer.
If an endpoint knows that a particular stream sink is slow, it has to tell the other endpoint about it by shrinking the window size for that stream.

Ryan Hamilton

May 18, 2012, 1:17:37 PM
to spdy...@googlegroups.com
Right, that is what the WINDOW_UPDATE frame is for.  It changes the window size for a particular stream.

Cheers,

Ryan

Tatsuhiro Tsujikawa

May 18, 2012, 1:25:05 PM
to spdy...@googlegroups.com
But WINDOW_UPDATE's delta-window-size only allows positive values:

Delta-Window-Size: The additional number of bytes that the sender can
transmit in addition to existing remaining window size. The legal
range for this field is 1 to 2^31 - 1 (0x7fffffff) bytes.

My concern is that in SPDY/3, all streams have the same initial window size, for example 64KB.
We can increase the window size with WINDOW_UPDATE (I'm not sure we can exceed the initial window size by sending WINDOW_UPDATE), but we cannot decrease the window size with that frame.

Tatsuhiro

> Cheers,
>
> Ryan
>

William Chan (陈智昌)

May 18, 2012, 1:26:39 PM
to spdy...@googlegroups.com
There is no negative delta currently in the SPDY/3 spec:
"Delta-Window-Size: The additional number of bytes that the sender can transmit in addition to existing remaining window size. The legal range for this field is 1 to 2^31 - 1 (0x7fffffff) bytes."
 



Mike Belshe

May 18, 2012, 1:30:41 PM
to spdy...@googlegroups.com
Is there data that accompanies this claim?

I'm mostly siding with Langley's question - that TCP's flow control should work for the overall layer.  It's simple and should suffice just fine.  It sounds like you're trying to flow control between multiple systems now (e.g. client -> proxy -> backend server).  This is a mistake - in the end, this will be a mistake - it's over-optimizing for the proxy at the expense of the endpoints, because the only way to do this is to add round trips of latency onto the user.

I'll again caution against thinking flow control is easy.  It sounds easy, but it's complicated, with really subtle implications.

Mike

William Chan (陈智昌)

May 18, 2012, 1:32:32 PM
to spdy...@googlegroups.com
Just to clarify, you don't have to *shrink* the window size per se, but just not send more WINDOW_UPDATEs, or send ones with smaller deltas.
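In sketch form, assuming a receiver that tracks consumed-but-uncredited bytes: the window is never shrunk explicitly, credit is simply handed back more slowly (all names invented):

# Sketch: a receiver "shrinks" a stream's effective window without any
# negative delta by delaying the WINDOW_UPDATEs it returns.
def on_data_consumed(stream, nbytes, send_window_update):
    stream.uncredited += nbytes
    # Only return credit when the sink keeps up; a slow sink just sees
    # its window drain and the sender stalls on whatever is outstanding.
    if stream.sink_keeping_up and stream.uncredited >= stream.update_threshold:
        send_window_update(stream.id, stream.uncredited)
        stream.uncredited = 0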

Patrick McManus

May 18, 2012, 1:37:49 PM
to spdy...@googlegroups.com
On Fri, 2012-05-18 at 10:26 -0700, William Chan (陈智昌) wrote:
>
> I'd suggest that the new global window can both shrink and
> expand
> (presumably via an explicit settings frame analogous to
> initial window),
> and that per stream window_updates be changed to allow
> shrinking of the
> window not just expanding it.
>
> use case for that - its hard to know what the BDP really is,
> so if you
> want fully streamed transfer its tempting to set the window
> extremely
> large so spdy flow control is not the choke point.. if for
> some reason
> the local data sink stops accepting data temporarily you can
> slam the
> window shut with a negative delta and only actually end up
> buffering
> ~BDP of data instead of "extremely large".
>
>
> There is no negative delta currently in the SPDY/3 spec:
> "Delta-Window-Size: The additional number of bytes that the sender can
> transmit in addition to existing remaining window size. The legal
> range for this field is 1 to 2^31 - 1 (0x7fffffff) bytes."
>

right! that's why I said.. "window_update be changed to allow
shrinking".. :)

But I'm now thinking that, assuming a new per-session window, the
per-stream windows can be replaced with simpler per-stream xon/xoff
notifications. You don't have to worry about sizing those to network
conditions, so it will be more robust.
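A minimal sketch of the sender side of that idea; the XON/XOFF frame types are hypothetical and exist in no SPDY draft:

# Sketch: per-stream xon/xoff in place of per-stream windows. The
# session window still bounds total buffering; xoff is advisory.
def can_send(stream, session_window, size):
    return not stream.xoffed and session_window >= size

def on_control_frame(stream, frame):
    if frame.type == "XOFF":
        stream.xoffed = True     # stop sending on this stream asap
    elif frame.type == "XON":
        stream.xoffed = False    # resume; the session window still applies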





Mike Belshe

May 18, 2012, 1:38:40 PM
to spdy...@googlegroups.com
On Fri, May 18, 2012 at 10:09 AM, Patrick McManus <mcm...@ducksong.com> wrote:
On Fri, 2012-05-18 at 18:44 +0200, Simone Bordet wrote:

> I do not clearly see the case for a SPDY session level flow control,
> in addition to SPDY stream level flow control.

My understanding is that giving every stream a window equal to
session-buffers-available represents a scary overcommitment for the
server if every stream decided to utilize it at the same time. It also
creates an unwelcome incentive to minimize the number of parallel
streams.

Likewise, dividing the session buffers available into small shares for
each stream often results in some streams easily running out of window
space while other streams waste their allocations.

You'll see that google.com has been using small 12KB initial windows -
which are too small to allow full rate uploads for many common BDPs. The
flow control proposal arises out of the need to fix that bottleneck
while still informing the client of the server's available bufferspace
(which is presumably > 12KB).

The 64KB default is supposed to make flow control so rare that it almost never gets in the way while also providing a reasonable backstop.

One thing I hate about SPDY flow control is it gives too much control to the proxy (similar to how we don't want proxies to be able to turn off compression at the user's expense).   I hope Chrome and Firefox implement a minimum window size clamp of 32KB or something to prevent these shenanigans.

Mike

William Chan (陈智昌)

May 18, 2012, 1:39:34 PM
to spdy...@googlegroups.com
Does this work? How do you know they've received the xoff notification? Do we need sequence numbers in streams? How do you detect a broken peer so you can send them an error?
 

Patrick McManus

May 18, 2012, 1:53:47 PM
to spdy...@googlegroups.com
On Fri, 2012-05-18 at 10:38 -0700, Mike Belshe wrote:

>
> The 64KB default is supposed to make flow control so rare that it
> almost never gets in the way while also providing a reasonable
> backstop.
>

it's hard to square that rarity at 64KB with the very existence of the TCP window scaling option, which is needed past the same threshold.

just 8mbit/sec at 100ms of latency needs 100KB of BDP. That's hardly a corner case, right?
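Spelling out the arithmetic:

# BDP for the example above: 8 Mbit/s at 100 ms RTT.
bandwidth_bits_per_s = 8_000_000
rtt_s = 0.100
bdp_bytes = bandwidth_bits_per_s / 8 * rtt_s
print(bdp_bytes)  # 100000.0 bytes, i.e. ~100KB -- well past a 64KB window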




Patrick McManus

May 18, 2012, 2:01:18 PM
to spdy...@googlegroups.com
On Fri, 2012-05-18 at 10:39 -0700, William Chan (陈智昌) wrote:

>
> Does this work? How do you know they've received the xoff
> notification?

you can't verify it, other than at some extreme point, as the total session window provides the real backstop. but when do you really care?

do you really have a partitioned set of resources available to some streams but not others (in which case I agree, we need per-stream windows), or do you just want to push back and stop the sending asap (in which case xoff ought to be sufficient).

For the most part the resource in question is ram and ram is mostly
fungible.



William Chan (陈智昌)

May 18, 2012, 2:01:41 PM
to spdy...@googlegroups.com
One thing that Roberto neglected to mention from our private conversations is also trying to improve latency, and to what degree flow control is useful here. Let me call your attention to a few points:

* SPDY/3 is faster than SPDY/2 for *some* webpages - Google has data on some of our properties to support this. We don't have data (yet) on cases where it is hurting our page load times. But don't take that to mean that it could not happen.
  - It turns out that you may incur bufferbloat by returning responses too quickly. SPDY/3 (as a side-effect) helps fight the problem due to per-stream flow control windows that restrain streams. Then if you get a high priority resource request later (let's say there's a script at the bottom of a document behind a gazillion large images), it won't be stuck behind all the other data in the bloated buffers.
  - Obviously, if the flow control windows are too small, then we're not filling the pipe as quickly as we could, and thus flow control can make certain pages *slower*.
* We don't have data on what percent of the web would see improvements with the current 64k window size, and what percent would have no impact, and what percent would get slower. It'd be interesting to see.
* Note that the original goal for flow control was to better manage buffers (rather than rely on TCP flow control, which would lead to HoL blocking), not prevent this bufferbloat.
* Note that just because there is space in the peer's per-stream rwin, it does not mean that a sender *has* to use that space. It theoretically could self-throttle if it expects that there is bufferbloat (maybe examine inter-packet arrival times?).
* But if the receiver believes the sender has a naive implementation, perhaps it's better to force the sender not to cause bufferbloat by constraining it with smaller window sizes via SETTINGS for per-stream window sizes. Or, perhaps better would be to do so via theoretical SPDY/4 per-session flow control window sizes.
* Also note that individual streams *can* have different rwins. You control them individually via WINDOW_UPDATE deltas. Why would you want to do this? How about long-lived streams vs short-lived streams (the current common case). Long-lived streams like big downloads/uploads or tunneling (SPDY over SPDY for example) may require different window sizes.
* These considerations all become more complicated given networks with higher variance in relevant characteristics, like mobile networks.

Anyway, I just wanted to point out that the implementation choices are complicated. It'd be great to hear discussion from folks (and more data if possible) about what good implementations *should* do.

William Chan (陈智昌)

May 18, 2012, 2:11:06 PM
to spdy...@googlegroups.com
On Fri, May 18, 2012 at 11:01 AM, Patrick McManus <mcm...@ducksong.com> wrote:
On Fri, 2012-05-18 at 10:39 -0700, William Chan (陈智昌) wrote:

>
> Does this work? How do you know they've received the xoff
> notification?

you can't verify it, other than at some extreme point as the total
session window provides the real backstop. but when do you really care?

do you really have a partitioned set of resources available to some
streams but not others (in which case I agree, we need per stream
windows), or do you just want to pushback and stop the sending asap (in
which case xoff ought to be sufficient).

I'm not exactly clear on what you mean by partitioned set of resources available to some streams but not others, but if I understand correctly, then yes, I suspect that this can happen, in particular with proxies. If you don't agree, then perhaps I need more clarity on what you mean.

Patrick McManus

May 18, 2012, 2:18:47 PM
to spdy...@googlegroups.com
On Fri, 2012-05-18 at 11:11 -0700, William Chan (陈智昌) wrote:

>
> I'm not exactly clear on what you mean by partitioned set of resources
> available to some streams but not others, but if I understand
> correctly, then yes, I suspect that this can happen, in particular
> with proxies. If you don't agree, then perhaps I need more clarity on
> what you mean.
>

>

you have a session window of N for M streams.

do you really care if 1 stream consumes more than 1/Mth of that, as long
as the session doesn't exceed N? (or even 1/2 or whatever..) I do
understand the need to give feedback to individual streams that they
shouldn't send because that data won't be acked right away (thus xoff).

from my pov the peer has been given a quota of N, how they want to split
that up can be up to them.

I don't hate per stream windows, it's just a knob that's hard to set right, so if it can be removed that's a good thing. Maybe it can't be removed :)
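A sketch of that "quota of N" view, with an invented priority scheduler; nothing holds any stream to a 1/Mth share:

# Sketch: the sender splits one session quota across streams by priority
# (SPDY priority 0 is highest, so ascending sort sends it first).
def schedule(streams, session_window):
    for s in sorted(streams, key=lambda s: s.priority):
        if s.queued and not s.xoffed:
            size = min(len(s.queued), session_window)
            yield s, size
            session_window -= size
            if session_window == 0:
                return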

Peter Lepeska

May 18, 2012, 2:35:09 PM
to spdy...@googlegroups.com
Hi Ryan,

I understand why SPDY needs to do per stream flow control. That part makes sense to me and is well illustrated by your example. But I thought this thread was about adding another layer of flow control to the TCP session itself. That part seems redundant with what TCP is already doing.

Thanks,

Peter

Simone Bordet

May 19, 2012, 1:29:43 PM
to spdy...@googlegroups.com
Hi,

On Fri, May 18, 2012 at 8:18 PM, Patrick McManus <mcm...@ducksong.com> wrote:
> you have a session window of N for M streams.
>
> do you really care if 1 stream consumes more than 1/Mth of that, as long
> as the session doesn't exceed N? (or even 1/2 or whatever..) I do
> understand the need to give feedback to individual streams that they
> shouldn't send because that data won't be acked right away (thus xoff).
>
> from my pov the peer has been given a quota of N, how they want to split
> that up can be up to them.
>
> I don't hate per stream windows, its just a knob that's hard to set
> right so if it can be removed that's a good thing. Maybe it can't be
> removed :)

What do you suggest to ensure that one stream that takes 90% of N does not starve other streams (e.g. I am streaming a movie, but it's boring, so I want to do a little browsing meanwhile)?
Also, you seem to suggest a mechanism to increase the per-stream window (which starts at a default < N, but eventually may increase up to N), right?
While for decreasing the per-stream window you suggest negative window updates.

I am worried that there may be a need for some inter-stream communication to adjust the per-stream window sizes (e.g. stream1 takes most of N, but as soon as another transfer is requested, stream1's window needs to be shrunk in order to give some room to stream2), but perhaps I am not understanding it right.

Patrick McManus

May 19, 2012, 9:30:33 PM
to spdy...@googlegroups.com, Simone Bordet
On 5/19/2012 1:29 PM, Simone Bordet wrote:
> Hi,
>
> On Fri, May 18, 2012 at 8:18 PM, Patrick McManus<mcm...@ducksong.com> wrote:
>> you have a session window of N for M streams.
>>
>> do you really care if 1 stream consumes more than 1/Mth of that, as long
>> as the session doesn't exceed N? (or even 1/2 or whatever..) I do
>> understand the need to give feedback to individual streams that they
>> shouldn't send because that data won't be acked right away (thus xoff).
>>
>> from my pov the peer has been given a quota of N, how they want to split
>> that up can be up to them.
>>
>> I don't hate per stream windows, its just a knob that's hard to set
>> right so if it can be removed that's a good thing. Maybe it can't be
>> removed :)
> What do you suggest to avoid that one stream that takes 90% of N does
> not starve other streams (e.g. I am streaming a movie, but it's
> boring, so I want to do a little browsing meanwhile) ?

If you send tx-off when you pause the movie then you'll end up buffering
BDP. (or less if it isn't sending at line rate).

If, instead of txon/txoff, you had a per-stream-window smaller than BDP
then the stream can never flow at line rate, almost by definition. And
you would of course buffer less for that one stream. I don't think the
risk of accidentally constraining the transfer rate is worth that, but
it would be the argument in favor of per stream windows.

If, instead of txon/txoff, you had a per-stream window greater than BDP
the stream would buffer more than BDP before it was stopped. This could
be mitigated down to BDP if we had negative deltas.

BDP is very hard to know - so it's hard to make the per-stream window size choice accurately as an implementor, and that lets people accidentally slow things down without a lot of insight into why. I've definitely already seen this once with spdy/3, and I suspect the more chances there are to select window sizes, the more often they will be selected too small.

Simone Bordet

May 21, 2012, 11:14:56 AM
to spdy...@googlegroups.com
Hi,

On Sun, May 20, 2012 at 3:30 AM, Patrick McManus <mcm...@ducksong.com> wrote:
> If you send tx-off when you pause the movie then you'll end up buffering
> BDP. (or less if it isn't sending at line rate).

But I am not pausing the movie.
I am watching the movie, but meanwhile I start a big download, for example.
How do you transfer part of the session window taken by the movie
stream to the download stream ?

> If, instead of txon/txoff, you had a per-stream-window smaller than BDP then
> the stream can never flow at line rate, almost by definition. And you would
> of course buffer less for that one stream. I don't think the risk of
> accidentally constraining the transfer rate is worth that, but it would be
> the argument in favor of per stream windows.
>
> If, instead of txon/txoff, you had a per-stream window greater than BDP the
> stream would buffer more than BDP before it was stopped. This could be
> mitigated down to BDP if we had negative deltas.
>
> BDP is very hard to know - so its hard to make the per stream window size
> choice accurately as an implementor and that lets people accidentally slow
> things down without a lot of insight into why.  I've definitely already seen
> this once with spdy/3 and I suspect the more chances there are to select
> window sizes the more often they will be selected too small.

Not sure you offered a solution? :)

Just to play the devil's advocate here: are we not reinventing TCP on top of SPDY?
With all the problems it carries (e.g. bufferbloat), and additional ones (how to share bandwidth among streams)?

Can we get a clearer explanation of what Google found in its measurements?

Thanks !

Patrick McManus

May 21, 2012, 11:44:56 AM
to spdy...@googlegroups.com
On Mon, 2012-05-21 at 17:14 +0200, Simone Bordet wrote:

> But I am not pausing the movie.
> I am watching the movie, but meanwhile I start a big download, for example.
> How do you transfer part of the session window taken by the movie
> stream to the download stream ?
>

I don't think that's a great use of flow control. imo flow control is about not overrunning the consumer's buffers. Your scenario is using it to try and manage bandwidth sharing (you don't need more buffers to support the different streams, as the aggregate bandwidth hasn't changed). I don't think it should manage stream rates - but if the group thinks it should, then I agree txon/txoff ain't going to fill the bill. I think priorities are a better approach to bandwidth issues, and spdy needs to make them dynamically manageable.

> Just to play the devil's advocate here: are we not reinventing TCP on
> top of SPDY ?

to some degree (hopefully a minor one) the multiplexing demands this. Definitely a price to be paid for the benefit of being able to actually improve the web today (instead of doing an sctp-like approach).

> Can we get a clearer explanation of what Google found in its measurement ?
>

google can share their own data points, but I can attest that a gmail
attachment over spdy/3 against a 12KB per-stream window can be 3x slower
with latencies in the high (but not insane) range of 200ms. and 12kb is
what we've seen deployed to this point - roberto suggested that a
per-session window would allow a greater value.

and you can see Chris Strom running 64KB download windows down to zero
at
http://japhr.blogspot.com/2012/05/firefox-spdy2-vs-spdy3-in-graphs.html ..

it seems to me this will be a very hard knob to set correctly. So if it
can go away - it should.



Peter Lepeska

May 21, 2012, 12:50:47 PM
to spdy...@googlegroups.com
I realize that this email thread is primarily about adding another layer of flow control to SPDY that operates at the SPDY session level but I'm still trying to understand if even the per stream flow control is necessary.

So it seems like the main problem per stream flow control is meant to solve is head of line blocking. To be specific, the problem is when you have a low and a high priority stream active and the low priority stream is transferring a lot of data from a fast content server and the high priority stream is only sending intermittently. In this case, there can be a full 64KB of low priority data in flight across the TCP connection when the high priority data needs to be sent over the TCP connection. If the first packet of the 64KB gets dropped, then the high priority data has to wait at least until the low priority block gets re-transmitted. 

Per stream flow control mitigates (but does not solve) this by preventing the low priority data from ever using the full TCP connection so that the number of low priority bytes in front of the high priority data is smaller. The drawback is that the low priority stream never gets to use the full connection, even when there is no high priority data to send. Is this description correct? And is this the primary use case for per stream flow control?

Has it been considered to use multiple SPDY sessions, one TCP connection per priority level at any given time, so that two streams with different priorities are never competing for bandwidth over the same TCP connection?

Not asking anyone in particular. Just trying to think through the problem. My apologies if I'm taking the discussion backwards a bit.

Thanks,

Peter

Eric Ceres

May 21, 2012, 1:06:18 PM
to spdy...@googlegroups.com
Hi,

To add more clarity, here is an example. A large file is being uploaded. Let's say the server is not able to store the file fast enough. Then with a standard HTTP transmission the receive buffer fills up and the upload waits for more buffer space.

With SPDY, if the server stops reading the buffer, then all requests will be blocked. This leaves a few options:
1. Handle infinite buffering (not really doable)
2. Have some way to block further requests on the stream by notifying all reads blocked/unblocked (an on/off switch which could trigger a new connection)
3. Have some way to block buffering for a stream (moves buffering/blocking from TCP to the protocol handling level)

Eric Ceres

Roberto Peon

May 21, 2012, 1:17:55 PM
to spdy...@googlegroups.com
Nah, this is nowhere near as complicated as TCP. We're only solving a flow-control issue here.
 
With all the problems it carries (e.g. bufferbloat), and additional
ones (how to share bandwidth among streams) ?

If this is done right, we won't have bufferbloat (or, if we do, it would still be better than the bufferbloat which would be happening with many TCP connections).

-=R

Roberto Peon

May 21, 2012, 1:23:58 PM
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 9:50 AM, Peter Lepeska <bizzb...@gmail.com> wrote:
I realize that this email thread is primarily about adding another layer of flow control to SPDY that operates at the SPDY session level but I'm still trying to understand if even the per stream flow control is necessary.

So it seems like the main problem per stream flow control is meant to solve is head of line blocking. To be specific, the problem is when you have a low and a high priority stream active and the low priority stream is transferring a lot of data from a fast content server and the high priority stream is only sending intermittently. In this case, there can be a full 64KB of low priority data in flight across the TCP connection when the high priority data needs to be sent over the TCP connection. If the first packet of the 64KB gets dropped, then the high priority data has to wait at least until the low priority block gets re-transmitted. 

Well, per-stream flow control is necessary to prevent HOL blocking through proxies, since the proxies will demux your multiplexed session, and then remux onto other connections (any of which will have different path-restrictions or buffer requirements). Most loadbalancers are proxies.

 

Per stream flow control mitigates (but does not solve) this by preventing the low priority data from ever using the full TCP connection so that the number of low priority bytes in front of the high priority data is smaller. The drawback is that the low priority stream never gets to use the full connection, even when there is no high priority data to send. Is this description correct? And is this the primary use case for per stream flow control?

Has it been considered to use multiple SPDY sessions, one TCP connection per priority level at any given time, so that two streams with different priorities are never competing for bandwidth over the same TCP connection?

It has been considered and, at least so far, rejected as impractical. Basically, each TCP connection likely goes to a different machine for any decent scale deployment (most of which use loadbalancers). You end up losing too much latency, you add significant cost, and you end up with much more potential buffer bloat.

 

Not asking anyone in particular. Just trying to think through the problem. My apologies if I'm taking the discussion backwards a bit.

It's a good thought, and if we find that the problems with multiplexing are insurmountable it will likely be tried. We're not there yet, though!
-=R

Peter Lepeska

May 21, 2012, 1:25:07 PM
to spdy...@googlegroups.com
That example helps a lot. So per stream flow control in this example is about managing buffering on the SPDY server and the SPDY client for uploads and downloads respectively. If that is the case, then it should only kick in when the network is faster than the receiving end point, which in this case is the content server receiving the uploaded file, right?

So then in summary there are two use cases for per stream flow control -- 1) mitigating HOL blocking and 2) managing buffering by providing feedback on receiver rate of data acceptance. Any others?

The problem with #1 is that it slows down low priority connections even when there is no high priority data to send. For #2, there has to be some type of feedback because, as you said, infinite buffering is not an option, but at least the per stream flow control only has to kick in when the receiving end point is slow.

Anyone, please correct anything incorrect in the above understanding.

Peter

Costin Manolache

May 21, 2012, 1:25:16 PM
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 10:06 AM, Eric Ceres <eric...@gmail.com> wrote:
Hi,

To add more clarity here is an example. A large file is being uploaded. Let's say the server is not able to store the file fast enough. Then with a standard http transmission the receive buffer fills up and the upload waits for more buffer space.

With speedy this means the server stops reading the buffer then all requests will be blocked. This leaves few options:
1. Handle infinite buffer (not really doable)
2. Have someway to block further requests on the stream by notify all reads blocked/unblocked (on/off switch which could trigger new connection)
3. Have someway to block buffering for a stream (moves buffer/blocking from TCP to protocol handling level) 

So the goal is to allow the sender to keep sending high-priority and 'fast' streams, based on some info about the receiver's buffers and how much of each stream is buffered.

If SPDY had session-level flow control, the sender would know how much total buffer space is available - and could slow down some streams, stop sending lower priority ones, etc. - but still be able to send higher priority streams until the receiver's buffer is completely filled. At that point it can't send anything else.

With the current per-stream scheme, the problem is that the receiver (or proxy) needs a buffer that is the sum of all stream windows, for the worst case when all streams are stalled.

With an extra session-level window you can have a smaller buffer in the receiver, and the client may adapt its sending. But eventually you can still get to a worst-case state where the receiver's buffer is full and you can't send anything on a new stream. In which case, if the client knows that has happened, it can either open a new connection or cancel some of the streams.
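The buffer arithmetic being pointed at, as a toy computation (numbers invented):

# Worst-case receiver buffering: per-stream windows alone vs. a session cap.
streams = 100
per_stream_window = 64 * 1024
worst_case = streams * per_stream_window
print(worst_case)              # 6553600 bytes (~6.4MB) if every stream stalls

session_window = 1024 * 1024
# With a session-level window the receiver provisions only this much:
print(min(worst_case, session_window))   # 1048576 bytes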

Costin

Peter Lepeska

May 21, 2012, 1:59:15 PM
to spdy...@googlegroups.com
" Well, per-stream flow control is necessary to prevent HOL blocking through proxies, since the proxies will demux your multiplexed session, and then remux onto other connections (any of which will have different path-restrictions or buffer requirements). Most loadbalancers are proxies. "

I'm having trouble understanding how proxies exacerbate the problem of HOL blocking. Can you provide a little more detail on this example? Is it because a proxy adds its own buffering, and therefore increases the amount of low priority data (over and above a single TCP send window) that can be ahead of high priority data on the stream?

Thanks,

Peter

Greg Wilkins

May 21, 2012, 2:20:42 PM
to spdy...@googlegroups.com
On 18 May 2012 19:09, Patrick McManus <mcm...@ducksong.com> wrote:
Having 2 levels of flow control lets you separate the concerns: the
session value is about total buffers available, the per-stream value is
about letting different streams proceed at different rates (and that's
why I think it can be done with xon/xoff in the presence of a session
window).

Patrick,

Can you confirm or deny whether the following paraphrasing captures what you mean by xon/xoff.

Your hypothesis is that we can achieve near maximum TCP/IP throughput, but without HOL channel blocking, by having two levels of flow control.  The outer flow control is applied to the session as a whole, and once some threshold is reached, channels will start being given Xoff messages to stop them sending more data.  So which channels would receive the Xoff?  The next one to send a data packet, or the ones with the most recent usage?

When would a Xon be sent? after a time delay? once all queued data is sent? once queued data is reduced below some threshold?

Do you think it would be possible to use TCP itself as the outer flow control?  I.e. let all channels send as much data as they like until the next frame would block the TCP/IP stream.  At this point Xoffs would be issued to the busiest channels and data would be queued for the non-busy channels... until such point as they become busy and are Xoffed, or until the buffers are full and all channels are Xoffed.  Xons would be issued as the buffers are emptied (to the least busy channels first).  I think such an algorithm would permit a small amount of HOL blocking, as it would allow a busy channel to run the TCP/IP connection until it was actually flow controlled, but it would give the non-busy channels priority once the flow resumes.  If something like this is workable, then it has the advantage of not needing any window sizes at all - just TCP flow control and channel xon/xoff.

cheers




William Chan (陈智昌)

May 21, 2012, 2:50:45 PM
to spdy...@googlegroups.com
If a proxy receives data for a stream X, but its backend cannot consume the data anymore, then without per-stream flow control or infinite buffering, the proxy must stop reading the TCP connection. Thus, stopping the reading of data for stream X requires stopping the reading of the entire TCP connection, so stream X causes HoL blocking of all other stream data behind it in the TCP connection.

Patrick McManus

May 21, 2012, 3:11:07 PM
to spdy...@googlegroups.com
On Mon, 2012-05-21 at 20:20 +0200, Greg Wilkins wrote:
>
>
> On 18 May 2012 19:09, Patrick McManus <mcm...@ducksong.com> wrote:
> Having 2 levels of flow control lets you separate the
> concerns: the
> session value is about total buffers available, the per-stream
> value is
> about letting different streams proceed at different rates
> (and that's
> why I think it can be done with xon/xoff in the presence of a
> session
> window).
>
> Patrick,
>
> Can you confirm deny if the following paraphrasing captures what you
> mean by xon/xoff.
>
> Your hypothesis is that we can achieve near maximum TCP/IP throughput,
> but without HOL channel blocking by having two levels of flow control.
> The outer flow control is applied to the session as a whole and once
> some threshold is reached, then channels will start being given Xoff
> messages to stop them sending more data.

pretty much

> So which channels would received the Xoff? The next one to send a
> data packet or the ones with the most recent usage?
>

not specified by the protocol - the receiver can implement it any way it wants. I would expect that the receiver would xoff a stream whose data sink had built up too large a stream-specific buffer (i.e. was not consuming at line rate). I'm not naive; I understand that this is a sort of windowing in itself, but for configuration it separates out the BDP component necessary for making the stream run at line rate (which I hope we all agree is the goal under normal circumstances), and that's the knob people are going to intuitively set too low and not future-proof.

When someone asks themselves "how many buffers do I allocate to the database process", are they asking themselves contextually "how many buffers do I allocate for the DB and at what speed and latency?" I'm going to say they don't do that, and that it's an impossibly hard question to ask an admin anyhow. Much better to ask: "how many buffers on top of those required by the network for full data rate will you allocate to this stream?"

The full session buffer limit can backstop you if you really don't have
the resources to run in a high BDP environment.

> When would a Xon be sent? after a time delay? once all queued data is
> sent? once queued data is reduced below some threshold?
>

from the pov of the protocol - it isn't specified. I would expect some
watermark approach.
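One plausible watermark policy, sketched with invented thresholds and frame names:

# Sketch: receiver-side watermarks driving per-stream xon/xoff.
HIGH_WATER = 128 * 1024   # invented
LOW_WATER = 32 * 1024     # invented

def on_stream_buffered(stream, send_frame):
    if stream.buffered > HIGH_WATER and not stream.xoffed:
        send_frame(stream.id, "XOFF")   # sink isn't consuming at line rate
        stream.xoffed = True

def on_stream_drained(stream, send_frame):
    if stream.buffered < LOW_WATER and stream.xoffed:
        send_frame(stream.id, "XON")
        stream.xoffed = False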

> Do you think it would be possible to use TCP itself as the outer flow
> control?

Maybe; but probably not.

I think it's a little tricky for a server (or the server side of a proxy) to do things like 1] exhaust tcp rwin, 2] xoff all senders, 3] update the initial window via settings to prevent new senders and 4] reopen tcp rwin a bit to process GETs and pings and goaways etc..

The client in that situation is going to have data buffered in its tcp layer that it put there before it realized rwin was empty (everybody runs with some send buffers), and that data is going to go out when the window reopens.. it could be anything.. and there is of course some state (compression, sequence numbers, etc) to the data stream, so it's a hard bell to unring. Better to just not generate the data when it can't be given to the tcp stack.

That's the advantage of a spdy centric session window - it just covers
the goodput and lets the control stuff flow as freely as tcp allows.


Costin Manolache

May 21, 2012, 3:26:24 PM
to spdy...@googlegroups.com
Can the decision be made on the sender side instead?

The receiver would send back info about the total buffer size and about any stream that is using more than x% of the buffer (or some other measure).

So instead of 2 flow controls - one per stream, one per connection - you would just send more detailed info about the buffer and the status of the top streams, and let the client choose what to send next and what to suspend. I think the client or backend is better placed to decide, if it has enough info.

Costin

Roberto Peon

May 21, 2012, 3:34:28 PM
to spdy...@googlegroups.com
Detail about the buffer is what flow control exposes, though, right?
The client always chooses what to send and what not to send anyway. A receiver never has control over the sender; it can merely give the sender policy and hope the sender acts upon it.

We'll always need per-stream flow control. No avoiding that. Anything which accurately signals the number of bytes per stream which it is safe to send, and which provides a constant and predictable overall memory usage on the receiver for all streams will do.

I'm having a hard time reconciling your suggestion with this. If we do the per-stream flow control, and can't do 2 levels of flow control, what data would we send which wouldn't BE flow control that would allow the server to be assured (with a well behaved client) to use no more buffer than it wishes to use?

-=R

Costin Manolache

May 21, 2012, 4:57:01 PM
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 12:34 PM, Roberto Peon <fe...@google.com> wrote:
Detail about the buffer is what flow control exposes, though, right?
The client always chooses what to send and what not to send anyway. A receiver never has control over the sender, they can merely give them policy and hope the sender acts upon it.

Right now the proxy controls things by setting the window - I assume if a client doesn't obey flow control the server will close the connection, so it's not quite 'the client decides what to send'.

Yes, the details (inputs) would be the same for any solution - max buffer of the receiver, current window for each stream, how much of the total and per-stream buffer is used.

My suggestion is to send all this data to the client - instead of sending info about how much each stream is allowed to send, send data about how much of each stream is buffered and the total buffer size.

For example: the proxy has a 1M buffer per connection. The client can choose to send 900K of one stream, then wait for the flow control frame, and keep sending high-priority or small requests. If the large upload doesn't make any progress in 10 min, it can choose to cancel it, or not send any more bytes until it makes progress.

Benefits:

- A smart client will have more choices on what it sends - in particular it may choose to drop some streams. I think that's the main difference - if a few large uploads get stale for a long time, there is nothing in the current per-stream / per-connection flow control that would help; the only help is to drop the stale upload.

- a dumb client doesn't need to do anything - can just ignore the flow control and the TCP flow control will do its job.

- the server doesn't have to send data for all streams - only for the streams that use large buffers or are stalled (fewer bytes).

- the initial per-stream window is close to the total buffer size of the proxy (instead of a small per-stream window to account for the worst case of all streams getting buffered)

- the code may be simpler on the server, and for equivalent functionality it may be simpler on the client.




We'll always need per-stream flow control. No avoiding that. Anything which accurately signals the number of bytes per stream which it is safe to send, and which provides a constant and predictable overall memory usage on the receiver for all streams will do.

Yes, you need to know the buffering status per-stream - maybe not for all streams, and it doesn't have to be 'per-stream flow control'. You just send info about the buffer usage on the proxy, and the client can make any choice as long as it respects the total buffer size.


 

I'm having a hard time reconciling your suggestion with this. If we do the per-stream flow control, and can't do 2 levels of flow control, what data would we send which wouldn't BE flow control that would allow the server to be assured (with a well behaved client) to use no more buffer than it wishes to use?

The server will send the total buffer size (once), then periodically send stream status info. For example, any time it would send a 'window update' now, or only if the buffer use is > 50%.

The data is about the same as with the current flow control - how many bytes the client can safely send (now) versus how many bytes are currently buffered (in my proposal). The frequency can be the same - or you can send flow info only when you need to (currently window updates are required for each stream, or the client will be forced to stop sending). The decision on when to allow sending is similar - except it's made by the client using the info from the server, instead of by the server.

Costin

Costin Manolache

May 21, 2012, 5:06:05 PM
to spdy...@googlegroups.com
If I'm too confusing, my proposal is:

- remove the current per stream window

- the server will declare its total buffer (with a reasonable default - 64k or 1M), and use a per-connection window.

- the server will send flow control packets with a list of stream IDs and how many bytes are buffered for each stream. There are a few options on which streams to include in the flow control packet, and when to send this packet.

- client will not send more than [total buffer] until it receives a flow control packet with a stream-based window update. When it receives a flow control packet - it can choose which stream to send based on free space, priorities, etc - as long as the total in-flight bytes fit in the total buffer.

- client should also detect if some streams are 'stalled' - no progress for X minutes, too much data buffered - and may time them out so it can send other streams.

It's still a combination of connection and stream flow control.
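
To make it concrete, a rough sketch of the client side in Python - the frame and field names here are made up for illustration, not from any spec:

    # Hypothetical client-side bookkeeping for the buffer-report proposal.
    # The server reports its total buffer size plus how many bytes of each
    # (interesting) stream it currently holds; a real client would also
    # track bytes sent since the last report.

    class BufferReportClient:
        def __init__(self, total_buffer=1 << 20):    # server default, e.g. 1M
            self.total_buffer = total_buffer
            self.buffered = {}                       # stream_id -> bytes held by server

        def on_flow_report(self, total_buffer, per_stream):
            self.total_buffer = total_buffer
            self.buffered.update(per_stream)

        def budget(self):
            # Free space the client may fill, across all streams.
            return max(0, self.total_buffer - sum(self.buffered.values()))

        def plan_sends(self, pending):
            # pending: list of (stream_id, priority, bytes_waiting); priority 0
            # is highest, as in SPDY. Spend the budget highest-priority first;
            # a smarter client could also cancel streams that stay buffered.
            plan, left = [], self.budget()
            for sid, prio, waiting in sorted(pending, key=lambda p: p[1]):
                n = min(waiting, left)
                if n > 0:
                    plan.append((sid, n))
                    left -= n
            return plan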

Costin

William Chan (陈智昌)

unread,
May 21, 2012, 5:09:15 PM5/21/12
to spdy...@googlegroups.com
This doesn't make sense to me. You say "remove the current per stream window" and then you say "it's still a combination of connection and stream flow control".

On Mon, May 21, 2012 at 2:06 PM, Costin Manolache <cos...@gmail.com> wrote:
If I'm too confusing, my proposal is:

- remove the current per stream window

- server will declare its total buffer ( with a reasonable default - 64k or 1M ), and use a per-connection window.

- server will send flow control packets with a list of stream IDs and how many bytes are buffered for each stream. There are a few options on which streams to include in the flow control packet, and when to send this packet.

This sounds like per stream window updates. Why do you say "remove the current per stream window"?
 

- client will not send more than [total buffer] until it receives a flow control packet with a stream-based window update. When it receives a flow control packet - it can choose which stream to send based on free space, priorities, etc - as long as the total in-flight bytes fit in the total buffer.

With a per-stream window and per-session window, a sender could still do this. When space opens up in the per-session window, the sender is obviously allowed to choose which stream to send data over, assuming that per-stream window has space.

Costin Manolache

unread,
May 21, 2012, 5:41:13 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 2:09 PM, William Chan (陈智昌) <will...@chromium.org> wrote:
This doesn't make sense to me. You say "remove the current per stream window" and then you say "it's still a combination of connection and stream flow control".

It replaces the current definition of per-stream window - right now each stream window/flow is sent individually, all streams have the same initial window, and the information sent back is how much the client is allowed to send on the stream.

It still has per-stream flow control - in the sense that you send back how much is buffered for each stream ( all, or only streams that have >x bytes or % buffered ). 
 

This sounds like per stream window updates. Why do you say "remove the current per stream window"?

The window updates will be per session. 

The 'per stream window' is replaced by info about how much is currently buffered for the stream, combined with info about total session buffer available. 


With a per-stream window and per-session window, a sender could still do this. When space opens up in the per-session window, the sender is obviously allowed to choose which stream to send data over, assuming that per-stream window has space.

Yes, but I guess it's a more direct decision when you know the total buffer and
how much each stream has buffered. And when you start new streams - they won't be limited by a small per-stream window, but by the per-session window.

You can probably infer from the stream window how much of that stream is buffered and if the stream is stale.


Costin 

William Chan (陈智昌)

unread,
May 21, 2012, 5:58:14 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 2:41 PM, Costin Manolache <cos...@gmail.com> wrote:
Yes, but I guess it's a more direct decision when you know the total buffer and
how much each stream has buffered. And when you start new streams - they won't be limited by a small per-stream window, but by the per-session window.

Don't you know how much each stream has buffered via per-stream flow control windows? And the existence of a per-session window obviates the need for small initial per-stream windows. You can make them larger now. 64k is a reasonable default still (if you disagree, let's fix it then), and the SETTINGS frame lets you adjust it to a more appropriate value within one RTT for the server rwin, and zero RTTs for the client rwin.
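
For what it's worth the bookkeeping for that is tiny. A sketch, assuming the SETTINGS change is applied to existing stream windows by the delta (if it only applied to new streams you would drop the loop):

    # Sketch: applying a new initial window from a SETTINGS frame.
    class WindowState:
        def __init__(self, initial_window=64 * 1024):
            self.initial_window = initial_window
            self.stream_windows = {}        # stream_id -> remaining send window

        def open_stream(self, stream_id):
            self.stream_windows[stream_id] = self.initial_window

        def on_settings_initial_window(self, new_value):
            delta = new_value - self.initial_window
            self.initial_window = new_value
            for sid in self.stream_windows:
                # May go negative; the sender then waits for WINDOW_UPDATEs.
                self.stream_windows[sid] += delta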

I guess I don't understand how communicating a percentage of stream buffer size is better than a per-stream flow control window size. The latter approach lets you dynamically adjust the window size on a per-stream basis in a more elegant manner IMO.

Roberto Peon

unread,
May 21, 2012, 6:01:29 PM5/21/12
to spdy...@googlegroups.com
I think they both achieve the same goal. The server can lie about the amount of buffer it has available to accomplish any of the goals that it would have done. The question thus resolves to: which is easier to implement.
I'd guess that the current window-based way is easier and cheaper since the window-updates are delta-encodings for the state changes.

Did you think this would allow something else Costin? I readily admit I could be missing something :)

-=R

Costin Manolache

unread,
May 21, 2012, 7:17:09 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 2:58 PM, William Chan (陈智昌) <will...@chromium.org> wrote:

Don't you know how much each stream has buffered via per-stream flow control windows? And the existence of a per-session window obviates the need for small initial per-stream windows. You can make them larger now. 64k is a reasonable default still (if you disagree, let's fix it then), and the SETTINGS frame lets you adjust it to a more appropriate value within one RTT for the server rwin, and zero RTTs for the client rwin.

Would 64K be the default for both session and stream windows ?

I guess my proposal is almost equivalent - if you know the stream window and client window you can calculate how much is buffered. After you fill the session window you need to wait for session and stream window updates in both cases.


I guess I don't understand how communicating a percentage of stream buffer size is better than a per-stream flow control window size. The latter approach lets you dynamically adjust the window size on a per-stream basis in a more elegant manner IMO.

I think it's more direct to tell the client the relevant information - how big is the proxy buffer and how much of each stream is buffered. 

The 'percentage' was an optimization - you don't need to send flow control for streams that go through / have very low buffers, or if the proxy buffer has plenty of space. The per-stream flow will only kick in when it's needed.
You can still do delta-encoding for stream and session. 


What would be the meaning of 'per stream window' - it's no longer the amount you are allowed to send for that stream - you need to consider the session window and all the other streams' windows. It's this computation that I think would be cleaner if you work in reverse, with how much is buffered instead of the 'window', which is no longer a direct indication of how much to send.

But if you can find a good explanation of the stream window and how it interacts with the session window it would be less confusing.

Costin

Costin Manolache

unread,
May 21, 2012, 7:25:35 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 3:01 PM, Roberto Peon <fe...@google.com> wrote:
I think they both achieve the same goal. The server can lie about the amount of buffer it has available to accomplish any of the goals that it would have done. The question thus resolves to: which is easier to implement.

Agreed.
 
I'd guess that the current window-based way is easier and cheaper since the window-updates are delta-encodings for the state changes.

I think a 'dumb client' would be simpler with my proposal - it would just ignore all flow control packets, and the TCP flow control would then enforce the per-session buffer. 

In both cases a smart client needs to keep track of how much it can send on each stream - based on the session window and stream window, which in both cases is a proxy for how fast that stream is and how much is buffered. 

In the current flow, the information is received as one packet per stream, when the stream is making progress, plus one new packet for the session.
In my proposal - almost the same information will be sent in a single packet, which will only be sent if the session buffer is getting too full. 

In the good case - all streams flowing smoothly - my proposal wouldn't send any per-stream flow control packets, only the session info.

Costin

William Chan (陈智昌)

unread,
May 21, 2012, 7:35:13 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 4:17 PM, Costin Manolache <cos...@gmail.com> wrote:

Would 64K be the default for both session and stream windows ?

I suspect we can make the session window larger than 64K. I'm open to suggestions here.
 
What would be the meaning of 'per stream window' - it's no longer the amount you are allowed to send for that stream - you need to consider the session window and all the other streams' windows. It's this computation that I think would be cleaner if you work in reverse, with how much is buffered instead of the 'window', which is no longer a direct indication of how much to send.

But if you can find a good explanation of the stream window and how it interacts with the session window it would be less confusing.

I guess I'm not sure what's confusing about the interaction with the stream window and the session window. When writing stream data in a naive implementation, the amount you'd write is amount_to_write = min(stream_data_length, stream_window_size, session_window_size). Then stream_window_size -= amount_to_write and session_window_size -= amount_to_write. A more advanced implementation may examine the remaining session window size to see how much space is left and the stream priority, and based on info like that, may opt to write less (or zero) stream data if the session window is small and/or the stream priority is low.
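
As a sketch, with stand-ins for real stream/session objects and frame I/O:

    from types import SimpleNamespace

    # The naive dual-window send described above; send_frame stands in
    # for real frame I/O.
    def write_stream_data(stream, session, send_frame=lambda frame: None):
        amount = min(len(stream.pending), stream.window, session.window)
        if amount <= 0:
            return 0                    # blocked by stream or session window
        send_frame(("DATA", stream.id, stream.pending[:amount]))
        stream.pending = stream.pending[amount:]
        stream.window -= amount
        session.window -= amount
        return amount

    # Example: a 100K body against a 64K stream window, 256K session window.
    s = SimpleNamespace(id=1, pending=b"x" * 100_000, window=64 * 1024)
    sess = SimpleNamespace(window=256 * 1024)
    write_stream_data(s, sess)          # sends 64K, then waits for updates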

If we wanted to allow dynamically resizing the session window size apart from per stream windows, then we'd have to introduce a separate SESSION_WINDOW_UPDATE or something. Or add a new field to the WINDOW_UPDATE frame for the amount to adjust the session window.

Hasan Khalil

unread,
May 21, 2012, 7:47:58 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 7:35 PM, William Chan (陈智昌) <will...@chromium.org> wrote:
If we wanted to allow dynamically resizing the session window size apart from per stream windows, then we'd have to introduce a separate SESSION_WINDOW_UPDATE or something. Or add a new field to the WINDOW_UPDATE frame for the amount to adjust the session window.

Or neither: send a WINDOW_UPDATE for stream 0.
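
i.e. keep the frame format and just special-case the zero stream ID on receipt - sketch:

    # WINDOW_UPDATE on the reserved stream id 0 adjusts the session window;
    # any other id adjusts that stream's window. Same frame, no new type.
    def on_window_update(session, stream_id, delta):
        if stream_id == 0:
            session.window += delta
        else:
            session.streams[stream_id].window += delta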

    -Hasan

Costin Manolache

unread,
May 21, 2012, 10:03:24 PM5/21/12
to spdy...@googlegroups.com
On Mon, May 21, 2012 at 4:35 PM, William Chan (陈智昌) <will...@chromium.org> wrote:
I suspect we can make the session window larger than 64K. I'm open to suggestions here.

Any reason to have the stream window smaller than the session window ? 
Can't think of any use case. 

I guess I'm not sure what's confusing about the interaction with the stream window and the session window. When writing stream data in a naive implementation, the amount you'd write is amount_to_write = min(stream_data_length, stream_window_size, session_window_size).

Then stream_window_size -= amount_to_write and session_window_size -= amount_to_write. A more advanced implementation may examine the remaining session window size to see how much space is left and the stream priority, and based on info like that, may opt to write less (or zero) stream data if the session window is small and/or the stream priority is low.

A non-naive implementation would look at all outgoing streams, sorted by priority, and attempt to divide the session_window_size somehow - maybe give priority to SYN_STREAM packets, etc.

And it may consider which streams are 'stale' == no change in window size, i.e. the server is stuck and the proxy needs to cache it.  


 

If we wanted to allow dynamically resizing the session window size apart from per stream windows, then we'd have to introduce a separate SESSION_WINDOW_UPDATE or something. Or add a new field to the WINDOW_UPDATE frame for the amount to adjust the session window.

The use case would be a proxy server that may cache more if the load is low. 

But the main issue (IMHO) is how to choose the amount to send for each stream, based on stream window, priority and remaining buffer space. 
The proxy buffer can be considered fixed size when the decision is made; the 'stream window' is a proxy for how much of that stream is buffered - and that indicates how likely that stream is to clog the pipe. 

I don't think the 'naive' implementation will work so well if you have many streams going slowly ( few uploads plus some new requests ), unless you somehow reduce the window for each of the slow streams - but that won't work for the initial window size. So you need to reduce the initial window.

Costin 

William Chan (陈智昌)

unread,
May 21, 2012, 10:47:37 PM5/21/12
to spdy...@googlegroups.com
Maybe you missed my email here: https://groups.google.com/forum/#!msg/spdy-dev/JB_aQPNI7rw/3Kata9-QeI0J. If the peer is naive, then it may be advantageous to have a smaller per-stream win.

A non-naive implementation would look at all outgoing streams, sorted by priority, and attempt to divide the session_window_size somehow - maybe give priority to SYN_STREAM packets, etc.

SYN_STREAM frames do not have any data payload, only header payload.

But the main issue (IMHO) is how to choose the amount to send for each stream, based on stream window, priority and remaining buffer space. 
The proxy buffer can be considered fixed size when the decision is made; the 'stream window' is a proxy for how much of that stream is buffered - and that indicates how likely that stream is to clog the pipe. 

Yes, this is an interesting implementation discussion. And it's also interesting to discuss since we know that not all implementations will be equal. As I suggested in my earlier email, there's a tradeoff between giving a peer control to do more optimal and horrible things, and taking away that control to prevent stupid shit, which also prevents a smarter peer implementation from doing better. It's an interesting discussion to have about what / how powerful of knobs to provide.
 

I don't think the 'naive' implementation will work so well if you have many streams going slowly ( few uploads plus some new requests ), unless you somehow reduce the window for each of the slow streams - but that won't work for the initial window size. So you need to reduce the initial window.

I don't grok this statement. Can you rephrase?

Greg Wilkins

unread,
May 22, 2012, 3:41:56 AM5/22/12
to spdy...@googlegroups.com


On 21 May 2012 23:06, Costin Manolache <cos...@gmail.com> wrote:
If I'm too confusing, my proposal is
- remove the current per stream window
- server will declare its total buffer ( with a reasonable default - 64k or 1M ), and use a per-connection window.

I'm now concerned about any approach that has a window size - be it session or stream based.   Any window size is a guess that will mostly be wrong.  Either the guess will be too small, in which case we can't use all the capacity of a tcp connection and we create the incentive to use multiple connections (in effect getting multiple windows).  Or the guess will be too large, in which case we will be able to go fast enough to trigger TCP flow control, which may have some fairness issues.

Instead, I think a simple xon/xoff approach as suggested by Patrick merits at least some further consideration (although I understand he suggested it in conjunction with a session window) to see if it can avoid creating a window guess.

If we say that any component along a connection is allowed to send an xoff if it detects congestion then I believe that we can allow the connection to run at full speed and significantly reduce the impact of (but not avoid completely) HOL blocking.   The problem is that by the time an Xoff is received and processed, TCP flow control may already have been invoked.  Worse still, the transmission of the Xoff may itself be delayed by heavy traffic going the other way.

However, I contend that even with a window this is a situation that can occur, because a window guess may always be larger than the available network capacity (and large guesses will be favoured to attempt to use all of a connection's capacity).  So even with a window, tcp flow control can kick in when a small percentage of the window has been sent.    I'd much rather a situation where the sender detects that tcp flow control is taking place and issues some local Xoffs, rather than steadfastly plugging on trying to send the remaining quota it has in its window (and thus causing HOL blocking anyway).

The upside of xon/xoff is that an xoff can be generated anywhere along the connection.   The sender itself can generate local xoffs to its producers if it detects tcp flow control on its writes. A proxy can send xoffs if its buffers are full, or a receiver can xoff if its consumers are too slow.   Xoffs will not prevent tcp flow control, but they will ensure that it is high priority or low usage streams that get to write first once the tcp connection starts flowing again.   So there will be some HOL blocking, but only for short periods.  Windowing does not prevent this situation because a window can always be larger than the available network capacity.
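
A sketch of the sender side, with XON/XOFF as hypothetical control frames (nothing like them exists in SPDY/3):

    # Hypothetical xon/xoff sender: run at full speed by default and only
    # gate the streams that someone along the path has XOFF'd. An xoff can
    # come from the remote peer, a proxy, or be generated locally when our
    # own socket writes start blocking.
    class XonXoffSender:
        def __init__(self):
            self.paused = set()            # stream ids currently XOFF'd

        def on_control_frame(self, kind, stream_id):
            if kind == "XOFF":
                self.paused.add(stream_id)
            elif kind == "XON":
                self.paused.discard(stream_id)

        def writable_streams(self, streams):
            # Un-paused streams get to write first once the tcp connection
            # starts flowing again.
            return [s for s in streams if s not in self.paused]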

thoughts?

Greg Wilkins

unread,
May 22, 2012, 3:51:43 AM5/22/12
to spdy...@googlegroups.com
Also, while we are considering windows, it is worthwhile reading Mike's paper on slow start in TCP:  http://dev.chromium.org/spdy/An_Argument_For_Changing_TCP_Slow_Start.pdf?attredirects=0
Surely if we put windows into SPDY, then we are just recreating that same problem? We would have to guess how big to make the initial windows, and we would create an incentive to use multiple sessions in order to get multiple initial windows.

cheers

Greg Wilkins

unread,
May 22, 2012, 5:56:23 AM5/22/12
to spdy...@googlegroups.com

Some more reading of interest is http://www.ietf.org/rfc/rfc4254.txt.  This is the SSH connection protocol that supports multiplexed channels using individual window sizes - more or less what SPDY/3 has.  It also has an optimisation to support xon/xoff of individual channels.

There is also a paper about how these window sizes can cause significant performance problems and how dynamically resizing the windows can improve throughput
http://www.psc.edu/networking/projects/hpn-ssh/papers/hpnssh-postertext.pdf

cheers

Daniel Stenberg

unread,
May 22, 2012, 7:53:02 AM5/22/12
to spdy...@googlegroups.com
On Tue, 22 May 2012, Greg Wilkins wrote:

> Some more reading of interest is http://www.ietf.org/rfc/rfc4254.txt. This
> is the SSH connection protocol that supports multiplexed channels using
> individual window sizes - more or less what SPDY/3 has. It also has an
> optimisation to support xon/xoff of individual channels.
>
> There is also a paper about how these window sizes can cause significant
> performance problems and how dynamically resizing the windows can improve
> throughput
> http://www.psc.edu/networking/projects/hpn-ssh/papers/hpnssh-postertext.pdf

Yes, SSH does indeed have more or less the exact same windowing that SPDY has,
which in turn seems to mimic what TCP itself offers.

I'm the primary developer of libssh2 which is the SSH library used for SCP and
SFTP transfers with for example curl and I just felt I should add two minor
details when comparing SPDY to SSH and reading the PDF mentioned above. In
general SSH and SPDY have a lot in common: secure from the start, multiple
channels over a single physical, flow control per stream etc.

1 - Lots of people use SFTP for file transfers over SSH. But SFTP has its own
slew of added complexity that makes it a bad comparison. A good comparison
between SPDY and SSH needs to be based on plain SSH channels, such as an SCP
transfer.

2 - The PDF above makes specific remarks and improvement recommendations based
on internal design decisions and source code in OpenSSH. They are not strictly
enforced by the SSH protocol and other implementations will have different
algorithms to adjust the window size.

--

/ daniel.haxx.se

Patrick McManus

unread,
May 22, 2012, 8:34:02 AM5/22/12
to spdy...@googlegroups.com
On Mon, 2012-05-21 at 19:47 -0700, William Chan (陈智昌) wrote:

>
>
> Any reason to have the stream window smaller than the session
> window ?
> Can't think of any use case.
>
>
> Maybe you missed my email here: https://groups.google.com/forum/#!
> msg/spdy-dev/JB_aQPNI7rw/3Kata9-QeI0J. If the peer is naive, then it
> may be advantageous to have a smaller per-stream win.
>

the concern I have with that approach is that it addresses an
algorithmic shortcoming of the sender via a configuration on the
receiver. If the sender improves its algorithm it doesn't help anything
because it can't unilaterally change the window advertised by the
receiver and we are painted into a corner. As you say - the sender isn't
obligated to use the whole window just because it's there... if the
implementor of that sender thinks restraint is the right thing to do -
then go for it. If another implementor thinks filling high BDP pipes is
the most important thing, more power to them!



Greg Wilkins

unread,
May 22, 2012, 8:51:35 AM5/22/12
to spdy...@googlegroups.com
On 22 May 2012 13:53, Daniel Stenberg <dan...@haxx.se> wrote:
2 - The PDF above makes specific remarks and improvement recommendations based on internal design decisions and source code in OpenSSH. They are not strictly enforced by the SSH protocol and other implementations will have different algorithms to adjust the window size.

Daniel,

In your experience with ssh would you say that all implementations that want high throughput have opted for dynamic window sizes?

Does the protocol itself specify a default assumed window size, or do all implementations have to advertise the window size they want during handshake?

cheers

Daniel Stenberg

unread,
May 22, 2012, 9:30:46 AM5/22/12
to spdy...@googlegroups.com
On Tue, 22 May 2012, Greg Wilkins wrote:

> In your experience with ssh would you say that all implementations that want
> high throughput have opted for dynamic window sizes?

I'm afraid I don't think I have enough data to say that with a definite
certainty, as I haven't paid enough attention to those specific details and
perhaps also because OpenSSH is such a dominant implementation in use.

We have achieved much higher throughput over SSH by advertising very large
windows (at least in the hundreds of Kbyte) and we've always done dynamic
sizing.

But what I think makes my primary use of SSH not so good to compare with SPDY
and the way we're talking about the flow control here is that I'm primarily
author a single SSH consumer which mainly uses a single channel. I don't have
that much experience with having multiple consumers of different channels,
each performing differently, and I haven't heard a lot from such users either.

> Does the protocol itself specify as default assumed window size or do all
> implementations have to advertise the window size they want during
> handshake?

When you create a new channel (stream) in SSH you advertise your initial
window size. (See RFC4254 section 5.1)

--

/ daniel.haxx.se

Costin Manolache

unread,
May 22, 2012, 11:10:57 AM5/22/12
to spdy...@googlegroups.com
On Tue, May 22, 2012 at 4:53 AM, Daniel Stenberg <dan...@haxx.se> wrote:
On Tue, 22 May 2012, Greg Wilkins wrote:

Some more reading of interest is http://www.ietf.org/rfc/rfc4254.txt.  This is the SSH connection protocol that supports multiplexed channels using individual window sizes - more or less what SPDY/3 has.  It also has an optimisation to support xon/xoff of individual channels.

There is also a paper about how these window sizes can cause significant
performance problems and how dynamically resizing the windows can improve
throughput
http://www.psc.edu/networking/projects/hpn-ssh/papers/hpnssh-postertext.pdf

Yes, SSH does indeed have more or less the exact same windowing that SPDY has, which in turn seems to mimic what TCP itself offers.

TCP has not only flow control windows, but also retransmission - a router can drop IP packets if they're stuck. I don't think SPDY can work as well as TCP does with only flow control. 

My proposal was to provide as much information as possible to the sender about the status of the connection and buffering. If a sender can detect that one of the streams is 'stuck' and clogging the buffers, and it has a higher priority stream to send - it can abort the slow stream.

A typical example will be uploads - in many cases ( with range, etc ) it's safe to interrupt an upload and restart it later. If you have a large sync/upload operation in the background you want it to go as fast as possible but not interrupt normal browsing. 

The sender has a list of streams with different priorities, with data waiting to be sent. What information should it have to make the best decision about how much of each to send, or which stream to abort ?  

My understanding of 'stream window' was that it's an indication of how much to send from the stream (without clogging the pipes for other streams). Pretty easy, but if you add 'session flow control' it gets quite complicated - the stream window no longer means that you can send that much data.


Costin

William Chan (陈智昌)

unread,
May 22, 2012, 11:36:39 AM5/22/12
to spdy...@googlegroups.com
This is a very valid concern! Just to be clear, I was only pointing out a use case for a small per-stream window. FWIW, my personal slant is also not to artificially constrain the sender, but I'd like to hear more opinions on that.
 

Danner Stodolsky

unread,
May 22, 2012, 10:19:11 PM5/22/12
to spdy...@googlegroups.com
For the multiplexed control situations we've been discussing, I've always been partial to systems with a per-stream minimum reservation + a shareable burst that the sender can partition. The burst-size is sized by the receiver based on a BDP estimate and/or resource constraints and should be both growable and shrinkable. The per-stream minimum reservation fits well with the current window settings mechanism. Burst space is returned to the sender via a window-update type frame.

By both limiting the number of streams and the burst size, the receiver can bound memory needs. 

As a concrete example: A SPDY server supports 100 concurrent streams per connection, with a per-stream window of 8K and a burst of 1MB. This bounds memory usage at 1MB + 100*8K = ~1.8MB while allowing any individual stream to quickly ramp up to 1MB, if needed. 
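
Sketching the bound, just to make the arithmetic explicit:

    # Worst case: every stream fills its private reservation and the whole
    # shared burst is in use on top of that.
    def max_receive_buffer(streams, per_stream_reservation, burst):
        return burst + streams * per_stream_reservation

    print(max_receive_buffer(100, 8 * 1024, 1 << 20))  # 1867776 bytes, ~1.8MB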

Obviously, the sender can use feedback from tcp flow control to try to avoid buffer bloat, though many on this thread have commented on the complexity of bandwidth-delay estimation.

Greg Wilkins

unread,
May 29, 2012, 10:46:18 AM5/29/12
to spdy...@googlegroups.com
Simone and I have been debating offline various flow control algorithms and have not reached any significant conclusions. However I have come up with some formalisation of how we can rate various flow control proposals.

IMNSHO the primary factors that should rate flow control algorithms  are:

1.1 Uses all available bandwidth
The whole point of SPDY is to make the web faster, so if there is a willing stream producer, a willing stream consumer and available capacity in the pipe between SPDY implementations, then we should allow that capacity to be used up to and possibly even slightly beyond TCP/IP flow control.

1.2 No infinite postponement of a stream
The behaviour of one stream should not be able to infinitely postpone the delivery of data on another stream. We can control the behaviour of compliant SPDY senders and receivers, but we cannot control the behaviour of stream consumers and producers (the application).  Thus we have to consider that stream producers and consumers may act maliciously (either by design, by error or by force majeure (eg DNS resolution suddenly blocks)).   This does not mean that we have to avoid congesting the TCP pipe, just so long as we know the other end (the SPDY receiver) will eventually uncongest the pipe even if a stream consumer is not reading data - ie that the SPDY receiver has sufficient buffers.

1.3 Little incentive to use multiple connections
One of the problems with HTTP that has driven the development of SPDY is the incentive for clients to use multiple connections.  Any per connection limit (such as TCP/IP slow start windows) is an incentive to have multiple connections.  For example if we allocate a default initial per connection window, then a client that opens 2 connections will get twice that limit and better initial performance.  Worse still, because of TCP/IP slow start, once you open 2 connections, you'll probably open 6 or 8 to get multiple slow start window allocations as well.

1.4 No infinite buffering
When an endpoint accepts a connection and/or a stream, it must be able to know the maximum memory that it has to commit to provide in order to satisfy the specification.  If accepting a connection/stream can result in unlimited memory commitments then we are open to denial of service attacks and unpredictable application performance.

The secondary factors that we should consider are many, but I think they should include:

2.1 Complexity
Avoiding complexity is a motherhood statement.  Yes we want to be as simple as possible, but not to the point where we significantly compromise the primary factors.  At the end of the day, there are likely to be perhaps a few hundred or a few thousand SPDY implementations that will provide infrastructure for millions of developers - better we suck up some complexity rather than fail a primary objective.

2.2 Fairness
Another motherhood statement. Yes we want to be fair between streams, but fair is really subjective and difficult to quantify.  Is taking frames from streams in a round robin manner fair?  Some would say yes, but others would say that a stream that has not participated in any recent rounds should have more of a share than one that has sent in every round. I think the primary concern in the protocol spec is to avoid the possibility of infinite postponement, and then we can mostly leave fairness measures to be implementation details/features.

2.3 Priorities
See fairness.  If we can't work out what fair is, then it is even harder to work out what priorities mean.   I think priorities are a nice to have feature, but should not compromise the primary objectives.


The other thing I've concluded is that there is no perfect solution.  Even if the flow controlling algorithm was to ask an all knowing connection GOD if a frame could be sent or not, if that GOD gives free will to the stream consumers then GOD is fallible.  GOD might see a single stream that is flowing well and allow it to keep sending frames right up to the capacity of a fat pipe, but then just as another stream is opened the consumer of the first stream may stop reading, and all the data in the fat pipe will have to be buffered so that the new stream can send some frames.  But that buffering may consume almost all the memory reservation of the receiver, so that now GOD cannot allow the new stream to go at a full rate because the receiver's buffers are almost full and he cannot risk the new stream suddenly stopping consuming like the first did.

Once we accept that an all knowing algorithm can still be fallible in the face of  streams with free will, then we just have to accept that we are looking for the best approximation of a perfect solution.

So the current 64KB window per stream actually rates pretty well on most of these concerns, except for the first:

1.1 A 64KB window on a fast pipe will limit bandwidth utilisation if there is any significant latency;
1.2 A stream cannot be blocked by another stream not being consumed;
1.3 Resources are allocated per stream, so there is little incentive to use multiple connections, and with TCP slow start there is an incentive to use an already warmed up SPDY connection over a new one;
1.4 Implementations know the commitment is 64KB to accept a new stream;
2.1 It's moderately simple;
2.2 So long as the sender does not create long frame queues, it can be fair in a round robin sense;
2.3 Priorities could be implemented to affect round robin fairness.

So the only thing we are really missing is the ability to use the full capacity of a TCP connection.  The 64KB window has already been demonstrated to slow throughput.

The proposals to introduce a per connection window or burst allowance do look attractive, but my concern is that they sacrifice 1.3 to meet 1.1.  I.e. any per connection limit will create an incentive to open multiple connections in order to obtain the benefits of multiple per connection allocations of resources.

Instead, I think that we should look at a system that allows a connection to grow the per stream window if the pipe supports the extra bandwidth.  Moreover, new streams created on that connection should be allocated the same grown window (perhaps adjusted down a little for the number of streams), so that they do not need to slow start and there is an incentive to open a new stream on an existing connection rather than create a new one.   Growing the initial window size does not violate 1.4 as the size is known when a stream is accepted (and perhaps can be adjusted down if resources are short).

So how can we detect if a stream window can be grown?  Sending the entire window before receiving a window update is not sufficient, as that can equally indicate a slow consumer or a fast pipe.  Perhaps sending the entire window without seeing any TCP/IP flow control is one way? I.e. we can grow our stream windows until we reach either a limit or we see tcp/ip flow control.    Assuming we can come up with a way to decide when to grow, then I think this style of flow control rates OK: 1.1 windows can grow to TCP capacity; 1.2 streams cannot block other streams; 1.3 there is an incentive to use a warmed-up connection; 1.4 memory requirements are known when accepting a stream; 2.1 moderate complexity; 2.2 the writer can implement fairness in frame selection; 2.3 priorities can be used to influence fair frame selection.
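
One way to phrase the grow rule, as a sketch with made-up thresholds ('socket blocked' being whatever the I/O layer reports, e.g. a short or EWOULDBLOCK write):

    # Grow the default per-stream window while we can drain a whole window
    # without ever seeing TCP push back; stop at a ceiling or on the first
    # sign of tcp flow control. New streams inherit the grown default.
    class AdaptiveWindow:
        def __init__(self, initial=64 * 1024, ceiling=8 * 1024 * 1024):
            self.default_window = initial
            self.ceiling = ceiling
            self.saw_tcp_backpressure = False

        def on_socket_blocked(self):       # short write / EWOULDBLOCK
            self.saw_tcp_backpressure = True

        def on_window_drained(self):
            # A whole window went out without one blocked write: the pipe
            # is bigger than the window, so the window is the bottleneck.
            if not self.saw_tcp_backpressure:
                self.default_window = min(self.default_window * 2, self.ceiling)
            self.saw_tcp_backpressure = False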

Anyway... let me not get ahead of myself proposing solutions... what do people think of the criteria for rating flow control algorithms?

cheers

Simone Bordet

unread,
May 29, 2012, 11:38:00 AM5/29/12
to spdy...@googlegroups.com
Hi,

On Tue, May 29, 2012 at 4:46 PM, Greg Wilkins <gr...@intalio.com> wrote:
> Simone and I have been debating offline various flow control algorithms and
> have not reached any significant conclusions. However I have come up with
> some formalisation of how we can rate various flow control proposals.

I agree with the criteria.

In case of fast pipes, TCP buffers can be enlarged with kernel tuning
in order to support the fast pipe.
So I don't see it as that bad that we have a per-connection window, as
long as we can tune it (and that's an implementation detail), or find
a way to communicate new values discovered dynamically (e.g. a new
SETTINGS flag) interacting with TCP congestion.
The requirement of tuning the per-connection window is equivalent to
(and probably done as a consequence of) tuning the TCP buffers in case
of a fast pipe.

Having such a limit will limit 1.1 (which is bad), and bound 1.4
(which is good). Can't see yet how to get both satisfied.
Bullet 1.3 can be forbidden by the spec (but then, who cares if the
HTTP spec suggests no more than 2 connections per domain ? Everybody
uses 6, so even if it's forbidden, it may be ignored ?).
But I agree that the solution should disincentivize opening additional
connections.

Note that TCP can autotune itself, see net.ipv4.tcp_rmem and
net.ipv4.tcp_wmem parameters in Linux.
Perhaps we can mimic that autotuning capability, but it's becoming
scary how much we're reimplementing the TCP bits on top of SPDY. But
perhaps I am too impressionable :)

Simon
--
http://cometd.org
http://intalio.com
http://bordet.blogspot.com
----
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless.   Victoria Livschitz

Costin Manolache

unread,
May 29, 2012, 12:29:16 PM5/29/12
to spdy...@googlegroups.com
Very nice.

IMO the most interesting case is when a number of consumers get very slow - for a longer time ( minutes ). You must keep memory bounded (1.4), but still support 1.1 and 1.2, while the slow streams are already using most of the memory.

I would add a

2.4 Don't penalize the good streams for the faults of the bad ones. 
A bad stream is a stream where the consumer suddenly stops accepting bytes or slows down, on a stream that has a large window and lots of cached bytes.

What I'm trying to get is the ability to drop frames in intermediaries, and have some retransmission. It may sound like a lot of complexity - but there are quite a few systems doing this, in particular TCP.

Costin

William Chan (陈智昌)

unread,
May 29, 2012, 12:39:07 PM5/29/12
to spdy...@googlegroups.com
I'm not sure I understand what you mean by this. When you say intermediary, do you mean a transparent one? If the intermediary is actually an endpoint of the SPDY connection, then it clearly can do anything it wants (dropping frames, etc). Also, are you saying you want the client to retransmit to the intermediary? SPDY is over a "reliable" connection, so any dropped packets should already be retransmitted. Can you clarify further?

Roberto Peon

unread,
May 29, 2012, 12:44:13 PM5/29/12
to spdy...@googlegroups.com

Nice writeup! I think that is a great way to rate and consider solutions!

I think that bufferbloat (which should be a real worry) might want an explicit mention. It sorta fits into your categorizations/goals but has implications for latency jitter and measurement (and fairness, but you have a category/goal for that).

A proposed flow-control window update solution:

Whenever a stream has more data that it wishes to send right now and the tcp connection  is writable and the stream is restricted from sending by flow control, send an update for the stream indicating this to the other side.
This update could be a size update of size 0, or it could be another frame.

On the receiver side, receipt of this would indicate the sender is throttled by SPDY's flow control.

Figuring out the appropriate window for that stream would be implementation dependent, but, as a guideline, it should probably be the maximum window size for any stream you've recently increased, or 50% more, whichever is greater. Of course, window updates should be done only so long as the receiver has space in its buffers.

This is meant to be a simple guideline that will hopefully work fairly well.

If good estimates for rtt and bw are available, use those to bound the max possible send size. If the size of the tcp window is known, use that as a max upper bound.
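
The sender side of that signal, sketched with a zero-delta WINDOW_UPDATE standing in for the 'I am throttled' marker:

    # Emit the throttled signal once per stall: data is queued, the TCP
    # socket is writable, but SPDY flow control says stop.
    def maybe_signal_throttled(stream, tcp_writable, send_frame=lambda f: None):
        stalled = bool(stream.pending) and tcp_writable and stream.window <= 0
        if stalled and not stream.signalled:
            send_frame(("WINDOW_UPDATE", stream.id, 0))  # zero delta = throttled
            stream.signalled = True
        elif not stalled:
            stream.signalled = False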

-=R

Costin Manolache

unread,
May 29, 2012, 1:25:28 PM5/29/12
to spdy...@googlegroups.com
I mean the SPDY receiver. It can be a proxy or the final server, but it needs to buffer frames until the final endpoint (servlet, next backend) can consume them. 

In the current flow control proposals ( per stream, or per connection ) you can't drop frames - you can stop sending window updates, but you must keep in memory all frames you have received until they are consumed.


Also, are you saying you want the client to retransmit to the intermediary? SPDY is over a "reliable" connection, so any dropped packets should already be retransmitted. Can you clarify further?

Yes, TCP is "reliable" and has its own flow control, including the ability to drop packets when it needs to.

SPDY frames are over TCP - so frames can't be dropped; once you receive a frame you must keep it in (bounded - 1.4) memory until the actual endpoint consumes it.

We are copying 1/2 of TCP - the flow control window - but are missing the other 1/2, dropping packets and re-transmission. IP packets are dropped because they get "lost", but also if router buffers are full or they time out.

Costin 

William Chan (陈智昌)

unread,
May 29, 2012, 1:34:54 PM5/29/12
to spdy...@googlegroups.com
Yes.
 


Also, are you saying you want the client to retransmit to the intermediary? SPDY is over a "reliable" connection, so any dropped packets should already be retransmitted. Can you clarify further?

Yes, TCP is "reliable" and has its own flow control, including the ability to drop packets when it needs to.

SPDY frames are over TCP - so frames can't be dropped; once you receive a frame you must keep it in (bounded - 1.4) memory until the actual endpoint consumes it.

We are copying 1/2 of TCP - the flow control window - but are missing the other 1/2, dropping packets and re-transmission. IP packets are dropped because they get "lost", but also if router buffers are full or they time out.

Why would we need to drop frames and retransmit them? The receiver is able to communicate its buffer sizes via the stream windows. If there is any problem, it should communicate it via a RST_STREAM. I do not believe we should allow dropping frames and retransmitting them individually. HTTP intermediaries do not allow for dropping payloads once the TCP layer has already acknowledged receipt, so I do not see why we need support for this within a SPDY stream.
 


Costin Manolache

unread,
May 29, 2012, 4:44:29 PM5/29/12
to spdy...@googlegroups.com
RST_STREAM is a fine way to drop frames of stuck streams. Maybe we should just define a status code like 'slow stream using too many buffers'.

The receiver can send its buffer size - but you can still get into a state where a lot of streams are slow and use the maximum buffer size they are allowed to use. If the total per-connection memory is filled with slow streams, at some point the good streams no longer have space.

 
I do not believe we should allow dropping frames and retransmitting them individually.

Not individual frames - but a proxy for example should be able to send a RST_STREAM or similar message indicating that a stream is too slow and is using too much of the buffer space. 

Ideally this will not be a simple RST_STREAM followed by another full upload attempt of the entire stream - you could improve on this by sending an indication of how much has been consumed, and the client could restart from that point.
 

HTTP intermediaries do not allow for dropping payloads once the TCP layer has already acknowledged receipt, so I do not see why we need support for this within a SPDY stream.

HTTP intermediaries don't multiplex - each TCP connection can be as slow as it wants without affecting other streams.

I think documenting that RST_STREAM can be used for flow control would make us more similar to HTTP, where stuck TCP connections may be aborted if the proxy needs the memory.

Costin

 


William Chan (陈智昌)

unread,
May 29, 2012, 6:43:16 PM5/29/12
to spdy...@googlegroups.com
I was surprised to see we don't have a RST_STREAM status code for the server to indicate that it wants to abort a stream for whatever reason. I think we should add one like that, but we don't need one as specific as 'slow stream using too many buffers'. Or is INTERNAL_ERROR appropriate here? That would be similar to an HTTP 5XX error, and it can be delivered at any point. But I guess I don't really like that, as I'd like INTERNAL_ERROR to indicate a bug only, and I think servers can abort a stream without it necessarily being an implementation bug.
 

The receiver can send its buffer size - but you can still get into a state where a lot of streams are slow and use the maximum buffer size they are allowed to use. If the total per-connection memory is filled with slow streams, at some point the good streams no longer have space.

I think I see what you're saying now. If the stream is totally hung, then yes, you want to abort it so you can forcibly reclaim the memory. Fair enough. This is equivalent to the HTTP case of just closing the socket to abort.
 

 
I do not believe we should allow dropping frames and retransmitting them individually.

Not individual frames - but a proxy for example should be able to send a RST_STREAM or similar message indicating that a stream is too slow and is using too much of the buffer space. 

Ideally this will not be a simple RST_STREAM followed by another full upload attempt of the entire stream - you could improve on this by sending an indication of how much has been consumed, and the client could restart from that point.

Hm. We're no worse than HTTP if we don't support this. The question is: can/should we do better? I dunno. Being conservative, I'd err on the side of maintaining the status quo, but I think a case could legitimately be made to improve on the situation. I'd be curious to hear what others think here. This one in particular is a bit kludgy, since it requires a new frame type or something similar to pass extra information regarding how much of the body was accepted.
 
 

HTTP intermediaries do not allow for dropping payloads once the TCP layer has already acknowledged receipt, so I do not see why we need support for this within a SPDY stream.

HTTP intermediaries don't multiplex - each TCP connection can be as slow as it wants without affecting other streams.

My critical error was not understanding the desire to forcibly reclaim the memory for a hung stream by aborting the stream. Fair enough, that makes sense.

Greg Wilkins

unread,
May 29, 2012, 6:47:10 PM5/29/12
to spdy...@googlegroups.com


On 29 May 2012 22:44, Costin Manolache <cos...@gmail.com> wrote:

I think documenting that RST_STREAM can be used for flow control would make us more similar to HTTP, where stuck TCP connections may be aborted if the proxy needs the memory.

We just have to be very careful about non-idempotent streams being reset, else we end up with the current HTTP pipelining problems.

But if the problem of idempotency can be solved, then RST is probably the only way we can reduce the buffers allocated to a stuck stream. Although it will still rely on HTTP retry semantics and user interactions.

cheers

Roberto Peon

unread,
May 29, 2012, 6:51:07 PM5/29/12
to spdy...@googlegroups.com
On Tue, May 29, 2012 at 3:47 PM, Greg Wilkins <gr...@intalio.com> wrote:


On 29 May 2012 22:44, Costin Manolache <cos...@gmail.com> wrote:

I think documenting that RST_STREAM can be used for flow control would make us more similar to HTTP, where stuck TCP connections may be aborted if the proxy needs the memory.

We just have to be very careful about non-idempotent streams being reset, else we end up with the current HTTP pipelining problems.

Yes. I think that the amount of complexity that this adds is large, especially for application writers :/
 

But if the problem of idempotency can be solved, then RST is probably the only way we can reduce the buffers allocated to a stuck stream. Although it will still rely on HTTP retry semantics and user interactions.

I don't see how it could be reasonably solved, though.

The alternative is to only send what the receive-side knows is safe to receive. If we can figure out a way to make that work (which I believe is doable), then we avoid this problem altogether.

-=R
 

Costin Manolache

unread,
May 30, 2012, 12:12:40 AM5/30/12
to spdy...@googlegroups.com
One relatively easy solution - used on older versions of Android - is to have the client hold on to the data it sent, and the proxy or server to indicate how much was consumed, with negative numbers indicating that the client needs to resend some data.

For example, to upload a 1MB file, you start sending frames up to the window size (say 128KB), but you don't delete/GC the frames you sent until you get the next window update from the server. The window update will have an extra field indicating how much was consumed.

If the stream gets very slow, the server will send a window update with a very small window and a negative number of bytes consumed, and empty the buffers used by that stream. The client will just resend the data at the slower speed.

If the consumer is fast, the server can increase the window size and keep sending updates on how much was consumed, and the client will free the 'retransmit' memory when the server indicates the data was consumed.

It sounds complicated because I'm not explaining it right :-) This actually becomes more interesting if you want to keep the streams flowing while the TCP connection is interrupted.

The cost is that both sender and receiver will need to buffer a window's worth of data.
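A minimal sender-side sketch of this scheme (all names are illustrative, and the negative-consumed convention is as described above, not from any spec):

  # Hypothetical sender-side sketch of the consumed-bytes scheme.
  class StreamSender:
      def __init__(self, window):
          self.window = window       # bytes we may still send
          self.unconsumed = b""      # sent but not yet consumed

      def send(self, sock, data):
          n = min(self.window, len(data))
          sock.sendall(data[:n])
          self.unconsumed += data[:n]  # keep a copy for possible resend
          self.window -= n
          return data[n:]              # caller queues the remainder

      def on_window_update(self, new_window, consumed):
          self.window = new_window
          if consumed < 0:
              # The receiver emptied its buffers for this stream: everything
              # sent-but-unconsumed must be resent at the new, slower rate.
              to_resend, self.unconsumed = self.unconsumed, b""
              return to_resend
          # The receiver consumed this many bytes; forget our copy of them.
          self.unconsumed = self.unconsumed[consumed:]
          return b""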

Costin  



 
Greg Wilkins

unread,
May 30, 2012, 6:36:33 AM5/30/12
to spdy...@googlegroups.com
On 30 May 2012 06:12, Costin Manolache <cos...@gmail.com> wrote:
> One relatively easy solution - used on older versions of Android - is to
> have the client hold on to the data it sent, and the proxy or server to
> indicate how much was consumed, with negative numbers indicating that the
> client needs to resend some data.
>
> For example, to upload a 1MB file, you start sending frames up to the
> window size (say 128KB), but you don't delete/GC the frames you sent until
> you get the next window update from the server. The window update will
> have an extra field indicating how much was consumed.
>
> If the stream gets very slow, the server will send a window update with a
> very small window and a negative number of bytes consumed, and empty the
> buffers used by that stream. The client will just resend the data at the
> slower speed.
>
> If the consumer is fast, the server can increase the window size and keep
> sending updates on how much was consumed, and the client will free the
> 'retransmit' memory when the server indicates the data was consumed.
>
> It sounds complicated because I'm not explaining it right :-) This
> actually becomes more interesting if you want to keep the streams flowing
> while the TCP connection is interrupted.
>
> The cost is that both sender and receiver will need to buffer a window's
> worth of data.


Costin,

My gut reaction is YUCK!!!! This is reimplementing even more of TCP.
But then, on consideration, perhaps with a bit of complexity added, it
might not be that bad.

We really only need to start buffering sent data when a stream's window
has grown to be a significant percentage of the total available
buffers. So we might be able to say that we will never shrink a
stream window below 64KB; if a stream never sends more than that
much data, then no send buffers are needed. It is kind of a
tradeoff: we will let a stream use a large share of the bandwidth
and buffers, but only on the terms that it buffers its excess sends so
that we can reduce the window if need be. If we are operating in a
mode with many active streams, then hopefully no stream would need to
take such a large window, so no send buffering would be needed.

Even when send buffering is needed, in many cases the implementation
will be able to do that efficiently using the originally passed
complete message buffer to regenerate frames without the need for
additional copying.

Essentially the problem is that if we let a stream use all the
available bandwidth, then BDP logic means we need big buffers and we
need a way to take them back if that stream becomes stalled. RST is
too draconian, so resends for the data in excess of some minimum might
be a reasonable way to do that.

cheers

Simone Bordet

unread,
May 30, 2012, 6:47:18 AM5/30/12
to spdy...@googlegroups.com
Hi,

On Wed, May 30, 2012 at 6:12 AM, Costin Manolache <cos...@gmail.com> wrote:
> One relatively easy solution - used on older versions of Android - is to
> have the client hold on to the data it sent, and the proxy or server to
> indicate how much was consumed, with negative numbers indicating that the
> client needs to resend some data.
>
> For example, to upload a 1MB file, you start sending frames up to the
> window size (say 128KB), but you don't delete/GC the frames you sent until
> you get the next window update from the server. The window update will
> have an extra field indicating how much was consumed.

You don't want to send window updates until you're sure data has been
consumed by the application.
Otherwise you're duplicating what the TCP ACK is saying, and it is of
no interest to the sender.

I think we should strive for much more simplicity here: if one stream
has occupied the whole connection window, and window updates are not
being sent, then it's the same situation as being TCP congested: there is
nothing you can do apart from timing out that stream and resetting it.

Simone Bordet

unread,
May 30, 2012, 6:53:14 AM5/30/12
to spdy...@googlegroups.com
Hi,

On Tue, May 29, 2012 at 6:44 PM, Roberto Peon <fe...@google.com> wrote:
> A proposed flow-control window update solution:
>
> Whenever a stream has more data that it wishes to send right now and the tcp
> connection  is writable and the stream is restricted from sending by flow
> control, send an update for the stream indicating this to the other side.
> This update could be a size update of size 0, or it could be another frame.
>
> On the receiver side, receipt of this would indicate the sender is throttled
> by SPDY's flow control.

What can the receiver do with this information?

I think the work is entirely on the sender, who has to figure out the
connection window, and have some logic to redistribute that window
among the streams.

Alek Storm

unread,
May 30, 2012, 7:10:49 AM5/30/12
to spdy...@googlegroups.com
I'd like to advocate something completely different: a modified round-robin scheme that is guaranteed to use all available bandwidth, ensures that no stream is infinitely postponed, rewards streams that have not sent data recently, and allows for fine-grained prioritization/compensation for data processing rates that vary by stream at the other endpoint. First, since there will always be a per-connection window in the form of TCP flow control, we can do away with per-stream windows, the only justification for which seemed to be avoiding infinite buffering.

Each stream has both a "current window" and a "round delta". When it is a particular stream's turn in the round-robin to send data, it sends a number of bytes equal to its current window or the number of bytes sitting in its output buffer, whichever is smaller. Its current window is then decremented by however many bytes it sent. After each stream's turn, the current windows of every stream are incremented by their round deltas. In this manner, no stream can monopolize the session, and streams which send less data than their current window would allow in a given round are rewarded in the next with a larger-than-normal window. In addition, since every stream is more-or-less constantly sending data as long as its output buffer is non-empty, the connection will be saturated with data until every stream is blocked, waiting to compute more data to send.

Consider an extreme case: Five streams are currently open, but only one has been sending data for quite some time; the others are silent. Each round, it sends its current window, decrements it to zero, then increments it by its round delta five times while the others pass on the opportunity to send data, then the round repeats. Suddenly, one of the other streams begins transmitting - its current window has been accumulating for so long that it is allowed to send a large amount of data at once. This is fair, because the other stream has been sending data exclusively for some time. After its current window is exhausted, the other stream goes back to transmitting. It's also an argument for a (reasonably large) cap on the size of current windows - perhaps a multiple of its round delta.
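Putting the pieces together, here is an illustrative sketch of one round of this scheme (all names are made up, and the 4x-delta cap is just one instantiation of the "multiple of its round delta" suggestion above):

  # Illustrative sketch of the modified round-robin described above.
  def run_round(streams, transmit):
      for s in streams:
          n = min(s.current_window, len(s.output_buffer))
          if n > 0:
              transmit(s, s.output_buffer[:n])
              s.output_buffer = s.output_buffer[n:]
              s.current_window -= n
          # After each stream's turn, every stream accrues its round delta,
          # capped so an idle stream cannot build up an unbounded burst.
          for t in streams:
              t.current_window = min(t.current_window + t.round_delta,
                                     4 * t.round_delta)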

Despite the lack of per-stream fixed windows as in spdy/3, this scheme allows receivers to control precisely how much throughput they'd like to handle for a given stream by specifying each one's round delta individually, taking into account both its priority and the estimated speed at which its buffer can be drained (how fast incoming data can be processed). There are multiple ways to implement this communication; I'm unsure of the optimal choice. Note that no matter how inefficiently the client allocates round deltas, full connection bandwidth will always be utilized.

Since there is always a per-connection window, whether in TCP or both TCP and SPDY, there will always be an incentive to circumvent it by opening multiple connections. I don't believe any proposal will be able to solve this, but the problem is mitigated by the advantage of using a pre-warmed TCP connection. This also means that infinite buffering is a red herring (I think).

What do you guys think? I'd love to hear any feedback at all.

Thanks,
Alek

P.S. In the v3 spec, WINDOW_UPDATE specifically states that receivers must buffer all control frames. Wouldn't this be a problem if the sender transmits a large SYN_STREAM or SYN_REPLY, or several large HEADERS? Why can't the window delta just specify the total number of bytes in all frames tied to a specific stream?

Mike Belshe

unread,
May 30, 2012, 11:04:51 AM5/30/12
to spdy...@googlegroups.com
On Tue, May 29, 2012 at 7:46 AM, Greg Wilkins <gr...@intalio.com> wrote:
Simone and I have been debating offline various flow control algorithms and have not reached any significant conclusions. However I have come up with some formalisation of how we can rate various flow control proposals.

IMNSHO the primary factors that should rate flow control algorithms are:

1.1 Uses all available bandwidth
The whole point of SPDY is to make the web faster, so if there is a willing stream producer, a willing stream consumer and available capacity in the pipe between SPDY implementations, then we should allow that capacity to be used up to and possibly even slightly beyond TCP/IP flow control.

1.2 No infinite postponement of a stream
The behaviour of one stream should not be able to infinitely postpone the delivery of data on another stream. We can control the behaviour of compliant SPDY senders and receivers, but we cannot control the behaviour of stream consumers and producers (the application).  Thus we have to consider that stream producers and consumers may act maliciously (either by design, by error or by force majeure (eg DNS resolution suddenly blocks)).   This does not mean that we have to avoid congesting the TCP pipe, just so long as we know the other end (the SPDY receiver) will eventually uncongest the pipe even if a stream consumer is not reading data - ie that the SPDY receiver has sufficient buffers.

1.3 Little incentive to use multiple connections
One of the problems with HTTP that has driven the development of SPDY is clients' use of multiple connections, so we should remove the incentive for that. Any per-connection limit (such as TCP/IP slow-start windows) is an incentive to have multiple connections. For example, if we allocate a default initial per-connection window, then a client that opens 2 connections will get twice that limit and better initial performance. Worse still, because of TCP/IP slow start, once you open 2 connections, you'll probably open 6 or 8 to get multiple slow-start window allocations as well.

1.4 No infinite buffering
When an endpoint accepts a connection and/or a stream, it must be able to know the maximum memory that it has to commit to provide in order to satisfy the specification.  If accepting a connection/stream can result in unlimited memory commitments then we are open to denial of service attacks and unpredictable application performance.

I think the desire is more specific: "no infinite buffering without requiring a proxy to close individual streams". If you're willing to close the misbehaving stream, the solution is much easier.


The secondary factors that we should consider are many, but I think they should include:

2.1 Complexity
Avoiding complexity is a motherhood statement. Yes, we want to be as simple as possible, but not to the point where we significantly compromise the primary factors. At the end of the day, there are likely to be perhaps a few hundred or a few thousand SPDY implementations that will provide infrastructure for millions of developers - better we suck up some complexity than fail a primary objective.

2.2 Fairness
Another motherhood statement. Yes we want to be fair between streams, but fair is really subjective and difficult to quantify.  Is taking frames from streams in a round robin manner fair?  some would say yes, but others would say that a stream that has not participated in any recent rounds should have more of a share than one that has sent in every round. I think the primary concern in the protocol spec is to avoid the possibility of infinite postponement and then we can mostly leave fairness measures to be implementation details/features.

2.3 Priorities
See fairness.  If we can't work out what fair is, then it is even harder to work out what priorities mean.   I think priorities are a nice to have feature, but should not compromise the primary objectives.


The other thing I've concluded is that there is no perfect solution. Even if the flow-controlling algorithm was to ask an all-knowing connection GOD if a frame could be sent or not, if that GOD gives free will to the stream consumers then GOD is fallible. GOD might see a single stream that is flowing well and allow it to keep sending frames right up to the capacity of a fat pipe, but then, just as another stream is opened, the consumer of the first stream may stop reading, and all the data in the fat pipe will have to be buffered so that the new stream can send some frames. But that buffering may consume almost all the memory reservation of the receiver, so that now GOD cannot allow the new stream to go at a full rate, because the receiver's buffers are almost full and he cannot risk the new stream suddenly stopping consuming like the first did.

Once we accept that an all knowing algorithm can still be fallible in the face of  streams with free will, then we just have to accept that we are looking for the best approximation of a perfect solution.

So the current 64KB window per stream actually rates pretty well on most of these concerns, except for one: 1.1 A 64KB window on a fast pipe will limit bandwidth utilisation if there is any significant latency; 1.2 A stream cannot be blocked by another stream not being consumed; 1.3 Resources are allocated per stream, so there is little incentive to use multiple connections, and with TCP slow start there is an incentive to use an already warmed up SPDY connection over a new one; 1.4 implementations know the commitment is 64KB to accept a new stream;  2.1 it's moderately simple; 2.2 so long as the sender does not create long frame queues, it can be fair in a round robin sense; 2.3 priorities could be implemented to affect round robin fairness.

So the only thing we are really missing is the ability to use the full capacity of a TCP connection. The 64KB window has already been demonstrated to slow throughput.

The proposals to introduce a per-connection window or burst allowance do look attractive, but my concern is that they sacrifice 1.3 to meet 1.1. I.e., any per-connection limit will create an incentive to open multiple connections in order to obtain the benefits of multiple per-connection allocations of resources.

The current flow control is simply a tradeoff.  It trades 1.1 to have 1.4.  (personally, I think this is a poor trade)
 

Instead, I think that we should look at a system that allows a connection to grow the per-stream window if the pipe supports the extra bandwidth. Moreover, new streams created on that connection should be allocated the same grown window (perhaps adjusted down a little for the number of streams), so that they do not need to slow-start and there is an incentive to open a new stream on an existing connection rather than create a new one. Growing the initial window size does not violate 1.4, as the size is known when a stream is accepted (and perhaps can be adjusted down if resources are short).

So how can we detect if a stream window can be grown? Sending the entire window before receiving a window update is not sufficient, as that can equally indicate a slow consumer or a fast pipe. Perhaps sending the entire window without seeing any TCP/IP flow control is one way? I.e., we can grow our stream windows until we reach either a limit or we see TCP/IP flow control. Assuming we can come up with a way to decide when to grow, then I think this style of flow control rates OK: 1.1 windows can grow to TCP capacity; 1.2 streams cannot block other streams; 1.3 there is an incentive to use a warmed-up connection; 1.4 memory requirements are known when accepting a stream; 2.1 moderate complexity; 2.2 the writer can implement fairness in frame selection; 2.3 priorities can be used to influence fair frame selection.
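For what it's worth, a crude sketch of the grow-until-backpressure idea, assuming the sender can observe whether the whole window drained without the socket ever becoming unwritable (a rough stand-in for "no TCP/IP flow control seen"); all names here, including send_window_update, are hypothetical:

  # Hypothetical sketch: grow stream windows until TCP pushes back.
  def maybe_grow_windows(conn):
      # The whole window drained and the socket never blocked, so the
      # bottleneck was our flow control, not the pipe: grow the windows.
      if conn.window_fully_sent and not conn.saw_socket_blocked:
          new_size = min(conn.initial_window * 2, conn.window_cap)
          for stream in conn.streams:
              # Grant each stream the difference up to the new size.
              send_window_update(stream, new_size - stream.window)
          conn.initial_window = new_size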

Anyway... let me not get ahead of myself proposing solutions... what do people think of the criteria for rating flow control algorithms?

I think you've nailed the desires pretty well.  All solutions will be tradeoffs against these possible features.

Mike


 


Costin Manolache

unread,
May 30, 2012, 12:22:41 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 3:47 AM, Simone Bordet <sbo...@intalio.com> wrote:
Hi,

On Wed, May 30, 2012 at 6:12 AM, Costin Manolache <cos...@gmail.com> wrote:
> One relatively easy solution - used on older versions of Android - is to
> have the client hold on to the data it sent, and the proxy or server to
> indicate how much was consumed, with negative numbers indicating that the
> client needs to resend some data.
>
> For example, to upload a 1MB file, you start sending frames up to the
> window size (say 128KB), but you don't delete/GC the frames you sent until
> you get the next window update from the server. The window update will
> have an extra field indicating how much was consumed.

You don't want to send window updates until you're sure data has been
consumed by the application.

Why? Window updates indicate there is space for more bytes in the buffers, not that the bytes have been consumed.

 
Otherwise you're duplicating what the TCP ACK is saying, and it is of
no interest to the sender.

The problem is duplicating only part of TCP flow control. ACK is a part of flow control, just like the window update. TCP relies on packet drops and ACKs to determine what to send and how fast.

Even in HTTP the sender has to be able to deal with drops and re-transmits. The status code is a form of ACK, and plenty of problems have been caused by not dealing properly with drops and retries in HTTP.

Bufferbloat is mentioned quite a bit - maybe we should look at ECN (explicit congestion notification), which is the alternative to ACKs and dropping packets.

SPDY duplicates stuff from lower layers - multiplexing, and a part of flow control. It's likely to duplicate some of the problems and make others worse by not duplicating enough :-)


Costin

Roberto Peon

unread,
May 30, 2012, 1:43:57 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 9:22 AM, Costin Manolache <cos...@gmail.com> wrote:


On Wed, May 30, 2012 at 3:47 AM, Simone Bordet <sbo...@intalio.com> wrote:
Hi,

On Wed, May 30, 2012 at 6:12 AM, Costin Manolache <cos...@gmail.com> wrote:
> One relatively easy solution - used on older versions of Android - is to
> have the client hold on to the data it sent, and the proxy or server to
> indicate how much was consumed, with negative numbers indicating that the
> client needs to resend some data.
>
> For example, to upload a 1MB file, you start sending frames up to the
> window size (say 128KB), but you don't delete/GC the frames you sent until
> you get the next window update from the server. The window update will
> have an extra field indicating how much was consumed.

You don't want to send window updates until you're sure data has been
consumed by the application.

Why? Window updates indicate there is space for more bytes in the buffers, not that the bytes have been consumed.

For any decent implementation, so long as the bytes in the buffers aren't blocking any other stream, this is good enough, and certainly no worse than TCP.
For a proxy, 'consumed by the application' translates to 'moved ingress bytes to egress bytes'.
 

 
Otherwise you're duplicating what the TCP ACK is saying, and it is of
no interest to the sender.

The problem is duplicating only part of TCP flow control. ACK is a part of flow control, just like the window update. TCP relies on packet drops and ACKs to determine what to send and how fast.

Even in HTTP the sender has to be able to deal with drops and re-transmits. The status code is a form of ACK, and plenty of problems have been caused by not dealing properly with drops and retries in HTTP.

Bufferbloat is mentioned quite a bit - maybe we should look at ECN (explicit congestion notification), which is the alternative to ACKs and dropping packets.

SPDY duplicates stuff from lower layers - multiplexing, and a part of flow control. It's likely to duplicate some of the problems and make others worse by not duplicating enough :-)

rexmits are a necessary part of TCP because of packet loss. We have a reliable transport that already does rexmit for us. We shouldn't need to reimplement that-- it is a fair bit of additional complexity and I believe that we don't need it to solve our problems.

In particular, the scheme I proposed earlier should address all of the issues we've brought up so far. Roughly, it is:

Assume:
   we have per-stream window sizes.
   we have per-connection window sizes.
  A sender can send up to (min(per-stream, per-connection)) bytes.

"a stream is blocked"  means that the sender for that stream:
  * can write to the socket (i.e. the socket is writable and there is space in the TCP egress buffers)
  * has no higher priority stream that is currently sending (and thus blocking this stream because of prioritization)
  * there are bytes to be sent on this stream.

The algorithm is then (pseudocode):

OnWritable(socket, stream_id):
  if (bytes_to_send && socket_writable && clear_to_send_this_priority):
    max_window = min(connection_flow_control_window,
                     flow_control_window[stream_id])
    bytes_sent = send(socket, bytes_to_send, min(bytes_to_send.len(), max_window))
    if (StillWritable(socket) && bytes_to_send.len() > max_window):
      SendBlockedFrame(socket, stream_id, bytes_to_send.len() - max_window)
      last_time_blocked[stream_id] = Now()

OnWindowUpdate(stream_id, window_update_frame):
  flow_control_window[stream_id] += window_update_frame.stream_update_size
  connection_flow_control_window += window_update_frame.overall_update_size

  // If we were blocked, report how much is still pending and for how long.
  if (bytes_to_send.len() > 0 && last_time_blocked[stream_id] > 0):
    time_blocked = Now() - last_time_blocked[stream_id]
    SendUnblockedFrame(socket, stream_id, bytes_to_send.len(), time_blocked)
    max_window = min(connection_flow_control_window,
                     flow_control_window[stream_id])
    if (max_window >= bytes_to_send.len()):
      last_time_blocked[stream_id] = 0
    else:
      last_time_blocked[stream_id] = Now()
 
In natural language:
If a sender is ever blocked, then it should send a frame to the receiver indicating the stream ID which is blocked, with the amount of bytes it would wish to send but couldn't because of flow control.
When a sender receives a window update frame, it should indicate how many bytes are still blocked and for how long.

The receiver, upon receipt of such a frame, could increase the various window sizes as indicated by the frames which tell the receiver the number (and possible duration) of the blocked bytes (hopefully up to a maximum as estimated by the BDP).

This scheme doesn't require much additional complexity, and it meets all of the ratings targets proposed by Greg earlier.
This scheme rapidly converges on the appropriate window size without too much overshoot.

Comments? Are there holes to be poked-in here?
-=R

William Chan (陈智昌)

unread,
May 30, 2012, 2:34:04 PM5/30/12
to Roberto Peon, spdy...@googlegroups.com
First let me say that I think re-implementing rexmit at the SPDY level, when SPDY operates over a reliable transport, is a mistake. I think most people agree that's a mistake. If anyone other than Costin supports rexmit at the SPDY level, please speak up.
"how many bytes are blocked" is not well defined. Let's say the sender is reading from a file, whose length he doesn't know. Is "how many bytes are blocked" just the next chunk size he would have sent in the next send() call, or is it rather the remaining amount of data on the stream? It's not clear to me that this heuristic is useful. Also, what's the "how long" used for? Preventing starvation?
 

The receiver, upon receipt of such a frame, could increase the various window sizes as indicated by the frames which tell the receiver the number (and possible duration) of the blocked bytes (hopefully up to a maximum as estimated by the BDP).

I think I need more clarification on the server-side motivations here. Are you saying you want the client to provide information as to which streams need more buffers? Just to be clear, in this proposal, are we trying to address a deficiency compared to HTTP over TCP connections, or are we trying to provide better facilities to do better buffer management than is possible with HTTP over TCP?

FWIW, I lean more towards Mike's POV, although I do concede a need for flow control (and per-session windows in addition to per-stream windows). But I think these windows should be sized so they only come into play in the less common cases (most streams are short-lived), and I would like to see Chromium and Firefox and other SPDY clients agree on minimum sizes to require, so we prevent stupid servers from making things unnecessarily slow. And I think per-session + per-stream windows give enough knobs for the server to manage things appropriately; I don't really see a need for further knobs.

Costin Manolache

unread,
May 30, 2012, 2:49:34 PM5/30/12
to spdy...@googlegroups.com
"Packet loss" also means "dropped because of full buffers".
We also have a transport that does flow control - yet we are duplicating that (in part, without the part that deals with congestion).
Not sure I understand - you can't increase any window if the per-connection buffers are full. 

The sender has sent the initial window of all the streams, up to filling the per-connection window. There is no more space in the proxy (except maybe for control frames, which are not subject to flow control).
 


This scheme doesn't require much additional complexity, and it meets all of the ratings targets proposed by Greg earlier.
This scheme rapidly converges on the appropriate window size without too much overshoot.

I really don't see how this can work. Maybe if the 'per connection' buffers equal the per-stream window * the max number of streams - in which case, what's the point of per-connection flow control?

Well, if I'm the only one who doesn't understand I'll have to accept it and wait for implementations to see where I was wrong... 



Costin

Roberto Peon

unread,
May 30, 2012, 3:13:54 PM5/30/12
to spdy...@googlegroups.com
Agreed. We're duplicating it because we'd be required to do infinite buffering at proxies without it. That problem is not being solved by the transport, unfortunately :/
If the proxy has no more space, then no one should be sending more bytes and things are working properly.
If the proxy has space in its buffers, it indicates such by increasing the per-connection window size.
If the proxy is blocked on a particular stream, it doesn't update the window size for that stream, but otherwise the per-stream window size should be the per-connection window size.

Does that clarify it?

-=R

Roberto Peon

unread,
May 30, 2012, 3:21:08 PM5/30/12
to will...@google.com, spdy...@googlegroups.com
Any of the above. The larger the blockage, the less interesting the signal, however. Blockages which are less than BDP send some very useful and interesting data...
The 'how long' is useful in estimating bandwidth demand, or for implementations of fairness, depending on the receiver implementation.

 
 

The receiver, upon receipt of such a frame, could increase the various window sizes as indicated by the frames which tell the receiver the number (and possible duration) of the blocked bytes (hopefully up to a maximum as estimated by the BDP).

I think I need more clarification on the server-side motivations here. Are you saying you want the client to provide information as to which streams need more buffers? Just to be clear, in this proposal, are we trying to address a deficiency compared to HTTP over TCP connections, or are we trying to provide better facilities to do better buffer management than is possible with HTTP over TCP?

The client knows how much it needs to send for each stream. The server, of course, doesn't know this. The 'I'm blocked' mechanism will be useful for servers which don't have the ability to hack the kernel to discover TCP parameters, by allowing a reasonably accurate assessment of how much smaller the flow-control window is compared to the BDP.
 

FWIW, I'm lean towards Mike's POV more, although I do concede a need for flow control (and per-session windows in addition to per-stream windows). But I think that these windows should be sized so they only come into play in the less common cases (most streams are short-lived) and would like to see Chromium and Firefox and other SPDY clients agree on minimum sizes to require, so we prevent stupid servers from making things unnecessarily slow. And I think per-session+per-stream windows give enough knobs for the server to manage things appropriately, and don't really see a need for further knobs.

If we set the per-connection window size to roughly the BDP (or slightly more), and we set the per-stream window size to roughly the per-connection window size unless the server needs to constrain the stream to avoid HOL blocking, that achieves what we need, agreed. 

-=R

Simone Bordet

unread,
May 30, 2012, 3:35:57 PM5/30/12
to spdy...@googlegroups.com
Hi,

On Wed, May 30, 2012 at 7:43 PM, Roberto Peon <fe...@google.com> wrote:
> In natural language:
> if a sender is ever blocked, then it should send a frame to the receiver
> indicating the stream ID which is blocked, with then amount of bytes it
> would wish to send, but couldn't because of flow-control.

I don't follow this point.
If the sender is blocked, how can it send a frame to the receiver?

And even supposing that the receiver somehow receives it, what use can
it make of it?
If a receiver is bound to a slow application, it knows that (it reads
more for that stream than the application consumes), and I can't
imagine what it can do with the information that it's flow controlled
- it probably already knows that on its own.

Thanks,

Roberto Peon

unread,
May 30, 2012, 3:48:39 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 12:35 PM, Simone Bordet <sbo...@intalio.com> wrote:
Hi,

On Wed, May 30, 2012 at 7:43 PM, Roberto Peon <fe...@google.com> wrote:
> In natural language:
> if a sender is ever blocked, then it should send a frame to the receiver
> indicating the stream ID which is blocked, with then amount of bytes it
> would wish to send, but couldn't because of flow-control.

I don't follow this point.
If the sender is blocked, how can it send a frame to the receiver?

"a stream is blocked"  means that the sender for that stream:
  * can write to the socket (i.e. the socket is writable and there is space in the TCP egress buffers)
  * has no higher priority stream that is currently sending (and thus blocking this stream because of prioritization)
  * there are bytes to be sent on this stream.

And even supposing that the receiver somehow receives it, what use can
it make of it?
If a receiver is bound to a slow application, it knows that (it reads
more for that stream than the application consumes), and I can't
imagine what it can do with the information that it's flow controlled
- it probably already knows that on its own.

The receiver can:
1) Allocate more buffer space for a particular stream vs another stream
2) Increase the per-connection flow control window, and, importantly:
3) Estimate the amount of additional bytes-on-the-wire necessary to meet the BDP.

The receiver can't know that something is flow controlled unless the sender informs it. 

The intent is:
To be able to do a decent estimate of BDP when necessary.
To adjust the per-connection window size appropriately so as to ensure that the TCP connection is the bottleneck, while using minimal, finite buffer resources on the server side.
To understand the differing demand for bandwidth of the various streams so that appropriate decisions can be made as to *egress* buffer management at proxies.
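To make the third point concrete, here is a hypothetical sketch of how a receiver might turn a blocked-bytes report into a connection window adjustment (the demand arithmetic and every name here are assumptions, not part of the proposal):

  # Hypothetical receiver-side reaction to an "I'm blocked" report.
  def on_blocked_report(conn, blocked_bytes, time_blocked_secs):
      # Bytes the sender wanted to send but couldn't, per unit time, gives
      # a crude estimate of unmet demand; scale by RTT for a BDP-style target.
      demand = blocked_bytes / max(time_blocked_secs, 0.001)
      target = min(int(demand * conn.rtt_estimate_secs), conn.buffer_space_left)
      if target > conn.window:
          send_connection_window_update(conn, target - conn.window)
          conn.window = target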

-=R

Jim Roskind

unread,
May 30, 2012, 3:57:30 PM5/30/12
to fe...@google.com, spdy...@googlegroups.com
On Wed, May 30, 2012 at 10:43 AM, Roberto Peon <fe...@google.com> wrote:

...

I like Roberto's proposal.

I see it as being helpful in a certain class of cases, involving upload from client to server (or client to proxy).

In that case, the client may be immediately aware of the size of a stream's complete byte count (a.k.a. file size). It seems that it can only be helpful to warn a server about an impending upload. Advance and precise info beats the heck out of delayed info ("now the buffer is full") and imprecise info ("we have no idea how bad this is going to get").

I like the fact that server kernels can evolve more quickly with regard to transports such as TCP. As a result, it is plausible that a server will be able to nicely track the current BDP in TCP and make "intelligent" proposals and updates to the overall SPDY connection window.

I like the fact that it introduces a (potentially) low-latency control message that could be inserted "just in time" into the client-to-server connection, potentially long before a window is full or overflowing. For instance, such a message could be sent at the start of a giant upload. I like that this control message is not counted against the connection or stream window (not stated, but apparent from offline discussion).

I don't think this solves world hunger... but I do think it helps in a real and common problem case: head-of-line blocking induced by a large upload hogging a whole connection.

Jim

Costin Manolache

unread,
May 30, 2012, 4:10:26 PM5/30/12
to spdy...@googlegroups.com
My point is that the proposals I've seen are not _duplicating_ - just pick a subset. 

And I don't agree it 'requires infinite buffering' - it requires the same amount of buffering that HTTP would require, all other knobs being equal. 
They are not working properly - you may have higher-priority streams that can't get through, and 'bad' streams preventing good streams. Very similar to bufferbloat, and for similar reasons.

In contrast, plain HTTP would work just fine in this situation - the bad streams will hit TCP flow control, and the new streams will keep working.

 
If the proxy has space in its buffers, it indicates such by increasing the per-connection window size.
If the proxy is blockage for a particular stream, it doesn't update the window size for that stream, but otherwise the per-stream window size should be the per-connection window size.

Yes, but that stream still uses bytes (equal to its last window). Enough of those streams and the connection is useless.

It's replicating the bufferbloat problem almost identically, and all this just to make SPDY memory use lower than the equivalent HTTP memory use on the proxy.


Costin

Mike Belshe

unread,
May 30, 2012, 6:04:39 PM5/30/12
to spdy...@googlegroups.com
This thread has gotten confusing, maybe someone can make a concrete proposal as to what we're talking about at this point?

My current summary:

I liked Greg's summary of the goals of flow control.

I believe that we have a tradeoff of full-pipe performance vs buffering which the current SPDY spec makes.

I think Roberto is proposing something, but I'm not sure what it is.  I am very much against more complexity, because I believe the entire problem is 100% contrived and unreal - simply not worth the complexity.  The protocol can already deal with an over-buffer situation even without *any* flow control - just kill the stream.  (See below for justification).  I'd rather remove all flow control from SPDY than add more complexity.

Justification:
a) The client isn't going to throttle the downlink - it wants data as fast as it can get it.
b) The server doesn't get huge uploads very often; so there isn't much to throttle here anyway.
c) DoS is an exception that can be detected and dealt with, and flow control doesn't solve it anyway.
d) If your backend server is down causing backlogs in your proxy, you can write code to deal with that (e.g. failover) or nuke the stream.  Why expose it out to the whole protocol?

Mike

Costin Manolache

unread,
May 30, 2012, 6:24:08 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 3:04 PM, Mike Belshe <mbe...@chromium.org> wrote:
This thread has gotten confusing, maybe someone can make a concrete proposal as to what we're talking about at this point?

My current summary:

I liked Greg's summary of the goals of flow control.

I believe that we have a tradeoff of full-pipe performance vs buffering which the current SPDY spec makes.

I think Roberto is proposing something, but I'm not sure what it is.  I am very much against more complexity, because I believe the entire problem is 100% contrived and unreal - simply not worth the complexity.  The protocol can already deal with an over-buffer situation even without *any* flow control - just kill the stream.  (See below for justification).  I'd rather remove all flow control from SPDY than add more complexity.

+1 (on removing all flow control and killing streams that misbehave).


Costin

Alek Storm

unread,
May 30, 2012, 7:16:00 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 5:04 PM, Mike Belshe <mbe...@chromium.org> wrote:
I believe that we have a tradeoff of full-pipe performance vs buffering which the current SPDY spec makes.

I think Roberto is proposing something, but I'm not sure what it is.  I am very much against more complexity, because I believe the entire problem is 100% contrived and unreal - simply not worth the complexity.  The protocol can already deal with an over-buffer situation even without *any* flow control - just kill the stream.  (See below for justification).  I'd rather remove all flow control from SPDY than add more complexity.

So a stream can be killed halfway through transmitting a large amount of data? How will the sender know when it is safe to retransmit without the stream getting killed again? And they'll have to retransmit the entire block from the beginning - everything that was sent in the killed stream is lost. That sounds remarkably inefficient.

Justification:
a) The client isn't going to throttle the downlink - it wants data as fast as it can get it.

Absolutely not; see Patrick's examples (https://groups.google.com/d/msg/spdy-dev/JB_aQPNI7rw/-Hnjp94xjG4J) near the beginning of this thread. A lack of per-stream flow control would become even more of a problem as web applications become more media-rich. I believe clients would begin to compensate by opening multiple TCP connections - not for a greater share of the server's bandwidth, but for stream-specific flow control.

d) If your backend server is down causing backlogs in your proxy, you can write code to deal with that (e.g. failover) or nuke the stream.  Why expose it out to the whole protocol?

That doesn't help forward proxies, which don't have control over upstream servers (see Ryan's excellent explanation at https://groups.google.com/d/msg/spdy-dev/JB_aQPNI7rw/FGYat2VU22IJ). And I'm sure clients would not be pleased with repeatedly having their upload streams killed and restarting the upload from scratch because the proxy has no way to notify them of the size of its input buffers. With multiple downstream clients doing this at once, the proxy could become overwhelmed and unintentionally DoS'd.

Alek

William Chan (陈智昌)

unread,
May 30, 2012, 7:26:03 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 3:04 PM, Mike Belshe <mbe...@chromium.org> wrote:
This thread has gotten confusing, maybe someone can make a concrete proposal as to what we're talking about at this point?

I propose per-stream flow control windows + per-session flow control windows where the windows are very healthily sized so we don't hit them for 90% of streams (most of them are short-lived after all).


My current summary:

I liked Greg's summary of the goals of flow control.

I believe that we have a tradeoff of full-pipe performance vs buffering which the current SPDY spec makes.

I think Roberto is proposing something, but I'm not sure what it is.  I am very much against more complexity, because I believe the entire problem is 100% contrived and unreal - simply not worth the complexity.  The protocol can already deal with an over-buffer situation even without *any* flow control - just kill the stream.  (See below for justification).  I'd rather remove all flow control from SPDY than add more complexity.

Justification:
a) The client isn't going to throttle the downlink - it wants data as fast as it can get it.

Agreed.
 
b) The server doesn't get huge uploads very often; so there isn't much to throttle here anyway.

Most servers will not, very true, but some will get them very often. For example, a system that heavily uses the Google Drive web app (let's say ChromeOS) will send a lot of large uploads to the Google Drive server.

Also, don't get too focused on the upload direction. The download direction also matters. For example, a forward proxy speaking SPDY to a SPDY origin server. The proxy wants to keep reading from the SPDY origin server, but then it might get infinite buffering or HoL blocking due to different client download bandwidths. Imagine if one of the clients is mobile! Is the solution really to always nuke the proxy<=>origin server stream that is servicing the mobile client?

Jim Roskind

unread,
May 30, 2012, 7:54:51 PM5/30/12
to spdy...@googlegroups.com
On Wed, May 30, 2012 at 3:24 PM, Costin Manolache <cos...@gmail.com> wrote:


On Wed, May 30, 2012 at 3:04 PM, Mike Belshe <mbe...@chromium.org> wrote:
This thread has gotten confusing, maybe someone can make a concrete proposal as to what we're talking about at this point?

My current summary:

I liked Greg's summary of the goals of flow control.

I believe that we have a tradeoff of full-pipe performance vs buffering which the current SPDY spec makes.

I think Roberto is proposing something, but I'm not sure what it is.  I am very much against more complexity, because I believe the entire problem is 100% contrived and unreal - simply not worth the complexity.  The protocol can already deal with an over-buffer situation even without *any* flow control - just kill the stream.  (See below for justification).  I'd rather remove all flow control from SPDY than add more complexity.

+1 ( on removing all flow control and killing streams that miss-behave ).

The big problem IMO is that large uploads are indeed common. Given how TCP will suck up a giant buffer (by deliberately bloating buffers along the route), SPDY connections can and do become so bloated that latency is soon untenable.  If you kill such a misbehaving(??) "hog" stream, you will have castrated the protocol in several ways.

Do we really want to tell an application to open a separate (non-SPDY?) connection for a big upload?

IMO, we should look at this problem, understand it, and try to help with it (or prove it is impossible to help).

Jim