More proposed changes for SPDY/4

Roberto Peon

Aug 9, 2012, 2:46:25 PM

to spdy...@googlegroups.com

Once everything is worked out, the intention is to merge the stuff in this repo:

https://github.com/grmocg/SPDY-Specification/tree/gh-pages

into the authoritative repo (which my repo is not).

There are a lot of changes to the spec here, and I suspect we'll need to talk about some of them. In particular, I've shortened the 'Length' field for all frames to 16 bits. No one should be sending frames much larger than a 16-bit length allows (roughly 64 KB), because the mux/prioritization features then work very poorly.
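As a rough illustration of what a 16-bit 'Length' implies for senders, here is a minimal Python sketch that cuts a large payload into frame-sized pieces. It assumes that Length counts payload bytes only, which the message above does not state; the names MAX_FRAME_PAYLOAD and split_into_frames are made up for this example.

```python
# Largest value a 16-bit Length field can hold.
# Assumption: Length counts payload bytes only, not the frame header.
MAX_FRAME_PAYLOAD = 0xFFFF


def split_into_frames(payload, max_len=MAX_FRAME_PAYLOAD):
    """Yield consecutive slices of payload, each at most max_len bytes long."""
    if not 0 < max_len <= MAX_FRAME_PAYLOAD:
        raise ValueError("max_len must fit in a 16-bit Length field")
    for offset in range(0, len(payload), max_len):
        yield payload[offset:offset + max_len]


if __name__ == "__main__":
    body = b"x" * 150000
    print([len(chunk) for chunk in split_into_frames(body)])
```

A sender that wants finer-grained interleaving between streams could pass a smaller max_len; the 16-bit field only sets the upper bound.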

I've added a PUSH_PROMISE frame. It does need significant cleanup of the prose, but the basics should be there. The essential change here is that a server push starts with a PUSH_PROMISE, which always inherits headers from the request, and then overrides only those parts which are different by sending those overrides (and only those overrides). The PUSH_PROMISE frame allocates stream-ids for the push streams according to the regular (monotonically increasing) rules, however it doesn't count towards the MAX_STREAM limit. When the push streams are actually created (by sending a SYN_REPLY), then the stream-ids are in actual use and DO count against the MAX_STREAM limit.
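The header inheritance described above can be pictured as a dictionary merge. This is only a sketch under stated assumptions: the function name promised_request_headers and the example header names are illustrative, and how a PUSH_PROMISE would express removing an inherited header is not covered by the text above, so it is not modelled here.

```python
def promised_request_headers(request_headers, overrides):
    """Return the effective headers of a promised stream.

    Start from the headers of the request that triggered the push and
    apply only the values the PUSH_PROMISE explicitly overrides.
    """
    merged = dict(request_headers)  # copy, so the original request is untouched
    merged.update(overrides)        # overrides win over inherited values
    return merged


if __name__ == "__main__":
    # Illustrative header names and values, not taken from the draft.
    request = {":host": "example.com", ":path": "/index.html", "user-agent": "demo"}
    pushed = promised_request_headers(request, {":path": "/style.css"})
    print(pushed)
```

The point of the design is that the PUSH_PROMISE only has to carry the differences (here a single path), not a full copy of the request headers.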

The headers section hasn't changed much because I'm still actively playing with it. The most up-to-date stuff for the headers is found in the headers_sample.py file.

headers_sample.py now parses .har files, which makes it easy to see how compression would look on real transactions.

Comments, please!

-=R

Ilya Grigorik

Aug 9, 2012, 9:43:04 PM

to spdy...@googlegroups.com

On Thu, Aug 9, 2012 at 11:46 AM, Roberto Peon <fe...@google.com> wrote:

I've added a PUSH_PROMISE frame. It does need significant cleanup of the prose, but the basics should be there. The essential change here is that a server push starts with a PUSH_PROMISE, which always inherits headers from the request, and then overrides only those parts which are different by sending those overrides (and only those overrides). The PUSH_PROMISE frame allocates stream-ids for the push streams according to the regular (monotonically increasing) rules, however it doesn't count towards the MAX_STREAM limit. When the push streams are actually created (by sending a SYN_REPLY), then the stream-ids are in actual use and DO count against the MAX_STREAM limit.

If the client wants to decline a promise for a specific asset (for whatever reason), I assume RST_STREAM is the appropriate mechanism? Or a whole lot of RST_STREAMS if we need to decline multiple assets? Does it make sense to have a bulk RST_STREAM? Heh.. :)

ig

Roberto Peon

Aug 9, 2012, 10:31:28 PM

to spdy...@googlegroups.com

On Aug 9, 2012 6:43 PM, "Ilya Grigorik" <igri...@gmail.com> wrote:
>
>
>
> On Thu, Aug 9, 2012 at 11:46 AM, Roberto Peon <fe...@google.com> wrote:
>>
>> I've added a PUSH_PROMISE frame. It does need significant cleanup of the prose, but the basics should be there. The essential change here is that a server push starts with a PUSH_PROMISE, which always inherits headers from the request, and then overrides only those parts which are different by sending those overrides (and only those overrides). The PUSH_PROMISE frame allocates stream-ids for the push streams according to the regular (monotonically increasing) rules, however it doesn't count towards the MAX_STREAM limit. When the push streams are actually created (by sending a SYN_REPLY), then the stream-ids are in actual use and DO count against the MAX_STREAM limit.
>
>
> If the client wants to decline a promise for a specific asset (for whatever reason), I assume RST_STREAM is the appropriate mechanism?

Correct.

> Or a whole lot of RST_STREAMS if we need to decline multiple assets? Does it make sense to have a bulk RST_STREAM? Heh.. :)

It may... we've thought about it in the past, but never had a strong need for it. This may be the proverbial straw for this particular camel's back though.

-=R
>
> ig
>

Simone Bordet

Aug 10, 2012, 5:37:40 AM

to spdy...@googlegroups.com

Hi,

On Thu, Aug 9, 2012 at 8:46 PM, Roberto Peon <fe...@google.com> wrote:
> Once everything is worked out, the intention is to merge the stuff in this
> repo:
> https://github.com/grmocg/SPDY-Specification/tree/gh-pages
> into the authoritative repo (which my repo is not).

Is there an HTML format?

> There are a lot of changes to the spec here. Some of the changes I suspect
> we'll need to talk about. In particular, I've shortened the 'Length' field
> for all frames to 16 bits. Noone should be sending frames much larger than
> 16 bits worth (because then the mux/prioritization features work very
> poorly).
>
> I've added a PUSH_PROMISE frame. It does need significant cleanup of the
> prose, but the basics should be there. The essential change here is that a
> server push starts with a PUSH_PROMISE, which always inherits headers from
> the request, and then overrides only those parts which are different by
> sending those overrides (and only those overrides). The PUSH_PROMISE frame
> allocates stream-ids for the push streams according to the regular
> (monotonically increasing) rules, however it doesn't count towards the
> MAX_STREAM limit. When the push streams are actually created (by sending a
> SYN_REPLY), then the stream-ids are in actual use and DO count against the
> MAX_STREAM limit.

Sorry, but I am lost on this change.

In v3 a server that wants to push sends a SYN_STREAM to the client,
with the URI of the resource to be pushed, and headers similar to a
HTTP response (status code).

In v4 I understand that the server has to send a PUSH_PROMISE, and
then a SYN_REPLY? Given the strong aversion of this group to
additional round trips, I find this change really strange.

I can't see benefits either: in v3 canceling a push was made by
resetting the stream; same will be in v4; there is a non-clear concept
of "when the server is ready to push the resource, then it sends a
SYN_REPLY", which smells too much of implementation details slipped
into the specification.
The user agent can always detect what request triggered the push,
because the pushed SYN_STREAM has the associated stream id.

All in all, I can't find any reason for this change, but it's evident
that I am missing something.

Can you please expand on the use cases that caused this change, and
report the data that measured that sending 2 control frames instead of
1 for pushed resources is actually better ?

Thanks,

Simon
--
http://cometd.org
http://webtide.com
Developer advice, services and support
from the Jetty & CometD experts.
----
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Roberto Peon

Aug 10, 2012, 12:03:47 PM

to spdy...@googlegroups.com

Push in v3 sends a SYN_STREAM with the required request headers. At some later time it will send a HEADERS frame which carries the remaining request headers. After all request headers are sent, it sends a SYN_REPLY with the response headers.

The v4 behavior is similar, but uses PUSH_PROMISE instead of SYN_STREAM. This allows us to remove 4 bytes from all SYN_STREAM frames, and it hopefully makes it clear that the PUSH frame doesn't count against the same concurrency limit that the SYN frame does. Anyway, that is the intent.

>
> In v4 I understand that the server has to send a PUSH_PROMISE, and
> then a SYN_REPLY ? Given the strong adversion of this group to
> additional round trips, I find this change really strange.

You don't wait for a syn reply-- the server sends both. You want to have these be separated (while under control of the server) so the sending of the main resource isn't delayed.

>
> I can't see benefits either: in v3 canceling a push was made by
> resetting the stream; same will be in v4; there is a non-clear concept
> of "when the server is ready to push the resource, then it sends a
> SYN_REPLY", which smells too much of implementation details slipped
> into the specification.
> The user agent can always detect what request triggered the push,
> because the pushed SYN_STREAM has the associated stream id.

>
> All in all, I can't find any reason for this change, but it's evident
> that I am missing something.
>
> Can you please expand on the use cases that caused this change, and
> report the data that measured that sending 2 control frames instead of
> 1 for pushed resources is actually better ?

See above :)
-=R

Simone Bordet

Aug 10, 2012, 1:23:45 PM

to spdy...@googlegroups.com

Hi,

On Fri, Aug 10, 2012 at 6:03 PM, Roberto Peon <fe...@google.com> wrote:
> Push in v3 sends a syn_stream with the required request headers. Ay some
> later time it will send a headers frame which sends the remaining request
> headers.

Isn't this an implementation detail?
Jetty does not send HEADERS frames, for example: we just send the
pushed SYN_STREAM.

> After all request headers are sent, it sends a SYN_REPLY with the
> response headers.

The SYN_REPLY for the original request may be sent before the pushed
SYN_STREAMs, or concurrently, or after.
I don't see the v3 specification mandate that it must be after the
pushed SYN_STREAMs.

> The v4 behavior is similar, but uses PUSH_PROMISE instead of SYN_STREAM.
> This allows us to remove 4 bytes from all SYN_STREAM frames, and it
> hopefully makes it clear that the PUSH frame doesn't count against the same
> concurrency limit that the SYN frame does. Anyway, that is the intent.

To limit concurrency only for regular SYN_STREAMs, wouldn't it be enough to
say that unidirectional streams do not count towards the limit?

I still don't see the reasons for this change.

Roberto Peon

Aug 10, 2012, 1:37:01 PM

to spdy...@googlegroups.com

On Fri, Aug 10, 2012 at 10:23 AM, Simone Bordet <sbo...@intalio.com> wrote:

Hi,

On Fri, Aug 10, 2012 at 6:03 PM, Roberto Peon <fe...@google.com> wrote:
> Push in v3 sends a syn_stream with the required request headers. Ay some
> later time it will send a headers frame which sends the remaining request
> headers.

Is not this an implementation detail ?
Jetty does not send HEADERS frames, for example: we just send the
pushed SYN_FRAME.

Yes, if you've sent all the headers in the SYN_STREAM, you need not use a HEADERS frame.

> After all request headers are sent, it sends a SYN_REPLY with the
> response headers.

The SYN_REPLY for the original request may be sent before the pushed
SYN_STREAMs, or concurrently, or after.
I don't see the v3 specification mandate that it must be after the
pushed SYN_STREAMs.

There is no mandate that you do so. The server can do as it wishes. If it does not wait, however, it induces latency in the primary resource, which generally increases both page-load-time and above-the-fold display time.

Basically, doing so is inadvisable unless the pipe is empty.

> The v4 behavior is similar, but uses PUSH_PROMISE instead of SYN_STREAM.
> This allows us to remove 4 bytes from all SYN_STREAM frames, and it
> hopefully makes it clear that the PUSH frame doesn't count against the same
> concurrency limit that the SYN frame does. Anyway, that is the intent.

To just limit concurrency on SYN_FRAMEs, won't be enough to say that
if they are unidirectional then they do not sum up ?

Yes, we can do everything we wish with just the SYN_STREAM, but it adds bytes to most SYN_STREAM frames. We do more SYN_STREAMS than pushes today, and eliminating the field makes sense. We have a large address space for opcodes and should be unafraid to use it when it does present advantages (as it does here).

I still miss the reasons for this change ?

The reasons.

1) make common operations (i.e. SYN_STREAM) more efficient.

2) differentiate SYN_STREAM and PUSH more easily (i.e. at the opcode level, where it belongs). I feel that this would increase the chance that a new implementation would deal properly with the concurrency limits, especially in the case where it does not adhere perfectly to the spec.

3) if we discover PUSH is non-essential, all we must do is deprecate that frame-- there will be no waste elsewhere in the protocol
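To make the concurrency point in (2) concrete, here is a small Python sketch of the accounting described earlier in the thread: a PUSH_PROMISE reserves a stream-id without counting against the MAX_STREAM limit, and the stream only counts once its SYN_REPLY is sent. The class and method names are hypothetical, the use of even ids for server-initiated streams is an assumption carried over from earlier SPDY versions, and the limit is modelled as applying only to streams opened through this object.

```python
class PushAccounting:
    """Toy model of stream-id allocation versus the MAX_STREAM limit."""

    def __init__(self, max_streams):
        self.max_streams = max_streams
        self._next_id = 2       # assumption: server-initiated ids are even
        self.promised = set()   # ids reserved by PUSH_PROMISE, not yet counted
        self.active = set()     # ids in actual use, counted against the limit

    def push_promise(self):
        """Reserve the next stream-id; this never fails due to the limit."""
        stream_id = self._next_id
        self._next_id += 2      # ids increase monotonically
        self.promised.add(stream_id)
        return stream_id

    def syn_reply(self, stream_id):
        """Activate a promised stream; return False if the limit is reached."""
        if stream_id not in self.promised:
            raise KeyError("stream %d was never promised" % stream_id)
        if len(self.active) >= self.max_streams:
            return False        # caller should retry once a stream closes
        self.promised.remove(stream_id)
        self.active.add(stream_id)
        return True

    def close(self, stream_id):
        """Forget a stream, e.g. after it ends or the client sends RST_STREAM."""
        self.promised.discard(stream_id)
        self.active.discard(stream_id)
```

With a limit of 1, two promises can be outstanding at once, but the second SYN_REPLY has to wait until the first pushed stream is closed.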

Did you check out the other changes, btw?

-=R

Simone Bordet

Aug 16, 2012, 8:41:38 AM

to spdy...@googlegroups.com

Hi,

On Fri, Aug 10, 2012 at 7:37 PM, Roberto Peon <fe...@google.com> wrote:
> The reasons.
> 1) make common operations (i.e. SYN_STREAM) more efficient.
> 2) differentiate SYN_STREAM and PUSH more easily (i.e. at the opcode level,
> where it belongs). I feel that this would increase the chance that a new
> implementation would deal properly with the concurrency limits, especially
> in the case where it does not adhere perfectly to the spec.
> 3) if we discover PUSH is non-essential, all we must do is deprecate that
> frame-- there will be no waste elsewhere in the protocol

Ok so I misunderstood your first email, sorry.

You are proposing to add a new frame PUSH_PROMISE instead of using
SYN_STREAM when doing pushes. This will remove the need for having an
associated stream id in the non-pushed SYN_STREAMS.
If I now understand correctly, I am all for that :)

> Did you check out the other changes, btw?

Nope, to be honest... I could only find the XML (no HTML format ?),
and I think it would be immensely helpful to have a section that
explains the changes between v3 and v4 instead of relying on diff
tools.
If it's already there and I missed it in the XML brackets, I apologize
in advance :)

Roberto Peon

Aug 16, 2012, 1:36:14 PM

to spdy...@googlegroups.com

On Thu, Aug 16, 2012 at 5:41 AM, Simone Bordet <sbo...@intalio.com> wrote:

Hi,

On Fri, Aug 10, 2012 at 7:37 PM, Roberto Peon <fe...@google.com> wrote:
> The reasons.
> 1) make common operations (i.e. SYN_STREAM) more efficient.
> 2) differentiate SYN_STREAM and PUSH more easily (i.e. at the opcode level,
> where it belongs). I feel that this would increase the chance that a new
> implementation would deal properly with the concurrency limits, especially
> in the case where it does not adhere perfectly to the spec.
> 3) if we discover PUSH is non-essential, all we must do is deprecate that
> frame-- there will be no waste elsewhere in the protocol

Ok so I misunderstood your first email, sorry.

You are proposing to add a new frame PUSH_PROMISE instead of using
SYN_STREAM when doing pushes. This will remove the need for having an
associated stream id in the non-pushed SYN_STREAMS.
If I now understand correctly, I am all for that :)

Yup! Awesome!

> Did you check out the other changes, btw?

Nope, to be honest... I could only find the XML (no HTML format ?),
and I think it would be immensely helpful to have a section that
explains the changes between v3 and v4 instead of relying on diff
tools.
If it's already there and I missed it in the XML brackets, I apologize
in advance :)

No problem-- I'll regenerate the HTML for easy consumption! If you don't see an updated HTML in the future, just ping me. Regenerating it is quite easy-- I just forget sometimes as I concentrate on making the changes make sense :/

The section for differences does exist and is updated. Actually it is where I have my TODO list (the states of the various changes are DONE, ONGOING, TODO), so hopefully I won't be able to forget anything... The section is titled "Incompatibilities with SPDY draft #3"

I'll cut&paste it here.

Here is a list of the major changes between draft #3 and this draft

DONE: Different, more precise, notation style used to describe all frames.

DONE: Downsizing various fields in all messages... this will definitely need debate

DONE: Removal of Version field from all messages.

DONE: Reordered fields in all messages. All messages now share a common header of: length, flags, control-bit, 31-bit-payload. All control frames include an 8th byte, which is the opcode.

DONE: Addition of end-of-message delimiter flag in data frames

DONE: Modification of server push; added a PUSH_PROMISE frame and removed the 'associated-stream-id' field from the SYN_STREAM frame

ONGOING: Significant modifications to how headers are transported. This involved changes to all frames incorporating HEADER blocks, and changes to the HEADER blocks themselves.

ONGOING: Different header compression technique which uses less CPU for proxies and which should result in higher compression

TODO: Addition of end-of-header-section delimiter flag in any frame which has a header block

ONGOING: Definition of cert-data push

ONGOING: Definition of name-resolution push

TODO: Modification of flow-control; allow two-levels of flow control so as to allow greater stream concurrency safely

TODO: Add the 'blocked-on-flow-control' notification. Experience has shown that limits are too easy to get wrong, and this helps to self-correct this problem

TODO: Modification of flow-control; headers-blocks (thus syn-stream) gets its own pool of memory, separate from data frames

TODO: Everything after the first header-block-section possibly treated as flow-control

TODO: Session-error status code added for UNRECOGNIZED_SCHEME for new streams. This triggers when the recipient doesn't know how to handle a stream of that type.

The thing I'd most like debated right now is the fact that the order and size of the fields is completely different.

A 32 bit field-length (as many had pointed out before) was a waste of space. I was also seeing some people get confused with what the 'length' field included, and the code to handle it was unnecessarily messy.

The new ordering makes a common 7-byte read for *all* frames, with the separation of the fields being the same for all frames.

-=R

Ilya Grigorik

Aug 16, 2012, 7:05:24 PM

to spdy...@googlegroups.com

On Thu, Aug 16, 2012 at 5:41 AM, Simone Bordet <sbo...@intalio.com> wrote:

> Did you check out the other changes, btw?

Nope, to be honest... I could only find the XML (no HTML format ?),
and I think it would be immensely helpful to have a section that
explains the changes between v3 and v4 instead of relying on diff
tools.
If it's already there and I missed it in the XML brackets, I apologize
in advance :)

Github pages URLs are no fun... This should point to latest build in Roberto's repo:

http://grmocg.github.com/SPDY-Specification/draft-mbelshe-httpbis-spdy-00.html

ig

Tatsuhiro Tsujikawa

Aug 17, 2012, 8:21:06 AM

to spdy...@googlegroups.com

Hi,

Putting Num-of-Entries-or-Stream-ID-or-ID in the common header seems strange to me.
Some frames share a stream ID, but for other frames the values are completely
independent.
Why do we need to put num-of-entries and stream ID in the same place?
I think removing this field from the common header and putting it in the
data section makes more sense.
So the proposed common header would look like this:

    0        1        2        3         4..N
+--------+--------+--------+-|--------+========+
|   Length(16)    |Flags(8)|1| Type(8)|Data ->
+--------+--------+--------+-|--------+========+

Best regards,

Tatsuhiro Tsujikawa

Jeroen

Sep 13, 2012, 12:08:29 PM

to spdy...@googlegroups.com

I'm concerned about the amount of memory required for the zlib decompression.

The compressor (e.g. browser) decides the window size, obviously the bigger the window the better the compression. However the decompressor (e.g. server) needs to keep the whole window in memory to be able to decompress. If all browsers blindly choose a 32k window, a high-end server will need a large amount of memory just for this.
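A back-of-the-envelope sketch in Python may help put numbers on this. It counts only the sliding window itself (2**wbits bytes per decompression context) and ignores the rest of zlib's inflate state, so the figures are a lower bound; the helper names are made up for the example, and the connection count is an arbitrary illustration rather than a measurement.

```python
import zlib


def inflate_window_bytes(wbits):
    """Size of the sliding window a decompressor must retain, in bytes."""
    return 1 << wbits


def total_window_memory(connections, wbits=zlib.MAX_WBITS):
    """Lower bound on window memory with one context per connection."""
    return connections * inflate_window_bytes(wbits)


def roundtrip(data, wbits):
    """Compress and decompress with the same, possibly reduced, window."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, wbits)
    wire = compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)
    return zlib.decompressobj(wbits).decompress(wire)


if __name__ == "__main__":
    for wbits in (9, 12, zlib.MAX_WBITS):
        gib = total_window_memory(100000, wbits) / 2.0 ** 30
        print("wbits=%2d: %.2f GiB for 100000 connections" % (wbits, gib))
```

The roundtrip helper shows that zlib already lets the compressor choose a smaller window; what is missing is a way for the receiver to tell the sender which size it is willing to accept.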

We should maybe add a SETTINGS value MAX_COMPRESSION_WINDOW_SIZE?

There is a possible race condition (the server sends this setting, but the client has already sent a SYN_STREAM), which could be addressed with an extra flag on the SYN_STREAM frame, COMPRESSION_RESET, which indicates that the decompressor should create a new compression context (based on just the initialization dictionary).

Any thoughts?
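A receiver honouring such a flag might look roughly like the Python sketch below. Everything specific in it is hypothetical: the flag bit value, the stand-in dictionary contents, and the class name are invented for illustration and do not come from any SPDY draft.

```python
import zlib

# Hypothetical values, chosen only for this sketch.
FLAG_COMPRESSION_RESET = 0x04
INIT_DICT = b"hostacceptuser-agentcontent-typeGETPOST200"


def new_compressor(wbits=zlib.MAX_WBITS):
    """Sender side: a fresh context primed with the shared dictionary."""
    return zlib.compressobj(6, zlib.DEFLATED, wbits, zlib.DEF_MEM_LEVEL,
                            zlib.Z_DEFAULT_STRATEGY, INIT_DICT)


class HeaderBlockReceiver:
    """Receiver side: one long-lived context that can be reset on request."""

    def __init__(self, wbits=zlib.MAX_WBITS):
        self._wbits = wbits
        self._inflater = zlib.decompressobj(wbits, INIT_DICT)

    def on_header_block(self, flags, compressed):
        if flags & FLAG_COMPRESSION_RESET:
            # Drop all history and start again from the dictionary alone.
            self._inflater = zlib.decompressobj(self._wbits, INIT_DICT)
        return self._inflater.decompress(compressed)
```

In this model the sender sets the flag on the first header block it emits after switching to a new compressor (for example, one created with a smaller window once the new setting arrives), so both sides discard their history at the same point in the stream.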

Jeroen

Roberto Peon

Sep 13, 2012, 1:52:27 PM

to spdy...@googlegroups.com

Appropriate limits on the compressor are part of the plan for SPDY/4 compression, which, as I'll be proposing shortly, does not use gzip.

-=R

Hervé Ruellan

Sep 21, 2012, 4:36:12 AM

to spdy...@googlegroups.com

Hi,

On Thursday, August 9, 2012 8:46:25 PM UTC+2, Roberto Peon wrote:

The headers section hasn't changed much because I'm still actively playing with it. The most up-to-date stuff for the headers is found in the headers_sample.py file.
headers_sample.py now parses .har files, which makes it easy to see how compression would look on real transactions.

I'm interested in new strategies for efficient header encoding, not relying solely on deflate compression. Therefore I downloaded your code and gave it a try on some sample .har files I have.

The results are very interesting, enabling very good compression ratios without relying on deflate (I disabled the deflate step to check this).

I think I uncovered a bug with one of my test samples, which contains many requests. From my understanding of the code, I think it is due to a difference of execution order for the opcodes during encoding and decoding.
Upon encoding, the opcodes are executed when they are generated. However, afterwards the "toggle" opcodes are sorted and possibly grouped as "trang" opcodes. The list of opcodes received by the decoder is this sorted list, and therefore the execution order on the decoder side may be different. For decoding the list of headers, this has no impact.
However, upon execution, the opcodes are also used to update the list of most recently used indexes. Therefore, a different execution order may result in differences in that list. This can be the source of problems when the maximum number of entries in the index table is reached and some indexed values are dropped from the table: the indexed value dropped will not be the same on the encoder side and the decoder side.
Indeed, this is what happens for my example: the decoder tries to retrieve an indexed value it has dropped from the indexing table, resulting in a KeyError.
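The effect can be reproduced with a deliberately simplified model. The Python sketch below is not the code from headers_sample.py; it is a toy LRU table with hypothetical names (LruIndex, replay) that executes the same opcodes once in generation order and once with the "toggle" opcodes sorted, and then compares which entries survive eviction.

```python
from collections import OrderedDict


class LruIndex:
    """Toy index table that evicts the least recently used entry when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()   # oldest entry first

    def toggle(self, key):
        self.entries.move_to_end(key)  # referencing an entry marks it as recent

    def add(self, key, value):
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # drop the least recently used
        self.entries[key] = value


def replay(ops, max_entries=3):
    """Run ops against a table pre-filled with keys 1..3; return surviving keys."""
    index = LruIndex(max_entries)
    for key in (1, 2, 3):
        index.add(key, "value-%d" % key)
    for op, key in ops:
        if op == "toggle":
            index.toggle(key)
        else:
            index.add(key, "value-%d" % key)
    return set(index.entries)


# Order in which the encoder generated (and executed) the opcodes.
encoder_ops = [("toggle", 3), ("toggle", 1), ("add", 4), ("add", 5)]
# Order seen by the decoder once the toggles have been sorted.
decoder_ops = [("toggle", 1), ("toggle", 3), ("add", 4), ("add", 5)]

if __name__ == "__main__":
    print("encoder keeps:", sorted(replay(encoder_ops)))
    print("decoder keeps:", sorted(replay(decoder_ops)))
```

In this model the encoder still holds entry 1 while the decoder has already evicted it, so a later reference to entry 1 would fail on the decoder side, which matches the KeyError described above.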

The bug can be replayed by running "headers_sample.py" on the attached .har. The "MaxEntriesInTable" at line 385 should be changed to 40.

Best Regards,

Hervé Ruellan