Header encoding


Mark Nottingham

Nov 17, 2009, 7:16:56 PM
to spdy...@googlegroups.com
Why is the number of headers important to convey in the header block? The control blocks are already length-delimited.

Also, disallowing multiple headers of the same name and using int16 as the maximum size for a header value profiles HTTP; although it's not common, some Set-Cookie and Cookie headers do get larger than 32K.

At the least, it should be unsigned short instead of int16; however, for better interoperability with the current Web, I'd allow multiple instances of a header (some private applications of HTTP do shove a *lot* of data around in headers, and I think this will become more common over time; it would be a shame to see "FooHeader1", "FooHeader2" and "FooHeader3" used as a workaround for this, no matter how much we dislike big headers on the open Internet).
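
To make the size limits concrete, here is a small Python sketch of what a signed versus an unsigned 16-bit length field can express. The `struct` format codes stand in for the wire encoding; big-endian byte order and the 40,000-byte cookie are assumptions for illustration, not something taken from the draft.

```python
import struct

# Largest values a 16-bit length field can carry.
INT16_MAX = 2**15 - 1    # 32767 (signed, "int16")
UINT16_MAX = 2**16 - 1   # 65535 (unsigned short)

def fits(length: int, fmt: str) -> bool:
    """Return True if `length` can be packed with the given struct format."""
    try:
        struct.pack(fmt, length)
        return True
    except struct.error:
        return False

cookie_len = 40_000  # hypothetical oversized Cookie header value
print(fits(cookie_len, ">h"))  # signed 16-bit length field
print(fits(cookie_len, ">H"))  # unsigned 16-bit length field
print(fits(70_000, ">H"))      # beyond what either 16-bit field can hold
```

Going unsigned only doubles the ceiling; a value above 65,535 bytes still needs either a wider length field or repeated headers.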

Finally, there's a TODO in the header section to "Specify a string encoding." By necessity, this needs to be compatible with HTTP, which suggests but does not require one; i.e. a new header may come along and define the use of UTF-8, for example. See: <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/74> and <http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-08#section-3.2> (bottom of page 20).

--
Mark Nottingham http://www.mnot.net/

Mike Belshe

Nov 18, 2009, 2:44:00 AM
to spdy-dev


On Nov 17, 4:16 pm, Mark Nottingham <m...@mnot.net> wrote:
> Why is the number of headers important to convey in the header block? The control blocks are already length-delimited.

Just for convenience.

>
> Also, disallowing multiple headers of the same name and using int16 as the maximum size for a header value profiles HTTP; although it's not common, some Set-Cookie and Cookie headers do get larger than 32K.  

Good point. We could increase to 32 bits.

>
> At the least, it should be unsigned short instead of int16; however, for better interoperability with the current Web, I'd allow multiple instances of a header (some private applications of HTTP do shove a *lot* of data around in headers, and I think this will become more common over time; it would be a shame to see "FooHeader1", "FooHeader2" and "FooHeader3" used as a workaround for this, no matter how much we dislike big headers on the open Internet).

Multiple instances means you need to look through the whole list to
verify that there is no other. I'd prefer just increasing the
length. The number of bytes saved here is not consequential, and
simplicity is key.
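
The lookup cost can be sketched roughly as follows, assuming headers are held as a list of (name, value) pairs. The function names `get_first` and `get_all` are made up for this example.

```python
def get_first(headers, name):
    """Unique names: the search can stop at the first match."""
    for key, value in headers:
        if key.lower() == name.lower():
            return value
    return None

def get_all(headers, name):
    """Repeated names allowed: every pair has to be examined."""
    return [value for key, value in headers if key.lower() == name.lower()]

headers = [("host", "example.com"), ("cookie", "a=1"), ("cookie", "b=2")]
print(get_first(headers, "cookie"))
print(get_all(headers, "cookie"))
```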

>
> Finally, there's a TODO in the header section to "Specify a string encoding." By necessity, this needs to be compatible with HTTP, which suggests but does not require one; i.e. a new header may come along and define the use of UTF-8, for example. See: <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/74> and <http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-08#section...> (bottom of page 20).

It seems to me that UTF8 is good here. Any objection?

Mike

Bram Cohen

Nov 18, 2009, 2:57:05 PM
to spdy-dev
On Nov 17, 11:44 pm, Mike Belshe <mbel...@chromium.org> wrote:
>
> It seems to me that UTF8 is good here.  Any objection?

All protocols should be assumed to be UTF-8 by default. Anything else
is insane.

-Bram

Robert

Nov 18, 2009, 3:03:32 PM
to spdy-dev
Agree.

>
> -Bram

James A. Morrison

Nov 18, 2009, 3:28:06 PM
to spdy...@googlegroups.com
I already updated the protocol document to specify utf8.

2009/11/18 Bram Cohen <br...@bitconjurer.org>:
--
Thanks,
Jim
http://phython.blogspot.com

Mark Nottingham

Nov 18, 2009, 5:29:16 PM
to spdy...@googlegroups.com
If you define SPDY headers as UTF-8, you need to define how to work with HTTP headers that allow ISO-8859-1. Right now, that's anything with a comment production (e.g., Server, User-Agent, Via) or quoted-string (e.g., ETag, Warning, Cache-Control extensions).

You'll also need to specify how new HTTP headers that don't use UTF-8 (or a subset) are handled.

Neither of these cases is common, but they're not unknown either.

The potential problem here is that an HTTP<->SPDY gateway or even an API won't be aware of newly defined headers that don't use UTF-8, and therefore they won't know to transcode them. It's probably not going to happen often, but when it does, it's going to make some web dev have a bad day.
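
A minimal sketch of how that bad day could look, assuming a gateway that blindly treats every HTTP header value as ISO-8859-1. The function `naive_gateway_transcode` is hypothetical and not part of any SPDY implementation.

```python
def naive_gateway_transcode(raw_value: bytes) -> bytes:
    """Hypothetical gateway step: treat every HTTP header value as
    ISO-8859-1 and re-encode it as UTF-8 for the SPDY side."""
    return raw_value.decode("iso-8859-1").encode("utf-8")

# A header whose value really is ISO-8859-1 survives the trip.
latin1_value = "café".encode("iso-8859-1")
assert naive_gateway_transcode(latin1_value).decode("utf-8") == "café"

# A newer header that already carries UTF-8 gets encoded a second time.
utf8_value = "café".encode("utf-8")
mangled = naive_gateway_transcode(utf8_value).decode("utf-8")
print(mangled)  # no longer "café"
```

The gateway cannot tell the two cases apart from the bytes alone; it has to know how each header is defined.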

If SPDY is ever made into a standard, it also will need to describe its headers in relation to HTTP headers, which this will make more complex.

To be clear -- I agree that a new protocol should define textual elements in terms of UTF-8, but because SPDY leverages HTTP headers, it's not really a new protocol. I'm not sure what proclaiming HTTP-headers-inside-SPDY is buying us, except some headaches.

Cheers,

Mark Nottingham

Nov 18, 2009, 5:41:16 PM
to spdy...@googlegroups.com

On 18/11/2009, at 6:44 PM, Mike Belshe wrote:
>
> On Nov 17, 4:16 pm, Mark Nottingham <m...@mnot.net> wrote:
>> Why is the number of headers important to convey in the header block? The control blocks are already length-delimited.
>
> Just for convenience.

I'm a little concerned about variation in implementations here. If one implementation uses the length to determine end of headers, and another uses the number of headers, it feels an awful lot like an HTTP response splitting attack...


>> At the least, it should be unsigned short instead of int16; however, for better interoperability with the current Web, I'd allow multiple instances of a header (some private applications of HTTP do shove a *lot* of data around in headers, and I think this will become more common over time; it would be a shame to see "FooHeader1", "FooHeader2" and "FooHeader3" used as a workaround for this, no matter how much we dislike big headers on the open Internet).
>
> Multiple instances means you need to look through the whole list to
> verify that there is no other. I'd prefer just increasing the
> length. The number of bytes saved here is not consequential, and
> simplicity is key.

The tradeoff here is that allowing multiple instances of a header makes it easy, for example, for an intermediary to append to a Via or X-Forwarded-For header without rewriting existing header lines.

A bad outcome of disallowing them would be if intermediaries started rewriting the header name to obscure it when they want to change the value, as is often done today with the Connection header.
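
The tradeoff can be sketched as below, assuming headers are a list of (name, value) pairs. The helper names and the `Via` values are made up for illustration.

```python
def append_with_repeats(headers, name, value):
    """Multiple instances allowed: add a new pair, leave the rest alone."""
    return headers + [(name, value)]

def append_without_repeats(headers, name, value):
    """Unique names only: find the existing pair and rewrite its value
    as a comma-separated list; add the pair if it is missing."""
    result, found = [], False
    for key, old in headers:
        if key.lower() == name.lower():
            result.append((key, old + ", " + value))
            found = True
        else:
            result.append((key, old))
    if not found:
        result.append((name, value))
    return result

headers = [("via", "1.1 cache-a"), ("host", "example.com")]
print(append_with_repeats(headers, "via", "1.1 proxy-b"))
print(append_without_repeats(headers, "via", "1.1 proxy-b"))
```

The second form only works for headers whose values are defined as comma-separated lists, which is one reason Set-Cookie is awkward to fold this way.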

James A. Morrison

Nov 18, 2009, 5:51:18 PM
to spdy...@googlegroups.com
2009/11/18 Mark Nottingham <mn...@mnot.net>:
>
> On 18/11/2009, at 6:44 PM, Mike Belshe wrote:
>>
>> On Nov 17, 4:16 pm, Mark Nottingham <m...@mnot.net> wrote:
>>> Why is the number of headers important to convey in the header block? The control blocks are already length-delimited.
>>
>> Just for convenience.
>
> I'm a little concerned about variation in implementations here. If one implementation uses the length to determine end of headers, and another uses the number of headers, it feels an awful lot like an HTTP response splitting attack...

There is a difference. The length is of the entire frame, and most of
the frame's content is compressed. So I can think of a couple of cases
here:
1) There are more headers than NV entries indicates. If this happens,
I think that all the other headers should be ignored, or the stream
should be closed with a protocol error. Either way, I don't see
anything bad here.
2) There are fewer name/value pairs (headers) than NV entries
indicates. This should be the same as having junk data, and the
stream should be closed with a protocol error.

NV entries is not used for framing at all, since it isn't actually a
length of the headers, but simply the number of headers that exist.
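
The two cases could be handled along these lines. This is a sketch over an already decompressed block, and the layout (big-endian unsigned 16-bit count and lengths) is an assumption for illustration; `ProtocolError` and `parse_nv_block` are hypothetical names.

```python
import struct

class ProtocolError(Exception):
    """Raised when the header block does not match its NV count."""

def parse_nv_block(block: bytes):
    """Parse a decompressed name/value block.

    Assumed layout (big-endian): uint16 pair count, then for each pair
    uint16 name length, name, uint16 value length, value.
    """
    def take(offset, size):
        # Case 2: the block runs out before the announced pairs do.
        if offset + size > len(block):
            raise ProtocolError("block ends before NV entries are satisfied")
        return block[offset:offset + size], offset + size

    raw, pos = take(0, 2)
    (count,) = struct.unpack(">H", raw)
    pairs = []
    for _ in range(count):
        raw, pos = take(pos, 2)
        name, pos = take(pos, struct.unpack(">H", raw)[0])
        raw, pos = take(pos, 2)
        value, pos = take(pos, struct.unpack(">H", raw)[0])
        pairs.append((name, value))
    if pos != len(block):
        # Case 1: more data than NV entries announced. This sketch takes
        # the stricter option and rejects instead of ignoring the rest.
        raise ProtocolError("trailing data after the announced pairs")
    return pairs
```

Whichever option is chosen for case 1, every implementation needs to choose the same one, otherwise two parsers can disagree about which headers a frame contains.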

>
>>> At the least, it should be unsigned short instead of int16; however, for better interoperability with the current Web, I'd allow multiple instances of a header (some private applications of HTTP do shove a *lot* of data around in headers, and I think this will become more common over time; it would be a shame to see "FooHeader1", "FooHeader2" and "FooHeader3" used as a workaround for this, no matter how much we dislike big headers on the open Internet).
>>
>> Multiple instances means you need to look through the whole list to
>> verify that there is no other.  I'd prefer just increasing the
>> length.  The number of bytes saved here is not consequential, and
>> simplicity is key.
>
> The tradeoff here is that allowing multiple instances of a header makes it easy, for example, for an intermediary to append to a Via or X-Forwarded-For header without rewriting existing header lines.
>
> A bad outcome of disallowing them would be if intermediaries started rewriting the header name to obscure it when they want to change the value, as is often done today with the Connection header.

Hmm, perhaps we should have some way of saying "skip this NV pair". I
don't think this is really useful, though, since we are requiring that any
intermediary uncompress the NV pairs to read and alter them and
recompress them when writing them back out.

Bram Cohen

Nov 18, 2009, 5:53:29 PM
to spdy-dev
On Nov 18, 2:29 pm, Mark Nottingham <m...@mnot.net> wrote:
> If you define SPDY headers as UTF-8, you need to define how to work with HTTP headers that allow ISO-8859-1. Right now, that's anything with a comment production (e.g., Server, User-Agent, Via) or quoted-string (e.g., ETag, Warning, Cache-Control extensions).
>
> You'll also need to specify how new HTTP headers that don't use UTF-8 (or a subset) are handled.

That's easy: they can be decoded from ISO-8859-1 into Unicode, then
re-encoded using UTF-8.
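
In Python terms the conversion might look like this. The function names are hypothetical, and the reverse direction is included because a gateway back to HTTP would need it too.

```python
def http_to_spdy(value: bytes) -> bytes:
    """ISO-8859-1 bytes from HTTP -> UTF-8 bytes for SPDY.
    Never fails: every byte is a valid ISO-8859-1 character."""
    return value.decode("iso-8859-1").encode("utf-8")

def spdy_to_http(value: bytes) -> bytes:
    """UTF-8 bytes from SPDY -> ISO-8859-1 bytes for HTTP.
    Raises UnicodeEncodeError for characters outside ISO-8859-1."""
    return value.decode("utf-8").encode("iso-8859-1")

original = b"Apache (Caf\xe9)"  # 0xE9 is "é" in ISO-8859-1
assert spdy_to_http(http_to_spdy(original)) == original

try:
    spdy_to_http("snowman \u2603".encode("utf-8"))
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")
```

The forward direction is lossless, but a value that originates on the SPDY side with characters outside ISO-8859-1 has no direct HTTP equivalent.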

SPDY headers are not HTTP headers, except to the extent that the
client might not know which it's speaking during initial handshake,
where they have to form a pun, and at least one handshake proposal
being discussed has that be clarified at the SSL layer. This would
seem to be a fairly good reason to use that method.

I would also really, really like to see SPDY headers be \n delimited
instead of HTTP's idiotic historical \r\n delimiting. That one just
bugs me.

-Bram

Mark Nottingham

Nov 18, 2009, 6:11:52 PM
to spdy...@googlegroups.com
SPDY's headers are length-delimited, not character-delimited. HTTP already says that \n is an acceptable delimiter.

If SPDY headers are not HTTP headers, where are their syntax and semantics defined? :)
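
A short sketch of what length-delimited means in practice. The layout (big-endian unsigned 16-bit count and lengths) and the name `encode_nv_block` are assumptions for illustration, not quoted from the draft.

```python
import struct

def encode_nv_block(headers):
    """Serialize (name, value) byte pairs with length prefixes.

    Assumed layout (big-endian): uint16 pair count, then uint16 length
    plus bytes for each name and each value.
    """
    out = [struct.pack(">H", len(headers))]
    for name, value in headers:
        out.append(struct.pack(">H", len(name)) + name)
        out.append(struct.pack(">H", len(value)) + value)
    return b"".join(out)

# A value containing CR LF needs no escaping: its length says where it ends.
block = encode_nv_block([(b"x-note", b"line one\r\nline two")])
print(block)
```

No byte value is reserved as a separator, so the \r\n versus \n question does not arise at the framing level.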

Mark Nottingham

Nov 18, 2009, 6:15:38 PM
to spdy...@googlegroups.com
There's another whole discussion to be had about header compression, I think; as you indicate, it makes it difficult for intermediaries to inspect and manipulate headers, which kind of obviates one of the main motivations for separating headers from content.

Instead of compressing the entire header block, it would be nice to selectively compress individual header values, and perhaps use a well-known dictionary or other technique for compressing the header names. That way, an intermediary can much more easily pick out the values interesting to them, without decompressing and recompressing the whole header block.
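
One way this could look, sketched with zlib's preset-dictionary support. The dictionary contents are invented for the example, and header names are simply left uncompressed here; it is not a proposal for the actual wire format.

```python
import zlib

# Hypothetical shared dictionary of strings likely to appear in values.
DICTIONARY = b"text/html; charset=utf-8gzip, deflateMozilla/5.0 max-age=0"

def compress_value(value: bytes) -> bytes:
    """Compress one header value on its own, seeded with the dictionary."""
    comp = zlib.compressobj(zdict=DICTIONARY)
    return comp.compress(value) + comp.flush()

def decompress_value(data: bytes) -> bytes:
    """Inverse of compress_value; needs the same dictionary."""
    decomp = zlib.decompressobj(zdict=DICTIONARY)
    return decomp.decompress(data) + decomp.flush()

headers = {
    b"content-type": b"text/html; charset=utf-8",
    b"accept-encoding": b"gzip, deflate",
}
# Names stay readable; each value is compressed independently.
wire = {name: compress_value(value) for name, value in headers.items()}

# An intermediary can read one value without touching the others.
print(decompress_value(wire[b"content-type"]))
```

Because every value is its own small stream, there is fixed per-value overhead, so this trades some compression ratio for easier access.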

Roberto Peon

Nov 18, 2009, 6:21:30 PM
to spdy...@googlegroups.com
It doesn't make inspection hard, but it makes modification expensive.
One of the things we are likely to revisit is the compression-type used. gzip was chosen for simple expedience-- it is widely available and we know how to use it.
Speaking as a maintainer of an intermediary, the idea of having to maintain a new gzip stream all the time is *bleh*.
-=R
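
A rough sketch of why a long-lived stream is a burden for an intermediary, assuming all header blocks on a connection share one zlib context (an assumption made for this example). The header bytes and the `send` helper are made up.

```python
import zlib

block = b"host: example.com\r\nuser-agent: demo/1.0\r\naccept: */*\r\n"

# One compressor and one decompressor live for the whole connection.
compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

def send(headers: bytes) -> bytes:
    # Z_SYNC_FLUSH emits everything so far but keeps the history window.
    return compressor.compress(headers) + compressor.flush(zlib.Z_SYNC_FLUSH)

first = send(block)
second = send(block)  # the same headers on a later request
print(len(block), len(first), len(second))

# The receiver must feed the chunks in order through the same context.
assert decompressor.decompress(first) == block
assert decompressor.decompress(second) == block
```

The shared history is what makes repeated headers cheap, and it is also why an intermediary that changes even one value has to keep its own decompressor and compressor per connection and re-emit every block.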

Mike Belshe

Nov 18, 2009, 7:25:14 PM
to spdy...@googlegroups.com
As much as I would prefer UTF-8 over Latin1, I can't see a good reason for SPDY to switch to UTF-8.  Since the upper layer is going to be HTTP (as much as possible), having a conversion has no value add and certainly doesn't help with performance.

Mark is convincing me that Latin1 is the right answer.

Mike

Mike Belshe

Nov 18, 2009, 7:27:33 PM
to spdy...@googlegroups.com
We could make the headers smaller by separating channel and request headers.  I'm not sure we'll see any performance gain (in fact it might be negative if poorly implemented).
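
As a rough illustration of the idea, with the set of channel-level headers chosen arbitrarily for the example (it is not a list anyone has agreed on):

```python
# Hypothetical choice of headers that rarely change within a connection.
CHANNEL_HEADERS = {"user-agent", "accept-encoding", "accept-language"}

def split_headers(headers):
    """Split a dict of headers into (channel, request) dicts."""
    channel = {k: v for k, v in headers.items() if k in CHANNEL_HEADERS}
    request = {k: v for k, v in headers.items() if k not in CHANNEL_HEADERS}
    return channel, request

def merge_headers(channel, request):
    """Rebuild the full header set on the receiving side."""
    merged = dict(channel)
    merged.update(request)
    return merged

full = {"user-agent": "demo/1.0", "accept-encoding": "gzip", "url": "/index.html"}
channel, request = split_headers(full)
print(channel)   # would be sent once per connection
print(request)   # would be sent with every request
```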

Mike

Mark Nottingham

Nov 18, 2009, 8:36:49 PM
to spdy...@googlegroups.com
I think that's a really interesting direction to explore... in HTTP/1.1 terms, it's separating the entity/representation headers out from the others. May also be useful for signing / encryption as well.

Bram Cohen

Nov 19, 2009, 3:20:44 AM
to spdy-dev
On Nov 18, 4:25 pm, Mike Belshe <mbel...@google.com> wrote:
> As much as I would prefer UTF-8 over Latin1, I can't see a good reason for
> SPDY to switch to UTF-8.  Since the upper layer is going to be HTTP (as much
> as possible), having a conversion has no value add and certainly doesn't
> help with performance.

It would be kinda nice to keep the option of using Unicode headers in
the future. Then again, it would also be nice to stop the retarded
practice of browsers handing the url over the wire in encoded form.
It's probably the way it is now for the dumb reason - the current
method is the one which offers the least opportunity for browser
implementers to screw things up.

I'm not sure what you mean by the upper layer being HTTP as much as
possible. Some parts of spdy look like http as much as possible to
ease the transition, but that's rather different from being http - it
isn't even clear that a single line of spdy headers will be parsable
as http headers in the end, nor is it clear that there's any problem
with that.