Response header case normalization

Ryan Tomayko

Dec 10, 2008, 2:35:00 PM
to rack-...@googlegroups.com
There was a bit of a dust up around response header name normalization here:

http://github.com/rtomayko/rack-contrib/commit/2af20#comments

Josh's ETag middleware checks if the "ETag" header is present before
adding one. The check looks like this:

if !headers.has_key?('ETag')

The problem here is that there's no guarantee that the header will be
cased as ETag; in fact, it's much more likely that it will be cased
"Etag". Rack::Utils::HeaderHash, which is used by Rack::Response,
converts header names to a naive camel case.
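To make the mismatch concrete, here is a minimal sketch of what "naive camel case" does to a header name. The helper name `naive_camel_case` is made up for illustration and only approximates the behavior described above; it is not copied from Rack::Utils::HeaderHash.

```ruby
# Hypothetical helper: capitalize each dash-separated word of a header name.
# String#capitalize also downcases the rest of the word, hence "Etag".
def naive_camel_case(name)
  name.to_s.split('-').map { |word| word.capitalize }.join('-')
end

# A plain Hash whose key went through the normalization above.
headers = { naive_camel_case('ETag') => '123' }

headers.keys.first               # "Etag"
headers.has_key?('ETag')         # false: the check misses the existing header
naive_camel_case('Content-MD5')  # "Content-Md5"
```

With the key stored as "Etag", a check written against "ETag" never sees the header that is already there.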

There was all kinds of disagreement on how this should be handled with
the following being offered up as The Right Way:

1. Headers should be wrapped in a case insensitive Hash implementation
before checking for key existence. Having each piece of middleware do
this individually strikes me as inefficient.

2. Middleware should assume that headers will be naive camel case and
check/get/set appropriately. This could be a bit brittle and people seem
to dislike using the weird case forms ("Etag", "Content-Md5", etc.).

3. Rack should not perform header name normalization and the convention
should be to use the header names as they're cased in RFC 2616 or other
canonical spec. This would require large and incompatible changes to
just about everything, though.

Some combination of these? I'm not sure.
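As a rough illustration of the cost behind option 1, the sketch below does a case-insensitive presence check on a plain headers Hash without wrapping it. `header_present?` is a hypothetical helper, not something Rack provides, and it scans every key on each call.

```ruby
# Hypothetical helper, not part of Rack: true if any key matches `name`
# regardless of case. This is a linear scan over the headers.
def header_present?(headers, name)
  headers.any? { |key, _value| key.to_s.casecmp(name).zero? }
end

headers = { 'Etag' => '123', 'Content-Type' => 'text/html' }

header_present?(headers, 'ETag')           # true
header_present?(headers, 'Last-Modified')  # false
```

Every middleware repeating a scan or a wrap like this is the inefficiency mentioned in option 1.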

Thanks,
Ryan

Christian Neukirchen

Dec 11, 2008, 7:20:01 AM
to rack-...@googlegroups.com

How does WSGI do it?

> Thanks,
> Ryan

--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

Josh Peek

Dec 11, 2008, 10:54:02 AM
to Rack Development
On Dec 11, 6:20 am, "Christian Neukirchen" <chneukirc...@gmail.com>
wrote:
> On Wed, Dec 10, 2008 at 8:35 PM, Ryan Tomayko <r...@tomayko.com> wrote:
>
> > There was a bit of a dust up around response header name normalization here:

"The response_headers argument is a list of (header_name,
header_value) tuples. It must be a Python list; i.e. type
(response_headers) is ListType, and the server may change its contents
in any way it desires. Each header_name must be a valid HTTP header
field-name (as defined by RFC 2616, Section 4.2), without a trailing
colon or other punctuation."

Doesn't say much about the case.

Python's wsgiref headers lib doesn't do any normalization.

>>> import wsgiref.headers
>>> h = wsgiref.headers.Headers([])
>>> h.add_header('ETag', '123')
>>> str(h)
'ETag: 123\r\n\r\n'

Peter Zingg

Dec 11, 2008, 11:00:16 AM
to Rack Development
Django tried to deal with this, too:

http://code.djangoproject.com/ticket/2970

At end of discussion:

(In [6546]) Changed the way we handle HTTP headers internally so that
they appear case-insensitive, but the original case is preserved for
output. This increases the chances of working with non-compliant
clients without changing the external interface. Fixed #2970.

Looking at change 6546, they actually do a mapping from a downcased
header name to the original-cased header name and value:

# _headers is a mapping of the lower-case name to the original case of
# the header (required for working with legacy systems) and the header
# value.
self._headers = {'content-type': ('Content-Type', content_type)}

Good or bad idea?
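
For comparison, the same layout could look roughly like this in Ruby. This is only a sketch of the idea; `PreservingHeaders` is a hypothetical class, not Django's or Rack's code. Lookups go through the downcased name, while iteration hands back whatever casing the caller used when setting the header.

```ruby
# Hypothetical class for illustration; neither Django's nor Rack's code.
class PreservingHeaders
  def initialize
    @headers = {}  # downcased name => [original name, value]
  end

  def []=(name, value)
    @headers[name.downcase] = [name, value]
  end

  def [](name)
    entry = @headers[name.downcase]
    entry && entry[1]
  end

  def include?(name)
    @headers.key?(name.downcase)
  end

  # Yields each header with the casing used when it was last set.
  def each
    @headers.each_value { |original, value| yield original, value }
  end
end

h = PreservingHeaders.new
h['ETag'] = '123'
h.include?('etag')  # true
h['Etag']           # "123"
```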

James Tucker

Dec 11, 2008, 11:44:54 AM
to rack-...@googlegroups.com

On 11 Dec 2008, at 16:00, Peter Zingg wrote:

>
> Django tried to deal with this, too:
>
> http://code.djangoproject.com/ticket/2970
>
> At end of discussion:
>
> (In [6546]) Changed the way we handle HTTP headers internally so that
> they appear case-insensitive, but the original case is preserved for
> output. This increases the chances of working with non-compliant
> clients without changing the external interface.

I agree with that sentiment entirely. I regularly deal with "legacy"
or 3rd party systems that exhibit such properties, and to have libs
get in the way is often very annoying.

> Fixed #2970.
>
> Looking at change 6546, they actually do a mapping from a downcased
> header name to the original-cased header name and value:
>
> # _headers is a mapping of the lower-case name to the original case of
> # the header (required for working with legacy systems) and the header
> # value.
> self._headers = {'content-type': ('Content-Type', content_type)}
>
> Good or bad idea?

Probably a relatively efficient way of doing it; two hash lookups for
each value set/retrieve isn't so bad. We could even symbolize the
lower-case keys for MRI.

Matt Todd

Dec 11, 2008, 1:34:08 PM
to rack-...@googlegroups.com
So...

NORMALIZED_HEADERS = {
  :"content-type" => "Content-Type",
  # etc
}

Then...

NORMALIZED_HEADERS[ header_in_question.downcase.to_sym ]

is used to determine the right header to use, right?

Could also provide a || to just use header_in_question if nothing is found.
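
Put together, the table lookup with the || fallback might look like the sketch below. The "etag" and "content-md5" entries and the helper name `canonical_header_name` are assumptions added for illustration; only the "content-type" entry appears in the snippet above.

```ruby
NORMALIZED_HEADERS = {
  :"content-type" => "Content-Type",
  :"etag"         => "ETag",         # assumed entry
  :"content-md5"  => "Content-MD5",  # assumed entry
}.freeze

# Hypothetical helper: canonical casing if the header is known,
# otherwise the name exactly as it was given.
def canonical_header_name(header_in_question)
  NORMALIZED_HEADERS[header_in_question.downcase.to_sym] || header_in_question
end

canonical_header_name('etag')      # "ETag"
canonical_header_name('X-Custom')  # "X-Custom" (not in the table)
```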

Matt


--
Matt Todd
Highgroove Studios
www.highgroove.com
cell: 404-314-2612
blog: maraby.org

Scout - Web Monitoring and Reporting Software
www.scoutapp.com

Ryan Tomayko

Dec 12, 2008, 1:33:36 AM
to rack-...@googlegroups.com
I took a crack at implementing the key case behavior discussed in the
header normalization thread. All of the specs pass with the exception of
those that tested for header case normalization. All specs in
rack-contrib and rack-cache also passed.

http://github.com/rtomayko/rack/commit/headernorm

Non-normalizing HeaderHash with case-insensitive lookups

This is a backwards incompatible change that removes header name
normalization while attempting to keep most of its benefits. The
header name case is preserved but the Hash has case insensitive
lookup, replace, delete, and include semantics.
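
A minimal sketch of those semantics follows, for readers who want to see them spelled out. It is not the code in the commit: `CaseInsensitiveHeaders` is a hypothetical class, and it assumes that replacing a header keeps the casing of the most recent write.

```ruby
# Hypothetical sketch; not the code in the commit above.
class CaseInsensitiveHeaders < Hash
  def initialize(headers = {})
    super()
    @names = {}  # downcased name => name as currently stored
    headers.each { |name, value| self[name] = value }
  end

  def [](name)
    super(@names[name.downcase])
  end

  def []=(name, value)
    delete(name)                  # drop any differently cased entry first
    @names[name.downcase] = name  # assumption: the newest casing wins
    super(name, value)
  end

  def delete(name)
    stored = @names.delete(name.downcase)
    stored.nil? ? nil : super(stored)
  end

  def include?(name)
    @names.key?(name.downcase)
  end
  alias_method :has_key?, :include?
  alias_method :key?, :include?
end

h = CaseInsensitiveHeaders.new('Etag' => '123')
h.has_key?('ETag')  # true
h['ETag'] = '456'   # replaces the "Etag" entry instead of adding a second one
h.keys              # ["ETag"]
```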

Thanks,
Ryan

Joshua Peek

Dec 12, 2008, 10:29:04 AM
to rack-...@googlegroups.com
+1 this is exactly what I want the Headers class to do.

--
Joshua Peek
