transfer-encoding issue


Mark Menard

Jan 26, 2009, 3:06:52 PM
to FakeWeb
I'm trying to use FakeWeb to test an app that uses mechanize.
Mechanize seems to analyze the headers of a request. I have created
a :response as follows:

HTTP/1.1 200 OK
Date: Sun, 25 Jan 2009 05:52:18 GMT
Server: Apache/2.0.52 (CentOS)
X-Powered-By: PHP/4.3.9
Set-Cookie: PHPSESSID=3554a5b423570dedb27338052c459158; expires=Tue, 17-Feb-2009 09:25:38 GMT; path=/
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

<head>
<title>Vita Rara: A Life Uncommon</title>
</head>

<body>
Hi
</body>
</html>

This response was created, more or less, with curl. (I trimmed down
the HTML document for testing.)
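
For reference, the recording was made with something roughly like
this (the exact flags may have differed), and then I trimmed the body
by hand:

curl -is http://www.vitarara.org/ > test.html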

The test script is as follows:

require 'rubygems'
require 'fake_web'
require 'mechanize'

FakeWeb.register_uri(:any, 'http://www.vitarara.org/', :response => 'test.html')

result = Net::HTTP.post_form(URI.parse('http://www.vitarara.org/'), {})

result.each_header do |key, value|
  puts "#{key} : #{value}"
end

puts result.body

This dies in the each_header loop because the transfer-encoding header
is present in the result, but its value is nil.

I looked in the latest source from chrisk, and in responder.rb at
line 57 there's some handling of transfer-encoding and an eval block
that seems to set it to nil. I tried commenting out that line, but it
didn't help matters.

Questions:

Why is transfer-encoding being nil'ed?

Any thoughts on the issue?

Mark

Mark Menard

Jan 26, 2009, 4:20:29 PM
to FakeWeb
I'm pretty sure I have a fix for this. I pulled the header out,
nil'ed it as per the existing code, and then reset it after the body
was read. If the header is still set when the body is read, it throws
an exception. It seems a bit hackish, but then the whole thing is kind
of playing around with the guts of Net::HTTP anyway.
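
Roughly, the idea is this (just a sketch, not the literal patch;
variable names are mine, assuming the Net::HTTPResponse and
Net::BufferedIO that responder.rb builds from the recorded file):

# save the raw header value, remove it so Net::HTTP won't try to de-chunk
# the body that curl already decoded, then put it back once the body is read
saved = response.instance_eval { @header.delete('transfer-encoding') }
response.reading_body(socket, true) { }
response.instance_eval { @header['transfer-encoding'] = saved } if saved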

I added a test to check the header, and all existing tests pass.

Mark

Chris Kampmeier

Jan 26, 2009, 11:04:34 PM
to FakeWeb
Huh, interesting. I'm not sure why it works that way in the current
code; I'll take a look at your patch in a bit. I'm also using
Mechanize here, albeit not with FakeWeb yet, so I'll try that too and
see if I run into the same problem.

Thanks!

Chris

Chris Kampmeier

Jan 31, 2009, 9:45:45 PM
to FakeWeb
Well, I applied your transfer-encoding patches for now. I have some
thoughts on this; read on.

As for the original behavior: I'm guessing that it was setting the
transfer-encoding to nil because Blaine designed it to work with the
output of `curl -is`.

There are two problems here:
* libcurl knows about chunked encoding, and decodes it for you (which
is reasonable: chunked responses are a transport-layer detail, and no
one wants to deal with that... that's what curl is for!). And, when
you pass -i to curl, it prints the Transfer-Encoding header if it
exists, even though libcurl decoded the chunks. Both behaviors seem
reasonable in isolation; the combination is definitely a funny edge-
case.
* Net::HTTP supports chunked encoding (or at least my copy, from Ruby
1.8.6p111, does), so you can't give it a response to parse that has a
"Transfer-Encoding: chunked" header and an already-decoded body... it
wants to decode the chunks itself, since it thinks it's on the
network. It'll raise either an HTTPBadResponse or an EOFError if you
try it (see the snippet right after this list).
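
Here's a quick way to see that last point for yourself (an
illustrative sketch, not code from FakeWeb):

require 'net/http'
require 'stringio'

# a response that claims to be chunked but carries a plain, already-decoded body
raw = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\nHi"

socket   = Net::BufferedIO.new(StringIO.new(raw))
response = Net::HTTPResponse.read_new(socket)

begin
  # Net::HTTP tries to parse "Hi" as a chunk-size line and gives up
  response.reading_body(socket, true) { }
rescue Net::HTTPBadResponse, EOFError => e
  puts "#{e.class}: #{e.message}"
end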

So to get it to work with recorded responses from `curl -is`, Blaine
stripped that header from the baked response, so that Net::HTTP
wouldn't try to decode the chunks that had already been decoded by
curl.

I pasted some example requests and responses here, to prove that
`curl -is` decodes the chunks and leaves the header in: http://pastie.org/376361

As you can see, I opened a couple of raw HTTP sessions against www.google.com:80.
When I specified HTTP/1.1 in the request, the response body correctly
came back chunked. (Below that, there's an HTTP/1.0 request, which
isn't chunked.) curl uses HTTP/1.1 and decodes the chunks, but leaves
the header in.
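
In case it helps, here's roughly what a chunked body looks like on
the wire (illustrative, not copied from the pastie): each chunk is
prefixed by its length in hex, and a zero-length chunk terminates the
body.

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html

4
Wiki
5
pedia
0

curl decodes that body down to just "Wikipedia", but with -i it still
prints the Transfer-Encoding header above it.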

So it seems like there are two options:
* have people stop using `curl -is`, so that chunked responses are
actually stored with the chunks they're supposed to have; and remove
the original transfer-encoding code (and your patch) altogether;
* or keep the current "curl -is" support, but delete the transfer-
encoding header altogether, if it exists (instead of setting it to
nil), so it matches the body. That should keep Mechanize happy.

Also, note: this isn't just a chunking issue; the other transfer-
encodings (deflate, compress, etc.) all have the exact same problem.

I think I'm more interested in approach #2... it seems to be the most
pragmatic. If we did #1, we'd need to provide a tool for users to
record responses in the proper fashion without using curl, and
probably raise if a recorded response is used that has a Transfer-
Encoding header but an already-decoded body, since it'd be invalid.
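
For #2, I'm picturing something along these lines in responder.rb
(just a sketch, assuming we keep parsing the recording with
Net::HTTPResponse the way it's done now; the helper name is made up):

require 'net/http'
require 'stringio'

def parse_baked_response(text)
  socket   = Net::BufferedIO.new(StringIO.new(text))
  response = Net::HTTPResponse.read_new(socket)

  # curl already decoded the chunks, so this header no longer matches the
  # body; delete it outright instead of setting it to nil
  response.delete('transfer-encoding')

  # with no transfer-encoding and no content-length, this just reads the
  # rest of the recording as the body
  response.reading_body(socket, true) { }
  response
end

That way response.body holds the decoded HTML, and Mechanize never
sees a transfer-encoding header at all.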

Hope this helps,
Chris Kampmeier