Thanks for looking at this, Chris. I spent some time on this today and
found that my problems boiled down to a few issues:
>> Following Redirects
- I found out how to use the curl '-L' option to follow redirects
- With '-L', curl's output includes the intermediate 301 response
headers, and those 'extra' headers cause FakeWeb to fail
- Stripping everything before the final 200 status line makes FakeWeb
work again
(hint - use "string.gsub(/^[\s\S]+^HTTP\/1\.0 200/,'HTTP/1.0 200')";
see the sketch below)
- I also configured Mechanize to follow redirects, just to be safe
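To make the header-stripping concrete, here's a sketch with a made-up
two-response capture (note that real curl output may report HTTP/1.1
instead of HTTP/1.0, in which case the regex needs adjusting):

raw = <<-RESPONSE
HTTP/1.0 301 Moved Permanently
Location: http://www.example.com/

HTTP/1.0 200 OK
Content-Type: text/html

<html>...</html>
RESPONSE

# Drop everything before the final 200 status line so FakeWeb sees a
# single response
puts raw.gsub(/^[\s\S]+^HTTP\/1\.0 200/, 'HTTP/1.0 200')
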
>> User Agent
- I found that some websites don't work if they don't see a
"User-Agent" header
- I used the curl option '-A "Mac Safari"' and that did the trick
- I also configured Mechanize to use the same User-Agent header (see
below)
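For reference, the Mechanize side of that is just this (same alias the
full script below uses):

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari' # sends a Safari User-Agent string
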
>> FakeWeb Bug
- I think FakeWeb crashes if it sees a URL that ends in ':80/' (an
explicit port 80 plus a trailing slash)
- I couldn't figure out how to work around this; a minimal repro is
below
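Here's the smallest repro I have (example.com is just a placeholder
domain, and I haven't pinned down whether it's the registration or the
lookup that actually raises):

require 'rubygems'
require 'fakeweb'
require 'net/http'
require 'uri'

FakeWeb.allow_net_connect = false
FakeWeb.register_uri(:get, 'http://example.com:80/', :body => 'hello')
# Fetching the ':80/' form crashes for me; the same URL without the
# explicit port and trailing slash works fine
puts Net::HTTP.get(URI.parse('http://example.com:80/'))
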
>> It works!
- I manually applied your github commit and made the changes above
- Then everything worked on both cygwin and linux
Looks like I don't have a way to attach my revised test script, so I
will paste it here - apologies in advance for the length...
#!/usr/bin/env ruby
require 'rubygems'
require 'fakeweb'
require 'mechanize'
class FwTest
  def initialize
    @domains = %w(google bbcnews nytimes yahoo slashdot fandango vitalist)
    @agent = WWW::Mechanize.new do |a|
      a.user_agent_alias    = 'Mac Safari'
      a.follow_meta_refresh = true
    end
  end

  # Build the list of URL variants to test for each domain
  def test_urls
    @domains.inject([]) do |output, dom|
      output << "http://#{dom}.com"
      output << "http://#{dom}.com/"
      output << "http://#{dom}.com:80"
      # this causes fakeweb to crash
      #output << "http://#{dom}.com:80/"
      output << "http://www.#{dom}.com"
      output << "http://www.#{dom}.com/"
      output << "http://www.#{dom}.com:80"
      # this causes fakeweb to crash
      #output << "http://www.#{dom}.com:80/"
    end
  end

  def test_without_fw
    FakeWeb.allow_net_connect = true
    test_urls.each do |url|
      begin
        @agent.get url
        puts " w/o FakeWeb:WORKS> #{url}"
      rescue
        puts " w/o FakeWeb:FAILS> #{url}"
      end
    end
  end

  def test_with_fw
    FakeWeb.allow_net_connect = false
    cf = 'cache.htm'
    cc = 1
    test_urls.each do |url|
      # Fetch with curl (following redirects, with a User-Agent) and
      # strip the intermediate non-200 headers before caching
      cache_data = strip_non_200_headers(`curl -is -A 'Mac Safari' -L #{url}`)
      File.open(cf, 'w') {|out| out.puts cache_data}
      FakeWeb.register_uri(:get, url, :response => cf)
      FakeWeb.registered_uri?(:get, url) # sanity check; return value unused
      begin
        @agent.get url
        puts "With FakeWeb:WORKS> #{url}"
      rescue
        # Keep the cache file around for failed URLs so they can be inspected
        datfile = "#{cc}_#{cf}"
        File.rename(cf, datfile)
        puts "With FakeWeb:FAILS> #{url.ljust(25)} (Cache File: #{datfile})"
        cc += 1
      end
    end
  end

  # Remove everything up to the final 200 status line (e.g. the 301
  # headers left in the output by curl -L)
  def strip_non_200_headers(string)
    string.gsub(/^[\s\S]+^HTTP\/1\.0 200/, 'HTTP/1.0 200')
  end
end

if $0 == __FILE__
  x = FwTest.new
  #x.test_without_fw
  x.test_with_fw
end
On Apr 11, 8:44 pm, Chris Kampmeier <ChrisGKampme...@gmail.com> wrote:
> OK, I did a little investigation:
>
> FakeWeb had a bug where trailing slashes were considered significant
> for requests to the root of a domain, so e.g. http://example.com/ and
> http://example.com were considered different URLs. This has been
> fixed:
> http://github.com/chrisk/fakeweb/commit/c98f1d8f8643449e035ce0204d878...
> That'll make it into the next gem release. Thanks for bringing it to
> my attention!
>
> After that change, your test script works for me with two exceptions,
> which seem legitimate. First, Google responds with a 301 to GET
> http://google.com, with the Location header set to
> http://www.google.com/.
> Mechanize tries to follow that redirect, but you hadn't registered
> that URI yet, so you get a FakeWeb::NetConnectNotAllowedError.
>
> Second, Google seems to check the user-agent (or perhaps something
> more complicated) for requests to http://www.google.com/news. It
> > http://www.google.com/news http://google.com:80 http://google.com:80/)....