
Re: http error 301 for urlopen


D'Arcy J.M. Cain

Nov 7, 2010, 8:51:50 PM
to Wenhuan Yu, pytho...@python.org
On Sun, 7 Nov 2010 19:30:23 -0600
Wenhuan Yu <yuwe...@gmail.com> wrote:
> I tried to open a link with urlopen:
>
> import urllib2
> alink = "http://feeds.nytimes.com/click.phdo?i=ff074d9e3895247a31e8e5efa5253183"
> f = urllib2.urlopen(alink)
> print f.read()
>
> and got the following error:
>
> urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error
> that would lead to an infinite loop.
> The last 30x error message was:
> Moved Permanently
>
> I can open the link in a browser. Is there any way to solve this? Thanks.

I checked with my tools and was told that it redirects more than five
times. Maybe it's not infinite but too many for urlopen. Or, maybe
the browser just ignores the extra redirects and the part of the page
with the redirects isn't critical for viewing it. I think that you are
going to have to investigate the HTML manually and follow all the
individual links to find the problem. You may have to put in a bug
request with the New York Times. Good luck with that.

--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
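
One way to investigate the chain by hand, rather than guessing at the count, is to record each hop as the opener follows it. This is only a sketch, written for Python 3, where urllib2 became urllib.request; the names RecordingRedirectHandler and trace_redirects are made up for illustration and are not part of the standard library.

```python
import urllib.error
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every hop it is asked to follow."""

    def __init__(self):
        # List of (status_code, target_url) tuples, in the order seen.
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Note the hop, then let the stock handler decide what to do with it.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def trace_redirects(url):
    """Open url and return the redirects that were recorded on the way.

    An HTTPError (for example the "infinite loop" error quoted above) is
    swallowed so the hops seen before the failure can still be inspected.
    """
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(url).close()
    except urllib.error.HTTPError:
        pass
    return handler.hops
```

Calling trace_redirects(alink) and printing the result should show where the chain goes and whether the same URL keeps coming back.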

Nobody

Nov 7, 2010, 9:41:29 PM
On Sun, 07 Nov 2010 20:51:50 -0500, D'Arcy J.M. Cain wrote:

>> urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error
>> that would lead to an infinite loop.
>> The last 30x error message was:
>> Moved Permanently
>>
>> I can open the link in a browser. Is there any way to solve this? Thanks.
>
> I checked with my tools and was told that it redirects more than five
> times. Maybe it's not infinite but too many for urlopen.

The default value of urllib2.HTTPRedirectHandler.max_redirections is 10.
Setting it to 11 allows the request to complete.
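
A minimal sketch of raising that limit without touching the class globally, assuming Python 3, where urllib2.HTTPRedirectHandler lives at urllib.request.HTTPRedirectHandler. The subclass and function names are invented for this example.

```python
import urllib.request


class PatientRedirectHandler(urllib.request.HTTPRedirectHandler):
    # One more than the default of 10 mentioned above.
    max_redirections = 11


def build_patient_opener():
    """Return an opener whose redirect handler tolerates 11 redirects.

    build_opener() drops its default HTTPRedirectHandler when it is given
    an instance of a subclass, so only this handler is installed.
    """
    return urllib.request.build_opener(PatientRedirectHandler())
```

Usage would be along the lines of build_patient_opener().open(alink).read(). Subclassing keeps the change local to one opener; assigning to HTTPRedirectHandler.max_redirections directly would affect every opener in the process.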


John Nagle

Nov 8, 2010, 1:28:10 AM
On 11/7/2010 5:51 PM, D'Arcy J.M. Cain wrote:
> On Sun, 7 Nov 2010 19:30:23 -0600
> Wenhuan Yu <yuwe...@gmail.com> wrote:
>> I tried to open a link with urlopen:
>>
>> import urllib2
>> alink = "http://feeds.nytimes.com/click.phdo?i=ff074d9e3895247a31e8e5efa5253183"
>> f = urllib2.urlopen(alink)
>> print f.read()
>>
>> and got the following error:
>>
>> urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error
>> that would lead to an infinite loop.
>> The last 30x error message was:
>> Moved Permanently
>>
>> I can open the link in a browser. Is there any way to solve this? Thanks.
>
> I checked with my tools and was told that it redirects more than five
> times. Maybe it's not infinite but too many for urlopen. Or, maybe
> the browser just ignores the extra redirects and the part of the page
> with the redirects isn't critical for viewing it. I think that you are
> going to have to investigate the HTML manually and follow all the
> individual links to find the problem. You may have to put in a bug
> request with the New York Times. Good luck with that.

It's the New York Times' paywall. They're trying to set a cookie,
and will redirect the URL until you store and return the cookie.

John Nagle
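
If the cookie explanation above is right, an opener that stores cookies and sends them back should get out of the loop. A sketch for Python 3 (urllib2 and cookielib became urllib.request and http.cookiejar); build_cookie_opener is a name chosen for this example, and whether it is enough for this particular site is untested here.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar): an opener that keeps cookies between requests.

    HTTPCookieProcessor stores any Set-Cookie headers in the jar and
    replays them on later requests, including the requests made while
    following redirects.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

After opener.open(alink), iterating over jar shows which cookies the server set along the way.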

Lawrence D'Oliveiro

Nov 8, 2010, 9:10:24 PM
In message <4cd7987e$0$1674$742e...@news.sonic.net>, John Nagle wrote:

> It's the New York Times' paywall. They're trying to set a cookie,
> and will redirect the URL until you store and return the cookie.

And if they find out you’re accessing them from a script, they’ll probably
try to find a way to block that as well.

Hans-Peter Jansen

Nov 10, 2010, 11:03:03 AM
to pytho...@python.org

...which could be alleviated by carefully crafting the requests ;-)

Luckily, the unpleasant groundwork for this has already been done by
others, e.g.: http://bugs.python.org/issue2275

Pete
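
For what it's worth, "crafting the request" in practice mostly means sending explicit headers. A small sketch for Python 3 (urllib.request in place of urllib2); build_request and the "example-feed-reader/0.1" user agent string are placeholders made up for this example, not values any site is known to expect.

```python
import urllib.request


def build_request(url, user_agent="example-feed-reader/0.1"):
    """Return a Request carrying explicit User-Agent and Accept headers."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,  # placeholder identity for the script
            "Accept": "text/html",
        },
    )
```

Note that Request stores header names in capitalized form, so the header set above is looked up as req.get_header("User-agent").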
