[patch] URLField says all links to Wikipedia are invalid

shi...@msrl.com

Aug 26, 2006, 10:43:53 PM
to django-d...@googlegroups.com
Akismet thinks this bug is spam, so I cannot submit it to Trac.


A URLField will report that all links to en.wikipedia.org are invalid,
because Wikipedia blocks the default User-Agent of urllib2, along with
those of wget and libwww-perl.

http://mail.wikipedia.org/pipermail/wikitech-l/2003-December/019849.html

The block exists because people write poorly designed bots, not because
automated access violates Wikipedia policy, so a good fix is to set the
User-Agent header to indicate that Django is making the request. A patch
is attached; it is against 0.95, but the problem also affects trunk.

It would be nice if the Django version could be included in the
User-Agent, but I didn't see where it was accessible from the code.
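
For illustration only, the idea of sending an identifying User-Agent can be
sketched as below. This is not the attached patch: it uses Python 3's
urllib.request (the successor to urllib2), and build_request and its
django_version parameter are hypothetical names chosen for the example. The
request is only constructed, never sent.

```python
import urllib.request


def build_request(url, django_version="0.95"):
    """Build a request that identifies itself as Django in the User-Agent."""
    # django_version is a hypothetical parameter; the original post notes that
    # the real version was not obviously accessible from the validator code.
    agent = "Django/%s" % django_version
    return urllib.request.Request(url, headers={"User-Agent": agent})


req = build_request("http://en.wikipedia.org/wiki/Django")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the resulting object to urllib.request.urlopen() would then send the
custom header instead of the library default.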
--
Shields.

validators-patch

Ian Holsman

Aug 27, 2006, 6:18:58 PM
to django-d...@googlegroups.com
I would be +1 on this if it included the site domain in the user-agent.
Having it this way will just cause Wikipedia to block it when a
single badly behaving Django bot uses it.

--I

> --- django/core/validators.py.orig 2006-08-21 06:13:11.000000000 +0000
> +++ django/core/validators.py 2006-08-27 00:43:37.000000000 +0000
> @@ -203,8 +203,10 @@
>
>  def isExistingURL(field_data, all_data):
>      import urllib2
> +    req = urllib2.Request(url=field_data)
> +    req.add_header('User-Agent', 'Django/0.0')
>      try:
> -        u = urllib2.urlopen(field_data)
> +        u = urllib2.urlopen(req)
>      except ValueError:
>          raise ValidationError, gettext("Invalid URL: %s") % field_data
>      except urllib2.HTTPError, e:

--
Ian Holsman
I...@Holsman.net
join http://gypsyjobs.com the marketplace for django developers


Will McCutchen

Aug 28, 2006, 10:48:29 AM
to Django developers
Ian Holsman wrote:
> I would be +1 on this if it included the site domain in the user-agent.
> having it this way will just cause wikipedia to block it when a
> single badly behaving django-bot uses it.

+1 on including the domain in the User-Agent... good idea.

Tom Tobin

Aug 28, 2006, 11:03:17 AM
to django-d...@googlegroups.com
On 8/26/06, shi...@msrl.com <shi...@msrl.com> wrote:
> Akismet thinks this bug is spam, so I cannot submit it to Trac.

If you're being locked out due to the spamfilter (and this goes for
anyone else), please email me with your IP (or IP range) or, if you
have a wide dynamic IP range, the longest part of your hostname that
will stay the same. I'll get you whitelisted ASAP.

Adrian Holovaty

Aug 28, 2006, 4:21:30 PM
to django-d...@googlegroups.com
On 8/27/06, Ian Holsman <kry...@gmail.com> wrote:
> I would be +1 on this if it included the site domain in the user-agent.
> having it this way will just cause wikipedia to block it when a
> single badly behaving django-bot uses it.

Great idea, Ian. I agree that this patch should use the site domain in
the user-agent. However, that's a slight problem, because the patch is
to the validator framework, which knows nothing about Web requests.

We could change it to be a validator class, rather than a function,
taking the domain in its __init__() -- but that would be
backwards-incompatible. We could use settings.SITE_ID, but that relies
on that being set (and activated), plus it's coupled to the database
layer/site app. Any other ideas?
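
As a rough sketch of the first option, a validator class that takes the
domain in its __init__() might look like the following. Everything here is
an assumption made for illustration: ExistingURLValidator, the opener
argument, the error messages, and the stand-in ValidationError are
hypothetical, and the code uses Python 3's urllib.request rather than
urllib2.

```python
import urllib.error
import urllib.request


class ValidationError(Exception):
    """Stand-in for Django's ValidationError so the sketch is self-contained."""


class ExistingURLValidator:
    """Hypothetical class-based counterpart to the isExistingURL function."""

    def __init__(self, site_domain, opener=urllib.request.urlopen):
        # The domain is passed in explicitly, so no settings.SITE_ID or
        # database lookup is needed.
        self.user_agent = "Django (+http://%s/)" % site_domain
        self.opener = opener  # injectable so the check can be exercised offline

    def build_request(self, url):
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})

    def __call__(self, field_data, all_data):
        try:
            self.opener(self.build_request(field_data))
        except ValueError:
            # Raised by Request for strings that have no URL scheme.
            raise ValidationError("Invalid URL: %s" % field_data)
        except urllib.error.URLError:
            # HTTPError subclasses URLError, so HTTP error responses land here too.
            raise ValidationError("The URL %s is a broken link." % field_data)
```

An instance such as ExistingURLValidator("example.com") is callable with the
same (field_data, all_data) arguments as the current function, but every
caller would have to construct it first, which is the backwards
incompatibility mentioned above.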

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com
