AsyncHTTPClient() problems with User-Agent and some URLs

Fabio[mbutubuntu]Buda

Sep 27, 2011, 9:59:15 AM
to Tornado Web Server
Hello everybody, I'm having problems fetching some URLs. The problems
seem to be caused by some kind of User-Agent protection (some websites
and hosting companies use it), because curl works fine with all the
URLs, including the ones that fail with Tornado!

I've tried manually adding curl's own User-Agent to the httpclient
headers, {"User-Agent": "curl/7.21.0 (x86_64-unknown-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5"},
but I always get tornado.httpclient.HTTPError: HTTP 403: Forbidden.

The question is: does Tornado replace the User-Agent header? Is there
any way to fix this problem?

P.S. This is one of the URLs:
http://johannburkard.de/blog/www/spam/The-top-10-spam-bot-user-agents-you-MUST-block-NOW.html

Best Regards,
Fabio Buda Web Developer/Designer @ netdesign

Joe Bowman

Sep 27, 2011, 10:15:26 AM
to python-...@googlegroups.com
This is how I set the user agent and other headers:

        # user_agent has its own keyword argument; any other headers go in the headers dict
        req = tornado.httpclient.HTTPRequest(
            request_url,
            user_agent="unscatter.com search request",
            headers={'X-Chomp-API-Key': self.settings["chomp_api_key"]},
        )

Fabio[mbutubuntu]Buda

Sep 27, 2011, 10:15:25 AM
to Tornado Web Server
A little clarification: curl doesn't get that URL either, it also gets
Forbidden. Do you know any way to solve this?

Joe Bowman

Sep 27, 2011, 10:21:22 AM
to python-...@googlegroups.com
I'd work with the service provider first. If they're blocking on user agents (or something else) they may have a reason.

But you can use my sample above to see how to set the user agent on requests.
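
For anyone who wants to see what User-Agent blocking looks like from the client side, here is a minimal local sketch. It uses only the Python standard library (urllib and http.server, not Tornado), and the blocklist in it is hypothetical: the real site's rules are not known from this thread. It only shows that the same URL can answer 403 or 200 depending on the User-Agent header.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical blocklist; the real site's rules are not known from this thread.
BLOCKED_PREFIXES = ("curl/", "libwww-perl/")


class FilteringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        # Reject empty or blocklisted agents with 403, serve everyone else.
        blocked = not agent or agent.startswith(BLOCKED_PREFIXES)
        self.send_response(403 if blocked else 200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def status_for(url, user_agent):
    """Return the HTTP status the server sends for the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    """Start the filtering server locally and probe it with two User-Agents."""
    server = HTTPServer(("127.0.0.1", 0), FilteringHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        return (
            status_for(url, "curl/7.21.0 (x86_64-unknown-linux-gnu)"),
            status_for(url, "Mozilla/5.0 (X11; Linux x86_64)"),
        )
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the two status codes in order, first for the curl-style agent and then for the browser-style one.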

Lorenzo Bolla

Sep 27, 2011, 10:37:22 AM
to python-...@googlegroups.com
This works:
curl -A "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1" "http://johannburkard.de/blog/www/spam/The-top-10-spam-bot-user-agents-you-MUST-block-NOW.html"

L.
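
The same idea carried over to Tornado might look like the sketch below. It assumes a recent Tornado release with native coroutine support (the 2011 releases used callbacks instead), and build_request and fetch_status are hypothetical helper names, not part of Tornado. The User-Agent string is the one from the curl command above.

```python
import tornado.httpclient

# Browser User-Agent string copied from the curl command above.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 "
    "(KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a request carrying the given User-Agent (what -A does in curl)."""
    return tornado.httpclient.HTTPRequest(url, user_agent=user_agent)


async def fetch_status(url):
    """Fetch url and return the status code instead of raising on a 403."""
    client = tornado.httpclient.AsyncHTTPClient()
    # raise_error=False hands back the response for non-200 codes
    # rather than raising tornado.httpclient.HTTPError.
    response = await client.fetch(build_request(url), raise_error=False)
    return response.code
```

With such a setup, something like asyncio.run(fetch_status(url)) should report whether the site accepts the browser-style agent. Whether that is appropriate is a separate question, since the site may be blocking certain agents on purpose.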

Fabio[mbutubuntu]Buda

Sep 27, 2011, 10:56:40 AM
to Tornado Web Server
Sorry Joe, but what is self.settings["chomp_api_key"]?
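
For context: inside a tornado.web.RequestHandler, self.settings is the settings dict of the Application serving the handler. Assuming Joe's snippet lives in such a handler (the thread does not say), chomp_api_key would simply be a value his own application stores there, presumably the API key sent in the X-Chomp-API-Key header. A minimal sketch, where SearchHandler, api_headers, make_app and the key value are all hypothetical:

```python
import tornado.web


class SearchHandler(tornado.web.RequestHandler):  # hypothetical handler name
    def api_headers(self):
        # self.settings is the settings dict of the Application serving this handler.
        return {"X-Chomp-API-Key": self.settings["chomp_api_key"]}


def make_app():
    # Keyword arguments passed to Application end up in its settings dict.
    return tornado.web.Application(
        [(r"/search", SearchHandler)],
        chomp_api_key="example-key",  # placeholder value, not a real key
    )
```

So the key is not something Tornado provides. It is application configuration, and the header is only needed for that particular API.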