Limit to the number of times you can hit a server?


Tom

Jul 23, 2012, 4:44:51 PM
to beauti...@googlegroups.com
Hello,
        I recently got my code working against a Yahoo-owned website (www.rivals.com). However, after around 1000 hits, it suddenly crashes or stops and I get this error:

Traceback (most recent call last):
  File "C:\Users\Tom\Documents\Python\bs4final.py", line 42, in <module>
    main()
  File "C:\Users\Tom\Documents\Python\bs4final.py", line 15, in main
    page = urllib2.urlopen(request)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 406, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 999: Unable to process request at this time -- error 999
>>>

Please tell me there is a way around this issue if it is related to server hits! Or is there a limit inside bs4 somewhere?

Thanks,
Tom

Link Swanson

Jul 23, 2012, 5:15:32 PM
to beauti...@googlegroups.com
Looks like you are being rate-limited. If you can determine the time window of the rate limit, you can build error handling into your urllib2 calls so that Python waits until the limit is lifted, or retries the call through a proxy server until the proxy also gets rate-limited.
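Here is a minimal Python 3 sketch of the wait-and-retry idea. In Python 3, urllib2's urlopen and HTTPError live in urllib.request and urllib.error. The sketch assumes 999 is the status that signals the limit, as in Tom's traceback. Because the exact window is unknown, it doubles the wait after each refusal (exponential backoff, a swapped-in choice rather than anything the server documents). The name fetch_with_backoff and its parameters are hypothetical. The opener and sleep functions are passed in so the logic can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, open_func=urllib.request.urlopen,
                       first_wait=60, max_attempts=5, sleep=time.sleep):
    """Hypothetical helper: retry `url` while the server answers HTTP 999.

    The wait starts at `first_wait` seconds and doubles after each refusal,
    since the length of the rate-limit window is not known in advance.
    """
    wait = first_wait
    for attempt in range(1, max_attempts + 1):
        try:
            return open_func(url)
        except urllib.error.HTTPError as e:
            # Any other status is a real error; on the last attempt, give up.
            if e.code != 999 or attempt == max_attempts:
                raise
            sleep(wait)
            wait *= 2
```

With the defaults, a page that keeps answering 999 is tried five times, with waits of 60, 120, 240 and 480 seconds in between. After that, the HTTPError is raised to the caller.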


--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To view this discussion on the web visit https://groups.google.com/d/msg/beautifulsoup/-/czHhafYyEqQJ.
To post to this group, send email to beauti...@googlegroups.com.
To unsubscribe from this group, send email to beautifulsou...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beautifulsoup?hl=en.



--
Link Swanson
Must Build Digital


Tom Booth

Jul 23, 2012, 7:26:00 PM
to beauti...@googlegroups.com
Hey Link, which would you recommend? I'm looking into Twisted right now, but is one option better than the other?


Link Swanson

Jul 24, 2012, 10:31:16 AM
to beauti...@googlegroups.com
It depends on how they limit you. If your limit resets every hour on the hour (like Amazon's), you can wrap the urllib2 calls in a try/except inside a loop. Something like this (not tested, and probably not the best way; I'm a noob):

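            import time
            import urllib2
            while True:
                try:
                    response = opener.open(url)    # opener and url come from your existing code
                    break                          # success: leave the retry loop
                except urllib2.HTTPError as e:
                    if e.code != 999:              # only wait on the rate-limit error
                        raise
                    time.sleep(600)    # adjust this to match the time it takes for your limit to be reset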
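If the limit really does reset on the hour, you could replace the fixed 600-second sleep with the time remaining until the next full hour. This is a small sketch. The helper name seconds_until_next_hour is hypothetical, and it assumes the server's hour boundaries line up with your local clock's.

```python
import datetime


def seconds_until_next_hour(now=None):
    """Hypothetical helper: seconds from `now` to the next hh:00:00 boundary."""
    if now is None:
        now = datetime.datetime.now()
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    next_hour = top_of_hour + datetime.timedelta(hours=1)
    return (next_hour - now).total_seconds()
```

For example, the sleep line in the loop could become `time.sleep(seconds_until_next_hour() + 5)`. The extra 5 seconds is an arbitrary safety margin. Called at 4:44:51 PM, the helper returns 909.0, which is 15 minutes and 9 seconds.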

If you have access to a good proxy, you can build a urllib2 opener with proxy handling (see http://docs.python.org/library/urllib2.html).
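In Python 3, where urllib2 became urllib.request, you can build the two openers roughly as shown below. The proxy address is a placeholder, not a real service, so substitute whatever proxy you have access to. Passing an empty dictionary to ProxyHandler disables proxies picked up from the environment, which keeps the "direct" opener direct.

```python
import urllib.request

# Placeholder proxy address: replace with a proxy you actually have access to.
PROXY_URL = "http://proxy.example.com:8080"

# ProxyHandler routes requests for the listed schemes through the proxy.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY_URL,
                                             "https": PROXY_URL})

# Direct opener: an empty ProxyHandler turns off environment-configured proxies.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
# Opener whose requests go through the proxy.
proxyopener = urllib.request.build_opener(proxy_handler)
```

Building the openers does not contact the network. Only calling `opener.open(url)` or `proxyopener.open(url)` sends a request.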

Then use your direct (non-proxy) opener until it gets limited, and switch to the proxy:

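            import urllib2
            try:
                response = opener.open(url)         # direct connection first
            except urllib2.HTTPError as e:
                if e.code != 999:                   # only fall back on the rate-limit error
                    raise
                response = proxyopener.open(url)    # same request through the proxy opener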

Again, this is probably naive, but I have had success handling rate limits in this way. 
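The same switch-to-proxy idea extends to several routes, for example a direct opener followed by more than one proxy. This Python 3 sketch uses a hypothetical name, open_with_rotation. It treats only HTTP 999 as the rate-limit signal, which is an assumption based on Tom's traceback. It accepts any objects with an .open(url) method, such as openers from urllib.request.build_opener.

```python
import urllib.error


def open_with_rotation(url, openers):
    """Hypothetical helper: try each opener in turn, moving on when one is rate-limited.

    `openers` is a list such as [opener, proxyopener, ...]. Errors other
    than HTTP 999 are raised immediately instead of being hidden.
    """
    last_error = None
    for candidate in openers:
        try:
            return candidate.open(url)
        except urllib.error.HTTPError as e:
            if e.code != 999:
                raise
            last_error = e  # this route is limited; try the next one
    if last_error is None:
        raise ValueError("no openers given")
    raise last_error  # every route is currently rate-limited
```

Once every route is limited, the caller still gets the 999 error. It can then fall back to waiting, as in the loop earlier in this post.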

Link