TOR middleware and httpcache middleware problem


stygmate

Feb 11, 2012, 1:09:35 PM
to scrapy-users
I'm writing a Tor downloader middleware and I have a problem with
httpcache!

To avoid being banned, I change the Tor identity every minute with a script.
But a single IP can still get banned, and the response is then
cached with the ban message as its content.

Is there a way to detect the ban message and retry the request with a
'no_cache' meta key or something like that?
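
A rough sketch of what such a retry could look like, assuming a Scrapy version whose HttpCacheMiddleware honors the `dont_cache` request meta key. The names `BAN_MARKER`, `looks_banned` and `retry_without_cache` are hypothetical, and the marker text is only a guess at what the ban page contains.

```python
BAN_MARKER = b"banned"  # assumption: the ban page contains this word


def looks_banned(body: bytes) -> bool:
    """Return True if the response body looks like a ban page."""
    return BAN_MARKER in body.lower()


def retry_without_cache(request):
    """Build a copy of a Scrapy request that bypasses the HTTP cache.

    `request` is expected to be a scrapy.Request; only its `meta` dict
    and its `replace()` method are used here.
    """
    meta = dict(request.meta)
    # Assumption: HttpCacheMiddleware skips requests carrying this key.
    meta["dont_cache"] = True
    # dont_filter=True keeps the duplicate filter from dropping the retry.
    return request.replace(meta=meta, dont_filter=True)
```

One caveat with this approach: a request marked `dont_cache` is neither read from nor written to the cache, so the good response to the retry would not be stored either.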

Ben.

Максим Горковский

Feb 11, 2012, 11:52:42 PM
to scrapy...@googlegroups.com
You should catch the response in a middleware and, if you've been banned, send Tor a signal to change the route. If you're interested in how to do it, I can send you an example on Monday.
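
For reference, a minimal sketch of sending that signal over Tor's control port with only the standard library. It assumes the control port is enabled on 127.0.0.1:9051 with password authentication; the function names and the password are placeholders, not part of any existing middleware.

```python
import socket


def newnym_commands(password: str) -> bytes:
    """Build the control-port commands that request a new Tor identity."""
    # Escape backslashes and quotes so the password is a valid quoted string.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'AUTHENTICATE "{escaped}"\r\n'
        "SIGNAL NEWNYM\r\n"
        "QUIT\r\n"
    ).encode("ascii")


def renew_tor_identity(password: str, host: str = "127.0.0.1", port: int = 9051) -> bool:
    """Send SIGNAL NEWNYM to Tor; return True if every reply line was a 250."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(newnym_commands(password))
        reply = b""
        # Read until Tor closes the connection after QUIT (or after an error).
        while chunk := conn.recv(4096):
            reply += chunk
    lines = [line for line in reply.split(b"\r\n") if line]
    return len(lines) >= 2 and all(line.startswith(b"250") for line in lines)
```

Tor rate-limits NEWNYM requests, so calling this on every banned response may not give a new circuit each time.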

--
Best regards,
Максим Горковский

stygmate

Feb 12, 2012, 7:37:29 AM
to scrapy-users
That's what I do!

But here is the order in which things happen:

The request passes through my TORproxy_middleware, then goes through all the
other middlewares until it reaches httpcache.
On the way back, the response passes through httpcache first and only then
reaches my TORproxy middleware, which checks whether it contains the word "banned".

So when a ban page comes back, the identity is renewed, but the
ban page has already been stored in the cache.
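
The ordering matters here. Scrapy calls `process_request` in increasing middleware order and `process_response` in decreasing order, and HttpCacheMiddleware sits at 900 in the default downloader middleware table. A ban-checking middleware registered above 900 would therefore see the response before httpcache does. A sketch of such a setting, where the module path and class name are hypothetical:

```python
# settings.py (sketch)
HTTPCACHE_ENABLED = True

DOWNLOADER_MIDDLEWARES = {
    # Hypothetical path. 950 is above HttpCacheMiddleware's default 900,
    # so this middleware's process_response runs before the cache stores anything.
    "myproject.middlewares.TorBanRetryMiddleware": 950,
}
```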

I want pages to be cached in gz format for offline scraping, but I want
to keep "banned" responses out of the cache and retry those requests.
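
One possible shape for this, sketched under the assumption that the middleware is registered with an order above 900 so that it sees responses before HttpCacheMiddleware. Returning a request from `process_response` stops the response chain and reschedules the request, so the ban page never reaches the cache. The class name, the `ban_retries` meta key, the marker text and the `renew_identity` hook are all hypothetical.

```python
class TorBanRetryMiddleware:
    """Retry banned responses before HttpCacheMiddleware can store them."""

    ban_marker = b"banned"  # assumption: the ban page contains this word
    max_ban_retries = 3     # assumption: give up after this many retries

    def __init__(self, renew_identity=None):
        # Hypothetical hook: a callable that asks Tor for a new identity.
        self.renew_identity = renew_identity or (lambda: None)

    def process_response(self, request, response, spider):
        if self.ban_marker not in response.body.lower():
            return response  # normal page: let it continue on to the cache

        self.renew_identity()
        retries = request.meta.get("ban_retries", 0)
        if retries >= self.max_ban_retries:
            # Giving up: flag the request so the ban page is not stored
            # (assumes HttpCacheMiddleware honors the dont_cache meta key).
            request.meta["dont_cache"] = True
            return response

        # Returning a request reschedules it; dont_filter=True keeps the
        # duplicate filter from discarding the retry.
        meta = dict(request.meta, ban_retries=retries + 1)
        return request.replace(meta=meta, dont_filter=True)
```

Because the retry itself is not marked `dont_cache`, a good response to it would still be cached as usual.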

How?

Ben.
