Hello. I have searched for this but I haven't found a satisfactory answer.
This is my code; I found some replies saying that this should work.
But when I run it as part of my spider I get an infinite loop that apparently gets stuck on the line that prints rt.url.
I suspect there may be a problem with the priority of the middlewares, but I am not sure.
from auctionzip.settings import USER_AGENT_LIST
import random
from scrapy import log


class RandomUserAgentMiddleware(object):

    def process_request(self, request, spider):
        ua = random.choice(USER_AGENT_LIST)
        rt = request.replace(url=tmp)
        #if ua:
        #    request.headers.setdefault('User-Agent', ua)
        print rt.url
        return rt
        #log.msg('>>>> UA %s' % request.headers)

    def process_response(self, request, response, spider):
        print "in response handler"
        return response
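For reference, the header-only version I was originally aiming for (basically the commented-out lines above, re-enabled) would look roughly like this; I am assuming that setting the User-Agent header in place and returning None is the right pattern, but that is exactly the part I am unsure about:

from auctionzip.settings import USER_AGENT_LIST
import random


class RandomUserAgentMiddleware(object):

    def process_request(self, request, spider):
        # Pick a random user agent from the list defined in settings.py
        ua = random.choice(USER_AGENT_LIST)
        if ua:
            # Set the header on the existing request and return None so
            # Scrapy keeps going through the remaining middlewares instead
            # of rescheduling a brand new request.
            request.headers.setdefault('User-Agent', ua)
        return None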
This is the order of my middlewares:
DOWNLOADER_MIDDLEWARES = {
    'auctionzip.middlewares.random_user_agent.RandomUserAgentMiddleware': 100,
    #'wines_crawler.middlewares.tor_anonymizer.TorMiddleWare': 401,
    #'wines_crawler.middlewares.random_proxy.ProxyMiddleware': 401,
    #'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': 402,
    'scrapy.contrib.downloadermiddleware.downloadtimeout.DownloadTimeoutMiddleware': 600,
    #'wines_crawler.middlewares.retry_change.RetryTORChangeProxyMiddleWare': 600,
    #'wines_crawler.middlewares.retry_change.RetryChangeProxyMiddleware': 600,
}
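For completeness, USER_AGENT_LIST is defined in auctionzip/settings.py; the real list is longer, but it looks roughly like this (placeholder values only):

# auctionzip/settings.py (excerpt, placeholder user agent strings)
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36',
]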
If anyone has an idea or can point me in the right direction, it would be very much appreciated.
Thank you.