Hi Leonardo,
Thanks for the reply. However, I don't think that approach works with
proxies that require authentication (as my example shows). This is
what Pablo Hoffman stated in an earlier thread on the mailing list:
http://groups.google.com/group/scrapy-users/browse_frm/thread/9fa92a86d0af2835/
Perhaps Pablo could explain how I take my "user:pa...@proxy.com:808"
string and use my spider with the proxy? For example, I don't think I
can just set Proxy-Authorization directly, because the user/pass has
to be base64-encoded first, doesn't it?
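For what it's worth, here's a sketch of what I think the encoding would have to look like — the user/pass and proxy host below are placeholders, not my real ones:

```python
import base64

# Placeholder credentials and proxy host for illustration only.
user, password = "user", "pass"
proxy_host = "proxy.example.com:8080"

# The Proxy-Authorization header value is "Basic " + base64("user:pass"),
# so the credentials do need to be base64-encoded first.
creds = base64.b64encode(f"{user}:{password}".encode("latin-1")).decode("ascii")
proxy_auth_header = f"Basic {creds}"

# These would then go on each Request, something like:
#   Request(url,
#           meta={"proxy": f"http://{proxy_host}"},
#           headers={"Proxy-Authorization": proxy_auth_header})
print(proxy_auth_header)
```

That seems workable per-request, but doing it by hand on every request in every spider is what I was hoping to avoid.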
Isn't there just a way I can set the environment variable or otherwise
pass my http_proxy string to urllib2?
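i.e. something like this at the top of the spider module — hypothetical proxy string, and I'm assuming HttpProxyMiddleware picks up the standard proxy environment variables (via urllib's getproxies) when it initializes, so it would have to be set before the crawl starts:

```python
import os

# Hypothetical proxy string -- substitute real credentials and host.
# Set before the crawler starts, since (as I understand it) the proxy
# middleware reads the environment once at initialization.
os.environ["http_proxy"] = "http://user:pass@proxy.example.com:8080"

print(os.environ["http_proxy"])
```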
Cheers!
On Feb 21, 6:50 pm, Leonardo Lazzaro <lazzaroleona...@gmail.com>
wrote:
> You can use meta['proxy'] on each request, or the HTTP proxy download
> middleware (scrapy/contrib/downloadermiddleware/httpproxy.py).
>
> hope it helps
> leo
>
> On Tue, Feb 21, 2012 at 10:22 AM, Edward <eddrobin...@gmail.com> wrote:
> > Hi,
>
> > I run all my spiders programmatically using a scheduler and Amazon EC2
> > instances. Typically, I launch them using Fabric and an SSH
> > connection.
>
> > This means that I don't easily get the ability to set an environment
> > variable before my scheduler calls "scrapy crawl xxx" over a remote
> > shell.
>
> > I'm looking for a way to set http_proxy from within my spider. Some
> > notes:
>
> > - I've enabled HttpProxyMiddleware and confirmed everything works if I
> > set http_proxy manually before running my spider from the command
> > line.
> > - The project I use has multiple spiders (I call different ones via
> > scrapy crawl xxx), but not all of them should use a proxy, so I'm
> > looking for a solution I can apply from within a spider class.
> > - My proxy requires authentication; I set http_proxy normally like
> > user:p...@proxy.com:808