Scrapy with https proxy

Oana Goga

Aug 25, 2011, 11:04:23 PM

to scrapy-users, oana...@lip6.fr

Hi,

I am trying to use Scrapy to fetch HTTPS pages through a proxy, but I cannot get it to work.
When I fetch or view https://www.paypal.com with Scrapy I get a 501 (Not Implemented) error, whereas fetching the same page with wget works fine. Here are the steps I am following:

$ export http_proxy="http://us.proxymesh.com:31280"
$ export https_proxy="http://us.proxymesh.com:31280"
$ scrapy view https://www.paypal.com
2011-08-25 19:41:43-0700 [scrapy] INFO: Scrapy 0.12.0.2545 started (bot: nice_bot)
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled extensions: FeedExporter, TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled downloader middlewares: HttpProxyMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlCanonicalizerMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Enabled item pipelines:
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-08-25 19:41:43-0700 [default] INFO: Spider opened
2011-08-25 19:41:43-0700 [scrapy] DEBUG: Cookie: None for https://www.paypal.com
2011-08-25 19:41:44-0700 [scrapy] INFO: Set-Cookie: [] from https://www.paypal.com
2011-08-25 19:41:44-0700 [default] DEBUG: Crawled (501) <GET https://www.paypal.com> (referer: None)
2011-08-25 19:41:44-0700 [default] INFO: Closing spider (finished)
2011-08-25 19:41:48-0700 [default] INFO: Spider closed (finished)

$ wget https://www.paypal.com
--2011-08-25 19:44:08-- https://www.paypal.com/
Resolving us.proxymesh.com... 184.106.76.204
Connecting to us.proxymesh.com|184.106.76.204|:31280... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'

I have Scrapy 0.12.0.2545, Twisted 11.0.0 and Python 2.7.

After some investigation, it appears that Scrapy only issues a GET request to the proxy, instead of first issuing a CONNECT and then sending the GET through the resulting tunnel, which causes the fetch to fail.
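To make the difference concrete, here is a minimal sketch of the request lines involved, using only the Python standard library. The helper names (`forwarded_get`, `connect_request`, `tunnelled_get`) are hypothetical and exist only for illustration; this is not Scrapy's or wget's actual code, just the shape of the bytes a proxy would see in each case.

```python
from urllib.parse import urlsplit


def forwarded_get(url):
    """Request a client sends to a proxy when it does NOT open a tunnel:
    a GET with the absolute URL in the request line. This is fine for
    http:// URLs, but a proxy may answer 501 when the URL is https://."""
    host = urlsplit(url).netloc
    return f"GET {url} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def connect_request(url):
    """First request of the tunnelled flow: ask the proxy to open a raw
    TCP tunnel to host:port (443 by default for https)."""
    parts = urlsplit(url)
    authority = f"{parts.hostname}:{parts.port or 443}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")


def tunnelled_get(url):
    """Request sent through the tunnel after the proxy accepts the CONNECT
    and the TLS handshake with the origin server is done. Note the
    origin-form path instead of the absolute URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n".encode("ascii")
```

For https://www.paypal.com, the first helper produces the kind of request described above as failing, while the other two produce the CONNECT-then-GET sequence that a tunnelling client would send.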

Do you have any idea why this happens and how it can be fixed?

Thanks,
Oana

Pablo Hoffman

Aug 26, 2011, 1:44:47 PM

to scrapy...@googlegroups.com

HTTPS proxies are not supported yet. There's more information in this ticket:
http://dev.scrapy.org/ticket/159

Palash Jain

Jun 21, 2016, 4:05:22 AM

to scrapy-users

Hi, did you manage to get it to work?

I am facing the same issue and can't get it to work either. Any help would be appreciated.

陈伟伟

Jun 21, 2016, 7:23:46 PM

to scrapy-users, oana...@lip6.fr

On Friday, August 26, 2011 at 11:04:23 AM UTC+8, Oana Goga wrote:

Does Scrapy work with HTTP proxies?

Yes. Support for HTTP proxies is provided (since Scrapy 0.8) through the HTTP Proxy downloader middleware. See HttpProxyMiddleware.
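As a rough illustration of how the `http_proxy` and `https_proxy` variables from the first post are turned into a per-scheme proxy table, here is a small standard-library sketch. To my understanding HttpProxyMiddleware builds its table in a similar way via `getproxies()`, and also accepts a per-request proxy through `request.meta['proxy']`, but treat that as an assumption to check against the Scrapy documentation for your version. The `proxy_for` helper is hypothetical, and `no_proxy` handling is left out.

```python
import os
from urllib.parse import urlsplit
from urllib.request import getproxies

# Same values as exported in the shell session from the first post.
os.environ["http_proxy"] = "http://us.proxymesh.com:31280"
os.environ["https_proxy"] = "http://us.proxymesh.com:31280"


def proxy_for(url):
    """Return the proxy URL configured for this URL's scheme, or None.

    Hypothetical helper: it only looks up the scheme in the table that
    getproxies() builds from the environment and ignores no_proxy.
    """
    return getproxies().get(urlsplit(url).scheme)
```

Note that the lookup only tells the client which proxy to talk to. Whether an https:// URL then actually works depends on the client opening a CONNECT tunnel through that proxy, which is the part reported as missing earlier in this thread.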
