2014-03-19 14:49:24+0100 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-03-19 14:49:24+0100 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-03-19 14:49:24+0100 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-03-19 14:49:24+0100 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-03-19 14:49:25+0100 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-03-19 14:49:25+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-03-19 14:49:25+0100 [scrapy] INFO: Enabled item pipelines:
2014-03-19 14:49:25+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-03-19 14:49:25+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-03-19 14:49:25+0100 [default] INFO: Spider opened
[s] Available Scrapy objects:
[s] item {}
[s] sel <Selector xpath=None data=u'<html itemscope="" itemtype="http://sche'>
[s] settings <CrawlerSettings module=None>
[s] spider <Spider 'default' at 0x34220d0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
In [1]: import w3lib.url
In [2]: sel.css("#search ol > li.g > h3.r > a::attr(href)").extract()
Out[2]:
In [3]: [w3lib.url.url_query_parameter(u, "q") for u in sel.css("#search ol > li.g > h3.r > a::attr(href)").extract()]
Out[3]:
In [4]:
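
The list comprehension in `In [3]` pulls the `q` query parameter out of each result link. As a rough, standard-library-only sketch of that same step (it swaps `w3lib.url.url_query_parameter` for `urllib.parse`, and the `hrefs` values and the `query_parameter` helper are made up for illustration, not taken from the session above), it might look like this:

```python
from urllib.parse import parse_qs, urlsplit


def query_parameter(url, name, default=None):
    """Return the first value of query parameter `name`, or `default` if absent.

    Hypothetical helper: a stand-in for w3lib.url.url_query_parameter,
    built on the standard library only.
    """
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else default


# Made-up hrefs in the shape of redirect links ("/url?q=<target>&...");
# the real values from the session are not shown in the log.
hrefs = [
    "/url?q=http://example.com/page&sa=U&ei=abc",
    "/url?q=http://example.org/&sa=U",
    "/search?tbm=isch",  # no "q" parameter, so the helper returns None
]

targets = [query_parameter(h, "q") for h in hrefs]
print(targets)
```

Links without a `q` parameter come back as `None` here, so filtering them out before following the URLs is probably a sensible next step.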