2012-11-10 13:08:45-0600 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
2012-11-10 13:08:45-0600 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Enabled item pipelines:
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6028
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Web service listening on 0.0.0.0:6085
2012-11-10 13:08:46-0600 [default] INFO: Spider opened
[s] Available Scrapy objects:
[s] item {}
[s] settings <CrawlerSettings module=None>
[s] spider <BaseSpider 'default' at 0x3861210>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
Type "copyright", "credits" or "license" for more information.
IPython 0.12.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: response.headers
Out[1]:
{'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
'Connection': 'close',
'Content-Length': '8934',
'Content-Type': 'text/html',
'Date': 'Sat, 10 Nov 2012 19:08:46 GMT',
'Expires': 'Mon, 26 Jul 1997 05:00:00 GMT',
'Last-Modified': 'Sat, 10 Nov 2012 19:08:46 GMT',
'Pragma': 'no-cache',
'Server': 'Apache/2',
'Set-Cookie': 'VMCHECK=deleted; expires=Fri, 11-Nov-2011 19:08:45 GMT',
'Vary': 'Accept-Encoding,User-Agent',
'X-Powered-By': 'PHP/5.2.5'}
In [2]: response.body[:500]
Out[2]: '<?xml version="1.0" encoding="iso-8859-1"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<style type="text/css">\n\ta.mainlevel:link, a.mainlevel:visited {\n\t background: #dccadc;\n\t border-bottom: 1px solid #722873;\n\t}\n\t\n\ta.mainlevel:hover {\n\t\tbackground: #722873;\n\t\tborder-bottom: 1px solid #4E1B4F;\n\t}\n\t\n\t#header {\n\t\twidth: 100%;\t\n\t}\n\t\n\t#logo {\n\t\tmargin: 0px 0px 10px 30px'
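Regarding the cookie question: a rough way to see what the Set-Cookie header above is doing is to parse it with the standard library and compare its expiry against the Date header. This is only a sketch. The two header strings are copied from the Out[1] output, the variable names are my own, and whether this cookie has anything to do with the crash you saw is an assumption I have not verified.

```python
from datetime import datetime
from http.cookies import SimpleCookie

# Header values copied verbatim from the response.headers output above.
set_cookie = "VMCHECK=deleted; expires=Fri, 11-Nov-2011 19:08:45 GMT"
date_header = "Sat, 10 Nov 2012 19:08:46 GMT"

# Parse the Set-Cookie header into a cookie object.
cookie = SimpleCookie()
cookie.load(set_cookie)
morsel = cookie["VMCHECK"]

# Parse both timestamps; the two headers use slightly different date formats.
expires = datetime.strptime(morsel["expires"], "%a, %d-%b-%Y %H:%M:%S GMT")
served = datetime.strptime(date_header, "%a, %d %b %Y %H:%M:%S GMT")

# An expiry earlier than the response date means the server is asking the
# client to drop the cookie rather than to store it.
already_expired = expires < served

print("cookie value:", morsel.value)
print("expired when served:", already_expired)
```

If the expiry is in the past, the server is clearing VMCHECK rather than setting it, which would fit a "do you accept cookies" style check on the site's side.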
On Thursday, November 8, 2012 3:02:00 AM UTC-6, sander4000 wrote:
I just started with Scrapy and made some scrapers, but now the shell crashes on this website.
I think it has to do with some cookie check?
http://www.vandenbergsurf.nl/index.php?red=oud&page=shop.browse&category_id=28&option=com_virtuemart&Itemid=29