Re: shell stops working after opening website


Steven Almeroth

Nov 10, 2012, 2:09:49 PM
to scrapy...@googlegroups.com
Works for me:

2012-11-10 13:08:45-0600 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
2012-11-10 13:08:45-0600 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Enabled item pipelines: 
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6028
2012-11-10 13:08:46-0600 [scrapy] DEBUG: Web service listening on 0.0.0.0:6085
2012-11-10 13:08:46-0600 [default] INFO: Spider opened
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
[s]   item       {}
[s]   settings   <CrawlerSettings module=None>
[s]   spider     <BaseSpider 'default' at 0x3861210>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
Python 2.7.3 (default, Aug  1 2012, 05:14:39) 
Type "copyright", "credits" or "license" for more information.

IPython 0.12.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: response.headers
Out[1]: 
{'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
 'Connection': 'close',
 'Content-Length': '8934',
 'Content-Type': 'text/html',
 'Date': 'Sat, 10 Nov 2012 19:08:46 GMT',
 'Expires': 'Mon, 26 Jul 1997 05:00:00 GMT',
 'Last-Modified': 'Sat, 10 Nov 2012 19:08:46 GMT',
 'Pragma': 'no-cache',
 'Server': 'Apache/2',
 'Set-Cookie': 'VMCHECK=deleted; expires=Fri, 11-Nov-2011 19:08:45 GMT',
 'Vary': 'Accept-Encoding,User-Agent',
 'X-Powered-By': 'PHP/5.2.5'}

In [2]: response.body[:500]
Out[2]: '<?xml version="1.0" encoding="iso-8859-1"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<style type="text/css">\n\ta.mainlevel:link, a.mainlevel:visited {\n\t  background: #dccadc;\n\t  border-bottom: 1px solid #722873;\n\t}\n\t\n\ta.mainlevel:hover {\n\t\tbackground: #722873;\n\t\tborder-bottom: 1px solid #4E1B4F;\n\t}\n\t\n\t#header {\n\t\twidth: 100%;\t\n\t}\n\t\n\t#logo {\n\t\tmargin: 0px 0px 10px 30px'


On Thursday, November 8, 2012 3:02:00 AM UTC-6, sander4000 wrote:
I just started with Scrapy and have made some scrapers, but now the shell crashes on this website.
I think it has to do with some cookie check?

http://www.vandenbergsurf.nl/index.php?red=oud&page=shop.browse&category_id=28&option=com_virtuemart&Itemid=29

Pablo Hoffman

Nov 12, 2012, 9:37:57 AM
to scrapy...@googlegroups.com
@sander4000 could it be that you have gotten blocked by the site?
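
One rough way to check this is to compare the status and cookies of a fetch that works with one that fails. The snippet below is only a sketch in plain Python 3 (standard library, run outside the Scrapy shell). The `summarize_response` helper and the `BLOCK_STATUSES` list are hypothetical names made up for illustration, the 200 status is assumed because the log above does not print it, and a site can also block with an ordinary 200 page, so treat the result as a hint only.

```python
from http.cookies import SimpleCookie

# Assumption: statuses that commonly accompany blocking or rate limiting.
BLOCK_STATUSES = {403, 429, 503}


def summarize_response(status, headers):
    """Collect facts worth comparing between a working and a failing fetch."""
    cookies = SimpleCookie()
    # A missing Set-Cookie header simply yields an empty cookie jar.
    cookies.load(headers.get("Set-Cookie", ""))
    return {
        "suspicious_status": status in BLOCK_STATUSES,
        "cookies": {name: morsel.value for name, morsel in cookies.items()},
    }


# Set-Cookie value copied from the working session earlier in this thread;
# the 200 status is an assumption.
working = summarize_response(
    200, {"Set-Cookie": "VMCHECK=deleted; expires=Fri, 11-Nov-2011 19:08:45 GMT"}
)
print(working)
```

If the failing machine shows a different status or a different set of cookies than the working one, that difference is the place to start looking.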



flyer

Nov 12, 2012, 9:40:26 AM
to scrapy...@googlegroups.com
You can try the following method:


And then in the scrapy shell:


I have sometimes encountered the same problem, and this method worked.
--
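Unmoved by favor or disgrace, I idly watch the flowers bloom and fall in the courtyard; indifferent to staying or leaving, I drift with the clouds as they gather and scatter across the sky.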

