I'm trying to get this working with Scrapy:

python ./scrapy-ctl.py crawl byub.org

but I get:

$ python ./scrapy-ctl.py crawl byub.org
/usr/lib/python2.6/site-packages/twisted/python/filepath.py:12:
DeprecationWarning: the sha module is deprecated; use the hashlib
module instead
import sha
2009-09-26 13:47:01-0600 [-] Log opened.
2009-09-26 13:47:01-0600 [-] /usr/lib/python2.6/site-packages/twisted/
spread/pb.py:30: exceptions.DeprecationWarning: the md5 module is
deprecated; use hashlib instead
2009-09-26 13:47:01-0600 [-] /usr/lib/python2.6/site-packages/twisted/
mail/smtp.py:10: exceptions.DeprecationWarning: the MimeWriter module
is deprecated; use the email package instead
2009-09-26 13:47:01-0600 [-] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/
scrapy/command/cmdline.py", line 132, in execute
scrapymanager.configure(control_reactor=True)
File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/
scrapy/core/manager.py", line 74, in configure
spiders.load()
File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/
scrapy/contrib/spidermanager.py", line 55, in load
for spider in self._getspiders(ISpider, module):
File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/
scrapy/contrib/spidermanager.py", line 66, in _getspiders
allDropins = getCache(package)
--- <exception caught here> ---
File "/usr/lib/python2.6/site-packages/twisted/plugin.py", line
165, in getCache
provider = pluginModule.load()
File "/usr/lib/python2.6/site-packages/twisted/python/modules.py",
line 380, in load
return self.pathEntry.pythonPath.moduleLoader(
self.name)
File "/usr/lib/python2.6/site-packages/twisted/python/reflect.py",
line 456, in namedAny
topLevelPackage = _importAndCheckStack(trialname)
File "/home/jeff/projects/scrapebyu/byub/byub/spiders/byub.py",
line 6, in <module>
from byub.items import ByubItem
exceptions.ImportError: No module named items
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled extensions: LiveStats,
EngineStatus, CloseDomain, MemoryUsage, SchedulerQueue, WebConsole,
Spiderctl, TelnetConsole, StatsDump, CoreStats
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled scheduler middlewares:
DuplicatesFilterMiddleware
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled downloader middlewares:
HttpAuthMiddleware, DownloaderStats, UserAgentMiddleware,
RedirectMiddleware, DefaultHeadersMiddleware, CookiesMiddleware,
HttpCompressionMiddleware, RetryMiddleware
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled spider middlewares:
UrlLengthMiddleware, HttpErrorMiddleware, RefererMiddleware,
OffsiteMiddleware, DepthMiddleware
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled item pipelines:
2009-09-26 13:47:01-0600 [byub] ERROR: Could not find spider for domain: byub.org
2009-09-26 13:47:01-0600 [-] scrapy.management.telnet.TelnetConsole
starting on 6023
2009-09-26 13:47:01-0600 [-] scrapy.management.web.WebConsole starting
on 6080
2009-09-26 13:47:01-0600 [scrapy.management.telnet.TelnetConsole]
(Port 6023 Closed)
2009-09-26 13:47:01-0600 [scrapy.management.web.WebConsole] (Port 6080
Closed)
2009-09-26 13:47:01-0600 [-] Main loop terminated.
I used Scrapy to generate all the files as prescribed in the tutorial. I've tried tweaking PYTHONPATH and copying items.py to spiders/byub/items.py, but neither helped.
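In case it helps diagnose things, here is a minimal, self-contained sketch of why that import can fail. The names (byub, ByubItem) mirror the tutorial-style project in the traceback, but the layout below is a hypothetical reconstruction, not my actual files: "from byub.items import ByubItem" only resolves if the directory *containing* the byub package (the project root, not the package itself) is on sys.path, and every package directory has an __init__.py.

```python
import os
import sys
import tempfile

# Hypothetical reconstruction of a tutorial-style layout:
#   <root>/byub/__init__.py
#   <root>/byub/items.py
#   <root>/byub/spiders/__init__.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "byub")
spiders = os.path.join(pkg, "spiders")
os.makedirs(spiders)

# Without these __init__.py files, Python 2.6 will not treat the
# directories as packages and raises "ImportError: No module named items".
open(os.path.join(pkg, "__init__.py"), "w").close()
open(os.path.join(spiders, "__init__.py"), "w").close()
with open(os.path.join(pkg, "items.py"), "w") as f:
    f.write("class ByubItem(object):\n    pass\n")

# scrapy-ctl.py normally arranges for the project root to be importable;
# if it isn't, the spider's top-level import fails exactly as in the log.
sys.path.insert(0, root)
from byub.items import ByubItem  # the import that fails in my traceback

print(ByubItem.__name__)  # -> ByubItem
```

So my guess is that either an __init__.py is missing somewhere, or the crawl is being run from a directory where the byub package isn't importable.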
Many thanks to everyone; Scrapy is great. :)
I'd like to get this working quickly, so any help is greatly appreciated.
Thanks,
Jeff