Crawl fails because folder.items is not found


cookiecaper

Sep 26, 2009, 3:53:30 PM
to scrapy-users
Trying to get this working with scrapy:

python ./scrapy-ctl.py crawl byub.org

but I get:

$ python ./scrapy-ctl.py crawl byub.org
/usr/lib/python2.6/site-packages/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
  import sha
2009-09-26 13:47:01-0600 [-] Log opened.
2009-09-26 13:47:01-0600 [-] /usr/lib/python2.6/site-packages/twisted/spread/pb.py:30: exceptions.DeprecationWarning: the md5 module is deprecated; use hashlib instead
2009-09-26 13:47:01-0600 [-] /usr/lib/python2.6/site-packages/twisted/mail/smtp.py:10: exceptions.DeprecationWarning: the MimeWriter module is deprecated; use the email package instead
2009-09-26 13:47:01-0600 [-] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/scrapy/command/cmdline.py", line 132, in execute
    scrapymanager.configure(control_reactor=True)
  File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/scrapy/core/manager.py", line 74, in configure
    spiders.load()
  File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/scrapy/contrib/spidermanager.py", line 55, in load
    for spider in self._getspiders(ISpider, module):
  File "/usr/lib/python2.6/site-packages/scrapy-0.7.0_rc1-py2.6.egg/scrapy/contrib/spidermanager.py", line 66, in _getspiders
    allDropins = getCache(package)
--- <exception caught here> ---
  File "/usr/lib/python2.6/site-packages/twisted/plugin.py", line 165, in getCache
    provider = pluginModule.load()
  File "/usr/lib/python2.6/site-packages/twisted/python/modules.py", line 380, in load
    return self.pathEntry.pythonPath.moduleLoader(self.name)
  File "/usr/lib/python2.6/site-packages/twisted/python/reflect.py", line 456, in namedAny
    topLevelPackage = _importAndCheckStack(trialname)
  File "/home/jeff/projects/scrapebyu/byub/byub/spiders/byub.py", line 6, in <module>
    from byub.items import ByubItem
exceptions.ImportError: No module named items

2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled extensions: LiveStats, EngineStatus, CloseDomain, MemoryUsage, SchedulerQueue, WebConsole, Spiderctl, TelnetConsole, StatsDump, CoreStats
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloaderStats, UserAgentMiddleware, RedirectMiddleware, DefaultHeadersMiddleware, CookiesMiddleware, HttpCompressionMiddleware, RetryMiddleware
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled spider middlewares: UrlLengthMiddleware, HttpErrorMiddleware, RefererMiddleware, OffsiteMiddleware, DepthMiddleware
2009-09-26 13:47:01-0600 [byub] DEBUG: Enabled item pipelines:
2009-09-26 13:47:01-0600 [byub] ERROR: Could not find spider for domain: byub.org
2009-09-26 13:47:01-0600 [-] scrapy.management.telnet.TelnetConsole starting on 6023
2009-09-26 13:47:01-0600 [-] scrapy.management.web.WebConsole starting on 6080
2009-09-26 13:47:01-0600 [scrapy.management.telnet.TelnetConsole] (Port 6023 Closed)
2009-09-26 13:47:01-0600 [scrapy.management.web.WebConsole] (Port 6080 Closed)
2009-09-26 13:47:01-0600 [-] Main loop terminated.

I used scrapy to generate all files as prescribed in the tutorial.

I've tried tweaking PYTHONPATH and copying items.py to spiders/byub/
items.py.

Many thanks to everyone, scrapy is great. : )

I'd like to get this working quickly, so any help is greatly
appreciated.

Thanks
Signed
Jeff

Daniel Graña

Sep 26, 2009, 4:55:08 PM
to scrapy...@googlegroups.com
Hi cookiecaper,

Your spider module (byub.py) has the same name as your Scrapy project
package (byub). With Python 2's implicit relative imports, the "byub" in
"from byub.items import ByubItem" resolves to the spider module itself
rather than to the project package, and the spider module has no
"items" submodule.

This is a common pitfall of Python imports; see
http://www.python.org/dev/peps/pep-0328

Quick fixes:
* Rename your spider module to byub_org.py or similar.
* Or add "from __future__ import absolute_import" at the top of the
  byub.py spider.
* Or rename your project to something like byubbot.
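
To illustrate the second option, here is a rough, self-contained sketch. It rebuilds the layout from the traceback (byub/items.py and byub/spiders/byub.py) in a temporary directory and imports the spider module. The file contents are hypothetical stand-ins: ByubItem is a plain class here, not a real Scrapy item, and the script assumes a Python version that ships importlib. On Python 3 the __future__ line is a no-op because absolute imports are already the default.

```python
from __future__ import absolute_import

import importlib
import os
import sys
import tempfile

# Hypothetical stand-ins for the generated project files. The real
# items.py would define a Scrapy item; a plain class is enough here.
FILES = {
    "byub/__init__.py": "",
    "byub/items.py": "class ByubItem(object):\n    pass\n",
    "byub/spiders/__init__.py": "",
    # The spider module shares its name with the top-level package.
    "byub/spiders/byub.py": (
        "from __future__ import absolute_import\n"
        "from byub.items import ByubItem\n"
    ),
}


def build_project(root):
    """Write the stand-in project files under the given root directory."""
    for rel_path, source in FILES.items():
        path = os.path.join(root, *rel_path.split("/"))
        parent = os.path.dirname(path)
        if not os.path.isdir(parent):
            os.makedirs(parent)
        with open(path, "w") as handle:
            handle.write(source)


root = tempfile.mkdtemp()
build_project(root)
sys.path.insert(0, root)

# With absolute imports, "byub" inside the spider means the top-level
# package, so byub.items is found even though the spider is byub.py.
spider_module = importlib.import_module("byub.spiders.byub")
item_class_module = spider_module.ByubItem.__module__
print(item_class_module)
```

Renaming the spider module or the project avoids the name clash altogether, so those options need no change to the import line.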

regards,
Daniel

cookiecaper

Oct 10, 2009, 6:13:56 AM
to scrapy-users
That fixed it, thanks. : )
