Need a little help with the scrapy tutorial...


pike....@googlemail.com

Oct 16, 2013, 6:27:13 AM
to scrapy...@googlegroups.com
Hi all,

I've just followed the tutorial and everything seemed to be going well until this section:

=================================================================

Using our item

Item objects are custom python dicts; you can access the values of their fields (attributes of the class we defined earlier) using the standard dict syntax like:

>>> item = DmozItem()
>>> item['title'] = 'Example title'
>>> item['title']
'Example title'

Spiders are expected to return their scraped data inside Item objects. So, in order to return the data we’ve scraped so far, the final code for our Spider would be like this:

...

=================================================================
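Here is a rough pure-Python sketch of the "custom dict" behaviour the tutorial describes. It assumes, for illustration only, that an item is a mapping that accepts only the fields declared on its class. `SketchItem` and the `Field` stand-in are hypothetical and are not Scrapy's implementation. With Scrapy installed you would import `Item` and `Field` from `scrapy.item`, as in items.py.

```python
from collections.abc import MutableMapping


class Field(dict):
    """Hypothetical stand-in for scrapy.item.Field: a dict holding field metadata."""


class SketchItem(MutableMapping):
    """Hypothetical dict-like item that only accepts fields declared on its class."""

    def __init__(self, **values):
        self._values = {}
        for key, value in values.items():
            self[key] = value  # goes through __setitem__, so keys are checked

    @classmethod
    def declared_fields(cls):
        # Collect class attributes that are Field instances, including inherited ones.
        fields = {}
        for klass in reversed(cls.__mro__):
            for name, attr in vars(klass).items():
                if isinstance(attr, Field):
                    fields[name] = attr
        return fields

    def __setitem__(self, key, value):
        if key not in self.declared_fields():
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


# Mirrors the fields declared in the tutorial's items.py.
class DmozItem(SketchItem):
    title = Field()
    link = Field()
    desc = Field()


item = DmozItem()
item['title'] = 'Example title'
print(item['title'])  # Example title

rejected = None
try:
    item['author'] = 'nobody'  # 'author' was never declared as a Field
except KeyError as err:
    rejected = err
print('rejected:', rejected)
```

In this sketch, rejecting undeclared keys means a typo such as `item['titel']` fails loudly instead of silently creating a new key.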

If I run the first line in the shell, I get:

=================================================================

>>> item = DmozItem()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'DmozItem' is not defined
>>> 

=================================================================

Does anyone know what's happening here?  This is what my items.py file looks like.

=================================================================

$ cat tutorial/items.py
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()

=================================================================

And here is my dmoz_spider.py in case this helps.

=================================================================

$ cat tutorial/spiders/dmoz_spider.py
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import DmozItem

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = [ "dmoz.org" ]
    start_urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources"
    ]

    def parse(self, response):
        # filename = response.url.split("/")[-2]
        # open(filename, 'wb').write(response.body)
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []
        for site in sites:
            item = DmozItem()
            # Store the extracted values on the item; assigning them only to
            # local variables would leave every item empty.
            item['title'] = site.select('a/text()').extract()
            item['link'] = site.select('a/@href').extract()
            item['desc'] = site.select('text()').extract()
            items.append(item)
        return items

=================================================================

Cheers,
Andy

Jan Wrobel

Oct 16, 2013, 10:00:24 AM
to scrapy...@googlegroups.com
Hello,

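The shell doesn't import your project's item classes for you, so you need to import DmozItem yourself before using it:

>>> from tutorial.items import DmozItem
>>> item = DmozItem()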

Cheers,
Jan
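The sketch below is a minimal pure-Python illustration of why the shell raised NameError. A class defined in a module is not visible in an interactive namespace until it is imported there. The module name `items_demo` and the `dict`-based `DmozItem` are hypothetical stand-ins for `tutorial/items.py`, so the example runs without Scrapy. A plain dict stands in for the shell's namespace.

```python
import sys
import types

# Hypothetical stand-in for tutorial/items.py, registered under the made-up
# module name "items_demo". A plain dict subclass replaces scrapy.item.Item.
items_demo = types.ModuleType("items_demo")
exec("class DmozItem(dict):\n    pass\n", items_demo.__dict__)
sys.modules["items_demo"] = items_demo

# A fresh namespace playing the role of the interactive shell.
shell_namespace = {}

before_import_error = None
try:
    # Same line as in the original session, run before any import.
    exec("item = DmozItem()", shell_namespace)
except NameError as err:
    before_import_error = err
print("before import:", before_import_error)

# Once the class is imported into the shell's namespace, the same line works.
exec("from items_demo import DmozItem\nitem = DmozItem()", shell_namespace)
shell_namespace["item"]["title"] = "Example title"
print("after import:", shell_namespace["item"]["title"])  # after import: Example title
```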