Hi all,
I've been working through the Scrapy tutorial, and everything was going well until this section:
=================================================================
Using our item
Item objects are custom Python dicts; you can access the
values of their fields (attributes of the class we defined earlier) using the
standard dict syntax like:
>>> item = DmozItem()
>>> item['title'] = 'Example title'
>>> item['title']
'Example title'
Spiders are expected to return their scraped data inside
Item objects. So, in order to return the data we’ve
scraped so far, the final code for our Spider would be like this:
...
=================================================================
If I run the first line of that example (item = DmozItem()) in the shell, I get:
=================================================================
>>> item = DmozItem()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'DmozItem' is not defined
>>>
=================================================================
Does anyone know what's happening here? This is what my items.py file looks like.
=================================================================
$ cat tutorial/items.py
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

from scrapy.item import Item, Field


class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
=================================================================
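The tutorial describes items as custom Python dicts whose keys are the declared fields (title, link and desc in the items.py above). Here is a minimal stdlib sketch of that idea that runs without Scrapy installed. It is a hypothetical stand-in, not Scrapy's actual Item class: SketchItem and DmozSketchItem are made-up names, and the only behaviour assumed is "a dict that accepts only its declared fields".

```python
class SketchItem(dict):
    """Hypothetical stand-in for scrapy.item.Item (illustration only):
    a dict that only accepts keys listed in `fields`."""

    fields = ()  # subclasses list their allowed field names here

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


class DmozSketchItem(SketchItem):
    # Same three fields as DmozItem in tutorial/items.py
    fields = ("title", "link", "desc")


item = DmozSketchItem()
item["title"] = "Example title"
print(item["title"])  # prints: Example title

try:
    item["author"] = "someone"  # not a declared field, so it is rejected
except KeyError as exc:
    print("rejected:", exc)
```

This sketch only checks item[key] = value; dict methods such as update() and the constructor bypass the override.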
And here is my dmoz_spider.py in case this helps.
=================================================================
$ cat tutorial/spiders/dmoz_spider.py
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from tutorial.items import DmozItem


class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources",
    ]

    def parse(self, response):
        # filename = response.url.split("/")[-2]
        # open(filename, 'wb').write(response.body)
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []
        for site in sites:
            item = DmozItem()
            # Store each extracted value on the item itself; values kept
            # only in local variables never reach the returned items.
            item['title'] = site.select('a/text()').extract()
            item['link'] = site.select('a/@href').extract()
            item['desc'] = site.select('text()').extract()
            items.append(item)
        return items
=================================================================
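For reference, the extraction loop in parse() can be exercised without Scrapy. In the sketch below, xml.etree.ElementTree (standard library) stands in for HtmlXPathSelector, a plain dict stands in for DmozItem, and the sample markup and example.com URLs are made up. The step it shows is that each extracted value must be stored on the item under its field name; a value that only lives in a local variable never reaches the returned items.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup standing in for a downloaded page.
SAMPLE = """<div>
<ul>
  <li><a href="http://example.com/book">Example Book</a> - A short description.</li>
  <li><a href="http://example.com/other">Other Book</a> - Another one.</li>
</ul>
</div>"""


def own_text(elem):
    """Direct text children of elem, roughly what XPath 'text()' selects."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]


def parse(markup):
    root = ET.fromstring(markup)
    items = []
    for site in root.iterfind(".//ul/li"):  # like hxs.select('//ul/li')
        links = site.findall("a")  # direct <a> children, like 'a/...'
        item = {}  # plain dict standing in for DmozItem()
        # Assign every extracted value to a field of the item.
        item["title"] = [t for a in links for t in own_text(a)]  # 'a/text()'
        item["link"] = [a.get("href") for a in links
                        if a.get("href") is not None]  # 'a/@href'
        item["desc"] = own_text(site)  # 'text()'
        items.append(item)
    return items


for result in parse(SAMPLE):
    print(result)
```

ElementTree requires well-formed XML, so this stand-in would likely fail on real HTML pages; it is only meant to make the field-assignment step runnable in isolation.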
Cheers,
Andy