Hi,
I'm relatively new to Scrapy. How can I use Scrapy to parse XML files from the local file system?
I have a relatively modest alteration of the base scaffold. items.py:
import scrapy
class ScrapexmlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    meeting = scrapy.Field()
    number = scrapy.Field()
    name = scrapy.Field()
and example.py:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
class ScrapexmlItem(CrawlSpider):
    name = 'ScrapexmlItem'

    def __init__(self, filename=None, *args, **kwargs):
        super(ScrapexmlItem, self).__init__(*args, **kwargs)
        if filename:
            with open(filename, 'r') as f:
                self.start_urls = [line.strip() for line in f]

    def parse(self, response):
        pass
Then from the root directory I try to run the spider with the command below, which fails with a KeyError:
scrapy crawl MySpider -a filename=2015219RHIL0.xml
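One thing I suspect (though I haven't confirmed it) is that Scrapy's downloader expects entries in start_urls to carry a URL scheme, so a bare local path may need converting to a file:// URL first. Something like this helper, using only the standard library:

```python
from pathlib import Path

def to_file_uri(filename):
    # Resolve a local path and express it as a file:// URL,
    # which is a form a URL-based fetcher can handle.
    return Path(filename).resolve().as_uri()

print(to_file_uri("2015219RHIL0.xml"))
```

I could then pass the converted URL into start_urls instead of the raw filename.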
I have based my example.py on this SO post: http://stackoverflow.com/a/17307762/461887, but I am not sure I am really approaching it the correct way. I am hoping just to open the file and then use Scrapy's XPath selectors to put the data I want into a pipeline.
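For what it's worth, here is roughly the shape of data I imagine the item fields mapping onto, sketched with the standard library's ElementTree just to show the idea (the element and attribute names here are made up; my real file differs):

```python
import xml.etree.ElementTree as ET

# Made-up XML shaped to match the item fields (meeting, number, name);
# this is only an illustration, not the real file's layout.
sample = """<meetings>
    <meeting number="1" name="Opening">RHIL</meeting>
    <meeting number="2" name="Main">RHIL</meeting>
</meetings>"""

root = ET.fromstring(sample)
rows = [(m.get("number"), m.get("name"), m.text) for m in root.findall("meeting")]
print(rows)
```

The same extraction in Scrapy would presumably be a response.xpath(...) call inside parse, with the results yielded as ScrapexmlItem instances.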
Is there a more standard way to approach this in Scrapy?
Cheers
Sayth