Using Scrapy on local XML files

Sayth Renshaw

Dec 19, 2015, 12:44:33 PM
to scrapy-users
Hi

I'm relatively new to Scrapy. How can I use Scrapy to parse XML files from a local file system?


I have made a relatively modest alteration to the base scaffold. This is items.py:

import scrapy


class ScrapexmlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    meeting = scrapy.Field()
    number = scrapy.Field()
    name = scrapy.Field()

and example.py

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ScrapexmlItem(CrawlSpider):
    name = 'ScrapexmlItem'

    def __init__(self, filename=None):
        if filename:
            with open(filename, 'r') as f:
                self.start_urls = f.readlines()


    def parse(self, response):
        pass

Then, from the project root directory, I am trying to run the spider with the command below, which fails with a KeyError:

scrapy crawl MySpider -a filename=2015219RHIL0.xml


I have based my example.py on this SO post http://stackoverflow.com/a/17307762/461887, but I am not sure I am approaching it the correct way. I am hoping just to open the file and then use Scrapy's XPath selectors to put the data I want into a pipeline.


Is there a more standard way to approach this in Scrapy?

Cheers

Sayth

Tiago Katcipis

Dec 22, 2015, 12:43:37 PM
to scrapy-users
Hi,


On Saturday, December 19, 2015 at 3:44:33 PM UTC-2, Sayth Renshaw wrote:
> Hi
>
> Relatively new to scrapy, how can i use scrapy to parse XML files from a local file system.

You could crawl a local URI, like file:///your/path/file.xml. That way you use Scrapy exactly as you would to crawl the web, just with the local file scheme (file://) instead of http://.
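
As a small illustration of the URI form mentioned above, a relative file name such as the one passed on the command line can be turned into an absolute file:// URI with the standard library. This is only a sketch; the helper name to_file_uri is hypothetical and not part of Scrapy.

```python
from pathlib import Path


def to_file_uri(filename):
    """Return an absolute file:// URI for a local path (hypothetical helper)."""
    # resolve() makes the path absolute; as_uri() adds the file:// scheme
    # and percent-encodes characters that are not URL-safe.
    return Path(filename).resolve().as_uri()


# The result depends on the current working directory.
print(to_file_uri("2015219RHIL0.xml"))
```

The resulting string can be used anywhere Scrapy expects a URL, for example as an entry in start_urls.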

I do this kind of thing for test automation, and it works just fine.

Hope this helps you.

Sayth Renshaw

Dec 28, 2015, 5:33:28 PM
to scrapy-users
Awesome, thanks.