Writing the Scrapy Main Page Tutorial in Conda


Tim Fitzhardinge

Sep 28, 2016, 9:16:53 AM
to scrapy-users

Hi

I'm new to Python and to running Python scripts.

I am using Anaconda.

I wanted to run the Scrapy tutorial from the home page of the website:

https://scrapy.org

$ pip install scrapy
$ cat > myspider.py <<EOF
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
EOF
$ scrapy runspider myspider.py

I also tried replacing the first line with:

conda install -c scrapinghub scrapy=1.1.2

I saved myspider.py in my user folder on the C: drive and copied the whole code into it. However, when I tried to run the script using python myspider.py in the Anaconda Prompt, it did not work and returned a syntax error.

What syntax from the tutorial should I use when running it in Anaconda?

Thanks

Paul Tremberth

Sep 28, 2016, 9:24:01 AM
to scrapy-users
Hi Tim,

What is the syntax error you are getting?
Can you copy-paste the text from your console? (Not only the last line but also the lines before it, the whole stack trace.)

Tim Fitzhardinge

Sep 30, 2016, 5:35:00 AM
to scrapy-users
Hi Paul,

Thank you for your reply.

I have played around with the code a little and revised it to the version below. A screenshot from the Anaconda Prompt is attached or below. The code no longer produces any syntax errors; however, I thought it was meant to print results on the screen, and nothing happened.

import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)




Paul Tremberth

Oct 2, 2016, 4:09:29 AM
to scrapy-users
Hi Tim,

Your myspider.py file only defines a Spider subclass, so calling `python myspider.py` does little more than read the class definition: nothing creates the spider or starts a crawl.

You need to run the spider using `scrapy runspider myspider.py`,
which is the last line of the example on https://scrapy.org
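
To illustrate why nothing shows up on screen, here is a minimal sketch that uses only plain Python and no Scrapy at all. `DemoSpider` and `run` are hypothetical names made up for this example; `run` is only a stand-in for the kind of work `scrapy runspider` does for you (creating the spider and driving its `parse` method), not how Scrapy is actually implemented.

```python
# Stand-in for a spider: a plain class with a generator method (no Scrapy needed).
class DemoSpider:
    name = "demo"

    def parse(self, response):
        # This body runs only when something calls parse() and iterates the result.
        for word in response.split():
            yield {"title": word}


# Up to this point the file has only *defined* DemoSpider.
# Nothing has called parse(), so nothing has been produced or printed.


def run(spider_cls, response):
    """Hypothetical mini-runner: create the spider and collect what parse() yields."""
    spider = spider_cls()
    return list(spider.parse(response))


if __name__ == "__main__":
    # Only this explicit call makes the class do any work.
    print(run(DemoSpider, "first second"))
```

Without the last two lines, `python` would load the file and exit silently, which matches what you saw with myspider.py. If you do want to start a crawl with plain `python`, the Scrapy documentation describes running a spider from a script with `scrapy.crawler.CrawlerProcess`, but `scrapy runspider` is the simpler route for this tutorial.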


Best,
Paul.

