Writing the Scrapy Main Page Tutorial in Conda

Tim Fitzhardinge

Sep 28, 2016, 9:16:53 AM

to scrapy-users

Hi

I'm new to Python and running python scripts.

I am using Anaconda.

I wanted to run the Scrapy tutorial from the home page of the website:

https://scrapy.org

$ pip install scrapy
$ cat > myspider.py <<EOF
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
EOF
$ scrapy runspider myspider.py

I also tried replacing the first line with:

conda install -c scrapinghub scrapy=1.1.2

I saved myspider.py in my user folder on the C: drive and copied the whole code into it. However, when I tried to run the script with python myspider.py in the Anaconda Prompt, it did not work and returned a syntax error.

What syntax from the tutorial should I use to run it in Anaconda?

Thanks

Paul Tremberth

Sep 28, 2016, 9:24:01 AM

Hi Tim,

What is the syntax error you are getting?
Can you copy-paste the text from your console? (Not only the last line but also the lines before it, i.e. the whole stack trace.)

Tim Fitzhardinge

Sep 30, 2016, 5:35:00 AM

Hi Paul

Thank you for your reply.

I have played around with the code a little and revised it to the following. A screenshot from the Anaconda Prompt is attached. It no longer produces any syntax errors; however, I thought the code was meant to print results on the screen, and nothing happens.

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)

Paul Tremberth

Oct 2, 2016, 4:09:29 AM

Hi Tim,

Your myspider.py file defines a Spider subclass, and calling `python myspider.py` will not execute much except reading the class definition.

You need to run the spider using `scrapy runspider myspider.py`, which is the last line of the example on https://scrapy.org

You can read more about "runspider" at https://doc.scrapy.org/en/latest/topics/commands.html#std:command-runspider
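
To illustrate why `python myspider.py` prints nothing, here is a minimal sketch in plain Python with no Scrapy involved. `DemoSpider`, `calls`, and the fake response string are made-up names for this illustration only; the point is just that defining a class does not call its methods, so something else has to drive it (with Scrapy, that is the job of `scrapy runspider`).

```python
# Hypothetical stand-in for a spider class; it does not use Scrapy at all.
calls = []


class DemoSpider:
    name = "demo"

    def parse(self, response):
        # Runs only when something calls it explicitly.
        calls.append(response)
        return {"title": response}


# At this point Python has read the class definition,
# but nobody has called parse() yet.
calls_after_definition = list(calls)

# A runner has to create the spider and hand it a response.
# Here we do it by hand with a fake response string.
spider = DemoSpider()
item = spider.parse("fake response")

print(calls_after_definition)  # []
print(item)                    # {'title': 'fake response'}
```

If the file stopped right after the class definition, as myspider.py does, running it with `python` would finish silently, which matches what you saw.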

Best,

Paul.
