Running pyspider


En Ware

Dec 5, 2018, 10:03:09 AM
to pyspider-users

I used to run scrapy but found out I need message queues, and I also like what pyspider has to offer. Here are my questions.

1. When I try to run the IMDB tutorial, I click Run, which gives me a follow, then I click play and the screen goes grey but nothing happens. I am running on localhost, port 5000, and it almost looks like it is running in a loop. What am I doing wrong?

I am using a virtualenv with pypy3.5. Can I not use pypy3.5 as my Python install?

I also see that the last commit, 6 days ago, changed Travis to Python 3.5, so I am assuming the project is still active?

2. Is there a forum or IRC channel where I can talk about pyspider, other than the Google Group?

I look forward to running pyspider.

- nixfreak 

En Ware

Dec 5, 2018, 10:17:16 AM
to pyspider-users
OK, so I answered my own question: pypy3.5.3 doesn't work with pyspider, at least not when you're using the web UI. I installed 3.6.3 instead in a brand new virtualenv, and with the same tutorial it loaded right up.
Awesome!

En Ware

Dec 5, 2018, 12:07:27 PM
to pyspider-users
Posting some code; I'm just trying to scrape IMDB.

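#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-12-05 09:39:49
# Project: imdb_tutorial

from pyspider.libs.base_handler import *
import re


class Handler(BaseHandler):
    # The contents of crawl_config were not included in the post.
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        # The start URL was not included in the post; '__START_URL__' is a
        # placeholder to be replaced with the real listing URL.
        self.crawl('__START_URL__', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            # re.match anchors at the start of the string, so this pattern
            # could never match a full URL as posted; re.search checks the
            # end of the href instead. The raw string keeps \d literal.
            if re.search(r"\d+/$", each.attr.href):
                self.crawl(each.attr.href, callback=self.detail_page)
        # Follow pagination links back into index_page.
        self.crawl(response.doc('.next-page').attr.href, callback=self.index_page)
        self.crawl(response.doc('.prev-page').attr.href, callback=self.index_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('.lister-item-header a').text(),
            "date": response.doc('.text-muted').text(),
        }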

So right now I am able to see the next-page, prev-page, and index page links, but I am not able to extract that information.
I'm using response.doc('.lister-item-header a').text()

Can someone tell me what I am doing wrong?


Roy Binux

Dec 5, 2018, 1:05:49 PM
to En Ware, pyspider-users
You are using a selector for the index page to extract data from a detail page.
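
To illustrate the point, here is a minimal sketch. It assumes made-up, heavily simplified markup for the two page types (the real IMDB pages differ), and it uses the standard library's ElementTree as a stand-in for pyspider's response.doc so it runs without pyspider installed. The helper name list_titles and both HTML snippets are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a listing (index) page.
INDEX_HTML = """
<div>
  <div class="lister-item">
    <h3 class="lister-item-header"><a href="/title/tt0000001/">First Film</a></h3>
    <span class="text-muted">(2001)</span>
  </div>
  <div class="lister-item">
    <h3 class="lister-item-header"><a href="/title/tt0000002/">Second Film</a></h3>
    <span class="text-muted">(2002)</span>
  </div>
</div>
"""

# Hypothetical, simplified markup for a single title (detail) page.
DETAIL_HTML = """
<div>
  <h1 class="title">First Film</h1>
</div>
"""


def list_titles(html):
    """Rough stand-in for the CSS selector '.lister-item-header a'."""
    root = ET.fromstring(html)
    return [a.text for a in root.findall(".//h3[@class='lister-item-header']/a")]


index_titles = list_titles(INDEX_HTML)    # the selector matches the listing markup
detail_titles = list_titles(DETAIL_HTML)  # the same selector finds nothing here

print(index_titles)
print(detail_titles)
```

In pyspider terms, this suggests either extracting the list fields inside index_page, where that selector applies, or giving detail_page selectors that exist on the detail page itself.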
